R package: camiller
Aug 5, 2018 · 830 words
I recently decided I had collected enough snippets and convenience functions in R that rather than pasting them in Rmarkdown documents scattered all over different projects, I should bite the bullet and build a package. I’d written a package once before for work—a collection of mostly wrapper functions for making profiles of ACS data with the acs
package—and while it had one or two vignettes, it had pretty poor documentation and no tests.
This time around, I decided to be more intentional and build something that might last. Alongside this, I was also building a bunch of code for working with Census data, as well as other open government data on unemployment, job counts, wages, etc. I figured I’d split these two concerns into one package of more specific tasks for work, called cwi
, and one package of broader, how-do-I-tidyeval tasks for myself. I also figured I’d take a deep dive into R development by not only testing with testthat
, but also building documentation sites with pkgdown
and setting up Travis-CI to build and deploy everything.
So that’s camiller
. It’s a work in progress, but there are some things I’m happy with.
library(tidyverse)
library(tidycensus)
library(camiller)
library(showtext)
I often have to compose smaller geographies, such as census tracts or towns, into larger geographies, such as city neighborhoods or regions of towns, and then aggregate some data. It gets tedious, especially because those groups might not be mutually exclusive. Same goes for other groupings, like populations by age or education level. So I started working out an add_grps
function that adds up subgroups and binds them all together into a data frame quickly.
For instance, to calculate populations in households by their ratio to the federal poverty line, with data from the 2016 ACS:
poverty <- get_acs(geography = "county subdivision", table = "C17002",
year = 2016, state = "09", county = "09") %>%
camiller::town_names(NAME) %>%
rename(name = NAME) %>%
filter(name %in% c("New Haven", "Hamden", "West Haven", "East Haven")) %>%
cwi::label_acs() %>%
mutate(label = str_remove(label, "Total!!")) %>%
group_by(name) %>%
add_grps(list(total = "Total",
poverty = c("Under .50", ".50 to .99"),
low_income = c("Under .50", ".50 to .99", "1.00 to 1.24",
"1.25 to 1.49", "1.50 to 1.84", "1.85 to 1.99")),
group = label)
poverty
## # A tibble: 12 x 3
## # Groups: name [4]
## name label estimate
## <chr> <fct> <dbl>
## 1 East Haven total 28739
## 2 East Haven poverty 2630
## 3 East Haven low_income 6253
## 4 Hamden total 56196
## 5 Hamden poverty 4724
## 6 Hamden low_income 10973
## 7 New Haven total 121847
## 8 New Haven poverty 31848
## 9 New Haven low_income 59454
## 10 West Haven total 51905
## 11 West Haven poverty 7990
## 12 West Haven low_income 18400
But just numbers don’t do a whole lot—New Haven is much bigger than its suburbs, so it’s far more useful to calculate rates. In this case, there are three groups—total population for whom poverty status is determined, population in households with incomes below the poverty line, and population in households with incomes less than 2 times the poverty line. But I want to divide the second two of these groups over the first. And reshaping the data for that is awkward, let alone the fact that I might have to do it for 20 tables in a day.
So I wrote calc_shares
:
poverty_rates <- poverty %>%
calc_shares(group = label, denom = "total")
poverty_rates
## # A tibble: 12 x 4
## # Groups: name [4]
## name label estimate share
## <chr> <fct> <dbl> <dbl>
## 1 East Haven total 28739 NA
## 2 East Haven poverty 2630 0.092
## 3 East Haven low_income 6253 0.218
## 4 Hamden total 56196 NA
## 5 Hamden poverty 4724 0.084
## 6 Hamden low_income 10973 0.195
## 7 New Haven total 121847 NA
## 8 New Haven poverty 31848 0.261
## 9 New Haven low_income 59454 0.488
## 10 West Haven total 51905 NA
## 11 West Haven poverty 7990 0.154
## 12 West Haven low_income 18400 0.354
Cool. Now I can make some actual comparisons. I can use the ggplot2
theme I put together for this package to make it a little cleaner than the defaults.
font_add_google("Archivo Narrow", "archivo")
showtext_auto()
poverty_rates %>%
ungroup() %>%
filter(!is.na(share)) %>%
mutate(name = as.factor(name) %>% fct_reorder2(label, share)) %>%
mutate(label = fct_relabel(label, function(x) str_replace_all(x, "_", "-") %>% camiller::cap_first())) %>%
ggplot(aes(x = name, y = share)) +
geom_col(fill = "skyblue3", width = 0.8, alpha = 0.9) +
scale_y_continuous(labels = scales::percent) +
facet_wrap(~ label) +
theme_din(base_family = "archivo") +
labs(x = NULL, y = NULL,
title = "Poverty and low-income rates by town",
subtitle = "New Haven and Inner Ring suburbs, 2016",
caption = "Source: US Census Bureau 2016 5-year estimates")
Pretty cute! There are a few more things going on in camiller
, including a themed_label
function that wraps around cowplot::draw_label()
to make labels that fit the aesthetics of a theme to tack onto a grid of ggplot
s.
See camiller
here.