This function combines several common ways of formatting titles and names into a single call. It's meant to be simple, yet flexible.
clean_titles(
  x,
  cap_all = FALSE,
  split_case = TRUE,
  keep_running_caps = TRUE,
  space = "_",
  remove = NULL
)
x: A character vector of titles or names to format.
cap_all: Logical: if TRUE, the first letter of each word after splitting is capitalized. If FALSE, only the first character of the string is capitalized. Consecutive capital letters, such as acronyms, are governed separately by keep_running_caps, so they can stay uppercase even when cap_all = FALSE.
split_case: Logical: if TRUE, a lowercase letter followed directly by an uppercase letter marks a boundary between two words (for example, "TownName" is split into "Town" and "Name").
keep_running_caps: Logical: if TRUE, runs of consecutive uppercase letters, such as acronyms like "BPT", are kept uppercase.
space: Character vector of characters and/or regex patterns that should be replaced with a space to separate words.
remove: Character vector of characters and/or regex patterns that will be removed before any other operations; if NULL, nothing is removed.
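The word-splitting arguments (remove, space, and split_case) can be approximated in base R as below. This is a minimal sketch under stated assumptions: split_words() is a hypothetical helper, not part of the package, and it is not the package's actual implementation. Running remove first follows the argument description above; the rest of the step order, and the final whitespace cleanup, are assumptions. Capitalization is not handled here.

```r
# Hypothetical helper approximating the word-splitting arguments of
# clean_titles(); an illustrative sketch, not the package's own code.
split_words <- function(x, split_case = TRUE, space = "_", remove = NULL) {
  # `remove`: delete these patterns before any other step
  if (length(remove) > 0) {
    x <- gsub(paste(remove, collapse = "|"), "", x, perl = TRUE)
  }
  # `space`: every match of these patterns becomes a space between words
  if (length(space) > 0) {
    x <- gsub(paste(space, collapse = "|"), " ", x, perl = TRUE)
  }
  # `split_case`: a lowercase letter directly followed by an uppercase
  # letter marks a word boundary ("TownName" -> "Town Name")
  if (split_case) {
    x <- gsub("([a-z])([A-Z])", "\\1 \\2", x, perl = TRUE)
  }
  # Assumed cleanup: collapse repeated spaces and trim the ends
  trimws(gsub("\\s+", " ", x, perl = TRUE))
}
```

For instance, split_words("GreaterBPT_men") returns "Greater BPT men": the underscore becomes a space, and only the "rB" pair triggers a case split, so the acronym stays intact.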
Returns a character vector with each element reformatted.
Examples of possible common operations include:
"TownName" --> "Town Name"
"town_name" --> "Town Name" (cap_all = TRUE)
"town_name" --> "Town name" (cap_all = FALSE)
"RegionABC" --> "Region ABC"
"TOWN_NAME" --> "Town Name"
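The capitalization rules behind these conversions can be sketched in base R as well. capitalize_words() is a hypothetical helper, not the package's API: it assumes the input has already been split into space-separated words, and it treats any word containing two or more consecutive capitals as an acronym. The package's own rules may differ in edge cases.

```r
# Hypothetical helper approximating the cap_all and keep_running_caps
# arguments of clean_titles(); an illustrative sketch only.
# Assumes `x` already holds words separated by single spaces.
capitalize_words <- function(x, cap_all = FALSE, keep_running_caps = TRUE) {
  # Uppercase the first character of each string, leave the rest as-is
  upper_first <- function(s) {
    paste0(toupper(substr(s, 1, 1)), substr(s, 2, nchar(s)))
  }
  vapply(x, function(s) {
    words <- strsplit(s, " ", fixed = TRUE)[[1]]
    # With keep_running_caps, words containing a run of 2+ capitals
    # (assumed to be acronyms) are left alone; all others are lowercased
    is_acronym <- keep_running_caps & grepl("[A-Z]{2,}", words, perl = TRUE)
    words[!is_acronym] <- tolower(words[!is_acronym])
    # cap_all: capitalize every word, not just the start of the string
    if (cap_all) {
      words <- upper_first(words)
    }
    upper_first(paste(words, collapse = " "))
  }, character(1), USE.NAMES = FALSE)
}
```

For instance, capitalize_words("Region ABC") keeps the acronym and returns "Region ABC", while capitalize_words("TOWN NAME", cap_all = TRUE, keep_running_caps = FALSE) returns "Town Name".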
t1 <- c("GreaterNewHaven", "greater_new_haven", "GREATER_NEW_HAVEN")
clean_titles(t1, cap_all = TRUE, keep_running_caps = FALSE)
#> [1] "Greater New Haven" "Greater new haven" "Greater New Haven"
t2 <- c("Male!CollegeGraduates", "Male CollegeGraduates")
clean_titles(t2, space = c("_", "!"))
#> [1] "Male college graduates" "Male college graduates"
t3 <- c("Greater BPT Men", "Greater BPT Men HBP", "GreaterBPT_men", "greaterBPT")
clean_titles(t3, cap_all = FALSE)
#> [1] "Greater BPT men" "Greater BPT men HBP" "Greater BPT men"
#> [4] "Greater BPT"
t4 <- c("New Haven town, New Haven County, Connecticut",
        "Newtown town, Fairfield County, Connecticut")
clean_titles(t4, cap_all = TRUE, remove = " town,.+")
#> [1] "New Haven" "Newtown"