These two utility functions combine dplyr::filter with base cumsum to keep only the rows before a condition is first met (filter_until) or only the rows after it is first met (filter_after). They are mostly useful for pulling data out of spreadsheets or other formats designed for presentation rather than analysis, where unnecessary headings, footnotes, or references sit alongside the data.
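
The cumsum idea can be sketched in base R. This is a minimal illustration, not the package's source: rows_until and rows_after are hypothetical stand-ins, they take an already-evaluated logical vector rather than unquoted conditions, and rows_after simply drops every matching row rather than handling streaks.

```r
# Hypothetical helpers for illustration only; not part of the package.
rows_until <- function(df, cond) {
  # Treat NA conditions as FALSE so the running count stays defined
  hit <- !is.na(cond) & cond
  # cumsum(hit) is 0 on every row before the first match
  df[cumsum(hit) == 0, , drop = FALSE]
}

rows_after <- function(df, cond) {
  hit <- !is.na(cond) & cond
  # cumsum(hit) > 0 from the first match onward; !hit drops matching rows
  df[cumsum(hit) > 0 & !hit, , drop = FALSE]
}

messy <- data.frame(
  x1 = c("A", "B", "C", "Weights", "A", "B"),
  x2 = c(1, 5, 9, NA, 0.2, 0.5)
)
is_heading <- messy$x1 == "Weights" & is.na(messy$x2)

rows_until(messy, is_heading)$x2  # 1 5 9
rows_after(messy, is_heading)$x2  # 0.2 0.5
```

The running count is what turns a row-wise test into a positional cut: once the condition has been true once, every later row carries a count above zero.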

filter_until(.data, ...)

filter_after(.data, ..., .streak = TRUE)

Arguments

.data

A data frame.

...

One or more logical conditions. These follow what dplyr::filter allows, except that multiple conditions cannot be separated with commas; combine them explicitly with & instead.

.streak

For filter_after, logical: if TRUE (the default), a run of consecutive rows that meet the conditions is treated as a single match, rather than only the first matching row.
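
One reading of the streak behavior, sketched in base R. This assumes that with streak handling on, the whole first run of matching rows counts as the match and is dropped, while with it off only the first matching row is dropped. The helper after_match and its streak argument are hypothetical and are not the package's implementation.

```r
# Hypothetical helper illustrating one interpretation of .streak.
after_match <- function(df, hit, streak = TRUE) {
  first <- which(hit)[1]
  # No match at all: nothing comes "after", so return zero rows
  if (is.na(first)) return(df[0, , drop = FALSE])
  end <- first
  if (streak) {
    # Extend the match to the last row of the first consecutive run
    later_false <- which(!hit & seq_along(hit) > first)
    end <- if (length(later_false) == 0) length(hit) else later_false[1] - 1L
  }
  df[seq_len(nrow(df)) > end, , drop = FALSE]
}

notes <- data.frame(x1 = c("A", "B", "Notes", "Notes", "C", "D"))
hit <- notes$x1 == "Notes"

after_match(notes, hit, streak = TRUE)$x1   # "C" "D"
after_match(notes, hit, streak = FALSE)$x1  # "Notes" "C" "D"
```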

.include_first

(DEPRECATED) For filter_after, logical: if FALSE (the default), the row that meets the condition is excluded, so only the rows after it are returned. If TRUE, the matching row is included; if several rows match, only the last is included. Using this will likely leave you with unwanted data, such as a heading row that you will then want to convert to column names.

Value

A subset of the original data frame.

See also

dplyr::filter

Examples

messy_notes <- tibble::tribble(
~x1,       ~x2,        ~x3,
"A",       "dog",      0,
"B",       "cat",      1,
"Source:", "xyz.com",  NA,
"Date:",   "Jan",      2021
)
messy_summary <- tibble::tribble(
~x1,        ~x2,
"A",        1,
"B",        5,
"C",        9,
"Weights",  NA,
"A",        0.2,
"B",        0.5
)

# Use filter_until to get the data before the source information starts.
# Several conditions will work.
filter_until(messy_notes, is.na(x3))
#> # A tibble: 2 × 3
#>   x1    x2       x3
#>   <chr> <chr> <dbl>
#> 1 A     dog       0
#> 2 B     cat       1
filter_until(messy_notes, grepl("\\:", x1))
#> # A tibble: 2 × 3
#>   x1    x2       x3
#>   <chr> <chr> <dbl>
#> 1 A     dog       0
#> 2 B     cat       1

# Use filter_until to get the data up until the weights table, and
# filter_after to get the weights table.
filter_until(messy_summary, x1 == "Weights" & is.na(x2))
#> # A tibble: 3 × 2
#>   x1       x2
#>   <chr> <dbl>
#> 1 A         1
#> 2 B         5
#> 3 C         9
filter_after(messy_summary, x1 == "Weights" & is.na(x2))
#> # A tibble: 2 × 2
#>   x1       x2
#>   <chr> <dbl>
#> 1 A       0.2
#> 2 B       0.5