These two utility functions combine dplyr::filter with base cumsum, filtering
to include only rows before a condition occurs for the first time
(filter_until) or only rows after a condition has occurred for the first time
(filter_after). This is mostly useful for pulling data out of spreadsheets or
other formats that may have been made more for presentation than for
analysis--think unnecessary headings, footnotes, or references.
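The filter-plus-cumsum idea described above can be sketched as follows. This is a rough illustration of the general technique and not necessarily the package's actual source; the data frame messy and the variables flag, running, before, and from_first are made up for the example.

```r
library(dplyr)

# Toy data with trailing notes, loosely modeled on the examples on this page
messy <- data.frame(
  x1 = c("A", "B", "Source:", "Date:"),
  x2 = c("dog", "cat", "xyz.com", "Jan"),
  stringsAsFactors = FALSE
)

# cumsum() of a logical vector stays at 0 until the first TRUE,
# then is at least 1 for every later row
flag <- grepl(":", messy$x1)
running <- cumsum(flag)

# Rows before the condition first occurs
before <- filter(messy, cumsum(grepl(":", x1)) == 0)

# Rows from the first occurrence onward; the flagged rows themselves are
# still included here and would need a further step to drop
from_first <- filter(messy, cumsum(grepl(":", x1)) > 0)
```

Because the running count never returns to 0, a later row that fails the condition (here, a data-like row after the notes) stays excluded from before.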
filter_until(.data, ...)
filter_after(.data, ..., .streak = TRUE)
A data frame.
A set of one or more logical conditions. This follows what is allowed for
dplyr::filter, though without the comma shorthand to combine multiple
conditions; join them with & instead.
For filter_after, logical: if TRUE (the default), all consecutive occurrences
of the logical conditions are considered a match, rather than just the first
occurrence.
(DEPRECATED) For filter_after, logical: if FALSE (the default), the row that
matches the condition will be excluded, returning only the rows after the
occurrence. If TRUE, the row that matches the condition is included. If this
is multiple rows, only the last is included. Using this will likely leave you
with unwanted data, such as headings you'll then want to convert to column
names.
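To make the difference between the two readings of .streak concrete, here is a small sketch built directly on dplyr::filter and cumsum. It assumes that .streak = TRUE means skipping the whole first run of consecutive matches and that .streak = FALSE means skipping only the first matching row. It does not call filter_after itself, and the data frame df and the helper vectors are hypothetical.

```r
library(dplyr)

# Two consecutive rows match the condition is.na(x2)
df <- data.frame(
  x1 = c("A", "Notes", "Notes", "B", "C"),
  x2 = c(1, NA, NA, 2, 3),
  stringsAsFactors = FALSE
)

hit <- is.na(df$x2)        # rows where the condition holds
seen <- cumsum(hit) > 0    # TRUE from the first match onward

# TRUE once a non-matching row has appeared after the first match,
# i.e. once the first run of matches has ended
past_run <- cumsum(seen & !hit) > 0

# Streak reading: skip the entire first run of matches
after_streak <- filter(df, past_run)

# First-occurrence reading: skip only the first matching row
after_first <- filter(df, seq_along(hit) > which(hit)[1])
```

Under these assumptions, after_streak holds only the rows following both "Notes" rows, while after_first still begins with the second "Notes" row.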
A subset version of the original data frame.
dplyr::filter
messy_notes <- tibble::tribble(
~x1, ~x2, ~x3,
"A", "dog", 0,
"B", "cat", 1,
"Source:", "xyz.com", NA,
"Date:", "Jan", 2021
)
messy_summary <- tibble::tribble(
~x1, ~x2,
"A", 1,
"B", 5,
"C", 9,
"Weights", NA,
"A", 0.2,
"B", 0.5
)
# Use filter_until to get the data until the source information starts--there
# are several conditions that will work
filter_until(messy_notes, is.na(x3))
#> # A tibble: 2 × 3
#>   x1    x2       x3
#>   <chr> <chr> <dbl>
#> 1 A     dog       0
#> 2 B     cat       1
filter_until(messy_notes, grepl("\\:", x1))
#> # A tibble: 2 × 3
#>   x1    x2       x3
#>   <chr> <chr> <dbl>
#> 1 A     dog       0
#> 2 B     cat       1
# Use filter_until to get the data up until the weights table, and
# filter_after to get the weights table.
filter_until(messy_summary, x1 == "Weights" & is.na(x2))
#> # A tibble: 3 × 2
#>   x1       x2
#>   <chr> <dbl>
#> 1 A         1
#> 2 B         5
#> 3 C         9
filter_after(messy_summary, x1 == "Weights" & is.na(x2))
#> # A tibble: 2 × 2
#>   x1       x2
#>   <chr> <dbl>
#> 1 A       0.2
#> 2 B       0.5
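The examples above join their conditions with & because these functions do not accept the comma shorthand. For comparison, in plain dplyr::filter the two forms select the same rows, as this short sketch shows. It uses only dplyr, with a data frame copy of messy_summary; the names with_comma and with_and are made up for the illustration.

```r
library(dplyr)

messy_summary <- data.frame(
  x1 = c("A", "B", "C", "Weights", "A", "B"),
  x2 = c(1, 5, 9, NA, 0.2, 0.5),
  stringsAsFactors = FALSE
)

# Comma shorthand: separate arguments are combined with a logical AND
with_comma <- filter(messy_summary, x1 == "Weights", is.na(x2))

# Explicit form: a single condition joined with &
with_and <- filter(messy_summary, x1 == "Weights" & is.na(x2))

# Both forms pick out the same single "Weights" row
same_rows <- isTRUE(all.equal(with_comma, with_and))
```

Writing the condition as a single expression with & is the form to carry over to filter_until and filter_after.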