Execute dplyr operation only if column exists
With `across()` in dplyr >= 1.0.0, you can use `any_of()` when filtering. Compare with the original, where all columns are present:
mtcars %>%
filter(am == 1) %>%
filter(cyl == 4)
With `cyl` removed, it throws an error:
mtcars %>%
select(!cyl) %>%
filter(am == 1) %>%
filter(cyl == 4)
Using `any_of` (note that you have to write `"cyl"` and not `cyl`):
mtcars %>%
select(!cyl) %>%
filter(am == 1) %>%
filter(across(any_of("cyl"), ~.x == 4))
#N.B. this is equivalent to just filtering by `am == 1`.
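To see why `any_of()` tolerates the missing column, here is a minimal sketch using `select()` rather than `filter()`. The object name `present` is only illustrative. `any_of()` keeps the names that exist and silently skips the rest, so a selection on a missing column comes back empty instead of raising an error.

```r
library(dplyr)

# Drop cyl, then ask for "am" and "cyl" with any_of().
# Only "am" still exists, so only "am" is returned; no error is raised.
present <- mtcars %>%
  select(!cyl) %>%
  select(any_of(c("am", "cyl")))

names(present)
```

Replacing `any_of()` with `all_of()` in the same call would raise an error, because `all_of()` requires every listed column to exist.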
Avoid this trap.
On a busy day, one might write something like the following:
library(dplyr)
df <- data.frame(A = 1:3, B = letters[1:3], stringsAsFactors = FALSE)
> df %>% mutate( C = ifelse("D" %in% colnames(.), D, B))
# Notice the values in column "C". No error is thrown, but the logic and the result are wrong.
A B C
1 1 a a
2 2 b a
3 3 c a
Why? Because `"D" %in% colnames(.)` returns a single `TRUE` or `FALSE`, `ifelse` evaluates only once and returns a single value. That value is then recycled across the whole column!
Correct way:
> df %>% mutate( C = if("D" %in% colnames(.)) D else B)
A B C
1 1 a a
2 2 b b
3 3 c c
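The difference can be reproduced in base R without dplyr. This is a small sketch; `has_d`, `b`, and `d` are made-up stand-ins for the existence check and the two columns.

```r
# Hypothetical stand-in for "D" %in% colnames(.): a single logical value.
has_d <- FALSE
b <- c("a", "b", "c")
d <- c("x", "y", "z")

# ifelse() returns a result shaped like its test, so a length-1 test
# yields only the first element of the chosen vector.
vectorised <- ifelse(has_d, d, b)

# if/else returns the whole chosen vector.
scalar <- if (has_d) d else b

length(vectorised)
length(scalar)
```

Inside `mutate()`, the length-1 result of `ifelse()` is recycled to every row, which is why no error appears.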
I know I'm late to the party, but here's an answer somewhat more in line with what you were originally thinking:
mtcars %>%
filter(am == 1) %>%
{
if("cyl" %in% names(.)) filter(., cyl == 4) else .
}
Basically, you were missing the `.` in `filter`. Note this is because the pipeline doesn't add `.` to `filter(expr)` since it is in an expression surrounded by `{}`.
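If this pattern comes up often, it could be wrapped in a small function. The sketch below is one possible way to do that; `filter_if_present` is a hypothetical name, not a dplyr function, and it assumes a simple equality test on a single column.

```r
library(dplyr)

# Hypothetical helper: filter on `col == value` only when `col` exists,
# otherwise return the data unchanged.
filter_if_present <- function(data, col, value) {
  if (col %in% names(data)) {
    filter(data, .data[[col]] == value)
  } else {
    data
  }
}

# Column present: the filter is applied.
with_cyl <- mtcars %>%
  filter(am == 1) %>%
  filter_if_present("cyl", 4)

# Column absent: the data passes through untouched.
without_cyl <- mtcars %>%
  select(!cyl) %>%
  filter(am == 1) %>%
  filter_if_present("cyl", 4)
```

Taking the data as the first argument lets the helper sit in a pipeline without the braces.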
Because of the way the scopes here work, you cannot access the dataframe from within your `if` statement. Fortunately, you don't need to.
Try:
mtcars %>%
filter(am == 1) %>%
filter({if("cyl" %in% names(.)) cyl else NULL} == 4)
Here you can use the `.` object within the conditional, so you can check whether the column exists and, if it does, return the column to the `filter` function.
EDIT: as per docendo discimus' comment on the question, you can access the dataframe but not implicitly - i.e. you have to specifically reference it with `.`.
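One caveat with the `NULL` branch: `NULL == 4` evaluates to `logical(0)`, and how `filter()` treats a zero-length condition may depend on the dplyr version (this is an assumption worth testing locally). A variant that sidesteps the question, sketched here, returns the whole condition from the `if`, falling back to a scalar `TRUE` that keeps every row. The object names are illustrative.

```r
library(dplyr)

# Column present: the condition is cyl == 4.
with_col <- mtcars %>%
  filter(am == 1) %>%
  filter(if ("cyl" %in% names(.)) cyl == 4 else TRUE)

# Column absent: the condition is a single TRUE, so all rows are kept.
no_col <- mtcars %>%
  select(!cyl) %>%
  filter(am == 1) %>%
  filter(if ("cyl" %in% names(.)) cyl == 4 else TRUE)
```

Because the `else` branch never mentions `cyl`, the missing column is never evaluated.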