dplyr filter with condition on multiple columns

dplyr >= 1.0.4

If you're using dplyr version >= 1.0.4 you really should use if_any or if_all, which specifically combines the results of the predicate function into a single logical vector making it very useful in filter. The syntax is identical to across, but these verbs were added to help fill this need: if_any/if_all.

library(dplyr)

dataset %>% 
  filter(if_all(-c(father, mother), ~ is.na(.)), if_all(c(father, mother), ~ !is.na(.)))

Here I have written out the variable names, but you can use any tidy selection helper to specify variables (e.g., column ranges by name or location, regular expression matching, substring matching, starts with/ends with, etc.).

Output

  father mother children cousins
1      1      1       NA      NA

None of the answers seems to be an adaptable solution. I think the intention is not to list all the variables and values to filter the data.

One easy way to achieve this is through merging. If you have all the conditions in df_filter then you can do this:

df_results = df_filter %>% left_join(df_all)

A possible dplyr(0.5.0.9004 <= version < 1.0) solution is:

# > packageVersion('dplyr')
# [1] ‘0.5.0.9004’

dataset %>%
    filter(!is.na(father), !is.na(mother)) %>%
    filter_at(vars(-father, -mother), all_vars(is.na(.)))

Explanation:

vars(-father, -mother): select all columns except father and mother.
all_vars(is.na(.)): keep rows where is.na is TRUE for all the selected columns.

note: any_vars should be used instead of all_vars if rows where is.na is TRUE for any column are to be kept.

Update (2020-11-28)

As the _at functions and vars have been superseded by the use of across since dplyr 1.0, the following way (or similar) is recommended now:

dataset %>%
    filter(across(c(father, mother), ~ !is.na(.x))) %>%
    filter(across(c(-father, -mother), is.na))

See more example of across and how to rewrite previous code with the new approach here: Colomn-wise operatons or type vignette("colwise") in R after installing the latest version of dplyr.

A dplyr solution:

test <- dataset %>% 
  filter(father==1 & mother==1 & rowSums(is.na(.[,3:4]))==2)

Where '2' is the number of columns that should be NA.

This gives:

> test
  father mother children cousins
1      1      1       NA      NA

You can apply this logic in base R as well:

dataset[dataset$father==1 & dataset$mother==1 & rowSums(is.na(dataset[,3:4]))==2,]

dplyr filter with condition on multiple columns

dplyr >= 1.0.4

Tags:

R

Dplyr

Related

Recent Posts