Filter data frame by character column name (in dplyr)
Using rlang's injection paradigm
From the current dplyr documentation (emphasis by me):
dplyr used to offer twin versions of each verb suffixed with an underscore. These versions had standard evaluation (SE) semantics: rather than taking arguments by code, like NSE verbs, they took arguments by value. Their purpose was to make it possible to program with dplyr. However, dplyr now uses tidy evaluation semantics. NSE verbs still capture their arguments, but you can now unquote parts of these arguments. This offers full programmability with NSE verbs. Thus, the underscored versions are now superfluous.
So, essentially, we need to perform two steps to be able to refer to the value "this" of the variable column inside dplyr::filter():
1. We need to turn the variable column, which is of type character, into type symbol. Using base R, this can be achieved by the function as.symbol(), which is an alias for as.name(). The former is preferred by the tidyverse developers because it follows a more modern terminology (R types instead of S modes). Alternatively, the same can be achieved by rlang::sym() from the tidyverse.
2. We need to inject the symbol from 1) into the dplyr::filter() expression. This is done by the so-called injection operator !!, which is basically syntactic sugar that lets you modify a piece of code before R evaluates it. (In earlier versions of dplyr (or the underlying rlang, respectively) there used to be situations (incl. yours) where !! would collide with the single !, but this is not an issue anymore since !! gained the right operator precedence.)
Applied to your example:
library(dplyr)
df <- data.frame(this = c(1, 2, 2),
that = c(1, 1, 2))
column <- "this"
df %>% filter(!!as.symbol(column) == 1)
#   this that
# 1    1    1
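The two steps above can also be wrapped in a small function. This is only a minimal sketch: the helper name filter_by_column and its arguments are hypothetical and not part of dplyr or rlang, and it assumes the data frame has no column called value.

```r
library(dplyr)

# Hypothetical helper (not part of dplyr): keep the rows where the column
# whose name is given as a string in `column` equals `value`.
filter_by_column <- function(data, column, value) {
  col_sym <- rlang::sym(column)       # step 1: character -> symbol
  filter(data, !!col_sym == value)    # step 2: inject the symbol with !!
}

df <- data.frame(this = c(1, 2, 2),
                 that = c(1, 1, 2))

res_this <- filter_by_column(df, "this", 1)  # rows where this == 1
res_that <- filter_by_column(df, "that", 1)  # rows where that == 1
```

Because the column name arrives as an ordinary string, the same helper can be reused for any column without touching the filter expression.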
Using alternative solutions
Other ways to refer to the value "this" of the variable column inside dplyr::filter() that don't rely on rlang's injection paradigm include:
Via the tidyselection paradigm, i.e. dplyr::if_any() / dplyr::if_all() with tidyselect::all_of():
df %>% filter(if_any(.cols = all_of(column), .fns = ~ .x == 1))
Via rlang's .data pronoun and base R's [[:
df %>% filter(.data[[column]] == 1)
Via magrittr's . argument placeholder and base R's [[:
df %>% filter(.[[column]] == 1)
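With a single column name, if_any() and if_all() behave the same; the difference only shows once several names are passed. The following sketch assumes a dplyr version that ships if_any() and if_all() (1.0.4 or later), and the vector columns is made up for illustration.

```r
library(dplyr)

df <- data.frame(this = c(1, 2, 2),
                 that = c(1, 1, 2))

# Illustrative vector of column names (not from the original question)
columns <- c("this", "that")

# Keep rows where at least one of the named columns equals 1
any_match <- df %>% filter(if_any(all_of(columns), ~ .x == 1))

# Keep rows where every named column equals 1
all_match <- df %>% filter(if_all(all_of(columns), ~ .x == 1))
```

Here any_match keeps the first two rows, while all_match keeps only the first one.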
I would steer clear of using get() altogether. It seems like it would be quite dangerous in this situation, especially if you're programming. You could use either an unevaluated call or a pasted character string, but you'll need to use filter_() instead of filter().
df <- data.frame(this = c(1, 2, 2), that = c(1, 1, 2))
column <- "this"
Option 1 - using an unevaluated call:
You can hard-code y as 1, but here I show it as y to illustrate how you can change the expression values easily.
expr <- lazyeval::interp(quote(x == y), x = as.name(column), y = 1)
## or
## expr <- substitute(x == y, list(x = as.name(column), y = 1))
df %>% filter_(expr)
#   this that
# 1    1    1
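To see what the unevaluated call in Option 1 actually contains, it can be built and evaluated with base R alone. This is just an illustrative sketch that sidesteps filter_() entirely; the names keep and res are made up here.

```r
df <- data.frame(this = c(1, 2, 2), that = c(1, 1, 2))
column <- "this"

# Build the call `this == 1` without evaluating it
expr <- substitute(x == y, list(x = as.name(column), y = 1))

# Evaluate the call with the data frame's columns in scope,
# giving one logical value per row
keep <- eval(expr, envir = df)

# Base R subsetting with the resulting logical vector
res <- df[keep, ]
```

Printing expr shows the call this == 1, which is the expression that filter_() receives in Option 1.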
Option 2 - using paste() (and obviously easier):
df %>% filter_(paste(column, "==", 1))
#   this that
# 1    1    1
The main thing about these two options is that we need to use filter_() instead of filter(). In fact, from what I've read, if you're programming with dplyr you should always use the *_() functions.
I used this post as a helpful reference: character string as function argument r, and I'm using dplyr version 0.3.0.2.
Here's another solution for the latest dplyr version:
df <- data.frame(this = c(1, 2, 2),
that = c(1, 1, 2))
column <- "this"
df %>% filter(.[[column]] == 1)
#   this that
# 1    1    1
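One caveat worth sketching: the . placeholder is supplied by the magrittr pipe, so this form only works on the right-hand side of %>%. The comparison below assumes a fresh session in which no object named . exists; the variable names are illustrative.

```r
library(dplyr)

df <- data.frame(this = c(1, 2, 2),
                 that = c(1, 1, 2))
column <- "this"

# Inside a magrittr pipe, `.` refers to the piped data frame
with_dot <- df %>% filter(.[[column]] == 1)

# The .data pronoun works without any pipe
with_data <- filter(df, .data[[column]] == 1)

# Outside a magrittr pipe `.` is not defined, so this call errors;
# tryCatch() turns the error into NULL for the comparison
without_pipe <- tryCatch(filter(df, .[[column]] == 1),
                         error = function(e) NULL)
```

If the code might later be called without %>%, the .data form is probably the safer choice.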