Pass arguments to dplyr functions
In the devel version of dplyr
(soon to be released 0.6.0
), we can also make use of slightly different syntax for passing the variables.
f1 <- function(df, grp.var, uniq.var) {
grp.var <- enquo(grp.var)
uniq.var <- enquo(uniq.var)
df %>%
group_by(!!grp.var) %>%
summarise(n_uniq = n_distinct(!!uniq.var)) %>%
filter(n_uniq >1)
}
res2 <- f1(iris, Sepal.Length, Sepal.Width)
res1 <- not.uniq.per.group(iris, "Sepal.Length", "Sepal.Width")
identical(res1, res2)
#[1] TRUE
Here enquo
takes the arguments and returns the value as a quosure
(similar to substitute in base R) by evaluating the function arguments lazily and inside the summarise, we ask it to unquote (!! or UQ) so that it gets evaluated.
Here's the way to do it from rlang 0.4 using curly curly {{
pseudo operator :
library(dplyr)
not.uniq.per.group <- function(data, group.var, uniq.var) {
data %>%
group_by({{ group.var }}) %>%
summarise(n.uniq = n_distinct({{ uniq.var }})) %>%
filter(n.uniq > 1)
}
iris %>% not.uniq.per.group(Sepal.Length, Sepal.Width)
#> # A tibble: 25 x 2
#> Sepal.Length n.uniq
#> <dbl> <int>
#> 1 4.4 3
#> 2 4.6 4
#> 3 4.8 3
#> 4 4.9 5
#> 5 5 8
#> 6 5.1 6
#> 7 5.2 4
#> 8 5.4 4
#> 9 5.5 6
#> 10 5.6 5
#> # ... with 15 more rows
Like the old dplyr versions up to 0.5, the new dplyr has facilities for both standard evaluation (SE) and nonstandard evaluation (NSE). But they are expressed differently than before.
If you want an NSE function, you pass bare expressions and use enquo to capture them as quosures. If you want an SE function, just pass quosures (or symbols) directly, then unquote them in the dplyr calls. Here is the SE solution to the question:
library(tidyverse)
library(rlang)
f1 <- function(df, grp.var, uniq.var) {
df %>%
group_by(!!grp.var) %>%
summarise(n_uniq = n_distinct(!!uniq.var)) %>%
filter(n_uniq > 1)
}
a <- f1(iris, quo(Sepal.Length), quo(Sepal.Width))
b <- f1(iris, sym("Sepal.Length"), sym("Sepal.Width"))
identical(a, b)
#> [1] TRUE
Note how the SE version enables you to work with string arguments - just turn them into symbols first using sym()
. For more information, see the programming with dplyr vignette.
You need to use the standard evaluation versions of the dplyr
functions (just append '_' to the function names, ie. group_by_
& summarise_
) and pass strings to your function, which you then need to turn into symbols. To parameterise the argument of summarise_, you will need to use interp()
, which is defined in the lazyeval
package. Concretely:
library(dplyr)
library(lazyeval)
not.uniq.per.group <- function(df, grp.var, uniq.var) {
df %>%
group_by_(grp.var) %>%
summarise_( n_uniq=interp(~n_distinct(v), v=as.name(uniq.var)) ) %>%
filter(n_uniq > 1)
}
not.uniq.per.group(iris, "Sepal.Length", "Sepal.Width")
Note that in recent versions of dplyr
the standard evaluation versions of the dplyr functions have been "soft deprecated" in favor of non-standard evaluation.
See the Programming with dplyr
vignette for more information on working with non-standard evaluation.