dplyr: How to use group_by inside a function?
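For programming, group_by_ is the counterpart to group_by (group_by_ and the other underscore verbs were deprecated in dplyr 0.7.0 in favor of tidy evaluation; see the updates below):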
library(dplyr)
mytable <- function(x, ...) x %>% group_by_(...) %>% summarise(n = n())
mytable(iris, "Species")
# or iris %>% mytable("Species")
which gives:
Species n
1 setosa 50
2 versicolor 50
3 virginica 50
Update: At the time this was written, dplyr used %.%, which is what was originally used above. %>% is now favored, so the code above has been changed to use it to keep this relevant.
Update 2: regroup is now deprecated; use group_by_ instead.
Update 3: group_by_(list(...)) becomes group_by_(...) in newer versions of dplyr, as per Roberto's comment.
Update 4: Added a minor variation suggested in the comments.
Update 5: With rlang/tidyeval it is now possible to do this:
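library(dplyr)
library(rlang)
mytable <- function(x, ...) {
group_ <- syms(c(...)) # syms() takes one character vector, so combine the strings first
x %>%
group_by(!!!group_) %>% # !!! splices the list of symbols into group_by()
summarise(n = n())
}
mytable(iris, "Species")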
or, passing Species unevaluated (i.e. with no quotes around it):
library(rlang)
mytable <- function(x, ...) {
group_ <- enquos(...)
x %>%
group_by(!!!group_) %>%
summarise(n = n())
}
mytable(iris, Species)
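Because enquos(...) captures any number of arguments, the unquoted version also handles several grouping variables; for the string version, the dots need to be combined with c() first, since syms() takes a single argument. A minimal sketch, assuming dplyr (>= 0.7.0) and rlang are installed; the names mytable_str and mytable_sym and the use of mtcars are illustrative choices, not part of the original answer:

```r
library(dplyr)
library(rlang)

# String version: c(...) collapses the dots into one character vector,
# which syms() turns into a list of symbols
mytable_str <- function(x, ...) {
  group_ <- syms(c(...))
  x %>%
    group_by(!!!group_) %>%
    summarise(n = n())
}

# Unquoted version: enquos() captures any number of bare column names
mytable_sym <- function(x, ...) {
  group_ <- enquos(...)
  x %>%
    group_by(!!!group_) %>%
    summarise(n = n())
}

res_str <- mytable_str(mtcars, "cyl", "gear")
res_sym <- mytable_sym(mtcars, cyl, gear)
print(res_sym)
```

Both calls should yield one row per observed cyl/gear combination; recent dplyr versions also print a message noting that the result is still grouped by cyl.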
Update 6: There is now a {{ }} ("curly-curly") notation that works when there is just one grouping variable:
mytable <- function(x, group) {
x %>%
group_by({{group}}) %>%
summarise(n = n())
}
mytable(iris, Species)
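{{ }} embraces a single argument. When several grouping variables are needed, one common alternative (described in the dplyr programming vignette) is to pass ... straight through to group_by(), which needs no rlang operators at all. A minimal sketch, assuming dplyr >= 1.0.0 (for the .groups argument); the name mytable_dots is illustrative:

```r
library(dplyr)

# The dots are forwarded unchanged, so callers can supply any number of
# bare column names (or expressions) to group by
mytable_dots <- function(x, ...) {
  x %>%
    group_by(...) %>%
    summarise(n = n(), .groups = "drop")  # .groups requires dplyr >= 1.0.0
}

res_iris <- mytable_dots(iris, Species)
res_cars <- mytable_dots(mtcars, cyl, gear)
print(res_cars)
```

Dropping the groups with .groups = "drop" returns a plain tibble, which avoids surprises if the caller pipes the result into further grouped operations.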
UPDATE: As of dplyr 0.7.0 you can use tidy eval to accomplish this.
See http://dplyr.tidyverse.org/articles/programming.html for more details.
library(tidyverse)
data("iris")
my_table <- function(df, group_var) {
group_var <- enquo(group_var) # Create quosure
df %>%
group_by(!!group_var) %>% # Use !! to unquote the quosure
summarise(n = n())
}
my_table(iris, Species)
# A tibble: 3 x 2
Species n
<fctr> <int>
1 setosa 50
2 versicolor 50
3 virginica 50
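Because enquo() returns a quosure, the captured expression can also be inspected, for example to build the name of the count column from the grouping column. A minimal sketch, assuming a reasonably recent rlang that provides as_label(); the function name my_table_named and the n_<group> naming scheme are purely illustrative:

```r
library(dplyr)
library(rlang)

my_table_named <- function(df, group_var) {
  group_var <- enquo(group_var)                    # capture the column as a quosure
  count_name <- paste0("n_", as_label(group_var))  # e.g. "n_Species"
  df %>%
    group_by(!!group_var) %>%
    summarise(!!count_name := n())                 # := allows an unquoted name on the left-hand side
}

res_named <- my_table_named(iris, Species)
print(res_named)
```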
As a complement to Update 6 in the answer by @G. Grothendieck: if you want to pass a string as the argument to your summary function, then instead of embracing the argument with double braces ({{ }}) you should use the .data pronoun, as described in the Programming vignette, section "Loop over multiple variables":
mytable <- function( x, group ) {
x %>%
group_by( .data[[group]] ) %>%
summarise( n = n() )
}
group_string <- 'Species'
mytable( iris, group_string )
`summarise()` ungrouping output (override with `.groups` argument)
# A tibble: 3 x 2
Species n
<fct> <int>
1 setosa 50
2 versicolor 50
3 virginica 50
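Since the column is given as a string, the same function can be driven by a character vector of column names, which is what the vignette's "Loop over multiple variables" section is about. A minimal sketch, assuming dplyr >= 1.0.0 (for .groups); using lapply() over mtcars columns is an illustrative choice:

```r
library(dplyr)

mytable <- function(x, group) {
  x %>%
    group_by(.data[[group]]) %>%          # look the column up by its name (a string)
    summarise(n = n(), .groups = "drop")
}

group_strings <- c("cyl", "gear")
# One summary table per grouping column, named after that column
tables <- setNames(lapply(group_strings, function(g) mytable(mtcars, g)), group_strings)
print(tables$gear)
```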