dplyr - groupby on multiple columns using variable names

You can use helpers from the rlang package, which was created by the same team that created dplyr. When using dplyr, you don't have to load rlang separately to use these helpers, because dplyr re-exports sym() and syms().

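Specifically, you can use the syms() function to convert the character vector into a list of symbols, and the !!! (unquote-splice) operator to splice them into group_by() as separate arguments: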

library(dplyr)

group_cols <- c("vs", "am")

mtcars %>% 
  group_by(!!!syms(group_cols)) %>% 
  summarize(mean_wt = mean(wt))

A closely related question and answer explains how the !! operator and the sym() function are used for a single column name (i.e. a length-one character vector).
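
For the single-column case, a minimal sketch might look like the following. The variable name group_col and the choice of the cyl column are assumptions made for illustration; they are not from the linked answer.

```r
library(dplyr)

# Hypothetical variable holding one column name as a string
group_col <- "cyl"

# sym() turns the string into a symbol; !! unquotes it inside group_by()
by_cyl <- mtcars %>%
  group_by(!!sym(group_col)) %>%
  summarize(mean_wt = mean(wt))
```

The only difference from the multi-column version is sym() instead of syms() and !! instead of !!!, since there is one symbol to unquote rather than a list to splice.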

Since dplyr 1.0.0, you can also stay with the regular group_by() and select the columns with across() and all_of():

library(dplyr)

group_cols <- c("vs", "am")

mtcars %>% 
  group_by(across(all_of(group_cols))) %>% 
  summarize(mean_wt = mean(wt))

dplyr version >= 1.0

With more recent versions of dplyr, you should use across() along with a tidyselect helper function. See help("language", "tidyselect") for a list of all the helper functions. In this case, if you want all the columns named in a character vector, use all_of():

library(dplyr)

# Column names to group by, stored as strings
cols <- c("mpg", "hp", "wt")
mtcars %>% 
  group_by(across(all_of(cols))) %>% 
  summarize(x = mean(gear))
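
If the grouping columns arrive as a function argument, the same pattern can be wrapped in a small helper. This is only a sketch: the function name mean_wt_by and its arguments are hypothetical, and the .groups = "drop" argument is added here to return an ungrouped result.

```r
library(dplyr)

# Hypothetical helper: mean of `wt` per group, with the grouping
# columns supplied as a character vector
mean_wt_by <- function(data, group_cols) {
  data %>%
    group_by(across(all_of(group_cols))) %>%
    summarize(mean_wt = mean(wt), .groups = "drop")
}

res <- mean_wt_by(mtcars, c("vs", "am"))
```

Note that all_of() raises an error if any of the named columns is missing from the data; any_of() is the tidyselect helper to use when missing names should be silently ignored.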

original answer (older versions of dplyr)

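If you have a vector of variable names, pass them to the .dots= parameter of group_by_(). Note that group_by_() and the other underscore-suffixed verbs have been deprecated since dplyr 0.7.0, so this approach only applies to older code. For example: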

mtcars %>% 
   group_by_(.dots=c("mpg","hp","wt")) %>% 
   summarize(x=mean(gear))
