How to interpret dplyr message `summarise()` regrouping output by 'x' (override with `.groups` argument)?
It is just a friendly warning message. By default, if there is any grouping before the summarise
, it drops one group variable i.e. the last one specified in the group_by
. If there is only one grouping variable, there won't be any grouping attribute after the summarise
and if there are more than one i.e. here it is two, so, the attribute for grouping is reduce to 1 i.e. the data would have the 'year' as grouping attribute. As a reproducible example
library(dplyr)
mtcars %>%
group_by(am) %>%
summarise(mpg = sum(mpg))
#`summarise()` ungrouping output (override with `.groups` argument)
# A tibble: 2 x 2
# am mpg
#* <dbl> <dbl>
#1 0 326.
#2 1 317.
The message is that it is ungroup
ing i.e when there is a single group_by
, it drops that grouping after the summarise
mtcars %>%
group_by(am, vs) %>%
summarise(mpg = sum(mpg))
#`summarise()` regrouping output by 'am' (override with `.groups` argument)
# A tibble: 4 x 3
# Groups: am [2]
# am vs mpg
# <dbl> <dbl> <dbl>
#1 0 0 181.
#2 0 1 145.
#3 1 0 118.
#4 1 1 199.
Here, it drops the last grouping and regroup with the 'am'
If we check the ?summarise
, there is .groups
argument which by default is "drop_last"
and the other options are "drop"
, "keep"
, "rowwise"
.groups - Grouping structure of the result.
"drop_last": dropping the last level of grouping. This was the only supported option before version 1.0.0.
"drop": All levels of grouping are dropped.
"keep": Same grouping structure as .data.
"rowwise": Each row is it's own group.
When .groups is not specified, you either get "drop_last" when all the results are size 1, or "keep" if the size varies. In addition, a message informs you of that choice, unless the option "dplyr.summarise.inform" is set to FALSE.
i.e. if we change the .groups
in summarise
, we don't get the message because the group attributes are removed
mtcars %>%
group_by(am) %>%
summarise(mpg = sum(mpg), .groups = 'drop')
# A tibble: 2 x 2
# am mpg
#* <dbl> <dbl>
#1 0 326.
#2 1 317.
mtcars %>%
group_by(am, vs) %>%
summarise(mpg = sum(mpg), .groups = 'drop')
# A tibble: 4 x 3
# am vs mpg
#* <dbl> <dbl> <dbl>
#1 0 0 181.
#2 0 1 145.
#3 1 0 118.
#4 1 1 199.
mtcars %>%
group_by(am, vs) %>%
summarise(mpg = sum(mpg), .groups = 'drop') %>%
str
#tibble [4 × 3] (S3: tbl_df/tbl/data.frame)
# $ am : num [1:4] 0 0 1 1
# $ vs : num [1:4] 0 1 0 1
# $ mpg: num [1:4] 181 145 118 199
Previously, this warning was not issued and it could lead to situations where the OP does a mutate
or something else assuming there is no grouping and results in unexpected output. Now, the warning gives the user an indication that we should be careful that there is a grouping attribute
NOTE: The .groups
right now is experimental
in its lifecycle. So, the behaviour could be modified in the future releases
Depending upon whether we need any transformation of the data based on the same grouping variable (or not needed), we could select the different options in .groups
.