How to extract one specific group in dplyr
Try this where groups
is a vector of group numbers. Here 1:2
means the first two groups:
select_groups <- function(data, groups, ...)
data[sort(unlist(attr(data, "indices")[ groups ])) + 1, ]
mtcars %>% group_by(cyl) %>% select_groups(1:2)
The selected rows appear in the original order. If you prefer that the rows appear in the order that the groups are specified (e.g. in the above eaxmple the rows of the first group followed by the rows of the second group) then remove the sort
.
With a bit of dplyr
along with some nesting/unnesting (supported by tidyr
package), you could establish a small helper to get the first (or any) group
first = function(x) x %>% nest %>% ungroup %>% slice(1) %>% unnest(data)
mtcars %>% group_by(cyl) %>% first()
By adjusting the slicing you could also extract the nth or any range of groups by index, but typically the first or the last is what most users want.
The name is inspired by functional APIs which all call it first
(see stdlibs of i.e. kotlin, python, scala, java, spark).
Edit: Faster Version
A more scalable version (>50x faster on large datasets) that avoids nesting would be
first_group = function(x) x %>%
select(group_cols()) %>%
distinct %>%
ungroup %>%
slice(1) %>%
{ semi_join(x, .)}
A another positive side-effect of this improved version is that it fails if not grouping is present in x
.