How to extract one specific group in dplyr

Try this, where groups is a vector of group numbers. Here 1:2 selects the first two groups:

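library(dplyr)

# Requires dplyr >= 0.8: group_rows() returns 1-based row indices per group
# (older versions stored 0-based indices in attr(data, "indices"))
select_groups <- function(data, groups, ...)
   data[sort(unlist(group_rows(data)[groups])), ]

mtcars %>% group_by(cyl) %>% select_groups(1:2)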

The selected rows appear in their original order. If you prefer the rows to appear in the order in which the groups are specified (e.g. in the above example, the rows of the first group followed by the rows of the second group), remove the sort.
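As a minimal sketch of that variant, the helper below drops the sort so that rows come back group by group, in the order the groups were requested. The name select_groups_ordered is hypothetical, and the sketch assumes dplyr >= 0.8, where group_rows() returns 1-based row indices for each group.

```r
library(dplyr)

# Hypothetical variant of select_groups() that keeps the requested group order.
# Assumes dplyr >= 0.8, where group_rows() returns 1-based row indices.
select_groups_ordered <- function(data, groups) {
  rows <- unlist(group_rows(data)[groups])  # no sort(): keep the order of `groups`
  data[rows, ]
}

# Ask for the second group first, then the first group
res <- mtcars %>% group_by(cyl) %>% select_groups_ordered(c(2, 1))
```

Here the groups are numbered in the order dplyr stores them, so for cyl the rows with 6 cylinders come first in res, followed by the rows with 4 cylinders.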

With a bit of dplyr and some nesting/unnesting (provided by the tidyr package), you can write a small helper that returns the first (or any other) group:

library(dplyr); library(tidyr)  # nest() and unnest() come from tidyr

first = function(x) x %>% nest %>% ungroup %>% slice(1) %>% unnest(data)
mtcars %>% group_by(cyl) %>% first()

By adjusting the slicing you can also extract the nth group, or any range of groups, by index, but typically the first or the last group is what most users want.
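One way to adjust the slicing is to pass the index through to slice(). The following is only a sketch: pick_groups and its idx parameter are hypothetical names, and which group counts as "first" depends on the order in which nest() returns the groups, which may differ from the sorted order of the keys.

```r
library(dplyr)
library(tidyr)

# Hypothetical helper: `idx` is passed to slice() to choose groups by position
pick_groups <- function(x, idx) {
  x %>%
    nest() %>%       # one row per group, other columns in list-column `data`
    ungroup() %>%
    slice(idx) %>%   # keep only the requested groups
    unnest(data)     # expand back to the original rows
}

by_cyl <- mtcars %>% group_by(cyl)
second <- by_cyl %>% pick_groups(2)    # a single group
two    <- by_cyl %>% pick_groups(2:3)  # a range of groups
```

Note that the result is no longer grouped, because of the ungroup() step.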

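The name is inspired by functional APIs, many of which call this operation first (see, e.g., the standard libraries of Kotlin, Python, Scala, Java, Spark). Note that defining a function called first in your workspace masks dplyr::first().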

Edit: Faster Version

A more scalable version (>50x faster on large datasets) that avoids nesting would be:

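first_group = function(x) x %>%
    select(group_cols()) %>%   # keep only the grouping columns
    distinct %>%               # one row per group
    ungroup %>%
    slice(1) %>%               # key of the first group
    { semi_join(x, .) }        # rows of x that match that key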

Another positive side effect of this improved version is that it fails if no grouping is present in x.
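If you would rather not rely on that failure happening somewhere inside the pipeline, the check can be made explicit. This is a sketch under assumptions: first_group_checked is a hypothetical name, and the error message is illustrative only.

```r
library(dplyr)

# Hypothetical variant of first_group() with an explicit guard
first_group_checked <- function(x) {
  # Fail early with a readable message when x has no grouping
  if (!is_grouped_df(x)) stop("`x` must be a grouped data frame")

  vars <- group_vars(x)
  first_key <- x %>%
    ungroup() %>%
    select(all_of(vars)) %>%  # keep only the grouping columns
    distinct() %>%            # one row per group
    slice(1)                  # key of the first group

  semi_join(x, first_key, by = vars)  # rows of x that match that key
}

res <- mtcars %>% group_by(cyl) %>% first_group_checked()

# Ungrouped input is rejected; capture the error instead of stopping the script
err <- tryCatch(first_group_checked(mtcars), error = function(e) e)
```

Passing by explicitly also avoids the message that semi_join() prints when it has to guess the join columns.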

