dplyr - Group by and select TOP x %
Here's another way:
mtcars %>%
select(gear, wt) %>%
arrange(gear, desc(wt)) %>%
group_by(gear) %>%
slice(seq(n()*.2))
   gear    wt
  (dbl) (dbl)
1     3 5.424
2     3 5.345
3     3 5.250
4     4 3.440
5     4 3.440
6     5 3.570
I take "top" to mean "having the highest value for wt" and so used desc().
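One caveat with seq(n() * .2): when a group has fewer than five rows the product falls below 1, and seq() then counts down from 1 instead of returning an empty sequence, so the group still contributes a row. A minimal sketch of a variant that rounds the row count down explicitly, assuming you would rather drop such groups; top_share is a hypothetical helper name, not a dplyr function:

```r
library(dplyr)

# Hypothetical helper (not part of dplyr): keep the heaviest `share`
# of rows in each gear group, rounding the row count down.
top_share <- function(data, share) {
  data %>%
    select(gear, wt) %>%
    arrange(gear, desc(wt)) %>%
    group_by(gear) %>%
    # seq_len(0) is empty, so a group too small to fill the share yields no rows
    slice(seq_len(floor(n() * share)))
}

top_fifth <- top_share(mtcars, 0.2)  # 3 + 2 + 1 rows, as in the output above
top_tenth <- top_share(mtcars, 0.1)  # gear 5 has 5 rows: floor(0.5) = 0 kept
```

Whether a small group should contribute one row or none is a design choice; the original seq() call effectively picks "one".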
I believe this gets to the answer you're looking for.
library(dplyr)
mtcars %>% select(gear, wt) %>%
group_by(gear) %>%
arrange(gear, wt) %>%
filter(row_number() / n() <= .2)
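Because the rows are sorted by ascending wt, this filter keeps the lightest 20% of each gear group. If "top" is meant as the heaviest cars, one possible variant sorts with desc(wt) before filtering. This is a sketch of that reading, not a correction of the answer; by_gear, lightest and heaviest are illustrative names:

```r
library(dplyr)

by_gear <- mtcars %>%
  select(gear, wt) %>%
  group_by(gear)

# Ascending sort: row_number() counts up from the lightest car in each group
lightest <- by_gear %>%
  arrange(gear, wt) %>%
  filter(row_number() / n() <= 0.2)

# Descending sort: the same filter now keeps the heaviest cars
heaviest <- by_gear %>%
  arrange(gear, desc(wt)) %>%
  filter(row_number() / n() <= 0.2)
```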
I know this is coming late, but it might help someone now. dplyr has a function for this, top_frac:
library(dplyr)
mtcars %>%
select(gear, wt) %>%
group_by(gear) %>%
arrange(gear, wt) %>%
top_frac(n = 0.2, wt = wt)
Here n is the fraction of rows to return and wt is the variable to be used for ordering.
The output is as below.
gear    wt
   3 5.250
   3 5.345
   3 5.424
   4 3.440
   4 3.440
   5 3.570
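In dplyr 1.0.0 and later, top_frac() is marked as superseded in favour of slice_max(), which takes the fraction through its prop argument. A sketch of the equivalent call, assuming dplyr 1.0.0 or newer is installed; heaviest is an illustrative name:

```r
library(dplyr)

heaviest <- mtcars %>%
  select(gear, wt) %>%
  group_by(gear) %>%
  # prop = 0.2 keeps the top 20% of each group by wt (row count rounded down);
  # tied values are kept by default (with_ties = TRUE)
  slice_max(wt, prop = 0.2)
```

Unlike top_frac(), slice_max() returns each group's rows sorted from highest to lowest wt, so no separate arrange() step is needed.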
Or another option with dplyr:
mtcars %>% select(gear, wt) %>%
group_by(gear) %>%
arrange(gear, desc(wt)) %>%
filter(wt > quantile(wt, .8))
Source: local data frame [7 x 2]
Groups: gear [3]

   gear    wt
  (dbl) (dbl)
1     3 5.424
2     3 5.345
3     3 5.250
4     4 3.440
5     4 3.440
6     4 3.190
7     5 3.570
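This version returns seven rows where the rank-based answers return six. By default quantile() interpolates between data points, so the cutoff need not coincide with an observed weight, and the number of rows above it is not tied to exactly 20% of the group size. A small sketch that shows the cutoff and the number of rows kept per group; the column names n_rows, cutoff and kept are illustrative:

```r
library(dplyr)

cutoffs <- mtcars %>%
  group_by(gear) %>%
  summarise(
    n_rows = n(),
    # 80th percentile of wt within the group (default interpolation)
    cutoff = quantile(wt, 0.8, names = FALSE),
    # rows strictly above the cutoff, i.e. what the filter keeps
    kept = sum(wt > cutoff)
  )

cutoffs
```

For gear 4 the cutoff falls between two observed weights, which is why 3.190 passes the filter here but is not among the top two rows by rank.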