Select the top N values by group
dplyr
does the trick
mtcars %>%
arrange(desc(mpg)) %>%
group_by(cyl) %>% slice(1:2)
mpg cyl disp hp drat wt qsec vs am gear carb
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 33.9 4 71.1 65 4.22 1.835 19.90 1 1 4 1
2 32.4 4 78.7 66 4.08 2.200 19.47 1 1 4 1
3 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1
4 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4
5 19.2 8 400.0 175 3.08 3.845 17.05 0 0 3 2
6 18.7 8 360.0 175 3.15 3.440 17.02 0 0 3 2
This seems more straightforward using data.table
as it performs the sort while setting the key.
So, if I were to get the top 3 records in sort (ascending order), then,
require(data.table)
d <- data.table(mtcars, key="cyl")
d[, head(.SD, 3), by=cyl]
does it.
And if you want the descending order
d[, tail(.SD, 3), by=cyl] # Thanks @MatthewDowle
Edit: To sort out ties using mpg
column:
d <- data.table(mtcars, key="cyl")
d.out <- d[, .SD[mpg %in% head(sort(unique(mpg)), 3)], by=cyl]
# cyl mpg disp hp drat wt qsec vs am gear carb rank
# 1: 4 22.8 108.0 93 3.85 2.320 18.61 1 1 4 1 11
# 2: 4 22.8 140.8 95 3.92 3.150 22.90 1 0 4 2 1
# 3: 4 21.5 120.1 97 3.70 2.465 20.01 1 0 3 1 8
# 4: 4 21.4 121.0 109 4.11 2.780 18.60 1 1 4 2 6
# 5: 6 18.1 225.0 105 2.76 3.460 20.22 1 0 3 1 7
# 6: 6 19.2 167.6 123 3.92 3.440 18.30 1 0 4 4 1
# 7: 6 17.8 167.6 123 3.92 3.440 18.90 1 0 4 4 2
# 8: 8 14.3 360.0 245 3.21 3.570 15.84 0 0 3 4 7
# 9: 8 10.4 472.0 205 2.93 5.250 17.98 0 0 3 4 14
# 10: 8 10.4 460.0 215 3.00 5.424 17.82 0 0 3 4 5
# 11: 8 13.3 350.0 245 3.73 3.840 15.41 0 0 3 4 3
# and for last N elements, of course it is straightforward
d.out <- d[, .SD[mpg %in% tail(sort(unique(mpg)), 3)], by=cyl]
Just sort by whatever (mpg for example, question is not clear on this)
mt <- mtcars[order(mtcars$mpg), ]
then use the by function to get the top n rows in each group
d <- by(mt, mt["cyl"], head, n=4)
If you want the result to be a data.frame:
Reduce(rbind, d)
Edit: Handling ties is more difficult, but if all ties are desired:
by(mt, mt["cyl"], function(x) x[rank(x$mpg) %in% sort(unique(rank(x$mpg)))[1:4], ])
Another approach is to break ties based on some other information, e.g.,
mt <- mtcars[order(mtcars$mpg, mtcars$hp), ]
by(mt, mt["cyl"], head, n=4)