How to rank within groups in R?
The top-rated answer (by cdeterman) is actually incorrect. The order function returns the positions of the 1st, 2nd, 3rd, etc. ranked values, not the ranks of the values in their current order.
Let’s take a simple example where we want to rank within each customer name, starting with the largest value. I have included a manual ranking so we can check the results:
> df
   customer_name order_values manual_rank
1           John            2           5
2           John            5           2
3           John            9           1
4           John            1           6
5           John            4           3
6           John            3           4
7           Lucy            4           4
8           Lucy            9           1
9           Lucy            6           3
10          Lucy            2           6
11          Lucy            8           2
12          Lucy            3           5
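To reproduce the example, the data frame can be rebuilt roughly as follows. This is a minimal sketch: the post only prints `df`, so the construction with `data.frame` and `rep` is an assumption, and only the values themselves come from the table above.

```r
# Rebuild the example data (construction is an assumption;
# the values are copied from the printed table).
df <- data.frame(
  customer_name = factor(rep(c("John", "Lucy"), each = 6)),
  order_values  = c(2, 5, 9, 1, 4, 3, 4, 9, 6, 2, 8, 3),
  manual_rank   = c(5, 2, 1, 6, 3, 4, 4, 1, 3, 6, 2, 5)
)
df
```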
If I run the code suggested by cdeterman I get the following incorrect ranks:
> df %>%
+   group_by(customer_name) %>%
+   mutate(my_ranks = order(order_values, decreasing=TRUE))
Source: local data frame [12 x 4]
Groups: customer_name [2]
   customer_name order_values manual_rank my_ranks
          <fctr>        <dbl>       <dbl>    <int>
1           John            2           5        3
2           John            5           2        2
3           John            9           1        5
4           John            1           6        6
5           John            4           3        1
6           John            3           4        4
7           Lucy            4           4        2
8           Lucy            9           1        5
9           Lucy            6           3        3
10          Lucy            2           6        1
11          Lucy            8           2        6
12          Lucy            3           5        4
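order returns the index permutation that sorts a vector, which is why it is normally used to re-order data frames into increasing or decreasing order. To get ranks, we need to run the order function twice: the second call inverts that permutation and gives the rank of each value in its current position.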
> df %>%
+   group_by(customer_name) %>%
+   mutate(good_ranks = order(order(order_values, decreasing=TRUE)))
Source: local data frame [12 x 4]
Groups: customer_name [2]
   customer_name order_values manual_rank good_ranks
          <fctr>        <dbl>       <dbl>      <int>
1           John            2           5          5
2           John            5           2          2
3           John            9           1          1
4           John            1           6          6
5           John            4           3          3
6           John            3           4          4
7           Lucy            4           4          4
8           Lucy            9           1          1
9           Lucy            6           3          3
10          Lucy            2           6          6
11          Lucy            8           2          2
12          Lucy            3           5          5
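The difference between one and two calls to order is easier to see on a single group. The sketch below uses John's values from the table above in plain base R; the variable names `positions` and `ranks` are just illustrative.

```r
# John's order values from the example above.
x <- c(2, 5, 9, 1, 4, 3)

# order() answers: which position holds the 1st, 2nd, ... largest value?
positions <- order(x, decreasing = TRUE)
positions
# [1] 3 2 5 6 1 4

# A second order() inverts that permutation and answers:
# what rank does each element have?
ranks <- order(positions)
ranks
# [1] 5 2 1 6 3 4

# With no ties, this matches rank() applied to the negated values.
all(ranks == rank(-x))
# [1] TRUE
```

The first result is the incorrect my_ranks column for John, and the second matches manual_rank.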
This can be achieved with ave and rank. ave passes the proper groups to rank. The values are negated before ranking so that the most recent date gets rank 1 (reversing the result with rev only works when each group is already sorted by date):
with(x, ave(as.numeric(order_dates), customer_name, FUN=function(x) rank(-x)))
## [1] 3 1 1 2 1
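For a self-contained version of the ave approach, the sketch below rebuilds the data frame `x` first. The answer does not show `x`, so the five rows here are an assumption taken from the output printed elsewhere in this thread. It ranks with `rank(-d)` so the result does not depend on the rows already being sorted within each group.

```r
# Hypothetical reconstruction of the question's data frame.
x <- data.frame(
  customer_name = c("John", "Bob", "Alex", "John", "John"),
  order_dates   = as.Date(c("2010-11-01", "2008-03-25", "2009-11-15",
                            "2012-08-06", "2015-05-07")),
  order_values  = c(15, 12, 5, 15, 20)
)

# Rank dates within each customer, most recent first.
date_ranks <- with(x, ave(as.numeric(order_dates), customer_name,
                          FUN = function(d) rank(-d)))
date_ranks
# [1] 3 1 1 2 1
```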
You can do this pretty cleanly with dplyr:
library(dplyr)
df %>%
    group_by(customer_name) %>%
    mutate(my_ranks = order(order(order_values, order_dates, decreasing=TRUE)))
Source: local data frame [5 x 4]
Groups: customer_name
  customer_name order_dates order_values my_ranks
1          John  2010-11-01           15        3
2           Bob  2008-03-25           12        1
3          Alex  2009-11-15            5        1
4          John  2012-08-06           15        2
5          John  2015-05-07           20        1
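Two of John's orders share the value 15, so it matters how ties are handled. The sketch below, in base R with John's three rows copied from the output above, contrasts breaking ties with a second sort key against letting tied values share a rank. Using `rank` with `ties.method = "min"` is an alternative not taken from the answer itself.

```r
# John's rows from the output above: two orders tie on value 15.
order_values <- c(15, 15, 20)
order_dates  <- as.Date(c("2010-11-01", "2012-08-06", "2015-05-07"))

# Double order() with a second key: ties are broken by date,
# so every row gets a unique rank.
unique_ranks <- order(order(order_values, order_dates, decreasing = TRUE))
unique_ranks
# [1] 3 2 1

# rank() with ties.method = "min": tied values share the best rank.
shared_ranks <- rank(-order_values, ties.method = "min")
shared_ranks
# [1] 2 2 1
```

Which behaviour is right depends on whether equal order values should count as the same rank.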