How can I rank observations in-group faster?

A few alternatives with the data.table and dplyr packages.

data.table:

library(data.table)
# setDT(foo) is needed to convert to a data.table

# option 1:
setDT(foo)[, rn := rowid(person)]   

# option 2:
setDT(foo)[, rn := 1:.N, by = person]

both give:

> foo
   person year rn
1:  pers1 1999  1
2:  pers1 2000  2
3:  pers1 2003  3
4:  pers2 1998  1
5:  pers2 2011  2

If you want a true rank, you should use the frank function:

setDT(foo)[, rn := frank(year, ties.method = 'dense'), by = person]

dplyr:

library(dplyr)
# method 1
foo <- foo %>% group_by(person) %>% mutate(rn = row_number())
# method 2
foo <- foo %>% group_by(person) %>% mutate(rn = 1:n())

both giving a similar result:

> foo
Source: local data frame [5 x 3]
Groups: person [2]

  person  year    rn
  (fctr) (dbl) (int)
1  pers1  1999     1
2  pers1  2000     2
3  pers1  2003     3
4  pers2  1998     1
5  pers2  2011     2

Would by do the trick?

> foo <-data.frame(person=c(rep("pers1",3),rep("pers2",2)),year=c(1999,2000,2003,1998,2011),obs=c(1,2,3,1,2))
> foo
  person year obs
1  pers1 1999   1
2  pers1 2000   2
3  pers1 2003   3
4  pers2 1998   1
5  pers2 2011   2
> by(foo, foo$person, nrow)
foo$person: pers1
[1] 3
------------------------------------------------------------ 
foo$person: pers2
[1] 2

The answer from Marek in this question has proven very useful in the past. I wrote it down and use it almost daily since it was fast and efficient. We'll use ave() and seq_along().

foo <-data.frame(person=c(rep("pers1",3),rep("pers2",2)),year=c(1999,2000,2003,1998,2011))

foo <- transform(foo, obs = ave(rep(NA, nrow(foo)), person, FUN = seq_along))
foo

  person year obs
1  pers1 1999   1
2  pers1 2000   2
3  pers1 2003   3
4  pers2 1998   1
5  pers2 2011   2

Another option using plyr

library(plyr)
ddply(foo, "person", transform, obs2 = seq_along(person))

  person year obs obs2
1  pers1 1999   1    1
2  pers1 2000   2    2
3  pers1 2003   3    3
4  pers2 1998   1    1
5  pers2 2011   2    2

How can I rank observations in-group faster?

Tags:

Optimization

R

Related

Recent Posts