R - Keep first observation per group identified by multiple variables (Stata equivalent "bys var1 var2 : keep if _n == 1")
The package dplyr makes this kind of things easier.
library(dplyr)
mydata %>% group_by(id, day) %>% filter(row_number(value) == 1)
This command requires more memory in R than in Stata: rows are not suppressed in place, a new copy of the dataset is created.
I would order the data.frame
at which point you can look into using by
:
mydata <- mydata[with(mydata, do.call(order, list(id, day, value))), ]
do.call(rbind, by(mydata, list(mydata$id, mydata$day),
FUN=function(x) head(x, 1)))
Alternatively, look into the "data.table" package. Continuing with the ordered data.frame
from above:
library(data.table)
DT <- data.table(mydata, key = "id,day")
DT[, head(.SD, 1), by = key(DT)]
# id day value
# 1: 1 1 10
# 2: 1 2 15
# 3: 1 3 20
# 4: 2 1 40
# 5: 2 2 30
# 6: 3 2 22
# 7: 3 3 24
# 8: 4 1 11
# 9: 4 2 11
# 10: 4 3 12
Or, starting from scratch, you can use data.table
in the following way:
DT <- data.table(id, day, value, key = "id,day")
DT[, n := rank(value, ties.method="first"), by = key(DT)][n == 1]
And, by extension, in base R:
Ranks <- with(mydata, ave(value, id, day, FUN = function(x)
rank(x, ties.method="first")))
mydata[Ranks == 1, ]