dplyr filter: Get rows with minimum of variable, but only the first if multiple minima

Update

With dplyr >= 0.3 you can combine slice() with which.min(), which is my preferred approach for this task:

df %>% group_by(A) %>% slice(which.min(x))
#Source: local data frame [3 x 3]
#Groups: A
#
#  A x          y
#1 A 1  0.2979772
#2 B 2 -1.1265265
#3 C 5 -1.1952004
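
The difference between keeping all minima and keeping only the first one is easiest to see on data that actually contains a tie. The following is a minimal sketch; the data frame below is made up for illustration and is not the sample data from the question.

```r
library(dplyr)

# Hypothetical data: group "A" has two rows tied at the minimum x = 1
df <- data.frame(
  A = c("A", "A", "A", "B", "B", "C"),
  x = c(1, 1, 3, 2, 4, 5),
  y = c(10, 20, 30, 40, 50, 60)
)

# filter() keeps every tied row, so group "A" contributes two rows
all_min <- df %>% group_by(A) %>% filter(x == min(x))

# which.min() returns the index of the first minimum only,
# so slice() keeps exactly one row per group
first_min <- df %>% group_by(A) %>% slice(which.min(x))

nrow(all_min)
nrow(first_min)
```

With this input, all_min should have four rows and first_min three, one per group.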

Original answer

For the sample data, it is also possible to chain two filter() calls:

group_by(df, A) %>% 
  filter(x == min(x)) %>% 
  filter(1:n() == 1)

Just for completeness: Here's the final dplyr solution, derived from the comments of @hadley and @Arun:

library(dplyr)
df.g <- group_by(df, A)
filter(df.g, rank(x, ties.method="first")==1)
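
To see why the rank-based filter keeps exactly one row per group, it may help to look at what rank() returns for tied values. A small base R sketch, using a made-up vector:

```r
# Hypothetical values with a tie at the minimum (positions 2 and 3)
x <- c(3, 1, 1, 2)

# The default ties.method = "average" gives tied values the same rank,
# so no element has rank 1 here
r_avg <- rank(x)

# ties.method = "first" breaks ties by position, so ranks are unique
# and exactly one element has rank 1
r_first <- rank(x, ties.method = "first")

which(r_avg == 1)
which(r_first == 1)
```

With the default method the two tied minima both get rank 1.5, so comparing against 1 would drop the whole group. With ties.method = "first" only the earlier of the two tied positions gets rank 1.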

For what it's worth, here are two data.table solutions for anyone interested:

# approach with setting keys
library(data.table)
dt <- as.data.table(df)
setkey(dt, A, x)  # sorts by A, then x, so each group's minimum comes first
dt[J(unique(A)), mult = "first"]

# without using keys
dt <- as.data.table(df)
dt[dt[, .I[which.min(x)], by=A]$V1]
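
The keyless variant works in two steps, which the sketch below separates for readability. The table contents are hypothetical and chosen to include a tie; idx and res are illustrative names.

```r
library(data.table)

# Hypothetical data: group "A" has two rows tied at the minimum x = 1
dt <- data.table(
  A = c("A", "A", "A", "B", "B", "C"),
  x = c(1, 1, 3, 2, 4, 5),
  y = c(10, 20, 30, 40, 50, 60)
)

# Step 1: .I holds each row's position in the full table, so this
# collects the row number of the first minimum of x within each group
idx <- dt[, .I[which.min(x)], by = A]$V1

# Step 2: subset the original table by those row numbers
res <- dt[idx]
res
```

Here idx should come out as 1, 4 and 6: the first of the two tied rows in group "A", then the minimum row of "B" and of "C".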
