Extract rows for the first occurrence of a variable in a data frame
t.first <- species[match(unique(species$Taxa), species$Taxa),]
should give you what you're looking for. match returns, for each unique value, the index of its first match in the compared vector, and those indices are the rows you need.
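To see what match returns, here is a minimal sketch on made-up data. The species data frame below is hypothetical and only stands in for the asker's data:

```r
# Hypothetical sample data; the asker's real data frame is not shown here.
species <- data.frame(
  Date = c("2012-05-17", "2013-07-12", "2011-08-31", "2012-09-06", "2012-09-07"),
  Taxa = c("A", "A", "B", "C", "B"),
  stringsAsFactors = FALSE
)

# For each unique taxon, match() gives the position of its first occurrence.
first_idx <- match(unique(species$Taxa), species$Taxa)
first_idx
# 1 3 4

# Use those positions as row indices.
t.first <- species[first_idx, ]
t.first
```

Note that "first" here means first in row order, so the data should already be sorted by date if the earliest date per taxon is what you want.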
In the following command, duplicated creates a logical index that flags every repeated data$Taxa value (all occurrences after the first). Negating it with ! gives a subset of the data frame without those rows:
data[!duplicated(data$Taxa), ]
The result:
        Date Taxa
1 2012-05-17    A
2 2011-08-31    B
3 2012-09-06    C
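As a rough illustration of the logical index itself, the sketch below uses a small made-up data frame (the values are hypothetical, not the asker's data):

```r
# Hypothetical sample data, used only for illustration.
data <- data.frame(
  Date = c("2012-05-17", "2013-07-12", "2011-08-31", "2012-09-06", "2012-09-07"),
  Taxa = c("A", "A", "B", "C", "B"),
  stringsAsFactors = FALSE
)

# TRUE marks a value that has already appeared earlier in the vector.
dup <- duplicated(data$Taxa)
dup
# FALSE  TRUE FALSE FALSE  TRUE

# Negating the index keeps only the first occurrence of each taxon.
first_rows <- data[!dup, ]
first_rows
```

Like the match approach, this keeps the first row in the current row order, so it relies on the data being sorted the way you want.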
Here is a dplyr option that does not depend on the data being sorted by date and that accounts for ties:
library(dplyr)
df %>%
  mutate(Date = as.Date(Date)) %>%
  group_by(Taxa) %>%
  filter(Date == min(Date)) %>%
  slice(1) %>% # takes the first occurrence if there is a tie
  ungroup()
# A tibble: 3 x 2
  Date       Taxa
  <date>     <chr>
1 2012-05-17 A
2 2011-08-31 B
3 2012-09-06 C
# sample data (define df before running the pipeline above):
df <- read.table(text = 'Date Taxa
2013-07-12 A
2011-08-31 B
2012-09-06 C
2012-05-17 A
2013-07-12 C
2012-09-07 B', header = TRUE, stringsAsFactors = FALSE)
And you could get the same by sorting by date as well:
df %>%
  mutate(Date = as.Date(Date)) %>%
  group_by(Taxa) %>%
  arrange(Date) %>%
  slice(1) %>%
  ungroup()
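If you would rather avoid the dplyr dependency, the same sort-then-take-first idea can be sketched in base R by combining order with duplicated. This is only one possible equivalent, reusing the sample df from above; the intermediate names sorted and earliest are made up for this example:

```r
# Sample data from the answer above.
df <- read.table(text = "Date Taxa
2013-07-12 A
2011-08-31 B
2012-09-06 C
2012-05-17 A
2013-07-12 C
2012-09-07 B", header = TRUE, stringsAsFactors = FALSE)

df$Date <- as.Date(df$Date)

# Sort by date so the earliest row of each taxon comes first.
sorted <- df[order(df$Date), ]

# Keep only the first row seen for each taxon.
earliest <- sorted[!duplicated(sorted$Taxa), ]

# Optional: order the result by taxon for readability.
earliest <- earliest[order(earliest$Taxa), ]
earliest
```

On ties, order keeps tied rows in their original relative order, so the row that appears first in df wins, which mirrors the slice(1) behaviour.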