Extract rows for the first occurrence of a variable in a data frame

t.first <- species[match(unique(species$Taxa), species$Taxa),]

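This should give you what you're looking for. match returns, for each value in its first argument, the index of the first match in its second argument. Applied to the unique Taxa values, it gives the row numbers of the first occurrences, which are the rows you need.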
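To see what match returns here, a minimal sketch on a made-up species data frame (the data is an assumption for illustration, not the asker's actual data):

```r
# Hypothetical sample data; not taken from the original question
species <- data.frame(
  Date = c("2013-07-12", "2012-05-17", "2011-08-31", "2012-09-06", "2012-09-07"),
  Taxa = c("A", "A", "B", "C", "B"),
  stringsAsFactors = FALSE
)

# Position of the first occurrence of each distinct Taxa value
idx <- match(unique(species$Taxa), species$Taxa)
idx

# Use those positions as row indices
t.first <- species[idx, ]
t.first
```

Here idx holds the positions 1, 3 and 4, so t.first keeps the first row seen for A, B and C in the order they appear.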

In the following command, duplicated creates a logical index that is TRUE for every data$Taxa value that has already appeared earlier in the vector. Negating it with ! drops those rows and keeps only the first row for each value:

data[!duplicated(data$Taxa), ]

The result:

        Date Taxa
1 2012-05-17    A
2 2011-08-31    B
3 2012-09-06    C
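A small sketch of the intermediate logical index may help. The data frame below is hypothetical: its first three rows are chosen to match the result above, and the remaining rows are assumptions.

```r
# Hypothetical data; only illustrative
data <- data.frame(
  Date = c("2012-05-17", "2011-08-31", "2012-09-06", "2013-07-12", "2012-09-07"),
  Taxa = c("A", "B", "C", "A", "B"),
  stringsAsFactors = FALSE
)

# TRUE marks a value that has already appeared earlier in the vector
dup <- duplicated(data$Taxa)
dup

# Negating the index keeps only the first row for each Taxa
first_rows <- data[!dup, ]
first_rows
```

Note that this keeps the first row in row order, which is the earliest date only if the data is already sorted by date.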

Here is a dplyr option that picks the earliest date for each Taxa. It does not depend on the data being sorted in date order, and it handles ties:

library(dplyr)
df %>% 
  mutate(Date = as.Date(Date)) %>% 
  group_by(Taxa) %>% 
  filter(Date == min(Date)) %>% 
  slice(1) %>% # takes the first occurrence if there is a tie
  ungroup()

# A tibble: 3 x 2
  Date       Taxa 
  <date>     <chr>
1 2012-05-17 A    
2 2011-08-31 B    
3 2012-09-06 C 

# sample data (run this first so that df exists):
df <- read.table(text = 'Date          Taxa
                         2013-07-12    A
                         2011-08-31    B
                         2012-09-06    C
                         2012-05-17    A
                         2013-07-12    C
                         2012-09-07    B', header = TRUE, stringsAsFactors = FALSE)

You could get the same result by sorting by date and taking the first row of each group:

df %>% 
  mutate(Date = as.Date(Date)) %>% 
  group_by(Taxa) %>% 
  arrange(Date) %>% 
  slice(1) %>% 
  ungroup()
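If dplyr is not available, the same sort-then-take-first idea can be sketched in base R by combining order with the duplicated approach from earlier. This is only one possible way to write it; the names sorted and earliest are made up for the example.

```r
# Sample data copied from above so the block runs on its own
df <- read.table(text = 'Date          Taxa
2013-07-12    A
2011-08-31    B
2012-09-06    C
2012-05-17    A
2013-07-12    C
2012-09-07    B', header = TRUE, stringsAsFactors = FALSE)

df$Date <- as.Date(df$Date)

# Sort by date, then keep the first (earliest) row for each Taxa
sorted <- df[order(df$Date), ]
earliest <- sorted[!duplicated(sorted$Taxa), ]

# Optional: order the result by Taxa for easier reading
earliest <- earliest[order(earliest$Taxa), ]
earliest
```

On the sample data this keeps 2012-05-17 for A, 2011-08-31 for B and 2012-09-06 for C, the same rows as the dplyr versions. The original row names are retained, so they will not run 1 to 3.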
