R: how to remove duplicate rows by column

Here's a dplyr based solution in case you are interested (edited to include Gregor's suggestions)

library(dplyr)
group_by(df, id, gender) %>% slice(1)

#> # A tibble: 4 x 3
#> # Groups:   id, gender [4]
#>      id gender variant
#>   <dbl> <fctr>  <fctr>
#> 1     1 Female       a
#> 2     1   Male       c
#> 3     2 Female       d
#> 4     2   Male       e

It might also be worth using the arrange function as well depending on which values of variant should be removed.

df[!duplicated(df[ , c("id","gender")]),]

#     id  gender  variant
#  1   1  Female     a
#  3   1   Male      c
#  4   2  Female     d
#  5   2   Male      e

Another way of doing this using subset as below:

subset(df, !duplicated(subset(df, select=c(id, gender))))

#   id  gender variant
# 1  1  Female     a
# 3  1    Male     c
# 4  2  Female     d
# 5  2    Male     e

R: how to remove duplicate rows by column

Tags:

Duplicates

R

Related

Recent Posts