How can I remove all duplicates so that NONE are left in a data frame?

This will extract the rows which appear only once (assuming your data frame is named df):

df[!(duplicated(df) | duplicated(df, fromLast = TRUE)), ]

How it works: The function duplicated scans the rows from top to bottom and returns TRUE for every row that is identical to an earlier row, so the first occurrence of each row is marked FALSE. With the argument fromLast = TRUE, the scan runs from the bottom up, so the last occurrence is the one marked FALSE.

Both logical vectors are combined with | (logical 'or') into a new vector that marks every row appearing more than once, including its first and last occurrence. Negating that vector with ! gives a logical vector marking the rows that appear exactly once, which is then used to subset the data frame.
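To make the intermediate steps visible, here is a minimal sketch on a made-up three-row data frame (the name toy and its contents are assumptions for illustration, not part of the question):

```r
# Hypothetical toy data: rows 1 and 3 are identical, row 2 is unique
toy <- data.frame(x = c(1, 2, 1), y = c("a", "b", "a"))

fwd <- duplicated(toy)                   # FALSE FALSE  TRUE  (row 3 repeats row 1)
bwd <- duplicated(toy, fromLast = TRUE)  #  TRUE FALSE FALSE  (row 1 repeats row 3)
dup_any <- fwd | bwd                     #  TRUE FALSE  TRUE  (every row with a twin)

only_once <- toy[!dup_any, ]             # keeps row 2 only
```

Neither call on its own is enough: each one leaves a single copy of a duplicated row unmarked, which is why the two results are combined.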

Try it

library(dplyr)  # only needed for the dplyr alternatives further down

DF1 <- data.frame(Part = c(1,2,3,4,5), Age = c(23,34,23,25,24),  B.P = c(87,76,75,75,78))

DF2 <- data.frame(Part =c(3,5), Age = c(23,24), B.P = c(75,78))

DF3 <- rbind(DF1,DF2)

DF3 <- DF3[!(duplicated(DF3) | duplicated(DF3, fromLast = TRUE)), ]
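As a quick check of what the example should leave behind, the sketch below rebuilds the same data and inspects the outcome. The names keep and result are assumptions introduced here so the original DF3 is not overwritten:

```r
# Same data as in the example above
DF1 <- data.frame(Part = c(1, 2, 3, 4, 5), Age = c(23, 34, 23, 25, 24), B.P = c(87, 76, 75, 75, 78))
DF2 <- data.frame(Part = c(3, 5), Age = c(23, 24), B.P = c(75, 78))
DF3 <- rbind(DF1, DF2)

# TRUE for rows that occur exactly once
keep <- !(duplicated(DF3) | duplicated(DF3, fromLast = TRUE))
result <- DF3[keep, ]

result$Part                # 1 2 4 (Parts 3 and 5 occurred twice, so both copies are gone)
kept_names <- rownames(result)  # "1" "2" "4": the original row names are kept

rownames(result) <- NULL   # optional: renumber the rows 1..n
```

Resetting the row names is only cosmetic; skip it if you want to trace rows back to their position in DF3.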

A possibility involving dplyr could be:

df %>%
 group_by_all() %>%
 filter(n() == 1)

Or:

df %>%
 group_by_all() %>%
 filter(!any(row_number() > 1))

Since dplyr 1.0.0, group_by_all() is superseded in favour of across(), so the preferred form is:

df %>%
    group_by(across(everything())) %>%
    filter(n() == 1)
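A self-contained sketch of the dplyr approach, assuming dplyr 1.0.0 or later is installed. The data frame repeats the values from the base R example, and the trailing ungroup() is an addition not present in the original answer: without it the result stays grouped by every column, which can surprise later summarise() or mutate() calls.

```r
library(dplyr)

# Rows 3/6 and 5/7 are identical; rows 1, 2 and 4 are unique
df <- data.frame(
  Part = c(1, 2, 3, 4, 5, 3, 5),
  Age  = c(23, 34, 23, 25, 24, 23, 24),
  B.P  = c(87, 76, 75, 75, 78, 75, 78)
)

unique_rows <- df %>%
  group_by(across(everything())) %>%  # one group per distinct row
  filter(n() == 1) %>%                # keep groups that contain a single row
  ungroup()                           # drop the grouping again

unique_rows$Part  # 1 2 4
```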
