How can I remove all duplicates so that NONE are left in a data frame?
This will extract the rows that appear only once (assuming your data frame is named df):
df[!(duplicated(df) | duplicated(df, fromLast = TRUE)), ]
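How it works: The function duplicated tests, starting at the first row, whether each row has already appeared, so it flags the second and any later occurrence of a row. If the argument fromLast = TRUE is used, the function starts at the last row instead, so it flags every occurrence except the last one.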
Both boolean results are combined with | (logical 'or') into a new vector which indicates all rows appearing more than once. The result is negated using !, thereby creating a boolean vector indicating rows appearing only once.
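To make the individual steps visible, here is a minimal sketch that stores each intermediate vector. The data frame and the variable names (dup_forward, dup_backward, appears_more_than_once, unique_rows) are made up for illustration and are not part of the original answer.

```r
# Hypothetical example data: rows 1 and 3 are identical.
df <- data.frame(x = c(1, 2, 1, 3), y = c("a", "b", "a", "c"))

# Flags the second and later occurrences, scanning from the top.
dup_forward <- duplicated(df)

# Flags every occurrence except the last, scanning from the bottom.
dup_backward <- duplicated(df, fromLast = TRUE)

# TRUE for every row that has at least one identical twin.
appears_more_than_once <- dup_forward | dup_backward

# Keep only the rows that occur exactly once (here: rows 2 and 4).
unique_rows <- df[!appears_more_than_once, ]
print(unique_rows)
```

Neither call to duplicated alone is enough: each one leaves one copy of every duplicated row unflagged, which is why the two results are combined.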
Try it
library(dplyr)
DF1 <- data.frame(Part = c(1,2,3,4,5), Age = c(23,34,23,25,24), B.P = c(87,76,75,75,78))
DF2 <- data.frame(Part =c(3,5), Age = c(23,24), B.P = c(75,78))
DF3 <- rbind(DF1,DF2)
DF3 <- DF3[!(duplicated(DF3) | duplicated(DF3, fromLast = TRUE)), ]
A possibility involving dplyr could be:
df %>%
  group_by_all() %>%
  filter(n() == 1)
Or:
df %>%
  group_by_all() %>%
  filter(!any(row_number() > 1))
Since dplyr 1.0.0, the preferable way would be:
df %>%
  group_by(across(everything())) %>%
  filter(n() == 1)
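One thing to keep in mind is that filter() on grouped data returns a result that is still grouped by every column. The following sketch, which assumes dplyr 1.0.0 or later is installed and uses a made-up data frame, adds ungroup() at the end; that extra step is a suggestion and not part of the original answer.

```r
library(dplyr)

# Hypothetical example data: rows 1 and 3 are identical.
df <- data.frame(x = c(1, 2, 1, 3), y = c("a", "b", "a", "c"))

singletons <- df %>%
  group_by(across(everything())) %>%  # one group per distinct row
  filter(n() == 1) %>%                # keep groups containing a single row
  ungroup()                           # drop the grouping for later steps

print(singletons)
```

Without ungroup(), later calls such as summarise() or mutate() would keep operating per group, which is rarely what is wanted once the duplicates are gone.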