Find indices of duplicated rows

If you are using a keyed data.table, you can use the following compact syntax:

library(data.table)
DT <- data.table(A = rep(1:3, each=4), 
                 B = rep(1:4, each=3), 
                 C = rep(1:2, 6), key = "A,B,C")

DT[unique(DT[duplicated(DT)]), which = TRUE]

To unpack:

DT[duplicated(DT)] subsets the rows that repeat an earlier row, i.e. the second and later occurrences of each combination.
unique(...) reduces these to one row per duplicated combination. This handles combinations that occur more than twice (triplicates and so on).
DT[..., which = TRUE] joins those combinations back to the original table on its key. With which = TRUE it returns the matching row numbers; without it, it would return the rows themselves.
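
The three steps above can also be run one at a time to inspect each intermediate result. This is only a sketch on the same example table; the names dups, combos and idx are introduced here for illustration and are not part of the original answer.

```r
library(data.table)

DT <- data.table(A = rep(1:3, each = 4),
                 B = rep(1:4, each = 3),
                 C = rep(1:2, 6), key = "A,B,C")

# Step 1: rows that repeat an earlier row (second and later occurrences)
dups <- DT[duplicated(DT)]

# Step 2: one row per duplicated combination
combos <- unique(dups)

# Step 3: join back on the key and ask for row numbers instead of data
idx <- DT[combos, which = TRUE]
idx
# [1]  1  2 11 12
```

Keep in mind that setting a key sorts the table, so these row numbers refer to the keyed (sorted) order of DT, not to the order in which the data were originally entered.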

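You could also count the rows in each group and keep the groups with more than one row (note that this adds a count column to DT by reference):

DT[, count := .N, by = list(A, B, C)][count > 1, which = TRUE]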
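
If adding a helper column is not wanted, one possible variant of the grouped count uses data.table's special symbols .I (row numbers) and .N (group size) directly. This is a sketch rather than part of the original answer; the name idx is illustrative.

```r
library(data.table)

DT <- data.table(A = rep(1:3, each = 4),
                 B = rep(1:4, each = 3),
                 C = rep(1:2, 6), key = "A,B,C")

# For each (A, B, C) group, keep its row numbers only if the group has > 1 row.
# Groups of size 1 contribute an empty vector, so they drop out of the result.
idx <- DT[, .I[.N > 1], by = .(A, B, C)]$V1
idx
# [1]  1  2 11 12

# DT itself is left unchanged: no extra column was added
names(DT)
# [1] "A" "B" "C"
```

Because the grouping is done with by, this form does not depend on DT being keyed.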

Here's an example using a base R data frame:

df <- data.frame(a = c(1,2,3,4,1,5,6,4,2,1))

duplicated(df) | duplicated(df, fromLast = TRUE)
#[1]  TRUE  TRUE FALSE  TRUE  TRUE FALSE FALSE  TRUE  TRUE  TRUE

How does it work?

duplicated(df) marks every row that repeats an earlier row, so the first occurrence of each value is FALSE. With fromLast = TRUE the search runs from the end of the data instead, so the last occurrence is the one marked FALSE. The two logical vectors are combined with | because a TRUE in either of them means the row has a duplicate somewhere in the data.
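
Since the goal is the indices rather than a logical vector, the result can be wrapped in which(). The following sketch reuses the same df; the names first_pass, last_pass and dup_idx are illustrative.

```r
df <- data.frame(a = c(1, 2, 3, 4, 1, 5, 6, 4, 2, 1))

first_pass <- duplicated(df)                   # TRUE for repeats of an earlier row
last_pass  <- duplicated(df, fromLast = TRUE)  # TRUE for rows repeated later on

# Row numbers of every row that has at least one duplicate
dup_idx <- which(first_pass | last_pass)
dup_idx
# [1]  1  2  4  5  8  9 10

# The corresponding rows, kept as a data frame
df[dup_idx, , drop = FALSE]
```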
