Check if each row of a data frame is contained in another data frame

One way is to paste each row into a single string and compare the strings with %in%. The result is a logical vector of length nrow(df1), as requested.

do.call(paste0, df1) %in% do.call(paste0, df2)
# [1] TRUE TRUE TRUE
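One thing to watch for with paste0: because there is no separator, two different rows can collapse to the same string. A minimal sketch of the problem and one possible workaround follows; the toy data frames and the choice of "\r" as a separator are assumptions for illustration, not part of the original answer, and any separator that cannot appear in the data would do.

```r
# Toy data (hypothetical): both rows of df1 paste to "123" without a separator
df1 <- data.frame(a = c(1, 12), b = c(23, 3))
df2 <- data.frame(a = 1, b = 23)

# No separator: row 2 of df1 (12, 3) is wrongly matched to (1, 23)
naive <- do.call(paste0, df1) %in% do.call(paste0, df2)

# With a separator the strings become "1\r23" and "12\r3", which stay distinct
safe <- do.call(paste, c(df1, sep = "\r")) %in%
  do.call(paste, c(df2, sep = "\r"))

print(naive)
print(safe)
```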

Try:

Filter(function(x) x > 0, which(duplicated(rbind(df2, df1))) - nrow(df2))

This returns the row numbers of df1 whose rows also occur in df2. If you want an atomic logical vector, as in Richard Scriven's answer, try:

duplicated(rbind(df2, df1))[-seq_len(nrow(df2))]
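To make the two duplicated-based expressions concrete, here is a small sketch on made-up data. The data frames are assumptions chosen for illustration; the two expressions are the ones given above.

```r
# Toy data (hypothetical): rows 1 and 3 of df1 also appear in df2
df1 <- data.frame(a = c(1, 2, 3), b = c("x", "y", "z"),
                  stringsAsFactors = FALSE)
df2 <- data.frame(a = c(3, 1), b = c("z", "x"),
                  stringsAsFactors = FALSE)

# Stack df2 on top of df1; a df1 row that repeats an earlier row is a duplicate
stacked <- rbind(df2, df1)

# Row numbers of df1 found in df2: shift indices by nrow(df2), keep positives
rows <- Filter(function(x) x > 0, which(duplicated(stacked)) - nrow(df2))

# Logical form: drop the first nrow(df2) entries, which belong to df2 itself
in_df2 <- duplicated(stacked)[-seq_len(nrow(df2))]

print(rows)
print(in_df2)
```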

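It is also faster, since it relies on duplicated, which is implemented internally in C. In the benchmark below, mine is rowcheck2. One caveat: because duplicated looks at the whole stacked data frame, a row of df1 that repeats an earlier row of df1 is also flagged, whether or not it appears in df2.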

> microbenchmark(rowcheck(df1, df2), rowcheck2(df1, df2))
 Unit: milliseconds
                expr      min       lq   median       uq       max neval
  rowcheck(df1, df2) 2.045210 2.169182 2.328296 3.539328 13.971517   100
 rowcheck2(df1, df2) 1.046207 1.112395 1.243390 1.727921  7.442499   100
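The post does not show how rowcheck and rowcheck2 are defined. A plausible sketch, assuming rowcheck wraps the paste0 approach and rowcheck2 wraps the duplicated approach, might look like the following. The function bodies and the test data are assumptions, and timings will differ from the figures above; the benchmark itself only runs if the microbenchmark package is installed.

```r
# Hypothetical definitions; the original post does not include them
rowcheck <- function(df1, df2) {
  do.call(paste0, df1) %in% do.call(paste0, df2)
}
rowcheck2 <- function(df1, df2) {
  duplicated(rbind(df2, df1))[-seq_len(nrow(df2))]
}

# Made-up data: 100 distinct rows in df2; the first three rows of df1 are in it
df2 <- data.frame(a = rep(1:20, each = 5), b = rep(letters[1:5], times = 20),
                  stringsAsFactors = FALSE)
df1 <- data.frame(a = c(1L, 4L, 9L, 99L), b = c("e", "b", "b", "q"),
                  stringsAsFactors = FALSE)

res1 <- rowcheck(df1, df2)
res2 <- rowcheck2(df1, df2)
print(res1)
print(res2)

# Optional timing comparison, skipped when microbenchmark is unavailable
if (requireNamespace("microbenchmark", quietly = TRUE)) {
  print(microbenchmark::microbenchmark(rowcheck(df1, df2),
                                       rowcheck2(df1, df2),
                                       times = 20L))
}
```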
