R selecting all rows from a data frame that don't appear in another

Here's another way:

x <- rbind(test2, test)
x[! duplicated(x, fromLast=TRUE) & seq(nrow(x)) <= nrow(test2), ]
#        number   fruit ID1 ID2
# item1 number1 papayas  22  33
# item3 number3 peaches 441  25
# item4 number4  apples 123  13

Edit: modified to preserve row names.
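
To make the trick above easier to follow, here is a minimal, self-contained sketch. The sample data frames are an assumption, rebuilt from the values shown elsewhere on this page, and the intermediate variable names (has_later_copy, from_test2, only_in_test2) are just illustrative.

```r
# Hypothetical sample data, modelled on the columns used in the question.
test <- data.frame(
  number = c("number1", "number2", "number3", "number4"),
  fruit  = c("apples", "oranges", "peaches", "apples"),
  ID1    = c(22, 13, 44, 12),
  ID2    = c(33, 33, 25, 13),
  row.names = c("item1", "item2", "item3", "item4"),
  stringsAsFactors = FALSE
)
test2 <- data.frame(
  number = c("number1", "number2", "number3", "number4"),
  fruit  = c("papayas", "oranges", "peaches", "apples"),
  ID1    = c(22, 13, 441, 123),
  ID2    = c(33, 33, 25, 13),
  row.names = c("item1", "item2", "item3", "item4"),
  stringsAsFactors = FALSE
)

# Stack test2 on top of test, so the rows of test2 come first.
x <- rbind(test2, test)

# TRUE for a row when an identical row occurs further down in x.
has_later_copy <- duplicated(x, fromLast = TRUE)

# TRUE only for the rows that originally came from test2.
from_test2 <- seq_len(nrow(x)) <= nrow(test2)

# Rows of test2 that have no identical copy below them.
only_in_test2 <- x[!has_later_copy & from_test2, ]
print(only_in_test2)
```

One thing to keep in mind: because duplicated() looks at the whole stacked data frame, a row that occurs several times within test2 itself is kept only once (its last occurrence).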


Here are two ways to solve this: one with data.table and one with sqldf.

library(data.table)
test<- fread('
item number fruit ID1 ID2 
item1 "number1" "apples"  "22" "33"
item2 "number2" "oranges" "13" "33"
item3 "number3" "peaches" "44" "25"
item4 "number4" "apples"  "12" "13"
')
test2<- fread('
item number fruit ID1 ID2 
item1 "number1" "papayas" "22"  "33"
item2 "number2" "oranges" "13"  "33"
item3 "number3" "peaches" "441" "25"
item4 "number4" "apples"  "123" "13"
item5 "number3" "peaches" "44"  "25"
item6 "number4" "apples"  "12"  "13"
item7 "number1" "apples"  "22"  "33"
')

The data.table approach lets you choose which columns to compare:

setkey(test,item,number,fruit,ID1,ID2)
setkey(test2,item,number,fruit,ID1,ID2)
test[!test2]
    item  number   fruit ID1 ID2
1: item1 number1  apples  22  33
2: item3 number3 peaches  44  25
3: item4 number4  apples  12  13
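
As a sketch of the "select which columns to compare" point, the anti-join can also be written with the on= argument instead of setkey, which makes the compared columns explicit. The data below is an assumption (the same values as above, typed in directly with integer ID columns), and cols is just an illustrative name. Leaving item out of the comparison changes which rows count as matches.

```r
library(data.table)

# Hypothetical sample data with the same values as the fread() calls above.
test <- data.table(
  item   = paste0("item", 1:4),
  number = c("number1", "number2", "number3", "number4"),
  fruit  = c("apples", "oranges", "peaches", "apples"),
  ID1    = c(22L, 13L, 44L, 12L),
  ID2    = c(33L, 33L, 25L, 13L)
)
test2 <- data.table(
  item   = paste0("item", 1:7),
  number = c("number1", "number2", "number3", "number4",
             "number3", "number4", "number1"),
  fruit  = c("papayas", "oranges", "peaches", "apples",
             "peaches", "apples", "apples"),
  ID1    = c(22L, 13L, 441L, 123L, 44L, 12L, 22L),
  ID2    = c(33L, 33L, 25L, 13L, 25L, 13L, 33L)
)

# Compare on these columns only; the item column is ignored.
cols <- c("number", "fruit", "ID1", "ID2")

# Anti-join: rows of test2 that have no match in test on cols.
only_in_test2 <- test2[!test, on = cols]

# The other direction: rows of test that have no match in test2 on cols.
only_in_test <- test[!test2, on = cols]

print(only_in_test2)
print(only_in_test)
```

With item excluded, every row of test has a counterpart in test2, so the second result is empty, while the first keeps item1, item3 and item4 of test2.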

SQL approach:

library(sqldf)
sqldf('select * from test except select * from test2')
    item  number   fruit ID1 ID2
1: item1 number1  apples  22  33
2: item3 number3 peaches  44  25
3: item4 number4  apples  12  13

The following should get you there:

rows <- unique(unlist(mapply(function(x, y) 
          sapply(setdiff(x, y), function(d) which(x==d)), test2, test)))
test2[rows, ]

What's happening here is:

  • mapply performs a column-wise comparison between the two datasets.
  • setdiff finds the values that are in a column of the former (test2) but not in the same column of the latter (test).
  • which identifies the rows of the former that contain those values.
  • unique(unlist(...)) collapses the result into a vector of unique row numbers.
  • That vector is then used as a row filter on the former, i.e. test2.

Results:

       number   fruit ID1 ID2
item1 number1 papayas  22  33
item3 number3 peaches 441  25
item4 number4  apples 123  13
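
To see the individual steps, here is a small sketch that first walks through a single column and then runs the full expression. The sample data frames are an assumption, rebuilt from the values shown in the results, and missing_vals and rows_id1 are illustrative names.

```r
# Hypothetical sample data, modelled on the question's columns.
test <- data.frame(
  number = c("number1", "number2", "number3", "number4"),
  fruit  = c("apples", "oranges", "peaches", "apples"),
  ID1    = c(22, 13, 44, 12),
  ID2    = c(33, 33, 25, 13),
  row.names = c("item1", "item2", "item3", "item4"),
  stringsAsFactors = FALSE
)
test2 <- data.frame(
  number = c("number1", "number2", "number3", "number4"),
  fruit  = c("papayas", "oranges", "peaches", "apples"),
  ID1    = c(22, 13, 441, 123),
  ID2    = c(33, 33, 25, 13),
  row.names = c("item1", "item2", "item3", "item4"),
  stringsAsFactors = FALSE
)

# One column at a time: values of test2$ID1 that never occur in test$ID1 ...
missing_vals <- setdiff(test2$ID1, test$ID1)
# ... and the rows of test2 that hold those values.
rows_id1 <- which(test2$ID1 %in% missing_vals)

# The same idea applied to every pair of columns, then merged into one
# vector of unique row numbers.
rows <- unique(unlist(mapply(function(x, y)
          sapply(setdiff(x, y), function(d) which(x == d)), test2, test)))

print(test2[rows, ])
```

Note that the comparison is per column: a row is flagged when any one of its values is missing from the matching column of test. A row whose values all occur somewhere in test, but never together in one row, would not be flagged.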

Edit:

Make sure that test and test2 are data.frames and not matrices: mapply iterates over each element of a matrix, but over each column of a data.frame.

test  <- as.data.frame(test,  stringsAsFactors=FALSE)
test2 <- as.data.frame(test2, stringsAsFactors=FALSE)
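
The difference is easy to check on a toy example. This is only an illustration with made-up values (m, df, per_element and per_column are not part of the original data):

```r
# A 3 x 2 matrix and the same values as a data frame (columns V1 and V2).
m  <- matrix(1:6, nrow = 3)
df <- as.data.frame(m)

# On a matrix, the function is called once per element (6 calls here).
per_element <- mapply(function(x) length(x), m)

# On a data frame, the function is called once per column (2 calls here).
per_column <- mapply(function(x) length(x), df)

print(per_element)
print(per_column)
```

Each call on the matrix sees a single value, while each call on the data frame sees a whole column, which is what the column-wise setdiff above relies on.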