R selecting all rows from a data frame that don't appear in another
Here's another way:
x <- rbind(test2, test)
x[! duplicated(x, fromLast=TRUE) & seq(nrow(x)) <= nrow(test2), ]
# number fruit ID1 ID2
# item1 number1 papayas 22 33
# item3 number3 peaches 441 25
# item4 number4 apples 123 13
Edit: modified to preserve row names.
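For anyone who wants to run this end to end, here is a minimal self-contained sketch. The question's data is not reproduced in this answer, so the two data frames below are an assumption pieced together from the output above and the other answers; the names is_last_copy and from_test2 are only illustrative.

```r
# Assumed sample data: row names are the item labels, all columns are character.
test <- data.frame(
  number = c("number1", "number2", "number3", "number4"),
  fruit  = c("apples", "oranges", "peaches", "apples"),
  ID1    = c("22", "13", "44", "12"),
  ID2    = c("33", "33", "25", "13"),
  row.names = c("item1", "item2", "item3", "item4"),
  stringsAsFactors = FALSE
)
test2 <- data.frame(
  number = c("number1", "number2", "number3", "number4"),
  fruit  = c("papayas", "oranges", "peaches", "apples"),
  ID1    = c("22", "13", "441", "123"),
  ID2    = c("33", "33", "25", "13"),
  row.names = c("item1", "item2", "item3", "item4"),
  stringsAsFactors = FALSE
)

# Stack test2 on top of test so that test2 occupies the first nrow(test2) rows.
x <- rbind(test2, test)

# A row has no later duplicate if it is not repeated further down in x.
is_last_copy <- !duplicated(x, fromLast = TRUE)

# Only rows that came from test2 are candidates.
from_test2 <- seq_len(nrow(x)) <= nrow(test2)

# Rows of test2 that have no identical row in test.
result <- x[is_last_copy & from_test2, ]
print(result)
```

Because test2 comes first in the rbind, its row names are the ones kept unchanged, which is what preserves them in the result.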
Here are two more ways to solve this: one with data.table and one with sqldf. First, the example data:
library(data.table)
test<- fread('
item number fruit ID1 ID2
item1 "number1" "apples" "22" "33"
item2 "number2" "oranges" "13" "33"
item3 "number3" "peaches" "44" "25"
item4 "number4" "apples" "12" "13"
')
test2<- fread('
item number fruit ID1 ID2
item1 "number1" "papayas" "22" "33"
item2 "number2" "oranges" "13" "33"
item3 "number3" "peaches" "441" "25"
item4 "number4" "apples" "123" "13"
item5 "number3" "peaches" "44" "25"
item6 "number4" "apples" "12" "13"
item7 "number1" "apples" "22" "33"
')
The data.table approach: the key columns set with setkey determine which columns are compared.
setkey(test,item,number,fruit,ID1,ID2)
setkey(test2,item,number,fruit,ID1,ID2)
test[!test2]
item number fruit ID1 ID2
1: item1 number1 apples 22 33
2: item3 number3 peaches 44 25
3: item4 number4 apples 12 13
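The keyed join above compares all five columns, including item. As a sketch of comparing only a subset of columns, the anti-join can also be written with the on argument instead of keys (this assumes a data.table version that supports on=, which I believe means 1.9.6 or later). The data is rebuilt with data.table() so the block runs on its own; cols, only_in_test2 and only_in_test are illustrative names.

```r
library(data.table)

# Same sample data as above, built directly instead of via fread.
test <- data.table(
  item   = paste0("item", 1:4),
  number = c("number1", "number2", "number3", "number4"),
  fruit  = c("apples", "oranges", "peaches", "apples"),
  ID1    = c(22L, 13L, 44L, 12L),
  ID2    = c(33L, 33L, 25L, 13L)
)
test2 <- data.table(
  item   = paste0("item", 1:7),
  number = c("number1", "number2", "number3", "number4",
             "number3", "number4", "number1"),
  fruit  = c("papayas", "oranges", "peaches", "apples",
             "peaches", "apples", "apples"),
  ID1    = c(22L, 13L, 441L, 123L, 44L, 12L, 22L),
  ID2    = c(33L, 33L, 25L, 13L, 25L, 13L, 33L)
)

# Compare on everything except the item label.
cols <- c("number", "fruit", "ID1", "ID2")

# Rows of test2 with no match in test on the chosen columns.
only_in_test2 <- test2[!test, on = cols]

# Rows of test with no match in test2 on the chosen columns.
only_in_test <- test[!test2, on = cols]

print(only_in_test2)
print(only_in_test)
```

Leaving item out of cols changes the answer: every row of test then has a match in test2, so only_in_test is empty, while test2 still has three rows that test lacks.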
The SQL approach, using sqldf:
library(sqldf)
sqldf('select * from test except select * from test2')
item number fruit ID1 ID2
1: item1 number1 apples 22 33
2: item3 number3 peaches 44 25
3: item4 number4 apples 12 13
The following should get you there:
rows <- unique(unlist(mapply(function(x, y)
    sapply(setdiff(x, y), function(d) which(x == d)), test2, test)))
test2[rows, ]
What's happening here is:
- mapply is used to do a column-wise comparison between the two datasets.
- It uses setdiff to find any items that are in the former but not the latter, and which(x == d) then identifies which rows of the former hold those items.
- unique(unlist(....)) grabs all unique rows.
- Then we use that as a filter on the former, i.e. test2.
Results:
number fruit ID1 ID2
item1 number1 papayas 22 33
item3 number3 peaches 441 25
item4 number4 apples 123 13
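To make the steps above easier to follow, here is the same idea split into intermediate results. This is only a sketch: the data frames are assumed from the results shown, and new_values and rows_per_column are illustrative names. Note that the comparison is per column, so a row of test2 whose individual values all occur somewhere in test, just in a different combination, would not be flagged.

```r
# Assumed sample data, with character columns and item labels as row names.
test <- data.frame(
  number = c("number1", "number2", "number3", "number4"),
  fruit  = c("apples", "oranges", "peaches", "apples"),
  ID1    = c("22", "13", "44", "12"),
  ID2    = c("33", "33", "25", "13"),
  row.names = c("item1", "item2", "item3", "item4"),
  stringsAsFactors = FALSE
)
test2 <- data.frame(
  number = c("number1", "number2", "number3", "number4"),
  fruit  = c("papayas", "oranges", "peaches", "apples"),
  ID1    = c("22", "13", "441", "123"),
  ID2    = c("33", "33", "25", "13"),
  row.names = c("item1", "item2", "item3", "item4"),
  stringsAsFactors = FALSE
)

# Step 1: for each column, the values present in test2 but absent from test.
new_values <- mapply(setdiff, test2, test, SIMPLIFY = FALSE)

# Step 2: for each column, the row positions in test2 that hold those values.
rows_per_column <- mapply(function(x, vals) which(x %in% vals),
                          test2, new_values, SIMPLIFY = FALSE)

# Step 3: collapse to one set of row positions and filter test2.
rows <- sort(unique(unlist(rows_per_column)))
result <- test2[rows, ]
print(result)
```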
Edit: Make sure that test and test2 are data.frames and not matrices, since mapply iterates over each element of a matrix, but over each column of a data.frame:
test <- as.data.frame(test, stringsAsFactors=FALSE)
test2 <- as.data.frame(test2, stringsAsFactors=FALSE)
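A quick way to see the difference for yourself is the small sketch below. The objects m and df are made up purely for illustration; they are not the question's data.

```r
# A 3 x 2 matrix and the equivalent data frame.
m  <- matrix(1:6, nrow = 3)
df <- as.data.frame(m)

# On a matrix, mapply visits every cell, so the function is called 6 times
# and each call sees a single value.
per_element <- mapply(function(x) length(x), m)

# On a data frame, mapply visits every column, so the function is called
# twice and each call sees a whole column of 3 values.
per_column <- mapply(function(x) length(x), df)

print(per_element)
print(per_column)
```

With a matrix, setdiff inside the mapply call would therefore compare single cells rather than whole columns, which is why the conversion matters.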