Minus operation of data frames
I remember coming across this exact issue quite a few months back. Managed to sift through my Evernote one-liners.
Note: This is not my solution. Credit goes to whoever wrote it (whom I can't seem to find at the moment).
If you don't care about row names, you can do:
df1[!duplicated(rbind(df2, df1))[-seq_len(nrow(df2))], ]
# c1 c2
# 1 a 1
# 2 b 2
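For anyone who wants to run that one-liner step by step, here is a minimal self-contained sketch. The data is assumed to have the same c1/c2 values as the example data frames on this page, and the intermediate names `dup`, `in_df2` and `result` are made up for illustration.

```r
# Assumed example data (c1/c2 values as used elsewhere on this page)
df1 <- data.frame(c1 = c("a", "b", "c", "d"), c2 = c(1, 2, 3, 4),
                  stringsAsFactors = FALSE)
df2 <- data.frame(c1 = c("c", "d", "e", "f"), c2 = c(3, 4, 5, 6),
                  stringsAsFactors = FALSE)

# Stack df2 on top of df1. A row of df1 that also occurs in df2 is then
# flagged as a duplicate of the earlier (df2) copy.
dup <- duplicated(rbind(df2, df1))

# Drop the flags belonging to the df2 rows, leaving one flag per df1 row.
in_df2 <- dup[-seq_len(nrow(df2))]

# Keep the rows of df1 that were not flagged.
result <- df1[!in_df2, ]
result
```

Two caveats with this approach: a row that is repeated within df1 is kept only once, and if df2 has zero rows the `-seq_len(0)` index selects nothing, so the result is empty rather than all of df1.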
Edit: A data.table solution:
library(data.table)
dt1 <- data.table(df1, key="c1")  # key on the join column
dt2 <- data.table(df2)
dt1[!dt2]                         # anti-join: rows of dt1 with no match in dt2
or, better, as a one-liner (data.table v1.9.6+):
setDT(df1)[!df2, on="c1"]
This returns all rows in df1 whose c1 value has no match in df2$c1.
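Note that the join above matches on c1 only. If a row should count as a match only when every column agrees, one option is to list all the columns in `on`. This is a sketch, assuming data.table v1.9.6 or later and the same c1/c2 values as the example data on this page; `res` is just an illustrative name.

```r
library(data.table)

# Assumed example data (c1/c2 values as used elsewhere on this page)
dt1 <- data.table(c1 = c("a", "b", "c", "d"), c2 = c(1, 2, 3, 4))
dt2 <- data.table(c1 = c("c", "d", "e", "f"), c2 = c(3, 4, 5, 6))

# Anti-join on both columns: keep the rows of dt1 that have
# no (c1, c2) match in dt2.
res <- dt1[!dt2, on = c("c1", "c2")]
res
```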
I prefer the sqldf package:
library(sqldf)
sqldf("select * from df1 except select * from df2")
## c1 c2
## 1 a 1
## 2 b 2
You can create identifier columns and then subset, e.g.:
df1 <- data.frame(c1=c("a","b","c","d"),c2=c(1,2,3,4), indf1 = rep("Y",4) )
df2 <- data.frame(c1=c("c","d","e","f"),c2=c(3,4,5,6),indf2 = rep("Y",4) )
merge(df1,df2)
# c1 c2 indf1 indf2
#1 c 3 Y Y
#2 d 4 Y Y
bigdf <- merge(df1,df2,all=TRUE)
bigdf
# c1 c2 indf1 indf2
#1 a 1 Y <NA>
#2 b 2 Y <NA>
#3 c 3 Y Y
#4 d 4 Y Y
#5 e 5 <NA> Y
#6 f 6 <NA> Y
Then subset how you wish:
bigdf[is.na(bigdf$indf1) ,]
# c1 c2 indf1 indf2
#5 e 5 <NA> Y
#6 f 6 <NA> Y
bigdf[is.na(bigdf$indf2) ,] # <- the output you requested: rows of df1 that are not in df2
# c1 c2 indf1 indf2
#1 a 1 Y <NA>
#2 b 2 Y <NA>
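The same idea can be wrapped in a small base R helper so that the indicator column does not have to be added by hand. This is only a sketch: `rows_not_in` and the `.in_y` column name are made up for illustration, and it assumes the two data frames share at least one column name and that neither already has a column called `.in_y`.

```r
# Hypothetical helper: rows of x that have no match in y on the shared columns
rows_not_in <- function(x, y, by = intersect(names(x), names(y))) {
  # Keep only the key columns of y, de-duplicated, and tag them
  y_keys <- unique(y[, by, drop = FALSE])
  y_keys$.in_y <- TRUE

  # Left join: rows of x without a match in y get NA in the tag column
  merged <- merge(x, y_keys, by = by, all.x = TRUE)

  # Keep the unmatched rows and restore x's original column set
  out <- merged[is.na(merged$.in_y), names(x), drop = FALSE]
  rownames(out) <- NULL
  out
}

df1 <- data.frame(c1 = c("a", "b", "c", "d"), c2 = c(1, 2, 3, 4),
                  stringsAsFactors = FALSE)
df2 <- data.frame(c1 = c("c", "d", "e", "f"), c2 = c(3, 4, 5, 6),
                  stringsAsFactors = FALSE)

rows_not_in(df1, df2)
```

Because it goes through merge(), the result comes back sorted by the key columns rather than in df1's original row order.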