Merge unequal dataframes and replace missing rows with 0
Take a look at the help page for merge. The all
parameter lets you specify different types of merges. Here we want to set all = TRUE
. This will make merge return NA
for the values that don't match, which we can update to 0 with is.na()
:
zz <- merge(df1, df2, all = TRUE)
zz[is.na(zz)] <- 0
> zz
x y
1 a 0
2 b 1
3 c 0
4 d 0
5 e 0
Updated many years later to address follow up question
You need to identify the variable names in the second data table that you aren't merging on - I use setdiff()
for this. Check out the following:
df1 = data.frame(x=c('a', 'b', 'c', 'd', 'e', NA))
df2 = data.frame(x=c('a', 'b', 'c'),y1 = c(0,1,0), y2 = c(0,1,0))
#merge as before
df3 <- merge(df1, df2, all = TRUE)
#columns in df2 not in df1
unique_df2_names <- setdiff(names(df2), names(df1))
df3[unique_df2_names][is.na(df3[, unique_df2_names])] <- 0
Created on 2019-01-03 by the reprex package (v0.2.1)
Or, as an alternative to @Chase's code, being a recent plyr fan with a background in databases:
require(plyr)
zz<-join(df1, df2, type="left")
zz[is.na(zz)] <- 0