Pandas merge dataframes with shared column, fillna in left with right
You can mask the data.
original data:
print(df)
one two three
0 1 1.0 1.0
1 2 NaN 2.0
2 3 3.0 NaN
print(df2)
one two three
0 4 4 4
1 4 2 4
2 4 4 3
See below, mask just fills based on condition.
# mask values where isna()
df1[['two','three']] = df1[['two','three']]\
.mask(df1[['two','three']].isna(),df2[['two','three']])
output:
one two three
0 1 1.0 1.0
1 2 2.0 2.0
2 3 3.0 3.0
Only if the indices are alligned (important note), we can use update
:
df1['b'].update(df2['b'])
a b e
0 1 0.0 a
1 2 1.0 1
2 3 0.0 2
3 4 1.0 b
Or simply fillna
:
df1['b'].fillna(df2['b'], inplace=True)
If you're indices are not alligned, see WenNYoBen's answer or comment underneath.
Short version
df1.b.fillna(df1.a.map(df2.set_index('a').b),inplace=True)
df1
Out[173]:
a b e
0 1 0.0 a
1 2 1.0 1
2 3 0.0 2
3 4 1.0 b
Since you mentioned there will be multiple columns
df = df1.combine_first(df1[['a']].merge(df2, on='a', how='left'))
df
Out[184]:
a b e
0 1 0.0 a
1 2 1.0 1
2 3 0.0 2
3 4 1.0 b
Also we can pass to fillna
with df
df1.fillna(df1[['a']].merge(df2, on='a', how='left'))
Out[185]:
a b e
0 1 0.0 a
1 2 1.0 1
2 3 0.0 2
3 4 1.0 b
The problem confusing merge is that both dataframes have a 'b' column, but the left and right versions have NaNs in mismatched places. You want to avoid getting unwanted multiple 'b' columns 'b_x', 'b_y' from merge
in the first place:
- slice the non-shared columns 'a','e' from df1
- do
merge(df2, 'left')
, this will pick up 'b' from the right dataframe (since it only exists in the right df) - finally do
df1.update(...)
, this will update the NaNs in the column 'b' taken from df2 withdf1['b']
Solution:
df1.update(df1[['a', 'e']].merge(df2, 'left'))
df1
a b e
0 1 0.0 a
1 2 1.0 1
2 3 0.0 2
3 4 1.0 b
Note: Because I used merge(..., how='left')
, I preserve the row order of the calling dataframe. If my df1
had values of a
that were not in order
a b e
0 1 0.0 a
1 2 1.0 1
2 4 1.0 b
3 3 NaN 2
The result would be
df1.update(df1[['a', 'e']].merge(df2, 'left'))
df1
a b e
0 1 0.0 a
1 2 1.0 1
2 4 1.0 b
3 3 0.0 2
Which is as expected.
Further...
If you want to be more explicit when there may be more columns involved
df1.update(df1.drop('b', 1).merge(df2, 'left', 'a'))
Even Further...
If you don't want to update
the dataframe, we can use combine_first
Quick
df1.combine_first(df1[['a', 'e']].merge(df2, 'left'))
Explicit
df1.combine_first(df1.drop('b', 1).merge(df2, 'left', 'a'))
EVEN FURTHER!...
The 'left'
merge
may preserve order but NOT the index. This is the ultra conservative approach:
df3 = df1.drop('b', 1).merge(df2, 'left', on='a').set_index(df1.index)
df1.combine_first(df3)