Python pandas - particular merge/replacement
This takes a couple of steps. First, do a left merge on the columns that match;
where the remaining column names clash, pandas adds the suffixes '_x' and '_y':
In [25]:
merged = df.merge(subdf, on=['id', 'name'], how='left')
merged
Out[25]:
id name val1_x val2_x val3 val1_y val2_y
0 1 a 0 0 0 0.3 4
1 2 a 0 0 0 NaN NaN
2 1 b 0 0 0 0.4 5
3 2 b 0 0 0 NaN NaN
4 1 c 0 0 0 NaN NaN
5 2 c 0 0 0 0.7 4
In [26]:
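# take the values of interest from the clashes: the row-wise max skips NaN,
# so unmatched rows keep the original value (this relies on the new values
# being >= the originals, as they are here)
merged['val1'] = merged[['val1_x', 'val1_y']].max(axis=1)
merged['val2'] = merged[['val2_x', 'val2_y']].max(axis=1)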
merged
Out[26]:
id name val1_x val2_x val3 val1_y val2_y val1 val2
0 1 a 0 0 0 0.3 4 0.3 4
1 2 a 0 0 0 NaN NaN 0.0 0
2 1 b 0 0 0 0.4 5 0.4 5
3 2 b 0 0 0 NaN NaN 0.0 0
4 1 c 0 0 0 NaN NaN 0.0 0
5 2 c 0 0 0 0.7 4 0.7 4
In [27]:
# drop the additional columns
merged = merged.drop(labels=['val1_x', 'val1_y', 'val2_x', 'val2_y'], axis=1)
merged
Out[27]:
id name val3 val1 val2
0 1 a 0 0.3 4
1 2 a 0 0.0 0
2 1 b 0 0.4 5
3 2 b 0 0.0 0
4 1 c 0 0.0 0
5 2 c 0 0.7 4
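The merge steps above can also be written as one self-contained script. This is only a sketch: the two input frames are hypothetical, reconstructed from the outputs shown above, and instead of taking the row-wise max it prefers the value from subdf and falls back to the original with fillna, which does not depend on the new values being larger. The suffixes '_old' and '_new' are illustrative names, not something from the original session.

```python
import pandas as pd

# Hypothetical inputs, reconstructed from the outputs shown above.
df = pd.DataFrame({
    'id':   [1, 2, 1, 2, 1, 2],
    'name': ['a', 'a', 'b', 'b', 'c', 'c'],
    'val1': [0.0] * 6,
    'val2': [0] * 6,
    'val3': [0] * 6,
})
subdf = pd.DataFrame({
    'id':   [1, 1, 2],
    'name': ['a', 'b', 'c'],
    'val1': [0.3, 0.4, 0.7],
    'val2': [4, 5, 4],
})

# Left merge on the key columns; clashing columns get explicit suffixes.
merged = df.merge(subdf, on=['id', 'name'], how='left',
                  suffixes=('_old', '_new'))

for col in ['val1', 'val2']:
    # Prefer the value from subdf; fall back to the original where no row matched.
    merged[col] = merged[col + '_new'].fillna(merged[col + '_old'])
    merged = merged.drop(columns=[col + '_old', col + '_new'])

# Restore the original column order.
result = merged[df.columns.tolist()]
print(result)
```

Because fillna only looks at whether subdf supplied a value, this variant would also work if a replacement value were smaller than the original one.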
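Another method would be to sort both DataFrames on 'id' and 'name' and then call update.
Be aware that update aligns rows by index label, not by position, so sorting alone does
not pair up the matching rows: in the output below, the rows labelled 1 and 2 had their
'id' and 'name' overwritten as well, and the original (2, a) row is gone: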
In [30]:
df = df.sort_values(by=['id', 'name'])
subdf = subdf.sort_values(by=['id', 'name'])
df.update(subdf)
df
Out[30]:
id name val1 val2 val3
0 1 a 0.3 4 0
2 2 c 0.7 4 0
4 1 c 0.0 0 0
1 1 b 0.4 5 0
3 2 b 0.0 0 0
5 2 c 0.0 0 0
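Since update aligns on the index, one way to make it match on the key columns is to move 'id' and 'name' into the index of both frames first. The following is a minimal sketch under that assumption; the input frames are hypothetical, reconstructed from the outputs earlier in this answer, and it assumes each ('id', 'name') pair is unique.

```python
import pandas as pd

# Hypothetical inputs, reconstructed from the outputs earlier in this answer.
df = pd.DataFrame({
    'id':   [1, 2, 1, 2, 1, 2],
    'name': ['a', 'a', 'b', 'b', 'c', 'c'],
    'val1': [0.0] * 6,
    'val2': [0] * 6,
    'val3': [0] * 6,
})
subdf = pd.DataFrame({
    'id':   [1, 1, 2],
    'name': ['a', 'b', 'c'],
    'val1': [0.3, 0.4, 0.7],
    'val2': [4, 5, 4],
})

keys = ['id', 'name']

# Put the key columns in the index so update pairs rows by (id, name).
indexed = df.set_index(keys)

# update modifies `indexed` in place, overwriting only where subdf has non-NaN values.
indexed.update(subdf.set_index(keys))

# Turn the keys back into ordinary columns.
updated = indexed.reset_index()
print(updated)
```

No sorting is needed here, and the key columns themselves are never overwritten because they live in the index during the update.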