Is there a better more readable way to coalese columns in pandas
The Pandas equivalent to COALESCE
is the method fillna()
:
result = column_a.fillna(column_b)
The result is a column where each value is taken from column_a
if that column provides a non-null value, otherwise the value is taken from column_b
. So your combo1
can be produced with:
df['first'].fillna(df['second']).fillna(df['third'])
giving:
0 A
1 C
2 B
3 None
4 A
And your combo2
can be produced with:
(df['second']).fillna(df['third']).fillna(df['first'])
which returns the new column:
0 C
1 C
2 B
3 None
4 B
If you wanted an efficient operation called coalesce
, it could simply combine columns with fillna()
from left to right and then return the result:
def coalesce(df, column_names):
i = iter(column_names)
column_name = next(i)
answer = df[column_name]
for column_name in i:
answer = answer.fillna(df[column_name])
return answer
print coalesce(df, ['first', 'second', 'third'])
print coalesce(df, ['second', 'third', 'first'])
which gives:
0 A
1 C
2 B
3 None
4 A
0 C
1 C
2 B
3 None
4 B