pandas apply function that returns multiple values to rows in pandas dataframe
I tried returning a tuple (I was using functions like scipy.stats.pearsonr, which return that kind of structure), but apply returned a 1D Series instead of the DataFrame I expected. Creating a Series manually performed worse, so I fixed it using the result_type argument, as explained in the official API documentation:
Returning a Series inside the function is similar to passing result_type='expand'. The resulting column names will be the Series index.
So you could edit your code this way:
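def myfunc(a, b, c):
    # do something
    return (e, f, g)

# With axis=1, apply passes each row as a single Series, so unpack it.
# This assumes df has exactly three columns, matching a, b, c.
df.apply(lambda row: myfunc(*row), axis=1, result_type='expand')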
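As a minimal runnable sketch of the difference result_type makes (the column names and the row_stats function are made up for illustration, they are not from the question):

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6], "c": [7, 8, 9]})

def row_stats(row):
    # Hypothetical function: returns a plain tuple of three values.
    return (row["a"] + row["b"], row["b"] * row["c"], row.max())

# Without result_type, each tuple stays packed in one cell of a Series.
packed = df.apply(row_stats, axis=1)

# With result_type='expand', each tuple element becomes its own column.
expanded = df.apply(row_stats, axis=1, result_type="expand")
expanded.columns = ["sum_ab", "prod_bc", "row_max"]
print(expanded)
```

The expanded columns are numbered 0, 1, 2 by default, so they are renamed in a separate step here.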
Return a Series and pandas will put the values in a DataFrame.
def myfunc(a, b, c):
    # do something
    return pd.Series([e, f, g])
This has the bonus that you can give labels to each of the resulting columns. If you return a DataFrame it just inserts multiple rows for the group.
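A small sketch of the labeling idea, assuming a row-wise apply on a toy DataFrame (the column names and the labeled function are hypothetical):

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})

def labeled(row):
    # The index of the returned Series becomes the column names of the result.
    return pd.Series({"total": row["a"] + row["b"], "ratio": row["b"] / row["a"]})

result = df.apply(labeled, axis=1)
print(result)
```

Here result is a DataFrame with columns total and ratio, so no renaming step is needed afterwards.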
Based on the excellent answer by @U2EF1, I've created a handy function that applies a specified function that returns tuples to a dataframe field, and expands the result back to the dataframe.
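import pandas as pd

def apply_and_concat(dataframe, field, func, column_names):
    # Apply func to every cell of one column, expand each returned tuple
    # into labeled columns, and append them to the original frame.
    return pd.concat((
        dataframe,
        dataframe[field].apply(
            lambda cell: pd.Series(func(cell), index=column_names))), axis=1)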
Usage:
df = pd.DataFrame([1, 2, 3], index=['a', 'b', 'c'], columns=['A'])
print(df)
   A
a  1
b  2
c  3

def func(x):
    return x*x, x*x*x

print(apply_and_concat(df, 'A', func, ['x^2', 'x^3']))
   A  x^2  x^3
a  1    1    1
b  2    4    8
c  3    9   27
Hope it helps someone.