Can pandas groupby aggregate into a list, rather than sum, mean, etc?
I used the following
grouped = df.groupby('A')
df = grouped.aggregate(lambda x: tuple(x))
df['grouped'] = df['B'] + df['C']
I am answering the question as stated in its title and first sentence: the following aggregates values to lists:
df.groupby('A').aggregate(lambda tdf: tdf.unique().tolist())
Below this is demonstrated in a simple example:
import pandas as pd
df = pd.DataFrame( {'A' : [1, 1, 1, 1, 2, 2, 3], 'B' : [10, 12, 11, 10, 11, 12, 14], 'C' : [22, 20, 8, 10, 13, 10, 0]})
print df
df2 = df.groupby('A').aggregate(lambda tdf: tdf.unique().tolist())
print df2
# Old version:
# df2=df.groupby(['A']).apply(lambda tdf: pd.Series( dict([[vv,tdf[vv].unique().tolist()] for vv in tdf if vv not in ['A']]) ))
The output is as follows:
In [3]: run tmp
A B C
0 1 10 22
1 1 12 20
2 1 11 8
3 1 10 10
4 2 11 13
5 2 12 10
6 3 14 0
[7 rows x 3 columns]
B C
A
1 [10, 12, 11] [22, 20, 8, 10]
2 [11, 12] [13, 10]
3 [14] [0]
[3 rows x 2 columns]
Similar solution, but fairly transparent (I think). you can get full list or unique lists.
df = pd.DataFrame({'A':[1,1,2,2,2,3,3,3,4,5],
'B':[6,7, 8,8,9, 9,9,10,11,12],
'C':['foo']*10})
df
Out[24]:
A B C
0 1 6 foo
1 1 7 foo
2 2 8 foo
3 2 8 foo
4 2 9 foo
5 3 9 foo
6 3 9 foo
7 3 10 foo
8 4 11 foo
9 5 12 foo
list_agg = df.groupby(by='A').agg({'B':lambda x: list(x),
'C':lambda x: tuple(x)})
list_agg
Out[26]:
C B
A
1 (foo, foo) [6, 7]
2 (foo, foo, foo) [8, 8, 9]
3 (foo, foo, foo) [9, 9, 10]
4 (foo,) [11]
5 (foo,) [12]
unique_list_agg = df.groupby(by='A').agg({'B':lambda x: list(pd.unique(x)),
'C':lambda x: tuple(pd.unique(x))})
unique_list_agg
Out[28]:
C B
A
1 (foo,) [6, 7]
2 (foo,) [8, 9]
3 (foo,) [9, 10]
4 (foo,) [11]
5 (foo,) [12]
Here is a one liner
# if list of unique items is desired, use set
df.groupby('A',as_index=False)['B'].aggregate(lambda x: set(x))
# if duplicate items are okay, use list
df.groupby('A',as_index=False)['B'].aggregate(lambda x: list(x))