How to sum a column grouped by other columns in a list?
Create to DataFrame
and aggregate third column converted to integers by first and second columns, last convert back to list
s:
df = pd.DataFrame(L)
L = df[2].astype(int).groupby([df[0], df[1]]).sum().reset_index().values.tolist()
print (L)
[['Andrew', '1', 17], ['Andrew', '2', 2], ['Peter', '1', 21], ['Sam', '4', 9]]
And solution with defaultdict, python 3.x only:
from collections import defaultdict
d = defaultdict(int)
#https://stackoverflow.com/a/10532492
for *head, tail in L:
d[tuple(head)] += int(tail)
d = [[*i, j] for i, j in sorted(d.items())]
print (d)
[['Andrew', '1', 17], ['Andrew', '2', 2], ['Peter', '1', 21], ['Sam', '4', 9]]
dynamically grouping by all columns except the last one:
In [24]: df = pd.DataFrame(data)
In [25]: df.groupby(df.columns[:-1].tolist(), as_index=False).agg(lambda x: x.astype(int).sum()).values.tolist()
Out[25]: [['Andrew', '1', 17], ['Andrew', '2', 2], ['Peter', '1', 21], ['Sam', '4', 9]]
This is an O(n) solution via collections.defaultdict
, adaptable to any number of keys.
If your desired output is a list, then this may be preferable to a solution via Pandas, which requires conversion to and from a non-standard type.
from collections import defaultdict
lst = [['Andrew', '1', '9'], ['Peter', '1', '10'], ['Andrew', '1', '8'],
['Peter', '1', '11'], ['Sam', '4', '9'], ['Andrew', '2', '2']]
d = defaultdict(int)
for *keys, val in lst:
d[tuple(keys)] += int(val)
res = [[*k, v] for k, v in sorted(d.items())]
Result
[['Andrew', '1', 17], ['Andrew', '2', 2], ['Peter', '1', 21], ['Sam', '4', 9]]
Explanation
- Cycle through your list of lists, define keys / value and add to your
defaultdict
of lists. - Use a list comprehension to convert dictionary to desired output.
Op1
You can pass a index sum
and add tolist convert back to list
pd.DataFrame(L).\
set_index([0,1])[2].astype(int).sum(level=[0,1]).\
reset_index().values.tolist()
Out[78]: [['Andrew', '1', 17], ['Peter', '1', 21], ['Sam', '4', 9], ['Andrew', '2', 2]]
Op2
For list of list you can using groupby
from itertools
from itertools import groupby
[k+[sum(int(v) for _,_, v in g)] for k, g in groupby(sorted(l), key = lambda x: [x[0],x[1]])]
Out[98]: [['Andrew', '1', 17], ['Andrew', '2', 2], ['Peter', '1', 21], ['Sam', '4', 9]]