What's the most concise way in Python to group and sum a list of objects by the same property
The defaultdict
approach is probably better, assuming c.Y
is hashable, but here's another way:
from itertools import groupby
from operator import attrgetter
get_y = attrgetter('Y')
tuples = [(y, sum(c.Z for c in cs_with_y) for y, cs_with_y in
groupby(sorted(cs, key=get_y), get_y)]
To be a little more concrete about the differences:
This approach requires making a sorted copy of
cs
, which takes O(n log n) time and O(n) extra space. Alternatively, you can docs.sort(key=get_y)
to sortcs
in-place, which doesn't need extra space but does modify the listcs
. Note thatgroupby
returns an iterator so there's not any extra overhead there. If thec.Y
values aren't hashable, though, this does work, whereas thedefaultdict
approach will throw aTypeError
.But watch out -- in recent Pythons it'll raise
TypeError
if there are any complex numbers in there, and maybe in other cases. It might be possible to make this work with an appropriatekey
function --key=lambda e: (e.real, e.imag) if isinstance(e, complex) else e
seems to be working for anything I've tried against it right now, though of course custom classes that override the__lt__
operator to raise an exception are still no go. Maybe you could define a more complicated key function that tests for this, and so on.Of course, all we care about here is that equal things are next to each other, not so much that it's actually sorted, and you could write an O(n^2) function to do that rather than sort if you so desired. Or a function that's O(num_hashable + num_nonhashable^2). Or you could write an O(n^2) / O(num_hashable + num_nonhashable^2) version of
groupby
that does the two together.sblom's answer works for hashable
c.Y
attributes, with minimal extra space (because it computes the sums directly).philhag's answer is basically the same as sblom's, but uses more auxiliary memory by making a list of each of the
c
s -- effectively doing whatgroupby
does, but with hashing instead of assuming it's sorted and with actual lists instead of iterators.
So, if you know your c.Y
attribute is hashable and only need the sums, use sblom's; if you know it's hashable but want them grouped for something else as well, use philhag's; if they might not be hashable, use this one (with extra worrying as noted if they might be complex or a custom type that overrides __lt__
).
from collections import defaultdict
totals = defaultdict(int)
for c in cs:
totals[c.Y] += c.Z
tuples = totals.items()
You can use collections.defaultdict
to group the list by y values, and then sum over their z values:
import collections
ymap = collections.defaultdict(list)
for c in listOfCs:
ymap[c.Y].append(c)
print ([(y, sum(c.Z for c in clist)) for y,clist in ymap.values()])