itertools.groupby() not grouping correctly
itertools.groupby only collects together contiguous items that share the same key. If you want all items with the same key in a single group, you have to sort self.data by that key first.
for mid, group in itertools.groupby(
        sorted(self.data, key=operator.itemgetter(1)),
        key=operator.itemgetter(1)):
    ...  # handle each group of rows sharing the same mid
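To see the difference concretely, here is a small sketch. The sample `data` list is hypothetical and only stands in for `self.data` (pairs whose second element is the grouping key). Without sorting, the two rows with key 1 are not adjacent, so groupby reports key 1 twice; after sorting by the same key, each key appears once.

```python
import itertools
from operator import itemgetter

# Hypothetical sample data standing in for self.data: (name, id) pairs
data = [("x", 1), ("y", 2), ("z", 1)]

# Unsorted: the id-1 rows are not contiguous, so they end up in separate groups
unsorted_keys = [k for k, _ in itertools.groupby(data, key=itemgetter(1))]
print(unsorted_keys)  # [1, 2, 1]

# Sorted by the same key first: every key forms exactly one group
sorted_groups = [
    (k, [name for name, _ in g])
    for k, g in itertools.groupby(sorted(data, key=itemgetter(1)), key=itemgetter(1))
]
print(sorted_groups)  # [(1, ['x', 'z']), (2, ['y'])]
```

Because sorted() is stable, rows with equal keys keep their original relative order inside each group.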
The function below "fixes" several annoyances with Python's itertools.groupby.
import itertools

def groupby2(l, key=lambda x: x, val=lambda x: x, agg=list, sort=True):
    # agg defaults to list: each group must be consumed before groupby advances
    if sort:
        l = sorted(l, key=key)
    return ((k, agg(val(x) for x in v))
            for k, v in itertools.groupby(l, key=key))
Specifically,
- It doesn't require you to sort your data first (it sorts by key unless sort=False).
- It doesn't require you to pass key as a named parameter only.
- The output is a clean generator of (key, grouped_values) tuples, where the values are extracted by the third parameter, val.
- It makes it easy to apply aggregation functions such as sum or statistics.mean.
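Aggregating each group as it is produced matters because of how groupby works: all groups share one underlying iterator, so once the outer iterator advances, the previous group is no longer readable. The sketch below (plain standard library, with a made-up input string) shows a group coming back empty when it is read too late, and the correct result when each group is consumed, for example by list or sum, before moving on.

```python
import itertools

data = "aabbb"

# Materializing the outer iterator advances groupby past every group
groups = list(itertools.groupby(data))
first_key, first_group = groups[0]
stale = list(first_group)
print(first_key, stale)  # a []

# Consuming each group before advancing gives the expected contents
eager = [(k, list(g)) for k, g in itertools.groupby(data)]
print(eager)  # [('a', ['a', 'a']), ('b', ['b', 'b', 'b'])]
```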
Example Usage
import itertools
from operator import itemgetter
from statistics import mean  # e.g. pass agg=mean for per-key averages

t = [('a', 1), ('b', 2), ('a', 3)]
for k, v in groupby2(t, itemgetter(0), itemgetter(1), sum):
    print(k, v)
This prints:
a 4
b 2
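A variant that avoids sorting by grouping into a dictionary instead. It should perform better, since a single pass over a dict is O(n) on average, versus O(n log n) for sorting.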
from collections import defaultdict

def full_group_by(l, key=lambda x: x):
    # Bucket items by key in one pass; no sorting required
    d = defaultdict(list)
    for item in l:
        d[key(item)].append(item)
    return d.items()
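The dictionary variant returns the raw items for each key. If you also want the val/agg conveniences of groupby2 without sorting, one possible sketch combines the two ideas. The name dict_groupby is hypothetical and not part of either answer above. Unlike groupby2, it yields keys in first-seen order (dicts preserve insertion order from Python 3.7) rather than sorted order.

```python
from collections import defaultdict
from operator import itemgetter

def dict_groupby(items, key=lambda x: x, val=lambda x: x, agg=list):
    """Hypothetical helper: group in one pass with a dict, then aggregate."""
    buckets = defaultdict(list)
    for item in items:
        buckets[key(item)].append(val(item))
    # Keys come out in the order they were first seen in `items`
    return ((k, agg(vs)) for k, vs in buckets.items())

t = [('a', 1), ('b', 2), ('a', 3)]
result = list(dict_groupby(t, itemgetter(0), itemgetter(1), sum))
print(result)  # [('a', 4), ('b', 2)]
```

The trade-off is memory: every value is held in a list until aggregation, whereas the groupby-based version only needs one group at a time after sorting.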