Split a list of tuples into sub-lists of the same tuple field
Use itertools.groupby:
import itertools
import operator
data = [(1, 'A', 'foo'),
        (2, 'A', 'bar'),
        (100, 'A', 'foo-bar'),
        ('xx', 'B', 'foobar'),
        ('yy', 'B', 'foo'),
        (1000, 'C', 'py'),
        (200, 'C', 'foo'),
        ]
# Group consecutive tuples that share the same second field.
for key, group in itertools.groupby(data, operator.itemgetter(1)):
    print(list(group))
yields
[(1, 'A', 'foo'), (2, 'A', 'bar'), (100, 'A', 'foo-bar')]
[('xx', 'B', 'foobar'), ('yy', 'B', 'foo')]
[(1000, 'C', 'py'), (200, 'C', 'foo')]
Or, to create one list with each group as a sublist, you could use a list comprehension:
[list(group) for key,group in itertools.groupby(data,operator.itemgetter(1))]
The second argument to itertools.groupby is a function that itertools.groupby applies to each item in data (the first argument). It is expected to return a key. itertools.groupby then groups together all contiguous items with the same key.
operator.itemgetter(1) picks off the second item in a sequence.
For example, if
row=(1, 'A', 'foo')
then
operator.itemgetter(1)(row)
equals 'A'.
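To make the role of the key function concrete, here is a minimal sketch. It uses a shortened version of the sample data and a lambda in place of operator.itemgetter(1); the names keys and by_key are only illustrative.

```python
import itertools
import operator

# Shortened sample data, already ordered by the second field.
data = [(1, 'A', 'foo'), (2, 'A', 'bar'), ('xx', 'B', 'foobar'), (1000, 'C', 'py')]

# The key that the key function computes for each item.
keys = [operator.itemgetter(1)(row) for row in data]

# A lambda indexing position 1 behaves like operator.itemgetter(1) here.
by_key = {key: list(group)
          for key, group in itertools.groupby(data, lambda row: row[1])}

print(keys)
print(by_key)
```

Building a dict like this is only safe when each key forms a single contiguous run; otherwise a later run would overwrite an earlier one.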
As @eryksun points out in the comments, if the categories of the tuples appear in some random order, then you must sort data before applying itertools.groupby. This is because itertools.groupby only collects contiguous items with the same key into groups.
To sort the tuples by category, use:
data2=sorted(data,key=operator.itemgetter(1))
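As a rough illustration of why the sort matters, the sketch below runs itertools.groupby on an interleaved list with and without sorting. The list shuffled and the name get_category are made up for this example.

```python
import itertools
import operator

# Hypothetical sample in which the categories are interleaved.
shuffled = [(1, 'A', 'foo'), ('xx', 'B', 'foobar'), (2, 'A', 'bar'), ('yy', 'B', 'foo')]

get_category = operator.itemgetter(1)

# Without sorting, every run of equal keys becomes its own group.
unsorted_groups = [list(g) for _, g in itertools.groupby(shuffled, get_category)]

# sorted() is stable, so items keep their relative order within each category.
sorted_groups = [list(g) for _, g in
                 itertools.groupby(sorted(shuffled, key=get_category), get_category)]

print(len(unsorted_groups))  # 4 groups: A, B, A, B
print(sorted_groups)
```

Sorting by the key only (rather than by the whole tuple) also avoids comparing the mixed int and str values in the first field.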
collections.defaultdict
itertools.groupby only groups correctly when the input is already sorted by the key field; otherwise you have to sort first, incurring O(n log n) cost. For O(n) time complexity (assuming average-case constant-time dict operations), you can use a defaultdict of lists:
from collections import defaultdict
dd = defaultdict(list)
for item in data:
    # Append each tuple to the list for its second field.
    dd[item[1]].append(item)
res = list(dd.values())
print(res)
[[(1, 'A', 'foo'), (2, 'A', 'bar'), (100, 'A', 'foo-bar')],
[('xx', 'B', 'foobar'), ('yy', 'B', 'foo')],
[(1000, 'C', 'py'), (200, 'C', 'foo')]]
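The same idea can be wrapped in a small function and applied to unsorted input. This is only a sketch: group_by_field is a hypothetical helper name, and the group order shown relies on dicts preserving insertion order (Python 3.7+).

```python
from collections import defaultdict

def group_by_field(items, index):
    """Hypothetical helper: group tuples by the field at position `index`."""
    groups = defaultdict(list)
    for item in items:
        groups[item[index]].append(item)
    # Groups come out in order of each key's first appearance.
    return list(groups.values())

# Interleaved categories; no sorting is needed.
shuffled = [(1, 'A', 'foo'), ('xx', 'B', 'foobar'), (2, 'A', 'bar'), ('yy', 'B', 'foo')]
result = group_by_field(shuffled, 1)
print(result)
```

Unlike the groupby approach, this collects all items with the same key into one sublist regardless of where they appear in the input.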