Splitting a list of tuples to several lists by the same tuple items
This can be done relatively efficiently with a supporting dict:
def split_by_idx(items, idx=1):
    result = {}
    for item in items:
        key = item[idx]
        if key not in result:
            result[key] = []
        result[key].append(item)
    return result
and the lists can be collected from result with dict.values():
lst = [("hello", "Blue"), ("hi", "Red"), ("hey", "Blue"), ("yo", "Green")]
d = split_by_idx(lst)
print(list(d.values()))
# [[('hello', 'Blue'), ('hey', 'Blue')], [('hi', 'Red')], [('yo', 'Green')]]
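The idx parameter selects which tuple position acts as the grouping key. As a small sketch, the function is repeated below so the snippet runs on its own, and the three-element records list is made up for illustration (its third field is simply the word length):

```python
def split_by_idx(items, idx=1):
    result = {}
    for item in items:
        key = item[idx]
        if key not in result:
            result[key] = []
        result[key].append(item)
    return result

# Hypothetical data: (word, colour, word length).
records = [("hello", "Blue", 5), ("hi", "Red", 2), ("hey", "Blue", 3), ("yo", "Green", 2)]

# Group by the third field (position 2) instead of the colour.
by_length = split_by_idx(records, idx=2)
print(list(by_length.values()))
# [[('hello', 'Blue', 5)], [('hi', 'Red', 2), ('yo', 'Green', 2)], [('hey', 'Blue', 3)]]
```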
This could also be implemented with dict.setdefault() or a defaultdict, which are fundamentally the same except that you do not have to handle the "key not present" case explicitly:
def split_by_idx_sd(items, idx=1):
    result = {}
    for item in items:
        result.setdefault(item[idx], []).append(item)
    return result

import collections

def split_by_idx_dd(items, idx=1):
    result = collections.defaultdict(list)
    for item in items:
        result[item[idx]].append(item)
    return result
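One practical difference worth keeping in mind: split_by_idx_dd() hands back a defaultdict, and merely indexing a missing key on a defaultdict inserts that key. The sketch below illustrates this with a made-up "Purple" key, and shows that wrapping the result in dict() is one way to get ordinary dict behaviour back:

```python
import collections

def split_by_idx_dd(items, idx=1):
    result = collections.defaultdict(list)
    for item in items:
        result[item[idx]].append(item)
    return result

lst = [("hello", "Blue"), ("hi", "Red"), ("hey", "Blue"), ("yo", "Green")]
d = split_by_idx_dd(lst)

# Indexing a missing key on a defaultdict inserts an empty list for it.
d["Purple"]
print("Purple" in d)  # True

# Converting to a plain dict avoids that side effect for later lookups.
plain = dict(split_by_idx_dd(lst))
print("Purple" in plain)    # False
print(plain.get("Purple"))  # None
```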
Timewise, the dict-based solution is the fastest for your input:
%timeit split_by_idx(lst)
# 1000000 loops, best of 3: 776 ns per loop
%timeit split_by_idx_sd(lst)
# 1000000 loops, best of 3: 866 ns per loop
%timeit split_by_idx_dd(lst)
# 1000000 loops, best of 3: 1.16 µs per loop
but you would get different timings depending on the "collision rate" of your input. In general, you should expect split_by_idx() to be the fastest with a low collision rate (i.e. most of the entries create a new key in the dict), while split_by_idx_dd() should be the fastest with a high collision rate (i.e. most of the entries get appended to the list of an existing defaultdict key).
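To check this on your own machine, a rough benchmark sketch along these lines may help. It uses the standard timeit module instead of the %timeit magic so it runs outside IPython; the input size, the two synthetic inputs and the repeat count are arbitrary choices made for illustration, and the numbers will vary by machine and Python version:

```python
import collections
import timeit

def split_by_idx(items, idx=1):
    result = {}
    for item in items:
        key = item[idx]
        if key not in result:
            result[key] = []
        result[key].append(item)
    return result

def split_by_idx_sd(items, idx=1):
    result = {}
    for item in items:
        result.setdefault(item[idx], []).append(item)
    return result

def split_by_idx_dd(items, idx=1):
    result = collections.defaultdict(list)
    for item in items:
        result[item[idx]].append(item)
    return result

n = 1000       # arbitrary input size
repeats = 200  # arbitrary number of timed calls

low_collision = [(i, i) for i in range(n)]       # every entry creates a new key
high_collision = [(i, i % 2) for i in range(n)]  # only two distinct keys

for label, data in [("low", low_collision), ("high", high_collision)]:
    for func in (split_by_idx, split_by_idx_sd, split_by_idx_dd):
        seconds = timeit.timeit(lambda: func(data), number=repeats)
        per_call_us = seconds / repeats * 1e6
        print(f"{label:>4} collision  {func.__name__:<16} {per_call_us:8.1f} µs per call")
```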
You could use a collections.defaultdict to group by colour:
from collections import defaultdict
lst = [("hello", "Blue"), ("hi", "Red"), ("hey", "Blue"), ("yo", "Green")]
colours = defaultdict(list)
for word, colour in lst:
    colours[colour].append((word, colour))
print(colours)
# defaultdict(<class 'list'>, {'Blue': [('hello', 'Blue'), ('hey', 'Blue')], 'Red': [('hi', 'Red')], 'Green': [('yo', 'Green')]})
Or, if you prefer to avoid the import, dict.setdefault is an option:
colours = {}
for word, colour in lst:
    colours.setdefault(colour, []).append((word, colour))
print(colours)
# {'Blue': [('hello', 'Blue'), ('hey', 'Blue')], 'Red': [('hi', 'Red')], 'Green': [('yo', 'Green')]}
If you just want the tuples grouped by colour into nested lists, print the values() as a list:
print(list(colours.values()))
# [[('hello', 'Blue'), ('hey', 'Blue')], [('hi', 'Red')], [('yo', 'Green')]]
The benefit of the above approaches is that they automatically initialize an empty list for each new key as you add it, so you don't have to do that yourself.
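To make that automatic initialization concrete, here is a minimal sketch of what dict.setdefault does on the first and on later calls for the same key (the variable names first and second are only for illustration):

```python
colours = {}

# Key missing: setdefault stores the empty list under "Blue" and returns it.
first = colours.setdefault("Blue", [])
first.append(("hello", "Blue"))

# Key present: setdefault ignores the new default and returns the stored list.
second = colours.setdefault("Blue", [])
print(second is first)  # True
print(colours)          # {'Blue': [('hello', 'Blue')]}
```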