Better to add item to a set, or convert final list to a set?
Option 2 sounds the most logical to me, especially with a defaultdict it should be fairly easy to do :)
import pprint
import collections
data = '''ID1 ID2 ID3
ID1 ID4 ID5
ID3 ID5 ID7 ID6'''
groups = collections.defaultdict(set)
for row in data.split('\n'):
cols = row.split()
for groupcol in cols:
for col in cols:
if col is not groupcol:
groups[groupcol].add(col)
pprint.pprint(dict(groups))
Results:
{'ID1': set(['ID2', 'ID3', 'ID4', 'ID5']),
'ID2': set(['ID1', 'ID3']),
'ID3': set(['ID1', 'ID2', 'ID5', 'ID6', 'ID7']),
'ID4': set(['ID1', 'ID5']),
'ID5': set(['ID1', 'ID3', 'ID4', 'ID6', 'ID7']),
'ID6': set(['ID3', 'ID5', 'ID7']),
'ID7': set(['ID3', 'ID5', 'ID6'])}
TL;DR: Go with option 2. Just use sets from the start.
In Python, sets are hash-sets, and lists are dynamic arrays. Inserting is O(1)
for both, but checking if an element exists is O(n)
for the list and O(1)
for the set.
So option 1 is immediately out. If you are inserting n
items and need to check the list every time, then the overall complexity becomes O(n^2)
.
Options 2 and 3 are both optimal at O(n)
overall. Option 2 might be faster in micro-benchnarks because you don't need to move objects between collections. In practice, choose the option that is easier to read and maintain in your specific circumstance.