Better to add item to a set, or convert final list to a set?

Option 2 sounds the most logical to me, especially with a defaultdict it should be fairly easy to do :)

import pprint
import collections

data = '''ID1 ID2 ID3
ID1 ID4 ID5
ID3 ID5 ID7 ID6'''

groups = collections.defaultdict(set)

for row in data.split('\n'):
    cols = row.split()
    for groupcol in cols:
        for col in cols:
            if col is not groupcol:
                groups[groupcol].add(col)

pprint.pprint(dict(groups))

Results:

{'ID1': set(['ID2', 'ID3', 'ID4', 'ID5']),
 'ID2': set(['ID1', 'ID3']),
 'ID3': set(['ID1', 'ID2', 'ID5', 'ID6', 'ID7']),
 'ID4': set(['ID1', 'ID5']),
 'ID5': set(['ID1', 'ID3', 'ID4', 'ID6', 'ID7']),
 'ID6': set(['ID3', 'ID5', 'ID7']),
 'ID7': set(['ID3', 'ID5', 'ID6'])}

TL;DR: Go with option 2. Just use sets from the start.

In Python, sets are hash-sets, and lists are dynamic arrays. Inserting is O(1) for both, but checking if an element exists is O(n) for the list and O(1) for the set.

So option 1 is immediately out. If you are inserting n items and need to check the list every time, then the overall complexity becomes O(n^2).

Options 2 and 3 are both optimal at O(n) overall. Option 2 might be faster in micro-benchnarks because you don't need to move objects between collections. In practice, choose the option that is easier to read and maintain in your specific circumstance.

Better to add item to a set, or convert final list to a set?

Tags:

Python

Loops

Set

Related

Recent Posts