Finding matching keys in two large dictionaries and doing it fast
Use sets, because they have a built-in intersection
method which ought to be quick:
myRDP = { 'Actinobacter': 'GATCGA...TCA', 'subtilus sp.': 'ATCGATT...ACT' }
myNames = { 'Actinobacter': '8924342' }
rdpSet = set(myRDP)
namesSet = set(myNames)
for name in rdpSet.intersection(namesSet):
print name, myNames[name]
# Prints: Actinobacter 8924342
You could do this:
for key in myRDP:
if key in myNames:
print key, myNames[key]
Your first attempt was slow because you were comparing every key in myRDP with every key in myNames. In algorithmic jargon, if myRDP has n elements and myNames has m elements, then that algorithm would take O(n×m) operations. For 600k elements each, this is 360,000,000,000 comparisons!
But testing whether a particular element is a key of a dictionary is fast -- in fact, this is one of the defining characteristics of dictionaries. In algorithmic terms, the key in dict
test is O(1), or constant-time. So my algorithm will take O(n) time, which is one 600,000th of the time.