Dictionary access speed comparison with integer key against string key
This was my question too. Apparently, dictionaries with string keys are more efficient, but access times are really close. I ran the following code using Python 3:
import random
import timeit
import uuid
DICT_INT = dict()
DICT_STR = dict()
DICT_MIX = dict()
for i in range(2000000):
DICT_INT[i] = uuid.uuid4().hex
DICT_STR[str(i)] = uuid.uuid4().hex
DICT_MIX[i if random.randrange(2) else str(i)] = uuid.uuid4().hex
def int_lookup():
int_key = random.randrange(len(DICT_INT))
str_key = str(int_key)
mix_key = int_key if int_key % 2 else str_key
return int_key in DICT_INT
def str_lookup():
int_key = random.randrange(len(DICT_STR))
str_key = str(int_key)
mix_key = int_key if int_key % 2 else str_key
return str_key in DICT_STR
def mix_lookup():
int_key = random.randrange(len(DICT_MIX))
str_key = str(int_key)
mix_key = int_key if int_key % 2 else str_key
return mix_key in DICT_MIX
print('Int dict lookup: ', end='')
print(timeit.timeit('int_lookup', 'from __main__ import int_lookup', number=1000000000))
print('Str dict lookup: ', end='')
print(timeit.timeit("str_lookup", 'from __main__ import str_lookup', number=1000000000))
print('Mix dict lookup: ', end='')
print(timeit.timeit("mix_lookup", 'from __main__ import mix_lookup', number=1000000000))
and this is the result:
Int dict lookup: 12.395361029000014
Str dict lookup: 12.097380312000041
Mix dict lookup: 12.109765773000163
CPython's dict
implementation is in fact optimized for string key lookups. There are two different functions, lookdict
and lookdict_string
(lookdict_unicode
in Python 3), which can be used to perform lookups. Python will use the string-optimized version until a search for non-string data, after which the more general function is used. You can look at the actual implementation by downloading CPython's source and reading through dictobject.c
.
As a result of this optimization, lookups are faster when a dict
has all string keys.
I'm afraid your times don't really prove very much.
Your test for string in Dint is fastest: in general a test for anything that is not in a dictionary is quite likely to be fast, but that's only because you were lucky and first time hit an empty cell so the lookup could terminate. If you were unlucky and chose a value that hit one or more full cells then it could end up slower than the cases that actually find something.
Testing for an arbitrary string in a dictionary has to calculate the hash code for the string. That takes time proportional to the length of the string, but Python has a neat trick and only ever calculates it once for each string. Since you use the same string over and over in your timing test the time taken to calculate the hash is lost as it only happens the first time and not the other 99999999 times. If you were using a different string each time you would get a very different result.
Python has optimised code for dictionaries where the keys are strings. Overall you should find that using string keys where you use the same keys multiple times is slightly faster, but if you have to keep converting integers to string before the lookup you'll lose that advantage.