What is the time complexity of checking membership in dict.items()?
Lookup in an instance of dict_items
is an O(1) operation (though one with an arbitrarily large constant, related to the complexity of comparing values.)
dictitems_contains
doesn't simply try to hash the tuple and look it up in a set-like collection of key/value pairs.
(Note: all of the following links are just to different lines of dictitems_contain
, if you don't want to click on them individually.)
To evaluate
(-1, [1]) in d2.items()
it first extracts the key from the tuple, then tries to find that key in the underlying dict
. If that lookup fails, it immediately returns false. Only if the key is found does it then compare the value from the tuple to the value mapped to the key in the dict.
At no point does dictitems_contains
need to hash the second element of the tuple.
It's not clear in what ways an instance of dict_items
is not set-like when the values are non-hashable, as mentioned in the documentation.
A simplified, pure-Python implementation of dict_items.__contains__
might look something like
class DictItems:
def __init__(self, d):
self.d = d
def __contains__(self, t):
key = t[0]
value = t[1]
try:
dict_value = self.d[key] # O(1) lookup
except KeyError:
return False
return value == dict_value # Arbitrarily expensive comparison
...
where d.items()
returns DictItems(d)
.
Short-answer
The time complexity of membership testing in item views is O(1)
.
Psuedo-code for lookup
This is how the membership testing works:
def dictitems_contains(dictview, key_value_pair):
d = dictview.mapping
k, v = key_value_pair
try:
return d[k] == v
except KeyError:
return False
Actual Code
Here's the C source code:
static int
dictitems_contains(_PyDictViewObject *dv, PyObject *obj)
{
int result;
PyObject *key, *value, *found;
if (dv->dv_dict == NULL)
return 0;
if (!PyTuple_Check(obj) || PyTuple_GET_SIZE(obj) != 2)
return 0;
key = PyTuple_GET_ITEM(obj, 0);
value = PyTuple_GET_ITEM(obj, 1);
found = PyDict_GetItemWithError((PyObject *)dv->dv_dict, key);
if (found == NULL) {
if (PyErr_Occurred())
return -1;
return 0;
}
Py_INCREF(found);
result = PyObject_RichCompareBool(found, value, Py_EQ);
Py_DECREF(found);
return result;
}
Timing evidence for O(1) complexity
We get the same constant lookup time regardless of the dictionary size (in these cases: 100, 1,000, and 10,000).
$ python3.8 -m timeit -s 'd = dict.fromkeys(range(100))' '(99, None) in d.items()'
5000000 loops, best of 5: 92 nsec per loop
$ python3.8 -m timeit -s 'd = dict.fromkeys(range(1_000))' '(99, None) in d.items()'
5000000 loops, best of 5: 92.2 nsec per loop
$ python3.8 -m timeit -s 'd = dict.fromkeys(range(10_000))' '(99, None) in d.items()'
5000000 loops, best of 5: 92.1 nsec per loop
Evidence that lookup calls hash()
We can monitor hash calls by patching _hash_():
class Int(int):
def __hash__(self):
print('Hash called')
return hash(int(self))
Applying the monitoring tool show that hashing occurs when the dictionary is created and again when doing membership testing on the items view:
>>> d = {Int(1): 'one'}
Hash called
>>> (Int(1), 'one') in d.items()
Hash called
True