What is the time complexity of checking membership in dict.items()?

Lookup in an instance of dict_items is an O(1) operation (though one with an arbitrarily large constant, related to the complexity of comparing values.)

dictitems_contains doesn't simply try to hash the tuple and look it up in a set-like collection of key/value pairs.

(Note: all of the following links are just to different lines of dictitems_contain, if you don't want to click on them individually.)

To evaluate

(-1, [1]) in d2.items()

it first extracts the key from the tuple, then tries to find that key in the underlying dict. If that lookup fails, it immediately returns false. Only if the key is found does it then compare the value from the tuple to the value mapped to the key in the dict.

At no point does dictitems_contains need to hash the second element of the tuple.

It's not clear in what ways an instance of dict_items is not set-like when the values are non-hashable, as mentioned in the documentation.

A simplified, pure-Python implementation of dict_items.__contains__ might look something like

class DictItems:
    def __init__(self, d):
        self.d = d

    def __contains__(self, t):
        key = t[0]
        value = t[1]
        try:
            dict_value = self.d[key]  # O(1) lookup
        except KeyError:
            return False
    
        return value == dict_value  # Arbitrarily expensive comparison

    ...

where d.items() returns DictItems(d).

Short-answer

The time complexity of membership testing in item views is O(1).

Psuedo-code for lookup

This is how the membership testing works:

def dictitems_contains(dictview, key_value_pair):
    d = dictview.mapping
    k, v = key_value_pair
    try:
        return d[k] == v
    except KeyError:
        return False

Actual Code

Here's the C source code:

static int
dictitems_contains(_PyDictViewObject *dv, PyObject *obj)
{
    int result;
    PyObject *key, *value, *found;
    if (dv->dv_dict == NULL)
        return 0;
    if (!PyTuple_Check(obj) || PyTuple_GET_SIZE(obj) != 2)
        return 0;
    key = PyTuple_GET_ITEM(obj, 0);
    value = PyTuple_GET_ITEM(obj, 1);
    found = PyDict_GetItemWithError((PyObject *)dv->dv_dict, key);
    if (found == NULL) {
        if (PyErr_Occurred())
            return -1;
        return 0;
    }
    Py_INCREF(found);
    result = PyObject_RichCompareBool(found, value, Py_EQ);
    Py_DECREF(found);
    return result;
}

Timing evidence for O(1) complexity

We get the same constant lookup time regardless of the dictionary size (in these cases: 100, 1,000, and 10,000).

$ python3.8 -m timeit -s 'd = dict.fromkeys(range(100))'  '(99, None) in d.items()'
5000000 loops, best of 5: 92 nsec per loop

$ python3.8 -m timeit -s 'd = dict.fromkeys(range(1_000))'  '(99, None) in d.items()'
5000000 loops, best of 5: 92.2 nsec per loop

$ python3.8 -m timeit -s 'd = dict.fromkeys(range(10_000))'  '(99, None) in d.items()'
5000000 loops, best of 5: 92.1 nsec per loop

Evidence that lookup calls hash()

We can monitor hash calls by patching _hash_():

class Int(int):
    def __hash__(self):
        print('Hash called')
        return hash(int(self))

Applying the monitoring tool show that hashing occurs when the dictionary is created and again when doing membership testing on the items view:

>>> d = {Int(1): 'one'}
Hash called
>>> (Int(1), 'one') in d.items()
Hash called
True

What is the time complexity of checking membership in dict.items()?

Short-answer

Psuedo-code for lookup

Actual Code

Timing evidence for O(1) complexity

Evidence that lookup calls hash()

Tags:

Python

Dictionary

Python 3.X

Time Complexity

Related

Recent Posts