Reversible dictionary for python

If your keys and values are non-overlapping, one obvious approach is to simply store them in the same dict. ie:

class BidirectionalDict(dict):
    def __setitem__(self, key, val):
        dict.__setitem__(self, key, val)
        dict.__setitem__(self, val, key)

    def __delitem__(self, key):
        dict.__delitem__(self, self[key])
        dict.__delitem__(self, key)

d = BidirectionalDict()
d['foo'] = 4
print d[4]   # Prints 'foo'

(You'll also probably want to implement things like the __init__, update and iter* methods to act like a real dict, depending on how much functionality you need).

This should only involve one lookup, though may not save you much in memory (you still have twice the number of dict entries after all). Note however that neither this nor your original will use up twice as much space: the dict only takes up space for the references (effectively pointers), plus an overallocation overhead. The space taken up by your data itself will not be repeated twice since the same objects are pointed to.


Related posts:

Python mapping inverse

Python 1:1 mappings

Of course, if all values and keys are unique, couldn't you just use a single dictionary, and insert both key:value and value:key initially?


In The Art of Computer Programming, Vokume 3 Knuth has a section on lookups of secondary keys. For purposes of your question, the value could be considered the secondary key.

The first suggestion is to do what you have done: make an efficient index of the keys by value.

The second suggestion is to setup a large btree that is a composite index of the clustered data, where the branch nodes contain values and the leaves contain the key data and pointers to the larger record (if there is one.)

If the data is geometric (as yours appears to be) there are things called post-office trees. It can answer questions like, what is the nearest object to point x. A few examples are here: http://simsearch.yury.name/russir/01nncourse-hand.pdf Another simple option for this kind of query is the quadtree and the k-d tree. http://en.wikipedia.org/wiki/Quadtree

Another final option is combinatorial hashing, where you combine the key and value into a special kind of hash that lets you do efficient lookups on the hash, even when you don't have both values. I couldn't find a good combinatorial hash explanation online, but it is in TAoCP, Volume 3 Second Edition on page 573.

Granted, for some of these you may have to write your own code. But if memory or performance is really key, you might want to take the time.