How would I implement a dict with Abstract Base Classes in Python?
The best way to demonstrate this without actually using a dict
anywhere is probably to implement something dead simple, very different from dict
, and not completely useless. Like a fixed-sized mapping of fixed-size bytes
to same-fixed-size bytes
. (You might use this for, e.g., a routing table—it'll be much more compact than a dict
mapping unpacked keys to unpacked values, although obviously at the cost of speed and flexibility.)
A hash table is just an array of (hash, key, value)
tuples. Since the whole point of this is packing data in, we cram those into a struct
, meaning we can just use a big bytearray
for storage. To mark a slot empty, we set its hash value to 0
—which means we need to "escape" any real 0
by turning it into a 1
, which is stupid, but simpler to code. We'll also use the dumbest possible probe
algorithm for simplicity.
class FixedHashTable(object):
hashsize = 8
def __init__(self, elementsize, size):
self.elementsize = elementsize
self.size = size
self.entrysize = self.hashsize + self.elementsize * 2
self.format = 'q{}s{}s'.format(self.elementsize, self.elementsize)
assert struct.calcsize(self.format) == self.entrysize
self.zero = b'\0' * self.elementsize
self.store = bytearray(struct.pack(self.format, 0,
self.zero, self.zero)
) * self.size
def hash(self, k):
return hash(k) or 1
def stash(self, i, h, k, v):
entry = struct.pack(self.format, h, k, v)
self.store[i*self.entrysize:(i+1)*self.entrysize] = entry
def fetch(self, i):
entry = self.store[i*self.entrysize:(i+1)*self.entrysize]
return struct.unpack(self.format, entry)
def probe(self, keyhash):
i = keyhash % self.size
while True:
h, k, v = self.fetch(i)
yield i, h, k, v
i = (i + 1) % self.size
if i == keyhash % self.size:
break
As the error message says, you need to provide implementations for the abstract methods __delitem__
, __getitem__
, __iter__
, __len__
, and __setitem__
. However, a better place to look is the docs, which will tell you that if you implement those five methods (plus any other methods required by the base classes, but as you can see from the table there are none), you'll get all the other methods for free. You may not get the most efficient possible implementations of all of them, but we'll come back to that.
First, let's deal with __len__
. Normally people expect this to be O(1), which means we need to keep track of it independently, updating it as needed. So:
class FixedDict(collections.abc.MutableMapping):
def __init__(self, elementsize, size):
self.hashtable = FixedHashTable(elementsize, size)
self.len = 0
Now, __getitem__
just probes until it finds the desired key or reaches the end:
def __getitem__(self, key):
keyhash = self.hashtable.hash(key)
for i, h, k, v in self.hashtable.probe(keyhash):
if h and k == key:
return v
And __delitem__
does the same thing, except it empties out the slot if found, and updates len
.
def __delitem__(self, key):
keyhash = self.hashtable.hash(key)
for i, h, k, v in self.hashtable.probe(keyhash):
if h and k == key:
self.hashtable.stash(i, 0, self.hashtable.zero, self.hashtable.zero)
self.len -= 1
return
raise KeyError(key)
__setitem__
is a bit trickier—if found, we have to replace the value in the slot; if not, we have to fill an empty slot. And here we have to deal with the fact that the hash table may be full. And of course we have to take care of len
:
def __setitem__(self, key, value):
keyhash = self.hashtable.hash(key)
for i, h, k, v in self.hashtable.probe(keyhash):
if not h or k == key:
if not h:
self.len += 1
self.hashtable.stash(i, keyhash, key, value)
return
raise ValueError('hash table full')
And that leaves __iter__
. Just as with a dict
, we don't have any particular order, so we can just iterate the hash table slots and yield all the non-empty ones:
def __iter__(self):
return (k for (h, k, v) in self.hashtable.fetch(i)
for i in range(self.hashtable.size) if h)
While we're at it, we might as well write a __repr__
. Note that we can use the fact that we get items
for free:
def __repr__(self):
return '{}({})'.format(type(self).__name__, dict(self.items()))
However, note that the default items
just creates an ItemsView(self)
, and if you track that through the source, you'll see that it iterates self
and looks up each value. You can obviously do better if the performance matters:
def items(self):
pairs = ((k, v) for (h, k, v) in self.hashtable.fetch(i)
for i in range(self.hashtable.size) if h)
return collections.abc.ItemsView._from_iterable(pairs)
And likewise for values
, and possibly other methods.
How would I implement a dict with Abstract Base Classes?
A good answer will demonstrate how to make this work, specifically without subclassing dict.
Here's the error message: TypeError: Can't instantiate abstract class D with abstract methods __delitem__, __getitem__, __iter__, __len__, __setitem__
It turns out that one must implement them to use the Abstract Base Class (ABC), MutableMapping
.
Implementation
So I implement a mapping that works like a dict in most respects that uses the object's attribute reference dict for the mapping. (Delegation is not the same as inheritance, so we'll just delegate to the instance __dict__
, we could use any other ad-hoc mapping, but you don't seem to care about that part of the implementation. It makes sense to do it this way in Python 2, because MutableMapping doesn't have __slots__
in Python 2, so you're creating a __dict__
either way. In Python 3, you could avoid dicts altogether by setting __slots__
.)
from collections.abc import MutableMapping
class D(MutableMapping):
'''
Mapping that works like both a dict and a mutable object, i.e.
d = D(foo='bar')
and
d.foo returns 'bar'
'''
# ``__init__`` method required to create instance from class.
def __init__(self, *args, **kwargs):
'''Use the object dict'''
self.__dict__.update(*args, **kwargs)
# The next five methods are requirements of the ABC.
def __setitem__(self, key, value):
self.__dict__[key] = value
def __getitem__(self, key):
return self.__dict__[key]
def __delitem__(self, key):
del self.__dict__[key]
def __iter__(self):
return iter(self.__dict__)
def __len__(self):
return len(self.__dict__)
# The final two methods aren't required, but nice for demo purposes:
def __str__(self):
'''returns simple dict representation of the mapping'''
return str(self.__dict__)
def __repr__(self):
'''echoes class, id, & reproducible representation in the REPL'''
return '{}, D({})'.format(super(D, self).__repr__(),
self.__dict__)
Demonstration
And to demonstrate the usage:
>>> d = D((e, i) for i, e in enumerate('abc'))
>>> d
<__main__.D object at 0x7f75eb242e50>, D({'b': 1, 'c': 2, 'a': 0})
>>> d.a
0
>>> d.get('b')
1
>>> d.setdefault('d', []).append(3)
>>> d.foo = 'bar'
>>> print(d)
{'b': 1, 'c': 2, 'a': 0, 'foo': 'bar', 'd': [3]}
And for ensuring the dict API, lesson learned is that you can always check for collections.abc.MutableMapping
:
>>> isinstance(d, MutableMapping)
True
>>> isinstance(dict(), MutableMapping)
True
And while a dict is always going to be an instance of a MutableMapping due to registration on collections import, the reverse is not always true:
>>> isinstance(d, dict)
False
>>> isinstance(d, (dict, MutableMapping))
True
After performing this exercise, it is clear to me that using Abstract Base Classes provides only the guarantee of a standard API for users of the class. In this case, users assuming a MutableMapping object will be guaranteed the standard API for Python.
Caveats:
The fromkeys
class constructor method is not implemented.
>>> dict.fromkeys('abc')
{'b': None, 'c': None, 'a': None}
>>> D.fromkeys('abc')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
AttributeError: type object 'D' has no attribute 'fromkeys'
One could mask the builtin dict methods like get
or setdefault
>>> d['get'] = 'baz'
>>> d.get('get')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: 'str' object is not callable
It's fairly simple to unmask again:
>>> del d['get']
>>> d.get('get', 'Not there, but working')
'Not there, but working'
But I wouldn't use this code in production.
Demonstration without a dict, Python 3:
>>> class MM(MutableMapping):
... __delitem__, __getitem__, __iter__, __len__, __setitem__ = (None,) *5
... __slots__ = ()
...
>>> MM().__dict__
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
AttributeError: 'MM' object has no attribute '__dict__'