What is a fast pythonic way to deepcopy just data from a python dict or list ?
It really depends on your needs. deepcopy was built with the intention to do the (most) correct thing: it keeps shared references, it doesn't recurse into infinitely recursive structures, and so on. It can do that by keeping a memo dictionary in which all encountered "things" are inserted by reference. That's what makes it quite slow for pure-data copies. However, I would almost always say that deepcopy is the most Pythonic way to copy data, even if other approaches could be faster.
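To see what that memo dictionary buys you, here is a small demonstration (the variable names are purely illustrative):

import copy

shared = [1, 2]
data = {'a': shared, 'b': shared}    # both values reference the same list
dup = copy.deepcopy(data)
print(dup['a'] is dup['b'])          # True: the shared reference is preserved

rec = []
rec.append(rec)                      # a self-referencing list
copy.deepcopy(rec)                   # still terminates, thanks to the memo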
If you have pure data and a limited set of types inside it, you could build your own deepcopy (modeled roughly on the implementation of deepcopy in CPython):
_dispatcher = {}

def _copy_list(l, dispatch):
    # shallow-copy the list, then replace every item that has a registered copier
    ret = l.copy()
    for idx, item in enumerate(ret):
        cp = dispatch.get(type(item))
        if cp is not None:
            ret[idx] = cp(item, dispatch)
    return ret

def _copy_dict(d, dispatch):
    # shallow-copy the dict, then replace every value that has a registered copier
    ret = d.copy()
    for key, value in ret.items():
        cp = dispatch.get(type(value))
        if cp is not None:
            ret[key] = cp(value, dispatch)
    return ret

_dispatcher[list] = _copy_list
_dispatcher[dict] = _copy_dict

def deepcopy(sth):
    # anything without a registered copier is assumed immutable and returned as-is
    cp = _dispatcher.get(type(sth))
    if cp is None:
        return sth
    else:
        return cp(sth, _dispatcher)
This only works correctly for immutable non-container types and for list and dict instances. You could add more dispatchers if you need them; a sketch follows below.
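For example, to also handle set and tuple instances, you could register dispatchers along these lines (an untested sketch in the same style as above):

def _copy_set(s, dispatch):
    # set items must be hashable, hence effectively immutable, so a shallow copy suffices
    return s.copy()

def _copy_tuple(t, dispatch):
    # tuples are immutable themselves, so build a new one while copying any mutable items
    return tuple(dispatch[type(item)](item, dispatch) if type(item) in dispatch else item
                 for item in t)

_dispatcher[set] = _copy_set
_dispatcher[tuple] = _copy_tuple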
# Timings done on Python 3.5.3 - Windows - on a really slow laptop :-/
import copy
import msgpack
import json
import string
data = {'name':'John Doe','ranks':{'sports':13,'edu':34,'arts':45},'grade':5}
%timeit deepcopy(data)
# 11.9 µs ± 280 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
%timeit copy.deepcopy(data)
# 64.3 µs ± 1.15 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
%timeit json.loads(json.dumps(data))
# 65.9 µs ± 2.53 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
%timeit msgpack.unpackb(msgpack.packb(data))
# 56.5 µs ± 2.53 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
Let's also see how it performs when copying a big dictionary containing strings and integers:
data = {''.join([a,b,c]): 1 for a in string.ascii_letters for b in string.ascii_letters for c in string.ascii_letters}
%timeit deepcopy(data)
# 194 ms ± 5.37 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
%timeit copy.deepcopy(data)
# 1.02 s ± 46.4 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
%timeit json.loads(json.dumps(data))
# 398 ms ± 20.3 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
%timeit msgpack.unpackb(msgpack.packb(data))
# 238 ms ± 8.81 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
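Keep the trade-off in mind: because this dispatcher version has no memo dictionary, shared references inside the data are duplicated instead of preserved, and it would recurse forever on a self-referencing structure:

shared = [1, 2]
data = {'a': shared, 'b': shared}
dup = deepcopy(data)            # the dispatcher-based version from above
print(dup['a'] is dup['b'])     # False: each occurrence is copied separately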
@MSeifert The suggested answer is not accurate.
So far I've found ujson.loads(ujson.dumps(my_dict)) to be the fastest option, which looks strange (how can translating a dict to a string and then from a string back into a new dict be faster than a pure copy?).
Here is an example of the methods I tried and their running times for a small dictionary (the results are of course clearer with a larger dictionary):
x = {'a':1,'b':2,'c':3,'d':4, 'e':{'a':1,'b':2}}
# this function only handles dicts of dicts, very similar to the suggested solution
def fast_copy(d):
    output = d.copy()
    for key, value in output.items():
        output[key] = fast_copy(value) if isinstance(value, dict) else value
    return output
from copy import deepcopy
import ujson
%timeit deepcopy(x)
13.5 µs ± 146 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
%timeit fast_copy(x)
2.57 µs ± 31.6 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
%timeit ujson.loads(ujson.dumps(x))
1.67 µs ± 14.8 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
Is there any other C extension that might work better than ujson? It's very strange that this is the fastest method to copy a large dict.
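One caveat with any JSON round-trip (ujson behaves like the stdlib json here): only JSON types survive, so for example non-string keys get stringified and tuples come back as lists:

import json

data = {1: (1, 2)}
print(json.loads(json.dumps(data)))    # {'1': [1, 2]} -- int key became a string, tuple became a list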
I think you can manually implement what you need by overriding object.__deepcopy__.
A Pythonic way to do this is to create a custom dict class that extends the built-in dict and implements your own __deepcopy__.
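A minimal sketch of that idea, assuming the values are immutable scalars, lists of scalars, or nested instances of the same class (no shared references or cycles):

import copy

class DataDict(dict):
    def __deepcopy__(self, memo):
        out = DataDict()
        for key, value in self.items():
            if isinstance(value, DataDict):
                out[key] = value.__deepcopy__(memo)  # recurse into nested DataDicts
            elif isinstance(value, list):
                out[key] = list(value)               # shallow copy suffices for lists of scalars
            else:
                out[key] = value                     # immutable scalar: reuse as-is
        return out

d = DataDict(name='John Doe', ranks=DataDict(sports=13, edu=34))
d2 = copy.deepcopy(d)               # copy.deepcopy delegates to our __deepcopy__
print(d2['ranks'] is d['ranks'])    # False: the nested dict was copied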