Python: performance comparison of using `pickle` or `marshal` and using `re`

For pure speed, marshal will get you the fastest results.

Timings:

>>> timeit.timeit("pickle.dumps([1,2,3])","import pickle",number=10000)
0.2939901351928711
>>> timeit.timeit("json.dumps([1,2,3])","import json",number=10000)
0.09756112098693848
>>> timeit.timeit("pickle.dumps([1,2,3])","import cPickle as pickle",number=10000)
0.031056880950927734
>>> timeit.timeit("marshal.dumps([1,2,3])","import marshal", number=10000)
0.00703883171081543

When somebody are thinking about performance he should to remember 3 things:

  • Don't trust anybody - any benchmark can lie (by a different reasons: unprofessional, marketing, etc.)
  • Always measure your case - for example, cache system and statistics have totally different requirements. In one case you need to read as fast as possible, in other case - write
  • Repeat tests - new version of any software could be faster/slower, so any update could introduce benefits/penalties

For example, here is results of my benchmark:

jimilian$ python3.5 serializators.py
iterations= 100000
data= 'avzvasdklfjhaskldjfhkweljrqlkjb*@&$Y)(!#&$G@#lkjabfsdflb(*!G@#$(GKLJBmnz,bv(PGDFLKJ'
==== DUMP ====
Pickle:
>> 0.09806302400829736
Json: 2.0.9
>> 0.12253901800431777
Marshal: 4
>> 0.09477431800041813
Msgpack: (0, 4, 7)
>> 0.16701826300413813

==== LOAD ====
Pickle:
>> 0.10376790800364688
Json: 2.0.9
>> 0.30041573599737603
Marshal: 4
>> 0.034003349996055476
Msgpack: (0, 4, 7)
>> 0.061493027009419166

jimilian$ python3.5 serializators.py
iterations= 100000
data= [1,2,3]*100
==== DUMP ====
Pickle:
>> 0.9678693519963417
Json: 2.0.9
>> 4.494351467001252
Marshal: 4
>> 0.8597690019960282
Msgpack: (0, 4, 7)
>> 1.2778299400088144

==== LOAD ====
Pickle:
>> 1.0350999219954247
Json: 2.0.9
>> 3.349724347004667
Marshal: 4
>> 0.468191737003508
Msgpack: (0, 4, 7)
>> 0.3629750510008307

jimilian$ python2.7 serializators.py
iterations= 100000
data= [1,2,3]*100
==== DUMP ====
Pickle:
>> 50.5894570351
Json: 2.0.9
>> 2.69190311432
cPickle: 1.71
>> 5.14689707756
Marshal: 2
>> 0.539206981659
Msgpack: (0, 4, 7)
>> 0.752672195435

==== LOAD ====
Pickle:
>> 58.8052768707
Json: 2.0.9
>> 3.50090789795
cPickle: 1.71
>> 8.46298909187
Marshal: 2
>> 0.469168901443
Msgpack: (0, 4, 7)
>> 0.315001010895

So, as you can see sometimes it's better to use Pickle (python3, long string, dump), sometimes - msgpack (python3, long array, load), in python2 - things works completely different. That's why nobody can give certain answer that will be valid for everybody.


Time them and find out!

I'd expect cPickle to be the fastest but that's no guarantee.