Pickle or json?

If you are primarily concerned with speed and space, use cPickle because cPickle is faster than JSON.

If you are more concerned with interoperability, security, and/or human readability, then use JSON.


The tests results referenced in other answers were recorded in 2010, and the updated tests in 2016 with cPickle protocol 2 show:

  • cPickle 3.8x faster loading
  • cPickle 1.5x faster reading
  • cPickle slightly smaller encoding

Reproduce this yourself with this gist, which is based on the Konstantin's benchmark referenced in other answers, but using cPickle with protocol 2 instead of pickle, and using json instead of simplejson (since json is faster than simplejson), e.g.

wget https://gist.github.com/jdimatteo/af317ef24ccf1b3fa91f4399902bb534/raw/03e8dbab11b5605bc572bc117c8ac34cfa959a70/pickle_vs_json.py
python pickle_vs_json.py

Results with python 2.7 on a decent 2015 Xeon processor:

Dir Entries Method  Time    Length

dump    10  JSON    0.017   1484510
load    10  JSON    0.375   -
dump    10  Pickle  0.011   1428790
load    10  Pickle  0.098   -
dump    20  JSON    0.036   2969020
load    20  JSON    1.498   -
dump    20  Pickle  0.022   2857580
load    20  Pickle  0.394   -
dump    50  JSON    0.079   7422550
load    50  JSON    9.485   -
dump    50  Pickle  0.055   7143950
load    50  Pickle  2.518   -
dump    100 JSON    0.165   14845100
load    100 JSON    37.730  -
dump    100 Pickle  0.107   14287900
load    100 Pickle  9.907   -

Python 3.4 with pickle protocol 3 is even faster.


I prefer JSON over pickle for my serialization. Unpickling can run arbitrary code, and using pickle to transfer data between programs or store data between sessions is a security hole. JSON does not introduce a security hole and is standardized, so the data can be accessed by programs in different languages if you ever need to.


If you do not have any interoperability requirements (e.g. you are just going to use the data with Python) and a binary format is fine, go with cPickle which gives you really fast Python object serialization.

If you want interoperability or you want a text format to store your data, go with JSON (or some other appropriate format depending on your constraints).


You might also find this interesting, with some charts to compare: http://kovshenin.com/archives/pickle-vs-json-which-is-faster/