Pickle alternatives

  • Protocol Buffers: used e.g. in Caffe; maintains type information, but takes considerably more effort to set up than pickle
  • MessagePack: see the Python package; supports streaming (source)
  • BSON: see the Python package docs
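Streaming here means writing records one after another and reading them back incrementally. For a stdlib-only comparison (this is pickle, not MessagePack), successive pickle.dump calls on one binary stream can be read back with repeated pickle.load calls until EOFError is raised. write_records and read_records are hypothetical helper names used only for this sketch.

```python
import io
import pickle


def write_records(stream, records):
    """Append each record to a binary stream as its own pickle."""
    for record in records:
        pickle.dump(record, stream, protocol=pickle.HIGHEST_PROTOCOL)


def read_records(stream):
    """Yield records one at a time until the stream is exhausted."""
    while True:
        try:
            yield pickle.load(stream)  # reads exactly one pickle
        except EOFError:  # raised when no bytes are left
            return


buf = io.BytesIO()
write_records(buf, [{"id": 1}, {"id": 2}])
write_records(buf, [{"id": 3}])  # appending more later works the same way
buf.seek(0)
print(list(read_records(buf)))  # [{'id': 1}, {'id': 2}, {'id': 3}]
```

This gives appendability but no random access: reaching the n-th record means unpickling every record before it. As with any pickle data, only load streams from trusted sources.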

Depending on what exactly you want to store, there are other alternatives:

  • Feather (part of Apache Arrow)
  • Apache Avro
  • Apache Parquet

Criteria for comparing them:

  • Ease of use / Programming language support / Tooling support
  • Being readable by a human
  • Storage size
  • Read-time
  • Write-time
  • Features: (1) appending data, (2) reading a single row, (3) having a schema
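The storage-size, read-time and write-time criteria are easy to measure directly. A minimal sketch, assuming only the standard library: pickle, json and marshal stand in for the formats listed above, and SERIALIZERS and compare are hypothetical names. Another format can be added as a (serialize, deserialize) pair, for example msgpack.packb and msgpack.unpackb if the msgpack package is installed.

```python
import json
import marshal
import pickle
import timeit

# Hypothetical registry: name -> (serialize to bytes, deserialize from bytes).
SERIALIZERS = {
    "pickle": (lambda o: pickle.dumps(o, protocol=pickle.HIGHEST_PROTOCOL), pickle.loads),
    "json": (lambda o: json.dumps(o).encode("utf-8"), json.loads),  # human-readable
    "marshal": (marshal.dumps, marshal.loads),  # format may change between Python versions
}


def compare(obj, repeat=100):
    """Return {name: (size_in_bytes, avg_write_seconds, avg_read_seconds)}."""
    results = {}
    for name, (dump, load) in SERIALIZERS.items():
        blob = dump(obj)
        # timeit runs immediately, so the lambdas see this iteration's dump/load.
        write_s = timeit.timeit(lambda: dump(obj), number=repeat) / repeat
        read_s = timeit.timeit(lambda: load(blob), number=repeat) / repeat
        results[name] = (len(blob), write_s, read_s)
    return results


if __name__ == "__main__":
    sample = {"ids": list(range(1000)), "name": "sample"}
    for name, (size, write_s, read_s) in compare(sample).items():
        print(f"{name:8} {size:7d} B  write {write_s * 1e6:8.1f} us  read {read_s * 1e6:8.1f} us")
```

Timings depend heavily on the shape of the data, so a representative sample of your own data gives more meaningful numbers than a toy object.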

Protocol buffers are a flexible, efficient, automated mechanism for serializing structured data – think XML, but smaller, faster, and simpler.

Their advantages over XML are that protocol buffers:

  • are simpler
  • are 3 to 10 times smaller
  • are 20 to 100 times faster
  • are less ambiguous
  • generate data access classes that are easier to use programmatically

https://developers.google.com/protocol-buffers/docs/pythontutorial


Pickle is actually quite fast as long as you avoid the ASCII-based protocol 0 (the default in Python 2; Python 3 defaults to a binary protocol). To be safe, dump with protocol=pickle.HIGHEST_PROTOCOL.
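A minimal sketch of this advice, assuming Python 3; save and load are hypothetical helper names and the sample data is arbitrary. It also prints the encoded size under protocol 0 and under pickle.HIGHEST_PROTOCOL so the difference is visible; speed can be compared the same way with timeit.

```python
import os
import pickle
import tempfile


def save(obj, path):
    """Pickle obj to path using the newest (binary) protocol available."""
    with open(path, "wb") as f:  # pickle writes bytes, so use binary mode
        pickle.dump(obj, f, protocol=pickle.HIGHEST_PROTOCOL)


def load(path):
    """Unpickle from path; the protocol is detected automatically on load."""
    with open(path, "rb") as f:
        return pickle.load(f)


data = {"values": list(range(10_000)), "label": "example"}

# Protocol 0 is the original text-based format; compare its size with the newest one.
for protocol in (0, pickle.HIGHEST_PROTOCOL):
    print(f"protocol {protocol}: {len(pickle.dumps(data, protocol=protocol))} bytes")

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "data.pkl")
    save(data, path)
    assert load(path) == data
```

Because the protocol is recorded in the pickle itself, a file written with HIGHEST_PROTOCOL can only be read by Python versions that support that protocol; pass an explicit lower protocol number if older interpreters must read it.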


I think you should give PyTables a look. It should be very fast, at least faster than an RDBMS, since it is very lax and doesn't impose read/write restrictions. You also get a better interface for managing your data than you do by pickling it.