Pickle alternatives
- Protocol Buffers: used in Caffe, for example; preserves type information, but takes considerably more effort to set up than pickle
- MessagePack: see the Python package; supports streaming
- BSON: see the Python package docs
Depending on what exactly you want to store, there are other alternatives:
- Feather (part of Apache Arrow)
- Apache Avro
- Apache Parquet
Criteria for comparing them:
- Ease of use / programming language support / tooling support
- Human readability
- Storage size
- Read time
- Write time
- Features: (1) appending data, (2) reading a single row, (3) having a schema
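To make the size and timing criteria concrete, here is a minimal sketch that measures them for two standard-library formats. The `measure` helper and the `records` payload are made up for illustration, and the numbers depend heavily on your data, so treat this as a template rather than a benchmark result. Other formats can be plugged in by passing their dump/load functions.

```python
import json
import pickle
import time


def measure(name, dumps, loads, data, repeats=5):
    """Return serialized size plus best-of-N write and read times (seconds)."""
    write_times, read_times = [], []
    for _ in range(repeats):
        start = time.perf_counter()
        blob = dumps(data)
        write_times.append(time.perf_counter() - start)

        start = time.perf_counter()
        restored = loads(blob)
        read_times.append(time.perf_counter() - start)
    # A format is only worth comparing if the round trip preserves the data.
    assert restored == data
    return {
        "format": name,
        "bytes": len(blob),
        "write_s": min(write_times),
        "read_s": min(read_times),
    }


# Hypothetical sample payload: a list of small records.
records = [{"id": i, "name": f"item-{i}", "score": i / 7} for i in range(10_000)]

results = [
    measure(
        "pickle",
        lambda d: pickle.dumps(d, protocol=pickle.HIGHEST_PROTOCOL),
        pickle.loads,
        records,
    ),
    measure(
        "json",
        lambda d: json.dumps(d).encode("utf-8"),
        lambda b: json.loads(b.decode("utf-8")),
        records,
    ),
]

for row in results:
    print(
        f"{row['format']:<8}{row['bytes']:>10} bytes  "
        f"write {row['write_s']:.4f}s  read {row['read_s']:.4f}s"
    )
```

This only covers size and speed; human readability, appending, single-row reads and schema support still have to be judged per format.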
Protocol buffers are a flexible, efficient, automated mechanism for serializing structured data – think XML, but smaller, faster, and simpler.
Google's documentation lists these advantages over XML. Protocol buffers:
- are simpler
- are 3 to 10 times smaller
- are 20 to 100 times faster
- are less ambiguous
- generate data access classes that are easier to use programmatically
https://developers.google.com/protocol-buffers/docs/pythontutorial
Pickle is actually quite fast as long as you aren't using the ASCII protocol (protocol 0, which is the default in Python 2; Python 3 defaults to a binary protocol). Just make sure to dump using protocol=pickle.HIGHEST_PROTOCOL.
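A minimal sketch of what that looks like, using an in-memory buffer and a made-up `data` dictionary. It also pickles the same object with protocol 0 so you can compare the sizes yourself.

```python
import io
import pickle

# Hypothetical payload, just for illustration.
data = {"weights": list(range(1000)), "label": "example"}

# Dump with the newest binary protocol this Python version supports.
buffer = io.BytesIO()
pickle.dump(data, buffer, protocol=pickle.HIGHEST_PROTOCOL)
binary_size = buffer.tell()

# The same object with the ASCII protocol (protocol 0), for comparison.
ascii_size = len(pickle.dumps(data, protocol=0))

# Loading needs no protocol argument; pickle detects it from the stream.
buffer.seek(0)
restored = pickle.load(buffer)

print(f"binary: {binary_size} bytes, ASCII: {ascii_size} bytes")
```

With a real file, open it in binary mode (`"wb"` for dumping, `"rb"` for loading). Keep in mind that a pickle written with the highest protocol may not load on older Python versions.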
I think you should give PyTables a look. It should be ridiculously fast, at least faster than using an RDBMS, since it's very lax and doesn't impose any read/write restrictions. You also get a better interface for managing your data, at least compared to pickling it.