Python object persistence

For beginners, You can port your shelve databases to ZODB databases like this:

#!/usr/bin/env python
import shelve
import ZODB, ZODB.FileStorage
import transaction
from optparse import OptionParser
import os
import sys
import re

reload(sys)
sys.setdefaultencoding("utf-8")

parser = OptionParser()

parser.add_option("-o", "--output", dest = "out_file", default = False, help ="original shelve database filename")
parser.add_option("-i", "--input", dest = "in_file", default = False, help ="new zodb database filename")

parser.set_defaults()
options, args = parser.parse_args()

if options.in_file == False or options.out_file == False :
    print "Need input and output database filenames"
    exit(1)

db = shelve.open(options.in_file, writeback=True)
zstorage = ZODB.FileStorage.FileStorage(options.out_file)
zdb = ZODB.DB(zstorage)
zconnection = zdb.open()
newdb = zconnection.root()

for key, value in db.iteritems() :
    print "Copying key: " + str(key)
    newdb[key] = value

transaction.commit()

Use the ZODB (the Zope Object Database) instead. Backed with ZEO it fulfills your requirements:

  • Transparent persistence for Python objects

    ZODB uses pickles underneath so anything that is pickle-able can be stored in a ZODB object store.

  • Full ACID-compatible transaction support (including savepoints)

    This means changes from one process propagate to all the other processes when they are good and ready, and each process has a consistent view on the data throughout a transaction.

ZODB has been around for over a decade now, so you are right in surmising this problem has already been solved before. :-)

The ZODB let's you plug in storages; the most common format is the FileStorage, which stores everything in one Data.fs with an optional blob storage for large objects.

Some ZODB storages are wrappers around others to add functionality; DemoStorage for example keeps changes in memory to facilitate unit testing and demonstration setups (restart and you have clean slate again). BeforeStorage gives you a window in time, only returning data from transactions before a given point in time. The latter has been instrumental in recovering lost data for me.

ZEO is such a plugin that introduces a client-server architecture. Using ZEO lets you access a given storage from multiple processes at a time; you won't need this layer if all you need is multi-threaded access from one process only.

The same could be achieved with RelStorage, which stores ZODB data in a relational database such as PostgreSQL, MySQL or Oracle.