*large* Python dictionary with persistent storage for quick look-ups

No one has mentioned dbm. It is opened like a file, behaves like a dictionary and is in the standard distribution.

From the docs (https://docs.python.org/3/library/dbm.html):

import dbm

# Open database, creating it if necessary.
with dbm.open('cache', 'c') as db:

    # Record some values
    db[b'hello'] = b'there'
    db['www.python.org'] = 'Python Website'
    db['www.cnn.com'] = 'Cable News Network'

    # Note that the keys are considered bytes now.
    assert db[b'www.python.org'] == b'Python Website'
    # Notice how the value is now in bytes.
    assert db['www.cnn.com'] == b'Cable News Network'

    # Often-used methods of the dict interface work too.
    print(db.get('python.org', b'not present'))

    # Storing a non-string key or value will raise an exception (most
    # likely a TypeError).
    db['www.yahoo.com'] = 4

# db is automatically closed when leaving the with statement.

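I would try this before any of the more exotic options. Pickling the whole dictionary to one file means loading all of it back into memory before the first look-up, whereas dbm reads entries from disk as you ask for them. (shelve is itself built on dbm, so it also reads per key, but it pickles and unpickles every value.)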
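Since dbm only stores strings or bytes, values such as the integer above need to be serialized first. Here is a minimal sketch of one way to do that with JSON. The `put` and `get` helpers and the temporary path are my own illustrative names, not part of dbm.

```python
import dbm
import json
import os
import tempfile


def put(db, key, value):
    """Store any JSON-serializable value under a string key."""
    db[key.encode("utf-8")] = json.dumps(value).encode("utf-8")


def get(db, key, default=None):
    """Return the decoded value for key, or default if it is missing."""
    raw = db.get(key.encode("utf-8"))
    return default if raw is None else json.loads(raw.decode("utf-8"))


# Hypothetical location for the demo; use a real path for real data.
path = os.path.join(tempfile.mkdtemp(), "cache")

with dbm.open(path, "c") as db:
    put(db, "www.yahoo.com", 4)
    put(db, "www.python.org", {"name": "Python Website", "hits": 12})

# Reopen read-only to confirm the values survived on disk.
with dbm.open(path, "r") as db:
    hits = get(db, "www.yahoo.com")
    site = get(db, "www.python.org")
    missing = get(db, "example.invalid", default="not present")
```

JSON keeps the file readable from other tools. If you need arbitrary Python objects, pickle works the same way, which is essentially what shelve does for you.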

Cheers

Tim


If you want to persist a large dictionary, you are basically looking at a database.

Python comes with built in support for sqlite3, which gives you an easy database solution backed by a file on disk.
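As a rough sketch of what that could look like, here is a two-column table used as a key/value store. The table name `kv`, the helper functions and the file name are assumptions for illustration; adapt the schema to your data.

```python
import os
import sqlite3
import tempfile

# Hypothetical file name; any path on disk works.
path = os.path.join(tempfile.mkdtemp(), "lookup.sqlite3")

conn = sqlite3.connect(path)
# The PRIMARY KEY gives an index on key, which is what makes look-ups fast.
conn.execute(
    "CREATE TABLE IF NOT EXISTS kv (key TEXT PRIMARY KEY, value TEXT NOT NULL)"
)


def put_many(conn, pairs):
    """Insert or overwrite (key, value) pairs in a single transaction."""
    with conn:  # commits on success, rolls back on error
        conn.executemany(
            "INSERT OR REPLACE INTO kv (key, value) VALUES (?, ?)", pairs
        )


def lookup(conn, key, default=None):
    """Return the value stored for key, or default if there is none."""
    row = conn.execute("SELECT value FROM kv WHERE key = ?", (key,)).fetchone()
    return default if row is None else row[0]


put_many(conn, [("key1", "value1"), ("key2", "value2")])
found = lookup(conn, "key1")
absent = lookup(conn, "nope", default="not present")
conn.close()
```

Batching the inserts in one transaction matters for a large load, since committing after every row is much slower.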


In principle the shelve module does exactly what you want. It provides a persistent dictionary backed by a database file. Keys must be strings, but shelve takes care of pickling and unpickling the values. The type of database file can vary, but it can be a Berkeley DB hash, which is an excellent lightweight key-value database.
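A minimal sketch of the shelve usage described here, assuming a throwaway file in a temporary directory (the path and the sample keys are made up for the demo):

```python
import os
import shelve
import tempfile

# Hypothetical location; shelve may add its own file extension(s).
path = os.path.join(tempfile.mkdtemp(), "shelf")

# String keys, arbitrary picklable values.
with shelve.open(path) as shelf:
    shelf["primes"] = [2, 3, 5, 7]
    shelf["config"] = {"retries": 3, "verbose": False}

# Reopen read-only; each value is unpickled only when its key is accessed.
with shelve.open(path, flag="r") as shelf:
    primes = shelf["primes"]
    retries = shelf["config"]["retries"]
    has_missing = "missing" in shelf
```

One thing to watch: mutating a stored value in place (for example `shelf["primes"].append(11)`) is not written back unless you reassign the key or open the shelf with `writeback=True`.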

Your data size sounds huge so you must do some testing, but shelve/BDB is probably up to it.

Note: The bsddb module was deprecated in Python 2.6 and removed from the standard library in Python 3. On Python 3, shelve uses whichever dbm backend is available rather than a BDB hash.


Without a doubt (in my opinion), if you want this to persist, then Redis is a great option.

  1. Install redis-server
  2. Start redis server
  3. Install the redis Python package (pip install redis)
  4. Profit.

import redis

ds = redis.Redis(host="localhost", port=6379)

with open("your_text_file.txt") as fh:
    for line in fh:
        line = line.strip()
        k, _, v = line.partition("=")
        ds.set(k, v)

The above assumes a file of values like:

key1=value1
key2=value2
etc=etc

Modify the insertion script to your needs.
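One modification worth considering is to skip blank or malformed lines instead of storing them. Here is a sketch of the parsing step on its own, so it can be checked without a running server. `parse_pairs` is a hypothetical helper, and treating lines that start with `#` as comments is my assumption, not something your file format necessarily has.

```python
def parse_pairs(lines):
    """Yield (key, value) tuples from 'key=value' lines.

    Blank lines, '#' comment lines (an assumed convention) and lines
    without a key or an '=' are skipped.
    """
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # partition splits on the first '=' only, so values may contain '='.
        key, sep, value = line.partition("=")
        if not sep or not key.strip():
            continue
        yield key.strip(), value.strip()


sample = [
    "key1=value1\n",
    "\n",
    "# a comment\n",
    "no separator here\n",
    "key2 = a=b\n",
]
pairs = list(parse_pairs(sample))
```

In the insertion loop you would then write `for k, v in parse_pairs(fh): ds.set(k, v)`.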


import redis
ds = redis.Redis(host="localhost", port=6379)

# Look up the keys your code needs. special_key_list is your own iterable of keys.
for mykey in special_key_list:
    val = ds.get(mykey)  # bytes by default, or None if the key is missing

Why I like Redis.

  1. Configurable persistence options
  2. Blazingly fast
  3. Offers more than just key / value pairs (other data types)
  4. @antirez