*large* python dictionary with persistence storage for quick look-ups
No one has mentioned dbm. It is opened like a file, behaves like a dictionary and is in the standard distribution.
From the docs https://docs.python.org/3/library/dbm.html
import dbm
# Open database, creating it if necessary.
with dbm.open('cache', 'c') as db:
# Record some values
db[b'hello'] = b'there'
db['www.python.org'] = 'Python Website'
db['www.cnn.com'] = 'Cable News Network'
# Note that the keys are considered bytes now.
assert db[b'www.python.org'] == b'Python Website'
# Notice how the value is now in bytes.
assert db['www.cnn.com'] == b'Cable News Network'
# Often-used methods of the dict interface work too.
print(db.get('python.org', b'not present'))
# Storing a non-string key or value will raise an exception (most
# likely a TypeError).
db['www.yahoo.com'] = 4
# db is automatically closed when leaving the with statement.
I would try this before any of the more exotic forms, and using shelve/pickle will pull everything into memory on loading.
Cheers
Tim
If you want to persist a large dictionary, you are basically looking at a database.
Python comes with built in support for sqlite3, which gives you an easy database solution backed by a file on disk.
In principle the shelve module does exactly what you want. It provides a persistent dictionary backed by a database file. Keys must be strings, but shelve will take care of pickling/unpickling values. The type of db file can vary, but it can be a Berkeley DB hash, which is an excellent light weight key-value database.
Your data size sounds huge so you must do some testing, but shelve/BDB is probably up to it.
Note: The bsddb module has been deprecated. Possibly shelve will not support BDB hashes in future.
Without a doubt (in my opinion), if you want this to persist, then Redis is a great option.
- Install redis-server
- Start redis server
- Install redis python pacakge (pip install redis)
- Profit.
import redis
ds = redis.Redis(host="localhost", port=6379)
with open("your_text_file.txt") as fh:
for line in fh:
line = line.strip()
k, _, v = line.partition("=")
ds.set(k, v)
Above assumes a files of values like:
key1=value1
key2=value2
etc=etc
Modify insertion script to your needs.
import redis
ds = redis.Redis(host="localhost", port=6379)
# Do your code that needs to do look ups of keys:
for mykey in special_key_list:
val = ds.get(mykey)
Why I like Redis.
- Configurable persistance options
- Blazingly fast
- Offers more than just key / value pairs (other data types)
- @antrirez