memoize to disk - python - persistent memoization
Check out joblib.Memory
. It's a library for doing exactly that.
from joblib import Memory
memory = Memory("cachedir")
@memory.cache
def f(x):
print('Running f(%s)' % x)
return x
Python offers a very elegant way to do this - decorators. Basically, a decorator is a function that wraps another function to provide additional functionality without changing the function source code. Your decorator can be written like this:
import json
def persist_to_file(file_name):
def decorator(original_func):
try:
cache = json.load(open(file_name, 'r'))
except (IOError, ValueError):
cache = {}
def new_func(param):
if param not in cache:
cache[param] = original_func(param)
json.dump(cache, open(file_name, 'w'))
return cache[param]
return new_func
return decorator
Once you've got that, 'decorate' the function using @-syntax and you're ready.
@persist_to_file('cache.dat')
def html_of_url(url):
your function code...
Note that this decorator is intentionally simplified and may not work for every situation, for example, when the source function accepts or returns data that cannot be json-serialized.
More on decorators: How to make a chain of function decorators?
And here's how to make the decorator save the cache just once, at exit time:
import json, atexit
def persist_to_file(file_name):
try:
cache = json.load(open(file_name, 'r'))
except (IOError, ValueError):
cache = {}
atexit.register(lambda: json.dump(cache, open(file_name, 'w')))
def decorator(func):
def new_func(param):
if param not in cache:
cache[param] = func(param)
return cache[param]
return new_func
return decorator
There is also diskcache
.
from diskcache import Cache
cache = Cache("cachedir")
@cache.memoize()
def f(x, y):
print('Running f({}, {})'.format(x, y))
return x, y
A cleaner solution powered by Python's Shelve module. The advantage is the cache gets updated in real time via well-known dict
syntax, also it's exception proof(no need to handle annoying KeyError
).
import shelve
def shelve_it(file_name):
d = shelve.open(file_name)
def decorator(func):
def new_func(param):
if param not in d:
d[param] = func(param)
return d[param]
return new_func
return decorator
@shelve_it('cache.shelve')
def expensive_funcion(param):
pass
This will facilitate the function to be computed just once. Next subsequent calls will return the stored result.