Python decorator with multiprocessing fails
The problem is that pickle needs to have some way to reassemble everything that you pickle. See here for a list of what can be pickled:
http://docs.python.org/library/pickle.html#what-can-be-pickled-and-unpickled
When pickling my_func
, the following components need to be pickled:
An instance of
my_decorator_class
, calledmy_func
.This is fine. Pickle will store the name of the class and pickle its
__dict__
contents. When unpickling, it uses the name to find the class, then creates an instance and fills in the__dict__
contents. However, the__dict__
contents present a problem...The instance of the original
my_func
that's stored inmy_func.target
.This isn't so good. It's a function at the top-level, and normally these can be pickled. Pickle will store the name of the function. The problem, however, is that the name "my_func" is no longer bound to the undecorated function, it's bound to the decorated function. This means that pickle won't be able to look up the undecorated function to recreate the object. Sadly, pickle doesn't have any way to know that object it's trying to pickle can always be found under the name
__main__.my_func
.
You can change it like this and it will work:
import random
import multiprocessing
import functools
class my_decorator(object):
def __init__(self, target):
self.target = target
try:
functools.update_wrapper(self, target)
except:
pass
def __call__(self, candidates, args):
f = []
for candidate in candidates:
f.append(self.target([candidate], args)[0])
return f
def old_my_func(candidates, args):
f = []
for c in candidates:
f.append(sum(c))
return f
my_func = my_decorator(old_my_func)
if __name__ == '__main__':
candidates = [[random.randint(0, 9) for _ in range(5)] for _ in range(10)]
pool = multiprocessing.Pool(processes=4)
results = [pool.apply_async(my_func, ([c], {})) for c in candidates]
pool.close()
f = [r.get()[0] for r in results]
print(f)
You have observed that the decorator function works when the class does not. I believe this is because functools.wraps
modifies the decorated function so that it has the name and other properties of the function it wraps. As far as the pickle module can tell, it is indistinguishable from a normal top-level function, so it pickles it by storing its name. Upon unpickling, the name is bound to the decorated function so everything works out.
I also had some problem using decorators in multiprocessing. I'm not sure if it's the same problem as yours:
My code looked like this:
from multiprocessing import Pool
def decorate_func(f):
def _decorate_func(*args, **kwargs):
print "I'm decorating"
return f(*args, **kwargs)
return _decorate_func
@decorate_func
def actual_func(x):
return x ** 2
my_swimming_pool = Pool()
result = my_swimming_pool.apply_async(actual_func,(2,))
print result.get()
and when I run the code I get this:
Traceback (most recent call last):
File "test.py", line 15, in <module>
print result.get()
File "somedirectory_too_lengthy_to_put_here/lib/python2.7/multiprocessing/pool.py", line 572, in get
raise self._value
cPickle.PicklingError: Can't pickle <type 'function'>: attribute lookup __builtin__.function failed
I fixed it by defining a new function to wrap the function in the decorator function, instead of using the decorator syntax
from multiprocessing import Pool
def decorate_func(f):
def _decorate_func(*args, **kwargs):
print "I'm decorating"
return f(*args, **kwargs)
return _decorate_func
def actual_func(x):
return x ** 2
def wrapped_func(*args, **kwargs):
return decorate_func(actual_func)(*args, **kwargs)
my_swimming_pool = Pool()
result = my_swimming_pool.apply_async(wrapped_func,(2,))
print result.get()
The code ran perfectly and I got:
I'm decorating
4
I'm not very experienced at Python, but this solution solved my problem for me