multiprocessing.Pool - PicklingError: Can't pickle <type 'thread.lock'>: attribute lookup thread.lock failed
multiprocessing passes tasks (which include check_one and data) to the worker processes through a mp.SimpleQueue. Unlike Queue.Queues, everything put in the mp.SimpleQueue must be picklable. Queue.Queues are not picklable:
import multiprocessing as mp
import Queue

def foo(queue):
    pass

pool = mp.Pool()
q = Queue.Queue()
# Passing a Queue.Queue through pool.map forces it to be pickled.
pool.map(foo, (q,))
yields this exception:
UnpickleableError: Cannot pickle <type 'thread.lock'> objects
Your data includes packages, which is a Queue.Queue. That might be the source of the problem.
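A quick way to confirm that an object is the unpicklable part (a sketch, not from the original question) is to try pickling it directly; in Python 2 a Queue.Queue fails because of the locks it holds internally:

import pickle
import Queue

try:
    pickle.dumps(Queue.Queue())
except Exception as exc:
    # Fails on Python 2 because the queue's internal mutex/condition
    # objects (thread locks) cannot be pickled; the exact message varies
    # between pickle and cPickle.
    print('cannot pickle Queue.Queue: {0!r}'.format(exc))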
Here is a possible workaround. The Queue is being used for two purposes:
- to find out the approximate size (by calling qsize)
- to store results for later retrieval

Instead of calling qsize, we can share a counter between processes with an mp.Value. Instead of storing results in a queue, we can (and should) simply return values from check_one: pool.map collects the results in a queue of its own making and hands them back as its return value.
For example:
import multiprocessing as mp
import random
import logging

# logger = mp.log_to_stderr(logging.DEBUG)
logger = logging.getLogger(__name__)

# Shared counter; every worker process can read and update it.
qsize = mp.Value('i', 1)

def check_one(args):
    total, package, version = args
    i = qsize.value
    logger.info('\r[{0:.1%} - {1}, {2} / {3}]'.format(
        i / float(total), package, i, total))
    new_version = random.randrange(0, 100)
    # += is not atomic, so take the Value's lock while incrementing.
    with qsize.get_lock():
        qsize.value += 1
    if new_version > version:
        return (package, version, new_version, None)
    else:
        return None

def update():
    logger.info('Searching for updates')
    set_len = 10
    data = ((set_len, 'project-{0}'.format(i), random.randrange(0, 100))
            for i in range(set_len))
    pool = mp.Pool()
    results = pool.map(check_one, data)
    pool.close()
    pool.join()
    for result in results:
        if result is None:
            continue
        package, version, new_version, json = result
        txt = 'A new release is available for {0}: {1!s} (old {2}), update'.format(
            package, new_version, version)
        logger.info(txt)
    logger.info('Updating finished successfully')

if __name__ == '__main__':
    logging.basicConfig(level=logging.DEBUG)
    update()
After a lot of digging on a similar issue...
It also turns out that ANY object that happens to contain a threading.Condition() object will NEVER work with multiprocessing.Pool.
Here is an example:
import multiprocessing as mp
import threading

class MyClass(object):
    def __init__(self):
        self.cond = threading.Condition()

def foo(mc):
    pass

pool = mp.Pool()
mc = MyClass()
# The Condition inside mc holds thread locks, which cannot be pickled.
pool.map(foo, (mc,))
I ran this with Python 2.7.5 and hit the same error:
Exception in thread Thread-2:
Traceback (most recent call last):
File "/usr/lib64/python2.7/threading.py", line 811, in __bootstrap_inner
self.run()
File "/usr/lib64/python2.7/threading.py", line 764, in run
self.__target(*self.__args, **self.__kwargs)
File "/usr/lib64/python2.7/multiprocessing/pool.py", line 342, in _handle_tasks
put(task)
PicklingError: Can't pickle <type 'thread.lock'>: attribute lookup thread.lock failed
But when I ran it on Python 3.4.1, the issue had been fixed.
Although I haven't come across any useful workarounds yet for those of us still on 2.7.x.
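For what it's worth, one general-purpose pattern (not from the original answer, just a sketch) is to keep the Condition out of the pickled state with __getstate__/__setstate__ and rebuild it after unpickling, assuming the Condition only guards in-process state:

import multiprocessing as mp
import threading

class MyClass(object):
    def __init__(self):
        self.cond = threading.Condition()

    def __getstate__(self):
        # Copy the instance dict but leave out the unpicklable Condition.
        state = self.__dict__.copy()
        del state['cond']
        return state

    def __setstate__(self, state):
        # Restore the picklable attributes and rebuild a fresh Condition
        # in the worker process.
        self.__dict__.update(state)
        self.cond = threading.Condition()

def foo(mc):
    return type(mc.cond).__name__

if __name__ == '__main__':
    pool = mp.Pool()
    print(pool.map(foo, (MyClass(),)))
    pool.close()
    pool.join()

Each worker gets a fresh, unshared Condition, which is usually acceptable since a threading.Condition only coordinates threads within a single process anyway.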