Modify object in python multiprocessing
I don't see you passing shm references out into the child processes so I don't see how work done by them could be written back into the shared memory. Perhaps I'm missing something here.
Alternatively, have you considered numpy.memmap? (BTW, tcaswell: the module referred to here seems to be numpy-sharedmem.)
You might also want to read Sturla Molden's "Using Python, multiprocessing and NumPy/SciPy for parallel numerical computing" (PDF), as recommended in unutbu's answer to the Stack Overflow question "How do I pass large numpy arrays between python subprocesses without saving to disk?", and Joe Kington's "NumPy vs. multiprocessing and mmap" on Stack Overflow.
These might be more inspirational than directly relevant.
Your code doesn't try to modify the shared memory. It just clones individual objects.
dtype=object means that sharedmem won't work, for reasons outlined in the link provided by @tcaswell:
sharing of object graphs that include references/pointers to other objects is basically unfeasible
For plain (value) types you can use shared memory, see Use numpy array in shared memory for multiprocessing.
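As a rough illustration of the plain-value case, here is a minimal sketch using the standard library's multiprocessing.shared_memory module (Python 3.8+), which is a swapped-in technique and not the approach from the linked answer. The names write_doubles, demo and N_VALUES are made up for this example. To keep it short, both handles live in one process; a worker process would attach to the block by name in the same way.

```python
from multiprocessing import shared_memory

N_VALUES = 4          # hypothetical array length for this sketch
BYTES_PER_DOUBLE = 8


def write_doubles(block_name, values):
    """Attach to an existing block by name and store `values` as doubles.

    This is what a worker would do after receiving only the block's name.
    """
    shm = shared_memory.SharedMemory(name=block_name)
    try:
        # View the raw bytes as C doubles; the view must be released
        # before the handle is closed.
        with shm.buf.cast('d') as view:
            for i, value in enumerate(values):
                view[i] = value
    finally:
        shm.close()


def demo():
    """Create a block, write through a second handle, read it back."""
    owner = shared_memory.SharedMemory(create=True,
                                       size=N_VALUES * BYTES_PER_DOUBLE)
    try:
        write_doubles(owner.name, [0.5, 1.5, 2.5, 3.5])
        with owner.buf.cast('d') as view:
            return [view[i] for i in range(N_VALUES)]
    finally:
        owner.close()
        owner.unlink()   # free the block once nobody needs it


if __name__ == '__main__':
    print(demo())
```

Only raw values are stored in the block, which is why this works for floats but not for dtype=object arrays, whose elements are pointers to Python objects.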
The manager approach should also work (it just copies the objects around):
import random
from multiprocessing import Pool, Manager


class Tester(object):
    def __init__(self, num=0.0, name='none'):
        self.num = num
        self.name = name

    def __repr__(self):
        return '%s(%r, %r)' % (self.__class__.__name__, self.num, self.name)


def init(L):
    global tests
    tests = L


def modify(i_t_nn):
    i, t, nn = i_t_nn
    t.num += random.normalvariate(mu=0, sigma=1)  # modify private copy
    t.name = nn
    tests[i] = t  # copy back
    return i


def main():
    num_processes = num = 10  # note: num_processes and num may differ
    manager = Manager()
    tests = manager.list([Tester(num=i) for i in range(num)])
    print(tests[:2])

    args = ((i, t, 'some') for i, t in enumerate(tests))
    pool = Pool(processes=num_processes, initializer=init, initargs=(tests,))
    for i in pool.imap_unordered(modify, args):
        print("done %d" % i)
    pool.close()
    pool.join()
    print(tests[:2])


if __name__ == '__main__':
    main()
The problem is that when the objects are passed to the worker processes, they are packed up with pickle, shipped to the other process, and unpacked and worked on there. Your objects aren't so much passed to the other process as cloned. You don't return the objects, so the cloned objects are happily modified and then thrown away.
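The cloning can be seen without starting any process at all. The sketch below imitates the hand-off with an explicit pickle round trip; the Tester class here is a simplified stand-in, and send_to_worker is a hypothetical helper name, not part of multiprocessing.

```python
import pickle


class Tester:
    def __init__(self, num=0.0, name='none'):
        self.num = num
        self.name = name


def send_to_worker(obj):
    """Imitate passing `obj` to a worker: serialize, then rebuild a copy."""
    return pickle.loads(pickle.dumps(obj))


original = Tester()
clone = send_to_worker(original)

# The "worker" modifies its own copy...
clone.num = 42.0
clone.name = 'some'

# ...and the parent's object is untouched.
print(original.num, original.name)
print(clone.num, clone.name)
```

Unless the clone is sent back to the parent (by a return value, a queue, or a manager), its changes are lost when the worker exits.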
It looks like sharing the objects directly between processes cannot be done (see "Python: Possible to share in-memory data between 2 separate processes").
What you can do is return the modified objects:
import numpy as np
import multiprocessing as mp


class Tester:
    num = 0.0
    name = 'none'

    def __init__(self, tnum=num, tname=name):
        self.num = tnum
        self.name = tname

    def __str__(self):
        return '%f %s' % (self.num, self.name)


def mod(test, nn, out_queue):
    print(test.num)
    test.num = np.random.randn()
    print(test.num)
    test.name = nn
    out_queue.put(test)  # send the modified copy back to the parent


if __name__ == '__main__':
    num = 10
    out_queue = mp.Queue()
    tests = np.empty(num, dtype=object)
    for it in range(num):
        tests[it] = Tester(tnum=it * 1.0)

    print('\n')

    workers = [mp.Process(target=mod, args=(test, 'some', out_queue))
               for test in tests]
    for work in workers:
        work.start()

    # Drain the queue before joining: a child that has put data on a
    # queue may not exit until that data has been consumed.
    res_lst = [out_queue.get() for _ in workers]

    for work in workers:
        work.join()

    for test in res_lst:
        print(test)
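This does lead to the interesting observation that, because the forked processes are copies of the parent, they all inherit the same random number generator state, so they all generate the same 'random' number. Reseeding inside each worker (for example by calling np.random.seed() at the start of mod) avoids this.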
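The effect can be imitated in a single process. In the sketch below, each simulated child receives a copy of the parent's generator state, which is roughly what a forked child gets for NumPy's global generator. The standard library's random.Random stands in for np.random here, and child_draws is a hypothetical helper; this illustrates the mechanism and is not a reproduction of the code above.

```python
import random


def child_draws(parent_state, reseed=False, n=3):
    """Draw n normal variates in a simulated child process."""
    rng = random.Random()
    rng.setstate(parent_state)   # the child starts as a copy of the parent
    if reseed:
        rng.seed()               # reseed from OS entropy (or the clock)
    return [rng.gauss(0, 1) for _ in range(n)]


parent = random.Random(12345)    # arbitrary seed for the illustration
state = parent.getstate()

first = child_draws(state)
second = child_draws(state)
print(first == second)           # identical state, identical draws

reseeded = child_draws(state, reseed=True)
print(reseeded == first)         # a reseeded child diverges
```

The same idea applies to the real workers: each one needs its own seed before it draws any numbers.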