Why are multiprocessing.sharedctypes assignments so slow?

Just put a numpy array around the shared array:

import numpy as np
import multiprocessing as mp

sh = mp.RawArray('i', int(1e8))
x = np.arange(1e8, dtype=np.int32)
sh_np = np.ctypeslib.as_array(sh)

then time:

%time sh[:] = x
CPU times: user 10.1 s, sys: 132 ms, total: 10.3 s
Wall time: 10.2 s

%time memoryview(sh).cast('B').cast('i')[:] = x
CPU times: user 64 ms, sys: 132 ms, total: 196 ms
Wall time: 196 ms

%time sh_np[:] = x
CPU times: user 92 ms, sys: 104 ms, total: 196 ms
Wall time: 196 ms

No need to figure out how to cast the memoryview (as I had to in python3 Ubuntu 16) and mess with reshaping (if x has more dimensions, since cast() flattens). And use sh_np.dtype.name to double check data types just like any numpy array. :)

This is slow for the reasons given in your second link, and the solution is actually pretty simple: Bypass the (slow) RawArray slice assignment code, which in this case is inefficiently reading one raw C value at a time from the source array to create a Python object, then converts it straight back to raw C for storage in the shared array, then discards the temporary Python object, and repeats 1e8 times.

But you don't need to do it that way; like most C level things, RawArray implements the buffer protocol, which means you can convert it to a memoryview, a view of the underlying raw memory that implements most operations in C-like ways, using raw memory operations if possible. So instead of doing:

# assign memory, very slow
%time temp[:] = np.arange(1e8, dtype = np.uint16)
Wall time: 9.75 s  # Updated to what my machine took, for valid comparison

use memoryview to manipulate it as a raw bytes-like object and assign that way (np.arange already implements the buffer protocol, and memoryview's slice assignment operator seamlessly uses it):

# C-like memcpy effectively, very fast
%time memoryview(temp)[:] = np.arange(1e8, dtype = np.uint16)
Wall time: 74.4 ms  # Takes 0.76% of original time!!!

Note, the time for the latter is milliseconds, not seconds; copying using memoryview wrapping to perform raw memory transfers takes less than 1% of the time to do it the plodding way RawArray does it by default!

Why are multiprocessing.sharedctypes assignments so slow?

Tags:

Python

Multiprocessing

Shared Memory

Related

Recent Posts