How to parallelize list-comprehension calculations in Python?
On automatic parallelisation of list comprehensions
IMHO, effective automatic parallelisation of list comprehensions would be impossible without additional information (such as that provided by directives in OpenMP), or without limiting it to expressions that involve only built-in types/methods.
Unless there is a guarantee that the processing done on each list item has no side effects, there is a possibility that the results will be invalid (or at least different) if done out of order.
# Artificial example
counter = 0
def g(x):  # function with a side effect
    global counter
    counter = counter + 1
    return x + counter
vals = [g(i) for i in range(100)]  # different result when not evaluated in order
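To make the ordering problem concrete, here is a minimal sketch that evaluates the same kind of side-effecting function once in order and once in reverse, as a stand-in for workers picking up items out of order. The helper names `make_g` and `evaluate` are hypothetical and exist only for this illustration.

```python
def make_g():
    """Return a fresh side-effecting function with its own call counter."""
    counter = 0

    def g(x):
        nonlocal counter
        counter += 1          # side effect: the result depends on call order
        return x + counter

    return g


def evaluate(order):
    """Call g on the items in the given order, then list results by index."""
    g = make_g()
    results = {i: g(i) for i in order}
    return [results[i] for i in sorted(results)]


in_order = evaluate(range(5))
reversed_order = evaluate(reversed(range(5)))

print(in_order)        # [1, 3, 5, 7, 9]
print(reversed_order)  # [5, 5, 5, 5, 5]
```

The inputs are identical in both runs; only the evaluation order differs, and that alone changes every result but one.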
There is also the issue of task distribution. How should the problem space be decomposed?
If the processing of each element forms a task (~ task farm), then when there are many elements each involving a trivial calculation, the overhead of managing the tasks will swamp the performance gains of parallelisation.
One could also take the data decomposition approach where the problem space is divided equally among the available processes.
The fact that list comprehensions also work with generators makes this slightly tricky, since the length is not known up front. This is probably not a show stopper if the overhead of pre-iterating the generator is acceptable. Of course, there is also the possibility of generators with side effects, which can change the outcome if subsequent items are iterated prematurely. Very unlikely, but possible.
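As a rough sketch of the data decomposition approach, the helper below pre-iterates its input (so a generator is fully consumed first) and then cuts it into contiguous blocks of near-equal length, one per process. The name `split_evenly` is hypothetical, and the sketch only covers the partitioning step, not the dispatch to worker processes.

```python
def split_evenly(items, n_parts):
    """Materialise `items` and split it into n_parts contiguous blocks
    whose lengths differ by at most one."""
    data = list(items)                 # pre-iterate: consumes a generator fully
    base, extra = divmod(len(data), n_parts)
    blocks, start = [], 0
    for p in range(n_parts):
        size = base + (1 if p < extra else 0)   # first `extra` blocks get one more
        blocks.append(data[start:start + size])
        start += size
    return blocks


blocks = split_evenly((i * i for i in range(10)), 3)
print(blocks)  # [[0, 1, 4, 9], [16, 25, 36], [49, 64, 81]]
```

Concatenating the per-block results in block order restores the original ordering, which is what a list comprehension would have produced.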
A bigger concern would be load imbalance across processes. There is no guarantee that each element would take the same amount of time to process, so statically partitioned data may result in one process doing most of the work while the others idle their time away.
Breaking the list down into smaller chunks and handing them out as each child process becomes available is a good compromise. However, a good choice of chunk size is application dependent, and hence not doable without more information from the user.
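The trade-off can be illustrated with a toy simulation (not a benchmark, and it ignores scheduling overhead entirely). It compares the finish time of a static equal split against handing out chunks to whichever worker frees up first. The function names and the workload in `costs` are made up for illustration.

```python
import heapq


def static_makespan(costs, n_workers):
    """Finish time when the list is cut into n_workers contiguous blocks."""
    base, extra = divmod(len(costs), n_workers)
    finish, start = [], 0
    for w in range(n_workers):
        size = base + (1 if w < extra else 0)
        finish.append(sum(costs[start:start + size]))
        start += size
    return max(finish)


def dynamic_makespan(costs, n_workers, chunk_size):
    """Finish time when chunks go to whichever worker becomes free first."""
    chunks = [sum(costs[i:i + chunk_size])
              for i in range(0, len(costs), chunk_size)]
    workers = [0] * n_workers          # min-heap of each worker's busy-until time
    for chunk_cost in chunks:
        free_at = heapq.heappop(workers)
        heapq.heappush(workers, free_at + chunk_cost)
    return max(workers)


# Hypothetical workload: the first 4 items are expensive, the rest are cheap.
costs = [10, 10, 10, 10] + [1] * 12

print(static_makespan(costs, 4))       # 40: one worker gets all expensive items
print(dynamic_makespan(costs, 4, 1))   # 13: small chunks balance the load
print(dynamic_makespan(costs, 4, 8))   # 44: chunks too large, two workers idle
```

With this particular workload a chunk size of 8 is worse than the static split, which is the point made above: the right chunk size depends on the cost profile of the application. In `multiprocessing`, `Pool.map()` and `Pool.imap()` accept a `chunksize` argument for this purpose.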
Alternatives
As mentioned in several other answers, there are many approaches and parallel computing modules/frameworks to choose from, depending on one's requirements.
Having used only MPI (in C) with no experience using Python for parallel processing, I am not in a position to vouch for any (although, upon a quick scan through, multiprocessing, jug, pp and pyro stand out).
If a requirement is to stick as close as possible to list comprehension, then jug seems to be the closest match. From the tutorial, distributing tasks across multiple instances can be as simple as:
from glob import glob
from jug.task import Task
from yourmodule import process_data
tasks = [Task(process_data, infile) for infile in glob('*.dat')]
While that does something similar to multiprocessing.Pool.map(), jug can use different backends for synchronising processes and storing intermediate results (redis, filesystem, in-memory), which means the processes can span nodes in a cluster.
As Ken said, it can't, but with 2.6's multiprocessing module, it's pretty easy to parallelize computations.
import multiprocessing

def square(n):
    return n * n

if __name__ == "__main__":  # guard needed where worker processes are spawned (e.g. Windows)
    try:
        cpus = multiprocessing.cpu_count()
    except NotImplementedError:
        cpus = 2  # arbitrary default
    pool = multiprocessing.Pool(processes=cpus)
    print(pool.map(square, range(1000)))
There are also examples in the documentation that show how to do this using Managers, which should allow for distributed computations as well.
For shared-memory parallelism, I recommend joblib:
from joblib import delayed, Parallel
NUM_CPUS = -1  # assumption: not defined in the original; -1 asks joblib to use all cores
def square(x):
    return x * x
values = Parallel(n_jobs=NUM_CPUS)(delayed(square)(x) for x in range(1000))