Parallelism in Python
Generally, you describe a CPU bound calculation. This is not Python's forte. Neither, historically, is multiprocessing.
Threading in the mainstream Python interpreter has been ruled by a dreaded global lock. The new multiprocessing API works around that and gives a worker pool abstraction with pipes and queues and such.
You can write your performance critical code in C or Cython, and use Python for the glue.
The new (2.6) multiprocessing module is the way to go. It uses subprocesses, which gets around the GIL problem. It also abstracts away some of the local/remote issues, so the choice of running your code locally or spread out over a cluster can be made later. The documentation I've linked above is a fair bit to chew on, but should provide a good basis to get started.
Ray is an elegant (and fast) library for doing this.
The most basic strategy for parallelizing Python functions is to declare a function with the @ray.remote
decorator. Then it can be invoked asynchronously.
import ray
import time
# Start the Ray processes (e.g., a scheduler and shared-memory object store).
ray.init(num_cpus=8)
@ray.remote
def f():
time.sleep(1)
# This should take one second assuming you have at least 4 cores.
ray.get([f.remote() for _ in range(4)])
You can also parallelize stateful computation using actors, again by using the @ray.remote
decorator.
# This assumes you already ran 'import ray' and 'ray.init()'.
import time
@ray.remote
class Counter(object):
def __init__(self):
self.x = 0
def inc(self):
self.x += 1
def get_counter(self):
return self.x
# Create two actors which will operate in parallel.
counter1 = Counter.remote()
counter2 = Counter.remote()
@ray.remote
def update_counters(counter1, counter2):
for _ in range(1000):
time.sleep(0.25)
counter1.inc.remote()
counter2.inc.remote()
# Start three tasks that update the counters in the background also in parallel.
update_counters.remote(counter1, counter2)
update_counters.remote(counter1, counter2)
update_counters.remote(counter1, counter2)
# Check the counter values.
for _ in range(5):
counter1_val = ray.get(counter1.get_counter.remote())
counter2_val = ray.get(counter2.get_counter.remote())
print("Counter1: {}, Counter2: {}".format(counter1_val, counter2_val))
time.sleep(1)
It has a number of advantages over the multiprocessing module:
- The same code runs on a single multi-core machine as well as a large cluster.
- Data is shared efficiently between processes on the same machine using shared memory and efficient serialization.
- You can parallelize Python functions (using tasks) and Python classes (using actors).
- Error messages are propagated nicely.
Ray is a framework I've been helping develop.