What's the difference between python's multiprocessing and concurrent.futures?
You actually should use the if __name__ == "__main__" guard with ProcessPoolExecutor, too: it's using multiprocessing.Process to populate its Pool under the covers, just like multiprocessing.Pool does, so all the same caveats regarding picklability (especially on Windows), etc. apply.
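For illustration, a minimal sketch of the guard in use; the square worker function is invented for the example:

    import concurrent.futures

    def square(x):
        # Runs in a worker process; must be defined at module level
        # so it can be pickled and sent to the child.
        return x * x

    if __name__ == '__main__':
        # Without this guard, a spawned child (the default start method
        # on Windows) re-imports this module and would try to start its
        # own pool, which multiprocessing refuses with a RuntimeError.
        with concurrent.futures.ProcessPoolExecutor() as executor:
            print(list(executor.map(square, range(4))))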
I believe that ProcessPoolExecutor is meant to eventually replace multiprocessing.Pool, according to this statement made by Jesse Noller (a Python core contributor) when asked why Python has both APIs:
Brian and I need to work on the consolidation we intend(ed) to occur as people got comfortable with the APIs. My eventual goal is to remove anything but the basic multiprocessing.Process/Queue stuff out of MP and into concurrent.* and support threading backends for it.
For now, ProcessPoolExecutor is mostly doing the exact same thing as multiprocessing.Pool with a simpler (and more limited) API. If you can get away with using ProcessPoolExecutor, use that, because I think it's more likely to get enhancements in the long term. Note that you can use all the helpers from multiprocessing with ProcessPoolExecutor, like Lock, Queue, Manager, etc., so needing those isn't a reason to use multiprocessing.Pool.
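As a quick sketch of mixing the two APIs (worker and the doubling logic are invented for the example), a Manager-backed Queue works here because its proxy objects can be pickled and passed to pool workers:

    import multiprocessing
    import concurrent.futures

    def worker(item, queue):
        # The Manager-backed queue proxy is picklable, so it can be
        # passed to pool workers as an ordinary argument.
        queue.put(item * 2)

    if __name__ == '__main__':
        with multiprocessing.Manager() as manager:
            queue = manager.Queue()
            with concurrent.futures.ProcessPoolExecutor() as executor:
                futures = [executor.submit(worker, i, queue) for i in range(4)]
                for f in concurrent.futures.as_completed(futures):
                    f.result()  # propagate any worker exceptions
            while not queue.empty():
                print(queue.get())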
There are some notable differences in their APIs and behavior though:
- If a Process in a ProcessPoolExecutor terminates abruptly, a BrokenProcessPool exception is raised, aborting any calls waiting for the pool to do work and preventing new work from being submitted. If the same thing happens to a multiprocessing.Pool, it will silently replace the process that terminated, but the work that was being done in that process will never be completed, which will likely cause the calling code to hang forever waiting for the work to finish. (See the first sketch after this list.)
- If you are running Python 3.6 or lower, support for initializer/initargs is missing from ProcessPoolExecutor; support for this was only added in 3.7.
- There is no support in ProcessPoolExecutor for maxtasksperchild.
- concurrent.futures doesn't exist in Python 2.7, unless you manually install the backport.
- If you're running below Python 3.5, according to this question, multiprocessing.Pool.map outperforms ProcessPoolExecutor.map. Note that the performance difference is very small per work item, so you'll probably only notice a large performance difference if you're using map on a very large iterable. The reason for the difference is that multiprocessing.Pool batches the iterable passed to map into chunks, and then passes the chunks to the worker processes, which reduces the overhead of IPC between the parent and children. ProcessPoolExecutor always (or by default, starting in 3.5) passes one item from the iterable at a time to the children, which can lead to much slower performance with large iterables due to the increased IPC overhead. The good news is that this issue is fixed in Python 3.5, as the chunksize keyword argument has been added to ProcessPoolExecutor.map, which can be used to specify a larger chunk size when you know you're dealing with large iterables. See this bug for more info. (See the second sketch after this list.)
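First sketch: what the abrupt-termination difference looks like in practice. The kamikaze worker is invented for the example, and os._exit simulates a crash such as a segfault:

    import os
    import concurrent.futures
    from concurrent.futures.process import BrokenProcessPool

    def kamikaze(_):
        # Simulate an abrupt crash (e.g. a segfault) in a worker process.
        os._exit(1)

    if __name__ == '__main__':
        with concurrent.futures.ProcessPoolExecutor() as executor:
            future = executor.submit(kamikaze, None)
            try:
                future.result()
            except BrokenProcessPool:
                # The whole pool is now unusable. A multiprocessing.Pool
                # would instead silently replace the dead worker and hang
                # forever waiting on this job.
                print("pool is broken")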
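Second sketch: the chunksize keyword added in 3.5; the double function and the numbers are invented for the example:

    import concurrent.futures

    def double(x):
        return x * 2

    if __name__ == '__main__':
        items = range(100000)
        with concurrent.futures.ProcessPoolExecutor() as executor:
            # chunksize=1000 ships 1000 items per IPC round trip instead
            # of one at a time, mimicking multiprocessing.Pool.map's batching.
            results = list(executor.map(double, items, chunksize=1000))
        print(results[:5])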
if __name__ == '__main__': just means that the script was invoked from the command prompt using python <scriptname.py> [options], rather than imported with import <scriptname> in the Python shell.
When you invoke a script from the command prompt, the module's __name__ is set to '__main__', so the code under the guard runs. The second block,

    with ProcessPoolExecutor() as executor:
        result = executor.map(calculate, range(4))

is executed regardless of whether the script was invoked from the command prompt or imported from the shell, because it is not protected by the guard.
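For contrast, a runnable sketch with the block moved under the guard; the body of calculate is an assumption, since the question only shows its name:

    from concurrent.futures import ProcessPoolExecutor

    def calculate(x):
        # Placeholder body, assumed for the example.
        return x * x

    if __name__ == '__main__':
        # This block now runs only under `python scriptname.py`,
        # not when the module is imported.
        with ProcessPoolExecutor() as executor:
            result = executor.map(calculate, range(4))
        print(list(result))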