Python: How can I check the number of pending tasks in a multiprocessing.Pool?

No airtight way that I know of, but if you use the Pool.imap_unordered() function instead of map_async, you can intercept the elements that are processed.

import multiprocessing
import time

process_count = 4

def mytask(num):
    print('Started task, sleeping %s' % num)
    time.sleep(num)
    # Actually, you should return the job you've created here.
    return num

pool = multiprocess.Pool(process_count)
jobs  = []
items = [1,2,3,4,5,3,2,3,4,5,2,3,2,3,4,5,6,4]
job_count = 0
for job in pool.imap_unordered(mytask, items):
    jobs.append(job)
    job_count += 1

    incomplete = len(items) - job_count
    unsubmitted = max(0, incomplete - process_count)

    print "Jobs incomplete: %s. Unsubmitted: %s" % incomplete, unsubmitted

pool.close()

I'm subtracting process_count, because you can pretty much assume that all processes will be processing with one of two exceptions: 1) if you use an iterator, there may not be further items left to consume and process, and 2) You may have fewer than 4 items left. I didn't code in for the first exception. But it should be pretty easy to do so if you need to. Anyway, your example uses a list so you shouldn't have that problem.

Edit: I also realized you're using a While loop, which makes it look like you're trying to update something periodically, say, every half second or something. The code I gave as an example will not do it that way. I'm not sure if that's a problem.


You can check the number of pending jobs by seeing Pool._cache attribute assuming that you are using apply_async. This is where ApplyResult is stored until they are available and equals to the number of ApplyResults pending.

import multiprocessing as mp
import random
import time


def job():
    time.sleep(random.randint(1,10))
    print("job finished")

if __name__ == '__main__':
    pool = mp.Pool(5)
    for _ in range(10):
        pool.apply_async(job)

    while pool._cache:
        print("number of jobs pending: ", len(pool._cache))
        time.sleep(2)

    pool.close()
    pool.join()

Looks like jobs._number_left is what you want. _ indicates that it is an internal value that may change at the whim of the developers, but it seems to be the only way to get that info.