Python: How can I check the number of pending tasks in a multiprocessing.Pool?
There's no airtight way that I know of, but if you use Pool.imap_unordered()
instead of map_async, you can intercept each result as it completes and keep a count.
import multiprocessing
import time

process_count = 4

def mytask(num):
    print('Started task, sleeping %s' % num)
    time.sleep(num)
    # Actually, you should return the job you've created here.
    return num

# The guard keeps worker processes from re-running this block on import.
if __name__ == '__main__':
    pool = multiprocessing.Pool(process_count)
    jobs = []
    items = [1,2,3,4,5,3,2,3,4,5,2,3,2,3,4,5,6,4]
    job_count = 0
    # Results arrive in completion order, one per finished task.
    for job in pool.imap_unordered(mytask, items):
        jobs.append(job)
        job_count += 1
        incomplete = len(items) - job_count
        unsubmitted = max(0, incomplete - process_count)
        print("Jobs incomplete: %s. Unsubmitted: %s" % (incomplete, unsubmitted))
    pool.close()
I'm subtracting process_count because you can pretty much assume that every worker process is busy, with two exceptions: 1) if you use an iterator, there may be no further items left to consume and process, and 2) you may have fewer than 4 items left. I didn't code for the first exception, but it should be pretty easy to do if you need to. Anyway, your example uses a list, so you shouldn't have that problem.
Edit: I also realized you're using a while loop, which makes it look like you're trying to update something periodically, say every half second. The code I gave as an example won't do that; it only reports when a task finishes. I'm not sure if that's a problem.
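If periodic updates are what you need, one possible workaround is to consume imap_unordered in a helper thread and poll the count from the main thread. This is only a rough sketch: run_with_progress, interval and report are names I made up for illustration, not part of multiprocessing, and it assumes the items fit in a list so the total is known up front.

```python
import multiprocessing
import threading
import time


def mytask(num):
    time.sleep(num)
    return num


def run_with_progress(pool, func, items, interval=0.5, report=print):
    """Hypothetical helper: report the number of incomplete jobs every
    `interval` seconds while imap_unordered is consumed in a thread."""
    items = list(items)  # assumption: the total number of items is known
    results = []

    def consume():
        # Results arrive in completion order, one per finished task.
        for result in pool.imap_unordered(func, items):
            results.append(result)

    worker = threading.Thread(target=consume)
    worker.start()
    while worker.is_alive():
        report(len(items) - len(results))
        worker.join(interval)  # wait up to `interval` seconds, then report again
    return results


if __name__ == '__main__':
    items = [0.3, 0.1, 0.4, 0.2, 0.5, 0.3, 0.2, 0.1]
    with multiprocessing.Pool(4) as pool:
        done = run_with_progress(
            pool, mytask, items,
            report=lambda n: print("Jobs incomplete: %s" % n))
    print("Finished %d jobs" % len(done))
```

Note that if a task raises, the helper thread stops early and the returned list is incomplete, so a real version would want to capture and re-raise that exception.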
You can check the number of pending jobs by inspecting the Pool._cache attribute, assuming that you are using apply_async. This is where each ApplyResult is stored until its result is available, so its length equals the number of ApplyResult objects still pending.
import multiprocessing as mp
import random
import time

def job():
    time.sleep(random.randint(1,10))
    print("job finished")

if __name__ == '__main__':
    pool = mp.Pool(5)
    for _ in range(10):
        pool.apply_async(job)

    while pool._cache:
        print("number of jobs pending: ", len(pool._cache))
        time.sleep(2)

    pool.close()
    pool.join()
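Since _cache is a private attribute, a variation that stays on the public API is to keep the objects returned by apply_async and count the ones whose ready() is still False. A minimal sketch, where count_pending and the job(seconds) signature are hypothetical names chosen for this example:

```python
import multiprocessing as mp
import time


def job(seconds):
    time.sleep(seconds)
    return seconds


def count_pending(results):
    """Hypothetical helper: how many AsyncResult objects are not ready yet."""
    return sum(1 for r in results if not r.ready())


if __name__ == '__main__':
    with mp.Pool(3) as pool:
        # Keep every AsyncResult so it can be polled later.
        results = [pool.apply_async(job, (0.2 * n,)) for n in range(1, 7)]
        while count_pending(results):
            print("number of jobs pending:", count_pending(results))
            time.sleep(0.3)
        print([r.get() for r in results])
```

The trade-off is that you have to hold on to every result object yourself, whereas _cache is maintained by the pool.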
Looks like jobs._number_left is what you want. The leading _ indicates that it is an internal value that may change at the whim of the developers, but it seems to be the only way to get that info.
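For what it's worth, here is a rough sketch of polling that counter, assuming jobs is the object returned by map_async. As far as I can tell from the CPython source, _number_left counts chunks rather than individual tasks, so the sketch passes chunksize=1 to make the two match. The remaining helper is a hypothetical wrapper that returns None if the private attribute ever disappears.

```python
import multiprocessing
import time


def mytask(num):
    time.sleep(num)
    return num


def remaining(async_result):
    """Hypothetical helper: best-effort read of the private counter.
    Returns None if the attribute does not exist."""
    return getattr(async_result, '_number_left', None)


if __name__ == '__main__':
    items = [0.1, 0.2, 0.3, 0.2, 0.1, 0.3, 0.2, 0.1]
    with multiprocessing.Pool(4) as pool:
        # chunksize=1 gives one chunk per item, so the counter tracks tasks.
        jobs = pool.map_async(mytask, items, chunksize=1)
        while not jobs.ready():
            print("Tasks left:", remaining(jobs))
            time.sleep(0.2)
        print(jobs.get())
```

With the default chunksize the number printed would be the count of unfinished chunks, which can be much smaller than the number of unfinished tasks.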