Jupyter notebook never finishes processing using multiprocessing (Python 3)

To run a function in parallel without manually moving it into a separate file:

We can write the function's source to a temporary file at run time, import it from there, and pass the imported function to the pool.

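from multiprocessing import Pool
from functools import partial
import importlib
import inspect

def parallel_task(func, iterable, *params):
    # Only run from the notebook's top-level module.
    if __name__ != '__main__':
        raise RuntimeError("Not in Jupyter Notebook")

    # Save the function's source to a module that worker processes can import.
    # Every occurrence of the function name in the source is renamed to "task".
    with open('./tmp_func.py', 'w') as file:
        file.write(inspect.getsource(func).replace(func.__name__, "task"))

    # Reload so that a second call with a different function is not served
    # from the import cache.
    importlib.invalidate_caches()
    import tmp_func
    importlib.reload(tmp_func)

    # params arrives as a single tuple and is bound as the first argument.
    func = partial(tmp_func.task, params)
    with Pool(processes=8) as pool:
        return pool.map(func, iterable)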

We can then simply call it in a notebook cell like this:

def long_running_task(params, id):
    # Heavy job here
    return params, id

data_list = range(8)

for res in parallel_task(long_running_task, data_list, "a", 1, "b"):
    print(res)

Output:

(('a', 1, 'b'), 0)
(('a', 1, 'b'), 1)
(('a', 1, 'b'), 2)
(('a', 1, 'b'), 3)
(('a', 1, 'b'), 4)
(('a', 1, 'b'), 5)
(('a', 1, 'b'), 6)
(('a', 1, 'b'), 7)
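
Each result pairs the extra arguments with one item because parallel_task collects them with *params and binds that tuple as the first positional argument through partial. Here is a minimal sketch of just that binding step, with a plain loop standing in for pool.map; bind_params is a hypothetical helper name used only for illustration.

```python
from functools import partial

def long_running_task(params, id):
    # Stand-in for the heavy job: just echo the inputs.
    return params, id

def bind_params(func, *params):
    # *params is collected into one tuple, which partial then fixes
    # as the first positional argument of func.
    return partial(func, params)

bound = bind_params(long_running_task, "a", 1, "b")

# A serial loop in place of pool.map(bound, range(3)).
results = [bound(i) for i in range(3)]
for res in results:
    print(res)
```

Every element of results is a two-item tuple: the bound parameters, then the item taken from the iterable.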

Note: If you are using Anaconda and want to follow the progress of the heavy task, you can call print() inside long_running_task(). The printed text is displayed in the Anaconda Prompt console.

It seems that this behavior is by design, in Jupyter Notebook as in other IDEs. The workaround is to write the function (prime_factor here) into a separate file, import that file as a module, and adjust the calls accordingly. In my case, I put the function in a file named defs.py:

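def prime_factor(value):
    # Trial division: split on the first divisor found and recurse on both parts.
    factors = []
    for divisor in range(2, value - 1):
        quotient, remainder = divmod(value, divisor)
        if not remainder:
            factors.extend(prime_factor(divisor))
            factors.extend(prime_factor(quotient))
            break
    else:
        # The loop ended without finding a divisor, so value is prime.
        factors = [value]
    return factors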
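
Before handing prime_factor to a pool, it may help to check it serially on a few small inputs. The sketch below repeats a compact trial-division version so that it runs on its own, and verifies that the product of the returned factors reproduces each input. The test values are arbitrary choices for illustration, and math.prod assumes Python 3.8 or later.

```python
from math import prod

def prime_factor(value):
    # Split on the first divisor found and recurse on both parts.
    for divisor in range(2, value - 1):
        quotient, remainder = divmod(value, divisor)
        if not remainder:
            return prime_factor(divisor) + prime_factor(quotient)
    # No divisor was found, so value is prime.
    return [value]

# Arbitrary small inputs, including the edge cases 2, 3 and 4.
checks = {value: prime_factor(value) for value in (2, 3, 4, 97, 360)}

for value, factors in checks.items():
    # The factors must multiply back to the original value.
    print(value, factors, prod(factors) == value)
```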

Then, in the Jupyter notebook, I wrote the following lines:

import random
from multiprocessing import Pool
import defs

if __name__ == '__main__':
    to_factor = [random.randint(100000, 50000000) for i in range(20)]
    # The with block shuts the pool down once map has returned.
    with Pool() as pool:
        results = pool.map(defs.prime_factor, to_factor)
    for value, factors in zip(to_factor, results):
        print("The factors of {} are {}".format(value, factors))

This solved my problem.
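
A likely reason the separate module helps: multiprocessing sends the function to the workers with pickle, and pickle stores a function as a reference (module name plus function name), not as code. Each worker must therefore be able to import that module itself. That works for defs.py, but on platforms that start workers as fresh processes it does not work for a function typed into a notebook cell. Here is a minimal sketch of the by-reference behavior, using json.dumps purely as a stand-in for any importable function; describe_pickle is a hypothetical helper name.

```python
import json
import pickle

def describe_pickle(func):
    """Pickle func and return what was recorded, plus the restored object."""
    payload = pickle.dumps(func)
    restored = pickle.loads(payload)
    return func.__module__, func.__qualname__, payload, restored

module, name, payload, restored = describe_pickle(json.dumps)

# The payload holds the names needed to look the function up again,
# not the function's bytecode.
print(module, name)
print(name.encode() in payload)
# Unpickling imports the module and returns the very same function object.
print(restored is json.dumps)
```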

