Jupyter notebook never finishes processing using multiprocessing (Python 3)
To execute a function without having to write it into a separated file manually:
We can dynamically write the task to process into a temporary file, import it and execute the function.
from multiprocessing import Pool
from functools import partial
import inspect
def parallel_task(func, iterable, *params):
with open(f'./tmp_func.py', 'w') as file:
file.write(inspect.getsource(func).replace(func.__name__, "task"))
from tmp_func import task
if __name__ == '__main__':
func = partial(task, params)
pool = Pool(processes=8)
res = pool.map(func, iterable)
pool.close()
return res
else:
raise "Not in Jupyter Notebook"
We can then simply call it in a notebook cell like this:
def long_running_task(params, id):
# Heavy job here
return params, id
data_list = range(8)
for res in parallel_task(long_running_task, data_list, "a", 1, "b"):
print(res)
Ouput:
('a', 1, 'b') 0
('a', 1, 'b') 1
('a', 1, 'b') 2
('a', 1, 'b') 3
('a', 1, 'b') 4
('a', 1, 'b') 5
('a', 1, 'b') 6
('a', 1, 'b') 7
Note: If you're using Anaconda and if you want to see the progress of the heavy task, you can use print()
inside long_running_task()
. The content of the print will be displayed in the Anaconda Prompt console.
It seems that the problem in Jupyter notebook as in different ide is the design feature. Therefore, we have to write the function (prime_factor) into a different file and import the module. Furthermore, we have to take care of the adjustments. For example, in my case, I have coded the function into a file known as defs.py
def prime_factor(value):
factors = []
for divisor in range(2, value-1):
quotient, remainder = divmod(value, divisor)
if not remainder:
factors.extend(prime_factor(divisor))
factors.extend(prime_factor(quotient))
break
else:
factors = [value]
return factors
Then in the jupyter notebook I wrote the following lines
import multiprocessing
import random
from multiprocessing import Pool
import defs
if __name__ == '__main__':
pool = Pool()
to_factor = [ random.randint(100000, 50000000) for i in range(20)]
results = pool.map(defs.prime_factor, to_factor)
for value, factors in zip(to_factor, results):
print("The factors of {} are {}".format(value, factors))
This solved my problem