Why is it important to protect the main loop when using joblib.Parallel?
This is necessary because Windows doesn't have fork(). Because of this limitation, Windows needs to re-import your __main__ module in all the child processes it spawns, in order to re-create the parent's state in the child. This means that if you have the code that spawns the new process at the module level, it's going to be recursively executed in all the child processes. The if __name__ == "__main__" guard is used to prevent code at the module scope from being re-executed in the child processes.
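To make the re-import concrete, here is a minimal standard-library sketch. It assumes CPython's "spawn" start method, which re-runs the parent's script in the child under the module name "__mp_main__" instead of "__main__". The sketch uses runpy.run_path to imitate both cases. SCRIPT and run_as are hypothetical names used only for illustration.

```python
import os
import runpy
import tempfile

# Hypothetical script: module-level code plus a guarded "main" section.
SCRIPT = '''
executed = ["module level"]
if __name__ == "__main__":
    executed.append("main block (would spawn workers here)")
'''


def run_as(run_name):
    """Run SCRIPT from a temporary file under the given module name."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "script.py")
        with open(path, "w") as f:
            f.write(SCRIPT)
        # run_path returns the globals of the executed module.
        namespace = runpy.run_path(path, run_name=run_name)
    return namespace["executed"]


# Parent process: the script runs as __main__, so the guarded block executes.
print(run_as("__main__"))
# Spawned child: the script is re-run as __mp_main__, so only the
# module-level code executes and no further workers are started.
print(run_as("__mp_main__"))
```

Anything left unguarded at module level would run in both cases. That is why the process-spawning call belongs inside the guard.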
This isn't necessary on Linux because it does have fork(), which allows it to fork a child process that maintains the same state as the parent, without re-importing the __main__ module.
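To check which start methods your own platform offers, you can query multiprocessing directly. This is a small sketch, not joblib-specific. On Windows the list contains only "spawn", while Linux also lists "fork". The default also depends on the OS and Python version. For example, macOS has defaulted to "spawn" since Python 3.8.

```python
import multiprocessing as mp


def main():
    # Start methods this platform supports (only "spawn" on Windows).
    available = mp.get_all_start_methods()
    # The method multiprocessing uses when none is chosen explicitly.
    default = mp.get_start_method()
    print("available:", available)
    print("default:", default)
    return available, default


if __name__ == "__main__":
    main()
```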
In case someone stumbles across this in 2021: because of the "loky" backend, which joblib has used by default since version 0.12, protecting the main loop is no longer required. See https://joblib.readthedocs.io/en/latest/parallel.html
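For reference, here is a minimal sketch of the guarded pattern. It assumes joblib is installed. The math.sqrt task and n_jobs=2 are arbitrary choices. With the default loky backend the guard is harmless. It is still needed if you explicitly pick backend="multiprocessing" on Windows, so keeping it costs nothing.

```python
from math import sqrt

from joblib import Parallel, delayed


def main():
    # Each delayed(sqrt)(...) task is executed by one of the worker processes.
    results = Parallel(n_jobs=2)(delayed(sqrt)(i ** 2) for i in range(10))
    print(results)  # [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]
    return results


# Only the process started directly by the user runs main(). A child that
# re-imports this file (as multiprocessing's spawn does) sees a different
# __name__ and skips this block.
if __name__ == "__main__":
    main()
```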