Why is it important to protect the main loop when using joblib.Parallel?

This is necessary because Windows doesn't have fork(). Because of this limitation, Python's multiprocessing machinery has to re-import your __main__ module in every child process it spawns, in order to re-create the parent's state in the child. This means that if the code that spawns new processes sits at module level, it is executed again in every child process, which then tries to spawn processes of its own, recursively. The if __name__ == "__main__" guard prevents code at module scope from being re-executed in the child processes.
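To see why the guard works, here is a simplified simulation, not a literal copy of what multiprocessing does. CPython's spawn start method re-runs the parent's main script in the child under the module name __mp_main__ instead of __main__, so module-level code runs again while anything inside the guard is skipped. The sketch uses runpy.run_path to execute a toy script under a chosen __name__; the names SCRIPT, simulate and toy_script.py are hypothetical, made up for illustration.

```python
import os
import runpy
import tempfile
import textwrap

# Hypothetical toy script: one unguarded statement, one guarded block.
SCRIPT = textwrap.dedent('''
    executed = []
    executed.append("module level")       # runs on every (re-)import
    if __name__ == "__main__":
        executed.append("guarded block")  # runs only when run as a script
''')

def simulate(run_name):
    """Run SCRIPT with __name__ set to run_name and report which parts ran."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "toy_script.py")
        with open(path, "w") as f:
            f.write(SCRIPT)
        # run_path returns the globals of the executed script.
        namespace = runpy.run_path(path, run_name=run_name)
    return namespace["executed"]

if __name__ == "__main__":
    print(simulate("__main__"))     # the parent: ['module level', 'guarded block']
    print(simulate("__mp_main__"))  # a spawned child: ['module level']
```

If the Parallel(...) call lived in the "module level" line instead of the guarded block, every spawned child would reach it again.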

This isn't necessary on Linux with the fork start method, because fork() creates a child process that starts with a copy of the parent's state, without re-importing the __main__ module.
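If you develop on Linux but want to check that a script also behaves on Windows, you can request the spawn start method explicitly, since it is available on every platform. A minimal sketch; the helper names available_start_methods and windows_like_context are hypothetical:

```python
import multiprocessing as mp

def available_start_methods():
    """Start methods this platform supports (Linux includes 'fork' and 'spawn')."""
    return mp.get_all_start_methods()

def windows_like_context():
    """A context whose processes are started with spawn, as on Windows.

    'spawn' is supported on every platform, so this reproduces the Windows
    re-import behaviour on Linux and macOS as well.
    """
    return mp.get_context("spawn")

if __name__ == "__main__":
    print(available_start_methods())
    ctx = windows_like_context()
    print(ctx.get_start_method())  # spawn
```

Passing such a context to concurrent.futures.ProcessPoolExecutor(mp_context=ctx) is one way to surface a missing guard while testing on Linux.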


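In case someone stumbles across this in 2021: thanks to the "loky" backend, which joblib uses by default since version 0.12, protecting the main loop is no longer required. On Windows it is still needed if you explicitly select backend="multiprocessing". See https://joblib.readthedocs.io/en/latest/parallel.html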
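A minimal sketch of both cases, assuming joblib 0.12 or later is installed. The function names square and run are hypothetical; Parallel, delayed and the backend names "loky" and "multiprocessing" come from joblib itself. Keeping the guard costs nothing with loky, so it is still a reasonable habit.

```python
from joblib import Parallel, delayed

def square(x):
    """Toy workload; any picklable function works."""
    return x * x

def run(n_jobs=2, backend="loky"):
    # "loky" is joblib's default backend since 0.12 and does not need the guard.
    # backend="multiprocessing" uses the stdlib pool and, on Windows, needs the
    # caller to sit under an `if __name__ == "__main__":` guard.
    return Parallel(n_jobs=n_jobs, backend=backend)(
        delayed(square)(i) for i in range(5)
    )

if __name__ == "__main__":
    print(run())                           # [0, 1, 4, 9, 16]
    print(run(backend="multiprocessing"))  # [0, 1, 4, 9, 16]
```

With n_jobs=1 joblib runs the calls sequentially in the current process, which is handy for debugging because no workers are spawned at all.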