Difference in behavior between os.fork and multiprocessing.Process
The answer you are looking for is addressed in detail here. There is also an explanation of the differences between operating systems.
One big issue is that the fork system call does not exist on Windows, so you cannot use this method there. multiprocessing is a higher-level interface for running part of the current program in a separate process. Like forking, it gives the new process a copy of your process's current state; on Unix, it takes care of forking your program for you.
Therefore, where it is available, you can consider fork() a lower-level interface to forking a program, and the multiprocessing library a higher-level interface to forking.
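To see which of the two interfaces is usable on a given machine, a quick check along these lines may help. This is only a sketch: the helper names fork_available and describe_platform are made up for illustration, while hasattr(os, "fork") and multiprocessing.get_all_start_methods() are standard-library calls.

```python
import multiprocessing
import os
import sys


def fork_available() -> bool:
    """Return True if this platform exposes the fork system call as os.fork."""
    return hasattr(os, "fork")


def describe_platform() -> dict:
    """Collect the facts relevant to choosing between os.fork and multiprocessing."""
    return {
        "platform": sys.platform,
        "has_os_fork": fork_available(),
        # Start methods multiprocessing can use here, e.g. 'fork', 'spawn', 'forkserver'.
        "start_methods": multiprocessing.get_all_start_methods(),
    }


if __name__ == "__main__":
    for key, value in describe_platform().items():
        print(f"{key}: {value}")
```

On Windows you would expect has_os_fork to be False and the start methods to contain only 'spawn', which is why multiprocessing is the portable choice.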
To answer your question directly: there must be some side effect of external_process that makes the results differ depending on whether the code runs in series or at the same time. This is due to how you set up your code, not to any difference between os.fork and multiprocessing.Process on systems where os.fork is supported.
The only real differences between os.fork and multiprocessing.Process are portability and library overhead: os.fork is not supported on Windows, and the multiprocessing framework has to be loaded to make multiprocessing.Process work. This is because multiprocessing.Process calls os.fork (on Unix, when the fork start method is in use), as this answer backs up.
The important distinction, then, is that os.fork copies everything in the current process using Unix's forking, which means that at the time of forking both processes are identical apart from their PIDs. On Windows, this is emulated by rerunning all the module-level setup code outside the if __name__ == '__main__': guard, which is roughly the same as creating a subprocess using the subprocess library.
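The copy semantics described above can be checked with a small experiment. The following is a minimal sketch, assuming a Unix system where os.fork exists; the function name home_seen_by_child_and_parent is hypothetical, and the pipe is there only so the child can report back what it saw.

```python
import os


def home_seen_by_child_and_parent(child_value: str, parent_value: str) -> tuple:
    """Fork once and return (HOME as seen in the child, HOME as seen in the parent)."""
    original_home = os.environ.get("HOME")
    read_fd, write_fd = os.pipe()
    pid = os.fork()  # assumption: Unix only, os.fork does not exist on Windows
    if pid == 0:
        # Child: modify its own copy of the environment and report it.
        os.close(read_fd)
        os.environ["HOME"] = child_value
        os.write(write_fd, os.environ["HOME"].encode())
        os.close(write_fd)
        os._exit(0)  # exit immediately so the child never runs the parent's code
    # Parent: the child's assignment above is not visible here.
    os.close(write_fd)
    os.environ["HOME"] = parent_value
    with os.fdopen(read_fd, "rb") as reader:
        child_home = reader.read().decode()
    os.waitpid(pid, 0)
    parent_home = os.environ["HOME"]
    # Restore the caller's environment.
    if original_home is None:
        os.environ.pop("HOME", None)
    else:
        os.environ["HOME"] = original_home
    return child_home, parent_home


if __name__ == "__main__":
    print(home_seen_by_child_and_parent("rep1", "rep2"))
```

Each process ends up with its own value because the environment was duplicated at fork time and the two copies evolve independently afterwards.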
For you, the code snippets you provide do fairly different things, because in the second snippet you call external_function in main before you start the new process, so the two calls run in series, although in different processes. The pipe is also unnecessary, as it reproduces no functionality from the first snippet.
On Unix, the code snippets:
import os

pid = os.fork()
if pid == 0:
    # Child process
    os.environ['HOME'] = "rep1"
    external_function()
else:
    # Parent process
    os.environ['HOME'] = "rep2"
    external_function()
and:
import os
from multiprocessing import Process

def f():
    # Runs in the child process
    os.environ['HOME'] = "rep1"
    external_function()

if __name__ == '__main__':
    p = Process(target=f)
    p.start()
    # Runs in the parent process while the child is working
    os.environ['HOME'] = "rep2"
    external_function()
    p.join()
should do exactly the same thing, but with a little extra overhead from the included multiprocessing library.
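Since external_function is not defined in this answer, here is a self-contained sketch of the multiprocessing variant that you can actually run. It makes several assumptions: external_function is replaced by a hypothetical stand-in that just returns the HOME it sees, the helper names _child and run_with_process are invented, a queue is added only to get the child's result back, and the fork start method is requested explicitly because it is not the default on every Unix platform (macOS defaults to spawn since Python 3.8).

```python
import multiprocessing
import os


def external_function() -> str:
    """Hypothetical stand-in: report the HOME this process sees."""
    return os.environ["HOME"]


def _child(queue) -> None:
    # Runs in the child process, on the child's own copy of the environment.
    os.environ["HOME"] = "rep1"
    queue.put(external_function())


def run_with_process() -> tuple:
    """Return (child result, parent result) using multiprocessing.Process."""
    ctx = multiprocessing.get_context("fork")  # assumption: Unix with fork available
    queue = ctx.Queue()
    original_home = os.environ.get("HOME")
    p = ctx.Process(target=_child, args=(queue,))
    p.start()
    try:
        os.environ["HOME"] = "rep2"
        parent_result = external_function()
        child_result = queue.get(timeout=10)
        p.join()
    finally:
        # Restore the caller's environment.
        if original_home is None:
            os.environ.pop("HOME", None)
        else:
            os.environ["HOME"] = original_home
    return child_result, parent_result


if __name__ == "__main__":
    print(run_with_process())
```

With this stand-in, the child reports "rep1" and the parent reports "rep2", which is the same outcome the plain os.fork version produces. If your real external_function behaves differently between the two, the cause is likely a side effect inside it rather than the way the process was created.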
Without further information, we can't figure out what the issue is. If you can provide code that demonstrates the issue, that would help us help you.