Can someone explain in detail what "set -m" does?
Quoting the bash documentation (from man bash):
JOB CONTROL
Job control refers to the ability to selectively stop
(suspend) the execution of processes and continue (resume)
their execution at a later point. A user typically employs
this facility via an interactive interface supplied jointly
by the operating system kernel's terminal driver and bash.
So, simply put, having set -m (the default for interactive shells) allows one to use built-ins such as fg and bg, which would be disabled under set +m (the default for non-interactive shells).
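To illustrate, here is a rough interactive sketch (job numbers and the exact error text may vary):

    sleep 100 &        # start a background job; with set -m, bash reports it as job [1]
    jobs               # list the shell's jobs
    fg %1              # bring job 1 back to the foreground (Ctrl-Z suspends it again, bg %1 resumes it)
    set +m             # turn job control off
    fg %1              # now refused with an error along the lines of "fg: no job control"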
It's not obvious what the connection is between job control and killing background processes on exit, but I can confirm that there is one: running set -m; (sleep 10 ; touch control-on) & will create the file even if one quits the shell right after typing that command, whereas set +m; (sleep 10 ; touch control-off) & will not.
I think the answer lies in the rest of the documentation for set -m:
    -m      Monitor mode. [...] Background processes run in a separate
            process group and a line containing their exit status is
            printed upon their completion.
This means that background jobs started under set +m are not actual "background processes" ("Background processes are those whose process group ID differs from the terminal's"): they share the same process group ID as the shell that started them, rather than having their own process group like proper background processes. This explains the behavior observed when the shell quits before some of its background jobs: if I understand correctly, when quitting, a signal is sent to the processes in the same process group as the shell (thus killing background jobs started under set +m), but not to those of other process groups (thus leaving alone true background processes started under set -m).
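A quick way to see that difference for yourself (a sketch, assuming a Linux ps that accepts -o and a comma-separated pid list, run from a terminal):

    # non-interactive bash defaults to set +m: the job shares the shell's PGID
    bash -c 'sleep 5 & ps -o pid,pgid,comm -p "$$,$!"'
    # with set -m, the background job gets a process group of its own
    bash -c 'set -m; sleep 5 & ps -o pid,pgid,comm -p "$$,$!"'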
So, in your case, the startup.sh script presumably starts a background job. When this script is run non-interactively, such as over SSH as in the question you linked to, job control is disabled, the "background" job shares the process group of the remote shell, and is thus killed as soon as that shell exits. Conversely, by enabling job control in that shell, the background job acquires its own process group, and isn't killed when its parent shell exits.
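If that reading is correct, it should be observable over SSH along these lines (a sketch only; user@host is a placeholder, and the exact outcome can depend on the sshd and shell configuration):

    # forced pty, default set +m: the sleep shares the remote shell's process
    # group and is killed when the session ends
    ssh -t user@host 'sleep 60 &'
    # job control enabled first: the sleep gets its own process group and survives
    ssh -t user@host 'set -m; sleep 60 &'
    # check on the remote host with: pgrep -af sleep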
I've found this in a GitHub issue list, and I think it really answers your question.
It's not really an SSH problem, it's more the subtle behaviour around BASH non-interactive/interactive modes and signal propagation to process groups.
The following is based on https://stackoverflow.com/questions/14679178/why-does-ssh-wait-for-my-subshells-without-t-and-kill-them-with-t/14866774#14866774 and http://www.itp.uzh.ch/~dpotter/howto/daemonize, with some assumptions that aren't fully validated, but tests of how this works seem to confirm them.
pty/tty = false
The bash shell that is launched connects to the stdout/stderr/stdin of the started process and is kept running until there is nothing attached to the sockets and its children have exited. A good daemon process will ensure it doesn't wait for its children to exit: it forks a child process and then exits. In this mode no SIGHUP will be sent to the child process by SSH. I believe this will work correctly for most scripts executing a process that handles daemonizing itself and doesn't need to be backgrounded. Where init scripts use '&' to background a process, the main problem is likely to be whether the backgrounded process ever attempts to read from stdin, since that will trigger a SIGHUP if the session has been terminated.
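As a sketch of that last point (myservice and the log path are made-up names), an init script that has to background something itself can detach the job from the session's stdin/stdout/stderr so it never tries to read from a terminated session:

    # hypothetical service binary; all three fds are redirected away from the SSH session
    ./myservice </dev/null >>/var/log/myservice.log 2>&1 &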
pty/tty = true*
If the init script backgrounds the process it started, the parent BASH shell will return an exit code to the SSH connection, which will in turn look to exit immediately, since it isn't waiting on a child process to terminate and isn't blocked on stdout/stderr/stdin. This will cause a SIGHUP to be sent to the parent bash shell's process group, which, since job control is disabled in non-interactive mode in bash, will include the child processes just launched. Where a daemon process explicitly starts a new process session when forking, or in the forked process, then it or its children won't receive the SIGHUP from the BASH parent process exiting. Note this is different from suspended jobs, which will see a SIGTERM. I suspect the problem of this only working sometimes has to do with a slight race condition. If you look at the standard approach to daemonizing - http://www.itp.uzh.ch/~dpotter/howto/daemonize - you'll see that in the code the new session is created by the forked process, which may not have run before the parent exits, thus resulting in the random success/failure behaviour mentioned above. A sleep statement will allow enough time for the forked process to have created a new session, which is why it works in some cases.
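Two rough ways of dealing with that race (myservice is a made-up name; setsid here is the util-linux command):

    # Option A: rely on the daemon's own fork/setsid, but give the forked child
    # time to create its new session before the parent shell exits
    ./myservice &
    sleep 1
    # Option B: create the new session up front instead of relying on the daemon
    setsid ./myservice </dev/null >/dev/null 2>&1 &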
pty/tty = true and job control is explicitly enabled in bash
SSH won't connect to the stdout/stderr/stdin of the bash shell or any launched child processes, which means it will exit as soon as the parent bash shell it started has finished executing the requested commands. In this case, with job control explicitly enabled, any processes launched by the bash shell with '&' to background them will be placed into a separate process group immediately and will not receive the SIGHUP signal when the parent process of the BASH session exits (the SSH connection in this case).
What's needed to fix
I think the solutions just need to be explicitly mentioned in the run/sudo operations documentation as a special case when working with background processes/services. Basically either use 'pty=false', or where that is not possible, explicitly enable job control as the first command, and the behaviour will be correct.
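In plain shell terms, the two fixes suggested here look roughly like this (user@host is a placeholder, and this shows the underlying ssh behaviour rather than the exact Fabric options):

    # Option 1 (pty=false): no pseudo-terminal; detach the job from the session's fds
    ssh user@host 'nohup ./startup.sh </dev/null >/dev/null 2>&1 &'
    # Option 2: keep the pty, but enable job control first so the backgrounded
    # process gets its own process group and survives the SIGHUP
    ssh -t user@host 'set -m; ./startup.sh'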
From https://github.com/fabric/fabric/issues/395