Why doesn't SSH -t wait for background processes?
Without -t, sshd gets the stdout of the remote shell (and of children like sleep) and its stderr via two pipes (and also sends the client's input via another pipe).
sshd does wait for the process in which it has started the user's login shell, but after that process has terminated it also waits for EOF on the stdout pipe (though not on the stderr pipe, in the case of OpenSSH at least).
EOF happens only when no process holds a file descriptor open on the writing end of the pipe, which typically means that all the processes that didn't have their stdout redirected to something else are gone.
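The pipe behaviour can be reproduced locally without ssh. This is only a rough sketch: cat stands in for sshd reading the stdout pipe, and time_pipeline is a helper name made up for the illustration. It times how long the reader waits for EOF when a background sleep inherits the pipe and when its stdout is redirected elsewhere.

```shell
# Time a pipeline whose left side exits at once but leaves a background
# child behind. cat stands in for sshd: it reads until EOF on the pipe.
# (time_pipeline is a hypothetical helper, not part of any tool.)
time_pipeline() {
    start=$(date +%s)
    sh -c "$1" | cat >/dev/null
    end=$(date +%s)
    echo $((end - start))
}

# The background sleep inherits the pipe as its stdout, so EOF is delayed
# until sleep exits.
inherited=$(time_pipeline 'sleep 2 & echo started')

# Here sleep's stdout points elsewhere, so EOF arrives when the shell exits.
redirected=$(time_pipeline 'sleep 2 >/dev/null & echo started')

echo "stdout inherited:  ${inherited}s"
echo "stdout redirected: ${redirected}s"
```

The first pipeline should take about as long as the sleep, while the second should return almost immediately, which mirrors what ssh without -t does when the background command's stdout is or isn't redirected.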
When you use -t, sshd doesn't use pipes. Instead, all the interaction (stdin, stdout, stderr) with the remote shell and its children is done through one pseudo-terminal pair.
With a pseudo-terminal pair, for sshd interacting with the master side, there's no similar EOF handling. While at least some systems provide alternative ways to know whether there are still processes with fds open to the slave side of the pseudo-terminal (see @JdeBP's comment below), sshd doesn't use them, so it just waits for the termination of the process in which it executed the login shell of the remote user and then exits.
Upon that exit, the master side of the pty pair is closed which means the pty is destroyed, so processes controlled by the slave will receive a SIGHUP (which by default would terminate them).
Edit: that last part was incorrect, though the end result is the same. See @pynexj's answer for a correct description of what exactly happens.
Use wait:
ssh user@example -t 'sleep 2 & wait'
(Moved comments here to include more information.)
The SIGHUP part in the accepted answer is not correct.
Upon that exit, the master side of the pty pair is closed which means the pty is destroyed, so processes controlled by the slave will receive a SIGHUP.
This is not the case. According to POSIX, "If a modem disconnect is detected by the terminal interface for a controlling terminal [...] the SIGHUP signal shall be sent to the controlling process." For ssh -t 'sleep 2 &', it's the controlling process exiting that causes the tty disconnect, so SIGHUP cannot be sent to the controlling process since it's already dead. The sleep is actually killed by SIGHUP because, when the session leader exits, "the SIGHUP signal shall be sent to each process in the foreground process group".
The confusing part is sleep 2 &. Yes, it's a command running in the background, but it's not part of a background process group. Background process groups are a job control concept, and job control is disabled by default in a non-interactive shell (as in ssh ... 'sleep 2 &'). The sleep 2 & actually runs in the foreground process group. For example:
$ ssh -t localhost 'sleep 2 & ps jt'
PPID PID PGID SID TTY TPGID STAT UID TIME COMMAND
88819 88825 88825 88825 pts/36 88825 Ss+ 0 0:00 bash -c sleep 2 & ps jt
88825 88826 88825 88825 pts/36 88825 S+ 0 0:00 sleep 2
88825 88827 88825 88825 pts/36 88825 R+ 0 0:00 ps jt
As we can see, every process has the same PGID (88825), which is the PID of the bash shell, and TPGID is also 88825. That is to say, the background process sleep 2 & is also in this foreground process group.
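That job control is off by default in a non-interactive shell can be checked locally. The following is a small sketch, assuming a POSIX sh where the special parameter $- lists the active option letters and "m" stands for monitor mode (job control); the variable names are made up for the illustration.

```shell
# $- holds the shell's active option letters; "m" (monitor) means job
# control is enabled.
check='case $- in *m*) echo on ;; *) echo off ;; esac'

# A non-interactive shell, like the one started by ssh host 'command'.
default_state=$(sh -c "$check")
echo "job control by default: $default_state"

# The same shell after set -m. Some shells refuse to enable job control
# when no terminal is available, so this result depends on the environment.
enabled_state=$(sh -c "set -m 2>/dev/null; $check" 2>/dev/null)
echo "job control after set -m: $enabled_state"
```

The first line should report "off". The second may report "on" or "off" depending on the shell and on whether a terminal is attached, which is why it is left without an expected result.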
For comparison, see
$ pgrep -af sleep
$ ssh -t localhost 'set -m; sleep 123 & ps jt'
PPID PID PGID SID TTY TPGID STAT UID TIME COMMAND
89002 89008 89008 89008 pts/3 89010 Ss 0 0:00 bash -c set -m; sleep 123 & ps jt
89008 89009 89009 89008 pts/3 89010 S 0 0:00 sleep 123
89008 89010 89010 89008 pts/3 89010 R+ 0 0:00 ps jt
Connection to localhost closed.
$ ps j 89009
PPID PID PGID SID TTY TPGID STAT UID TIME COMMAND
1 89009 89009 89008 ? -1 S 0 0:00 sleep 123
$
As we can see, with job control enabled (set -m), sleep 123 & is running in its own process group (PGID 89009), which is a background process group. And after ssh terminates, the sleep is still running.
(See a similar scenario for more discussion: Expect + "ssh -f" does not work)