Why am I missing /var/run/sshd after every boot?
Solution 1:
One mistake you made was trying to start sshd by hand. If you start sshd through the official means instead, it should just work. The service command knows the correct way to start a service on your distribution, so this should work:
service ssh start
With sysv init scripts, that's all you need to do. The directory is missing because /var/run is a symlink to /run, and /run is a tmpfs mount point. That means /var/run starts out empty on each boot. When you use the service command, the /etc/init.d/ssh script is used to start sshd, and before doing so the script creates /var/run/sshd if it doesn't exist.
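The directory-creation step described above can be sketched roughly as follows. This is an illustration only, not the actual contents of /etc/init.d/ssh; the function name ensure_privsep_dir is made up for this example, and the 0755 mode is an assumption taken from the tmpfiles.d line quoted further down.

```shell
#!/bin/sh
# Hypothetical helper (not part of any package): create the sshd
# privilege separation directory if it does not exist yet.
ensure_privsep_dir() {
    dir="${1:-/var/run/sshd}"   # defaults to the path discussed in this thread
    if [ ! -d "$dir" ]; then
        # Mode 0755 is an assumption based on the tmpfiles.d entry.
        mkdir -p "$dir" && chmod 0755 "$dir"
    fi
}
```

If you really do want to start sshd by hand, calling ensure_privsep_dir as root first should give you the same starting point the init script provides.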
With systemd, things work a bit differently. There will be a file called /usr/lib/tmpfiles.d/sshd.conf with this content:
d /var/run/sshd 0755 root root
During boot this should cause the /var/run/sshd directory to be created. What you need to verify is that the file exists and has the correct contents. If the /var/run/sshd directory is still missing, you can check whether it gets created when you run systemd-tmpfiles --create manually.
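The verification steps could look something like the sketch below. The helper name has_tmpfiles_entry is made up for this example; the commented commands at the end are what one might run as root on the affected host.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the given tmpfiles.d config contains
# a "d" (create directory) line for the given path.
has_tmpfiles_entry() {
    conf="$1"
    dir="$2"
    # Matches lines such as: d /var/run/sshd 0755 root root
    grep -Eq "^d[[:space:]]+${dir}([[:space:]]|\$)" "$conf"
}

# On the affected host, as root, one might run:
#   has_tmpfiles_entry /usr/lib/tmpfiles.d/sshd.conf /var/run/sshd && echo "entry present"
#   systemd-tmpfiles --create /usr/lib/tmpfiles.d/sshd.conf
#   ls -ld /var/run/sshd
```

Passing the config file to systemd-tmpfiles limits the run to the directives in that one file, which keeps any error output short.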
Solution 2:
So /run (and /var/run, which is symlinked to it) gets recreated on every reboot, except that systemd-tmpfiles fails to do so for some paths, including (/var)/run/sshd.
Apparently, this is fixed by an OpenVZ kernel upgrade. But to fix it right now, edit /usr/lib/tmpfiles.d/sshd.conf and remove /var from the line d /var/run/sshd 0755 root root so that it reads instead:
d /run/sshd 0755 root root
And that's it!
When openssh-server gets upgraded, hopefully this bug will have been fixed (or is it really a bug in systemd, or in OpenVZ?). Otherwise the upgrade may overwrite the edited file and you could run into the same problem again.
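One way to keep the change across package upgrades might be to put the corrected line in /etc/tmpfiles.d/sshd.conf instead, since a file in /etc/tmpfiles.d takes precedence over a file of the same name in /usr/lib/tmpfiles.d. The following is only a sketch: the function name write_tmpfiles_override is made up, and I have not tested this on an affected OpenVZ kernel.

```shell
#!/bin/sh
# Hypothetical helper: copy a tmpfiles.d config to a new location,
# rewriting /var/run/ to /run/ in the path field of "d" lines.
write_tmpfiles_override() {
    src="$1"
    dest="$2"
    sed -E 's#^(d[[:space:]]+)/var/run/#\1/run/#' "$src" > "$dest"
}

# On the affected host, as root, one might run:
#   write_tmpfiles_override /usr/lib/tmpfiles.d/sshd.conf /etc/tmpfiles.d/sshd.conf
#   systemd-tmpfiles --create /etc/tmpfiles.d/sshd.conf
```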
Solution 3:
Apparently this is resolved by running OpenVZ kernel 2.6.32-042stab134.7 or newer. I find it strange that no fix seems possible in the systemd start scripts. An ugly hack, such as automatically creating /run/sshd/ at startup and only then starting sshd, would probably work.
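A hack along those lines could be sketched as a systemd drop-in that creates the directory right before sshd starts. Everything here is an assumption rather than something I have tested on an affected container: the function name write_sshd_dropin and the file name create-run-sshd.conf are made up, and the unit is assumed to be called ssh.service (as on Ubuntu); adjust if yours is sshd.service.

```shell
#!/bin/sh
# Hypothetical helper: write a drop-in that creates /run/sshd
# before the service's main process is started.
write_sshd_dropin() {
    dropin_dir="$1"   # e.g. /etc/systemd/system/ssh.service.d (unit name assumed)
    mkdir -p "$dropin_dir" || return 1
    cat > "$dropin_dir/create-run-sshd.conf" <<'EOF'
[Service]
ExecStartPre=/bin/mkdir -p /run/sshd
EOF
}

# On the affected host, as root, one might run:
#   write_sshd_dropin /etc/systemd/system/ssh.service.d
#   systemctl daemon-reload
#   systemctl restart ssh
```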
The output of my systemd-tmpfiles --create:
[/usr/lib/tmpfiles.d/var.conf:14] Duplicate line for path "/var/log", ignoring.
fchownat() of /run/named failed: Invalid argument
Failed to openat(/dev/simfs): Operation not permitted
Failed to validate path /var/run/screen: Too many levels of symbolic links
Failed to validate path /var/run/sshd: Too many levels of symbolic links
Failed to validate path /var/run/sudo: Too many levels of symbolic links
Failed to validate path /var/run/sudo/ts: Too many levels of symbolic links
fchownat() of /run/systemd/netif failed: Invalid argument
fchownat() of /run/systemd/netif/links failed: Invalid argument
fchownat() of /run/systemd/netif/leases failed: Invalid argument
fchownat() of /run/log/journal failed: Invalid argument
fchownat() of /run/log/journal/e9e1d08bc42c48999865b96c250f40cc failed: Invalid argument
fchownat() of /run/log/journal/e9e1d08bc42c48999865b96c250f40cc/system.journal failed: Invalid argument
The changelog of OpenVZ 2.6.32-042stab134.7 says this:
Running Ubuntu containers with systemd 229-4ubuntu21.9 could result in services failing to start because systemd-tmpfiles was unable to validate path due to symlinking issues. (PSBM-90038)
Solution 4:
For as much trouble as I've had with systemd over the years, I must admit this issue stems instead from the Ansible synchronize directive.
For some reason, after this host was provisioned with our Ansible scripts, the / directory (as well as /etc, /opt and others) was left owned by an admin user rather than root. After running chown to correct things, /var/run/sshd is created on boot again.
I really appreciate all the input, but there is no bug here: applying inappropriate ownership to root directories simply caused undefined system behavior.
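For anyone who wants to check for the same ownership problem, a quick sketch follows. The function name report_wrong_owner is made up for this example, and stat -c assumes GNU coreutils.

```shell
#!/bin/sh
# Hypothetical helper: print every given path whose owner differs
# from the expected user. Paths that cannot be stat'ed are skipped.
report_wrong_owner() {
    expected="$1"
    shift
    for path in "$@"; do
        owner="$(stat -c '%U' "$path")" || continue
        if [ "$owner" != "$expected" ]; then
            printf '%s is owned by %s, expected %s\n' "$path" "$owner" "$expected"
        fi
    done
}

# Example call for the directories mentioned above:
#   report_wrong_owner root / /etc /opt /var /run
```

Anything it reports can then be corrected with chown, as I did.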