What is the NSFS filesystem?
As described in the kernel commit log linked to by jiliagre above, the nsfs
filesystem is a virtual filesystem making Linux-kernel namespaces available. It is separate from the /proc
"proc" filesystem, where some process directory entries reference inodes in the nsfs
filesystem in order to show which namespaces a certain process (or thread) is currently using.
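To make the relationship between the /proc entries and the nsfs inodes concrete, here is a minimal sketch. The helper names ns_id and same_ns are made up for illustration, and it assumes a Linux system with GNU (or busybox) stat:

```shell
#!/usr/bin/env bash
# Print the namespace id (the nsfs inode number) of a process's namespace.
# Usage: ns_id TYPE [PID]   (TYPE is e.g. net, mnt, user; PID defaults to this shell)
ns_id() {
    local type=$1 pid=${2:-$$}
    # stat -L follows the /proc symlink and reports the inode of the nsfs file behind it
    stat -L -c %i "/proc/$pid/ns/$type"
}

# The link text has the form TYPE:[INODE] and carries the same number.
readlink "/proc/$$/ns/net"
ns_id net

# On one machine, two processes are in the same namespace when the ids match.
same_ns() {
    [ "$(ns_id "$1" "$2")" = "$(ns_id "$1" "$3")" ]
}

# A child process inherits the namespaces of its parent.
sleep 1 &
same_ns net $$ $! && echo "shell and its child share a network namespace"
wait
```

Reading the ns entries of a process owned by another user usually needs root, so the sketch only looks at the current shell and one of its own children.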
The nsfs
doesn't get listed in /proc/filesystems
(while proc
does), so it cannot be explicitly mounted. mount -t nsfs ./namespaces
fails with "unknown filesystem type". This is because nsfs
is tightly interwoven with the proc
filesystem.
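The difference in registration can be checked directly. This is a small sketch, with is_registered_fs being a made-up helper name; it only reads /proc/filesystems and does not try to mount anything:

```shell
#!/usr/bin/env bash
# Succeeds if the given filesystem type is listed in /proc/filesystems.
is_registered_fs() {
    # the type name is the last field on each line (an optional "nodev" precedes it)
    awk -v fs="$1" '$NF == fs { found = 1 } END { exit !found }' /proc/filesystems
}

for fs in proc nsfs; do
    if is_registered_fs "$fs"; then
        echo "$fs: listed in /proc/filesystems"
    else
        echo "$fs: not listed, so mount -t $fs is rejected"
    fi
done
```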
The filesystem type nsfs
only becomes visible via /proc/$PID/mountinfo
when bind-mounting an existing(!) namespace filesystem link to another target. As Stephen Kitt rightly suggests above, this is to keep namespaces existing even if no process is using them anymore.
For example, create a new network namespace, bind-mount it, then exit: the namespace still exists, but lsns
won't find it, since no process references it via /proc/$PID/ns
anymore; it only exists as a (bind) mount point.
# bind mount only needs an inode, not necessarily a directory ;)
touch mynetns
# create new network namespace, show its id and then bind-mount it, so it
# is kept existing after the unshare'd bash has terminated.
# output: net:[##########]
NS=$(sudo unshare -n bash -c "readlink /proc/self/ns/net && mount --bind /proc/self/ns/net mynetns") && echo "$NS"
# notice how lsns cannot see this namespace anymore: no match!
lsns -t net | grep "${NS:5:-1}" || echo "lsns: no match for net:[${NS:5:-1}]"
# however, findmnt does locate it on the nsfs...
findmnt -t nsfs | grep "${NS:5:-1}" || echo "findmnt: no match for net:[${NS:5:-1}]"
# output: /home/.../mynetns nsfs[net:[##########]] nsfs rw
# let the namespace go...
echo "unbinding + releasing network namespace"
sudo umount mynetns
findmnt -t nsfs | grep ${NS:5:-1} || echo "findmnt: no match for net:[${NS:5:-1}]"
# clean up
rm mynetns
Output should be similar to this one:
net:[4026532992]
lsns: no match for net:[4026532992]
/home/.../mynetns nsfs[net:[4026532992]] nsfs rw
unbinding + releasing network namespace
findmnt: no match for net:[4026532992]
Please note that it is not possible to create namespaces via the nsfs filesystem, only via the syscalls clone() (CLONE_NEW...
) and unshare. The nsfs
only reflects the current kernel status w.r.t. namespaces, but it cannot create or destroy them.
Namespaces are destroyed automatically once no reference to them is left: no processes (so no /proc/$PID/ns/...
) AND no bind-mounts (or open file descriptors) either, as we've explored in the above example.
That's the "Name Space File System", used by the setns
system call and, as its source code shows, namespace-related ioctls (e.g. NS_GET_USERNS
, NS_GET_OWNER_UID
...)
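From the shell, the usual way to reach setns is the nsenter tool from util-linux, which opens an nsfs file and hands the descriptor to setns. A minimal sketch, assuming nsenter and sudo are available; enter_netns is a made-up wrapper name and ./mynetns stands for a bind mount such as the one created in the other answer:

```shell
#!/usr/bin/env bash
# Run a command inside the network namespace referenced by an nsfs file.
# Usage: enter_netns NSFILE COMMAND [ARGS...]
# NSFILE can be a bind mount of a namespace or a /proc/PID/ns/net link.
enter_netns() {
    local nsfile=$1
    shift
    if [ ! -e "$nsfile" ]; then
        echo "enter_netns: $nsfile does not exist" >&2
        return 1
    fi
    # nsenter opens the file and passes the descriptor to setns(2);
    # joining a network namespace needs CAP_SYS_ADMIN, hence sudo.
    sudo nsenter --net="$nsfile" "$@"
}

# Example (not run here, requires root and an existing bind mount):
#   enter_netns ./mynetns readlink /proc/self/ns/net
```

If the namespace is still alive, the readlink in the example should print the same net:[...] id that was recorded when the namespace was created.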
NSFS
pseudo-file entries used to be provided by the /proc
file system until Linux 3.19. Here is the commit of this change.
See Stephen Kitt's comment for a possible explanation of this file's presence.