How can a log program continue to log to a deleted file?
When you delete a file you really remove a link to the file (to the inode). If someone already has that file open, they get to keep the file descriptor they have. The file remains on disk, taking up space, and can be written to and read from if you have access to it.
The unlink() function is defined with this behaviour by POSIX:
When the file's link count becomes 0 and no process has the file open, the space occupied by the file shall be freed and the file shall no longer be accessible. If one or more processes have the file open when the last link is removed, the link shall be removed before unlink() returns, but the removal of the file contents shall be postponed until all references to the file are closed.
That behaviour is why the usual advice is to truncate a busy log file rather than delete it. The daemon will have the file open, and won't notice that it has been deleted (unless it is monitoring it specifically, which is uncommon). It will keep blithely writing to the file descriptor it already has: you'll keep taking up (more) space on disk, but you won't be able to see any of the messages it writes, so you're really in the worst of both worlds. If you truncate the file to zero length instead, the space is freed up immediately, and any new messages will be appended at the new end of the file where you can see them.
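You can see this for yourself from a shell (a minimal sketch; the file name and fd number 3 are arbitrary, and the /proc line is Linux-specific):

exec 3>> /tmp/app.log        # open the log and keep fd 3, as a daemon would
echo "message 1" >&3
rm /tmp/app.log              # removes the name only; fd 3 still works
echo "message 2" >&3         # still lands on disk, but you can no longer see it via the path
ls -l /proc/$$/fd/3          # on Linux this shows '/tmp/app.log (deleted)'
exec 3>&-                    # only once the last descriptor is closed is the space freed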
Eventually, when the daemon terminates or closes the file, the space will be freed up. Nobody new can open the file in the meantime (other than through system-specific reflective interfaces like Linux's /proc/x/fd/...). It's also guaranteed that:
If the link count of the file is 0, when all file descriptors associated with the file are closed, the space occupied by the file shall be freed and the file shall no longer be accessible.
So you don't lose your disk space permanently, but you don't gain anything by deleting the file and you lose access to new messages.
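Until then, on Linux, the data is still reachable through that reflective interface, so you can copy the messages out if you need them (a sketch; 4242 and 5 stand in for the daemon's PID and the descriptor number of the deleted log):

cat /proc/4242/fd/5 > /tmp/recovered.log   # read the deleted-but-still-open file's contents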
Exactly.
Files are tri-partite.
- The content, that is, a flat array of bytes, written somewhere on a disk or generated on-the-fly.
- The index node, or inode for short, which is a data structure populated and used by the kernel. It contains all the metadata (size, permission, etc.) about the file, and also pointers to the location of the content of the file.
- One or more directory entries, which are locations, manipulated as paths like /home/user/personal_file, which act as handles through which you can use the file, modify its content, change its metadata, etc.
When you open a file, you give the path to the operating system and it returns you a handle directly to the inode. With this handle, called a file descriptor, you can manipulate the file as you want (or at least, as permitted by the OS).
You can never delete an inode directly; you have to give the OS a path to request deletion. So, when you want to delete a file, you delete only a directory entry. If the file has other directory entries, it will remain accessible, and even if it does not, its inode will not be deleted while there are still file descriptors pointing to it. @MichaelHomer's answer is more technical and more detailed on this specific topic.
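A small demonstration of the directory-entry/inode distinction (file names are arbitrary):

echo hello > /tmp/a
ln /tmp/a /tmp/b      # a second directory entry pointing at the same inode
ls -i /tmp/a /tmp/b   # both names show the same inode number
rm /tmp/a             # removes one entry; the inode's link count drops from 2 to 1
cat /tmp/b            # prints 'hello' -- the content is untouched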
The other 2 answers explain the issue well - a file doesn't get "deleted" until all directory links to it and all open file descriptors to it are gone.
To avoid this, it's a good habit to use
> /var/log/bigfile
instead of
rm -f /var/log/bigfile
since that just resets the content to 0 bytes instead of deleting it, and you can still see what's written to it.
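In practice that looks something like this (paths and sizes are just examples; the bare > redirect assumes a Bourne-style shell):

du -h /var/log/bigfile      # say it reports 5.0G
> /var/log/bigfile          # truncate in place; the daemon's file descriptor stays valid
du -h /var/log/bigfile      # 0 -- the space is freed immediately
tail -f /var/log/bigfile    # and new messages still show up here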
If you already deleted the file, and are on Linux where you have the /proc filesystem (which exposes each process's open file descriptors under /proc/<pid>/fd), you can still use
> /proc/12345/fd/3
to zero the file's contents (assuming 12345 is the id of the process that has the file open and 3 is the fd number of the big file). This can be a life saver if your disk is running full and you can't kill the process that's writing your log file for some reason.
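If you don't know those two numbers, a look at /proc or lsof will find them (a sketch, assuming PID 12345 again):

ls -l /proc/12345/fd | grep deleted    # the big log shows up as '/var/log/bigfile (deleted)'
lsof -nP +L1                           # alternatively: list every open-but-unlinked file, with PID and FD columns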