Why do open file handles to deleted files seemingly fill up the hard drive
Deleting a file in Unix simply removes a named reference to its data (hence the syscall name `unlink`/`unlinkat`, rather than "delete"). In order for the data itself to be freed, there must be no other references to it. References can be taken in a few ways:
- There must be no further references to this data on the filesystem (`st_nlink` must be 0) -- this can happen when hard linking. Otherwise, we'd drop the data while there's still a way to access it from the filesystem.
- There must be no further references to this data from open file handles (on Linux, the relevant `struct file`'s `f_count` in the kernel must be 0). Otherwise, the data could still be accessed or mutated by reading or writing to the file handle (or `/proc/pid/fd` on Linux), and we need somewhere to continue to store it.
Once both of these conditions are fulfilled, the data is eligible to be freed. As your case violates the second condition -- you still have open file handles -- the data continued to be stored on disk (since it has nowhere else to go) until the file handles were closed.
Some programs even use this in order to simplify cleaning up their data. For example, imagine a program which needs to have some large data stored on disk for intermediate work, but doesn't need to share it with others. If it opens and then immediately deletes that file, it can use it without having to worry about cleaning up on exit -- the open file descriptor reference count will naturally drop to 0 on `close(fd)` or exit, and the relevant space will be freed whether the program exits normally or not.
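A minimal shell sketch of this open-then-delete pattern (the scratch path comes from `mktemp`, and fd 3 is an arbitrary choice):

```shell
# Open a scratch file on fd 3, then immediately delete its name.
tmp=$(mktemp)
exec 3<> "$tmp"
rm "$tmp"

# The data is still fully usable through the open descriptor.
echo "scratch data" >&3

# On Linux it can even be read back via /proc (13 bytes incl. newline).
out=$(head -c 13 "/proc/$$/fd/3")
echo "$out"

# Closing the descriptor drops the last reference; the space is freed
# whether or not the script exits cleanly.
exec 3>&-
```

After the `rm`, no filename points at the data, but `st_nlink` being 0 isn't enough: the open fd keeps the data alive until `exec 3>&-` closes it.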
Detection
Deleted files which are still being held open by a file descriptor can be found with `lsof`, using something like the following:

```
% lsof -nP +L1
COMMAND     PID  USER  FD  TYPE DEVICE SIZE/OFF NLINK   NODE NAME
pulseaudi  1799 cdown  6u   REG    0,1 67108864     0   1025 /memfd:pulseaudio (deleted)
chrome    46460 cdown 45r   REG   0,27   131072     0 105357 /dev/shm/.com.google.Chrome.gL8tTh (deleted)
```
This lists all open files with an `st_nlink` value of less than one -- that is, deleted files which are still held open.
Mitigation
In your case you were able to close the file handles by terminating the process, which is a good solution if possible.
In cases where that isn't possible, on Linux you can access the data backed by the file descriptor via `/proc/pid/fd` and truncate it to size 0, even if the file has already been deleted:

```
: > "/proc/pid/fd/$num"
```
Note that, depending on what your application then does with this file descriptor, the application may be varying degrees of displeased about having the data changed out from under it like this.
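For example, here is a self-contained sketch that simulates a leaked descriptor in the current shell (using its own pid `$$` and fd 3; against a real process you'd substitute the pid and fd number found with `lsof`):

```shell
# Create a 16 MiB file, hold it open on fd 3, and delete its name --
# simulating a leaked descriptor to a deleted file.
tmp=$(mktemp)
dd if=/dev/zero of="$tmp" bs=1M count=16 status=none
exec 3<> "$tmp"
rm "$tmp"

# The deleted data still occupies disk space...
before=$(stat -Lc %s "/proc/$$/fd/3")

# ...until it is truncated through /proc, even with the name gone.
: > "/proc/$$/fd/3"
after=$(stat -Lc %s "/proc/$$/fd/3")

echo "before=$before after=$after"
exec 3>&-
```

The redirection `: >` opens the `/proc` path with `O_TRUNC`, which truncates the underlying (deleted) file and releases its blocks while the original fd stays open.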
If you are certain that the file descriptor has simply leaked and will not be accessed again, then you can also use `gdb` to close it. First, use `lsof -nP +L1` or `ls -l /proc/pid/fd` to find the relevant file descriptor number, and then:

```
% gdb -p pid --batch -ex 'call close(num)'
```
To answer your other question, although it's not the cause of your problem:

> Is the number of file [descriptors] limited?

The number of file descriptors is limited, but that's not the limit you're hitting here. "No space left on device" is `ENOSPC`, which is what we generate when your filesystem is out of space. If you were hitting a file descriptor limit, you'd receive `EMFILE` (process-level shortage, rendered by `strerror` as "Too many open files") or `ENFILE` (system-level shortage, rendered by `strerror` as "Too many open files in system") instead. The process-level soft limit can be inspected with `ulimit -Sn`, and the system-level limit can be viewed at `/proc/sys/fs/file-max`.
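You can see `EMFILE` in action by lowering the soft limit in a throwaway subshell (the limit of 4 and the fd numbers here are arbitrary choices for the demo):

```shell
# With a soft limit of 4, only fds 0-3 may exist in the subshell.
msg=$( (
    ulimit -Sn 4
    exec 3</dev/null    # fd 3 still fits under the limit
    exec 4</dev/null    # fd 4 exceeds it: open() fails with EMFILE
) 2>&1 )
echo "$msg"
```

The second `exec` fails and the shell reports `strerror(EMFILE)`, i.e. "Too many open files" -- a different failure mode from the `ENOSPC` you saw.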