Deleting billions of files from a directory while seeing the progress as well
You can use rm -v to have rm print one line per file deleted. This way you can see that rm is indeed working to delete files. But if you have billions of files then all you will see is that rm is still working. You will have no idea how many files are already deleted and how many are left.
The tool pv can help you with a progress estimation.
http://www.ivarch.com/programs/pv.shtml
Here is how you would invoke rm with pv, with example output:
$ rm -rv dirname | pv -l -s 1000 > logfile
562 0:00:07 [79,8 /s] [====================> ] 56% ETA 0:00:05
In this contrived example I told pv that there are 1000 files. The output from pv shows that 562 are already deleted, the elapsed time is 7 seconds, and the estimated time to completion is 5 seconds.
Some explanation:
- pv -l makes pv count by newlines instead of bytes.
- pv -s number tells pv what the total is so that it can give you an estimation.
- The redirect to logfile at the end is for clean output. Otherwise the status line from pv gets mixed up with the output from rm -v. Bonus: you will have a logfile of what was deleted. But beware, the file will get huge. You can also redirect to /dev/null if you don't need a log.
To get the number of files you can use this command:
$ find dirname | wc -l
This can also take a long time if there are billions of files. You can use pv here as well to see how much it has counted:
$ find dirname | pv -l | wc -l
278k 0:00:04 [56,8k/s] [ <=> ]
278044
Here it says that it took 4 seconds to count 278k files. The exact count at the end (278044) is the output from wc -l.
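The two steps (count first, then delete with an estimate) can be chained so the count is fed straight into pv -s. This is only a rough sketch: count_entries and delete_with_progress are names I made up for illustration, and it assumes pv is installed and that no filename contains a newline (otherwise the line counts will be off).

```shell
#!/bin/sh
# Hypothetical helpers, not standard tools.

# Print the number of entries (files and directories) under $1.
# find prints one line per entry, which matches what rm -rv will report.
count_entries() {
    echo "$(( $(find "$1" | wc -l) ))"
}

# Delete $1 recursively while pv shows count, rate, percentage and ETA.
delete_with_progress() {
    total=$(count_entries "$1")
    # rm -v prints one line per removed entry; pv -l counts those lines
    # and -s tells it the expected total. The listing itself is discarded.
    rm -rv -- "$1" | pv -l -s "$total" > /dev/null
}

# Usage (dirname is a placeholder):
#   delete_with_progress dirname
```

The count includes the top-level directory itself, and so does the output of rm -rv, so the percentage should end at 100%.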
If you don't want to wait for the counting then you can either guess the number of files or use pv without estimation:
$ rm -rv dirname | pv -l > logfile
This way you will have no estimate of the time to finish, but at least you will see how many files are already deleted. Redirect to /dev/null if you don't need the logfile.
Nitpick:
- Do you really need sudo?
- Usually rm -r is enough to delete recursively. No need for rm -f.
Check out lesmana's answer, it's much better than mine, especially the last pv example, which won't take much longer than the original silent rm if you specify /dev/null instead of logfile.
Assuming your rm supports the option (it probably does since you're running Linux), you can run it in verbose mode with -v:
sudo rm -rfv bolands-mills-mhcptz
As has been pointed out by a number of commenters, this could be very slow because of the amount of output being generated and displayed by the terminal. You could instead redirect the output to a file:
sudo rm -rfv bolands-mills-mhcptz > rm-trace.txt
and watch the size of rm-trace.txt.
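Since rm -v writes one line per removed entry, the line count of the trace file is a more direct progress number than its size in bytes. A minimal sketch, where deleted_so_far is a hypothetical helper name and filenames are assumed not to contain newlines:

```shell
#!/bin/sh
# Hypothetical helper: print how many deletions rm -v has logged so far.
deleted_so_far() {
    # One line per removed entry, so the line count is the deletion count.
    # The arithmetic expansion strips any padding some wc versions add.
    echo "$(( $(wc -l < "$1") ))"
}

# Usage, in a second terminal while rm is running:
#   deleted_so_far rm-trace.txt
#   watch -n 5 'wc -l < rm-trace.txt'
```

The number will lag a little behind reality, because rm buffers its output when it is writing to a file rather than a terminal.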
Another option is to watch the number of files on the filesystem decrease. In another terminal, run:
watch df -ih pathname
The used-inodes count will decrease as rm makes progress (unless the files mostly had multiple links, e.g. if the tree was created with cp -al). This tracks deletion progress in terms of the number of files (and directories). df without -i will track progress in terms of space used.
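If you want a rough deletion rate rather than eyeballing the watch output, you can sample the used-inode count twice and divide by the interval. The sketch below assumes GNU df (the --output=iused option is a GNU coreutils feature), and the function names are made up for illustration. Anything else creating or deleting files on the same filesystem will skew the number.

```shell
#!/bin/sh
# Hypothetical helpers; used_inodes assumes GNU df.

# Print the used-inode count of the filesystem containing $1.
used_inodes() {
    # --output=iused prints a header line followed by the count.
    df --output=iused "$1" | tail -n 1 | tr -d ' '
}

# Print entries freed per second given two samples and an interval:
# $1 = count before, $2 = count after, $3 = seconds between samples.
rate_per_second() {
    echo "$(( ($1 - $2) / $3 ))"
}

# Usage (pathname is a placeholder):
#   before=$(used_inodes pathname); sleep 10; after=$(used_inodes pathname)
#   rate_per_second "$before" "$after" 10
```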
You could also run iostat -x 4 to see I/O operations per second (as well as kiB/s, but that's not very relevant for pure metadata I/O).
If you get curious about which files rm is currently working on, you can attach strace to it and watch the unlink() or unlinkat() (and getdents) system calls spew onto your terminal, e.g. sudo strace -p $(pidof rm). You can ^C the strace to detach from rm without interrupting it.
I forget whether rm -r changes directory into the tree it's deleting; if so, you could look at /proc/<PID>/cwd. Its /proc/<PID>/fd will often have a directory fd open, so you could look at that to see what your rm process is currently working on.
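To make the /proc/<PID>/fd idea concrete, here is a small sketch that lists the directories a process currently has open. It is Linux-specific, open_dirs is a hypothetical helper name, and you need permission to read the target's /proc entries (so run it under sudo if rm was started with sudo).

```shell
#!/bin/sh
# Hypothetical helper: list the directories process $1 has open.
open_dirs() {
    for fd in /proc/"$1"/fd/*; do
        # Each entry is a symlink to whatever the descriptor refers to.
        target=$(readlink "$fd") || continue
        # Keep only descriptors that point at directories.
        if [ -d "$target" ]; then
            printf '%s\n' "$target"
        fi
    done
}

# Usage:
#   open_dirs "$(pidof rm)"
```

If more than one rm is running, pidof prints several PIDs, so pick one of them by hand.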