Get the exact size of files retrieved by find output
Try piping the output of find to du and specifying the --files0-from - flag:
find -type f -mtime +60 -print0 | du -shc --files0-from -
This should give you a grand total at the end. To get just the total, pipe that output to tail -n1:
find -type f -mtime +60 -print0 | du -shc --files0-from - | tail -n1
I should mention that I only tested this on GNU/Linux, not BusyBox. Judging by the BusyBox documentation, its du does not appear to support the --files0-from option.
You can change the above command to this to have it work on busybox:
find -type f -mtime +60 -print0 | xargs -0 du -ch | tail -n1
The above also works with file names that contain spaces and newlines, but it may give a wrong result if find returns too many files: xargs then splits them across several du invocations, each printing its own total, and tail -n1 shows only the last one. See the comment below. If you think there may be too many files, try the other answer on this page.
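The splitting problem can be reproduced on a small scale. This is only a sketch: it forces xargs to batch two files at a time with -n 2 (in real use the batch size is determined by the command line length limit), and the directory and file names are made up for the demonstration.

```shell
# Create three throwaway files in a temporary directory (hypothetical names).
dir=$(mktemp -d)
for f in f1 f2 f3; do printf 'data\n' > "$dir/$f"; done

# -n 2 forces xargs to run du twice (2 files, then 1), which stands in for
# what happens when the argument list is too long for a single invocation.
batches=$(find "$dir" -type f -print0 |
  LC_ALL=C xargs -0 -n 2 du -ck |
  awk -F'\t' '$2 == "total" {n++} END {print n + 0}')

echo "du printed $batches total lines"
rm -rf "$dir"
```

With more than one total line, tail -n1 reports only the last batch, not the grand total.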
In principle this is easy: just tell find to run du on a bunch of files at once.
find . -type f -mtime +60 -exec du -smc {} +
Unfortunately this doesn't work reliably, because -exec … {} + can execute the command multiple times. It only tries to group the arguments, and it cannot possibly group all of them if their total length would exceed the system's command line length limit. In fact, BusyBox find (at least the version I tested just now) doesn't try grouping at all: -exec … {} + processes one argument at a time, like -exec … {} \;. There's no way to be sure to get a single total line.
GNU du can be told to read an arbitrarily long list of file names with --files0-from, but other versions of du, in particular the one in BusyBox, can only take file names from the command line.
So if you can't assume that you have GNU du, there's no way to avoid running du multiple times. This means you need another tool to do the summing, which in turn requires that du doesn't round the sizes. The summing is simple with awk if the output of du is parseable.
If you can assume that there are no newlines in file names, or you're ok with excluding paths that contain newlines, the output of du is easy to parse: just one file per line.
newline='
'
find . ! -path "*${newline}*" -type f -mtime +60 -exec du -k {} + |
awk '{kB += $1} END {printf "%d MB\n", (kB + 512) / 1024}'
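To see what the awk step does with the numbers, here is a small sketch that feeds it made-up du -k output (size in KiB, a tab, then a path) instead of running find. The three sizes and paths are hypothetical.

```shell
# 700 + 300 + 600 = 1600 KiB; adding 512 before dividing by 1024 rounds
# to the nearest MiB, and %d drops the remaining fraction.
total=$(printf '700\t./a\n300\t./b\n600\t./c\n' |
  awk '{kB += $1} END {printf "%d MB\n", (kB + 512) / 1024}')

echo "$total"   # prints "2 MB"
```

Because each line carries an unrounded KiB count, it does not matter how many times du was invoked to produce the lines.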
If you wanted the cumulative disk usage (as your usage of du suggests) of the regular files that are over 60 days old, and need only to be portable to GNU and busybox systems (though note that which commands are included in busybox and which features they support is configurable at build time, so you can never know if what works with one instance of busybox will work with the next), you can do:
find . -type f -mtime +59 -print0 |
xargs -r0 stat -c '%D:%i %b' | awk '
!seen[$1]++ {sum += $2}
END {print sum * 512}'
(And yes, you need -mtime +59 for files older than 60 x 24 hours. -mtime +60 would not match a file that is 60.9 days old, as its age is rounded down to 60 days and 60 is not greater than 60.)
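The rounding can be checked with a throwaway file that is 60.5 days old. This is a sketch that assumes touch accepts -d @EPOCH, as GNU touch does; the directory and file name are made up.

```shell
dir=$(mktemp -d)
now=$(date +%s)

# Set the mtime to 60 days and 12 hours ago (assumes touch -d @EPOCH works).
touch -d "@$((now - 60 * 86400 - 43200))" "$dir/old"

# The age is truncated to 60 whole days: "+60" needs more than 60,
# "+59" needs more than 59.
plus60=$(find "$dir" -type f -mtime +60)
plus59=$(find "$dir" -type f -mtime +59)

echo "-mtime +60 matches: ${plus60:-nothing}"
echo "-mtime +59 matches: ${plus59:-nothing}"
expected="$dir/old"
rm -rf "$dir"
```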
The find | xargs stat | awk pipeline reports the total in bytes. Hard links (or other cases such as bind-mounts where there may be several paths to the same file) are counted only once (like GNU du does; busybox du doesn't do it if the hard links are passed as separate arguments as opposed to found in the traversal of a single directory argument). However, like du, it won't detect the cases where some data is shared between non-hardlinked files, like when files have been copied with cp --reflink=always on filesystems like btrfs or when deduplication is performed by the file system.
That stat-based pipeline should be equivalent to the GNU-specific:
find . -type f -mtime +59 -print0 |
du -cB1 --files0-from=- |
awk 'END{print $1}'
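The effect of the !seen[$1]++ filter on hard links can be illustrated with a small sketch. It creates one file with two names in a temporary directory (hypothetical names, no -mtime test so that fresh files are included) and compares the deduplicated sum with a naive sum that counts every path.

```shell
dir=$(mktemp -d)
printf 'hello\n' > "$dir/a"
ln "$dir/a" "$dir/b"          # second name for the same inode

list() { find "$dir" -type f -print0 | xargs -r0 stat -c '%D:%i %b'; }

# Count each device:inode pair once, as in the pipeline above.
dedup=$(list | awk '!seen[$1]++ {sum += $2} END {print sum * 512}')
# Count every path, which double-counts the hard-linked file.
naive=$(list | awk '{sum += $2} END {print sum * 512}')

echo "deduplicated: $dedup bytes, naive: $naive bytes"
rm -rf "$dir"
```

The naive figure is twice the deduplicated one, since both paths refer to the same blocks.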
POSIXly, and assuming all files are on the same file system, you could do:
LC_ALL=C LS_BLOCK_SIZE=512 BLOCKSIZE=512 POSIXLY_CORRECT=1 \
find . -type f -mtime +59 -exec ls -nisqd {} + | awk '
!seen[$1]++ {sum += $2}
END {print sum * 512}'
(LS_BLOCK_SIZE=512 BLOCKSIZE=512 POSIXLY_CORRECT=1 are there to work around the fact that some ls implementations, like GNU ls, are not POSIX compliant by default. It won't work with busybox ls, which doesn't support -q. However, since busybox ls always renders newline characters in file paths as ? (which is also not POSIX compliant), -q is not needed there.)
After (here on a GNU system):
$ seq 10000 > a
$ truncate -s14T a
$ ln a b
$ touch -d '-60 days' a
$ BLOCKSIZE=1 ls -lis --full-time
total 98304
59944369 49152 -rw-rw-r-- 2 me me 15393162788864 2019-07-29 09:49:25.933 +0100 a
59944369 49152 -rw-rw-r-- 2 me me 15393162788864 2019-07-29 09:49:25.933 +0100 b
$ date --iso-8601=s
2019-09-27T09:50:03+01:00
$ du -h
52K .
All three commands give me 49152, which is the cumulative disk usage of both a and b, but is different from the sum of their sizes (28 TiB) or the sum of their disk usage (49152 x 2).
(Note that the 52K above also includes the disk usage of the current directory file (., 4 KiB in my case).)
For the sum of the apparent sizes:
find . -type f -mtime +59 -print0 |
xargs -r0 stat -c %s | awk -v sum=0 '
{sum += $0}; END{print sum}'
Or with GNU du:
find . -type f -mtime +59 -print0 |
du -cbl --files0-from=- |
awk 'END{print $1}'
Or POSIXly (here without the restriction about single file system):
LC_ALL=C find . -type f -mtime +59 -exec ls -nqd {} + |
awk -v sum=0 '{sum += $5}; END {print sum}'
On the above example, they all give 30786325577728 (28 TiB).
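The gap between apparent size and disk usage in the example above can be reproduced on a smaller scale with a sparse file. This is a sketch that assumes truncate is available and that the file system holding the temporary directory supports sparse files; the file name is made up.

```shell
dir=$(mktemp -d)
truncate -s 1M "$dir/sparse"      # 1 MiB long, no data written

apparent=$(stat -c %s "$dir/sparse")   # size in bytes
blocks=$(stat -c %b "$dir/sparse")     # allocated blocks (512 bytes each here)
disk=$((blocks * 512))

echo "apparent size: $apparent bytes, disk usage: $disk bytes"
rm -rf "$dir"
```

The %s figure is what the apparent-size commands add up, while the %b figure is what the disk-usage commands add up.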