How to find total filesize grouped by extension
On a GNU system:
find . -name '?*.*' -type f -printf '%b.%f\0' |
awk -F . -v RS='\0' '
{s[$NF] += $1; n[$NF]++}
END {for (e in s) printf "%15d %4d %s\n", s[e]*512, n[e], e}' |
sort -n
Or the same with perl, avoiding the -printf extension of GNU find (still using a GNU extension, -print0, but this one is more widely supported nowadays):
find . -name '?*.*' -type f -print0 |
perl -0lne '
# -l strips the trailing NUL from each record, so stat gets a clean path
if (@s = stat$_){
($ext = $_) =~ s/.*\.//s;
$s{$ext} += $s[12];
$n{$ext}++;
}
END {
for (sort{$s{$a} <=> $s{$b}} keys %s) {
printf "%15d %4d %s\n", $s{$_}<<9, $n{$_}, $_;
}
}'
It gives an output like:
          12288    1 pnm
          16384    4 gif
         204800    2 ico
        1040384   17 jpg
        2752512   83 png
If you want KiB, MiB... suffixes, pipe to numfmt --to=iec-i --suffix=B.
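For instance, here is a minimal sketch of that numfmt step, fed with two of the sample lines above instead of a real find run. The humanize_sizes name is made up for illustration, and GNU coreutils numfmt is assumed:

```shell
# humanize_sizes is a hypothetical wrapper name; it assumes GNU numfmt.
# It rewrites the first whitespace-separated field (the byte count)
# with IEC suffixes and leaves the remaining columns untouched.
humanize_sizes() {
  numfmt --to=iec-i --suffix=B --field=1
}

# Demo on sample lines in the same "%15d %4d %s" layout as above.
printf '%15d %4d %s\n' 12288 1 pnm 2752512 83 png | humanize_sizes
```

In practice you would append | humanize_sizes (or the numfmt command itself) to either pipeline above. Do the sort -n before numfmt, since the suffixed values no longer sort numerically.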
%b*512 gives the disk usage, but note that a file with several hard links is counted once per link, so you may see a discrepancy with what du reports.
Here is another solution:
find . -type f | egrep -o "\.[a-zA-Z0-9]+$" | sort -u | xargs -I '%' find . -type f -name "*%" -exec du -ch {} + -exec echo % \; | egrep "^\.[a-zA-Z0-9]+$|total$" | uniq | paste - -
The part that gets the extensions is:
find . -type f | egrep -o "\.[a-zA-Z0-9]+$" | sort -u
Next, for each extension, find the matching files, run du -ch on them, and print the extension as well:
xargs -I '%' find . -type f -name "*%" -exec du -ch {} + -exec echo % \;
Next, keep only the extension and total lines (uniq collapses the repeated extension lines):
egrep "^\.[a-zA-Z0-9]+$|total$" | uniq
and join each extension with its total on one line:
paste - -
Not as nice as Stephane's solution, but you could try
find . -type f -name "*.png" -print0 | xargs -0r du -ch | tail -n1
where you have to run this once for each file type.
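To save retyping it, the command can be wrapped in a small loop. This is a rough sketch, not a polished tool: total_for_exts is a made-up name, and it relies on the same GNU find/xargs options as the one-liner.

```shell
# total_for_exts is a hypothetical helper: the first argument is the
# directory, the remaining arguments are extensions without the leading dot.
# The body runs in a subshell so its variables do not leak into the caller.
total_for_exts() (
  dir=$1
  shift
  for ext in "$@"; do
    # Same pipeline as above; keep only the last line, du's grand total.
    total=$(find "$dir" -type f -name "*.$ext" -print0 |
      xargs -0r du -ch | tail -n1)
    # Print nothing for extensions that matched no file.
    if [ -n "$total" ]; then
      printf '%s\t%s\n' "$ext" "$total"
    fi
  done
)
```

For example, total_for_exts . png jpg gif prints one line per extension that has files. With a very large number of files, xargs may run du more than once, in which case tail -n1 only shows the total of the last batch.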