Recursive statistics on file types in a directory?
You could use find and uniq for this, e.g.:
$ find . -type f | sed 's/.*\.//' | sort | uniq -c
16 avi
29 jpg
136 mp3
3 mp4
Command explanation:
find recursively prints all filenames.
sed deletes from every filename the prefix up to the file extension.
uniq assumes sorted input.
-c does the counting (like a histogram).
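To see the pipeline at work on a known input, here is a minimal sketch that wraps it in a function and runs it on a small scratch tree. The function name ext_hist and the sample filenames are made up for illustration; they are not part of the answer above.

```shell
# Hypothetical helper: count file extensions below a directory (default: .).
ext_hist() {
    # sed strips everything up to the last dot; sort groups equal
    # extensions together so that uniq -c can count each run.
    find "${1:-.}" -type f | sed 's/.*\.//' | sort | uniq -c
}

# Build a small scratch tree to try it on, then clean up.
demo=$(mktemp -d)
mkdir "$demo/sub"
touch "$demo/a.jpg" "$demo/b.jpg" "$demo/sub/c.mp3"
ext_hist "$demo"   # reports a count of 2 for jpg and 1 for mp3
rm -rf "$demo"
```

Note that a file with no dot in its name is not dropped by this sed expression: whatever follows the last dot in its path (for ./README, that is /README) shows up as its own "extension".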
With zsh:
print -rl -- **/?*.*(D.:e) | uniq -c |sort -n
The pattern **/?*.* matches all files that have an extension, in the current directory and its subdirectories recursively. The glob qualifier D lets zsh traverse even hidden directories and consider hidden files, and the qualifier . selects only regular files. The history modifier :e retains only the file extension. print -rl prints one match per line. uniq -c counts consecutive identical items (the glob result is already sorted). The final call to sort sorts the extensions by use count.
This one-liner seems to be a fairly robust method:
find . -type f -printf '%f\n' | sed -r -n 's/.+(\..*)$/\1/p' | sort | uniq -c
The find . -type f -printf '%f\n' part prints the basename of every regular file in the tree, with no directories. That eliminates having to worry in your sed regex about directories which may have dots in their names.
The sed -r -n 's/.+(\..*)$/\1/p' part replaces the incoming filename with only its extension. E.g., .somefile.ext becomes .ext. Note the initial .+ in the regex: it means any match needs at least one character before the extension's dot. This prevents filenames like .gitignore from being treated as having no name at all and the extension '.gitignore', which is probably what you want. If not, replace the .+ with a .* instead.
The rest of the line is from the accepted answer.
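The difference between the .+ and .* forms is easiest to see on a handful of names. The sketch below is only an illustration: the function names and sample filenames are invented, and it uses sed -E (the portable spelling of GNU sed's -r) as an assumption about what your sed accepts.

```shell
# Sample names, one per line (hypothetical, for illustration only).
names() { printf '%s\n' .gitignore notes.txt archive.tar.gz README; }

# Strict form: at least one character must precede the extension's dot.
ext_strict() { sed -E -n 's/.+(\..*)$/\1/p'; }

# Lenient form: a name that starts with a dot counts as all extension.
ext_lenient() { sed -E -n 's/.*(\..*)$/\1/p'; }

names | ext_strict    # prints .txt and .gz
names | ext_lenient   # prints .gitignore, .txt and .gz
```

In both forms README produces no output, because -n suppresses lines on which the substitution did not match.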
Edit: If you want a nicely-sorted histogram in Pareto chart format, just add another sort to the end:
find . -type f -printf '%f\n' | sed -r -n 's/.+(\..*)$/\1/p' | sort | uniq -c | sort -bn
Sample output from a built Linux source tree:
1 .1992-1997
1 .1994-2004
1 .1995-2002
1 .1996-2002
1 .ac
1 .act2000
1 .AddingFirmware
1 .AdvancedTopics
[...]
1445 .S
2826 .o
2919 .cmd
3531 .txt
19290 .h
23480 .c
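The -printf action is a GNU find extension, so the one-liner above may not run as written on BSD or macOS. A possible variant, sketched here under the assumption that no filename contains a newline, strips the directory part with a second sed instead. The function name ext_pareto is hypothetical.

```shell
# Hypothetical helper: extension histogram without GNU-only find options.
ext_pareto() {
    find "${1:-.}" -type f |
        sed 's|.*/||' |                  # drop the directory part
        sed -E -n 's/.+(\..*)$/\1/p' |   # keep only the extension
        sort | uniq -c | sort -n         # count, then order by count
}
```

Calling ext_pareto with no argument works on the current directory; passing a path, e.g. ext_pareto /usr/src/linux, works on that tree instead.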