How do I count the total number of words of all files in a directory (and its subdirectories)?
Using `find` to find all files, then concatenating them with `cat` and counting the words in the concatenated stream with `wc`:
find . -type f -exec cat {} + | wc -w
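If you only want certain files counted, you can add a name filter to `find` before the `-exec`; for example, assuming you only care about files ending in `.txt` (the pattern here is just an illustration):

find . -type f -name '*.txt' -exec cat {} + | wc -w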
The issue with your command is that `wc` will be called multiple times on batches of files if you have many thousands of files to process. In the command above, `cat` will be called multiple times on batches of files, but all output is sent to a single invocation of `wc`.
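For contrast, here is a sketch of the kind of command that runs into that batching problem (my guess at the shape of it, not necessarily your exact command):

find . -type f -exec wc -w {} +
# with enough files, find starts a new wc for each batch of arguments,
# so you get several separate "total" lines instead of one grand total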
If your `wc` has the `--files0-from` option, you can do this:
find . -type f -print0 | wc -w --files0-from=-
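On a hypothetical tree containing only `./README` and `./docs/guide.txt`, the output would look something like this (file names and counts are made up purely for illustration):

find . -type f -print0 | wc -w --files0-from=-
#   120 ./README
#  4312 ./docs/guide.txt
#  4432 total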
Explanation:
I found this solution by first reading the wc(1) man page to see what options were available for scanning multiple files. I found this:
--files0-from=F
read input from the files specified by NUL-terminated names in file F;
If F is - then read names from standard input
From using `find` before, I knew that it could generate the desired list of files and, with the `-print0` option, output the files as a list of NUL-terminated names.
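If you want to see that NUL-separated list yourself, you can translate the NULs back into newlines just for inspection (the actual pipeline keeps the NULs so that file names containing spaces or newlines stay intact):

find . -type f -print0 | tr '\0' '\n'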
Putting that together resulted in the command above. The `find` command searches the current directory (`.`) and all subdirectories for regular files (`-type f`) and prints their full path names to standard output, each name followed by a null character instead of the usual newline (`-print0`). That result is piped (`|`) into the standard input of `wc`, which reads that list of names from the specified file (`--files0-from=`), where `-` means standard input, and prints the number of words (`-w`) found in each file, followed by the total of all words found.
If all you are interested in is the grand total, you could append this to the command above:
| tail -1
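Putting those two pieces together, the whole thing for just the grand total is:

find . -type f -print0 | wc -w --files0-from=- | tail -1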