Get text-file word occurrence count of all words & print output sorted
I would use tr instead of awk:
echo "Lorem ipsum dolor sit sit amet et cetera." | tr '[:space:]' '[\n*]' | grep -v "^\s*$" | sort | uniq -c | sort -bnr
- tr just replaces spaces with newlines
- grep -v "^\s*$" trims out empty lines
- sort prepares the input for uniq
- uniq -c counts occurrences
- sort -bnr sorts in numeric reverse order while ignoring whitespace
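Run against the example input above, the pipeline prints something like this (the tie order among the count-1 words and the exact count padding depend on your sort and uniq implementations; note that punctuation stays attached, so cetera. keeps its period):

      2 sit
      1 amet
      1 cetera.
      1 dolor
      1 et
      1 ipsum
      1 Lorem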
Wow, it turned out to be a great command for counting swears:
find . -name "*.py" -exec cat {} \; | tr '[:space:]' '[\n*]' | grep -v "^\s*$" | sort | uniq -c | sort -bnr | grep fuck
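If you only care about one particular word, a shorter route (a sketch, assuming GNU grep for -r and --include) is to let grep find the matches and wc total them:

grep -rhow --include='*.py' 'fuck' . | wc -l

Here -o prints each match on its own line, -h suppresses the file names, -w restricts matches to whole words, and wc -l counts the resulting lines.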
- Split the input into words, one per line.
- Sort the resulting list of words (lines).
- Squash multiple occurrences.
- Sort by occurrence count.
To split the input into words, replace any character that you deem to be a word separator by a newline.
< input_file \
tr -sc '[:alpha:]' '[\n*]' |   # Add digits, -, ', ... if you consider
                               # them word constituents
sort |
uniq -c |
sort -nr
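For repeated use, the same pipeline can be wrapped in a small shell function (wordfreq is a hypothetical name, not anything standard; it takes the file to analyze as its first argument):

wordfreq() {
    tr -sc '[:alpha:]' '[\n*]' < "$1" |   # one word per line
    sort |                                # group identical words together
    uniq -c |                             # count each group
    sort -nr                              # most frequent first
}

wordfreq input_file | head   # show the ten most frequent words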
This doesn't use grep or awk, but it seems to do what you want:
for w in $(cat maxwell.txt); do echo "$w"; done | sort | uniq -c
2 a
1 A
1 an
1 command
1 considered
1 domain-specific
1 for
1 interpreter,
2 is
1 language.
1 line
1 of
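One caveat: the unquoted $(cat maxwell.txt) expansion also performs filename globbing, so a word such as * would be replaced by the files in the current directory. A minimal guard, assuming a POSIX shell, is to disable globbing in a subshell:

( set -f; for w in $(cat maxwell.txt); do echo "$w"; done ) | sort | uniq -c

set -f turns off pathname expansion, and the surrounding ( ... ) keeps that setting from leaking into your interactive shell.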