How to create a frequency list of every word in a file?

You don't need sed or grep for this; tr, sort, uniq, and awk are enough:

% (tr ' ' '\n' | sort | uniq -c | awk '{print $2"@"$1}') <<EOF
This is a file with many words.
Some of the words appear more than once.
Some of the words only appear one time.
EOF

a@1
appear@2
file@1
is@1
many@1
more@1
of@2
once.@1
one@1
only@1
Some@2
than@1
the@2
This@1
time.@1
with@1
words@2
words.@1
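
If you would rather do the counting in a single awk process, something like the following should produce the same kind of word@count lines. This is only a sketch: count_words is a made-up helper name, and it splits on any run of whitespace (awk's default field splitting) rather than on single spaces, so empty "words" never appear.

```shell
# count_words is a hypothetical helper name; it reads text on stdin.
count_words() {
  # Tally every whitespace-separated field in an associative array,
  # then print word@count pairs. awk's "for (w in count)" order is
  # unspecified, so the result is piped through sort.
  awk '{ for (i = 1; i <= NF; i++) count[$i]++ }
       END { for (w in count) print w "@" count[w] }' | sort
}

printf 'b a b\n' | count_words
```

The trade-off is that awk keeps every distinct word in memory, whereas sort | uniq -c can spill to temporary files on very large inputs.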

In most cases you will also want to remove numbers and punctuation, convert everything to lowercase (otherwise "THE", "The", and "the" are counted separately), and suppress the entry for the zero-length word. For ASCII text, this modified command does all of that and also sorts the result by descending count:

sed -e 's/[^A-Za-z]/ /g' text.txt | tr 'A-Z' 'a-z' | tr ' ' '\n' | grep -v '^$' | sort | uniq -c | sort -rn
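
To try the normalized pipeline without a text.txt on disk, you can wrap it in a function that reads standard input. This is a sketch under assumptions: word_freq is a made-up name, and the input is the sample text from the first answer.

```shell
# word_freq is a hypothetical helper name; it reads text on stdin.
word_freq() {
  sed -e 's/[^A-Za-z]/ /g' |   # replace every non-letter with a space
    tr 'A-Z' 'a-z' |           # fold uppercase to lowercase
    tr ' ' '\n' |              # put one word on each line
    grep -v '^$' |             # drop the empty lines left by repeated spaces
    sort | uniq -c |           # count identical words
    sort -rn                   # highest count first
}

word_freq <<'EOF'
This is a file with many words.
Some of the words appear more than once.
Some of the words only appear one time.
EOF
```

With this input, "words" and "words." collapse into a single entry with a count of 3, and "Some" is counted as "some". The width of the count column printed by uniq -c differs between implementations, so pipe the result through awk '{print $1, $2}' if you need a fixed format.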

uniq -c already does what you want; you just need to sort the input first:

echo 'a s d s d a s d s a a d d s a s d d s a' | tr ' ' '\n' | sort | uniq -c

Output:

  6 a
  7 d
  7 s
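
The sort step matters because uniq only merges identical lines that are adjacent. A small comparison makes that visible (the four-line input here is made up for illustration):

```shell
# Without sort: the lone "a" and the later pair of "a" lines are separate runs,
# so uniq -c reports three groups (a, b, a).
printf 'a\nb\na\na\n' | uniq -c

# With sort: identical lines become adjacent first,
# so uniq -c reports two groups (a with count 3, b with count 1).
printf 'a\nb\na\na\n' | sort | uniq -c
```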
