Get text-file word occurrence count of all words & print output sorted
I would use tr instead of awk:
echo "Lorem ipsum dolor sit sit amet et cetera." | tr '[:space:]' '[\n*]' | grep -v "^\s*$" | sort | uniq -c | sort -bnr
- tr just replaces spaces with newlines
- grep -v "^\s*$" trims out empty lines
- sort prepares the input for uniq
- uniq -c counts occurrences
- sort -bnr sorts in numeric reverse order while ignoring whitespace
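Run against the example input above, the pipeline prints something like this (the tie order among the count-1 words and the exact count padding depend on your sort and uniq implementations; note that punctuation stays attached, so cetera. keeps its period):

      2 sit
      1 amet
      1 cetera.
      1 dolor
      1 et
      1 ipsum
      1 Lorem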
Wow, it turned out to be a great command for counting swears:
find . -name "*.py" -exec cat {} \; | tr '[:space:]' '[\n*]' | grep -v "^\s*$" | sort | uniq -c | sort -bnr | grep fuck
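If you only care about one particular word, a shorter route (a sketch, assuming GNU grep for -r and --include) is to let grep find the matches and wc total them:

grep -rhow --include='*.py' 'fuck' . | wc -l

Here -o prints each match on its own line, -h suppresses the file names, -w restricts matches to whole words, and wc -l counts the resulting lines.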
- Split the input into words, one per line.
- Sort the resulting list of words (lines).
- Squash multiple occurrences.
- Sort by occurrence count.
To split the input into words, replace any character that you deem to be a word separator by a newline.
< input_file \
tr -sc '[:alpha:]' '[\n*]' |   # Add digits, -, ', ... if you consider
                               # them word constituents
sort |
uniq -c |
sort -nr
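For repeated use, the same pipeline can be wrapped in a small shell function (wordfreq is a hypothetical name, not anything standard; it takes the file to analyze as its first argument):

wordfreq() {
    tr -sc '[:alpha:]' '[\n*]' < "$1" |   # one word per line
    sort |                                # group identical words together
    uniq -c |                             # count each group
    sort -nr                              # most frequent first
}

wordfreq input_file | head   # show the ten most frequent words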
This doesn't use grep or awk, but it seems to do what you want:
for w in $(cat maxwell.txt); do echo "$w"; done | sort | uniq -c
2 a
1 A
1 an
1 command
1 considered
1 domain-specific
1 for
1 interpreter,
2 is
1 language.
1 line
1 of
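One caveat: the unquoted $(cat maxwell.txt) expansion also performs filename globbing, so a word such as * would be replaced by the files in the current directory. A minimal guard, assuming a POSIX shell, is to disable globbing in a subshell:

( set -f; for w in $(cat maxwell.txt); do echo "$w"; done ) | sort | uniq -c

set -f turns off pathname expansion, and the surrounding ( ... ) keeps that setting from leaking into your interactive shell.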