Why does "uniq" count identical words as different?

Try to sort first:

sort .temp_occ | uniq -c | sort -k1,1nr -k2 > distribution.txt
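Here is a small self-contained run of that pipeline. The file names sample_words.txt and distribution.txt and the word list are made up for illustration and stand in for .temp_occ. The first sort makes identical words adjacent, uniq -c counts each run, and the final sort orders by count (numeric, descending) and then alphabetically to break ties.

```shell
#!/bin/sh
# Hypothetical sample input standing in for .temp_occ: one word per line.
printf '%s\n' apple banana apple cherry banana apple > sample_words.txt

# 1. sort            -> identical words end up on adjacent lines
# 2. uniq -c         -> each run of identical lines is collapsed and counted
# 3. sort -k1,1nr -k2 -> highest count first, ties broken by the word itself
sort sample_words.txt | uniq -c | sort -k1,1nr -k2 > distribution.txt

cat distribution.txt
```

The exact width of the count column depends on your uniq implementation, but the ordering should be apple (3), banana (2), cherry (1).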

Or use "sort -u", which sorts and removes duplicates in one step. Note that it does not count occurrences, so keep "uniq -c" if you need the counts.
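A quick sketch comparing the two ways of getting distinct lines. The file names are hypothetical. For plain ASCII words like these, "sort -u" and "sort | uniq" should produce the same list; neither prints counts.

```shell
#!/bin/sh
# Hypothetical input with repeated words in no particular order.
printf '%s\n' b a b c a > words.txt

# One step: sort and drop duplicate lines.
sort -u words.txt > unique_words.txt

# Two steps: sort so duplicates are adjacent, then let uniq drop them.
sort words.txt | uniq > unique_words_2.txt

# cmp -s exits 0 when the two files are byte-for-byte identical.
cmp -s unique_words.txt unique_words_2.txt && echo "same output"
```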

The size of the file has nothing to do with what you're seeing. From the man page of uniq(1):

Note: 'uniq' does not detect repeated lines unless they are adjacent. You may want to sort the input first, or use 'sort -u' without 'uniq'. Also, comparisons honor the rules specified by 'LC_COLLATE'.
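Because the comparisons depend on LC_COLLATE, one option is to run both commands under the C locale so that sort orders lines by byte value and uniq compares them byte for byte, regardless of your own locale settings. A minimal sketch, with a made-up input file:

```shell
#!/bin/sh
# Hypothetical input mixing upper- and lowercase letters.
printf '%s\n' b a B a A > mixed.txt

# LC_ALL=C forces byte-wise ordering/comparison for both commands,
# so uppercase letters (smaller byte values) sort before lowercase ones.
LC_ALL=C sort mixed.txt | LC_ALL=C uniq -c > counts_c.txt

cat counts_c.txt
```

Under the C locale the output order is A, B, a, b, with "a" counted twice.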

So running uniq on

a
b
a

will return:

a
b
a
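The same three-line input can be used to see the effect of sorting first. This sketch just captures both results in shell variables (the variable names are arbitrary) and prints them on one line each.

```shell
#!/bin/sh
# Without sort: the two "a" lines are not adjacent, so uniq keeps both.
unsorted=$(printf 'a\nb\na\n' | uniq | tr '\n' ' ')

# With sort: the duplicates become adjacent and uniq collapses them.
sorted=$(printf 'a\nb\na\n' | sort | uniq | tr '\n' ' ')

echo "without sort: $unsorted"
echo "with sort:    $sorted"
```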
