uniq command not working properly?
You need to use sort before uniq:
find . -type f -exec md5sum {} ';' | sort | uniq -w 33
uniq only removes repeated lines that are adjacent. It does not re-order the lines looking for repeats; sort does that part.
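A small sketch of the difference, using made-up input rather than real md5sum output (the variable names are only for illustration):

```shell
# Made-up input in which the duplicate lines are not adjacent
input='b
a
b
a'

# uniq alone sees no adjacent repeats, so nothing is removed
unsorted=$(printf '%s\n' "$input" | uniq)

# sorting first places the duplicates next to each other
sorted=$(printf '%s\n' "$input" | sort | uniq)

printf 'uniq alone:\n%s\n' "$unsorted"
printf 'sort | uniq:\n%s\n' "$sorted"
```

With this input, uniq alone passes all four lines through unchanged, while sort | uniq leaves only a and b.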
This is documented in man uniq:
Note: 'uniq' does not detect repeated lines unless they are adjacent. You may want to sort the input first, or use 'sort -u' without 'uniq'.
The input for uniq needs to be sorted. So for the example case,
find . -type f -exec md5sum '{}' ';' | sort | uniq -w 33
would work. The -w (--check-chars=N) option makes uniq compare only the first N characters of each line; with -w 33 that is the 32-character MD5 checksum plus the space after it, so the lines are unique with respect to the first column only. This option works for this case, but the possibilities for telling uniq which parts of the line are relevant are limited. For example, there is no option to compare on columns 3 and 5 while ignoring column 4.
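To see what -w does without hashing real files, here is a sketch with made-up lines, assuming GNU uniq (which provides -w). The first 4 characters stand in for the checksum, and the paths are hypothetical:

```shell
# Made-up "checksum  path" lines; the 4-character prefix plays the role of the checksum
lines='aaaa  ./one
bbbb  ./two
aaaa  ./three'

# sort brings equal prefixes together; uniq -w 4 then compares only the first 4 characters
kept=$(printf '%s\n' "$lines" | sort | uniq -w 4)

printf '%s\n' "$kept"
```

Only the first line of each group with an equal prefix survives, so ./three is dropped because it shares its prefix with ./one.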
The sort command has its own option for unique output lines (-u), and the lines are unique with respect to the keys used for sorting. This means we can use the powerful key syntax of sort to define which part of the line has to be unique.
For the example,
find . -type f -exec md5sum '{}' ';' | sort -k 1,1 -u
gives one line per checksum as well (which of the duplicate files is listed may differ), but the sort part is more flexible for other uses.
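As a sketch of that flexibility, the "columns 3 and 5, ignoring column 4" case that uniq cannot express might look like this with sort keys. The records are made up, and whitespace-separated fields are assumed:

```shell
# Hypothetical records; only fields 3 and 5 should decide whether two lines are duplicates
data='x1 y1 A p1 B
x2 y2 A p2 B
x3 y3 A p3 C
x4 y4 D p4 B'

# -k 3,3 -k 5,5 restricts the comparison to fields 3 and 5; -u keeps one line per distinct key
result=$(printf '%s\n' "$data" | sort -k 3,3 -k 5,5 -u)

printf '%s\n' "$result"
```

The first two records share the key (A, B), so only one of them is kept and three lines remain. Which of the two survives should not be relied upon.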