How to delete duplicate lines in a file without sorting it in Unix?

awk '!seen[$0]++' file.txt

seen is an associative array that AWK will pass every line of the file to. If a line isn't in the array then seen[$0] will evaluate to false. The ! is the logical NOT operator and will invert the false to true. AWK will print the lines where the expression evaluates to true.

The ++ increments seen so that seen[$0] == 1 after the first time a line is found and then seen[$0] == 2, and so on. AWK evaluates everything but 0 and "" (empty string) to true. If a duplicate line is placed in seen then !seen[$0] will evaluate to false and the line will not be written to the output.

From http://sed.sourceforge.net/sed1line.txt: (Please don't ask me how this works ;-) )

 # delete duplicate, consecutive lines from a file (emulates "uniq").
 # First line in a set of duplicate lines is kept, rest are deleted.
 sed '$!N; /^\(.*\)\n\1$/!P; D'

 # delete duplicate, nonconsecutive lines from a file. Beware not to
 # overflow the buffer size of the hold space, or else use GNU sed.
 sed -n 'G; s/\n/&&/; /^\([ -~]*\n\).*\n\1/d; s/\n//; h; P'

How to delete duplicate lines in a file without sorting it in Unix?

Tags:

Unix

Scripting

Shell

Awk

Sed

Related

Recent Posts