How to remove duplicate lines with awk whilst keeping empty lines?
Another option is to check NF, e.g.:
awk '!NF || !seen[$0]++'
Alternatively:
awk '!/./ || !seen[$0]++' file
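Here is a quick way to compare the two commands on the same input. The sample input is made up for illustration, and sample is just a hypothetical helper function. The input includes lines holding a single space, which is where the two variants behave differently.

```shell
# Hypothetical helper: made-up input with a repeated "a", two empty lines
# and two lines that contain a single space.
sample() {
  printf 'a\n\na\n \n\n \nb\n'
}

# Variant 1: !NF is true for empty lines AND whitespace-only lines
# (default FS), so both space-only lines are kept.
sample | awk '!NF || !seen[$0]++'

echo '---'

# Variant 2: !/./ is true only for truly empty lines, so the second
# space-only line is dropped as a duplicate.
sample | awk '!/./ || !seen[$0]++'
```

Both drop the second "a". The only difference is the repeated space-only line, which the first command keeps and the second removes.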
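The main trick is the same: seen[$0]++ creates an entry in the seen associative array whose key is the current line ($0), and because it is a post-increment it returns the entry's old value (0 the first time) before incrementing it. Therefore, !seen[$0]++ is true the first time a line appears and false whenever the line has already been seen. The regex /./ matches any line containing at least one character, so !/./ matches only empty lines. Combined with || !seen[$0]++, the command prints every empty line plus the first occurrence of every other line, and drops all later duplicates. Since || short-circuits, empty lines are never even added to seen.
The two commands differ slightly: with the default field separator, !NF is also true for lines containing only spaces or tabs, so those lines are always kept too, whereas with !/./ a whitespace-only line is deduplicated like any other line.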
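To see the post-increment at work, this sketch prints the value seen[$0]++ returns for each line of some made-up input, then applies the filter itself.

```shell
# Print the value that seen[$0]++ yields for each line, then the line.
# Post-increment returns the old value: 0 on first sight, then 1, 2, ...
printf 'x\ny\nx\nx\n' | awk '{ n = seen[$0]++; print n, $0 }'

echo '---'

# !seen[$0]++ keeps exactly the lines for which that value was 0.
printf 'x\ny\nx\nx\n' | awk '!seen[$0]++'
```

The first command prints 0 x, 0 y, 1 x and 2 x, one per line. The second prints only x and y.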
Here is another awk solution, similar to @Thor's answer; it is less concise but potentially a little more efficient:
awk '!NF {print;next}; !($0 in a) {a[$0];print}' file
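With this, we only check whether a[$0] already exists. If it does not, we create it (a bare reference such as a[$0] adds the element to the array) and print the line. If it already exists, there is no further reference to or assignment of a[$0]. Blank (empty or whitespace-only) lines are handled first by !NF {print;next}, so they are always printed and never stored.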
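A sketch of the command above on made-up input, followed by a tiny BEGIN-only program illustrating the point about references: testing a key with in leaves the array untouched, whereas a bare a["k"] creates the element.

```shell
# The command above on made-up input: the repeated "a" and "b" are
# dropped, both empty lines are kept.
printf 'a\n\na\n\nb\nb\n' | awk '!NF {print;next}; !($0 in a) {a[$0];print}'

echo '---'

# "in" only tests for a key (no side effect), while a bare reference
# such as a["k"] creates the element with an empty value.
awk 'BEGIN {
  if ("k" in a) print "present"; else print "absent"
  a["k"]
  if ("k" in a) print "present"; else print "absent"
}'
```

The first part prints a, an empty line, another empty line and b; the second prints absent followed by present.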