How to remove duplicate lines with awk whilst keeping empty lines?

Another option is to check NF, e.g.:

awk '!NF || !seen[$0]++' file
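As a quick check, here is a minimal sketch that runs the NF version on a small input. The sample text and the wrapper names `sample_input` and `dedupe_nf` are hypothetical, made up for illustration; the awk program itself is the one above.

```sh
#!/bin/sh
# Hypothetical sample input: duplicated "a" and "b", two empty lines,
# and one whitespace-only line (three spaces).
sample_input() {
  printf 'a\nb\n\na\n\n   \nb\n'
}

# Keep every line with no fields (NF == 0); keep any other line only
# the first time it appears.
dedupe_nf() {
  awk '!NF || !seen[$0]++'
}

sample_input | dedupe_nf
```

This should print a, b, both empty lines and the whitespace-only line, while the second a and the second b are dropped.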

Alternatively, test for an empty line with a regex:

awk '!/./ || !seen[$0]++' file

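The main trick is the same in both: seen[$0]++ creates an entry in the seen associative array whose key is the current line ($0) and increments it, but the post-increment returns the value from before the increment. That value is 0 (false) the first time a line appears and non-zero afterwards, so !seen[$0]++ is true only for the first occurrence of each line. In the first command, !NF is true when the line has no fields, i.e. it is empty or contains only whitespace. In the second, /./ matches any line containing at least one character (spaces included), so !/./ matches only truly empty lines. Because || short-circuits, those empty lines are printed without touching seen, while every other line is printed only the first time it appears.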
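A minimal sketch of the two points above: the post-increment returning the old value, and the practical difference between !NF and !/./ when a whitespace-only line is repeated. The input and the `sample_input` helper are hypothetical; the line counts come from `wc -l`, whose padding may vary between systems.

```sh
#!/bin/sh
# 1) Post-increment returns the old value: 0 on first use, then 1, ...
awk 'BEGIN { print seen["k"]++; print seen["k"]++; print seen["k"] }'
# prints 0, 1 and 2 on separate lines

# 2) Hypothetical input: "x", a whitespace-only line (three spaces),
#    an empty line, the same two lines again, then "x" once more.
sample_input() {
  printf 'x\n   \n\n   \n\nx\n'
}

# !NF treats whitespace-only lines as empty, so both copies are kept (5 lines).
sample_input | awk '!NF || !seen[$0]++' | wc -l

# !/./ keeps only zero-length lines unconditionally; the repeated
# whitespace-only line goes through seen[] and is dropped (4 lines).
sample_input | awk '!/./ || !seen[$0]++' | wc -l
```

Which filter is right depends on whether lines containing only spaces or tabs should count as "empty" for your input.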

Here is another awk solution, similar to @Thor's answer. It is less concise, but it avoids updating the array for lines that have already been seen:

awk '!NF {print;next}; !($0 in a) {a[$0];print}' file

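With this, we only check whether $0 already exists as a key in a; the in operator tests membership without creating an entry. If the key does not exist, we create it (a bare a[$0] reference) and print the line. For a line that has already been seen, a[$0] is neither referenced nor assigned, so nothing is incremented.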
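A minimal sketch showing that the in test leaves the array untouched while a bare reference creates the element, followed by the command above on a small hypothetical input. The function names `in_demo` and `dedupe_in` are made up for illustration; keys are counted with a for-in loop to stay portable across awk implementations.

```sh
#!/bin/sh
# The "in" operator only tests membership; a bare a[key] reference
# creates the element.
in_demo() {
  awk 'BEGIN {
    if ("k" in a) { print "present" } else { print "absent" }
    n = 0; for (key in a) n++; print n   # still 0: the test created nothing
    a["k"]                                # bare reference creates the element
    n = 0; for (key in a) n++; print n   # now 1
  }'
}

# The command from the answer, wrapped for reuse.
dedupe_in() {
  awk '!NF {print;next}; !($0 in a) {a[$0];print}'
}

in_demo
printf 'a\n\na\nb\n\nb\n' | dedupe_in
```

Here in_demo should print absent, 0 and 1, and the dedupe call should print a, an empty line, b and another empty line.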
