Unix command to check if any two lines in a file are same?
Here is one way to get the exact output you're looking for:
$ grep -nFx "$(sort sentences.txt | uniq -d)" sentences.txt
1:This is sentence X
4:This is sentence X
Explanation: the inner $(sort sentences.txt | uniq -d) lists each line that occurs more than once. The outer grep then searches sentences.txt again for exact whole-line matches (-x) against those fixed strings (-F), and prepends each match's line number (-n).
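As a quick check, here is a minimal sketch of the whole pipeline, assuming a small hypothetical sentences.txt created on the spot:

```shell
# Create a hypothetical four-line sample file.
printf '%s\n' 'This is sentence X' 'This is sentence Y' \
              'This is sentence Z' 'This is sentence X' > sentences.txt

# uniq -d emits each duplicated value once; grep -nFx finds every
# exact, whole-line occurrence and prefixes its line number.
grep -nFx "$(sort sentences.txt | uniq -d)" sentences.txt
# → 1:This is sentence X
#   4:This is sentence X
```

If several different lines are duplicated, the command substitution hands grep one -F pattern per line, so all of them are reported.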
Not exactly what you want, but you can try combining sort and uniq -c -d:
aularon@aularon-laptop:~$ cat input
This is sentence X
This is sentence Y
This is sentence Z
This is sentence X
This is sentence A
This is sentence B
aularon@aularon-laptop:~$ sort input | uniq -cd
2 This is sentence X
aularon@aularon-laptop:~$
The 2 here is the number of occurrences found for the line. From man uniq:
-c, --count
prefix lines by the number of occurrences
-d, --repeated
only print duplicate lines
If the file contents fit in memory, awk is good for this. The standard one-liner in comp.lang.awk (I can't search for an instance from this machine, but there are several every month) to just detect that there is duplication is awk 'n[$0]++', which counts the occurrences of each line value and prints every occurrence other than the first, because the default action is print $0.
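For example, on a hypothetical file lines.txt, the one-liner reports only the repeat occurrences:

```shell
# Hypothetical sample: 'alpha' appears three times.
printf '%s\n' alpha beta alpha gamma alpha > lines.txt

# n[$0]++ is 0 (false) the first time a value is seen, truthy afterwards,
# so awk's default action (print) fires from the 2nd occurrence onward.
awk 'n[$0]++' lines.txt
# → alpha
#   alpha
```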
Showing all occurrences, including the first, in your format (though possibly in mixed order when more than one value is duplicated) gets a little more finicky:
awk <sentences.txt ' !($0 in n) {n[$0]=NR; next} \
n[$0] {print "Line "n[$0]":"$0; n[$0]=0} \
{print "Line "NR":"$0} '
It is shown on multiple lines for clarity; in real use you would usually run it together on one line.
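Applied to the six-line input from the uniq example above, the script behaves like this (a sketch; note that the first occurrence must be printed before its stored line number is zeroed):

```shell
# Recreate the six-line sample input.
printf '%s\n' 'This is sentence X' 'This is sentence Y' 'This is sentence Z' \
              'This is sentence X' 'This is sentence A' 'This is sentence B' > input

awk '!($0 in n) {n[$0]=NR; next}                    # first sighting: remember line number
     n[$0]      {print "Line "n[$0]":"$0; n[$0]=0}  # duplicate: emit the first occurrence once
                {print "Line "NR":"$0}' input       # emit the current occurrence
# → Line 1:This is sentence X
#   Line 4:This is sentence X
```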
If you do this often you can put the awk script in a file and run it with awk -f, or of course put the whole thing in a shell script. Like most simple awk, this can be done very similarly with perl -n[a].