Find intersection of lines in two files
Simple comm + sort solution:
comm -12 <(sort file1) <(sort file2)
-12 suppresses columns 1 and 2 (lines unique to FILE1 and FILE2, respectively), thus outputting only the common lines that appear in both files.
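As a quick sketch with hypothetical sample data (the file contents below are made up for illustration):

```shell
# Hypothetical sample data; any two line-based files work the same way.
f1=$(mktemp); f2=$(mktemp)
printf '%s\n' banana apple cherry > "$f1"
printf '%s\n' cherry date apple   > "$f2"

# comm needs sorted input; process substitution sorts on the fly.
comm -12 <(sort "$f1") <(sort "$f2")
# prints:
# apple
# cherry

rm -f "$f1" "$f2"
```

Note that the output comes out in sort order, not in the original order of either file.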
In awk, this loads the first file fully into memory:
$ awk 'NR==FNR { lines[$0]=1; next } $0 in lines' file1 file2
67
102
Or, if you want to keep track of how many times a given line appears:
$ awk 'NR==FNR { lines[$0] += 1; next } lines[$0] {print; lines[$0] -= 1}' file1 file2
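For example, with made-up input where a line repeats, the counting variant prints a common line at most as many times as it appears in the first file:

```shell
# Hypothetical sample data chosen to show the duplicate handling.
f1=$(mktemp); f2=$(mktemp)
printf '%s\n' a a b   > "$f1"   # "a" appears twice in file1
printf '%s\n' a a a c > "$f2"   # "a" appears three times in file2

# Each match decrements the counter, so the third "a" is not printed.
awk 'NR==FNR { lines[$0] += 1; next } lines[$0] {print; lines[$0] -= 1}' "$f1" "$f2"
# prints:
# a
# a

rm -f "$f1" "$f2"
```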
join could do it too, though it requires the input files to be sorted, so you need to sort them first, and doing so loses the original ordering:
$ join <(sort file1) <(sort file2)
102
67
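A sketch with hypothetical numeric lines; like comm, join emits the result in sort order, which is lexicographic here, so 102 sorts before 67:

```shell
# Hypothetical sample data; default sort is lexicographic,
# which is all join requires ("102" < "67" because '1' < '6').
join <(printf '%s\n' 102 67 13 | sort) <(printf '%s\n' 67 500 102 | sort)
# prints:
# 102
# 67
```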
awk
awk 'NR==FNR { p[NR]=$0; next; }
{ for(val in p) if($0==p[val]) { delete p[val]; print; } }' file1 file2
This is a good solution because, for large files, it should be the fastest: it avoids printing the same entry more than once and skips re-checking an entry after it has been matched.
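A quick illustration with invented input: once an entry from file1 is matched and deleted, later duplicates in file2 no longer match it:

```shell
# Hypothetical sample data; "x" occurs twice in file2 but once in file1.
f1=$(mktemp); f2=$(mktemp)
printf '%s\n' x y z > "$f1"
printf '%s\n' z x x > "$f2"

awk 'NR==FNR { p[NR]=$0; next; }
     { for(val in p) if($0==p[val]) { delete p[val]; print; } }' "$f1" "$f2"
# prints:
# z
# x

rm -f "$f1" "$f2"
```

Unlike the sorting approaches, this preserves the order in which the common lines appear in file2.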
grep
grep -Fxf file1 file2
-F treats the patterns as fixed strings rather than regular expressions, -x matches whole lines only, and -f file1 reads the patterns from file1. This would output the same entry several times if it occurs more than once in file2.
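For instance, with hypothetical data where a line repeats, each occurrence in file2 is printed:

```shell
# Hypothetical sample data; "foo" occurs twice in file2.
f1=$(mktemp); f2=$(mktemp)
printf '%s\n' foo bar     > "$f1"
printf '%s\n' foo baz foo > "$f2"

grep -Fxf "$f1" "$f2"
# prints:
# foo
# foo

rm -f "$f1" "$f2"
```

Deduplicating file2 first with sort -u avoids the repeats, at the cost of the original ordering.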
sort
For fun (should be much slower than grep):
sort -u file1 >t1
sort -u file2 >t2
sort t1 t2 | uniq -d
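Put together with hypothetical sample data: after sort -u deduplicates each file, any line appearing twice in the merged stream must have come from both files, which is exactly what uniq -d reports:

```shell
# Hypothetical sample data.
f1=$(mktemp); f2=$(mktemp); t1=$(mktemp); t2=$(mktemp)
printf '%s\n' red blue blue green  > "$f1"
printf '%s\n' green red red yellow > "$f2"

sort -u "$f1" > "$t1"        # deduplicate so "blue blue" cannot fake a match
sort -u "$f2" > "$t2"
sort "$t1" "$t2" | uniq -d   # a duplicate must now come from both files
# prints:
# green
# red

rm -f "$f1" "$f2" "$t1" "$t2"
```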