Find intersection of lines in two files
Simple comm + sort solution:
comm -12 <(sort file1) <(sort file2)
-12 suppresses columns 1 and 2 (lines unique to FILE1 and FILE2, respectively), thus outputting only the common lines that appear in both files.
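As a quick sketch with hypothetical sample data (the file contents below are made up for illustration):

```shell
# Hypothetical sample data; any two line-based files work the same way.
f1=$(mktemp); f2=$(mktemp)
printf '%s\n' banana apple cherry > "$f1"
printf '%s\n' cherry date apple   > "$f2"

# comm needs sorted input; process substitution sorts on the fly.
comm -12 <(sort "$f1") <(sort "$f2")
# prints:
# apple
# cherry

rm -f "$f1" "$f2"
```

Note that the output comes out in sort order, not in the original order of either file.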
In awk, this loads the first file fully into memory:
$ awk 'NR==FNR { lines[$0]=1; next } $0 in lines' file1 file2
67
102
Or, if you want to keep track of how many times a given line appears:
$ awk 'NR==FNR { lines[$0] += 1; next } lines[$0] {print; lines[$0] -= 1}' file1 file2
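For example, with made-up input where a line repeats, the counting variant prints a common line at most as many times as it appears in the first file:

```shell
# Hypothetical sample data chosen to show the duplicate handling.
f1=$(mktemp); f2=$(mktemp)
printf '%s\n' a a b   > "$f1"   # "a" appears twice in file1
printf '%s\n' a a a c > "$f2"   # "a" appears three times in file2

# Each match decrements the counter, so the third "a" is not printed.
awk 'NR==FNR { lines[$0] += 1; next } lines[$0] {print; lines[$0] -= 1}' "$f1" "$f2"
# prints:
# a
# a

rm -f "$f1" "$f2"
```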
join could do it too, though it requires the input files to be sorted, so you need to sort them first, and doing so loses the original ordering:
$ join <(sort file1) <(sort file2)
102
67
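A sketch with hypothetical numeric lines; like comm, join emits the result in sort order, which is lexicographic here, so 102 sorts before 67:

```shell
# Hypothetical sample data; default sort is lexicographic,
# which is all join requires ("102" < "67" because '1' < '6').
join <(printf '%s\n' 102 67 13 | sort) <(printf '%s\n' 67 500 102 | sort)
# prints:
# 102
# 67
```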
awk
awk 'NR==FNR { p[NR]=$0; next; }
{ for(val in p) if($0==p[val]) { delete p[val]; print; } }' file1 file2
This is a good solution because, for large files, it should be the fastest: it avoids printing the same entry more than once and skips re-checking an entry after it has been matched.
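A quick illustration with invented input: once an entry from file1 is matched and deleted, later duplicates in file2 no longer match it:

```shell
# Hypothetical sample data; "x" occurs twice in file2 but once in file1.
f1=$(mktemp); f2=$(mktemp)
printf '%s\n' x y z > "$f1"
printf '%s\n' z x x > "$f2"

awk 'NR==FNR { p[NR]=$0; next; }
     { for(val in p) if($0==p[val]) { delete p[val]; print; } }' "$f1" "$f2"
# prints:
# z
# x

rm -f "$f1" "$f2"
```

Unlike the sorting approaches, this preserves the order in which the common lines appear in file2.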
grep
grep -Fxf file1 file2
-F treats the patterns as fixed strings rather than regular expressions, -x matches whole lines only, and -f file1 reads the patterns from file1. This would output the same entry several times if it occurs more than once in file2.
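For instance, with hypothetical data where a line repeats, each occurrence in file2 is printed:

```shell
# Hypothetical sample data; "foo" occurs twice in file2.
f1=$(mktemp); f2=$(mktemp)
printf '%s\n' foo bar     > "$f1"
printf '%s\n' foo baz foo > "$f2"

grep -Fxf "$f1" "$f2"
# prints:
# foo
# foo

rm -f "$f1" "$f2"
```

Deduplicating file2 first with sort -u avoids the repeats, at the cost of the original ordering.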
sort
For fun (should be much slower than grep):
sort -u file1 >t1
sort -u file2 >t2
sort t1 t2 | uniq -d
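Put together with hypothetical sample data: after sort -u deduplicates each file, any line appearing twice in the merged stream must have come from both files, which is exactly what uniq -d reports:

```shell
# Hypothetical sample data.
f1=$(mktemp); f2=$(mktemp); t1=$(mktemp); t2=$(mktemp)
printf '%s\n' red blue blue green  > "$f1"
printf '%s\n' green red red yellow > "$f2"

sort -u "$f1" > "$t1"        # deduplicate so "blue blue" cannot fake a match
sort -u "$f2" > "$t2"
sort "$t1" "$t2" | uniq -d   # a duplicate must now come from both files
# prints:
# green
# red

rm -f "$f1" "$f2" "$t1" "$t2"
```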