Print lines in one file matching patterns in another file
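Try grep -Fwf file2 file1 > out
The -F option specifies plain (fixed-string) matching, so it should be faster because the regex engine is never engaged. The -f file2 option reads the patterns from file2, and -w restricts matches to whole words.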
Here's how to do it in awk:
awk 'NR==FNR{pats[$0]; next} $2 in pats' File2 File1
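To make the one-liner easier to follow, here is a commented sketch of the same script run on a few made-up lines. The sample data and the temporary directory are assumptions for illustration only (your real File1 and File2 will differ); the only thing taken from the question is that the ID sits in the second field.

```shell
# Hypothetical sample data, written to a scratch directory.
tmp=$(mktemp -d)
printf '%s\n' 'a scign000003 x' 'b scign0000031 y' 'c scign000007 z' > "$tmp/File1"
printf '%s\n' 'scign000003' 'scign000007' > "$tmp/File2"

awk '
    # NR==FNR is only true while reading the first file named (File2):
    # store each whole line as an array key, then skip to the next line.
    NR == FNR { pats[$0]; next }
    # For the second file (File1): print the line if field 2 is a stored key.
    $2 in pats
' "$tmp/File2" "$tmp/File1"
# prints:
#   a scign000003 x
#   c scign000007 z
```

The array lookup is a hash lookup, so each line of File1 costs one test no matter how many patterns File2 holds.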
Using a 60,000-line File1 (your File1 repeated 8000 times) and a 6,000-line File2 (yours repeated 1200 times):
$ time grep -Fwf File2 File1 > ou2
real 0m0.094s
user 0m0.031s
sys 0m0.062s
$ time awk 'NR==FNR{pats[$0]; next} $2 in pats' File2 File1 > ou1
real 0m0.094s
user 0m0.015s
sys 0m0.077s
$ diff ou1 ou2
i.e. the outputs are identical (the diff prints nothing) and the awk is about as fast as the grep. One thing to note, though: the awk solution lets you pick a specific field to match on, so if a string from File2 shows up in some other field of File1 you won't get a false match. It also compares the whole field at once, so if your target strings are of various lengths, "scign000003" will not match "scign0000031" (though -w gives grep similar protection in that case).
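A small sketch of that difference, using made-up lines (the sample rows and the scratch directory are assumptions, not your data). The third row carries the target string in field 3 rather than field 2:

```shell
# Hypothetical sample data, written to a scratch directory.
tmp=$(mktemp -d)
printf '%s\n' 'scign000003' > "$tmp/File2"
printf '%s\n' \
    'row1 scign000003 foo' \
    'row2 scign0000031 bar' \
    'row3 scign000099 scign000003' > "$tmp/File1"

# Fixed-string match without -w: substring hits count, so all 3 rows print.
grep -Ff "$tmp/File2" "$tmp/File1"

# With -w: row2 is rejected, but row3 still matches on its third field.
grep -Fwf "$tmp/File2" "$tmp/File1"

# awk on field 2 only: just row1 prints.
awk 'NR==FNR{pats[$0]; next} $2 in pats' "$tmp/File2" "$tmp/File1"
```

If your IDs can only ever appear in one column, the grep and the awk give the same result, as the empty diff above shows for the test files.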
For completeness, here's the timing for the other awk solution posted elsewhere in this thread, which loops over every stored pattern for every line of File1:
$ time awk 'BEGIN{i=0}FNR==NR{a[i++]=$1;next}{for(j=0;j<i;j++)if(index($0,a[j]))print $0}' File2 File1 > ou3
real 3m34.110s
user 3m30.850s
sys 0m1.263s
and here's the timing I get for the perl script Mark posted:
$ time ./go.pl > out2
real 0m0.203s
user 0m0.124s
sys 0m0.062s