Fast alternative to grep -f
If you want a pure Perl option, read your query file's keys into a hash table, then check each line of standard input against those keys:
#!/usr/bin/env perl

use strict;
use warnings;

# Build a lookup table of keys from the query file.
my %keyring;
open my $keys_fh, "<", "file.contain.query.txt"
    or die "Can't open file.contain.query.txt: $!";
while (<$keys_fh>) {
    chomp;
    $keyring{$_} = 1;
}
close $keys_fh;

# Check the key from each line of standard input against the table.
while (<STDIN>) {
    chomp;
    # Assuming the search file is tab-delimited; change the delimiter as needed.
    my ($key) = split /\t/, $_;
    print "$_\n" if exists $keyring{$key};
}
You'd use it like so:
lookup.pl < file.to.search.txt
A hash table can take a fair amount of memory, but lookups are much faster (average constant time per key), which is handy here since you have ten times as many lines to look up as keys to store.
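For example, with made-up inputs (two keys, three tab-delimited lines; the contents here are purely illustrative), you'd see something like:

$ printf 'apple\ncherry\n' > file.contain.query.txt
$ printf 'apple\t12\nbanana\t7\ncherry\t3\n' | ./lookup.pl
apple	12
cherry	3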
If the files are already sorted on the join field (the first field, by default):
join file1 file2
If not:
join <(sort file1) <(sort file2)
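For example, with two small unsorted files (contents invented for illustration):

$ printf 'cherry 3\napple 12\n' > file1
$ printf 'banana yellow\napple red\ncherry dark\n' > file2
$ join <(sort file1) <(sort file2)
apple 12 red
cherry 3 dark

Note that the <(...) process substitution requires a shell such as bash or zsh.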
This Perl code may help you:
use strict;
use warnings;

open my $file1, "<", "file.contain.query.txt" or die $!;
open my $file2, "<", "file.to.search.in.txt" or die $!;

# Hash %KEYS holds the keys read from file.contain.query.txt.
my %KEYS = ();
while (my $line = <$file1>) {
    chomp $line;
    $KEYS{$line} = 1;
}

# Print each line of the search file whose first field is a known key.
while (my $line = <$file2>) {
    if ($line =~ /(\w+)\s+(\d+)/) {
        print "$1 $2\n" if $KEYS{$1};
    }
}

close $file1;
close $file2;
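Both input file names are hard-coded, so if you save the script as, say, filter.pl (a name chosen here for illustration), you'd simply run:

perl filter.pl > matched.txt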
If your queries are fixed strings, use grep -F -f. This is significantly faster than regex search.
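For example, with the file names used above:

grep -F -f file.contain.query.txt file.to.search.in.txt

Adding -w on top of that restricts hits to whole words, so a key like 1001 won't also match inside 10012.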