Optimizing GNU grep
No, there's no such thing. Generally, the cost of starting grep (forking a new process, loading the executable, shared libraries, dynamic linkage...) is much greater than that of compiling the regexps, so this kind of optimisation would make little sense.
Though see Why is matching 1250 strings against 90k patterns so slow? about a bug in some versions of GNU grep that would make it particularly slow for a great number of regexps.
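If you want a feel for how much of the per-invocation cost is fixed overhead rather than matching, a rough sketch (reusing the patterns file and chunk1 from the examples below) is to time grep on empty input and on a real chunk:

time grep -Ef patterns /dev/null            # start-up + pattern compilation only
time grep -Ef patterns chunk1 > /dev/null   # plus an actual search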
Possibly here, you could avoid running grep several times by feeding your chunks to the same grep instance, for instance by using it as a co-process and using a marker to detect the end of each chunk. With zsh, GNU grep and awk implementations other than mawk:
# one long-lived grep co-process; the extra -e pattern lets the marker line through
coproc grep -E -f patterns -e '^@@MARKER@@$' --line-buffered
process_chunk() {
  # feed the chunk plus the marker in the background, then read grep's
  # output up to the marker
  { cat; echo @@MARKER@@; } >&p &
  awk '$0 == "@@MARKER@@"{exit};1' <&p
}
process_chunk < chunk1 > chunk1.grepped
process_chunk < chunk2 > chunk2.grepped
Though it may be simpler to do the whole thing with awk or perl instead.
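For instance, a minimal awk sketch (reusing the patterns, chunk and .grepped names from above) that loads the patterns once and filters every chunk in a single process:

awk '
  NR == FNR { pats[$0]; next }            # first file: collect the patterns
  FNR == 1  { if (out) close(out); out = FILENAME ".grepped" }
  {
    for (p in pats)
      if ($0 ~ p) { print > out; next }   # ERE match, like grep -E
  }
' patterns chunk1 chunk2

Note that this tries the patterns one by one on each line, so with a very large pattern list it won't match the speed of grep's combined matching; a perl equivalent could join the patterns into a single compiled alternation instead.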
But if you don't need the grep output to go into different files for different chunks, you can always do:
{
  cat chunk1
  while ...; do wget -qO- ...; done # or whatever you use to fetch those chunks
  ...
} | grep -Ef patterns > output