How to make this search faster in fgrep/Ag?
Since you're using ack and The Silver Searcher (ag), it seems that you are OK with using additional tools.
A new tool in this space is ripgrep (rg). It is designed to be fast both at finding the files to search (like ag) and at searching the files themselves (like plain old GNU grep).
For the example in your question, you might use it something like this:
rg --files-with-matches --glob "*.tex" "and" "$HOME"
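One difference from a find/grep pipeline: by default rg skips hidden files and directories, skips paths listed in ignore files such as .gitignore, and does not follow symbolic links. If you want results closer to `find -L ... -exec grep -Fli`, a wrapper could look like the sketch below. The function name search_tex is hypothetical, and the fallback to find and grep when rg is not installed is an assumption added for portability.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list *.tex files under ROOT that contain PATTERN
# as a fixed string, ignoring case.
search_tex() {
    local pattern=$1 root=$2
    if command -v rg >/dev/null 2>&1; then
        # --follow: follow symlinks (like find -L)
        # --hidden / --no-ignore: don't skip dot-files or ignore-file entries
        rg --follow --hidden --no-ignore --ignore-case --fixed-strings \
           --files-with-matches --glob '*.tex' -- "$pattern" "$root"
    else
        # Fallback when ripgrep is not installed; with -L, -type f also
        # matches symlinks that point to regular files
        find -L "$root" -type f -name '*.tex' -exec grep -Fli -- "$pattern" {} +
    fi
}
```

Usage would then be something like search_tex and "$HOME" | vim -R -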
The author of ripgrep posted a detailed analysis of how the different searching tools work, along with benchmark comparisons.
One of the benchmarks, linux-literal-casei, is somewhat similar to the task you describe: it searches a large number of files in many nested directories (the Linux codebase) for a case-insensitive string literal.
In that benchmark, the fastest runs came from tools using a whitelist (like your "*.tex" example): rg had the best single sample time, and the ucg tool had a marginally better mean time.
rg (ignore)           0.345 +/- 0.073 (lines: 370)
rg (ignore) (mmap)    1.612 +/- 0.011 (lines: 370)
ag (ignore) (mmap)    1.609 +/- 0.015 (lines: 370)
pt (ignore)          17.204 +/- 0.126 (lines: 370)
sift (ignore)         0.805 +/- 0.005 (lines: 370)
git grep (ignore)     0.343 +/- 0.007 (lines: 370)
rg (whitelist)        0.222 +/- 0.021 (lines: 370) +
ucg (whitelist)       0.217 +/- 0.006 (lines: 370) *

*: best mean time
+: best sample time
The author excluded ack from the benchmarks because it was much slower than the others.
You could probably make it a little faster by running multiple find calls in parallel. For example, first get all top-level directories and run N find calls, one for each directory. If you run them in a subshell, you can collect the output and pass it to vim or anything else:
shopt -s dotglob ## So the glob also finds hidden dirs
( for dir in "$HOME"/*/; do
    find -L "$dir" -xtype f -name "*.tex" -exec grep -Fli and {} + &
done
) | vim -R -
Or, to make the subshell explicitly wait until all the background find calls have finished:
( for dir in "$HOME"/*/; do
    find -L "$dir" -xtype f -name "*.tex" -exec grep -Fli and {} + &
done; wait
) | vim -R -
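The loops above start one background find per top-level directory with no upper limit, and the "$HOME"/*/ glob never looks at .tex files sitting directly in $HOME. A variant that caps the number of concurrent find processes with xargs -P (a swapped-in technique, not the approach measured below) and also searches the top-level files could look like this sketch. The function name parallel_tex_search and its default of 4 jobs are hypothetical; it assumes GNU find (for -xtype) and an xargs that supports -0 and -P.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run one `find ... -exec grep -Fli` per top-level
# directory of ROOT, at most JOBS at a time, plus one find for the files
# directly inside ROOT. Output order is not deterministic.
parallel_tex_search() {
    local pattern=$1 root=$2 jobs=${3:-4}

    # Files directly inside ROOT, without descending into subdirectories
    find -L "$root" -maxdepth 1 -xtype f -name '*.tex' \
        -exec grep -Fli -- "$pattern" {} +

    # Top-level directories (hidden ones included, so no dotglob needed),
    # passed NUL-separated to xargs, one directory per find process.
    # Inside sh -c: $1 is the pattern, $2 is the directory from xargs.
    find -L "$root" -mindepth 1 -maxdepth 1 -type d -print0 |
        xargs -0 -n 1 -P "$jobs" sh -c \
            'find -L "$2" -xtype f -name "*.tex" -exec grep -Fli -- "$1" {} +' \
            sh "$pattern"
}
```

For example: parallel_tex_search and "$HOME" 8 | vim -R -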
I ran a few tests, and the above was indeed slightly faster than the single find. On average over 10 runs, the single find call took 0.898 seconds, while the subshell running one find per directory took 0.628 seconds.
I assume the details will always depend on how many directories you have in $HOME, how many of them could contain .tex files and how many might match, so your mileage may vary.
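Since the numbers depend on your own directory layout, you could repeat a 10-run comparison on your machine. A minimal sketch, assuming bash and GNU date (for the %N nanoseconds format); the function name mean_ms is hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical helper: run a command RUNS times and print the mean
# wall-clock time per run in whole milliseconds (truncated).
mean_ms() {
    local runs=$1 start end i
    shift
    start=$(date +%s%N)
    for ((i = 0; i < runs; i++)); do
        "$@" >/dev/null 2>&1   # discard output; only timing matters
    done
    end=$(date +%s%N)
    echo $(( (end - start) / runs / 1000000 ))
}
```

Assuming the single-find baseline has this form, the two variants could be compared with:
mean_ms 10 bash -c 'find -L "$HOME" -xtype f -name "*.tex" -exec grep -Fli and {} +'
mean_ms 10 bash -c 'for dir in "$HOME"/*/; do find -L "$dir" -xtype f -name "*.tex" -exec grep -Fli and {} + & done; wait'
The first run may be slower than later ones, since repeated runs likely hit the file-system cache.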