grep: memory exhausted
Two potential problems:
grep -R (except for the modified GNU grep found on OS/X 10.8 and above) follows symlinks, so even if there's only 100GB of files in ~/Documents, there might still be a symlink to / for instance, and you'll end up scanning the whole file system, including files like /dev/zero. Use grep -r with newer GNU grep, or use the standard syntax:

find ~/Documents -type f -exec grep Milledgeville /dev/null {} +

(However, note that the exit status won't reflect whether the pattern was matched or not.)
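If you want to confirm that stray symlinks are the culprit before changing anything, a quick check (a sketch using standard find and ls; adjust the path to your setup) is to list every symlink under the tree together with its target:

# List symlinks under ~/Documents and where they point, so you can spot
# links that escape to / or other large directory trees.
find ~/Documents -type l -exec ls -l {} +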
grep finds the lines that match the pattern. For that, it has to load one line at a time into memory. GNU grep, as opposed to many other grep implementations, doesn't have a limit on the size of the lines it reads and supports searching binary files. So, if you've got a file with a very big line (that is, with two newline characters very far apart), bigger than the available memory, it will fail.

That would typically happen with a sparse file. You can reproduce it with:

truncate -s200G some-file
grep foo some-file
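If you suspect sparse files are involved, one way to spot candidates is to compare each file's apparent size with the space it actually occupies on disk. A minimal sketch, assuming GNU find (for -printf) and awk; the 2x threshold is arbitrary and file names containing newlines will still confuse it:

# Print apparent size in bytes, allocated 512-byte blocks and the name,
# then flag files whose apparent size is more than twice their allocation.
find ~/Documents -type f -printf '%s %b %p\n' |
  awk '$1 > 2 * 512 * $2 { sub(/^[^ ]+ [^ ]+ /, ""); print }'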
That one is difficult to work around. You could do it as (still with GNU grep):

find ~/Documents -type f -exec sh -c 'for i do
    tr -s "\0" "\n" < "$i" | grep --label="$i" -He "$0"
  done' Milledgeville {} +
That converts sequences of NUL characters into one newline character prior to feeding the input to grep. That would cover the cases where the problem is due to sparse files.

You could optimise it by doing it only for large files:

find ~/Documents -type f \( -size -100M -exec \
    grep -He Milledgeville {} + -o -exec sh -c 'for i do
      tr -s "\0" "\n" < "$i" | grep --label="$i" -He "$0"
    done' Milledgeville {} + \)
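As a quick illustration of what that tr step does (a toy example, not tied to any particular file): runs of NUL bytes are squeezed into a single newline, so grep sees a series of short lines rather than one enormous one:

# Three NUL bytes become one newline; grep would now see "foo" and "bar"
# as two ordinary short lines instead of one huge NUL-filled line.
printf 'foo\000\000\000bar\n' | tr -s '\0' '\n'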
If the files are not sparse and you have a version of GNU grep prior to 2.6, you can use the --mmap option. The lines will be mmapped in memory as opposed to copied there, which means the system can always reclaim the memory by paging out the pages to the file. That option was removed in GNU grep 2.6.
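For completeness, on such an older GNU grep the flag just slots into the find-based invocation from above (a sketch; it only works where --mmap still exists, i.e. GNU grep older than 2.6):

# --mmap was only accepted by GNU grep before 2.6; newer versions reject it.
find ~/Documents -type f -exec grep --mmap Milledgeville /dev/null {} +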
I usually do
find ~/Documents | xargs grep -ne 'expression'
I tried a bunch of methods and found this to be the fastest. Note that this doesn't handle files with spaces in the file name very well. If you know this is the case and your find and xargs support -print0 and -0 (as the GNU versions do), you can use:
find ~/Documents -print0 | xargs -0 grep -ne 'expression'
If not, you can use:
find ~/Documents -exec grep -ne 'expression' "{}" \;
Which will exec a grep for every file.
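If running one grep per file turns out to be too slow, a middle ground (a sketch, assuming a find that supports the POSIX + terminator) is to let find batch many file names into each grep invocation, much like xargs does, while still handling odd file names safely:

# find batches file names itself, so spaces in names are passed intact and
# far fewer grep processes are spawned than with \; .
# -H forces the file name to be printed even if a batch holds a single file
# (supported by GNU and BSD grep).
find ~/Documents -type f -exec grep -Hne 'expression' {} +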
I can think of a few ways to get around this:
Instead of grepping all files at once, do one file at a time. Example:
find /Documents -type f -exec grep -H Milledgeville "{}" \;
If you only need to know which files contain the words, use grep -l instead. Since grep will then stop searching a file after the first hit, it won't have to keep reading any huge files.

If you do want the actual matching text as well, you could string two separate greps along:
for file in $( grep -Rl Milledgeville /Documents ); do grep -H Milledgeville "$file"; done
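That last loop splits on whitespace in file names; a variant that avoids this, as a sketch assuming GNU grep (for -R, -l and -Z) and an xargs that accepts -0:

# grep -lZ prints matching file names NUL-terminated, so names containing
# spaces or newlines survive the trip into xargs -0 for the second pass
# that prints the matching lines.
grep -RlZ Milledgeville /Documents | xargs -0 grep -H Milledgeville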