Search for text files where two different words exist (any order, any line)
With GNU tools:
find . -type f -exec grep -lZ FIND {} + | xargs -r0 grep -l ME
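As a quick sanity check, here is a made-up demo in a scratch directory (all file names and contents below are invented for illustration):

```shell
# Demo: only the file containing both words should be listed.
dir=$(mktemp -d) && cd "$dir" || exit 1
printf 'FIND here\nME too\n' > both.txt
printf 'FIND only\n'         > findonly.txt
printf 'ME only\n'           > meonly.txt
find . -type f -exec grep -lZ FIND {} + | xargs -r0 grep -l ME
# prints: ./both.txt
```

The -Z/-0 pair passes the intermediate file list NUL-delimited, so it survives any character a file name can legally contain.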
With standard (POSIX) utilities, you can do:
find . -type f -exec grep -q FIND {} \; -exec grep -l ME {} \;
But that would run up to two greps per file. To avoid running that many grep invocations, and still be portable while allowing any character in file names, you could do:
convert_to_xargs() {
  sed "s/[[:blank:]\"\']/\\\\&/g" | awk '
    {
      if (NR > 1) {
        printf "%s", line
        if (!index($0, "//")) printf "\\"
        print ""
      }
      line = $0
    }
    END { print line }'
}
export LC_ALL=C
find .//. -type f |
convert_to_xargs |
xargs grep -l FIND |
convert_to_xargs |
xargs grep -l ME
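To see that the whole pipeline holds up, here is a self-contained sketch (it re-declares convert_to_xargs so it can run on its own; the file names, including one containing a newline and a single quote, are invented for the demo):

```shell
# Self-contained demo of the portable pipeline; names are made up.
dir=$(mktemp -d) && cd "$dir" || exit 1

nl='
'
printf 'FIND\nME\n'  > "weird${nl}na'me"   # contains both words
printf 'FIND only\n' > plain               # contains only FIND

convert_to_xargs() {
  sed "s/[[:blank:]\"\']/\\\\&/g" | awk '
    {
      if (NR > 1) {
        printf "%s", line
        if (!index($0, "//")) printf "\\"
        print ""
      }
      line = $0
    }
    END { print line }'
}

export LC_ALL=C
find .//. -type f |
  convert_to_xargs |
  xargs grep -l FIND |
  convert_to_xargs |
  xargs grep -l ME
# prints the weird file's name (on two physical lines); "plain" is filtered out
```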
The idea being to convert the output of find into a format suitable for xargs (which expects a blank-separated (SPC/TAB/NL in the C locale, YMMV in other locales) list of words, where single quotes, double quotes and backslashes can escape blanks and each other).
Generally, you can't post-process the output of find -print, because it separates file names with a newline character and doesn't escape the newline characters found in file names. For instance, if we see:
./a
./b
We've got no way to know whether it's one file called b in a directory called a<NL>. or whether it's two files, a and b, in the current directory.
By using .//. we can tell: since // cannot otherwise appear in a file path as output by find (there's no such thing as a directory with an empty name, and / is not allowed in a file name), we know that a line containing // starts a new file name. So we can use that awk command to escape all newline characters except those preceding such lines.
If we take the example above, in the first case (one file) find would output:
.//a
./b
Which awk escapes to:
.//a\
./b
So that xargs sees it as one argument. And in the second case (two files):
.//a
.//b
Which awk would leave as is, so xargs sees two arguments.
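A throwaway reproduction of the two cases (the names are invented; note that in practice find .//. prints paths like .//./a rather than the shortened .//a shown above, but the // marker works the same way):

```shell
# Case 1 (hypothetical): one file "b" inside a directory whose
# name contains a newline -- only the first output line has "//".
dir=$(mktemp -d) && cd "$dir" || exit 1
nl='
'
mkdir "a${nl}."
: > "a${nl}./b"
find .//. -type f
cd / && rm -rf "$dir"

# Case 2: two ordinary files -- every file name starts a line
# containing "//".
dir=$(mktemp -d) && cd "$dir" || exit 1
: > a
: > b
find .//. -type f
```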
You need the LC_ALL=C so that sed, awk (and some implementations of xargs) work on arbitrary sequences of bytes (even ones that don't form valid characters in the user's locale), to simplify the blank definition to just SPC and TAB, and to avoid problems with different utilities interpreting differently those characters whose encoding contains the encoding of backslash.
If the files are all in a single directory and their names contain no space, tab, newline, *, ? or [ characters and don't start with - or ., this will get a list of files containing ME, then narrow it down to the ones that also contain FIND:
grep -l FIND `grep -l ME *`
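For example, in a made-up scratch directory whose names respect those restrictions:

```shell
# Demo with deliberately tame file names (no blanks, globs, or
# leading dashes/dots), as the restriction above requires.
dir=$(mktemp -d) && cd "$dir" || exit 1
printf 'FIND\nME\n' > both
printf 'ME\n'       > onlyme
printf 'FIND\n'     > onlyfind
grep -l FIND `grep -l ME *`
# prints: both
```

The inner grep -l ME * expands to the files containing ME; word splitting on the command substitution is safe only because the names contain no blanks or glob characters.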
With awk, you could also run:
find . -type f -exec awk 'BEGIN{cx=0; cy=0}; /FIND/{cx++}
/ME/{cy++}; END{if (cx > 0 && cy > 0) print FILENAME}' {} \;
It uses cx and cy to count the lines matching FIND and ME respectively. In the END block, if both counters are greater than 0, it prints the FILENAME.
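A quick scratch-directory check of this awk approach (file names and contents invented for the demo):

```shell
# Demo: awk counts matches per file; only yes.txt has both words.
dir=$(mktemp -d) && cd "$dir" || exit 1
printf 'a FIND b\nc ME d\n' > yes.txt
printf 'FIND only\n'        > no.txt
find . -type f -exec awk 'BEGIN{cx=0; cy=0}; /FIND/{cx++}
  /ME/{cy++}; END{if (cx > 0 && cy > 0) print FILENAME}' {} \;
# prints: ./yes.txt
```

Because find runs one awk per file here (the {} \; form), the BEGIN/END blocks fire once per file, which is what makes the per-file counters valid.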
This would be faster/more efficient with GNU awk:
find . -type f -exec gawk 'BEGINFILE{cx=0; cy=0}; /FIND/{cx++}
/ME/{cy++}; ENDFILE{if (cx > 0 && cy > 0) print FILENAME}' {} +