Print unmatched patterns, using grep with patterns from file
You could use grep -o to print only the matching part and use the result as patterns for a second grep -v on the original patterns.txt file:
grep -oFf patterns.txt Strings.xml | grep -vFf - patterns.txt
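To make the two stages visible, here is a minimal sketch that runs the pipeline on throwaway copies of the files. The sample contents, the scratch directory and the variable names (tmp, matched, unmatched) are assumptions made up for illustration, and grep -o is assumed to be available (GNU and BSD grep both have it).

```shell
# Hypothetical sample data in a scratch directory.
tmp=$(mktemp -d)

cat > "$tmp/patterns.txt" <<'EOF'
"Introduction"
"BananaOpinion"
"MessageToUser"
"ExitWarning"
"SomeMessage"
"Help"
EOF

cat > "$tmp/Strings.xml" <<'EOF'
<string name="Introduction">One day there was an apple that went to the market.</string>
<string name="BananaOpinion">Bananas are great!</string>
<string name="MessageToUser">We would like to give you apples, bananas and tomatoes.</string>
EOF

# Stage 1 on its own: only the parts of Strings.xml that matched a pattern.
matched=$(grep -oFf "$tmp/patterns.txt" "$tmp/Strings.xml")
printf '%s\n' "$matched"

# Full pipeline: the matches become fixed-string patterns read from stdin (-f -),
# and -v keeps the lines of patterns.txt that none of them hit.
unmatched=$(grep -oFf "$tmp/patterns.txt" "$tmp/Strings.xml" |
    grep -vFf - "$tmp/patterns.txt")
printf '%s\n' "$unmatched"

rm -r "$tmp"
```

The first printf shows the three quoted names that occur in the XML; the second shows the patterns that were never found.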
Though in this particular case you could also use join + sort:
join -t\" -v1 -j2 -o 1.1 1.2 1.3 <(sort -t\" -k2,2 patterns.txt) <(sort -t\" -k2,2 Strings.xml)
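To see what the join variant does, here is a minimal sketch on hypothetical sample data. It assumes a join that accepts -j (GNU and BSD do) and writes the sorted copies to temporary files in place of process substitution, so it should also run in a plain POSIX shell. The names tmp, p.sorted, s.sorted and unpaired are made up for the example.

```shell
# Hypothetical sample data; file names mirror the ones used above.
tmp=$(mktemp -d)

cat > "$tmp/patterns.txt" <<'EOF'
"Introduction"
"BananaOpinion"
"MessageToUser"
"ExitWarning"
"SomeMessage"
"Help"
EOF

cat > "$tmp/Strings.xml" <<'EOF'
<string name="Introduction">One day there was an apple that went to the market.</string>
<string name="BananaOpinion">Bananas are great!</string>
<string name="MessageToUser">We would like to give you apples, bananas and tomatoes.</string>
EOF

# Sort both files on the second "-delimited field (the bare name).
LC_ALL=C sort -t'"' -k2,2 "$tmp/patterns.txt" > "$tmp/p.sorted"
LC_ALL=C sort -t'"' -k2,2 "$tmp/Strings.xml" > "$tmp/s.sorted"

# -v1 keeps only lines of the first file that have no partner in the second;
# -o 1.1,1.2,1.3 rebuilds the original quoted line from its three fields.
unpaired=$(LC_ALL=C join -t'"' -v1 -j2 -o 1.1,1.2,1.3 \
    "$tmp/p.sorted" "$tmp/s.sorted")
printf '%s\n' "$unpaired"

rm -r "$tmp"
```

Unlike the grep pipelines, this prints the unmatched patterns in sorted order rather than in the order they appear in patterns.txt.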
The best approach is probably what @don_crissti suggested, so here's a variation on the same theme:
$ grep -vf <(grep -Po 'name=\K.+?"' Strings.xml) patterns.txt
"ExitWarning"
"SomeMessage"
"Help"
This is basically the inverse of @don_crissti's approach. It uses grep with Perl Compatible Regular Expressions (-P) and the -o switch to print only the matching part of the line. The regex looks for name= and discards it (\K), then lazily matches the quoted name up to and including its closing " (.+?"). This results in the list of names present in the Strings.xml file, which is then passed as the pattern list to an inverted grep (grep -v) using process substitution (<(command)).
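As a rough illustration of the intermediate step, the sketch below prints what the inner grep produces before it is used as a pattern list. It assumes GNU grep built with PCRE support (-P); the sample data and the variable names (tmp, names, missing) are hypothetical. The second command uses a pipe with -f - in place of process substitution, which should give the same result.

```shell
# Assumes GNU grep with -P support; sample data is made up for illustration.
tmp=$(mktemp -d)

cat > "$tmp/patterns.txt" <<'EOF'
"Introduction"
"BananaOpinion"
"MessageToUser"
"ExitWarning"
"SomeMessage"
"Help"
EOF

cat > "$tmp/Strings.xml" <<'EOF'
<string name="Introduction">One day there was an apple that went to the market.</string>
<string name="BananaOpinion">Bananas are great!</string>
<string name="MessageToUser">We would like to give you apples, bananas and tomatoes.</string>
EOF

# \K drops name= from the reported match; the lazy .+? consumes the opening
# quote plus the name, and the final " ends the match at the closing quote.
names=$(grep -Po 'name=\K.+?"' "$tmp/Strings.xml")
printf '%s\n' "$names"

# The extracted names, quotes included, become the patterns for the inverted grep.
missing=$(grep -Po 'name=\K.+?"' "$tmp/Strings.xml" |
    grep -vf - "$tmp/patterns.txt")
printf '%s\n' "$missing"

rm -r "$tmp"
```

Because the extracted names keep their surrounding quotes, they line up exactly with the quoted entries in patterns.txt.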
I would use cut, probably. That is, if, as it appears, you know where to expect the quoted string you're looking for.
If I do:
{ cut -sd\" -f2 |
grep -vFf- pat
} <<\IN
# <string name="Introduction">One day there was an apple that went to the market.</string>
# <string name="BananaOpinion">Bananas are great!</string>
# <string name="MessageToUser">We would like to give you apples, bananas and tomatoes.</string>
IN
...after saving my own copy of your example patterns.txt in pat and running the above command, the output is:
"ExitWarning"
"SomeMessage"
"Help"
cut prints to stdout only the second "-delimited (-d\") field (-f2) for each delimiter-matched line of input and suppresses (-s) all others.
What cut actually passes to grep is:
Introduction
BananaOpinion
MessageToUser
grep searches its named file operand for lines which don't match (-v) the fixed strings (-F) in its pattern file (-f), which here is - for stdin.
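The same pipeline can read the XML from a file instead of a here-document. The following is only a sketch on hypothetical data: the wrapper lines without any " are there to show what -s suppresses, and the names tmp, fields and missing are invented for the example.

```shell
# Hypothetical sample data; pat plays the role of patterns.txt.
tmp=$(mktemp -d)

cat > "$tmp/pat" <<'EOF'
"Introduction"
"BananaOpinion"
"MessageToUser"
"ExitWarning"
"SomeMessage"
"Help"
EOF

cat > "$tmp/Strings.xml" <<'EOF'
<resources>
<string name="Introduction">One day there was an apple that went to the market.</string>
<string name="BananaOpinion">Bananas are great!</string>
<string name="MessageToUser">We would like to give you apples, bananas and tomatoes.</string>
</resources>
EOF

# Field 2 of each "-delimited line is the bare name; -s drops the two
# <resources> lines because they contain no delimiter at all.
fields=$(cut -s -d'"' -f2 "$tmp/Strings.xml")
printf '%s\n' "$fields"

# The bare names are used as fixed strings; -v prints the lines of pat
# that contain none of them.
missing=$(cut -s -d'"' -f2 "$tmp/Strings.xml" | grep -vFf - "$tmp/pat")
printf '%s\n' "$missing"

rm -r "$tmp"
```

Since the names reach grep without their quotes, they match as substrings of the quoted lines in pat.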
If you can rely on the second "-delimited field as the one to match, then it should be an optimization over grep's Perl mode (-P): grep only has to match fixed strings (-F), and only tiny portions of them, because cut does the heavy lifting, and it does it fast.