grep strange behaviour with single letter words
This was a bug in bsdgrep, relating to a variable that tracks the part of the current line still to be scanned. When multiple patterns are involved, that variable is overwritten by successive calls to the regular expression matching engine.
local fix
You can work around this to an extent by not using the -w option, which relies upon this variable for correct operation and thus fails. Instead, use the regular expression extensions that match the beginnings and ends of words, making your stopwords file look like:
    \<i\>
    \<file\>
    \<types\>
This workaround also requires that you do not use the -F option, because -F treats every pattern as a fixed string rather than as a regular expression.
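As a minimal sketch of the workaround in use: the file names and the sample word list below are hypothetical, and the -v -f invocation is only an assumption about how a stopwords file would typically be applied. It writes one word-anchored pattern per line and filters without -w or -F.

```shell
# Hypothetical scratch files; only the three patterns come from the answer above.
tmpdir=$(mktemp -d)

# One pattern per line, each anchored with \< (start of word) and \> (end of word).
printf '%s\n' '\<i\>' '\<file\>' '\<types\>' > "$tmpdir/stopwords"

# Sample input, one word per line (assumed for illustration).
printf '%s\n' 'i' 'file' 'types' 'filter' 'it' 'profile' > "$tmpdir/words"

# -v inverts the match and -f reads the patterns from a file.
# Deliberately no -w (the failing option) and no -F (fixed strings).
result=$(grep -v -f "$tmpdir/stopwords" "$tmpdir/words")
printf '%s\n' "$result"

rm -rf "$tmpdir"
```

With a grep that honours \< and \>, this should print filter, it and profile, because none of those contains a stopword as a whole word.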
Note that the documented regular expression components [[:<:]] and [[:>:]] that the re_format manual tells you about will not work here. This is because the regular expression library that is compiled into bsdgrep has GNU regular expression compatibility support turned on. This is another bug, which is reportedly fixed.
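If you are unsure which word-boundary syntax your local grep accepts, a quick probe along these lines can tell you. This is only a sketch: the probe function and its variable names are made up for illustration, and it treats "pattern rejected" and "pattern matched the wrong thing" alike as unsupported.

```shell
# probe PATTERN: report whether this grep accepts PATTERN and treats it as
# matching the standalone word "i" but not the "i" inside "it".
probe() {
    if printf '%s\n' 'a i b' | grep -q -- "$1" 2>/dev/null &&
       ! printf '%s\n' 'it' | grep -q -- "$1" 2>/dev/null; then
        echo supported
    else
        echo unsupported
    fi
}

gnu_style=$(probe '\<i\>')              # backslash form used in the workaround
bsd_style=$(probe '[[:<:]]i[[:>:]]')    # bracket form from the re_format manual

printf 'backslash form: %s\n' "$gnu_style"
printf 'bracket form:   %s\n' "$bsd_style"
```

On an affected bsdgrep, the description above suggests the backslash form would be reported as supported and the bracket form as unsupported.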
service fix
This bug was fixed earlier this year. The fix has not yet made it into the STABLE or RELEASE flavours of FreeBSD, but is reportedly in CURRENT.
To get this fix into the macOS version of grep, which is derived from FreeBSD's bsdgrep, please consult Apple. ☺
Further reading
- Jonathan de Boyne Pollard (2017-10-15). bsdgrep behaves incorrectly when given multiple patterns. Bug #223031. FreeBSD Bugzilla.
- Kyle Evans (2017-04-03). bsdgrep: fix matching behaviour. Revision 316477. FreeBSD source.
- Kyle Evans (2017-05-02). bsdgrep: fix -w -v matching improperly with certain patterns. Revision 317665. FreeBSD source.
- Nathan Weeks (2014-06-16). grep(1) and bsdgrep(1) do not recognize [[:<:]] and [[:>:]]. Bug #191086. FreeBSD Bugzilla.