Grep: unexpected results when searching for words in heading from man page
If you add | sed -n l to that tail command to show non-printable characters, you'll probably see something like:
N\bNA\bAM\bME\bE
That is, each character is written as X Backspace X. On modern terminals, the character ends up being written over itself with no visible difference, as Backspace (aka BS, aka \b, aka ^H) is the character that moves the cursor one column to the left. But on ancient teletypewriters, that would make the character appear in bold, as it got twice as much ink.
Still, pagers like more and less do understand that format to mean bold, so that's still what roff does to output bold text.
Some man implementations call roff in a way that those sequences are not used, or internally call col -b -p -x to strip them (as the man-db implementation does, unless the MAN_KEEP_FORMATTING environment variable is set). They also don't invoke a pager when they detect that the output is not going to a terminal, so man bash | grep NAME would work there. Yours doesn't.
You can use col -b to remove those sequences (there are other types as well, such as _ BS X for underline).
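To see the problem and the fix without involving man at all, the sketch below builds the overstruck form of NAME by hand. Instead of col -b it uses a sed substitution (delete any character followed by a backspace), which may be handy where col isn't installed; that sed expression is an assumption of this example, not something man or col does, and it only handles the simple X BS X and _ BS X pairs.

```shell
#!/bin/sh
# Build the overstruck ("bold") form of NAME by hand: each letter, then a
# backspace (\b, ^H), then the same letter again.
bold=$(printf 'N\bNA\bAM\bME\bE')

# A plain grep does not match, because the letters are not contiguous.
if printf '%s\n' "$bold" | grep -q NAME; then
    echo 'match'
else
    echo 'no match'
fi

# sed -n l makes the backspaces visible as \b.
printf '%s\n' "$bold" | sed -n l

# Delete every "any character followed by a backspace" pair, keeping the
# character written last in each column, which is what col -b does for
# these simple X BS X (bold) and _ BS X (underline) sequences.
bs=$(printf '\b')
plain=$(printf '%s\n' "$bold" | sed "s/.$bs//g")

# Now grep finds it.
printf '%s\n' "$plain" | grep NAME
```

For real man output, col -b remains the more robust choice, since it tracks column positions rather than just pattern-matching pairs.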
For systems using GNU roff (like GNU or FreeBSD), you can prevent those sequences from being used in the first place by making sure the -c -b -u options are passed to grotty, for instance by making sure the -P-cbu option is passed to groff.
One way is to create a wrapper script called groff containing:
#! /bin/sh -
exec /usr/bin/groff -P-cbu "$@"
Then put it in a directory that comes before /usr/bin in $PATH.
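A possible way to install that wrapper, assuming a per-user ~/bin directory (a hypothetical choice; any directory listed before /usr/bin in $PATH will do):

```shell
#!/bin/sh
# Hypothetical location for the wrapper: any directory that comes before
# /usr/bin in $PATH will do.
wrapdir=${HOME:?}/bin
mkdir -p "$wrapdir"

# Write the wrapper script and make it executable.
cat > "$wrapdir/groff" <<'EOF'
#! /bin/sh -
exec /usr/bin/groff -P-cbu "$@"
EOF
chmod +x "$wrapdir/groff"

# Put that directory first in $PATH for this shell (add the same line to
# your shell's startup file to make it permanent)...
PATH=$wrapdir:$PATH
export PATH

# ...and check that the wrapper is the groff found first.
command -v groff
```

After that, man bash | grep NAME should match, assuming your man finds groff through $PATH rather than by a hard-coded path.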
With macOS's man (which also uses GNU roff), you can create a man-no-overstrike.conf file containing:
NROFF /usr/bin/groff -mandoc -Tutf8 -P-cbu
Then call man as:
man -C man-no-overstrike.conf bash | grep NAME
Still with GNU roff, if you set the GROFF_SGR environment variable (or don't set the GROFF_NO_SGR variable, depending on how the defaults were set at compile time), then grotty (as long as it's not passed the -c option) will use ANSI SGR terminal escape sequences instead of those BS tricks for character attributes. less understands them when called with the -R option.
FreeBSD's man calls grotty with the -c option unless you ask for colours by setting the MANCOLOR variable, in which case -c is not passed to grotty and grotty reverts to its default of using ANSI SGR escape sequences.
So this works there:
MANCOLOR=1 man bash | grep NAME
On Debian, GROFF_SGR is not the default. If you do:
GROFF_SGR=1 man bash | grep NAME
however, because man's stdout is not a terminal, man takes it upon itself to also pass a GROFF_NO_SGR variable to grotty, which overrides our GROFF_SGR. I suppose that's so it can use col -bpx to strip the BS sequences, as col doesn't know how to strip SGR sequences, even though man still does it when MAN_KEEP_FORMATTING is set (and col is not used). You can do this instead:
GROFF_SGR=1 MANPAGER='grep NAME' man bash
(in a terminal) to get the SGR escape sequences.
This time, you'll notice that some of those NAMEs do appear in bold on the terminal (and in a less -R pager). If you feed the output to sed -n l (MANPAGER='sed -n /NAME/l'), you'll see something like:
\033[1mNAME\033[0m$
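Where \e[1m (printed by sed as \033[1m, \033 being the octal code of the ESC character that \e also denotes) is the sequence to enable bold in ANSI-compatible terminals, and \e[0m is the sequence to reset all SGR attributes to their defaults.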
On that text, grep NAME works, since the text does contain NAME, but you could still have problems when looking for text where only part of it is in bold or underlined.
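The sketch below reproduces that partial-bold problem without man, using a made-up line where only one word is wrapped in SGR sequences, and then strips them with a sed substitution. That sed expression is an assumption of this example rather than a standard tool: it only removes SGR sequences of the form ESC [ digits/semicolons m, not arbitrary terminal escapes.

```shell
#!/bin/sh
esc=$(printf '\033')

# A made-up line in which only the word "bar" is bold:
# ESC[1m turns bold on, ESC[0m resets all attributes.
line=$(printf 'foo \033[1mbar\033[0m baz')

# sed -n l shows the escapes in octal, as \033.
printf '%s\n' "$line" | sed -n l

# Searching for a phrase that spans the bold part fails...
if printf '%s\n' "$line" | grep -q 'foo bar'; then
    echo 'match'
else
    echo 'no match'
fi

# ...until the SGR sequences (ESC [ parameters m) are removed.
plain=$(printf '%s\n' "$line" | sed "s/$esc\[[0-9;]*m//g")
printf '%s\n' "$plain" | grep 'foo bar'
```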
If you look at any manual page, you'll notice that the headers are in bold. This is achieved by formatting them with control characters. To be able to grep the way you want to, these have to be stripped out.
The col utility may be used for this:
$ man bash | col -b | grep 'NAME'
The -b option has the following description on OpenBSD:
Do not output any backspaces, printing only the last character written to each column position. This can be useful in processing the output of mandoc(1).
The Linux col manual (on Ubuntu) doesn't have that last sentence, but the option works the same way.
On Linux, unsetting the MAN_KEEP_FORMATTING environment variable (or setting it to an empty string) may also help, and will let you grep without passing the output of man through col -b.
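In a POSIX shell, checking and clearing that variable could look like the sketch below. It assumes man-db, the implementation that honours MAN_KEEP_FORMATTING; the status message is only for illustration.

```shell
#!/bin/sh
# Report whether MAN_KEEP_FORMATTING is set (even to an empty value).
if [ -n "${MAN_KEEP_FORMATTING+set}" ]; then
    printf 'MAN_KEEP_FORMATTING is set to "%s"\n' "$MAN_KEEP_FORMATTING"
else
    echo 'MAN_KEEP_FORMATTING is not set'
fi

# Remove it for the rest of this shell session, so that man-db strips the
# overstrike sequences itself when its output is not a terminal.
unset MAN_KEEP_FORMATTING

# For a one-off, an empty value can be given on the command line instead:
#   MAN_KEEP_FORMATTING= man bash | grep NAME
```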