Why is the unit separator (ASCII 31) invisible in terminal output?
The unit separator (US) character, also known as IS1, is in the cntrl character class and is not in the print character class. It is a control character intended for organizing text into groups, for programs that are designed to make use of that information. In general, non-printable characters are interpreted and rendered differently in different programs and environments.
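As a quick check of the class membership described above, the following sketch tests the US byte against the POSIX bracket classes in a case pattern. The function name classify_char is made up for illustration, and the result assumes a C or UTF-8 locale.

```shell
# classify_char is a hypothetical helper, not a standard utility.
# It reports which POSIX character class its one-character argument falls in.
classify_char() {
  case $1 in
    [[:cntrl:]]) echo cntrl ;;
    [[:print:]]) echo print ;;
    *)           echo other ;;
  esac
}

us=$(printf '\037')   # the unit separator, octal 037 = decimal 31
classify_char "$us"   # prints: cntrl
classify_char "a"     # prints: print
```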
You see it represented as ^_ in Vim because Vim is an interactive editor. It can render non-printable characters however it wants, as long as the correct byte is written to disk.
You cannot get the same behavior in the shell because Unix command-line programs are written to operate on plain text and pass it to each other unchanged. When you cat a file, what is written to the terminal must be exactly what is in the file.
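To see that cat really does pass the byte through untouched, one option is to dump its output as hex. This is only a sketch using a made-up three-character input rather than a real file.

```shell
# Send "a", US (octal 037), "b", newline through cat, then dump the bytes as hex.
# tr strips od's spacing so the result is one compact string.
printf 'a\037b\n' | cat | od -An -tx1 | tr -d ' \n'
echo
# prints: 611f620a  (the 1f byte is still there, unchanged by cat)
```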
So that leaves it to the terminal device to interpret the character. And it turns out that some terminal emulators do render the US character differently from others. In gnome-terminal (or any vte-based terminal), the character will be rendered as a box containing the hex code 001F. In xterm or rxvt, the character is indeed invisible.
The unit separator is in the ASCII range of Control Characters, and therefore does not (or should not usually) have a visual representation.
Vim and some other editors display them so that you can edit them. As you noticed, cat -v displays it too. The man page shows that -v is the short form of --show-nonprinting, which makes cat replace non-printing characters with a printable representation. That representation is not the original content of the file, so it may cause trouble if the output is piped to another program.
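A small sketch of that difference, assuming a cat that supports -v (GNU and BSD cat do; POSIX does not require it):

```shell
# Show the US byte in caret notation.
printf 'a\037b\n' | cat -v
# prints: a^_b

# The output is no longer the original data: the single US byte (1f)
# has become the two bytes "^" (5e) and "_" (5f).
printf 'a\037b\n' | cat -v | od -An -tx1 | tr -d ' \n'
echo
# prints: 615e5f620a
```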
The representation you see already hints that it's a control character: a character prefixed with ^ is a common notation for Ctrl + the character, which is the key combination that produces this character in a terminal. Ctrl+_ will let you input the unit separator in vim, for example. But another editor or some GUI viewer might display the hex code, a placeholder or something completely different.
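The caret notation maps to a character code by flipping bit 6 (XOR with 64) of the character after the caret. A minimal sketch of that arithmetic follows; caret_to_code is a hypothetical helper name, not an existing command.

```shell
# caret_to_code is a hypothetical helper: given the character that follows
# the caret, it prints the decimal code of the control character.
caret_to_code() {
  # A leading single quote makes printf output the numeric code of the character.
  code=$(printf '%d' "'$1")
  echo $(( code ^ 64 ))
}

caret_to_code _     # prints: 31   (^_ is US, the unit separator)
caret_to_code '['   # prints: 27   (^[ is ESC)
caret_to_code '?'   # prints: 127  (^? is DEL)
```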
As your terminal does not print control characters, they are also not copied when you select the text (whitespace characters such as newline and tab, which are control characters too, are an exception). Another example of control characters in the terminal that are usually ignored when copying is color codes, which are an ESC character followed by the code for coloring the text.
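For illustration, here is what such a color sequence looks like once the ESC byte is made visible. The particular codes used (31m for red, 0m for reset) are just an example, and cat -v is again assumed to be available.

```shell
# ESC is octal 033; "[31m" and "[0m" are ordinary printable characters
# that the terminal interprets as "red" and "reset" after seeing ESC.
printf '\033[31mred\033[0m\n' | cat -v
# prints: ^[[31mred^[[0m
```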
So to show the characters on your terminal, there is no other way than to use a program that replaces the unit separator with some printable character.
As a side note to the other (very good) answers: if you want to alter only the control character ^_ when displaying the file content, you might want to transliterate it using the tr utility (and a little bit of bash-compatible syntax):
# Replace the control character US (^_) with *one* other character
cat my.file | tr $'\c_' ':'
If you need to replace that control character with its "expanded" form, you will need sed instead:
# Replace the control character US (^_) with any string
cat my.file | sed $'s/\c_/^_/g'
Please note the syntax $'\cX': it tells your (bash-compatible) shell to substitute the corresponding control character. See Wikipedia for a list of control-character aliases in caret notation. If you don't like that syntax, you might prefer the octal $'\037' or hexadecimal $'\x1f' notation instead.
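If your shell does not support the $'...' quoting at all, a possible workaround (a sketch, with a made-up input instead of a real file) is to let tr interpret the octal escape itself, or to build the byte with printf:

```shell
# tr understands octal escapes on its own, so no shell-specific quoting is needed.
printf 'a\037b\n' | tr '\037' ':'
# prints: a:b

# For sed, build the byte with printf and splice it into the expression.
us=$(printf '\037')
printf 'a\037b\n' | sed "s/$us/^_/g"
# prints: a^_b
```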