Why won't some unicode characters print to my terminal?
help printf defers to printf(1) for the escape sequences interpreted, and the docs for GNU printf say:
printf interprets two character syntaxes introduced in ISO C 99: \u for 16-bit Unicode (ISO/IEC 10646) characters, specified as four hexadecimal digits hhhh, and \U for 32-bit Unicode characters, specified as eight hexadecimal digits hhhhhhhh. printf outputs the Unicode characters according to the LC_CTYPE locale. Unicode characters in the ranges U+0000…U+009F, U+D800…U+DFFF cannot be specified by this syntax, except for U+0024 ($), U+0040 (@), and U+0060 (`).
Something similar is specified in the Bash manual for ANSI C Quoting and echo:
\uHHHH
the Unicode (ISO/IEC 10646) character whose value is the hexadecimal value HHHH (one to four hex digits)
\UHHHHHHHH
the Unicode (ISO/IEC 10646) character whose value is the hexadecimal value HHHHHHHH (one to eight hex digits)
In short: \u does not take five hex digits. For that you need \U:
# printf "\u2660 \u1F0A1 \U1F0A1\n"
♠ Ἂ1 🂡
Muru's answer is completely correct, but just to clarify one point:
When you print \u1F0A1, it is interpreted as the sixteen-bit Unicode escape \u1F0A followed by the literal character 1, since \u consumes at most four hex digits and stops there. U+1F0A then gives Ἂ, a Greek alpha with a couple of diacritics on it (Greek Capital Letter Alpha with Psili and Varia, to be precise).
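To make the split concrete, here is a rough sketch that mimics how the digits after \u get divided up. split_u_escape is a made-up helper for illustration, not anything Bash or coreutils provides, and it only counts characters without checking that they are valid hex.

```shell
# split_u_escape: hypothetical helper, for illustration only.
# Shows which part of the text after \u is consumed as the code point
# (at most four characters) and which part is left as literal text.
# It does not check that the characters are hex digits.
split_u_escape() {
    case $1 in
        ????*) rest=${1#????} ;;   # drop the first four characters
        *)     rest= ;;            # four or fewer: nothing left over
    esac
    head=${1%"$rest"}              # the part \u actually consumes
    printf 'U+%s + literal "%s"\n' "$head" "$rest"
}

split_u_escape 1F0A1    # prints: U+1F0A + literal "1"
```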
If you want more than sixteen bits in your Unicode escape, you need to use \U, which takes up to eight hex digits: \U0001F0A1 will give you the playing card.
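If you don't want to count digits by hand, something along these lines might help. It is only a sketch: unicode_escape is a name I made up, it expects the code point as bare hex digits (no "U+" prefix), and it just builds the escape text without printing the character itself.

```shell
# unicode_escape: hypothetical helper, not part of Bash or coreutils.
# Takes a code point as hex digits and prints the escape sequence to
# hand to printf: \uXXXX up to U+FFFF, \UXXXXXXXX above that.
# Note: cp is a plain global variable here, to keep the sketch portable.
unicode_escape() {
    cp=$(( 0x$1 ))                 # parse the argument as hexadecimal
    if [ "$cp" -le 65535 ]; then
        printf '\\u%04X\n' "$cp"   # fits in four hex digits
    else
        printf '\\U%08X\n' "$cp"   # needs the eight-digit form
    fi
}

unicode_escape 2660     # prints \u2660
unicode_escape 1F0A1    # prints \U0001F0A1
```

In Bash you could then write printf "$(unicode_escape 1F0A1)\n". Whether the glyph actually shows up still depends on your LC_CTYPE locale and on your terminal font having the character.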