Why won't some unicode characters print to my terminal?

help printf defers to printf(1) for the escape sequences it interprets, and the docs for GNU printf say:

printf interprets two character syntaxes introduced in ISO C 99: \u for 16-bit Unicode (ISO/IEC 10646) characters, specified as four hexadecimal digits hhhh, and \U for 32-bit Unicode characters, specified as eight hexadecimal digits hhhhhhhh. printf outputs the Unicode characters according to the LC_CTYPE locale. Unicode characters in the ranges U+0000…U+009F, U+D800…U+DFFF cannot be specified by this syntax, except for U+0024 ($), U+0040 (@), and U+0060 (`).

Something similar is specified in the Bash manual for ANSI C Quoting and echo:

\uHHHH
the Unicode (ISO/IEC 10646) character whose value is the hexadecimal value HHHH (one to four hex digits)

\UHHHHHHHH
the Unicode (ISO/IEC 10646) character whose value is the hexadecimal value HHHHHHHH (one to eight hex digits)

In short: \u takes at most four hex digits, so it cannot express a five-digit code point. For that you need \U:

# printf "\u2660 \u1F0A1 \U1F0A1\n"
♠ Ἂ1 🂡

Muru's answer is completely correct, but just to clarify one point:

When you're printing \u1F0A1, that's interpreted as a sixteen-bit Unicode escape \u1F0A, followed by the literal character 1 (since \u consumes at most four hex digits: Bash accepts one to four, and GNU printf expects exactly four). U+1F0A then gives Ἂ, a Greek alpha with a couple of diacritics on it (Greek Capital Letter Alpha with Psili and Varia, to be precise).
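To make that split concrete, here is a minimal sketch that models the four-digit rule with plain string handling. The function name split_u_escape is made up for illustration, and it only imitates the rule described above. It does not call printf's own escape parser and it assumes the input consists of hex digits only.

```shell
# Hypothetical helper: models how \u consumes at most four hex digits.
# Assumes $1 contains only hex digits; no validation is done.
split_u_escape() {
  taken=$(printf '%.4s' "$1")   # first four characters (or fewer if $1 is short)
  rest=${1#"$taken"}            # whatever follows is left as literal text
  printf 'U+%s then literal "%s"\n' "$taken" "$rest"
}

split_u_escape 1F0A1   # prints: U+1F0A then literal "1"
```

Feeding it 2660 instead leaves the literal part empty, which is why \u2660 prints a single character.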

If you want more than sixteen bits in your Unicode escape, you need to use \U, which takes eight characters' worth of hex: \U0001F0A1 will give you the playing card.
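If you would rather not count leading zeros by hand, a small helper can build the padded escape for you. This is only a sketch: pad_U_escape is a hypothetical name, and it does nothing more than format the text of the escape, zero-padded to the eight digits that the GNU documentation quoted above specifies.

```shell
# Hypothetical helper: turn a hex code point into a zero-padded \U escape string.
# Assumes $1 is a valid hex number without a 0x prefix.
pad_U_escape() {
  # $((0x...)) converts the hex digits to a number; %08X pads it back to 8 hex digits
  printf '\\U%08X\n' "$((0x$1))"
}

pad_U_escape 1F0A1   # prints: \U0001F0A1
pad_U_escape 2660    # prints: \U00002660
```

In Bash you could then pass the result as a format string, for example printf "$(pad_U_escape 1F0A1)\n", which should print the playing card provided your locale is UTF-8 and your font has the glyph.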
