MySQL unicode literals
If the goal is to specify the code point instead of the encoded byte sequence (i.e. 0x0F02
instead of the UTF-8 0xE0BC82
for "༂"), then you need to use an encoding in which the code point value just happens to be the encoded byte sequence. For example, "0xE28098" is the UTF-8 encoded byte sequence for the " ‘ " character (as shown in dkamins's answer), which is code point U+2018. However, 0x2018
is both the code point value for ‘
and the encoded byte sequence for ucs2
/ utf16
(they are effectively the same encoding for BMP characters, but I prefer to use "utf16" as it is consistent with "utf8" and "utf32", consistent in the "utf" theme). Hence:
_utf16 0x2018
returns the same ‘
character as:
_utf8 0xE0BC82
But, utf16
only works for BMP characters (code points U+0000 - U+FFFF) in terms of specifying the code point value. If you want a Supplementary Character (by specifying the code point instead of a specific encoding's sequence of bytes), then you will need to use the utf32
encoding. Not only does _utf32 0x2018
return ‘
, but:
_utf32 0x1F47E
returns: 👾
To use either UTF-8 or UTF-16 encodings for that same Supplementary Character would require the following:
_utf8mb4 0xF09F91BE
_utf16 0xD83DDC7E
HOWEVER, if you are having trouble adding this to a string that is already utf8, then you will need to convert this into utf8
(or into utf8mb4
when creating Supplementary Characters as the utf8
encoding / charset can only handle BMP characters):
CONVERT(_utf32 0x1F47E USING utf8mb4)
Or, using the example character from Michael - sqlbot's answer:
CONVERT(_utf32 0x2192 USING utf8)
returns a →
. Hence, a custom function is not needed in order to create a UTF-8 encoded character from its code point (at least not as of MySQL 8.0). Here is a test query
SELECT _utf32 0x1F47E AS "Supplementary Character in utf32",
CONVERT(_utf32 0x1F47E USING utf8mb4) AS "Supplementary Character in utf8mb4",
CHARSET(CONVERT(_utf32 0x1F47E USING utf8mb4)) AS "Proof",
"---" AS "---",
_utf32 0x2192 AS "BMP character in utf32",
CONVERT(_utf32 0x2192 USING utf8) AS "BMP character in utf8",
CHARSET(CONVERT(_utf32 0x2192 USING utf8)) AS "Proof";
And you can see it working on db<>fiddle (might not work in pre-8.0 MySQL).
For more details on these options, plus Unicode escape sequences for other languages and platforms, please see my post:
Unicode Escape Sequences Across Various Languages and Platforms (including Supplementary Characters)
This stored function provides the functionality MySQL is (apparently) missing, with a way to turn a literal code point into a character without having to already know the UTF-8 encoding.
If VARCHAR(1)
seems strange, since utf8 characters in MySQL can be up to 3 bytes long, remember the size of VARCHAR
is characters, not bytes. The function returns a single UTF-8-encoded character from the input value.
For hexadecimal literals, prepend 0x
.
DELIMITER $$
DROP FUNCTION IF EXISTS `utf8_char` $$
CREATE FUNCTION `utf8_char`(v smallint unsigned) RETURNS VARCHAR(1) CHARSET utf8
NO SQL
DETERMINISTIC
BEGIN
-- http://stackoverflow.com/questions/3632410/mysql-unicode-literals/30675371#30675371
RETURN CHAR(CASE
WHEN v <= 0x7F THEN v
WHEN v <= 0x7FF THEN 0xC080 | ((v >> 6) << 8) | (v & 0x3F)
WHEN v <= 0xFFFF THEN 0xE08080 | (((v >> 12) & 0x0F ) << 16) | (((v >> 6) & 0x3F ) << 8) | (v & 0x3F)
ELSE NULL END);
END $$
DELIMITER ;
Example output:
mysql> select utf8_char(8592) AS 'leftwards_arrow';
+-----------------+
| leftwards_arrow |
+-----------------+
| ← |
+-----------------+
1 row in set (0.00 sec)
mysql> select utf8_char(0x2192) AS 'rightwards_arrow_hex';
+----------------------+
| rightwards_arrow_hex |
+----------------------+
| → |
+----------------------+
1 row in set (0.00 sec)
See: http://bugs.mysql.com/bug.php?id=10199 (Bug #10199: "Allow Unicode escape sequence for string literals.") This request has been "Open" since 2005. More details in Worklog Task #3529: Unicode Escape Sequences.
From https://web.archive.org/web/20091117221116/http://eng.kaching.com/2009/10/mysql-unicode-escape-sequences.html though, you can see the following example, which does actually seem to work, but requires you to know the actual byte-by-byte UTF8 encoding:
You can also use the variable-length UTF-8 representation (convenient when, for example, copying from a utf-8 URL-encoded value like %E2%80%98).
mysql> select _utf8 x'E28098'; +---+ | ‘ | +---+