"Incorrect string value" when trying to insert UTF-8 into MySQL via JDBC?
The strings that contain \xF0 are simply characters encoded as multiple bytes using UTF-8.
Although your collation is set to utf8_general_ci, I suspect that the character encoding of the database, table or even column may be different. They are independent settings. Try:
ALTER TABLE database.table MODIFY COLUMN col VARCHAR(255)
CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci NOT NULL;
Substitute whatever your actual data type is for VARCHAR(255).
I had the same problem. To save data as utf8mb4, you need to make sure that:
- character_set_client, character_set_connection and character_set_results are utf8mb4. character_set_client and character_set_connection indicate the character set in which statements are sent by the client; character_set_results indicates the character set in which the server returns query results to the client. See charset-connection.
- the table and column encoding is utf8mb4.
For JDBC, there are two solutions:
Solution 1 (requires restarting MySQL):
- Modify my.cnf like the following:
[mysql]
default-character-set=utf8mb4
[mysqld]
character-set-server=utf8mb4
collation-server=utf8mb4_unicode_ci
This makes sure the database and character_set_client, character_set_connection and character_set_results are utf8mb4 by default.
- Restart MySQL.
- Change the table and column encoding to utf8mb4.
- STOP specifying characterEncoding=UTF-8 and characterSetResults=UTF-8 in the JDBC connection string, because this overrides character_set_client, character_set_connection and character_set_results to utf8.
Solution 2 (no MySQL restart needed):
- Change the table and column encoding to utf8mb4.
- Specify characterEncoding=UTF-8 in the JDBC connection string, because the JDBC connector doesn't support utf8mb4 as a value there.
- Write your SQL statement like this (you need to add allowMultiQueries=true to the JDBC connection string): 'SET NAMES utf8mb4;INSERT INTO Mytable ...';
This makes sure that, for each connection to the server, character_set_client, character_set_connection and character_set_results are utf8mb4.
Also see charset-connection.
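The second solution above can be sketched in Java roughly as follows. This is only an illustration under assumptions: the class and method names (SetNamesSketch, appendParam, prefixSetNames, run), the host, the database name and the sample INSERT are hypothetical, and only the SET NAMES utf8mb4 prefix and the allowMultiQueries=true and characterEncoding=UTF-8 parameters come from the answer itself.

```java
import java.sql.Connection;
import java.sql.SQLException;
import java.sql.Statement;

final class SetNamesSketch {
    // Appends one key=value parameter to a JDBC URL, using '?' or '&' as needed.
    static String appendParam(String url, String param) {
        return url + (url.contains("?") ? "&" : "?") + param;
    }

    // Prefixes a statement with SET NAMES utf8mb4, as described in the answer.
    static String prefixSetNames(String sql) {
        return "SET NAMES utf8mb4;" + sql;
    }

    // Sends both statements in one round trip; the connection is assumed to
    // have been opened with allowMultiQueries=true.
    static void run(Connection conn, String sql) throws SQLException {
        try (Statement st = conn.createStatement()) {
            st.execute(prefixSetNames(sql));
        }
    }

    public static void main(String[] args) {
        // Hypothetical host and database name.
        String url = "jdbc:mysql://localhost:3306/mydb?characterEncoding=UTF-8";
        System.out.println(appendParam(url, "allowMultiQueries=true"));
        // Hypothetical statement; substitute your own table and columns.
        System.out.println(prefixSetNames("INSERT INTO Mytable (col) VALUES ('x')"));
    }
}
```

Because allowMultiQueries=true lets one string carry several statements, it is safer to keep user-supplied values out of the SQL text when using this approach.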
I wanted to combine a couple of posts to make a full answer of this since it does appear to be a few steps.
- Follow the advice above by @madtracey, in /etc/mysql/my.cnf or /etc/mysql/mysql.conf.d/mysqld.cnf:
[mysql]
default-character-set=utf8mb4
[mysqld_safe]
socket = /var/run/mysqld/mysqld.sock
nice = 0
[mysqld]
##
character-set-server=utf8mb4
collation-server=utf8mb4_unicode_ci
init_connect='SET NAMES utf8mb4'
sql_mode=STRICT_TRANS_TABLES,NO_ZERO_IN_DATE,NO_ZERO_DATE,ERROR_FOR_DIVISION_BY_ZERO,NO_AUTO_CREATE_USER,NO_ENGINE_SUBSTITUTION
Again following the advice above, characterEncoding=UTF-8 and characterSetResults=UTF-8 were removed from all JDBC connection strings.
With this in place, setting -Dfile.encoding=UTF-8 appeared to make no difference.
I still could not write international text into the database and got the same failure as above.
Next, following how-to-convert-an-entire-mysql-database-characterset-and-collation-to-utf-8, update your whole database to use utf8mb4:
ALTER DATABASE YOURDB CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;
Run this query, which generates the statements that need to be run:
SELECT CONCAT(
'ALTER TABLE ', table_name, ' CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci; ',
'ALTER TABLE ', table_name, ' CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci; ')
FROM information_schema.TABLES AS T, information_schema.`COLLATION_CHARACTER_SET_APPLICABILITY` AS C
WHERE C.collation_name = T.table_collation
AND T.table_schema = 'YOURDB'
AND
(C.CHARACTER_SET_NAME != 'utf8mb4'
OR
C.COLLATION_NAME not like 'utf8mb4%')
Copy the output into an editor, replace every | with nothing, and paste it back into mysql while connected to the correct database.
That is all that had to be done, and everything seems to work for me. Note that -Dfile.encoding=UTF-8 is not enabled and it still appears to work as expected.
Edited to add: Still having an issue? I certainly am, in production. It turns out you do need to check what the steps above actually did, since they sometimes do not work. Here is the reason and the fix in this scenario:
show create table user
`password` varchar(255) CHARACTER SET latin1 NOT NULL,
`username` varchar(255) CHARACTER SET latin1 NOT NULL,
You can see some columns are still latin1. Attempting to convert the table manually:
ALTER TABLE user CONVERT TO CHARACTER SET utf8mb4;
ERROR 1071 (42000): Specified key was too long; max key length is 767 bytes
So let's narrow it down:
mysql> ALTER TABLE user change username username varchar(255) CHARACTER SET utf8mb4 not NULL;
ERROR 1071 (42000): Specified key was too long; max key length is 767 bytes
mysql> ALTER TABLE user change username username varchar(100) CHARACTER SET utf8mb4 not NULL;
Query OK, 5 rows affected (0.01 sec)
In short I had to reduce the size of that field in order to get the update to work.
Now when I run:
mysql> ALTER TABLE user CONVERT TO CHARACTER SET utf8mb4;
Query OK, 5 rows affected (0.01 sec)
Records: 5 Duplicates: 0 Warnings: 0
It all works
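The arithmetic behind ERROR 1071 can be checked with a short sketch. It assumes the whole column is indexed (no prefix index) and that the key size is the declared length times the maximum bytes per character (4 for utf8mb4, 3 for utf8, 1 for latin1); the class and method names are made up for illustration, and 767 is the limit quoted in the error message above.

```java
final class IndexKeyLimit {
    // Limit quoted in "max key length is 767 bytes".
    static final int MAX_KEY_BYTES = 767;

    // Largest VARCHAR length whose full-column index still fits the limit.
    static int maxIndexableChars(int maxBytesPerChar) {
        return MAX_KEY_BYTES / maxBytesPerChar;
    }

    // True if VARCHAR(varcharLength) can be fully indexed in this charset.
    static boolean fits(int varcharLength, int maxBytesPerChar) {
        return varcharLength * maxBytesPerChar <= MAX_KEY_BYTES;
    }

    public static void main(String[] args) {
        System.out.println(fits(255, 3));          // utf8: 765 bytes -> true
        System.out.println(fits(255, 4));          // utf8mb4: 1020 bytes -> false
        System.out.println(fits(100, 4));          // utf8mb4: 400 bytes -> true
        System.out.println(maxIndexableChars(4));  // 191
    }
}
```

Under these assumptions VARCHAR(255) fits in utf8 but not in utf8mb4, which is why shrinking the column to VARCHAR(100) let the conversion succeed; any length up to 191 would also fit.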
MySQL's utf8 permits only the Unicode characters that can be represented with 3 bytes in UTF-8. Here you have a character that needs 4 bytes: \xF0\x90\x8D\x83 (U+10343 GOTHIC LETTER SAUIL).
If you have MySQL 5.5 or later you can change the column encoding from utf8 to utf8mb4. This encoding allows storage of characters that occupy 4 bytes in UTF-8.
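To see on the Java side whether a string contains such characters before it reaches MySQL, a minimal sketch like the following can help. The class and method names are hypothetical; it relies only on the fact that code points above U+FFFF (supplementary code points) take 4 bytes in UTF-8.

```java
import java.nio.charset.StandardCharsets;

final class Utf8Width {
    // True if any code point in s lies outside the BMP and so needs 4 bytes in UTF-8.
    static boolean needsUtf8mb4(String s) {
        return s.codePoints().anyMatch(Character::isSupplementaryCodePoint);
    }

    // Renders the UTF-8 bytes of s in the \xNN style used by the MySQL error message.
    static String utf8Escaped(String s) {
        StringBuilder sb = new StringBuilder();
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            sb.append(String.format("\\x%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String sauil = new String(Character.toChars(0x10343)); // GOTHIC LETTER SAUIL
        System.out.println(utf8Escaped(sauil));       // \xF0\x90\x8D\x83
        System.out.println(needsUtf8mb4(sauil));      // true
        System.out.println(needsUtf8mb4("\u00E9"));   // false: U+00E9 takes 2 bytes
    }
}
```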
You may also have to set the server property character_set_server to utf8mb4 in the MySQL configuration file. It seems that Connector/J defaults to 3-byte Unicode otherwise:
For example, to use 4-byte UTF-8 character sets with Connector/J, configure the MySQL server with character_set_server=utf8mb4, and leave characterEncoding out of the Connector/J connection string. Connector/J will then autodetect the UTF-8 setting.
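If the connection string is assembled elsewhere, one way to follow the quoted advice is to drop the encoding parameters before connecting. The helper below is a hedged sketch: JdbcUrlCleaner and stripEncodingParams are hypothetical names, the host and database are placeholders, and it assumes a simple `?key=value&key=value` query string.

```java
import java.util.Arrays;
import java.util.stream.Collectors;

final class JdbcUrlCleaner {
    // Removes characterEncoding and characterSetResults from a JDBC URL's
    // query string and leaves every other parameter untouched.
    static String stripEncodingParams(String url) {
        int q = url.indexOf('?');
        if (q < 0) {
            return url; // no parameters at all
        }
        String base = url.substring(0, q);
        String kept = Arrays.stream(url.substring(q + 1).split("&"))
                .filter(p -> !p.isEmpty())
                .filter(p -> {
                    String key = p.split("=", 2)[0];
                    return !key.equals("characterEncoding")
                            && !key.equals("characterSetResults");
                })
                .collect(Collectors.joining("&"));
        return kept.isEmpty() ? base : base + "?" + kept;
    }

    public static void main(String[] args) {
        // Hypothetical URL for illustration.
        String url = "jdbc:mysql://localhost:3306/mydb"
                + "?useSSL=false&characterEncoding=UTF-8&characterSetResults=UTF-8";
        System.out.println(stripEncodingParams(url));
        // jdbc:mysql://localhost:3306/mydb?useSSL=false
    }
}
```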