MySQL collation for all languages
The accepted answer is wrong (maybe it was right in 2009).
utf8mb4_unicode_ci is the best collation to use for wide language support.
Reasoning and supporting evidence:
You want to use utf8mb4 rather than utf8 because the latter only supports 3-byte characters, and you want to support 4-byte characters. (ref)
and
You want to use unicode rather than general because the latter never sorted correctly. (ref)
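To make the 3-byte versus 4-byte point concrete, here is a minimal Python sketch that measures how many bytes a character takes in UTF-8. The helper names utf8_len and fits_in_utf8mb3 are hypothetical, made up for this illustration; the only assumption about MySQL is that its legacy utf8 character set stores at most 3 bytes per character.

```python
def utf8_len(ch: str) -> int:
    """Return the number of bytes one character occupies in UTF-8."""
    return len(ch.encode("utf-8"))


def fits_in_utf8mb3(text: str) -> bool:
    """Hypothetical check: True if every character needs at most 3 bytes,
    i.e. the text could be stored in MySQL's legacy 3-byte utf8 charset."""
    return all(utf8_len(ch) <= 3 for ch in text)


# A Latin letter with an accent, a CJK character, an emoji, a musical symbol.
for sample in ["é", "日", "😀", "𝄞"]:
    print(repr(sample), utf8_len(sample), fits_in_utf8mb3(sample))
```

Characters outside the Basic Multilingual Plane, such as emoji, take 4 bytes, which is why they need utf8mb4.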
I generally use the 8-bit UCS/Unicode Transformation Format (UTF-8), which works perfectly for any (well, most) languages: utf8_general_ci
http://dev.mysql.com/doc/refman/5.0/en/charset-unicode.html