How to use UTF-8 collation in a SQL Server database?
UTF-8 is not a character set, it's an encoding. The character set for UTF-8 is Unicode. If you want to store Unicode text you use the nvarchar data type.
Even if the database used UTF-8 to store text internally, you would still not get the text out as UTF-8 encoded data; you would get it out as decoded text.
You can easily store UTF-8 encoded text in the database, but then you don't store it as text, you store it as binary data (varbinary).
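To make the encoding-versus-character-set distinction concrete, here is a minimal Python sketch (not SQL Server code; the sample string is only an illustration). It serializes the same Unicode text with two encodings and shows that the bytes differ while the decoded text is identical, which is why UTF-8 bytes kept in a binary column have to be decoded by the application before they are text again.

```python
# Sample text with two accented letters and one symbol (illustrative only).
text = "na\u00efve caf\u00e9 \u2615"

# One character set (Unicode), two different byte encodings.
utf8_bytes = text.encode("utf-8")
utf16_bytes = text.encode("utf-16-le")  # little-endian UTF-16, no byte order mark

# The stored bytes differ, but decoding either one gives back the same text.
same_text = utf8_bytes.decode("utf-8") == utf16_bytes.decode("utf-16-le") == text

print(len(text), len(utf8_bytes), len(utf16_bytes))  # 12 16 24
print(same_text)                                     # True
```

The 12 characters take 16 bytes in UTF-8 and 24 bytes in UTF-16, yet both decode to the same string.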
No! It's not a joke.
Take a look here: http://msdn.microsoft.com/en-us/library/ms186939.aspx
Character data types that are either fixed-length, nchar, or variable-length, nvarchar, Unicode data and use the UNICODE UCS-2 character set.
And also here: http://en.wikipedia.org/wiki/UTF-16
The older UCS-2 (2-byte Universal Character Set) is a similar character encoding that was superseded by UTF-16 in version 2.0 of the Unicode standard in July 1996.
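The practical difference between UCS-2 and UTF-16 shows up with characters outside the Basic Multilingual Plane. As a rough Python sketch (the helper name `utf16_code_units` is made up for this example), a BMP character fits in one 16-bit code unit, while a supplementary character needs a surrogate pair, which UCS-2 has no way to express:

```python
def utf16_code_units(s):
    """Return the 16-bit code units of s when encoded as UTF-16."""
    raw = s.encode("utf-16-le")
    return [int.from_bytes(raw[i:i + 2], "little") for i in range(0, len(raw), 2)]

bmp_char = "\u20ac"         # EURO SIGN, inside the Basic Multilingual Plane
astral_char = "\U0001F600"  # GRINNING FACE, a supplementary character

print([hex(u) for u in utf16_code_units(bmp_char)])     # ['0x20ac']
print([hex(u) for u in utf16_code_units(astral_char)])  # ['0xd83d', '0xde00']
```

A system that treats every 16-bit unit as one character will see the second value as two characters rather than one.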
Looks like this will finally be supported in SQL Server 2019! See: SQL Server 2019 - what's new?
From BOL:
UTF-8 support
Full support for the widely used UTF-8 character encoding as an import or export encoding, or as database-level or column-level collation for text data. UTF-8 is allowed in the CHAR and VARCHAR datatypes, and is enabled when creating or changing an object’s collation to a collation with the UTF8 suffix.
For example, LATIN1_GENERAL_100_CI_AS_SC to Latin1_General_100_CI_AS_KS_SC_UTF8. UTF-8 is only available to Windows collations that support supplementary characters, as introduced in SQL Server 2012. NCHAR and NVARCHAR allow UTF-16 encoding only, and remain unchanged.
This feature may provide significant storage savings, depending on the character set in use. For example, changing an existing column data type with ASCII strings from NCHAR(10) to CHAR(10) using an UTF-8 enabled collation, translates into nearly 50% reduction in storage requirements. This reduction is because NCHAR(10) requires 22 bytes for storage, whereas CHAR(10) requires 12 bytes for the same Unicode string.
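The "depending on the character set in use" caveat is easy to check outside the database. The Python sketch below compares raw encoded sizes only; the helper `payload_sizes` and the sample strings are illustrative. It does not model SQL Server's on-disk format, and the quoted figures are 2 bytes higher in each case, presumably because of per-value overhead that the sketch ignores.

```python
def payload_sizes(s):
    """Return (utf8_bytes, utf16_bytes) for the encoded text, without any storage overhead."""
    return len(s.encode("utf-8")), len(s.encode("utf-16-le"))

ascii_text = "ABCDEFGHIJ"      # 10 ASCII characters
japanese_text = "日本語のテキスト"  # 8 characters, all in the Basic Multilingual Plane

print(payload_sizes(ascii_text))     # (10, 20)
print(payload_sizes(japanese_text))  # (24, 16)
```

ASCII data halves in size under UTF-8, but text made of characters that need 3 bytes in UTF-8 gets larger than its UTF-16 form, so the saving is not universal.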
2019-05-14 update:
The documentation seems to be updated now and explains our options starting in MSSQL 2019 in the section "Collation and Unicode Support".
2019-07-24 update:
Article by Pedro Lopes - Senior Program Manager @ Microsoft about introducing UTF-8 support for Azure SQL Database