What is the difference between "UTF-16" and "std::wstring"?
std::wstring is a container of wchar_t. The size of wchar_t is not specified: Windows compilers tend to use a 16-bit type, Unix compilers a 32-bit type.

UTF-16 is a way of encoding sequences of Unicode code points as sequences of 16-bit integers.

Using Visual Studio, if you use wide character literals (e.g. L"Hello World") that contain no characters outside of the BMP, you'll end up with UTF-16, but mostly the two concepts are unrelated. If you use characters outside the BMP, std::wstring will not translate surrogate pairs into Unicode code points for you, even if wchar_t is 16 bits.
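Here's a minimal sketch of that platform difference. It assumes a C++11 compiler; U+1F600 is just an arbitrary example of a code point outside the BMP:

```cpp
// std::wstring stores code units, not code points. On Windows
// (16-bit wchar_t) a character outside the BMP occupies two
// elements (a surrogate pair); on most Unix systems (32-bit
// wchar_t) it occupies one.
#include <iostream>
#include <string>

int main() {
    std::wstring s = L"\U0001F600";  // U+1F600, outside the BMP
    // Typically prints 2 on Windows, 1 on Linux/macOS.
    std::cout << s.size() << '\n';
    return 0;
}
```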
UTF-16 is a specific Unicode encoding. std::wstring is a string implementation that uses wchar_t as its underlying type for storing each character. (In contrast, regular std::string uses char.)

The encoding used with wchar_t does not necessarily have to be UTF-16; it could also be UTF-32, for example.
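A quick way to see what your platform uses is to check sizeof(wchar_t). The sketch below assumes C++11 and also shows std::u16string and std::u32string, whose literals do have a guaranteed encoding (UTF-16 and UTF-32 respectively):

```cpp
// sizeof(wchar_t) is 2 on Windows (UTF-16 code units in practice)
// and 4 on most Unix-like systems (typically UTF-32).
#include <iostream>
#include <string>

int main() {
    std::cout << "wchar_t is " << sizeof(wchar_t) * 8 << " bits\n";
    // C++11 types whose literals have a guaranteed encoding:
    std::u16string utf16 = u"\U0001F600";  // 2 UTF-16 code units
    std::u32string utf32 = U"\U0001F600";  // 1 UTF-32 code unit
    std::cout << utf16.size() << ' ' << utf32.size() << '\n';
    return 0;
}
```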
UTF-16 is a way of representing text in 16-bit elements, but an actual textual character may consist of more than one element (a surrogate pair).

std::wstring is just a collection of these elements; it is a class primarily concerned with their storage.

The element type of a wstring, wchar_t, is at least 16 bits wide but could be 32 bits.
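To make "more than one element per character" concrete, here is a sketch that decodes a single surrogate pair by hand. It assumes the wstring actually holds UTF-16 (i.e. a 16-bit wchar_t, as on Windows); on a 32-bit-wchar_t platform the string would have size 1 and the branch is skipped:

```cpp
// Manually decode one UTF-16 surrogate pair into a code point.
#include <cstdint>
#include <iostream>
#include <string>

int main() {
    std::wstring s = L"\U0001F600";  // two elements on Windows
    if (s.size() == 2 &&
        s[0] >= 0xD800 && s[0] <= 0xDBFF &&  // high surrogate
        s[1] >= 0xDC00 && s[1] <= 0xDFFF) {  // low surrogate
        std::uint32_t cp = 0x10000
            + ((static_cast<std::uint32_t>(s[0]) - 0xD800) << 10)
            + (static_cast<std::uint32_t>(s[1]) - 0xDC00);
        std::cout << std::hex << "U+" << cp << '\n';  // U+1f600
    }
    return 0;
}
```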