How to convert wstring into string?
Here is a worked-out solution based on the other suggestions:
#include <string>
#include <iostream>
#include <clocale>
#include <locale>
#include <vector>
int main() {
std::setlocale(LC_ALL, "");
const std::wstring ws = L"ħëłlö";
const std::locale locale("");
typedef std::codecvt<wchar_t, char, std::mbstate_t> converter_type;
const converter_type& converter = std::use_facet<converter_type>(locale);
std::vector<char> to(ws.length() * converter.max_length());
std::mbstate_t state;
const wchar_t* from_next;
char* to_next;
const converter_type::result result = converter.out(state, ws.data(), ws.data() + ws.length(), from_next, &to[0], &to[0] + to.size(), to_next);
if (result == converter_type::ok or result == converter_type::noconv) {
const std::string s(&to[0], to_next);
std::cout <<"std::string = "<<s<<std::endl;
}
}
This will usually work for Linux, but will create problems on Windows.
As Cubbi pointed out in one of the comments, std::wstring_convert
(C++11) provides a neat simple solution (you need to #include
<locale>
and <codecvt>
):
std::wstring string_to_convert;
//setup converter
using convert_type = std::codecvt_utf8<wchar_t>;
std::wstring_convert<convert_type, wchar_t> converter;
//use converter (.to_bytes: wstr->str, .from_bytes: str->wstr)
std::string converted_str = converter.to_bytes( string_to_convert );
I was using a combination of wcstombs
and tedious allocation/deallocation of memory before I came across this.
http://en.cppreference.com/w/cpp/locale/wstring_convert
update(2013.11.28)
One liners can be stated as so (Thank you Guss for your comment):
std::wstring str = std::wstring_convert<std::codecvt_utf8<wchar_t>>().from_bytes("some string");
Wrapper functions can be stated as so: (Thank you ArmanSchwarz for your comment)
std::wstring s2ws(const std::string& str)
{
using convert_typeX = std::codecvt_utf8<wchar_t>;
std::wstring_convert<convert_typeX, wchar_t> converterX;
return converterX.from_bytes(str);
}
std::string ws2s(const std::wstring& wstr)
{
using convert_typeX = std::codecvt_utf8<wchar_t>;
std::wstring_convert<convert_typeX, wchar_t> converterX;
return converterX.to_bytes(wstr);
}
Note: there's some controversy on whether string
/wstring
should be passed in to functions as references or as literals (due to C++11 and compiler updates). I'll leave the decision to the person implementing, but it's worth knowing.
Note: I'm using std::codecvt_utf8
in the above code, but if you're not using UTF-8 you'll need to change that to the appropriate encoding you're using:
http://en.cppreference.com/w/cpp/header/codecvt
An older solution from: http://forums.devshed.com/c-programming-42/wstring-to-string-444006.html
std::wstring wide( L"Wide" );
std::string str( wide.begin(), wide.end() );
// Will print no problemo!
std::cout << str << std::endl;
Update (2021): However, at least on more recent versions of MSVC, this may generate a wchar_t
to char
truncation warning. The warning can be quieted by using std::transform
instead with explicit conversion in the transformation function, e.g.:
std::wstring wide( L"Wide" );
std::string str;
std::transform(wide.begin(), wide.end(), std::back_inserter(str), [] (wchar_t c) {
return (char)c;
});
Or if you prefer to preallocate and not use back_inserter
:
std::string str(wide.length(), 0);
std::transform(wide.begin(), wide.end(), str.begin(), [] (wchar_t c) {
return (char)c;
});
See example on various compilers here.
Beware that there is no character set conversion going on here at all. What this does is simply to assign each iterated wchar_t
to a char
- a truncating conversion. It uses the std::string c'tor:
template< class InputIt >
basic_string( InputIt first, InputIt last,
const Allocator& alloc = Allocator() );
As stated in comments:
values 0-127 are identical in virtually every encoding, so truncating values that are all less than 127 results in the same text. Put in a chinese character and you'll see the failure.
the values 128-255 of windows codepage 1252 (the Windows English default) and the values 128-255 of unicode are mostly the same, so if that's teh codepage you're using most of those characters should be truncated to the correct values. (I totally expected á and õ to work, I know our code at work relies on this for é, which I will soon fix)
And note that code points in the range 0x80 - 0x9F
in Win1252 will not work. This includes €
, œ
, ž
, Ÿ
, ...