What is the most correct way to set the encoding in C++?
I need any Unicode symbol or string to be read and written correctly.
This is certainly possible, although making the Windows command prompt console properly Unicode-aware takes some special magic. I seriously doubt that any of the implementations of the standard library functions are going to do this, unfortunately.
You'll find a number of questions about it on Stack Overflow, but this one is a good one. Basically, the console uses what is called (somewhat erroneously) the "OEM" code page by default. You want to change that to the UTF-8 code page, the value of which is defined by CP_UTF8. To do this, you'll need to call both the SetConsoleCP function (to set the input code page) and the SetConsoleOutputCP function (to set the output code page). The code would look something like this:
if (!SetConsoleCP(CP_UTF8))
{
    // An error occurred; handle it. Call GetLastError() for more information.
    // ...
}
if (!SetConsoleOutputCP(CP_UTF8))
{
    // An error occurred; handle it. Call GetLastError() for more information.
    // ...
}
For extra robustness, you might also want to make sure that the UTF-8 code page is supported first, before trying to set and use it. You would do that by calling the IsValidCodePage function. For example:
if (IsValidCodePage(CP_UTF8))
{
    // We're all good, so set the console code page...
}
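Putting the three calls together, here is a minimal sketch of one way to package them. The ConsoleUtf8Guard class is a hypothetical helper, not something from the Windows API: it remembers the code pages that were active, switches to UTF-8 if the code page is valid, and restores the old values when it goes out of scope. The non-Windows branch is only there so the file compiles elsewhere, and it assumes the terminal already speaks UTF-8.

```cpp
#ifdef _WIN32
#include <windows.h>
#endif

// Hypothetical helper (not part of the Windows API): switches the console
// to UTF-8 for its lifetime and restores the previous code pages afterwards.
class ConsoleUtf8Guard
{
public:
    ConsoleUtf8Guard() : active_(false), oldInput_(0), oldOutput_(0)
    {
#ifdef _WIN32
        // Both getters return 0 on failure (e.g. no console attached).
        oldInput_ = GetConsoleCP();
        oldOutput_ = GetConsoleOutputCP();
        active_ = IsValidCodePage(CP_UTF8)
               && SetConsoleCP(CP_UTF8)
               && SetConsoleOutputCP(CP_UTF8);
#else
        // Assumption: on other platforms the terminal is already UTF-8.
        active_ = true;
#endif
    }

    ~ConsoleUtf8Guard()
    {
#ifdef _WIN32
        // Restore whatever was set before, even if switching only half worked.
        if (oldInput_ != 0)  SetConsoleCP(oldInput_);
        if (oldOutput_ != 0) SetConsoleOutputCP(oldOutput_);
#endif
    }

    // True if the console is (assumed to be) in UTF-8 mode.
    bool active() const { return active_; }

    ConsoleUtf8Guard(const ConsoleUtf8Guard&) = delete;
    ConsoleUtf8Guard& operator=(const ConsoleUtf8Guard&) = delete;

private:
    bool active_;
    unsigned int oldInput_;
    unsigned int oldOutput_;
};
```

You would create one ConsoleUtf8Guard at the top of main and check active() before printing UTF-8 encoded bytes. Restoring matters because the code page belongs to the console window, not to your process, so the change would otherwise outlive your program.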
You will also have to change the font from the default ("Raster Fonts") to something that contains the requisite Unicode character glyphs, e.g., Lucida Console or Consolas (reference). That's trivial to do using the SetCurrentConsoleFontEx function.
Unfortunately, this function does not exist in versions of Windows prior to Vista. If you absolutely need to support these older operating systems, the only thing I know to do is to call the undocumented SetConsoleFont function. Normally, I would advise strongly against using undocumented functions, but I think it's less of a problem here since you would only be using it in old versions of the operating system. You know those aren't going to change. On the newer versions where it is available, you call the supported function. Sample untested code:
#include <windows.h>

bool IsWinVistaOrLater()
{
    OSVERSIONINFOEX osvi = {};
    osvi.dwOSVersionInfoSize = sizeof(osvi);
    // GetVersionEx is deprecated on newer versions of Windows, but it still
    // reports a major version of at least 6 there, which is all we test for.
    if (!GetVersionEx(reinterpret_cast<LPOSVERSIONINFO>(&osvi)))
    {
        return false;
    }
    if (osvi.dwPlatformId == VER_PLATFORM_WIN32_NT)
    {
        return osvi.dwMajorVersion >= 6;
    }
    return false;
}

void SetConsoleToUnicodeFont()
{
    HANDLE hConsole = GetStdHandle(STD_OUTPUT_HANDLE);
    if (IsWinVistaOrLater())
    {
        // Call the documented function. It is looked up at run time so that
        // the program still loads on systems that do not export it.
        typedef BOOL (WINAPI * pfSetCurrentConsoleFontEx)(HANDLE, BOOL, PCONSOLE_FONT_INFOEX);
        HMODULE hMod = GetModuleHandle(TEXT("kernel32.dll"));
        pfSetCurrentConsoleFontEx pfSCCFX = (pfSetCurrentConsoleFontEx)GetProcAddress(hMod, "SetCurrentConsoleFontEx");
        if (pfSCCFX == NULL)
        {
            return;  // function not found; nothing we can do
        }
        CONSOLE_FONT_INFOEX cfix = {};
        cfix.cbSize = sizeof(cfix);
        cfix.nFont = 12;
        cfix.dwFontSize.X = 8;
        cfix.dwFontSize.Y = 14;
        cfix.FontFamily = FF_DONTCARE;
        cfix.FontWeight = 400;  // normal weight
        // FaceName is always a WCHAR array, even in ANSI builds,
        // so use the wide-character copy function and a wide literal.
        lstrcpyW(cfix.FaceName, L"Lucida Console");
        pfSCCFX(hConsole,
                FALSE, /* set font for current window size */
                &cfix);
    }
    else
    {
        // There is no supported function on these older versions,
        // so we have to call the undocumented one.
        typedef BOOL (WINAPI * pfSetConsoleFont)(HANDLE, DWORD);
        HMODULE hMod = GetModuleHandle(TEXT("kernel32.dll"));
        pfSetConsoleFont pfSCF = (pfSetConsoleFont)GetProcAddress(hMod, "SetConsoleFont");
        if (pfSCF != NULL)
        {
            pfSCF(hConsole, 12);
        }
    }
}
Notice that I've left adding the required error checking as an exercise for the reader. The focus here is on technique and readability; cluttering it up with error handling would just confuse matters.
I have no idea how to do any of this on Linux. I suspect it's a lot less work, since people tell me the OS uses UTF-8 internally. Either way, you're on your own for that; making Windows purr is enough work for one answer!
I just needed to output Unicode text to the console, and the only thing that helped was WriteConsoleW(GetStdHandle(STD_OUTPUT_HANDLE), ...). For input, I assume ReadConsoleW(GetStdHandle(STD_INPUT_HANDLE), ...) does the trick.
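For the input side, here is a rough sketch of what a ReadConsoleW call might look like. ReadConsoleLine and TrimLineEnding are hypothetical helper names, and the 1024-character buffer is an arbitrary assumption; a longer line will not come back whole from a single call. The line ending is trimmed because a console line read hands back the Enter keystroke as part of the text. On non-Windows builds the function simply reports failure.

```cpp
#include <string>
#ifdef _WIN32
#include <windows.h>
#endif

// Hypothetical helper: strip any trailing '\r' and '\n' characters.
std::wstring TrimLineEnding(std::wstring line)
{
    while (!line.empty() && (line.back() == L'\r' || line.back() == L'\n'))
    {
        line.pop_back();
    }
    return line;
}

// Hypothetical helper: read one line of UTF-16 text from the console.
// Returns false if the read fails or if this is not a Windows build.
bool ReadConsoleLine(std::wstring& out)
{
#ifdef _WIN32
    wchar_t buffer[1024];  // assumption: arbitrary size for this sketch
    DWORD charsRead = 0;
    HANDLE hIn = GetStdHandle(STD_INPUT_HANDLE);
    if (!ReadConsoleW(hIn, buffer, 1024, &charsRead, NULL))
    {
        return false;  // e.g. input is redirected from a file or pipe
    }
    out = TrimLineEnding(std::wstring(buffer, charsRead));
    return true;
#else
    (void)out;
    return false;  // ReadConsoleW is Windows-only
#endif
}
```

Note that ReadConsoleW only works on a real console handle, so a program that may have its input redirected would need a fallback path.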
PS: WriteConsoleW has a limit on the size of the output string, so you might want to write longer strings in chunks.
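The chunking suggested in the PS could be sketched like this. SplitForConsole and WriteConsoleChunked are hypothetical helper names, and the default chunk size of 8192 characters is an assumption, not a documented limit. The one subtlety is that a chunk must not end in the middle of a UTF-16 surrogate pair, so the splitter backs off by one code unit when it would.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>
#ifdef _WIN32
#include <windows.h>
#endif

// Hypothetical helper: split text into pieces of at most maxUnits UTF-16
// code units without separating the two halves of a surrogate pair.
std::vector<std::wstring> SplitForConsole(const std::wstring& text, std::size_t maxUnits)
{
    std::vector<std::wstring> chunks;
    if (maxUnits < 2)
    {
        maxUnits = 2;  // always leave room for a whole surrogate pair
    }
    std::size_t pos = 0;
    while (pos < text.size())
    {
        std::size_t len = std::min(maxUnits, text.size() - pos);
        // If more text follows and this chunk ends on a high surrogate,
        // move that code unit into the next chunk.
        if (pos + len < text.size())
        {
            wchar_t last = text[pos + len - 1];
            if (last >= 0xD800 && last <= 0xDBFF)
            {
                --len;
            }
        }
        chunks.push_back(text.substr(pos, len));
        pos += len;
    }
    return chunks;
}

// Hypothetical helper: write text to the console one chunk at a time.
// Returns false on the first failed write or if this is not a Windows build.
bool WriteConsoleChunked(const std::wstring& text, std::size_t maxUnits = 8192)
{
#ifdef _WIN32
    HANDLE hOut = GetStdHandle(STD_OUTPUT_HANDLE);
    for (const std::wstring& chunk : SplitForConsole(text, maxUnits))
    {
        DWORD written = 0;
        if (!WriteConsoleW(hOut, chunk.c_str(),
                           static_cast<DWORD>(chunk.size()), &written, NULL))
        {
            return false;  // e.g. output is redirected to a file
        }
    }
    return true;
#else
    (void)text;
    (void)maxUnits;
    return false;  // WriteConsoleW is Windows-only
#endif
}
```

Keeping the splitting logic separate from the Windows call means it can be checked on its own, without a console attached.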