Does Process.StartInfo.Arguments support a UTF-8 string?

It depends entirely on the program you are trying to start. The Process class fully supports Unicode, as does the operating system. But the program might be old and use 8-bit characters. In that case it retrieves its command line arguments with GetCommandLineA(), the ANSI version of the native Unicode GetCommandLineW() API function. That version translates the Unicode string to 8-bit characters using the system default code page, as configured in Control Panel + Regional and Language Options, Language for Non-Unicode Programs. In effect, it is WideCharToMultiByte() using CP_ACP.

If that is not the Japanese code page, the translation produces question marks, since the Japanese characters only have a code in the Japanese code page. Switching the system code page usually isn't desirable for non-Japanese speakers. UTF-8 certainly won't work either, because the program isn't going to expect it. Consider running this program in a virtual machine.
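
The lossy conversion can be reproduced from .NET. This is only a minimal sketch: it assumes ISO-8859-1 (code page 28591) as a stand-in for a non-Japanese system code page, the class and method names are hypothetical, and the "?" replacement fallback is set explicitly to roughly mimic what the ANSI conversion does with characters it cannot map.

```csharp
using System;
using System.Text;

public static class CodePageDemo
{
    // Converts text to the given 8-bit code page and back, replacing every
    // character the code page cannot represent with '?'.
    // (Hypothetical helper for illustration, not part of any API.)
    public static string ThroughCodePage(string text, int codePage)
    {
        Encoding narrow = Encoding.GetEncoding(
            codePage,
            new EncoderReplacementFallback("?"),
            new DecoderReplacementFallback("?"));
        byte[] bytes = narrow.GetBytes(text);
        return narrow.GetString(bytes);
    }
}
```

Calling `CodePageDemo.ThroughCodePage("これはテスト", 28591)` returns six question marks, while plain ASCII text comes back unchanged. To try a real Windows code page such as 932 on .NET Core or later, the code pages encoding provider would probably need to be registered first.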


Programs receive their command lines in UTF-16, the same encoding as .NET strings:

Arguments = "/U /K \"echo これはテストです> output.txt\"";

It is the console window that cannot display characters outside of its current code page or selected font. However, I am assuming that you don't want to call echo, so this depends entirely on how the program you are calling is written.
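
For context, the Arguments line above might sit in something like the following sketch. The class and method names are hypothetical, and the text is concatenated verbatim, so quotes or characters such as & in it would need escaping that is not shown here.

```csharp
using System.Diagnostics;

public static class EchoDemo
{
    // Builds the start info for cmd.exe; nothing is launched here.
    // (Hypothetical helper for illustration.)
    public static ProcessStartInfo BuildStartInfo(string text)
    {
        return new ProcessStartInfo
        {
            FileName = "cmd.exe",
            // /U: internal commands write Unicode (UTF-16) to pipes and files.
            // /K: run the command and keep the window open.
            Arguments = "/U /K \"echo " + text + "> output.txt\"",
            UseShellExecute = false
        };
    }
}
```

On Windows, `Process.Start(EchoDemo.BuildStartInfo("これはテストです"))` should then leave an output.txt holding the text as UTF-16, even if the console window itself shows something else.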

Some background info: C or C++ programs that use the 'narrow' (system code page) entry points, e.g. main(int argc, char** argv), rather than the 'wide' (UTF-16) entry points, e.g. wmain(int argc, wchar_t** argv), are called by a stub that converts the command line to the system code page, which on older versions of Windows cannot be UTF-8.

By far the best option is to change the program to use a wide entry point, so that it simply gets the same UTF-16 as you had in your .NET string. If that is not possible, then one trick you could try is to pass it a UTF-16 command line that, when converted to the system code page, yields the UTF-8 bytes for the characters you want it to use:

Arguments = Encoding.Default.GetString(Encoding.UTF8.GetBytes(args));

Caveat Coder: Don't be surprised if this goes horribly wrong on your or someone else's machine. It depends on every possible byte being valid in the current system code page, on the system code page not having changed since your program was started, on the program you are running not passing the data to any encoding-dependent Windows function (those with A- and W-suffixed versions), and so on.
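
One way to catch the first of those failure modes before launching anything is to check that the disguised string converts back to the original UTF-8 bytes. The sketch below is hedged accordingly: the class and method names are hypothetical, the code page is passed in explicitly rather than taken from Encoding.Default, and it only tests .NET's view of the code page, which may not match the conversion Windows performs.

```csharp
using System;
using System.Linq;
using System.Text;

public static class Utf8Smuggle
{
    // Reinterprets the UTF-8 bytes of args as text in the given code page,
    // i.e. the trick above with the code page made explicit.
    public static string Disguise(string args, Encoding systemCodePage)
    {
        return systemCodePage.GetString(Encoding.UTF8.GetBytes(args));
    }

    // True only if converting the disguised string back to the code page
    // reproduces the original UTF-8 bytes exactly.
    public static bool SurvivesRoundTrip(string args, Encoding systemCodePage)
    {
        byte[] expected = Encoding.UTF8.GetBytes(args);
        byte[] actual = systemCodePage.GetBytes(Disguise(args, systemCodePage));
        return expected.SequenceEqual(actual);
    }
}
```

A caller could run `Utf8Smuggle.SurvivesRoundTrip(args, Encoding.Default)` and refuse to start the process when it returns false. Note that Encoding.Default is the system ANSI code page only on .NET Framework; on .NET Core and later it is UTF-8, so the check (and the trick itself) would not reflect the system code page there.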

Tags: C#, UTF-8