Python: Which encoding is used for processing sys.argv?

I'm guessing that you are asking this because you ran into issue 2128. Note that this has been fixed in Python 3.0.


I don't know if this helps or not, but this is what I get in a Command Prompt ("DOS") window:

C:\Python27>python Lib\codingtest.py нер
['Lib\\codingtest.py', '\xed\xe5\xf0']

C:\Python27>python Lib\codingtest.py hello
['Lib\\codingtest.py', 'hello']

In IDLE:

>>> print "hello"
hello
>>> "hello"
'hello'
>>> "привет"
'\xef\xf0\xe8\xe2\xe5\xf2'
>>> print "привет"
привет
>>> sys.getdefaultencoding()
'ascii'
>>> 

What can we deduce from this? I don't know yet... I'll comment in a little bit.

A little bit later: judging by the output above, sys.argv appears to be encoded with sys.stdin.encoding and not with sys.getdefaultencoding().
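One way to check a guess like this is to take the byte strings from the sessions above and see which code page turns them back into the text that was typed. Below is a minimal sketch in Python 3 syntax. The candidate list and the helper name matching_codecs are my own assumptions for illustration, not something from the original tests.

```python
# Bytes copied from the two sessions above, paired with the text that was typed.
SAMPLES = [
    (b"\xed\xe5\xf0", "нер"),                  # Command Prompt, sys.argv[1]
    (b"\xef\xf0\xe8\xe2\xe5\xf2", "привет"),   # IDLE, string literal
]

# Candidate code pages to try (an assumption; extend as needed).
CANDIDATES = ["cp1251", "cp866", "cp1252", "cp850"]


def matching_codecs(raw, expected, candidates=CANDIDATES):
    """Return the candidate codecs that decode `raw` to exactly `expected`."""
    matches = []
    for name in candidates:
        try:
            if raw.decode(name) == expected:
                matches.append(name)
        except UnicodeDecodeError:
            # This code page cannot decode these bytes at all.
            pass
    return matches


if __name__ == "__main__":
    for raw, expected in SAMPLES:
        # Print only the bytes and codec names, so the report itself
        # does not depend on the console's encoding.
        print(repr(raw), matching_codecs(raw, expected))
```

Of these candidates, only cp1251, the Windows ANSI code page for Cyrillic, matches either sample. Whether that is the same as sys.stdin.encoding depends on where the script is run, so it is worth printing that value alongside.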


A few observations:

(1) It's certainly not sys.getdefaultencoding().

(2) sys.stdin.encoding appears to be a much better bet.

(3) On Windows, the actual value of sys.stdin.encoding varies depending on what software is providing the stdio. IDLE uses the system "ANSI" code page, e.g. cp1252 in most of Western Europe and America and former colonies thereof. In the Command Prompt window, however, which emulates MS-DOS more or less, the corresponding old DOS code page (e.g. cp850) is used by default. This can be changed with the CHCP (change code page) command.

(4) The documentation for the subprocess module doesn't provide any suggestions on what encoding to use for args and stdout.

(5) One trusts that assert sys.stdin.encoding == sys.stdout.encoding never fails.
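
To see how observations (3) and (5) play out on a particular machine, a small script can print the encodings the interpreter reports. This is only a sketch: the function name report_encodings is made up for illustration, and it reports what Python says, not how sys.argv was actually encoded.

```python
import locale
import sys


def report_encodings():
    """Collect the encodings this interpreter reports, for comparison."""
    return {
        "default": sys.getdefaultencoding(),
        # .encoding may be None or missing when a stream is redirected
        # or replaced, so do not assume it is always a string.
        "stdin": getattr(sys.stdin, "encoding", None),
        "stdout": getattr(sys.stdout, "encoding", None),
        "filesystem": sys.getfilesystemencoding(),
        # False: query the preferred encoding without changing the locale.
        "locale": locale.getpreferredencoding(False),
    }


if __name__ == "__main__":
    for name, value in sorted(report_encodings().items()):
        print("%-10s %s" % (name, value))
```

Running it once in IDLE, once in a Command Prompt window, and again after a CHCP command shows whether the stdin and stdout values change together, and whether they differ from the "locale" value.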