How to print UTF-8 encoded text to the console in Python < 3?
This is how I do it:
#!/usr/bin/python2.7 -S
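import sys
# -S above: "site" has not been imported yet, so setdefaultencoding() still exists
sys.setdefaultencoding("utf-8")
# import site manually now; it keeps the default encoding that is already set
import site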
Note the -S in the bangline. It tells Python not to import the site module automatically. The site module is what sets the default encoding and then removes the method so it can't be set again, but it will honor what is already set.
How to print UTF-8 encoded text to the console in Python < 3?
print u"some unicode text \N{EURO SIGN}"
print b"some utf-8 encoded bytestring \xe2\x82\xac".decode('utf-8')
i.e., if you have a Unicode string, print it directly; if you have a bytestring, decode it to Unicode first.
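A minimal sketch of that rule, written to run on both Python 2.7 and Python 3. The helper name to_text and its UTF-8 default are illustrative assumptions, not part of any library; the encoding argument describes where the bytes came from, not the terminal.

```python
# -*- coding: utf-8 -*-
from __future__ import print_function, unicode_literals


def to_text(value, encoding="utf-8"):
    """Return Unicode text: decode bytestrings, pass Unicode through unchanged.

    `encoding` is an assumption about the origin of the bytes; print()
    takes care of encoding the result for the terminal.
    """
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


if __name__ == "__main__":
    print(to_text("some unicode text \N{EURO SIGN}"))
    print(to_text(b"some utf-8 encoded bytestring \xe2\x82\xac"))
```

Decoding at the boundary keeps the rest of the program working with Unicode only, which is what lets print() pick the right output encoding.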
Your locale settings (LANG, LC_CTYPE) indicate a UTF-8 locale, and therefore (in theory) you could print a UTF-8 bytestring directly and it should be displayed correctly in your terminal (provided the terminal settings are consistent with the locale settings, which they should be). But you should avoid it: do not hardcode the character encoding of your environment inside your script; print Unicode directly instead.
There are many wrong assumptions in your question.
Given your locale settings, you do not need to set PYTHONIOENCODING to print Unicode to the terminal. A UTF-8 locale supports all Unicode characters, i.e., it works as is.
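To confirm that the locale really is UTF-8 before relying on it, something like the following sketch can be used. locale_is_utf8 is a hypothetical helper; normalizing names through codecs.lookup() is just one way to treat spellings such as "UTF-8" and "utf8" as the same codec.

```python
from __future__ import print_function

import codecs
import locale


def locale_is_utf8():
    """Return True if the preferred encoding from LANG/LC_CTYPE is UTF-8."""
    # The default do_setlocale=True asks the C library for the user's settings.
    name = locale.getpreferredencoding()
    try:
        return codecs.lookup(name).name == "utf-8"
    except LookupError:
        # An encoding name Python does not know cannot be UTF-8.
        return False


if __name__ == "__main__":
    print("UTF-8 locale:", locale_is_utf8())
```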
You do not need the workaround sys.stdout = codecs.getwriter(locale.getpreferredencoding())(sys.stdout). It may break if some code (that you do not control) does need to print bytes, and/or it may break while printing Unicode to the Windows console (wrong codepage, can't print undecodable characters). Correct locale settings and/or the PYTHONIOENCODING envvar are enough. Also, if you need to replace sys.stdout, use io.TextIOWrapper() instead of the codecs module, as the win-unicode-console package does.
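A minimal illustration of the io.TextIOWrapper() approach. It wraps an in-memory io.BytesIO instead of the real console so the encoded bytes can be inspected; make_utf8_writer is a hypothetical name, and wiring it to the actual console (on Python 3, via sys.stdout.buffer) is not shown here.

```python
from __future__ import print_function, unicode_literals

import io


def make_utf8_writer(binary_stream):
    """Wrap a binary stream in a text layer that encodes everything as UTF-8."""
    # errors="strict" fails loudly instead of silently dropping characters;
    # newline="\n" disables platform newline translation for this demo.
    return io.TextIOWrapper(binary_stream, encoding="utf-8",
                            errors="strict", newline="\n")


raw = io.BytesIO()
out = make_utf8_writer(raw)
out.write("euro sign: \N{EURO SIGN}\n")
out.flush()  # push the encoded bytes down into `raw`
print(raw.getvalue())  # Python 3 prints: b'euro sign: \xe2\x82\xac\n'
```

Unlike a codecs writer, the TextIOWrapper is a proper io text stream layered over a binary one, which is the design win-unicode-console relies on.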
sys.getdefaultencoding() is unrelated to your locale settings and to PYTHONIOENCODING. Your assumption that setting PYTHONIOENCODING should change sys.getdefaultencoding() is incorrect. You should check sys.stdout.encoding instead.
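One way to see the difference is to print the related values side by side. encoding_report is a hypothetical helper; of the three values, sys.stdout.encoding is the one that matters for console output.

```python
from __future__ import print_function

import locale
import sys


def encoding_report():
    """Collect the encodings that are often confused with each other."""
    return {
        # implicit str <-> unicode conversions (Python 2); not used for the console
        "sys.getdefaultencoding()": sys.getdefaultencoding(),
        # what print() uses for this stream; may be None on Python 2 when piped
        "sys.stdout.encoding": getattr(sys.stdout, "encoding", None),
        # derived from the locale settings (LANG, LC_CTYPE)
        "locale.getpreferredencoding()": locale.getpreferredencoding(),
    }


if __name__ == "__main__":
    for name, value in sorted(encoding_report().items()):
        print(name, "=", value)
```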
sys.getdefaultencoding() is not used when you print to the console. It may be used as a fallback on Python 2 if stdout is redirected to a file/pipe, unless PYTHONIOENCODING is set:
$ python2 -c'import sys; print(sys.stdout.encoding)'
UTF-8
$ python2 -c'import sys; print(sys.stdout.encoding)' | cat
None
$ PYTHONIOENCODING=utf8 python2 -c'import sys; print(sys.stdout.encoding)' | cat
utf8
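The same experiment can be driven from Python with subprocess, which connects the child's stdout to a pipe. This is a sketch; child_stdout_encoding and is_utf8 are hypothetical helpers. Note that a Python 3 child reports the locale encoding rather than None when PYTHONIOENCODING is unset.

```python
from __future__ import print_function

import codecs
import os
import subprocess
import sys

CHILD = "import sys; print(sys.stdout.encoding)"


def child_stdout_encoding(pythonioencoding=None):
    """Run a child interpreter with stdout piped and return its sys.stdout.encoding."""
    env = dict(os.environ)
    env.pop("PYTHONIOENCODING", None)  # start from a clean slate
    if pythonioencoding is not None:
        env["PYTHONIOENCODING"] = pythonioencoding
    output = subprocess.check_output([sys.executable, "-c", CHILD], env=env)
    return output.decode("ascii").strip()


def is_utf8(name):
    """Compare codec names such as 'utf8' and 'UTF-8' by their canonical form."""
    try:
        return codecs.lookup(name).name == "utf-8"
    except LookupError:  # e.g. the string 'None' printed by a Python 2 child
        return False


if __name__ == "__main__":
    print("unset:", child_stdout_encoding())
    print("utf8: ", child_stdout_encoding("utf8"))
```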
Do not call sys.setdefaultencoding("UTF-8"); it may corrupt your data silently and/or break 3rd-party modules that do not expect it. Remember that in Python 2, sys.getdefaultencoding() is used to convert bytestrings (str) to/from unicode implicitly, e.g., "a" + u"b". See also the quote in @mesilliac's answer.
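To make that hidden step visible, the sketch below spells out what Python 2 does for "a" + u"b" (decode the bytestring with sys.getdefaultencoding()) next to an explicit alternative. Both helper names are hypothetical. On Python 3, mixing bytes and str raises TypeError instead, so only the explicit version behaves the same on both.

```python
from __future__ import print_function, unicode_literals

import sys


def implicit_style_concat(data, text):
    """Mimic Python 2's implicit str + unicode: decode with the process-wide default."""
    # On an unmodified Python 2 the default is 'ascii', so non-ASCII bytes raise
    # UnicodeDecodeError; sys.setdefaultencoding() would change this result for
    # every module in the process.
    return data.decode(sys.getdefaultencoding()) + text


def explicit_concat(data, text, encoding="utf-8"):
    """Decode with an encoding chosen by the caller, independent of global state."""
    return data.decode(encoding) + text


if __name__ == "__main__":
    euro = b"\xe2\x82\xac"
    print(repr(explicit_concat(euro, "!")))
    try:
        print(repr(implicit_style_concat(euro, "!")))
    except UnicodeDecodeError as exc:
        print("implicit conversion failed:", exc)
```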
It seems accomplishing this is not recommended.
Fedora suggested using the system locale as the default, but apparently this breaks other things.
Here's a quote from the mailing-list discussion:
The only supported default encodings in Python are:
Python 2.x: ASCII
Python 3.x: UTF-8
If you change these, you are on your own and strange things will start to happen. The default encoding does not only affect the translation between Python and the outside world, but also all internal conversions between 8-bit strings and Unicode. Hacks like what's happening in the pango module (setting the default encoding to 'utf-8' by reloading the site module in order to get the sys.setdefaultencoding() API back) are just downright wrong and will cause serious problems since Unicode objects cache their default encoded representation. Please don't enable the use of a locale based default encoding. If all you want to achieve is getting the encodings of stdout and stdin correctly setup for pipes, you should instead change the .encoding attribute of those (only).
-- Marc-Andre Lemburg
eGenix.com
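On Python 3.7 and later, the closest equivalent of changing only the stream encoding is TextIOWrapper.reconfigure(); it does not exist on Python 2, which is why the sketch checks for it. force_utf8_stdio is a hypothetical helper. It leaves stdin alone because reconfigure() cannot change the encoding once data has been read from a stream.

```python
from __future__ import print_function

import codecs
import sys


def force_utf8_stdio(names=("stdout", "stderr")):
    """Switch the named sys streams to UTF-8 in place, where supported."""
    changed = []
    for name in names:
        stream = getattr(sys, name, None)
        reconfigure = getattr(stream, "reconfigure", None)  # Python 3.7+ only
        if reconfigure is None:
            continue  # Python 2, or a stream replaced by another object
        reconfigure(encoding="utf-8")
        changed.append(name)
    return changed


if __name__ == "__main__":
    print("reconfigured:", force_utf8_stdio())
```

This changes the streams only, leaving sys.getdefaultencoding() untouched, which is the distinction the quote draws.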
If the program does not display the appropriate characters on the screen (e.g., it shows invalid symbols), run the program with the following command line:
PYTHONIOENCODING=utf8 python3 yourprogram.py
Or the following, if your program is a globally installed module:
PYTHONIOENCODING=utf8 yourprogram
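A program can also detect the problem itself and suggest that fix instead of failing with UnicodeEncodeError. A minimal sketch; can_encode and the hint text are hypothetical, and treating a missing stream encoding as ASCII mirrors the Python 2 fallback described above.

```python
from __future__ import print_function, unicode_literals

import sys


def can_encode(text, encoding):
    """Return True if `text` can be represented in `encoding`."""
    try:
        text.encode(encoding or "ascii")  # treat a missing encoding as ASCII
    except (UnicodeEncodeError, LookupError):
        return False
    return True


if __name__ == "__main__":
    sample = "\N{EURO SIGN}"
    if can_encode(sample, getattr(sys.stdout, "encoding", None)):
        print(sample)
    else:
        sys.stderr.write("stdout cannot encode the output; "
                         "try PYTHONIOENCODING=utf8\n")
```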
On some platforms, such as Cygwin (the mintty.exe terminal) with Anaconda Python (or Python 3), simply running export PYTHONIOENCODING=utf8 and then running the program does not work; you have to run PYTHONIOENCODING=utf8 yourprogram every time for the program to work correctly.
On Linux, if you use sudo, you can try passing the -E argument to preserve your user environment variables in the sudo process:
export PYTHONIOENCODING=utf8
sudo -E python yourprogram.py
If that does not work, you will need to open a root shell with sudo and run the program from there:
sudo /bin/bash
PYTHONIOENCODING=utf8 yourprogram
Related:
- How to print UTF-8 encoded text to the console in Python < 3?
- Changing default encoding of Python?
- Forcing UTF-8 over cp1252 (Python3)
- Permanently set Python path for Anaconda within Cygwin
- https://superuser.com/questions/1374339/what-does-the-e-in-sudo-e-do
- Why bash -c 'var=5 printf "$var"' does not print 5?
- https://unix.stackexchange.com/questions/296838/whats-the-difference-between-eval-and-exec