UTF-8 in Python logging: how?
Code like this:
raise Exception(u'щ')
fails with the following error once the exception reaches the logging formatter:
File "/usr/lib/python2.7/logging/__init__.py", line 467, in format
s = self._fmt % record.__dict__
UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-3: ordinal not in range(128)
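This happens because the format string is a byte string, while one of its arguments is an object (here, an exception) whose text is a unicode string with non-ASCII characters. With a byte format string, %s calls str() on that object, which tries to encode the text with the default 'ascii' codec: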
>>> "%(message)s" % {'message': Exception(u'\u0449')}
*** UnicodeEncodeError: 'ascii' codec can't encode character u'\u0449' in position 0: ordinal not in range(128)
Making the format string unicode fixes the issue:
>>> u"%(message)s" % {'message': Exception(u'\u0449')}
u'\u0449'
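A minimal sketch of the same formatting path going through logging.Formatter, assuming a hand-built LogRecord (the logger name "demo" and the path "demo.py" are placeholders). Under Python 2.7 the u'' format string is what lets this succeed; under Python 3 every str is already unicode, so the prefix is harmless and the result is the same.

```python
import logging

# Format string declared as a unicode literal, as recommended above.
# (In Python 3 the u prefix is a no-op: every str is unicode.)
formatter = logging.Formatter(u"%(message)s")

# Hand-built record whose message is an exception carrying a non-ASCII
# character, mirroring raise Exception(u'\u0449').
# "demo" and "demo.py" are placeholder values.
record = logging.LogRecord(
    name="demo",
    level=logging.ERROR,
    pathname="demo.py",
    lineno=0,
    msg=Exception(u"\u0449"),
    args=(),
    exc_info=None,
)

# With a unicode format string the exception is converted to text
# without going through the ASCII codec.
result = formatter.format(record)
```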
So, in your logging configuration, make all format strings unicode:
'formatters': {
    'simple': {
        'format': u'%(asctime)-s %(levelname)s [%(name)s]: %(message)s',
        'datefmt': '%Y-%m-%d %H:%M:%S',
    },
    ...
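The fragment above only covers the formatter. A fuller sketch, assuming a single FileHandler (the handler name "file", the logger name "demo" and the temporary log path are placeholders, not part of the original config), shows where UTF-8 itself comes in: FileHandler's encoding argument controls how the formatted unicode text is encoded when written to disk.

```python
import logging
import logging.config
import os
import tempfile

# Placeholder log file location for this sketch.
log_path = os.path.join(tempfile.mkdtemp(), "app.log")

LOGGING = {
    "version": 1,
    "disable_existing_loggers": False,
    "formatters": {
        "simple": {
            # Unicode format string, as recommended above.
            "format": u"%(asctime)-s %(levelname)s [%(name)s]: %(message)s",
            "datefmt": "%Y-%m-%d %H:%M:%S",
        },
    },
    "handlers": {
        # Assumed handler: extra keys (filename, encoding) are passed to
        # the FileHandler constructor, so the file is written as UTF-8.
        "file": {
            "class": "logging.FileHandler",
            "filename": log_path,
            "encoding": "utf-8",
            "formatter": "simple",
        },
    },
    "loggers": {
        "demo": {"handlers": ["file"], "level": "INFO", "propagate": False},
    },
}

logging.config.dictConfig(LOGGING)
logger = logging.getLogger("demo")
logger.error(Exception(u"\u0449"))

# Close the handler so everything is flushed to disk before reading back.
for handler in logger.handlers:
    handler.close()

with open(log_path, "rb") as f:
    raw = f.read()
```

Reading the file back as bytes makes it easy to confirm that the non-ASCII character was stored as UTF-8 rather than depending on the platform's default encoding.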
Also patch the default logging formatter to use a unicode format string:
logging._defaultFormatter = logging.Formatter(u"%(message)s")
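A sketch of what that patch affects, assuming a StreamHandler writing to an in-memory io.StringIO buffer and a throwaway logger name ("patched_default_demo" is a placeholder). Note that logging._defaultFormatter is a private attribute, not part of the public API: handlers that have no formatter of their own fall back to it, so only those handlers see the change.

```python
import io
import logging

# Private module attribute (leading underscore): used by any handler
# that has no formatter set. Replacing it is the patch suggested above.
logging._defaultFormatter = logging.Formatter(u"%(message)s")

stream = io.StringIO()
handler = logging.StreamHandler(stream)  # deliberately no setFormatter()
logger = logging.getLogger("patched_default_demo")  # placeholder name
logger.addHandler(handler)
logger.propagate = False

# The handler formats this record with the patched default formatter.
logger.error(Exception(u"\u0449"))
output = stream.getvalue()
```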
Check that you have the latest Python 2.6 release; some Unicode bugs have been found and fixed since 2.6 came out. For example, on my Ubuntu Jaunty system I ran your script exactly as copied and pasted, removing only the '/home/ted/' prefix from the log file name. Result (copied and pasted from a terminal window):
vinay@eta-jaunty:~/projects/scratch$ python --version
Python 2.6.2
vinay@eta-jaunty:~/projects/scratch$ python utest.py
printed unicode object: ô
vinay@eta-jaunty:~/projects/scratch$ cat logfile.txt
ô
vinay@eta-jaunty:~/projects/scratch$
On a Windows box:
C:\temp>python --version
Python 2.6.2
C:\temp>python utest.py
printed unicode object: ô
And the contents of the file:
This might also explain why Lennart Regebro couldn't reproduce it either.