UnicodeEncodeError: 'ascii' codec can't encode character u'\xef' in position 0: ordinal not in range(128)

The actual best answer for this problem depends on your environment, specifically what encoding your terminal expects.

The quickest one-line solution is to encode everything you print to ASCII, which your terminal is almost certain to accept, while discarding characters that you cannot print:

print ch #fails
print ch.encode('ascii', 'ignore')

The better solution is to change your terminal's encoding to utf-8, and encode everything as utf-8 before printing. You should get in the habit of thinking about your unicode encoding EVERY time you print or read a string.

Just putting .encode('utf-8') at the end of object will do the job in recent versions of Python.

It seems you are hitting a UTF-8 byte order mark (BOM). Try using this unicode string with BOM extracted out:

import codecs

content = unicode(q.content.strip(codecs.BOM_UTF8), 'utf-8')
parser.parse(StringIO.StringIO(content))

I used strip instead of lstrip because in your case you had multiple occurences of BOM, possibly due to concatenated file contents.

UnicodeEncodeError: 'ascii' codec can't encode character u'\xef' in position 0: ordinal not in range(128)

Tags:

Python

Xml Parsing

Google App Engine

Related

Recent Posts