Python - 'ascii' codec can't decode byte
"你好".encode('utf-8')
encode
converts a unicode object to a string
object. But here you have invoked it on a string
object (because you don't have the u). So python has to convert the string
to a unicode
object first. So it does the equivalent of
"你好".decode().encode('utf-8')
But the decode fails because the string isn't valid ascii. That's why you get a complaint about not being able to decode.
Always encode from unicode to bytes.
In this direction, you get to choose the encoding.
>>> u"你好".encode("utf8")
'\xe4\xbd\xa0\xe5\xa5\xbd'
>>> print _
你好
The other way is to decode from bytes to unicode.
In this direction, you have to know what the encoding is.
>>> bytes = '\xe4\xbd\xa0\xe5\xa5\xbd'
>>> print bytes
你好
>>> bytes.decode('utf-8')
u'\u4f60\u597d'
>>> print _
你好
This point can't be stressed enough. If you want to avoid playing unicode "whack-a-mole", it's important to understand what's happening at the data level. Here it is explained another way:
- A unicode object is decoded already, you never want to call
decode
on it. - A bytestring object is encoded already, you never want to call
encode
on it.
Now, on seeing .encode
on a byte string, Python 2 first tries to implicitly convert it to text (a unicode
object). Similarly, on seeing .decode
on a unicode string, Python 2 implicitly tries to convert it to bytes (a str
object).
These implicit conversions are why you can get Unicode
Decode
Error
when you've called encode
. It's because encoding usually accepts a parameter of type unicode
; when receiving a str
parameter, there's an implicit decoding into an object of type unicode
before re-encoding it with another encoding. This conversion chooses a default 'ascii' decoder†, giving you the decoding error inside an encoder.
In fact, in Python 3 the methods str.decode
and bytes.encode
don't even exist. Their removal was a [controversial] attempt to avoid this common confusion.
† ...or whatever coding sys.getdefaultencoding()
mentions; usually this is 'ascii'
You can try this
import sys
reload(sys)
sys.setdefaultencoding("utf-8")
Or
You can also try following
Add following line at top of your .py file.
# -*- coding: utf-8 -*-