Pythonic way to ensure Unicode in Python 2 and 3
Using six.text_type should suffice virtually always, just like the accepted answer says.
On a side note, and FYI, you could get yourself into trouble in Python 3 if you somehow feed a bytes instance to it (although this should be really hard to do).
CONTEXT
six.text_type is basically an alias for str in Python 3:
>>> import six
>>> six.text_type
<class 'str'>
Surprisingly, using str to cast bytes instances gives somewhat unexpected results:
>>> six.text_type(b'bytestring')
"b'bytestring'"
Notice how our string just got mangled? Straight from str's docs:
Passing a bytes object to str() without the encoding or errors arguments falls under the first case of returning the informal string representation.
That is, str(...) will actually call the object's __str__ method, unless you pass an encoding:
>>> b'bytestring'.__str__()
"b'bytestring'"
>>> six.text_type(b'bytestring', encoding='utf-8')
'bytestring'
Sadly, if you do pass an encoding, "casting" regular str instances will no longer work:
>>> six.text_type('string', encoding='utf-8')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: decoding str is not supported
On a somewhat related note, casting None values can be troublesome as well:
>>> six.text_type(None)
'None'
You'll end up with a 'None' string, literally. Probably not what you wanted.
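To tie the three pitfalls above together (bytes, already-decoded text, and None), here is a minimal sketch of a wrapper that branches on the input type before casting. The name to_text and its defaults are hypothetical and not part of six; the UTF-8 default is an assumption about your data.

```python
import sys

# Same idea as six.text_type: str on Python 3, unicode on Python 2.
text_type = str if sys.version_info[0] >= 3 else unicode  # noqa: F821


def to_text(value, encoding="utf-8", errors="strict"):
    """Hypothetical helper: return Unicode text, or None unchanged."""
    if value is None:
        # Avoid ending up with the literal string 'None'.
        return None
    if isinstance(value, text_type):
        # Already text; passing an encoding here would raise TypeError.
        return value
    if isinstance(value, (bytes, bytearray)):
        # Decode explicitly instead of getting "b'...'" back.
        return bytes(value).decode(encoding, errors)
    # Anything else (ints, floats, custom objects) goes through text_type.
    return text_type(value)
```

With this, to_text(b'bytestring') and to_text('bytestring') should both give back the same text, and to_text(None) stays None. Whether None should pass through or raise is a design choice; adjust it to whatever your callers expect.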
ALTERNATIVES
1. Just use six.text_type. Really. There's nothing to worry about unless you interact with bytes on purpose. Make sure to check for None values before casting, though.
2. Use Django's force_text. It's the safest way out of this madness if you happen to be working on a project that's already using Django 1.x.x.
3. Copy-paste Django's force_text into your project. Here's a sample implementation.
For either of the Django alternatives, keep in mind that force_text allows you to specify strings_only=True to neatly preserve None values:
>>> force_text(None)
'None'
>>> type(force_text(None))
<class 'str'>
>>> force_text(None, strings_only=True)
>>> type(force_text(None, strings_only=True))
<class 'NoneType'>
Be careful, though: with strings_only=True it also leaves several other primitive types uncast:
>>> force_text(100)
'100'
>>> force_text(100, strings_only=True)
100
>>> force_text(True)
'True'
>>> force_text(True, strings_only=True)
True
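If you'd rather see how a strings_only switch could work, below is a simplified, Python 3 only sketch that reproduces the outputs shown above. It is not Django's code: the name force_text_sketch and the PROTECTED_TYPES tuple are assumptions made for illustration, and the real force_text handles more cases.

```python
# Assumption: the types left untouched when strings_only=True.
# Chosen to match the outputs shown above; bool is covered because
# it is a subclass of int.
PROTECTED_TYPES = (type(None), int, float)


def force_text_sketch(value, encoding="utf-8", strings_only=False,
                      errors="strict"):
    """Simplified illustration of a force_text-like helper (Python 3)."""
    if isinstance(value, str):
        # Already text, nothing to do.
        return value
    if strings_only and isinstance(value, PROTECTED_TYPES):
        # Hand protected values back unchanged instead of stringifying.
        return value
    if isinstance(value, bytes):
        # Decode bytes rather than returning their repr.
        return value.decode(encoding, errors)
    return str(value)
```

The check for protected types runs before any conversion, which is why None, 100 and True come back unchanged when strings_only=True.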
Don't re-invent the compatibility layer wheel. Use the six compatibility layer, a small one-file project that can be included with your own:
Six supports every Python version since 2.6. It is contained in only one Python file, so it can be easily copied into your project. (The copyright and license notice must be retained.)
It includes a six.text_type() callable that does exactly this, converting a value to Unicode text:
import six
unicode_x = six.text_type(x)
In the project source code this is defined as:
import sys
PY2 = sys.version_info[0] == 2
PY3 = sys.version_info[0] == 3
# ...
if PY3:
    # ...
    text_type = str
    # ...
else:
    # ...
    text_type = unicode
    # ...