Let JSON object accept bytes or let urlopen output strings
HTTP sends bytes. If the resource in question is text, the character encoding is normally specified, either by the Content-Type HTTP header or by another mechanism (an RFC, HTML meta http-equiv, ...).
urllib should know how to decode the bytes to a string, but it's too naïve; it's a horribly underpowered and un-Pythonic library.
Dive Into Python 3 provides an overview of the situation.
Your "work-around" is fine—although it feels wrong, it's the correct way to do it.
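When the server does declare a charset in its Content-Type header, that value can be used instead of a hard-coded 'utf8'. The following is only a sketch: the helper name `decode_json_body`, the sample payload, and the UTF-8 fallback are assumptions made for illustration, not part of the question.

```python
import json
from email.message import Message


def decode_json_body(body, content_type, default="utf-8"):
    """Decode raw HTTP body bytes using the charset named in Content-Type.

    Hypothetical helper: falls back to `default` when no charset is declared.
    """
    msg = Message()
    msg["Content-Type"] = content_type
    # get_content_charset() returns the lower-cased charset parameter,
    # or the fallback value when the header does not carry one.
    charset = msg.get_content_charset(default)
    return json.loads(body.decode(charset))


# Simulated response body encoded as Latin-1 rather than UTF-8
raw = '{"name": "café"}'.encode("latin-1")
result = decode_json_body(raw, "application/json; charset=ISO-8859-1")
fallback = decode_json_body(b'{"ok": true}', "application/json")
```

With a real Python 3 response object, the same lookup should be available as `response.headers.get_content_charset()`, since the headers object is built on `email.message.Message`.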
I have come to the opinion that the question is the best answer :)
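import json
from urllib.request import urlopen
# urlopen() needs a full URL including the scheme; read() returns bytes, so decode before parsing
response = urlopen("http://site.com/api/foo/bar").read().decode('utf8')
obj = json.loads(response)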
Python’s wonderful standard library to the rescue…
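import codecs
# Here response is the raw object returned by urlopen(), not an already decoded string
reader = codecs.getreader("utf-8")  # StreamReader class that decodes UTF-8 bytes on read
obj = json.load(reader(response))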
Works with both py2 and py3.
Docs: Python 2, Python 3
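To try the codecs approach without a network connection, an in-memory byte stream can stand in for the response. This is a sketch under that assumption: `fake_response` and its payload are made up for the example, and a real `urlopen()` result would take their place.

```python
import codecs
import io
import json

# Stand-in for the object returned by urlopen(): any binary file-like object works
fake_response = io.BytesIO('{"greeting": "héllo"}'.encode("utf-8"))

# getreader() returns a StreamReader class; wrapping the byte stream in it
# makes read() return decoded text, which is what json.load() expects.
reader = codecs.getreader("utf-8")
obj = json.load(reader(fake_response))
```

The wrapper decodes as it reads, so the body does not have to be read and decoded by hand before it is parsed.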