Let JSON object accept bytes or let urlopen output strings

HTTP sends bytes. If the resource in question is text, the character encoding is normally specified, either by the Content-Type HTTP header or by another mechanism (an RFC, HTML meta http-equiv,...).

urllib should know how to decode the bytes into a string, but it's too naïve. It's a horribly underpowered and un-Pythonic library.

Dive Into Python 3 provides an overview of the situation.

Your "work-around" is fine—although it feels wrong, it's the correct way to do it.
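
For what it's worth, here is a minimal sketch of picking the encoding from the Content-Type header instead of hard-coding it. The helper names (charset_from_content_type, decode_json_body) and the sample body are made up for illustration, and the UTF-8 fallback is an assumption, not something the server guarantees.

```python
import json
from email.message import Message


def charset_from_content_type(content_type, default="utf-8"):
    """Return the charset parameter of a Content-Type value, or `default`."""
    msg = Message()
    if content_type:
        msg["Content-Type"] = content_type
    # get_content_charset() lower-cases the name and returns `default` if absent
    return msg.get_content_charset(default)


def decode_json_body(raw, content_type=None):
    """Decode a bytes body using the declared charset, then parse it as JSON."""
    return json.loads(raw.decode(charset_from_content_type(content_type)))


# Hypothetical body, encoded the way a Latin-1 server might send it
body = '{"city": "Z\u00fcrich"}'.encode("iso-8859-1")
print(decode_json_body(body, "application/json; charset=ISO-8859-1"))
```

With a live response from urlopen on Python 3, response.headers.get_content_charset() should give the same value, since the headers object derives from email.message.Message.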


I have come to the opinion that the question is the best answer :)

import json
from urllib.request import urlopen

# urlopen needs a full URL including the scheme; read() returns bytes
response = urlopen("http://site.com/api/foo/bar").read().decode('utf8')
obj = json.loads(response)

Python’s wonderful standard library to the rescue…

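import codecs
import json

# response is the file-like object returned by urlopen(), not an already decoded string
reader = codecs.getreader("utf-8")
obj = json.load(reader(response))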

Works with both Python 2 and Python 3.

Docs: Python 2, Python 3
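
To try the codecs.getreader approach without hitting the network, something like the sketch below should do. The io.BytesIO object is only a stand-in for the urlopen response (both expose a read() that returns bytes), and the sample JSON is invented.

```python
import codecs
import io
import json

# BytesIO stands in for the response object; both expose read() returning bytes
fake_response = io.BytesIO('{"name": "caf\u00e9"}'.encode("utf-8"))

# Wrap the byte stream so that read() yields text decoded as UTF-8
reader = codecs.getreader("utf-8")
obj = json.load(reader(fake_response))
print(obj["name"])
```

The wrapper decodes as json.load reads from it, so there is no separate read().decode() step.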