How to read JSON from socket in python? (Incremental parsing of JSON)

What you want(ed) is ijson, an incremental json parser. It is available here: https://pypi.python.org/pypi/ijson/ . The usage should be simple as (copying from that page):

import ijson.backends.python as ijson

for item in ijson.items(file_obj):
    # ...

(for those who prefer something self-contained - in the sense that it relies only on the standard library: I wrote yesterday a small wrapper around json - but just because I didn't know about ijson. It is probably much less efficient.)

EDIT: since I found out that in fact (a cythonized version of) my approach was much more efficient than ijson, I have packaged it as an independent library - see here also for some rough benchmarks: http://pietrobattiston.it/jsaone


Edit: given that you aren't defining the protocol, this isn't useful, but it might be useful in other contexts.


Assuming it's a stream (TCP) socket, you need to implement your own message framing mechanism (or use an existing higher level protocol that does so). One straightforward way is to define each message as a 32-bit integer length field, followed by that many bytes of data.

Sender: take the length of the JSON packet, pack it into 4 bytes with the struct module, send it on the socket, then send the JSON packet.

Receiver: Repeatedly read from the socket until you have at least 4 bytes of data, use struct.unpack to unpack the length. Read from the socket until you have at least that much data and that's your JSON packet; anything left over is the length for the next message.

If at some point you're going to want to send messages that consist of something other than JSON over the same socket, you may want to send a message type code between the length and the data payload; congratulations, you've invented yet another protocol.

Another, slightly more standard, method is DJB's Netstrings protocol; it's very similar to the system proposed above, but with text-encoded lengths instead of binary; it's directly supported by frameworks such as Twisted.


If you're getting the JSON from an HTTP stream, use the Content-Length header to get the length of the JSON data. For example:

import httplib
import json

h = httplib.HTTPConnection('graph.facebook.com')
h.request('GET', '/19292868552')
response = h.getresponse()
content_length = int(response.getheader('Content-Length','0'))

# Read data until we've read Content-Length bytes or the socket is closed
data = ''
while len(data) < content_length or content_length == 0:
    s = response.read(content_length - len(data))
    if not s:
        break
    data += s

# We now have the full data -- decode it
j = json.loads(data)
print j