How to JSON serialize sets?
You can create a custom encoder that returns a list when it encounters a set. Here's an example:
>>> import json
>>> class SetEncoder(json.JSONEncoder):
...     def default(self, obj):
...         if isinstance(obj, set):
...             return list(obj)
...         return json.JSONEncoder.default(self, obj)
...
>>> json.dumps(set([1,2,3,4,5]), cls=SetEncoder)
'[1, 2, 3, 4, 5]'
You can detect other types this way too. If you need to retain the fact that the list was originally a set, you could use a custom encoding; something like return {'type': 'set', 'list': list(obj)} might work.
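As a rough sketch of that tagged encoding, the encoder below wraps each set in a dict and a matching object_hook turns it back into a set on load. The names TaggedSetEncoder and decode_tagged_set are made up for illustration, and the sketch assumes no real dict in your data uses the keys 'type' and 'list' in the same way:

```python
import json

class TaggedSetEncoder(json.JSONEncoder):
    """Hypothetical encoder that wraps sets in a tagged dict."""
    def default(self, obj):
        if isinstance(obj, set):
            # Tag the list so a decoder can tell it used to be a set.
            return {'type': 'set', 'list': list(obj)}
        return super().default(obj)

def decode_tagged_set(dct):
    """object_hook: rebuild a set from a tagged dict, pass other dicts through."""
    if dct.get('type') == 'set' and 'list' in dct:
        return set(dct['list'])
    return dct

encoded = json.dumps({'tags': {1, 2, 3}, 'plain': [1, 2, 3]}, cls=TaggedSetEncoder)
decoded = json.loads(encoded, object_hook=decode_tagged_set)
print(decoded)
```

The set comes back as a set while the ordinary list stays a list. A genuine dict that happens to look like the tag would be misread as a set, so pick tag keys that cannot collide with your data.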
To illustrate nested types, consider serializing this:
>>> class Something(object):
...     pass
...
>>> json.dumps(set([1,2,3,4,5,Something()]), cls=SetEncoder)
This raises the following error:
TypeError: <__main__.Something object at 0x1691c50> is not JSON serializable
This indicates that the encoder will take the list result returned and recursively call the serializer on its children. To add a custom serializer for multiple types, you can do this:
>>> class SetEncoder(json.JSONEncoder):
...     def default(self, obj):
...         if isinstance(obj, set):
...             return list(obj)
...         if isinstance(obj, Something):
...             return 'CustomSomethingRepresentation'
...         return json.JSONEncoder.default(self, obj)
...
>>> json.dumps(set([1,2,3,4,5,Something()]), cls=SetEncoder)
'[1, 2, 3, 4, 5, "CustomSomethingRepresentation"]'
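Because the encoder recurses into whatever default returns, the same approach also covers sets buried inside dicts and lists. Here is a small sketch of that; CollectionEncoder is a hypothetical name, frozenset is added because it is not a subclass of set, and sorted() is used only to make the output order predictable (it assumes the elements can be compared):

```python
import json

class CollectionEncoder(json.JSONEncoder):
    """Hypothetical encoder handling set and frozenset at any nesting depth."""
    def default(self, obj):
        if isinstance(obj, (set, frozenset)):
            # sorted() returns a list; assumes elements are mutually orderable.
            return sorted(obj)
        return super().default(obj)

nested = {'outer': [frozenset({1, 2}), {'inner': {3, 4}}]}
result = json.dumps(nested, cls=CollectionEncoder)
print(result)  # {"outer": [[1, 2], {"inner": [3, 4]}]}
```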
JSON notation has only a handful of native datatypes (objects, arrays, strings, numbers, booleans, and null), so anything serialized in JSON needs to be expressed as one of these types.
As shown in the json module docs, this conversion can be done automatically by a JSONEncoder and JSONDecoder, but then you would be giving up some other structure you might need (if you convert sets to lists, then you lose the ability to recover regular lists; if you convert sets to dictionaries using dict.fromkeys(s), then you lose the ability to recover dictionaries).
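To make that loss concrete, here is a minimal sketch (the helper name sets_as_lists is invented for this example) that round-trips a set and a list through JSON and shows that they come back indistinguishable:

```python
import json

def sets_as_lists(obj):
    # Hypothetical default hook: turn sets into sorted lists.
    if isinstance(obj, set):
        return sorted(obj)
    raise TypeError(f'Object of type {type(obj).__name__} is not JSON serializable')

original = {'was_set': {1, 2, 3}, 'was_list': [1, 2, 3]}
restored = json.loads(json.dumps(original, default=sets_as_lists))
print(restored)  # {'was_set': [1, 2, 3], 'was_list': [1, 2, 3]}

# The dict.fromkeys variant has the same problem: it decodes as an ordinary dict.
as_dict = json.loads(json.dumps(dict.fromkeys(sorted({'a', 'b'}))))
print(as_dict)  # {'a': None, 'b': None}
```

After decoding, nothing in the data says which value started life as a set.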
A more sophisticated solution is to build out a custom type that can coexist with the native JSON types. This lets you store nested structures that include lists, sets, dicts, decimals, datetime objects, etc.:
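from json import dumps, loads, JSONEncoder, JSONDecoder
import pickle

class PythonObjectEncoder(JSONEncoder):
    def default(self, obj):
        # Called only for objects the json module cannot encode natively.
        try:
            # latin-1 maps every byte to one code point, so the pickle survives as text.
            return {'_python_object': pickle.dumps(obj).decode('latin-1')}
        except pickle.PickleError:
            return super().default(obj)

def as_python_object(dct):
    # object_hook: unpickle tagged dicts, leave ordinary dicts alone.
    if '_python_object' in dct:
        return pickle.loads(dct['_python_object'].encode('latin-1'))
    return dct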
Here is a sample session showing that it can handle lists, dicts, and sets:
>>> from decimal import Decimal
>>> data = [1,2,3, set(['knights', 'who', 'say', 'ni']), {'key':'value'}, Decimal('3.14')]
>>> j = dumps(data, cls=PythonObjectEncoder)
>>> loads(j, object_hook=as_python_object)
[1, 2, 3, {'knights', 'say', 'who', 'ni'}, {'key': 'value'}, Decimal('3.14')]
Alternatively, it may be useful to use a more general purpose serialization technique such as YAML, Twisted Jelly, or Python's pickle module. These each support a much greater range of datatypes.
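For comparison, here is a minimal sketch of the plain pickle route using only the standard library. Keep in mind that the result is Python-specific binary data rather than JSON, and that pickle.loads should only be run on data you trust:

```python
import pickle
from decimal import Decimal

data = [1, 2, 3, {'knights', 'who', 'say', 'ni'}, {'key': 'value'}, Decimal('3.14')]

# Round-trip through pickle; no custom encoder or decoder is needed.
blob = pickle.dumps(data)
restored = pickle.loads(blob)

print(restored == data)        # True
print(type(restored[3]))       # <class 'set'>
```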
You don't need to make a custom encoder class to supply the default method; it can be passed in as a keyword argument:
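import json

def serialize_sets(obj):
    if isinstance(obj, set):
        return list(obj)
    # Returning obj unchanged would make json report a circular reference;
    # a default function is expected to raise TypeError for unsupported types.
    raise TypeError(f'Object of type {type(obj).__name__} is not JSON serializable')

json_str = json.dumps(set([1,2,3]), default=serialize_sets)
print(json_str)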
This results in [1, 2, 3] in all supported Python versions.