Django loaddata - Out of Memory
loaddata is generally used for fixtures, i.e. a small number of database objects to get your system started and for tests, rather than for large chunks of data. If you're hitting memory limits then you're probably not using it for the right purpose.
If you still have the original database, you should use something more suited to the purpose, like PostgreSQL's pg_dump or MySQL's mysqldump.
As Joe pointed out, PostgreSQL's pg_dump or MySQL's mysqldump are better suited to your case.
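For example, dump from the old server and restore into a fresh database (the database name, user, and file names are placeholders):

# PostgreSQL
pg_dump -Fc -U myuser mydb > mydb.dump
pg_restore -U myuser -d mydb mydb.dump

# MySQL
mysqldump -u myuser -p mydb > mydb.sql
mysql -u myuser -p mydb < mydb.sql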
In case you have lost your original database, there are two ways you could try to get your data back:
One: Find another machine that has more memory and can access your database. Build your project on that machine, and run the loaddata command there.
I know it sounds silly. But it is the quickest way if you can run Django on your laptop and connect to the database remotely.
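A minimal sketch of what that looks like, assuming PostgreSQL; the host and credentials are placeholders:

# settings.py on the machine with more memory
DATABASES = {
    'default': {
        'ENGINE': 'django.db.backends.postgresql_psycopg2',
        'NAME': 'mydb',
        'USER': 'myuser',
        'PASSWORD': 'secret',
        'HOST': 'db.example.com',  # the remote database server
        'PORT': '5432',
    }
}

Then run python manage.py loaddata your_fixture.json on that machine; the rows are written to the remote database over the network.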
Two: Hack the Django source code.
Check the code in django/core/serializers/json.py:
def Deserializer(stream_or_string, **options):
    """
    Deserialize a stream or string of JSON data.
    """
    if not isinstance(stream_or_string, (bytes, six.string_types)):
        stream_or_string = stream_or_string.read()
    if isinstance(stream_or_string, bytes):
        stream_or_string = stream_or_string.decode('utf-8')
    try:
        objects = json.loads(stream_or_string)
        for obj in PythonDeserializer(objects, **options):
            yield obj
    except GeneratorExit:
        raise
    except Exception as e:
        # Map to deserializer error
        six.reraise(DeserializationError, DeserializationError(e), sys.exc_info()[2])
The two lines below are the problem. The json module in the stdlib only accepts a complete string and cannot handle a stream lazily, so Django loads the entire content of the JSON file into memory:
stream_or_string = stream_or_string.read()
objects = json.loads(stream_or_string)
You could optimize this code with py-yajl. py-yajl provides alternatives to the built-in json.loads and json.dumps, using yajl.
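A minimal sketch of the swap, assuming py-yajl is installed (it exposes stdlib-compatible loads and dumps functions). Note that this speeds up parsing but still materializes the whole document at once, so peak memory barely changes:

import yajl

# Drop-in replacement for json.loads, backed by the C yajl parser.
objects = yajl.loads(stream_or_string)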
I'd like to add that I was quite successful in a similar use case with ijson: https://github.com/isagalaev/ijson
In order to get an iterator over the objects in a JSON file produced by Django's dumpdata, I modified the json Deserializer like this:
import sys

import django.core.serializers.json
import ijson
from django.core.serializers.base import DeserializationError
from django.core.serializers.python import Deserializer as PythonDeserializer
from django.utils import six

# Reuse Django's stock JSON Serializer; only deserialization changes.
Serializer = django.core.serializers.json.Serializer

def Deserializer(stream_or_string, **options):
    if isinstance(stream_or_string, six.string_types):
        stream_or_string = six.BytesIO(stream_or_string.encode('utf-8'))
    try:
        # ijson.items() yields the elements of the top-level JSON array
        # one at a time, so only one object is held in memory at once.
        objects = ijson.items(stream_or_string, 'item')
        for obj in PythonDeserializer(objects, **options):
            yield obj
    except GeneratorExit:
        raise
    except Exception as e:
        # Map to deserializer error
        six.reraise(DeserializationError, DeserializationError(e), sys.exc_info()[2])
The problem with using py-yajl as-is is that you still get all the objects in one large array, which uses a lot of memory. With ijson, this loop only uses as much memory as a single serialized Django object. Also, ijson can still use yajl as a backend.
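If you want loaddata to pick this deserializer up automatically, Django's SERIALIZATION_MODULES setting lets you route the built-in json format to your own module. A minimal sketch, assuming the code above lives in a hypothetical module myapp/serializers/streaming_json.py that defines both Serializer and Deserializer:

# settings.py
SERIALIZATION_MODULES = {
    # Route the "json" format to the streaming implementation above.
    # The dotted path is a placeholder for wherever the module lives.
    'json': 'myapp.serializers.streaming_json',
}

To pick the yajl backend explicitly, ijson lets you import a specific backend, e.g. import ijson.backends.yajl2 as ijson, assuming libyajl 2 is installed on the system.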