Proper way to bulk_create for ManyToMany field, Django?
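As shown in Du D's answer, every Django ManyToManyField is backed by an intermediate model, available as the field's through attribute (for example Tag.photos.through). Its table has three columns: the ID of the relation row itself, plus the IDs of the two linked objects (the object linked from and the object linked to). You can call bulk_create on this through model to create many ManyToMany relations in one go.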
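To make the three-column layout concrete, here is a minimal sketch using only the standard-library sqlite3 module, so it runs without Django. The table name app_tag_photos is an assumption: Django's default name for the auto-created table is <app_label>_tag_photos, and it also adds a unique constraint on the (tag, photo) pair. The executemany call is only similar in spirit to bulk_create, which issues batched multi-row INSERTs.

```python
import sqlite3

# Hypothetical schema mirroring what Django auto-creates for
# `photos = models.ManyToManyField(Photo)` on a Tag model in an app
# labelled "app". Table and column names are assumptions for illustration.
SCHEMA = """
CREATE TABLE app_photo (id INTEGER PRIMARY KEY);
CREATE TABLE app_tag (id INTEGER PRIMARY KEY);
CREATE TABLE app_tag_photos (
    id INTEGER PRIMARY KEY AUTOINCREMENT,            -- ID of the relation row
    tag_id INTEGER NOT NULL REFERENCES app_tag (id),
    photo_id INTEGER NOT NULL REFERENCES app_photo (id),
    UNIQUE (tag_id, photo_id)                        -- each pair stored once
);
"""


def link_pairs(conn, pairs):
    """Insert many (tag_id, photo_id) rows with a single batched call."""
    conn.executemany(
        "INSERT INTO app_tag_photos (tag_id, photo_id) VALUES (?, ?)", pairs
    )


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany("INSERT INTO app_tag (id) VALUES (?)", [(1,), (2,)])
conn.executemany("INSERT INTO app_photo (id) VALUES (?)", [(1,), (2,)])
link_pairs(conn, [(1, 1), (2, 1), (2, 2)])
rows = conn.execute(
    "SELECT tag_id, photo_id FROM app_tag_photos ORDER BY id"
).fetchall()
print(rows)  # [(1, 1), (2, 1), (2, 2)]
```

Because of the UNIQUE constraint, inserting the same (tag_id, photo_id) pair twice raises an IntegrityError. Django's auto-generated table behaves the same way.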
As a quick example, you could bulk create Tag to Photo relations like this:
tag1 = Tag.objects.get(id=1)
tag2 = Tag.objects.get(id=2)
photo1 = Photo.objects.get(id=1)
photo2 = Photo.objects.get(id=2)
through_objs = [
    Tag.photos.through(
        photo_id=photo1.id,
        tag_id=tag1.id,
    ),
    Tag.photos.through(
        photo_id=photo1.id,
        tag_id=tag2.id,
    ),
    Tag.photos.through(
        photo_id=photo2.id,
        tag_id=tag2.id,
    ),
]
Tag.photos.through.objects.bulk_create(through_objs)
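The auto-generated through table enforces uniqueness on each (tag, photo) pair, so bulk-creating a pair that already exists fails with an IntegrityError. One option is to filter out existing pairs before building the through objects. The sketch below does that in plain Python so it runs without Django. The helper name new_pairs_only is hypothetical, and the values_list query in the comment is only an illustration of where the existing pairs could come from. On Django 2.2 and later, passing ignore_conflicts=True to bulk_create is an alternative on databases that support it.

```python
from typing import Iterable, List, Set, Tuple

Pair = Tuple[int, int]  # (tag_id, photo_id)


def new_pairs_only(wanted: Iterable[Pair], existing: Set[Pair]) -> List[Pair]:
    """Return the pairs from `wanted` that are not already in `existing`,
    also dropping repeats within `wanted` and keeping first-seen order."""
    seen = set(existing)
    result = []
    for pair in wanted:
        if pair not in seen:
            seen.add(pair)
            result.append(pair)
    return result


# `existing` would come from the database, e.g. (hypothetically)
# set(Tag.photos.through.objects.values_list("tag_id", "photo_id"))
existing = {(1, 1)}
wanted = [(1, 1), (2, 1), (2, 2), (2, 2)]
print(new_pairs_only(wanted, existing))  # [(2, 1), (2, 2)]
```

Each remaining pair can then become Tag.photos.through(tag_id=t, photo_id=p) before the bulk_create call.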
General solution
Here is a general solution that you can run to set up ManyToMany relations between any list of object pairs.
from typing import Iterable
from collections import namedtuple

ManyToManySpec = namedtuple(
    "ManyToManySpec", ["from_object", "to_object"]
)


def bulk_create_manytomany_relations(
    model_from,
    field_name: str,
    model_from_name: str,
    model_to_name: str,
    specs: Iterable[ManyToManySpec]
):
    through_objs = []
    for spec in specs:
        through_objs.append(
            getattr(model_from, field_name).through(
                **{
                    f"{model_from_name.lower()}_id": spec.from_object.id,
                    f"{model_to_name.lower()}_id": spec.to_object.id,
                }
            )
        )
    getattr(model_from, field_name).through.objects.bulk_create(through_objs)
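The function above builds the through-model keyword names as "<model>_id" from the lowercased model names. That matches Django's auto-generated through model when the two models are different. It does not match a self-referential field such as ManyToManyField("self"), where Django prefixes the columns as from_<model>_id and to_<model>_id. It also does not apply to a custom through= model, which uses whatever field names you declared. Below is a small, hypothetical helper that computes the names for both auto-generated cases. It is only a sketch of that naming rule.

```python
from typing import Tuple


def through_fk_kwargs(model_from_name: str, model_to_name: str) -> Tuple[str, str]:
    """Return the (from, to) foreign-key keyword names on Django's
    auto-generated through model, e.g. ("tag_id", "photo_id")."""
    from_name = model_from_name.lower()
    to_name = model_to_name.lower()
    if from_name == to_name:
        # Self-referential M2M: Django prefixes the two columns so they differ.
        from_name, to_name = f"from_{from_name}", f"to_{to_name}"
    return f"{from_name}_id", f"{to_name}_id"


print(through_fk_kwargs("Tag", "Photo"))      # ('tag_id', 'photo_id')
print(through_fk_kwargs("Person", "Person"))  # ('from_person_id', 'to_person_id')
```

Without this distinction, the self-referential case would put the same key twice into the ** dict, and the through model would reject it.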
Example usage
tag1 = Tag.objects.get(id=1)
tag2 = Tag.objects.get(id=2)
photo1 = Photo.objects.get(id=1)
photo2 = Photo.objects.get(id=2)
bulk_create_manytomany_relations(
    model_from=Tag,
    field_name="photos",
    model_from_name="tag",
    model_to_name="photo",
    specs=[
        ManyToManySpec(from_object=tag1, to_object=photo1),
        ManyToManySpec(from_object=tag1, to_object=photo2),
        ManyToManySpec(from_object=tag2, to_object=photo2),
    ]
)
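When every object on one side should be linked to every object on the other, the spec list does not have to be written out by hand. This sketch builds it with itertools.product. The helper name all_pairs_specs is hypothetical, and plain strings stand in for model instances so the snippet runs without Django.

```python
from collections import namedtuple
from itertools import product

ManyToManySpec = namedtuple("ManyToManySpec", ["from_object", "to_object"])


def all_pairs_specs(from_objects, to_objects):
    """Build one spec per (from, to) combination, e.g. every tag to every photo."""
    return [ManyToManySpec(f, t) for f, t in product(from_objects, to_objects)]


# Strings stand in for Tag and Photo instances here.
specs = all_pairs_specs(["tag1", "tag2"], ["photo1", "photo2"])
print(len(specs))  # 4
```

With real instances, the resulting list can be passed as specs= to bulk_create_manytomany_relations.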
TL;DR Use the "through" model to bulk insert m2m relationships.
Tag.photos.through is the model Django generates for the relation, with three fields: id, photo and tag.
photo_tag_1 = Tag.photos.through(photo_id=1, tag_id=1)
photo_tag_2 = Tag.photos.through(photo_id=1, tag_id=2)
Tag.photos.through.objects.bulk_create([photo_tag_1, photo_tag_2, ...])
This is the fastest way I know of. I use it all the time to create test data and can generate millions of records in minutes.
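At the scale of millions of rows, memory becomes an issue as well as query count. The batch_size argument splits the INSERT into several queries, but as far as I know bulk_create converts its input to a list first, so every object is held in memory at once. A minimal sketch of chunking a generator yourself is shown below. The helper name chunked is hypothetical, and the Django lines are left as comments because they need a configured project.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def chunked(items: Iterable[T], size: int) -> Iterator[List[T]]:
    """Yield successive lists of at most `size` items from any iterable,
    so a huge generator never has to be materialised all at once."""
    if size < 1:
        raise ValueError("size must be >= 1")
    it = iter(items)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


# Hypothetical Django usage (not run here):
# rows = (Tag.photos.through(tag_id=t, photo_id=p) for t, p in pairs)
# for batch in chunked(rows, 5000):
#     Tag.photos.through.objects.bulk_create(batch)
print([len(b) for b in chunked(range(12), 5)])  # [5, 5, 2]
```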
Edit from Georgy:
from random import randint, shuffle

# assumes Tag and Photo are imported from your app's models


def add_tags(count):
    Tag.objects.bulk_create([Tag(tag='tag%s' % t) for t in range(count)])
    tag_ids = list(Tag.objects.values_list('id', flat=True))
    photo_ids = Photo.objects.values_list('id', flat=True)
    tag_count = len(tag_ids)

    for photo_id in photo_ids:
        tag_to_photo_links = []
        shuffle(tag_ids)
        rand_num_tags = randint(0, tag_count)
        photo_tags = tag_ids[:rand_num_tags]
        for tag_id in photo_tags:
            # through is the model generated by Django to link the m2m between Tag and Photo
            photo_tag = Tag.photos.through(tag_id=tag_id, photo_id=photo_id)
            tag_to_photo_links.append(photo_tag)
        # one bulk insert per photo
        Tag.photos.through.objects.bulk_create(tag_to_photo_links, batch_size=7000)
I didn't create the models to test this, but the structure is there; you might have to tweak a few things to make it work. Let me know if you run into any problems.
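A variation on the add_tags loop separates random pair generation from the database. It uses random.sample in place of shuffle plus slicing, and a seeded random.Random so the test data is reproducible. This is a sketch, not the answer's own code. The name random_links is hypothetical, and each resulting pair would still be wrapped as Tag.photos.through(tag_id=t, photo_id=p) before calling bulk_create.

```python
import random
from typing import List, Sequence, Tuple


def random_links(
    tag_ids: Sequence[int], photo_ids: Sequence[int], rng: random.Random
) -> List[Tuple[int, int]]:
    """For each photo, pick a random number (0..len(tag_ids)) of distinct
    tags and return the (tag_id, photo_id) pairs to insert."""
    links = []
    for photo_id in photo_ids:
        k = rng.randint(0, len(tag_ids))  # inclusive upper bound, like the original
        for tag_id in rng.sample(tag_ids, k):  # k distinct tags, no repeats
            links.append((tag_id, photo_id))
    return links


rng = random.Random(42)  # seeded so the generated data is reproducible
pairs = random_links([1, 2, 3, 4], [10, 20, 30], rng)
print(len(pairs))
```

Because sample never repeats a tag for the same photo, the generated pairs cannot violate the through table's unique constraint.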