Django Import-Export: IntegrityError when trying to insert a duplicate record into field(s) with unique or unique_together constraints
Only one change is needed, and you can keep using django-import-export.
models.py

```python
class Compositions(models.Model):
    composer_key = models.ForeignKey(
        Composer,
        on_delete=models.CASCADE,  # required on Django 2.0+
    )
    composition = models.CharField(
        max_length=383,
        unique=False,
    )
    date_created = models.DateTimeField(default=timezone.now)

    class Meta(object):
        unique_together = (('composer_key', 'composition'),)
```
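For context, the database behaviour behind the error is easy to reproduce outside Django. This sketch uses plain `sqlite3` (which Django's `IntegrityError` wraps on SQLite) with an illustrative table mirroring the `unique_together` pair; the table and values are made up for the demo:

```python
import sqlite3

# Illustrative table with the same composite UNIQUE constraint as
# unique_together = (('composer_key', 'composition'),)
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE compositions ("
    "composer_key INTEGER, composition TEXT, "
    "UNIQUE (composer_key, composition))"
)
conn.execute("INSERT INTO compositions VALUES (1, 'Fugue in G minor')")

try:
    # Same (composer_key, composition) pair -> constraint violation
    conn.execute("INSERT INTO compositions VALUES (1, 'Fugue in G minor')")
except sqlite3.IntegrityError as exc:
    print("duplicate rejected:", exc)
```

This is the error the import ultimately trips over, which is why the answers below either swallow it or skip the row before it reaches the database.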
Override save_instance with a try/except and ignore the error when it fails.

admin.py
```python
class CompositionsResource(resources.ModelResource):

    class Meta:
        model = Compositions
        skip_unchanged = True
        report_skipped = True

    def save_instance(self, instance, using_transactions=True, dry_run=False):
        try:
            super(CompositionsResource, self).save_instance(
                instance, using_transactions, dry_run
            )
        except IntegrityError:
            pass


class CompositionsAdmin(ImportExportModelAdmin):
    resource_class = CompositionsResource


admin.site.register(Compositions, CompositionsAdmin)
```
and add this import:

```python
from django.db import IntegrityError
```
A note on the accepted answer: it will give the desired result, but with large files it will hammer disk usage and import time.
A more efficient approach I've been using (after spending a lot of time going through the docs) is to override skip_row and keep a set of tuples, one per unique constraint, as a class attribute. I still override save_instance as the other answer suggests to handle any IntegrityErrors that get through, of course. Python sets don't store duplicate entries, so they are well suited to this kind of in-memory unique index.
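As a quick standalone illustration of the dedup logic skip_row relies on (the composer/composition values here are made up):

```python
# A set silently ignores re-added tuples, so it acts as an
# in-memory unique index over (composer_key, composition).
seen = set()
rows = [
    ("Bach", "Fugue in G minor"),
    ("Bach", "Fugue in G minor"),   # duplicate of the first row
    ("Liszt", "La Campanella"),
]

skipped = []
for composer, composition in rows:
    key = (composer, composition)
    if key in seen:
        skipped.append(key)     # would return True from skip_row
    else:
        seen.add(key)

print(len(seen))     # 2 unique rows kept
print(len(skipped))  # 1 duplicate skipped
```

The membership test and insert are both O(1) on average, which is what makes this cheap even across tens of thousands of rows.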
```python
class CompositionsResource(resources.ModelResource):
    set_unique = set()

    class Meta:
        model = Compositions
        skip_unchanged = True
        report_skipped = True

    def before_import(self, dataset, using_transactions, dry_run, **kwargs):
        # Clear out anything that may be left over from a dry_run,
        # such as the admin mixin preview
        self.set_unique = set()

    def skip_row(self, instance, original):
        composer_key = instance.composer_key  # could also use composer_key_id
        composition = instance.composition
        tuple_unique = (composer_key, composition)

        if tuple_unique in self.set_unique:
            return True
        self.set_unique.add(tuple_unique)

        return super(CompositionsResource, self).skip_row(instance, original)

    # The save_instance override from the other answer should still go here
    # to swallow any IntegrityError that gets through.
```
This approach will at least cut down on duplicates encountered within the same dataset. I used it to deal with multiple flat files of ~60,000 lines each that contained lots of repetitive/nested foreign keys, and it made the initial data import far faster.