How to add column in ManyToMany Table (Django)

It's very easy using django too! You can use through to define your own manytomany intermediary tables

Documentation provides an example addressing your issue:

Extra fields on many-to-many relationships

class Person(models.Model):
    name = models.CharField(max_length=128)

    def __unicode__(self):
        return self.name

class Group(models.Model):
    name = models.CharField(max_length=128)
    members = models.ManyToManyField(Person, through='Membership')

    def __unicode__(self):
        return self.name

class Membership(models.Model):
    person = models.ForeignKey(Person, on_delete=models.CASCADE)
    group = models.ForeignKey(Group, on_delete=models.CASCADE)
    date_joined = models.DateField()
    invite_reason = models.CharField(max_length=64)

As @dm03514 has answered it is indeed very easy to add column to M2M table via defining explicitly the M2M through model and adding the desired field there.

However if you would like to add some column to all m2m tables - such approach wouldn't be sufficient, because it would require to define the M2M through models for all ManyToManyField's that have been defined across the project.

In my case I wanted to add a "created" timestamp column to all M2M tables that Django generates "under the hood" without the necessity of defining a separate model for every ManyToManyField field used in the project. I came up with a neat solution presented bellow. Cheers!

Introduction

While Django scans your models at startup it creates automatically an implicit through model for every ManyToManyField that does not define it explicitly.

class ManyToManyField(RelatedField):
    # (...)

    def contribute_to_class(self, cls, name, **kwargs):
        # (...)
        super().contribute_to_class(cls, name, **kwargs)

        # The intermediate m2m model is not auto created if:
        #  1) There is a manually specified intermediate, or
        #  2) The class owning the m2m field is abstract.
        #  3) The class owning the m2m field has been swapped out.
        if not cls._meta.abstract:
            if self.remote_field.through:
                def resolve_through_model(_, model, field):
                    field.remote_field.through = model
                lazy_related_operation(resolve_through_model, cls, self.remote_field.through, field=self)
            elif not cls._meta.swapped:
                self.remote_field.through = create_many_to_many_intermediary_model(self, cls)

Source: ManyToManyField.contribute_to_class()

For creation of this implicit model Django uses the create_many_to_many_intermediary_model() function, which constructs new class that inherits from models.Model and contains foreign keys to both sides of the M2M relation. Source: django.db.models.fields.related.create_many_to_many_intermediary_model()

In order to add some column to all auto generated M2M through tables you will need to monkeypatch this function.

The solution

First you should create the new version of the function that will be used to patch the original Django function. To do so just copy the code of the function from Django sources and add the desired fields to the class it returns:

# For example in: <project_root>/lib/monkeypatching/custom_create_m2m_model.py
def create_many_to_many_intermediary_model(field, klass):
    # (...)
    return type(name, (models.Model,), {
        'Meta': meta,
        '__module__': klass.__module__,
        from_: models.ForeignKey(
            klass,
            related_name='%s+' % name,
            db_tablespace=field.db_tablespace,
            db_constraint=field.remote_field.db_constraint,
            on_delete=CASCADE,
        ),
        to: models.ForeignKey(
            to_model,
            related_name='%s+' % name,
            db_tablespace=field.db_tablespace,
            db_constraint=field.remote_field.db_constraint,
            on_delete=CASCADE,
        ),
        # Add your custom-need fields here:
        'created': models.DateTimeField(
            auto_now_add=True,
            verbose_name='Created (UTC)',
        ),
    })

Then you should enclose the patching logic in a separate function:

# For example in: <project_root>/lib/monkeypatching/patches.py
def django_m2m_intermediary_model_monkeypatch():
    """ We monkey patch function responsible for creation of intermediary m2m
        models in order to inject there a "created" timestamp.
    """
    from django.db.models.fields import related
    from lib.monkeypatching.custom_create_m2m_model import create_many_to_many_intermediary_model
    setattr(
        related,
        'create_many_to_many_intermediary_model',
        create_many_to_many_intermediary_model
    )

Finally you have to perform patching, before Django kicks in. Put such code in __init__.py file located next to your Django project settings.py file:

# <project_root>/<project_name>/__init__.py
from lib.monkeypatching.patches import django_m2m_intermediary_model_monkeypatch
django_m2m_intermediary_model_monkeypatch()

Few other things worth mentioning

  1. Remember that this does not affect m2m tables that have been created in the db in the past, so if you are introducing this solution in a project that already had ManyToManyField fields migrated to db, you will need to prepare a custom migration that will add your custom columns to the tables which were created before the monkeypatch. Sample migration provided below :)

    from django.db import migrations
    
    def auto_created_m2m_fields(_models):
        """ Retrieves M2M fields from provided models but only those that have auto
            created intermediary models (not user-defined through models).
        """
        for model in _models:
            for field in model._meta.get_fields():
                if (
                        isinstance(field, models.ManyToManyField)
                        and field.remote_field.through._meta.auto_created
                ):
                    yield field
    
    def add_created_to_m2m_tables(apps, schema_editor):
        # Exclude proxy models that don't have separate tables in db
        selected_models = [
            model for model in apps.get_models()
            if not model._meta.proxy
        ]
    
        # Select only m2m fields that have auto created intermediary models and then
        # retrieve m2m intermediary db tables
        tables = [
            field.remote_field.through._meta.db_table
            for field in auto_created_m2m_fields(selected_models)
        ]
    
        for table_name in tables:
            schema_editor.execute(
                f'ALTER TABLE {table_name} ADD COLUMN IF NOT EXISTS created '
                'timestamp with time zone NOT NULL DEFAULT now()',
            )
    
    
    class Migration(migrations.Migration):
        dependencies = []
        operations = [migrations.RunPython(add_created_to_m2m_tables)]
    
  2. Remember that the solution presented only affects the tables that Django creates automatically for ManyToManyField fields that do not define the through model. If you already have some explicit m2m through models you will need to add your custom-need columns there manually.

  3. The patched create_many_to_many_intermediary_model function will apply also to the models of all 3rd-party apps listed in your INSTALLED_APPS setting.

  4. Last but not least, remember that if you upgrade Django version the original source code of the patched function may change (!). It's a good idea to setup a simple unit test that will warn you if such situation happens in the future.

To do so modify the patching function to save the original Django function:

# For example in: <project_root>/lib/monkeypatching/patches.py
def django_m2m_intermediary_model_monkeypatch():
    """ We monkey patch function responsible for creation of intermediary m2m
        models in order to inject there a "created" timestamp.
    """
    from django.db.models.fields import related
    from lib.monkeypatching.custom_create_m2m_model import create_many_to_many_intermediary_model
    # Save the original Django function for test
    original_function = related.create_many_to_many_intermediary_model
    setattr(
        create_many_to_many_intermediary_model,
        '_original_django_function',
        original_function
    )
    # Patch django function with our version of this function
    setattr(
        related,
        'create_many_to_many_intermediary_model',
        create_many_to_many_intermediary_model
    )

Compute the hash of the source code of the original Django function and prepare a test that checks whether it is still the same as when you patched it:

def _hash_source_code(_obj):
    from inspect import getsourcelines
    from hashlib import md5
    source_code = ''.join(getsourcelines(_obj)[0])
    return md5(source_code.encode()).hexdigest()

def test_original_create_many_to_many_intermediary_model():
    """ This test checks whether the original Django function that has been
        patched did not changed. The hash of function source code is compared
        and if it does not match original hash, that means that Django version
        could have been upgraded and patched function could have changed.
    """
    from django.db.models.fields.related import create_many_to_many_intermediary_model
    original_function_md5_hash = '69d8cea3ce9640f64ce7b1df1c0934b8' # hash obtained before patching (Django 2.0.3)
    original_function = getattr(
        create_many_to_many_intermediary_model,
        '_original_django_function',
        None
    )
    assert original_function
    assert _hash_source_code(original_function) == original_function_md5_hash

Cheers

I hope someone will find this answer useful :)