Django Left Outer Join

What I essentially want is this

select * from bar_topic
left join (select topic_id as tid, value from bar_record where user_id = 1)
on tid = bar_topic.id

...or, perhaps this equivalent which avoids a subquery...

select * from bar_topic
left join bar_record
on bar_record.topic_id = bar_topic.id and bar_record.user_id = 1

I want to know how to do this effectively, or, if it is impossible, an explanation of why it is impossible...

Unless you use raw queries, it's impossible with Django's ORM, and here's why.

QuerySet objects (django.db.models.query.QuerySet) have a query attribute (django.db.models.sql.query.Query) which is a representation of the actual query which will be performed. These Query objects helpfully have a __str__ method, so you can print it out to see what it is.

Let's start with a simple QuerySet...

>>> from bar.models import *
>>> qs = Topic.objects.filter(record__user_id=1)
>>> print qs.query
SELECT "bar_topic"."id", "bar_topic"."name" FROM "bar_topic" INNER JOIN "bar_record" ON ("bar_topic"."id" = "bar_record"."topic_id") WHERE "bar_record"."user_id" = 1

...which is obviously not going to work, due to the INNER JOIN.

Taking a deeper look inside the Query object, there's an alias_map attribute which determines what table joins will be performed...

>>> from pprint import pprint
>>> pprint(qs.query.alias_map)
{u'bar_record': JoinInfo(table_name=u'bar_record', rhs_alias=u'bar_record', join_type='INNER JOIN', lhs_alias=u'bar_topic', lhs_join_col=u'id', rhs_join_col='topic_id', nullable=True),
 u'bar_topic': JoinInfo(table_name=u'bar_topic', rhs_alias=u'bar_topic', join_type=None, lhs_alias=None, lhs_join_col=None, rhs_join_col=None, nullable=False),
 u'auth_user': JoinInfo(table_name=u'auth_user', rhs_alias=u'auth_user', join_type='INNER JOIN', lhs_alias=u'bar_record', lhs_join_col='user_id', rhs_join_col=u'id', nullable=False)}

Note that Django only supports two possible join_types, INNER JOIN and LEFT OUTER JOIN.

Now, we can use the Query object's promote_joins methods to use a LEFT OUTER JOIN on the bar_record table...

>>> qs.query.promote_joins(['bar_record'])
>>> pprint(qs.query.alias_map)
{u'bar_record': JoinInfo(table_name=u'bar_record', rhs_alias=u'bar_record', join_type='LEFT OUTER JOIN', lhs_alias=u'bar_topic', lhs_join_col=u'id', rhs_join_col='topic_id', nullable=True),
 u'bar_topic': JoinInfo(table_name=u'bar_topic', rhs_alias=u'bar_topic', join_type=None, lhs_alias=None, lhs_join_col=None, rhs_join_col=None, nullable=False),
 u'auth_user': JoinInfo(table_name=u'auth_user', rhs_alias=u'auth_user', join_type='LEFT OUTER JOIN', lhs_alias=u'bar_record', lhs_join_col='user_id', rhs_join_col=u'id', nullable=False)}

...which will change the query to...

>>> print qs.query
SELECT "bar_topic"."id", "bar_topic"."name" FROM "bar_topic" LEFT OUTER JOIN "bar_record" ON ("bar_topic"."id" = "bar_record"."topic_id") WHERE "bar_record"."user_id" = 1

...however, this is still no use, since the join will always match a row, even if it doesn't belong to the correct user, and the WHERE clause will filter it out.

Using values_list() automatically influences the join_type...

>>> qs = Topic.objects.filter(record__user_id=1).values_list('name', 'record__value')
>>> print qs.query
SELECT "bar_topic"."name", "bar_record"."value" FROM "bar_topic" LEFT OUTER JOIN "bar_record" ON ("bar_topic"."id" = "bar_record"."topic_id") WHERE "bar_record"."user_id" = 1

...but ultimately suffers from the same problem.

There is, unfortunately, a fundamental limitation in the joins generated by the ORM, in that they can only be of the form...

(LEFT OUTER|INNER) JOIN <lhs_alias> ON (<lhs_alias>.<lhs_join_col> = <rhs_alias>.<rhs_join_col>)

...so there's really no way to achieve your desired SQL, other than using a raw query.

Sure, you can hack around with things like annotate() and extra(), but they'll likely generate queries which are far less performant, and arguably no more readable than raw SQL.


...and a suggested alternative.

Personally, I'd just use the raw query...

select * from bar_topic
left join bar_record
on bar_record.topic_id = bar_topic.id and bar_record.user_id = 1

...which is simple enough to be compatible with all of Django's supported backends.


First of all, there is no a way (atm Django 1.9.7) to have a representation with Django's ORM of the raw query you posted, exactly as you want; however, you can get the same desired result with something like:

>>> Topic.objects.annotate(
        f=Case(
            When(
                record__user=johnny, 
                then=F('record__value')
            ), 
            output_field=IntegerField()
        )
    ).order_by(
        'id', 'name', 'f'
    ).distinct(
        'id', 'name'
    ).values_list(
        'name', 'f'
    )
>>> [(u'A', 1), (u'B', None), (u'C', 3)]

>>> Topic.objects.annotate(f=Case(When(record__user=may, then=F('record__value')), output_field=IntegerField())).order_by('id', 'name', 'f').distinct('id', 'name').values_list('name', 'f')
>>> [(u'A', 4), (u'B', 5), (u'C', 6)]

Here the SQL generated for the first query:

>>> print Topic.objects.annotate(f=Case(When(record__user=johnny, then=F('record__value')), output_field=IntegerField())).order_by('id', 'name', 'f').distinct('id', 'name').values_list('name', 'f').query

>>> SELECT DISTINCT ON ("payments_topic"."id", "payments_topic"."name") "payments_topic"."name", CASE WHEN "payments_record"."user_id" = 1 THEN "payments_record"."value" ELSE NULL END AS "f" FROM "payments_topic" LEFT OUTER JOIN "payments_record" ON ("payments_topic"."id" = "payments_record"."topic_id") ORDER BY "payments_topic"."id" ASC, "payments_topic"."name" ASC, "f" ASC

##Some notes

  • Doesn't hesitate to use raw queries, specially when the performance is the most important thing. Moreover, sometimes it is a must since you can't get the same result using Django's ORM; in other cases you can, but once in a while having clean and understandable code is more important than the performance in this piece of code.
  • distinct with positional arguments is used in this answer, which is available for PostgreSQL only, atm. In the docs you can see more about conditional expressions.