Simple Subquery with OuterRef
The django-sql-utils package makes this kind of subquery aggregation simple. Just pip install django-sql-utils
and then:
from sql_util.utils import SubqueryCount
posts = Post.objects.annotate(
tag_count=SubqueryCount('tag'))
The API for SubqueryCount is the same as Count, but it generates a subselect in the SQL instead of joining to the related table.
One of the problems with your example is that you cannot use queryset.count()
as a subquery, because .count()
tries to evaluate the queryset and return the count.
So one may think that the right approach would be to use Count()
instead. Maybe something like this:
Post.objects.annotate(
count=Count(Tag.objects.filter(post=OuterRef('pk')))
)
This won't work for two reasons:
The
Tag
queryset selects allTag
fields, whileCount
can only count on one field. Thus:Tag.objects.filter(post=OuterRef('pk')).only('pk')
is needed (to select counting ontag.pk
).Count
itself is not aSubquery
class,Count
is anAggregate
. So the expression generated byCount
is not recognized as aSubquery
(OuterRef
requires subquery), we can fix that by usingSubquery
.
Applying fixes for 1) and 2) would produce:
Post.objects.annotate(
count=Count(Subquery(Tag.objects.filter(post=OuterRef('pk')).only('pk')))
)
However if you inspect the query being produced:
SELECT
"tests_post"."id",
"tests_post"."title",
COUNT((SELECT U0."id"
FROM "tests_tag" U0
INNER JOIN "tests_post_tags" U1 ON (U0."id" = U1."tag_id")
WHERE U1."post_id" = ("tests_post"."id"))
) AS "count"
FROM "tests_post"
GROUP BY
"tests_post"."id",
"tests_post"."title"
you will notice a GROUP BY
clause. This is because COUNT
is an aggregate function. Right now it does not affect the result, but in some other cases it may. That's why the docs suggest a different approach, where the aggregation is moved into the subquery
via a specific combination of values
+ annotate
+ values
:
Post.objects.annotate(
count=Subquery(
Tag.objects
.filter(post=OuterRef('pk'))
# The first .values call defines our GROUP BY clause
# Its important to have a filtration on every field defined here
# Otherwise you will have more than one group per row!!!
# This will lead to subqueries to return more than one row!
# But they are not allowed to do that!
# In our example we group only by post
# and we filter by post via OuterRef
.values('post')
# Here we say: count how many rows we have per group
.annotate(count=Count('pk'))
# Here we say: return only the count
.values('count')
)
)
Finally this will produce:
SELECT
"tests_post"."id",
"tests_post"."title",
(SELECT COUNT(U0."id") AS "count"
FROM "tests_tag" U0
INNER JOIN "tests_post_tags" U1 ON (U0."id" = U1."tag_id")
WHERE U1."post_id" = ("tests_post"."id")
GROUP BY U1."post_id"
) AS "count"
FROM "tests_post"