How to annotate Count with a condition in a Django queryset
For django >= 1.8
Use Conditional Aggregation:
from django.db.models import Count, Case, When, IntegerField
Article.objects.annotate(
numviews=Count(Case(
When(readership__what_time__lt=treshold, then=1),
output_field=IntegerField(),
))
)
Explanation:
normal query through your articles will be annotated with numviews
field. That field will be constructed as a CASE/WHEN expression, wrapped by Count, that will return 1 for readership matching criteria and NULL
for readership not matching criteria. Count will ignore nulls and count only values.
You will get zeros on articles that haven't been viewed recently and you can use that numviews
field for sorting and filtering.
Query behind this for PostgreSQL will be:
SELECT
"app_article"."id",
"app_article"."author",
"app_article"."published",
COUNT(
CASE WHEN "app_readership"."what_time" < 2015-11-18 11:04:00.000000+01:00 THEN 1
ELSE NULL END
) as "numviews"
FROM "app_article" LEFT OUTER JOIN "app_readership"
ON ("app_article"."id" = "app_readership"."which_article_id")
GROUP BY "app_article"."id", "app_article"."author", "app_article"."published"
If we want to track only unique queries, we can add distinction into Count
, and make our When
clause to return value, we want to distinct on.
from django.db.models import Count, Case, When, CharField, F
Article.objects.annotate(
numviews=Count(Case(
When(readership__what_time__lt=treshold, then=F('readership__reader')), # it can be also `readership__reader_id`, it doesn't matter
output_field=CharField(),
), distinct=True)
)
That will produce:
SELECT
"app_article"."id",
"app_article"."author",
"app_article"."published",
COUNT(
DISTINCT CASE WHEN "app_readership"."what_time" < 2015-11-18 11:04:00.000000+01:00 THEN "app_readership"."reader_id"
ELSE NULL END
) as "numviews"
FROM "app_article" LEFT OUTER JOIN "app_readership"
ON ("app_article"."id" = "app_readership"."which_article_id")
GROUP BY "app_article"."id", "app_article"."author", "app_article"."published"
For django < 1.8 and PostgreSQL
You can just use raw
for executing SQL statement created by newer versions of django. Apparently there is no simple and optimized method for querying that data without using raw
(even with extra
there are some problems with injecting required JOIN
clause).
Articles.objects.raw('SELECT'
' "app_article"."id",'
' "app_article"."author",'
' "app_article"."published",'
' COUNT('
' DISTINCT CASE WHEN "app_readership"."what_time" < 2015-11-18 11:04:00.000000+01:00 THEN "app_readership"."reader_id"'
' ELSE NULL END'
' ) as "numviews"'
'FROM "app_article" LEFT OUTER JOIN "app_readership"'
' ON ("app_article"."id" = "app_readership"."which_article_id")'
'GROUP BY "app_article"."id", "app_article"."author", "app_article"."published"')
For django >= 2.0 you can use Conditional aggregation with a filter
argument in the aggregate functions:
from datetime import timedelta
from django.utils import timezone
from django.db.models import Count, Q # need import
Article.objects.annotate(
numviews=Count(
'readership__reader__id',
filter=Q(readership__what_time__gt=timezone.now() - timedelta(minutes=30)),
distinct=True
)
)