How to aggregate (min/max etc.) over Django JSONField data?
It is possible to do this using a Postgres function
https://www.postgresql.org/docs/9.5/functions-json.html
from django.db.models import Func, F, FloatField
from django.db.models.expressions import Value
from django.db.models.functions import Cast
text = Func(F(json_field), Value(json_key), function='jsonb_extract_path_text')
floatfield = Cast(text, FloatField())
Model.objects.aggregate(min=Min(floatfield))
This is much better than using the RawQuery
because it doesn't break if you do a more complex query, where Django uses aliases and where there are field name collisions. There is so much going on with the ORM that can bite you with hand written implementations.
I know this is a bit late (several months) but I came across the post while trying to do this. Managed to do it by:
1) using KeyTextTransform to convert the jsonb value to text
2) using Cast to convert it to integer, so that the SUM works:
q = myModel.objects.filter(type=9) \
.annotate(numeric_val=Cast(KeyTextTransform(sum_field, 'data'), IntegerField())) \
.aggregate(Sum('numeric_val'))
print(q)
where 'data' is the jsonb property, and 'numeric_val' is the name of the variable I create by annotating.
Hope this helps somebody!
For those who interested, I've found the solution (or workaround at least).
from django.db.models.expressions import RawSQL
Model.objects.annotate(
val=RawSQL("((attrs->>%s)::numeric)", (json_field_key,))
).aggregate(min=Min('val')
Note that attrs->>%s
expression will become smth like attrs->>'width'
after processing (I mean single quotes). So if you hardcode this name you should remember to insert them or you will get error.
/// A little bit offtopic ///
And one more tricky issue not related to django itself but that is needed to be handled somehow. As attrs
is json field and there're no restrictions on its keys and values you can (depending on you application logic) get some non-numeric values in, for example, width
key. In this case you will get DataError
from postgres as a result of executing the above query. NULL values will be ignored meanwhile so it's ok. If you can just catch the error then no problem, you're lucky. In my case I needed to ignore wrong values and the only way here is to write custom postgres function that will supress casting errors.
create or replace function safe_cast_to_numeric(text) returns numeric as $$
begin
return cast($1 as numeric);
exception
when invalid_text_representation then
return null;
end;
$$ language plpgsql immutable;
And then use it to cast text to numbers:
Model.objects.annotate(
val=RawSQL("safe_cast_to_numeric(attrs->>%s)", (json_field_key,))
).aggregate(min=Min('val')
Thus we get quite solid solution for such a dynamic thing as json.
From django 1.11 (which isn't out yet, so this might change) you can use django.contrib.postgres.fields.jsonb.KeyTextTransform
instead of RawSQL
.
In django 1.10 you have to copy/paste KeyTransform
to you own KeyTextTransform
and replace the ->
operator with ->>
and #>
with #>>
so it returns text instead of json objects.
Model.objects.annotate(
val=KeyTextTransform('json_field_key', 'blah__json_field'))
).aggregate(min=Min('val')
You can even include KeyTextTransform
s in SearchVector
s for full text search
Model.objects.annotate(
search=SearchVector(
KeyTextTransform('jsonb_text_field_key', 'json_field'))
)
).filter(search='stuff I am searching for')
Remember you can also index in jsonb fields, so you should consider that based upon your specific workload.