Django: Natural Sort QuerySet
Actually, that's not a bug in Django; that's how databases sort strings internally. MySQL, for example, doesn't seem to have natural sorting by default (I didn't google much, so I may be wrong there). But we can use a workaround for this case.
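To illustrate the problem outside the database, here is a minimal sketch in plain Python. The `natural_key` helper is a hypothetical name of mine, not part of Django or of any library mentioned here; it only shows how plain string order differs from natural order.

```python
import re

signatures = ['BA 1', 'BA 10', 'BA 100', 'BA 2', 'BA 1002', 'BA 1000', 'BA 1001']


def natural_key(text):
    """Split text into digit and non-digit runs; digit runs compare as integers."""
    return [int(part) if part.isdigit() else part
            for part in re.split(r'(\d+)', text)]


# Plain string comparison goes character by character, so 'BA 10' < 'BA 2'.
lexicographic = sorted(signatures)
# ['BA 1', 'BA 10', 'BA 100', 'BA 1000', 'BA 1001', 'BA 1002', 'BA 2']

# Natural order compares the numeric part as a number.
natural = sorted(signatures, key=natural_key)
# ['BA 1', 'BA 2', 'BA 10', 'BA 100', 'BA 1000', 'BA 1001', 'BA 1002']
```

A default ORDER BY on a text column behaves like the first result, which is why a workaround is needed.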
I put everything with examples & screenshots at https://gist.github.com/phpdude/8a45e1bd2943fa806aeffee94877680a
But basically, for the given models.py file:
from django.db import models


class Item(models.Model):
    signature = models.CharField('Signatur', max_length=50)

    def __str__(self):
        return self.signature
I've used this admin.py, just as an example, with the correct filter implementation:
from django.contrib.admin import ModelAdmin, SimpleListFilter, register
from django.db.models import Value as V
from django.db.models.functions import Coalesce, Length, NullIf, StrIndex, Substr

from .models import Item


class AlphanumericSignatureFilter(SimpleListFilter):
    title = 'Signature (alphanumeric)'
    parameter_name = 'signature_alphanumeric'

    def lookups(self, request, model_admin):
        return (
            ('signature', 'Signature (alphanumeric)'),
        )

    def queryset(self, request, queryset):
        if self.value() == 'signature':
            return queryset.order_by(
                # 1) Prefix up to and including the first space, or the whole
                #    value if there is no space (SQL positions are 1-based).
                Coalesce(Substr('signature', V(1), NullIf(StrIndex('signature', V(' ')), V(0))), 'signature'),
                # 2) Shorter values first, so 'BA 2' sorts before 'BA 10'.
                Length('signature'),
                # 3) Plain string order among values of equal length.
                'signature',
            )


@register(Item)
class ItemAdmin(ModelAdmin):
    list_filter = [AlphanumericSignatureFilter]
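To see what these three ordering keys do at the SQL level, here is a rough sketch against an in-memory SQLite table, using only the standard library. The table name `item`, the helper `db_natural_order`, and the extra 'AA ...' sample values are my assumptions for illustration; the SQL is hand-written for SQLite and is not necessarily what Django generates for your backend.

```python
import sqlite3

# Same idea as the filter: prefix up to the first space, then length, then value.
ORDER_BY = """
    coalesce(substr(signature, 1, nullif(instr(signature, ' '), 0)), signature),
    length(signature),
    signature
"""


def db_natural_order(signatures):
    """Insert the signatures into a throwaway table and read them back sorted."""
    conn = sqlite3.connect(':memory:')
    conn.execute('CREATE TABLE item (signature TEXT)')
    conn.executemany('INSERT INTO item (signature) VALUES (?)',
                     [(s,) for s in signatures])
    rows = conn.execute('SELECT signature FROM item ORDER BY ' + ORDER_BY).fetchall()
    conn.close()
    return [row[0] for row in rows]


result = db_natural_order(['BA 1', 'BA 10', 'BA 100', 'BA 2', 'AA 20', 'AA 3'])
# ['AA 3', 'AA 20', 'BA 1', 'BA 2', 'BA 10', 'BA 100']
```

Note that the length trick assumes the numeric part has no leading zeros: 'BA 02' is longer than 'BA 3', so it would sort after it.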
A few references:
- http://www.mysqltutorial.org/mysql-natural-sorting/
- https://docs.djangoproject.com/en/2.0/ref/contrib/admin/
PS: The db function Length(column_name) has been available since at least Django 1.9, so you should be able to use it. In general, Django also supports custom db function calls through the ORM, so you can call the database's length() function on the field yourself.
Extra example using the Python library natsort
It works, but it has to load all the signatures up front to sort them correctly, since the sorting happens on the Python side, not the DB side.
It could be pretty slow on a big table.
From my point of view, it should be used only on db tables with fewer than roughly 50 000 rows (just as an example; it depends on your DB server performance, etc.).
from django.contrib.admin import ModelAdmin, SimpleListFilter, register
from django.db.models import Value as V
from django.db.models.functions import Concat, StrIndex
from natsort import natsorted

from .models import Item


class AlphanumericTruePythonSignatureFilter(SimpleListFilter):
    title = 'Signature (alphanumeric true python)'
    parameter_name = 'signature_alphanumeric_python'

    def lookups(self, request, model_admin):
        return (
            ('signature', 'Signature (alphanumeric)'),
        )

    def queryset(self, request, queryset):
        if self.value() == 'signature':
            # Load every signature and sort the list on the Python side.
            all_ids = list(queryset.values_list('signature', flat=True))
            # let's use "!:!" as a separator for signature values; wrapping each
            # value on both sides keeps 'BA 1' from matching inside 'BA 10'
            all_ids_sorted = "!:!" + "!:!".join(natsorted(all_ids)) + "!:!"
            # Order each row by the position of its signature in the sorted string.
            return queryset.order_by(
                StrIndex(V(all_ids_sorted), Concat(V('!:!'), 'signature', V('!:!'))),
            )


@register(Item)
class ItemAdmin(ModelAdmin):
    list_filter = [AlphanumericTruePythonSignatureFilter]
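The position trick used in this filter can be sketched in plain Python, with `str.find` standing in for the database's StrIndex. The helper name `position_keys` is hypothetical, and the input is assumed to be already naturally sorted (for example by natsorted). This sketch wraps each value in the separator on both sides, an assumption of mine so that a value which is a prefix of another one cannot match at the wrong place.

```python
SEPARATOR = '!:!'


def position_keys(sorted_signatures):
    """Map each signature to its offset inside the joined, pre-sorted string."""
    joined = SEPARATOR + SEPARATOR.join(sorted_signatures) + SEPARATOR
    # str.find is 0-based while SQL string positions are 1-based,
    # but only the relative order of the offsets matters for sorting.
    return {s: joined.find(SEPARATOR + s + SEPARATOR) for s in sorted_signatures}


already_sorted = ['BA 1', 'BA 2', 'BA 10', 'BA 100', 'BA 1000', 'BA 1001', 'BA 1002']
keys = position_keys(already_sorted)

# Rows arriving in any order can now be sorted by their offset.
rows = ['BA 10', 'BA 1002', 'BA 1', 'BA 100', 'BA 2', 'BA 1001', 'BA 1000']
rows_sorted = sorted(rows, key=keys.get)
```

The joined string grows with the number of rows and is sent to the database with the query, which is one more reason to keep this approach to smaller tables.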
If you don’t mind targeting a specific database, you can use RawSQL() to inject a SQL expression that parses your “signature” field, then annotate the queryset with the result; for example (PostgreSQL):
queryset = (
    Item.objects.annotate(
        right_part=RawSQL("cast(split_part(signature, ' ', 2) as int)", ())
    ).order_by('right_part')
)
(If you need to support different database backends, you could additionally detect the active engine and supply a suitable expression accordingly.)
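As a rough sketch of that engine detection: the mapping and the helper names below are hypothetical, the PostgreSQL expression is the one from this answer, and the MySQL and SQLite expressions are my own assumptions (only the SQLite one is exercised here, using the standard library's sqlite3).

```python
import sqlite3

# Hypothetical mapping from a backend name (as reported by Django's
# connection.vendor) to a SQL expression for the number after the space.
RIGHT_PART_SQL = {
    'postgresql': "cast(split_part(signature, ' ', 2) as int)",
    'mysql': "cast(substring_index(signature, ' ', -1) as unsigned)",  # assumption
    'sqlite': "cast(substr(signature, instr(signature, ' ') + 1) as integer)",  # assumption
}


def right_part_sql(vendor):
    """Return the expression for the given backend, or fail loudly."""
    try:
        return RIGHT_PART_SQL[vendor]
    except KeyError:
        raise NotImplementedError('no natural-sort expression for %r' % vendor)


def sorted_with_sqlite(signatures):
    """Try the SQLite variant against a throwaway in-memory table."""
    conn = sqlite3.connect(':memory:')
    conn.execute('CREATE TABLE item (signature TEXT)')
    conn.executemany('INSERT INTO item (signature) VALUES (?)',
                     [(s,) for s in signatures])
    sql = 'SELECT signature FROM item ORDER BY ' + right_part_sql('sqlite')
    rows = [row[0] for row in conn.execute(sql)]
    conn.close()
    return rows
```

In a Django project the expression would presumably be chosen with something like RawSQL(right_part_sql(connection.vendor), ()), where connection comes from django.db.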
The nice thing about RawSQL() is that you make very explicit when and where you’re applying a database-specific feature.
As noted by @schillingt, Func() may also be an option. On the other hand, I would avoid extra(), as it may well be deprecated in the future (see: https://docs.djangoproject.com/en/2.2/ref/models/querysets/#extra).
Proof (for PostgreSQL):
class Item(models.Model):
    signature = models.CharField('Signatur', max_length=50)

    def __str__(self):
        return self.signature
-----------------------------------------------------
from pprint import pprint

from django.db.models.expressions import RawSQL
from django.test import TransactionTestCase

from backend.models import Item


class ModelsItemCase(TransactionTestCase):

    def test_item_sorting(self):
        signatures = [
            'BA 1',
            'BA 10',
            'BA 100',
            'BA 2',
            'BA 1002',
            'BA 1000',
            'BA 1001',
        ]
        for signature in signatures:
            Item.objects.create(signature=signature)
        # First dump: no explicit ordering applied.
        pprint(list(Item.objects.all()))
        print('')
        # Second dump: ordered by the integer after the space (PostgreSQL SQL).
        queryset = (
            Item.objects.annotate(
                right_part=RawSQL("cast(split_part(signature, ' ', 2) as int)", ())
            ).order_by('right_part')
        )
        pprint(list(queryset))
        self.assertEqual(queryset[0].signature, 'BA 1')
        self.assertEqual(queryset[1].signature, 'BA 2')
        self.assertEqual(queryset[2].signature, 'BA 10')
        self.assertEqual(queryset[3].signature, 'BA 100')
        self.assertEqual(queryset[4].signature, 'BA 1000')
        self.assertEqual(queryset[5].signature, 'BA 1001')
        self.assertEqual(queryset[6].signature, 'BA 1002')
Result:
test_item_sorting (backend.tests.test_item.ModelsItemCase) ... [<Item: BA 1>,
<Item: BA 10>,
<Item: BA 100>,
<Item: BA 2>,
<Item: BA 1002>,
<Item: BA 1000>,
<Item: BA 1001>]
[<Item: BA 1>,
<Item: BA 2>,
<Item: BA 10>,
<Item: BA 100>,
<Item: BA 1000>,
<Item: BA 1001>,
<Item: BA 1002>]
ok
----------------------------------------------------------------------
Ran 1 test in 0.177s