Using list.count to sort a list in-place using .sort() does not work. Why?
It doesn't work with the list.sort
method because CPython decides to "empty the list" temporarily (the other answer already presents this). This is mentioned in the documentation as implementation detail:
CPython implementation detail: While a list is being sorted, the effect of attempting to mutate, or even inspect, the list is undefined. The C implementation of Python makes the list appear empty for the duration, and raises
ValueError
if it can detect that the list has been mutated during a sort.
The source code contains a similar comment with a bit more explanation:
/* The list is temporarily made empty, so that mutations performed
* by comparison functions can't affect the slice of memory we're
* sorting (allowing mutations during sorting is a core-dump
* factory, since ob_item may change).
*/
The explanation isn't straight-forward but the problem is that the key-function and the comparisons could change the list
instance during sorting which is very likely to result in undefined behavior of the C-code (which may crash the interpreter). To prevent that the list is emptied during the sorting, so that even if someone changes the instance it won't result in an interpreter crash.
This doesn't happen with sorted
because sorted
copies the list and simply sorts the copy. The copy is still emptied during the sorting but there's no way to access it, so it isn't visible.
However you really shouldn't sort like this to get a frequency sort. That's because for each item you call the key
function once. And list.count
iterates over each item, so you effectively iterate the whole list for each element (what is called O(n**2)
complexity). A better way would be to calculate the frequency once for each element (can be done in O(n)
) and then just access that in the key
.
However since CPython has a Counter
class that also supports most_common
you could really just use that:
>>> from collections import Counter
>>> [item for item, count in reversed(Counter(a).most_common()) for _ in range(count)]
[1, 2, 2, 5, 5, 4, 4, 4]
This may change the order of the elements with equal counts but since you're doing a frequency count that shouldn't matter to much.
What you see is the result of a certain CPython implementation detail of list.sort
. Try this again, but create a copy of a
first:
a.sort(key=a.copy().count)
a
# [1, 5, 5, 2, 2, 4, 4, 4]
.sort
modifies a
internally, so a.count
is going to produce un-predictable results. This is documented as an implementation detail.
What copy
call does is it creates a copy of a
and uses that list's count
method as the key. You can see what happens with some debug statements:
def count(x):
print(a)
return a.count(x)
a.sort(key=count)
[]
[]
[]
...
a
turns up as an empty list when accessed inside .sort
, and [].count(anything)
will be 0
. This explains why the output is the same as the input - the predicates are all the same (0
).
OTOH, sorted
creates a new list, so it doesn't have this problem.
If you really want to sort by frequency counts, the idiomatic method is to use a Counter
:
from collections import Counter
a.sort(key=Counter(a).get)
a
# [1, 5, 5, 2, 2, 4, 4, 4]