Sort a list by the number of occurrences of the elements in the list

The built-in sorted creates a list out of the sequence provided and then sorts that based on the key argument (omitting error checking):

/* copy sequence provided */
newlist = PySequence_List(seq);

/* get list.sort for the list object */
callable = _PyObject_GetAttrId(newlist, &PyId_sort);

/* call it and then return later on */
v = _PyObject_FastCallKeywords(callable, args + 1, nargs - 1, kwnames);

This essentially translates to something like what Jean provided in his answer:

B = list(A)
B.sort(key=lambda x: A.count(x))

By making that copy B and referencing A in the key function, this removes the restriction imposed by A.sort which can't peek in itself.

It seems that A is changed during the in-place sort process, so you cannot rely on the value of A during the sort process.

Making a copy also works.

A=[2,1,3,4,2,2,3]
B=A[:]
A.sort(key=lambda x:B.count(x))
print(A)

Confirmed by this line in python documentation

CPython implementation detail: While a list is being sorted, the effect of attempting to mutate, or even inspect, the list is undefined. The C implementation of Python makes the list appear empty for the duration, and raises ValueError if it can detect that the list has been mutated during a sort.

I believe it's because A.sort is modifying the list in place underneath while computing. sorted() doesn't modify the list and returns therefore a correct result.

This is by design and intentional. CPython temporarily "disallows" access to the list while the list is being sorted in place, the behavior is documented here:

CPython implementation detail: While a list is being sorted, the effect of attempting to mutate, or even inspect, the list is undefined. The C implementation of Python makes the list appear empty for the duration, and raises ValueError if it can detect that the list has been mutated during a sort.

You can inspect that by printing A inside the key function - you'll get an empty list:

In [2]: def key_function(x):
    ...:     print(A, x)
    ...:     return A.count(x)
    ...: 

In [3]: A.sort(key=key_function)  
([], 2)
([], 1)
([], 3)
([], 4)
([], 2)
([], 2)
([], 3)

But, if you do that for sorted():

In [4]: sorted(A, key=key_function)
([2, 1, 3, 4, 2, 2, 3], 2)
([2, 1, 3, 4, 2, 2, 3], 1)
([2, 1, 3, 4, 2, 2, 3], 3)
([2, 1, 3, 4, 2, 2, 3], 4)
([2, 1, 3, 4, 2, 2, 3], 2)
([2, 1, 3, 4, 2, 2, 3], 2)
([2, 1, 3, 4, 2, 2, 3], 3)
Out[4]: [1, 4, 3, 3, 2, 2, 2]

It is also documented inside the sort() implementation:

/* The list is temporarily made empty, so that mutations performed
 * by comparison functions can't affect the slice of memory we're
 * sorting (allowing mutations during sorting is a core-dump
 * factory, since ob_item may change).
 */.

Sort a list by the number of occurrences of the elements in the list

Tags:

Python

Sorting

List

Related

Recent Posts