Grokking Timsort
This change went through the core-libs mailing list when it went in, so there is some useful discussion and a few good links there. Here's the webrev with the code review changes, and also the original patch.
The comments in the code say:
Implementation note: This implementation is a stable, adaptive,
iterative mergesort that requires far fewer than n lg(n) comparisons
when the input array is partially sorted, while offering the
performance of a traditional mergesort when the input array is
randomly ordered. If the input array is nearly sorted, the
implementation requires approximately n comparisons.
Temporary storage requirements vary from a small constant for nearly sorted
input arrays to n/2 object references for randomly ordered input
arrays. The implementation takes equal advantage of ascending and
descending order in its input array, and can take advantage of
ascending and descending order in different parts of the same
input array. It is well-suited to merging two or more sorted arrays:
simply concatenate the arrays and sort the resulting array.
The implementation was adapted from Tim Peters's list sort for Python
TimSort. It uses techniques from Peter McIlroy's "Optimistic
Sorting and Information Theoretic Complexity", in Proceedings of the
Fourth Annual ACM-SIAM Symposium on Discrete Algorithms, pp 467-474,
January 1993.
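That last point, concatenating already-sorted arrays and sorting the result, is easy to try against the public API. A small illustrative example (the class name and the toy data are mine):

```java
import java.util.Arrays;

public class ConcatAndSort {
    public static void main(String[] args) {
        Integer[] a = {1, 3, 5, 7};
        Integer[] b = {2, 4, 6, 8};

        // Concatenate the two sorted inputs into one array...
        Integer[] both = new Integer[a.length + b.length];
        System.arraycopy(a, 0, both, 0, a.length);
        System.arraycopy(b, 0, both, a.length, b.length);

        // ...and let Arrays.sort, which uses this mergesort for object arrays,
        // take advantage of the two pre-existing ascending runs.
        Arrays.sort(both);
        System.out.println(Arrays.toString(both)); // [1, 2, 3, 4, 5, 6, 7, 8]
    }
}
```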
Buried in that comment is the very useful link to the Python implementation details; I think that's a great place to start, followed by the code itself. At a very high level, timsort improves performance by noticing runs of already-sorted data and taking advantage of that structure during the sort.
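To make "noticing runs" a bit more concrete, here is a minimal sketch of natural-run detection for a plain int[]. It mirrors the idea behind the implementation's countRunAndMakeAscending, but the names and the int specialisation are mine; the real code works on object arrays with a comparator.

```java
final class RunDetection {
    // Return the length of the natural run starting at lo, reversing it in
    // place first if it is descending, so every detected run ends up ascending.
    static int countRun(int[] a, int lo, int hi) {
        int runHi = lo + 1;
        if (runHi == hi) return 1;

        if (a[runHi++] < a[lo]) {
            // Strictly descending run: extend it, then reverse it in place.
            // (Strictness matters: reversing equal elements would break stability.)
            while (runHi < hi && a[runHi] < a[runHi - 1]) runHi++;
            reverse(a, lo, runHi - 1);
        } else {
            // Ascending (non-decreasing) run: just extend it.
            while (runHi < hi && a[runHi] >= a[runHi - 1]) runHi++;
        }
        return runHi - lo;
    }

    static void reverse(int[] a, int lo, int hi) {
        while (lo < hi) {
            int t = a[lo];
            a[lo++] = a[hi];
            a[hi--] = t;
        }
    }
}
```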
Quoting the relevant portion of a now-deleted blog post, Visualising Sorting Algorithms: Python's timsort:
The business-end of timsort is a mergesort that operates on runs of pre-sorted elements. A minimum run length minrun is chosen to make sure the final merges are as balanced as possible - for 64 elements, minrun happens to be 32. Before the merges begin, a single pass is made through the data to detect pre-existing runs of sorted elements. Descending runs are handled by simply reversing them in place. If the resultant run length is less than minrun, it is boosted to minrun using insertion sort. On a shuffled array with no significant pre-existing runs, this process looks exactly like our guess above: pre-sorting blocks of minrun elements using insertion sort, before merging with merge sort.
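The minrun choice mentioned there is short enough to sketch. This follows the shape of CPython's merge_compute_minrun; as far as I can tell the Java port's minRunLength is the same idea with a cutoff of 32 rather than the 64 used below.

```java
final class MinRun {
    // Sketch of the minrun calculation: keep the six most significant bits of n
    // and add one if any of the remaining bits are set. With the CPython cutoff
    // of 64 used here, n == 64 gives exactly 32, matching the quote above.
    static int minRunLength(int n) {
        int r = 0;            // becomes 1 if any low-order bit is shifted off
        while (n >= 64) {
            r |= (n & 1);
            n >>= 1;
        }
        return n + r;
    }
}
```

Picking up the quote where it leaves off: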
[...]
- timsort finds a descending run, and reverses the run in-place. This is done directly on the array of pointers, so seems "instant" from our vantage point.
- The run is now boosted to length minrun using insertion sort.
- No run is detected at the beginning of the next block, and insertion sort is used to sort the entire block. Note that the sorted elements at the bottom of this block are not treated specially - timsort doesn't detect runs that start in the middle of blocks being boosted to minrun.
- Finally, mergesort is used to merge the runs.
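To round that off, here is a hedged sketch of what merging two adjacent runs can look like, copying only one run into temporary storage, which is roughly where the "n/2 object references" figure in the header comment comes from. The real merges pick whichever run is shorter, can work from either end, and add galloping; none of that is shown here.

```java
import java.util.Arrays;

final class RunMerge {
    // Merge a[lo..mid) and a[mid..hi), both already ascending, back into a.
    // Only the left run is copied aside, so the extra space is bounded by the
    // length of one run (at most about half the array when the runs are balanced).
    static void mergeRuns(int[] a, int lo, int mid, int hi) {
        int[] left = Arrays.copyOfRange(a, lo, mid); // temp copy of the left run
        int i = 0, j = mid, k = lo;
        while (i < left.length && j < hi) {
            // <= keeps equal elements in their original order, preserving stability.
            a[k++] = (left[i] <= a[j]) ? left[i++] : a[j++];
        }
        while (i < left.length) a[k++] = left[i++];  // drain whatever is left in temp
        // Any remaining right-run elements are already in their final position.
    }
}
```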