Quicksort vs heapsort
For most situations, having quick vs. a little quicker is irrelevant... you simply never want it to occasionally get waayyy slow. Although you can tweak QuickSort to avoid the way slow situations, you lose the elegance of the basic QuickSort. So, for most things, I actually prefer HeapSort... you can implement it in its full simple elegance, and never get a slow sort.
For situations where you DO want max speed in most cases, QuickSort may be preferred over HeapSort, but neither may be the right answer. For speed-critical situations, it is worth examining closely the details of the situation. For example, in some of my speed-critical code, it is very common that the data is already sorted or near-sorted (it is indexing multiple related fields that often either move up and down together OR move up and down opposite each other, so once you sort by one, the others are either sorted or reverse-sorted or close... either of which can kill QuickSort). For that case, I implemented neither... instead, I implemented Dijkstra's SmoothSort... a HeapSort variant that is O(N) when already sorted or near-sorted... it is not so elegant, not too easy to understand, but fast... read http://www.cs.utexas.edu/users/EWD/ewd07xx/EWD796a.PDF if you want something a bit more challenging to code.
Heapsort is O(N log N) guaranted, what is much better than worst case in Quicksort. Heapsort doesn't need more memory for another array to putting ordered data as is needed by Mergesort. So why do comercial applications stick with Quicksort? What Quicksort has that is so special over others implementations?
I've tested the algorithms myself and I've seen that Quicksort has something special indeed. It runs fast, much faster than Heap and Merge algorithms.
The secret of Quicksort is: It almost doesn't do unnecessary element swaps. Swap is time consuming.
With Heapsort, even if all of your data is already ordered, you are going to swap 100% of elements to order the array.
With Mergesort, it's even worse. You are going to write 100% of elements in another array and write it back in the original one, even if data is already ordered.
With Quicksort you don't swap what is already ordered. If your data is completely ordered, you swap almost nothing! Although there is a lot of fussing about worst case, a little improvement on the choice of pivot, any other than getting the first or last element of array, can avoid it. If you get a pivot from the intermediate element between first, last and middle element, it is suficient to avoid worst case.
What is superior in Quicksort is not the worst case, but the best case! In best case you do the same number of comparisons, ok, but you swap almost nothing. In average case you swap part of the elements, but not all elements, as in Heapsort and Mergesort. That is what gives Quicksort the best time. Less swap, more speed.
The implementation below in C# on my computer, running on release mode, beats Array.Sort by 3 seconds with middle pivot and by 2 seconds with improved pivot (yes, there is an overhead to get a good pivot).
static void Main(string[] args)
{
int[] arrToSort = new int[100000000];
var r = new Random();
for (int i = 0; i < arrToSort.Length; i++) arrToSort[i] = r.Next(1, arrToSort.Length);
Console.WriteLine("Press q to quick sort, s to Array.Sort");
while (true)
{
var k = Console.ReadKey(true);
if (k.KeyChar == 'q')
{
// quick sort
Console.WriteLine("Beg quick sort at " + DateTime.Now.ToString("HH:mm:ss.ffffff"));
QuickSort(arrToSort, 0, arrToSort.Length - 1);
Console.WriteLine("End quick sort at " + DateTime.Now.ToString("HH:mm:ss.ffffff"));
for (int i = 0; i < arrToSort.Length; i++) arrToSort[i] = r.Next(1, arrToSort.Length);
}
else if (k.KeyChar == 's')
{
Console.WriteLine("Beg Array.Sort at " + DateTime.Now.ToString("HH:mm:ss.ffffff"));
Array.Sort(arrToSort);
Console.WriteLine("End Array.Sort at " + DateTime.Now.ToString("HH:mm:ss.ffffff"));
for (int i = 0; i < arrToSort.Length; i++) arrToSort[i] = r.Next(1, arrToSort.Length);
}
}
}
static public void QuickSort(int[] arr, int left, int right)
{
int begin = left
, end = right
, pivot
// get middle element pivot
//= arr[(left + right) / 2]
;
//improved pivot
int middle = (left + right) / 2;
int
LM = arr[left].CompareTo(arr[middle])
, MR = arr[middle].CompareTo(arr[right])
, LR = arr[left].CompareTo(arr[right])
;
if (-1 * LM == LR)
pivot = arr[left];
else
if (MR == -1 * LR)
pivot = arr[right];
else
pivot = arr[middle];
do
{
while (arr[left] < pivot) left++;
while (arr[right] > pivot) right--;
if(left <= right)
{
int temp = arr[right];
arr[right] = arr[left];
arr[left] = temp;
left++;
right--;
}
} while (left <= right);
if (left < end) QuickSort(arr, left, end);
if (begin < right) QuickSort(arr, begin, right);
}
Quicksort-Heapsort in-place hybrids are really interesting, too, since most of them only needs n*log n comparisons in the worst case (they are optimal with respect to the first term of the asymptotics, so they avoid the worst-case scenarios of Quicksort), O(log n) extra-space and they preserve at least "a half" of the good behaviour of Quicksort with respect to already-ordered set of data. An extremely interesting algorithm is presented by Dikert and Weiss in http://arxiv.org/pdf/1209.4214v1.pdf:
- Select a pivot p as the median of a random sample of sqrt(n) elements (this can be done in at most 24 sqrt(n) comparisons through the algorithm of Tarjan&co, or 5 sqrt(n) comparisons through the much more convoluted spider-factory algorithm of Schonhage);
- Partition your array in two parts as in the first step of Quicksort;
- Heapify the smallest part and use O(log n) extra bits to encode a heap in which every left child has a value greater than its sibling;
- Recursively extract the root of the heap, sift down the lacune left by the root until it reaches a leaf of the heap, then fill the lacune with an appropriate element took from the other part of the array;
- Recur over the remaining non-ordered part of the array (if p is chosen as the exact median, there is no recursion at all).
This paper has some analysis.
Also, from Wikipedia:
The most direct competitor of quicksort is heapsort. Heapsort is typically somewhat slower than quicksort, but the worst-case running time is always Θ(nlogn). Quicksort is usually faster, though there remains the chance of worst case performance except in the introsort variant, which switches to heapsort when a bad case is detected. If it is known in advance that heapsort is going to be necessary, using it directly will be faster than waiting for introsort to switch to it.