Pathological Sorting
It's been a pretty long time, but I remember back in Algorithms 101 we were taught some sorting algorithm that used randomization. I wasn't a very good student so I don't really remember how it went or why it worked quickly on average.
Nevertheless, I've decided that this problem calls for a solution that uses randomization, which will hopefully work in my favor on average.
import random
def arrayIsSorted (arr) :
for i in range(len(arr)-1) :
if arr[i]>arr[i+1] : return False
return True
def rSort (arr) :
random.seed (42)
counter = 0
while not arrayIsSorted(arr) :
random.shuffle (arr)
counter+=1
print ("Sorted in %d iterations." % counter)
return arr
Since true randomization is important, I make sure to seed the RNG with the answer to Life, the Universe and Everything. After a bit of testing it turns out that that was a smart move! Check out how fast these 2 completely arbitrary lists get sorted:
rSort ([6,1,4,2,3,7,5])
rSort ([8,9,6,1,4,7,2,3,5])
Both of these get sorted in only 1 iteration - you couldn't possibly ask for a faster function than that!
Now, admittedly, some other lists produce slightly worse results...
rSort ([5,1,4,2,3,7,6])
rSort ([8,9,6,1,4,7,2,5,3])
These get sorted in 4,176 and 94,523 iterations respectively, which actually takes more than a second... but let's just keep that fact to ourselves so as not to distract anyone from how amazing this algorithm is!
Edit:
I've been asked to prove my algorithm's efficiency on a list of 100 items, so here you go:
rSort ([70, 6, 52, 97, 85, 61, 62, 48, 30, 3, 11, 88, 39, 91, 98, 8, 54, 92, 44, 65, 69, 21, 58, 41, 60, 76, 27, 82, 93, 81, 20, 94, 22, 29, 49, 95, 40, 19, 55, 42, 43, 1, 0, 67, 35, 15, 51, 31, 16, 25, 5, 53, 37, 74, 86, 12, 13, 72, 56, 32, 47, 46, 59, 33, 80, 4, 45, 63, 57, 89, 7, 77, 14, 10, 34, 87, 18, 79, 9, 66, 24, 99, 64, 26, 78, 38, 90, 28, 83, 75, 68, 2, 17, 73, 96, 71, 23, 84, 36, 50])
Even this long and completely arbitrary list gets sorted instantly! Truly I must have stumbled upon the best sorting algorithm in the world!
If you can create your own data, then it's pretty straightforward - get data that looks random, but includes a key for faster sorting. All other data uses the original sorting method, so the average times are better.
One easy way is to make sure each data item has a unique key, and then just hash the keys. Take for example a list with the numbers from 1-10,000, all multiplied by 16, and with a random number from 0-15 added to it (see fillArray() below). They will look random, but each one has a unique sequential key. For sorting, divide by 16 (in C the >>4 is very fast) and then just place the number into an array using the resulting key as the index. One pass and you're done. In testing, I found quicksort was 30 times slower on ten million numbers.
void fillArray(int *a,int len)
{
for (int i=0;i<len;++i)
a[i]=(i<<4)|(rand()&0xF);
// shuffle later
}
void sortArray(int *a,int len)
{
int key=0;
int *r=new int[len];
for (int i=0;i<len;++i)
{
key=a[i]>>4;
r[key]=a[i];
}
memcpy(a,r,len*sizeof(int));
delete[] r;
}
void shuffleArray(int *a,int len)
{
int swap=0, k=0;
for (int i=0;i<len;++i)
{
k=rand()%len;
swap=a[k];
a[k]=a[i];
a[i]=swap;
}
}
int qCompare(const void*a,const void*b)
{
int result=*((int*)a)-*((int*)b);
return result;
}
void main()
{
int aLen=10000;
int *a=new int[aLen];
srand (time(NULL));
fillArray(a,aLen);
// time them
long t0=0, d0=0, d1=0;
// qsort
shuffleArray(a,aLen);
t0=::GetTickCount();
qsort(a,aLen,sizeof(int),&qCompare);
d0=::GetTickCount()-t0;
// oursort
shuffleArray(a,aLen);
t0=::GetTickCount();
sortArray(a,aLen);
d1=::GetTickCount()-t0;
delete[] a;
}
Anything that has a unique key can be sorted this way - if you have the memory to store it, of course. For example, many databases use a unique numeric customer id - if the list is small/sequential enough this could be held in memory. Or some other way to translate a record into a unique number. For more info, research Hash Sorts, since that's what this is...