Why does the Collections.shuffle() algorithm work better than my implementation

This is another explanation of Fisher-Yates.

Consider the following method:

There are two lists, A and B. Initially, all the elements are on list A so list B is empty.
At each step:

Pick with uniform probability from the elements currently on list A.

Permute list A to make the chosen element the last element.

Remove the last element from list A and append it to list B.
When list A is empty, return list B.

I find this algorithm easy to understand and visualize.

The probability of a given item being chosen on the first step is 1/n. The probability of a given item being chosen on the second step is its probability of not being chosen on the first step, (n-1)/n, times its probability of being chosen on the second step given it is still on list A, 1/(n-1). That product is 1/n.

Similarly, it has probability ((n-1)/n)*((n-2)/(n-1)) = (n-2)/n of still being on list A after two items have been moved, and therefore a 1/n probability of being the third item chosen.

In general, the probability of still being on list A after k items have been chosen is ((n-1)/n)*((n-2)/(n-1))*...*((n-k)/(n-k+1)) = (n-k)/n. The probability of being chosen on the next step, given the item is still on list A, is 1/(n-k), so the unconditional probability being chosen on the step when list A has (n-k) items is ((n-k)/n)*(1/(n-k)) = 1/n.

Fisher-Yates is just this algorithm with the two lists, whose total length is always n, concatenated in a single array. At each step, it selects an element from list A with uniform probability, permutes list A to put that element adjacent to list B, and then moves the boundary so that it changes from being a list A element to being the most recently added element of list B.

Collections.Shuffle() does a Fisher-Yates shuffle. It's a more evenly distributed form of shuffling, and does not reshuffle what might have previously been shuffled already, as opposed to your algorithm.

What your algorithm does (also the known as the naive implementation) is that it will randomly pick any array index and shuffle it over, meaning there's a high chance of it picking the same index that has previously been shuffled already.

The Fisher-Yates shuffle (also known as the Donald Knuth Shuffle) is an unbiased algorithm that shuffles items in the array in an equally likely probability. It avoids the chance if it 'moving' the same objects twice.

Here's a good explanation of the Fisher Yates Shuffle by our own Jeff Atwood at Coding Horror, he discusses the naive implementation of the random shuffle versus the Fisher Yates shuffle.

See also this SO question about Java's implementation. It mentions what you asked. You can also look at the source code if you'd like, as mentioned there. I found it by Googling Collections.shuffle() in the top 5 links.

To further discuss this, it's always a good idea to use the Fisher-Yates shuffle compared to the naive implementations, especially in implementations that require a higher level of randomness (such as shuffle poker cards) to avoid introducing odds and unfair play. It wouldn't be a good thing if games of chance where implemented based on our naive implementation, as the bias leads to what you've observed, where the same permutation seems to show up more often than the others.

Lastly, as user @jmruc mentioned, here's a very nice tutorial on visualizing algorithms, it contains the Fisher-Yates shuffle, as well as other algorithms, all beautifully presented. Might help you wrap your head around the concepts if you're more of a visualizer: Visualizing Algorithms by Mike Bostock

Why does the Collections.shuffle() algorithm work better than my implementation

Tags:

Java

Shuffle

Related

Recent Posts