Algorithm for Shuffling a Linked List in n log n time

You can actually do better than that: the best list shuffle algorithm is O(n log n) time and just O(1) space. (You can also shuffle in O(n) time and O(n) space by constructing a pointer array for the list, shuffling it in place using Knuth and re-threading the list accordingly.)

Complexity proof

To see why O(n log n) time is minimal for O(1) space, observe that:

  • With O(1) space, updating the successor of an arbitrary list element necessarily takes O(n) time.
  • Wlog, you can assume that whenever you update one element, you also update all the other elements (leaving them unchanged if you wish), as this also takes just O(n) time.
  • With O(1) space, there are at most O(1) elements to choose from for the successor of any element you're updating (which specific elements these are will obviously depend on the algorithm).
  • Therefore, a single O(n) time update of all the elements could result in at most c^n different list permutations.
  • Since there are n! = O(n^n) = O(c^(n log n)) possible list permutations, you require at least O(log n) updates.

Linked-list data structure (because Python)

import collections

class Cons(collections.Sequence):
    def __init__(self, head, tail=None):
        self.head = head
        self.tail = tail

    def __getitem__(self, index):
        current, n = self, index
        while n > 0:
            if isinstance(current, Cons):
                current, n = current.tail, n - 1
            else:
                raise ValueError("Out of bounds index [{0}]".format(index))
        return current

    def __len__(self):
        current, length = self, 0
        while isinstance(current, Cons):
            current, length = current.tail, length + 1
        return length

    def __repr__(self):
        current, rep = self, []
        while isinstance(current, Cons):
            rep.extend((str(current.head), "::"))
            current = current.tail
        rep.append(str(current))
        return "".join(rep)

Merge-style algorithm

Here is an O(n log n) time and O(1) space algorithm based on iterative merge sort. The basic idea is simple: shuffle the left half, then the right half, then merge them by randomly selecting from the two lists. Two things worth noting:

  1. By making the algorithm iterative rather than recursive, and returning a pointer to the new last element at the end of every merge step, we reduce the space requirement to O(1) while keeping the time cost minimal.
  2. To make sure that all possibilities are equally likely for all input sizes, we use probabilities from the Gilbert–Shannon–Reeds model riffle shuffle when merging (see http://en.wikipedia.org/wiki/Gilbert%E2%80%93Shannon%E2%80%93Reeds_model).
import random

def riffle_lists(head, list1, len1, list2, len2):
    """Riffle shuffle two sublists in place. Returns the new last element."""
    for _ in range(len1 + len2):
        if random.random() < (len1 / (len1 + len2)):
            next, list1, len1 = list1, list1.tail, len1 - 1
        else:
            next, list2, len2 = list2, list2.tail, len2 - 1
        head.tail, head = next, next
    head.tail = list2
    return head

def shuffle_list(list):
    """Shuffle a list in place using an iterative merge-style algorithm."""
    dummy = Cons(None, list)
    i, n = 1, len(list)
    while (i < n):
        head, nleft = dummy, n
        while (nleft > i):
            head = riffle_lists(head, head[1], i, head[i + 1], min(i, nleft - i))
            nleft -= 2 * i
        i *= 2
    return dummy[1]

Another algorithm

Another interesting O(n log n) algorithm that produces not-quite-uniform shuffles involves simply riffle shuffling the list 3/2 log_2(n) times. As described in http://en.wikipedia.org/wiki/Gilbert%E2%80%93Shannon%E2%80%93Reeds_model, this leaves only a constant number of bits of information.


Code

Up shuffle approach

This (lua) version is improved from foxcub's answer to remove the need of dummy nodes.

In order to slightly simplify the code in this answer, this version suppose that your lists know their sizes. In the event they don't, you can always find it in O(n) time, but even better: a few simple adaptation in the code can be done to not require to compute it beforehand (like subdividing one over two instead of first and second half).

function listUpShuffle (l)
    local lsz = #l
    if lsz <= 1 then return l end

    local lsz2 = math.floor(lsz/2)
    local l1, l2 = {}, {}
    for k = 1, lsz2     do l1[#l1+1] = l[k] end
    for k = lsz2+1, lsz do l2[#l2+1] = l[k] end

    l1 = listUpShuffle(l1)
    l2 = listUpShuffle(l2)

    local res = {}
    local i, j = 1, 1
    while i <= #l1 or j <= #l2 do
        local rem1, rem2 = #l1-i+1, #l2-j+1
        if math.random() < rem1/(rem1+rem2) then
            res[#res+1] = l1[i]
            i = i+1
        else
            res[#res+1] = l2[j]
            j = j+1
        end
    end
    return res
end

To avoid using dummy nodes, you have to compensate for the fact that the two intermediate lists can have different lengths by varying the probability to choose in each list. This is done by testing a [0,1] uniform random number against the ratio of nodes popped from the first list over the total number of node popped (in the two lists).

Down shuffle approach

You can also shuffle while you subdivide recursively, which in my humble tests showed slightly (but consistently) better performance. It might come from the fewer instructions, or on the other hand it might have appeared due to cache warmup in luajit, so you will have to profile for your use cases.

function listDownShuffle (l)
    local lsz = #l
    if lsz <= 1 then return l end

    local lsz2 = math.floor(lsz/2)
    local l1, l2 = {}, {}
    for i = 1, lsz do
        local rem1, rem2 = lsz2-#l1, lsz-lsz2-#l2
        if math.random() < rem1/(rem1+rem2) then
            l1[#l1+1] = l[i]
        else
            l2[#l2+1] = l[i]
        end
    end

    l1 = listDownShuffle(l1)
    l2 = listDownShuffle(l2)

    local res = {}
    for i = 1, #l1 do res[#res+1] = l1[i] end
    for i = 1, #l2 do res[#res+1] = l2[i] end
    return res
end

Tests

The full source is in my listShuffle.lua Gist.

It contains code that, when executed, prints a matrix representing, for each element of the input list, the number of times it appears at each position of the output list, after a specified number of run. A fairly uniform matrix 'show' the uniformity of the distribution of characters, hence the uniformity of the shuffle.

Here is an example run with 1000000 iteration using a (non power of two) 3 element list :

>> luajit listShuffle.lua 1000000 3
Up shuffle bias matrix:
333331 332782 333887
333377 333655 332968
333292 333563 333145
Down shuffle bias matrix:
333120 333521 333359
333435 333088 333477
333445 333391 333164

What about the following? Perform the same procedure as merge sort. When merging, instead of selecting an element (one-by-one) from the two lists in sorted order, flip a coin. Choose whether to pick an element from the first or from the second list based on the result of the coin flip.


Edit (2022-01-12): As GA1 points out in the answer below, this algorithm doesn't produce a permutation uniformly at random.


Algorithm.

shuffle(list):
    if list contains a single element
        return list

    list1,list2 = [],[]
    while list not empty:
        move front element from list to list1
        if list not empty: move front element from list to list2

    shuffle(list1)
    shuffle(list2)

    if length(list2) < length(list1):
        i = pick a number uniformly at random in [0..length(list2)]             
        insert a dummy node into list2 at location i 

    # merge
    while list1 and list2 are not empty:
        if coin flip is Heads:
            move front element from list1 to list
        else:
            move front element from list2 to list

    if list1 not empty: append list1 to list
    if list2 not empty: append list2 to list

    remove the dummy node from list
        

The key point for space is that splitting the list into two does not require any extra space. The only extra space we need is to maintain log n elements on the stack during recursion.

The point with the dummy node is to realize that inserting and removing a dummy element keeps the distribution of the elements uniform.


Edit (2022-01-12): As Riley points out in the comments, the analysis below is flawed.


Analysis. Why is the distribution uniform? After the final merge, the probability P_i(n) of any given number ending up in the position i is as follows. Either it was:

  • in the i-th place in its own list, and the list won the coin toss the first i times, the probability of this is 1/2^i;
  • in the i-1-st place in its own list, and the list won the coin toss i-1 times including the last one and lost once, the probability of this is (i-1) choose 1 times 1/2^i;
  • in the i-2-nd place in its own list, and the list won the coin toss i-2 times including the last one and lost twice, the probability of this is (i-1) choose 2 times 1/2^i;
  • and so on.

So the probability

P_i(n) = \sum_{j=0}^{i-1} (i-1 choose j) * 1/2^i * P_j(n/2).

Inductively, you can show that P_i(n) = 1/n. I let you verify the base case and assume that P_j(n/2) = 2/n. The term \sum_{j=0}^{i-1} (i-1 choose j) is exactly the number of i-1-bit binary numbers, i.e. 2^{i-1}. So we get

P_i(n) = \sum_{j=0}^{i-1} (i-1 choose j) * 1/2^i * 2/n
       = 2/n * 1/2^i * \sum_{j=0}^{i-1} (i-1 choose j)
       = 1/n * 1/2^{i-1} * 2^{i-1}
       = 1/n

I hope this makes sense. The only assumption we need is that n is even, and that the two lists are shuffled uniformly. This is achieved by adding (and then removing) the dummy node.

P.S. My original intuition was nowhere near rigorous, but I list it just in case. Imagine we assign numbers between 1 and n at random to the elements of the list. And now we run a merge sort with respect to these numbers. At any given step of the merge, it needs to decide which of the heads of the two lists is smaller. But the probability of one being greater than the other should be exactly 1/2, so we can simulate this by flipping a coin.

P.P.S. Is there a way to embed LaTeX here?