What are efficient data structures and algorithms for simulating loaded dice?

You are looking for the alias method, which samples from a fixed discrete probability distribution in O(1) time per roll (assuming you can access entries in an array of length n in constant time) after a one-time O(n) set-up. You can find it documented in chapter 3 (PDF) of "Non-Uniform Random Variate Generation" by Luc Devroye.

The idea is to take your array of probabilities pk and produce three new n-element arrays, qk, ak, and bk. Each qk is a probability between 0 and 1, and each ak and bk is an integer between 1 and n.

We generate random numbers between 1 and n by generating two random numbers, r and s, between 0 and 1. Let i = floor(r*n)+1. If qi < s then return ai, else return bi. The work in the alias method is in figuring out how to produce qk, ak and bk.
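Here is a minimal sketch of one way to build and use those arrays, assuming the common "small/large worklist" construction (often attributed to Vose). The names build_alias and roll are made up for illustration, indices are 0-based rather than 1-based, and p is assumed to sum to 1. It follows the convention above: q[i] is the chance of keeping b[i] (the slot's own index), and a[i] is the alias.

```python
import random

def build_alias(p):
    """Build q, a, b from probabilities p (assumed to sum to 1). O(n)."""
    n = len(p)
    scaled = [pk * n for pk in p]      # average value becomes 1.0
    q = [1.0] * n                      # probability of keeping b[i]
    a = list(range(n))                 # alias outcome for slot i
    b = list(range(n))                 # slot i's own outcome
    small = [i for i, v in enumerate(scaled) if v < 1.0]
    large = [i for i, v in enumerate(scaled) if v >= 1.0]
    while small and large:
        s = small.pop()
        l = large.pop()
        q[s] = scaled[s]               # slot s keeps its own outcome this often
        a[s] = l                       # the rest of slot s is donated by l
        scaled[l] -= 1.0 - scaled[s]   # l gave away that much mass
        (small if scaled[l] < 1.0 else large).append(l)
    # Anything left over keeps q = 1.0, so it always returns its own index.
    return q, a, b

def roll(q, a, b, rng=random):
    """Draw one outcome in O(1): pick a slot, then keep it or take its alias."""
    i = int(rng.random() * len(q))
    s = rng.random()
    return a[i] if q[i] < s else b[i]

if __name__ == "__main__":
    q, a, b = build_alias([0.2, 0.5, 0.3])
    print([roll(q, a, b) for _ in range(10)])
```

The set-up cost is paid once; every roll afterwards is two random numbers, one array lookup and one comparison.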


Use a balanced binary search tree (or binary search over a sorted array) to get O(log n) time per roll. Have one node for each die result, keyed by the interval of random values that triggers that result.

def get_result(node, seed):
    # seed is a random number drawn from the range covered by all intervals
    if seed < node.interval.start:
        return get_result(node.left_child, seed)
    elif seed < node.interval.end:
        # start <= seed < end, so this node's result is triggered
        return node.result
    else:
        return get_result(node.right_child, seed)

The good thing about this solution is that it is very simple to implement but still has good complexity.
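For the "binary search in an array" variant, a rough sketch using Python's standard bisect module might look like the following. The function names are hypothetical, outcomes are 0-based indices, and the interval ends are simply the running sums of the probabilities.

```python
import bisect
import itertools
import random

def build_cumulative(p):
    """Running sums of p; entry i is the end of outcome i's interval."""
    return list(itertools.accumulate(p))

def lookup(cumulative, u):
    """Return the outcome whose interval [start, end) contains u. O(log n)."""
    i = bisect.bisect_right(cumulative, u)
    # Guard against u landing exactly on the last boundary due to rounding.
    return min(i, len(cumulative) - 1)

def roll(cumulative, rng=random):
    """Draw u uniformly over the covered range and look it up."""
    return lookup(cumulative, rng.random() * cumulative[-1])

if __name__ == "__main__":
    c = build_cumulative([0.2, 0.5, 0.3])
    print([roll(c) for _ in range(10)])
```

Scaling by cumulative[-1] means the weights do not have to sum to exactly 1.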


I'm thinking of granulating your table.

Instead of having a table with the cumulative probability for each die value, you could create an integer array of length xN, where N is the number of die values and x is ideally a large number, to increase the accuracy of the probabilities.

Populate this array using the index (normalized by xN) as the cumulative value and, in each 'slot' in the array, store the would-be dice roll if this index comes up.

Maybe this is easier to explain with an example:

Using a three-sided die: P(1) = 0.2, P(2) = 0.5, P(3) = 0.3

Create an array; in this case I will choose a simple length, say 10 (that is, x = 3.33333).

arr[0] = 1,
arr[1] = 1,
arr[2] = 2,
arr[3] = 2,
arr[4] = 2,
arr[5] = 2,
arr[6] = 2,
arr[7] = 3,
arr[8] = 3,
arr[9] = 3

Then, to get a roll, just generate a random integer from 0 to 9 and access that index.

This method might lose accuracy, but increase x and accuracy will be sufficient.
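As a rough sketch of how such a table could be filled, the following rounds each cumulative probability to the nearest slot boundary. The names build_table and roll are invented for this example, and the rounding rule is just one reasonable choice.

```python
import random

def build_table(p, size):
    """Fill a table of `size` slots so outcome k gets about p[k] * size slots."""
    table = []
    cumulative = 0.0
    for outcome, pk in enumerate(p, start=1):   # outcomes are 1, 2, 3, ...
        cumulative += pk
        target = round(cumulative * size)       # slot index where this outcome ends
        table.extend([outcome] * (target - len(table)))
    return table

def roll(table, rng=random):
    """Pick a uniformly random slot; O(1) per roll."""
    return table[rng.randrange(len(table))]

if __name__ == "__main__":
    table = build_table([0.2, 0.5, 0.3], 10)
    print(table)
    print([roll(table) for _ in range(10)])
```

With these probabilities and size 10 the table comes out exact; for probabilities that are not multiples of 1/size, each outcome's share is off by up to about one slot, which is the accuracy loss mentioned above.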