What are efficient data structures and algorithms for simulating loaded dice?

You are looking for the alias method, which draws from a fixed discrete probability distribution in O(1) time per sample (assuming you can access entries in an array of length n in constant time) after a one-time O(n) set-up. You can find it documented in chapter 3 of "Non-Uniform Random Variate Generation" by Luc Devroye.

The idea is to take your array of probabilities p_k and produce three new n-element arrays, q_k, a_k, and b_k. Each q_k is a probability between 0 and 1, and each a_k and b_k is an integer between 1 and n.

To generate a random outcome between 1 and n, draw two uniform random numbers, r and s, in [0, 1). Let i = floor(r*n)+1. If q_i < s, return a_i; otherwise return b_i. All the work in the alias method is in figuring out how to produce q_k, a_k and b_k.
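
The answer leaves the construction of the tables open. As a sketch, here is one well-known construction, Vose's variant of the alias method, in Python. It is a swapped-in technique, not necessarily the construction Devroye describes, and it uses slightly different bookkeeping than the q/a/b arrays above: indices are 0-based, and each column i stores one probability prob[i] and one alias[i], returning i itself when s < prob[i] and alias[i] otherwise. The function names are illustrative, and the input is assumed to be non-negative probabilities summing to 1.

```python
import random

def build_alias_table(probs):
    """Vose's alias-method setup in O(n); probs are non-negative and sum to 1."""
    n = len(probs)
    prob = [1.0] * n          # chance that column i returns i itself
    alias = list(range(n))    # outcome column i returns otherwise
    scaled = [p * n for p in probs]
    small = [i for i, x in enumerate(scaled) if x < 1.0]
    large = [i for i, x in enumerate(scaled) if x >= 1.0]
    while small and large:
        s = small.pop()
        g = large.pop()
        # Column s is topped up to height 1 with mass taken from outcome g.
        prob[s] = scaled[s]
        alias[s] = g
        scaled[g] -= 1.0 - scaled[s]
        (small if scaled[g] < 1.0 else large).append(g)
    # Indices left over are full columns up to rounding error, so they keep
    # prob 1.0 and alias to themselves.
    return prob, alias

def alias_sample(prob, alias, rng=random):
    """O(1) draw: pick a column uniformly, then flip one biased coin."""
    i = rng.randrange(len(prob))
    return i if rng.random() < prob[i] else alias[i]
```

Build the tables once, for example with build_alias_table([0.2, 0.5, 0.3]), then call alias_sample as often as needed; each call costs one randrange and one random(), regardless of n.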

Use a balanced binary search tree (or binary search over a sorted array of cumulative probabilities) to get O(log n) time per roll. Create one node for each die result, keyed by the interval of random seeds that triggers that result.

def get_result(node, seed):
    # node.interval is the half-open range [start, end) of seeds that map to node.result
    if seed < node.interval.start:
        return get_result(node.left_child, seed)
    elif seed < node.interval.end:
        # start <= seed < end
        return node.result
    else:
        return get_result(node.right_child, seed)

The good thing about this solution is that it is very simple to implement yet still has good complexity.
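
The "binary search in an array" variant can be sketched in Python with the standard bisect module. This is a minimal illustration with hypothetical function names: entry k of the cumulative list is the exclusive upper end of the interval for face k + 1, and the seed is scaled by the total, so the weights do not have to be normalized.

```python
import bisect
import itertools
import random

def build_cumulative(probs):
    """Running totals; cumulative[k] is the exclusive end of face k+1's interval."""
    return list(itertools.accumulate(probs))

def roll_by_bisect(cumulative, rng=random):
    """O(log n) roll: find the interval with start <= seed < end."""
    seed = rng.random() * cumulative[-1]   # uniform in [0, total)
    # bisect_right returns the first index whose running total exceeds seed,
    # so zero-probability faces (empty intervals) are never chosen.
    k = bisect.bisect_right(cumulative, seed)
    # Guard against floating-point rounding pushing seed up to the total.
    return min(k, len(cumulative) - 1) + 1   # faces numbered from 1
```

A sorted array gives the same O(log n) bound as a balanced tree with less code; a tree mainly pays off if the probabilities change between rolls.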

I would suggest granulating (discretizing) your table.

Instead of having a table with the cumulative probability for each die value, you could create an integer array of length xN, where N is the number of die values and x is ideally a large number to increase the accuracy of the probabilities.

Populate this array by treating each index (divided by xN) as a cumulative probability, and in each slot store the die roll that this cumulative value falls into.

This is easier to explain with an example:

Using a three-sided die: P(1) = 0.2, P(2) = 0.5, P(3) = 0.3

Create an array; here I choose a simple length of 10 (that is, x = 10/3, about 3.33333).

arr[0] = 1,
arr[1] = 1,
arr[2] = 2,
arr[3] = 2,
arr[4] = 2,
arr[5] = 2,
arr[6] = 2,
arr[7] = 3,
arr[8] = 3,
arr[9] = 3

Then, to roll the die, pick a uniformly random integer from 0 to 9 (that is, in [0, 10)) and return the value stored at that index.

This method loses accuracy whenever a probability is not a multiple of 1/(xN), but increasing x makes the error as small as you need, at the cost of a larger array.
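
A minimal Python sketch of the table approach, assuming the probabilities sum to 1. The slot-filling rule is my own assumption, since the answer does not specify how to round: slot j stores the face whose cumulative interval contains the slot's midpoint (j + 0.5)/size, which reproduces the ten-slot array above for P = 0.2, 0.5, 0.3. The function names are hypothetical.

```python
import random

def build_lookup_table(probs, size):
    """Discretize probs (summing to 1) into `size` slots, i.e. x*N in the text.

    Slot j stores the face whose cumulative interval contains the slot's
    midpoint (j + 0.5) / size; faces are numbered from 1.
    """
    table = []
    cumulative = 0.0   # lower end of the current face's interval
    face = 0           # 0-based index of the current face
    for j in range(size):
        midpoint = (j + 0.5) / size
        # Advance past faces whose interval ends at or before this midpoint;
        # the last face absorbs any remainder.
        while face < len(probs) - 1 and midpoint >= cumulative + probs[face]:
            cumulative += probs[face]
            face += 1
        table.append(face + 1)
    return table

def roll_from_table(table, rng=random):
    """O(1) roll: one uniform index in 0..len(table)-1, one array access."""
    return table[rng.randrange(len(table))]
```

With this midpoint rule each face's probability is off by less than 1/size, which is the accuracy-versus-memory trade-off described above; setup takes O(size + N) time and each roll is a single randrange call plus one array access.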
