Weighted random selection from array

I have found this article to be the most useful at understanding this problem fully. This stackoverflow question may also be what you're looking for.


I believe the optimal solution is to use the Alias Method (wikipedia). It requires O(n) time to initialize, O(1) time to make a selection, and O(n) memory.

Here is the algorithm for generating the result of rolling a weighted n-sided die (from here it is trivial to select an element from a length-n array) as take from this article. The author assumes you have functions for rolling a fair die (floor(random() * n)) and flipping a biased coin (random() < p).

Algorithm: Vose's Alias Method

Initialization:

  1. Create arrays Alias and Prob, each of size n.
  2. Create two worklists, Small and Large.
  3. Multiply each probability by n.
  4. For each scaled probability pi:
    1. If pi < 1, add i to Small.
    2. Otherwise (pi ≥ 1), add i to Large.
  5. While Small and Large are not empty: (Large might be emptied first)
    1. Remove the first element from Small; call it l.
    2. Remove the first element from Large; call it g.
    3. Set Prob[l]=pl.
    4. Set Alias[l]=g.
    5. Set pg := (pg+pl)−1. (This is a more numerically stable option.)
    6. If pg<1, add g to Small.
    7. Otherwise (pg ≥ 1), add g to Large.
  6. While Large is not empty:
    1. Remove the first element from Large; call it g.
    2. Set Prob[g] = 1.
  7. While Small is not empty: This is only possible due to numerical instability.
    1. Remove the first element from Small; call it l.
    2. Set Prob[l] = 1.

Generation:

  1. Generate a fair die roll from an n-sided die; call the side i.
  2. Flip a biased coin that comes up heads with probability Prob[i].
  3. If the coin comes up "heads," return i.
  4. Otherwise, return Alias[i].

Compute the discrete cumulative density function (CDF) of your list -- or in simple terms the array of cumulative sums of the weights. Then generate a random number in the range between 0 and the sum of all weights (might be 1 in your case), do a binary search to find this random number in your discrete CDF array and get the value corresponding to this entry -- this is your weighted random number.


The algorithm is straight forward

rand_no = rand(0,1)
for each element in array 
     if(rand_num < element.probablity)
          select and break
     rand_num = rand_num - element.probability