random number generator test

You should be aware that you cannot guarantee the random number generator is working properly. Note that even a perfect uniform distribution in range [1,10] - there is a 10^-10 chance of getting 10 times 10 in a random sampling of 10 numbers.

Is it likely? Of course not.

So - what can we do?

We can statistically prove that the combination (10,10,....,10) is unlikely if the random number generator is indeed uniformly distributed. This concept is called Hypothesis testing. With this approach we can say "with certainty level of x% - we can reject the hypothesis that the data is taken from a uniform distribution".

A common way to do it, is using Pearson's Chi-Squared test, The idea is similar to yours - you fill in a table - check what is the observed (generated) number of numbers for each cell, and what is the expected number of numbers for each cell under the null hypothesis (in your case, the expected is k/M - where M is the range's size, and k is the total number of numbers taken).
You then do some manipulation on the data (see the wikipedia article for more info what this manipulation is exactly) - and get a number (the test statistic). You then check if this number is likely to be taken from a Chi-Square Distribution. If it is - you cannot reject the null hypothesis, if it is not - you can be certain with x% certainty that the data is not taken from a uniform random generator.

EDIT: example:
You have a cube, and you want to check if it is "fair" (uniformly distributed in [1,6]). Throw it 200 times (for example) and create the following table:

number:                1       2         3         4          5          6
empirical occurances: 37       41        30        27         32         33
expected occurances: 33.3      33.3      33.3      33.3       33.3       33.3

Now, according to Pearson's test, the statistic is:

X = ((37-33.3)^2)/33.3 + ((41-33.3)^2)/33.3 + ... + ((33-33.3)^2)/33.3 
X = (18.49 + 59.29 + 10.89 + 39.69 + 1.69 + 0.09) / 33.3
X = 3.9

For a random C~ChiSquare(5), the probability of being higher then 3.9 is ~0.45 (which is not improbable)¹.

So we cannot reject the null hypothesis, and we can conclude that the data is probably uniformly distributed in [1,6]

(1) We usually reject the null hypothesis if the value is smaller then 0.05, but this is very case dependent.

random number generator test

Tags:

Algorithm

Unit Testing

Testing

Related

Recent Posts