Generating random boolean
Here is a C++11 function template generating boolean outcomes (binomial distribution) with a specified probability (default 0.5 is for uniform):
#include <random>
template <typename Prob = double>
bool binomial_trial(const Prob p = 0.5) {
static auto dev = std::random_device();
static auto gen = std::mt19937{dev()};
static auto dist = std::uniform_real_distribution<Prob>(0,1);
return (dist(gen) < p);
}
bool randomBool() {
return 0 + (rand() % (1 - 0 + 1)) == 1;
}
This is perhaps the worst possible way to convert the output of rand()
into a boolean. On many implementations, the lower order bits are much less random than the upper order bits.
Ideally, you'd use something else entirely, but if you must use rand()
, try:
bool randomBool() {
return rand() > (RAND_MAX / 2);
}
The lower-order bits of pseudo-random number generators tend to provide less randomness. This is especially true for the builtin rand()
function, which is generally implemented as an LCG. The best way to generate random bool
is to use the MSB bit. This is actually a standard Bernoulli distribution with probability 1/2
.
#include <cmath>
#include <cstdlib>
inline bool random_bool()
{
static const int shift = static_cast<int>(std::log2(RAND_MAX));
return (rand() >> shift) & 1;
}
The STL in C++11 has build in random number generation methods that are superior to rand()
. You can simulate a random boolean through a random integer that is 0 or 1:
#include <iostream>
#include <random>
int main(int argc, char *argv[]) {
auto gen = std::bind(std::uniform_int_distribution<>(0,1),std::default_random_engine());
const unsigned int N = 100;
unsigned int numTrue = 0;
unsigned int numFalse = 0;
for (int i = 0; i < 100; ++i) {
bool b = gen();
if (b) ++ numTrue;
else ++numFalse;
}
std::cout << numTrue << " TRUE, " << numFalse << " FALSE" << std::endl;
}
You can find more details on this library in standard C++ references. For instance, if you wanted something other than a 50/50 ratio of "true" and "false" values, you could create a random floating point number between 0 and 1 and call values less than some threshold z true, otherwise false.
Why you see long streaks, I think
I haven't addressed why you get 30 values of "true" or "false" in a row with your code. Though rand()
should no longer be used, and you seem to have some unnecessary addition and subtraction of ones and zeroes in your code, there shouldn't be such a problem. However, I realize now the text in your question is ambiguous. If you are running and exiting your program 30 times in a row, you should expect to see repeated values -- even with my code. Most random number generators are really pseudorandom number generators. Each time you run the program, they will produce the same sequence of random numbers; this is important for consistency of results. However, while the program is running (e.g. putting your randomBool()
in a loop), you should not see streaks of such a length, as they would be highly improbable.
Improbability of long streaks
I was surprised to receive comments disagreeing with my assertion that a streak of 30 "true" or "false" random booleans is improbable (when true or false are equally likely). I realize that a common misunderstanding of probability is that "luck" tries to even things out, and so that if a coin toss has come up heads a couple times in a row, then the universe will try to correct this and make a tails more likely. Because of this misunderstanding, people underestimate the likelihood of getting streaks of all heads and all tails, and I think the motivations of the comments on this answer and the main question were to correct this common mistake.
However, there is a real reason that long streaks (especially as long as 30) are increasingly unlikely. Using the language of random unbiased coin tosses, each IID (independent and identically distributed) coin toss has only 50% chance of being the same as the previous. Thus, the probability of a long streak decreases exponentially with the length of the streak. For a streak of length L, the probability of a streak of all heads is 1 in 2^L; the probability of a streak of either type is 2 in 2^L or 1 in 2^(L-1). Here is some code to demonstrate:
#include <iostream>
#include <random>
#include <map>
bool randomBool() {
static auto gen = std::bind(std::uniform_int_distribution<>(0,1),std::default_random_engine());
return gen();
}
int main(int argc, char *argv[]) {
const unsigned int N = 1e8;
std::map<unsigned int,unsigned int> histogram;
bool current = randomBool();
unsigned int currentLength = 1;
for (int i = 0; i < N; ++i) {
bool b = randomBool();
if (b == current) {
++currentLength;
} else {
auto it = histogram.find(currentLength);
if (it != histogram.end())
it->second += 1;
else
histogram.insert(std::make_pair(currentLength,1));
currentLength = 1;
}
current = b;
}
for (auto pair : histogram)
std::cout << "STREAK LENGTH " << pair.first << " OCCURS " << pair.second << " TIMES" << std::endl;
}
The output histogram is:
STREAK LENGTH 1 OCCURS 25011106 TIMES
STREAK LENGTH 2 OCCURS 12503578 TIMES
STREAK LENGTH 3 OCCURS 6249056 TIMES
STREAK LENGTH 4 OCCURS 3125508 TIMES
STREAK LENGTH 5 OCCURS 1560812 TIMES
STREAK LENGTH 6 OCCURS 781206 TIMES
STREAK LENGTH 7 OCCURS 390143 TIMES
STREAK LENGTH 8 OCCURS 194748 TIMES
STREAK LENGTH 9 OCCURS 97816 TIMES
STREAK LENGTH 10 OCCURS 48685 TIMES
STREAK LENGTH 11 OCCURS 24327 TIMES
STREAK LENGTH 12 OCCURS 12176 TIMES
STREAK LENGTH 13 OCCURS 6149 TIMES
STREAK LENGTH 14 OCCURS 3028 TIMES
STREAK LENGTH 15 OCCURS 1489 TIMES
STREAK LENGTH 16 OCCURS 811 TIMES
STREAK LENGTH 17 OCCURS 383 TIMES
STREAK LENGTH 18 OCCURS 193 TIMES
STREAK LENGTH 19 OCCURS 104 TIMES
STREAK LENGTH 20 OCCURS 43 TIMES
STREAK LENGTH 21 OCCURS 20 TIMES
STREAK LENGTH 22 OCCURS 14 TIMES
STREAK LENGTH 23 OCCURS 4 TIMES
STREAK LENGTH 24 OCCURS 3 TIMES
It is difficult to compute the expected number of streaks of length L in a number of flips N, since there are many overlapping stretches of length L where such a streak could exist. However, note that this histogram follows a roughly exponential distribution, with each entry approximately half the preceding entry.
The maximum streak is 24 [note: a bug in the previous version counted this as 23]. The probability of a streak of this length in any independent string of 24 tosses is 1 in 2^(24-1), or about 1 in 8 million. Since in 1e8 tosses there are about 1e8/24 ~ 4.3 million such separate stretches, we expect a small number of such streaks, so this seems about right [with my above caveat that calculating the exact expectation is difficult]. A streak of length 30, meanwhile, has a probability of 1 in 537 million in any independent stretch of 30 flips, and is much less likely even than a streak of length 24.