Why is the use of rand() considered bad?
None of the answers here explains the real reason why rand() is bad.

rand() is a pseudo-random number generator (PRNG), but that by itself doesn't make it bad. Actually, there are very good PRNGs, which are statistically hard or impossible to distinguish from true random numbers.

rand() is completely implementation-defined, but historically it is implemented as a Linear Congruential Generator (LCG), which is usually a fast but notoriously bad class of PRNGs. The lower bits of these generators have much lower statistical randomness than the higher bits, and the generated numbers can produce visible lattice and/or planar structures (the best-known example of that is the famous RANDU PRNG). Some implementations try to reduce the lower-bits problem by shifting the bits right by a predefined amount; however, this kind of solution also reduces the range of the output.
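To see the weak low bits in action, here is a minimal sketch of a power-of-two-modulus LCG. The multiplier and increment are the ones from the sample implementation shown in the C standard's description of rand(); a real libc may use something entirely different, and the state is truncated to 32 bits here purely for illustration.

```cpp
#include <cstdint>
#include <cstdio>

// Power-of-two-modulus LCG using the multiplier/increment from the example
// implementation in the C standard's description of rand().  This is only an
// illustration; a real libc may use a different generator entirely.
std::uint32_t lcg_state = 1;

std::uint32_t lcg_next() {
    lcg_state = lcg_state * 1103515245u + 12345u;   // wraps mod 2^32
    return lcg_state;
}

int main() {
    // The lowest bit of a power-of-two-modulus LCG simply alternates 0,1,0,1,...
    std::printf("lowest bit:  ");
    for (int i = 0; i < 16; ++i)
        std::printf("%u ", static_cast<unsigned>(lcg_next() & 1u));

    // The low 3 bits cycle with period 8 -- far too regular to look random.
    std::printf("\nlow 3 bits:  ");
    lcg_state = 1;
    for (int i = 0; i < 16; ++i)
        std::printf("%u ", static_cast<unsigned>(lcg_next() & 7u));
    std::printf("\n");
    return 0;
}
```

Notably, the standard's own sample implementation discards the low 16 bits (it divides the state by 65536 before taking the result), which is exactly the "shift the bits right" mitigation mentioned above.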
Still, there are notable examples of excellent LCGs, like L'Ecuyer's 64- and 128-bit multiplicative linear congruential generators presented in Tables of Linear Congruential Generators of Different Sizes and Good Lattice Structure, Pierre L'Ecuyer, 1999.
The general rule of thumb is: don't trust rand(); use a pseudo-random number generator that fits your needs and usage requirements.
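As one illustration of what a hand-picked generator might look like (this is an arbitrary choice, not a recommendation for every use case), here is Marsaglia's xorshift64: tiny, fast, statistically far better than a typical LCG, but still not suitable for anything security-related.

```cpp
#include <cstdint>
#include <cstdio>

// Marsaglia's xorshift64 (shift constants 13, 7, 17).  The seed must be
// nonzero, because 0 is a fixed point of the transformation.
struct Xorshift64 {
    std::uint64_t state;

    explicit Xorshift64(std::uint64_t seed)
        : state(seed ? seed : 0x9E3779B97F4A7C15ull) {}  // arbitrary nonzero fallback

    std::uint64_t next() {
        std::uint64_t x = state;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        return state = x;
    }
};

int main() {
    Xorshift64 rng(42);
    for (int i = 0; i < 5; ++i)
        std::printf("%llu\n", static_cast<unsigned long long>(rng.next()));
    return 0;
}
```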
There are two parts to this story.
First, rand is a pseudorandom number generator. This means it depends on a seed, and for a given seed it will always give the same sequence (assuming the same implementation). This makes it unsuitable for certain applications where security is of great concern. But this is not specific to rand; it's an issue with any pseudo-random generator. And there are most certainly a lot of classes of problems for which a pseudo-random generator is acceptable. A true random generator has its own issues (efficiency, implementation, entropy), so for problems that are not security-related a pseudo-random generator is most often used.
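A minimal demonstration of that determinism (the actual numbers printed are implementation-specific, but within one implementation the two runs are identical):

```cpp
#include <cstdio>
#include <cstdlib>

// With the same seed, rand() always produces the same sequence on a given
// implementation -- useful for reproducible simulations, unacceptable for
// anything security-related.
int main() {
    for (int run = 0; run < 2; ++run) {
        std::srand(12345);                    // same seed on every run
        std::printf("run %d:", run);
        for (int i = 0; i < 5; ++i)
            std::printf(" %d", std::rand());  // identical numbers both times
        std::printf("\n");
    }
    return 0;
}
```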
So you analyzed your problem and concluded that a pseudo-random generator is the solution. And here we arrive at the real troubles with the C random library (which includes rand and srand) that are specific to it and make it obsolete (a.k.a. the reasons you should never use rand and the C random library).
One issue is that it has a global state (set by srand). This makes it impossible to use multiple independent random engines at the same time, and it also greatly complicates multithreaded tasks.
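For contrast, here is a sketch of how engines from C++'s <random> header (recommended at the end of this answer) side-step the global-state problem: each engine is an ordinary object, so every thread or subsystem can own its own independent generator. The seeds and the choice of std::mt19937 below are arbitrary.

```cpp
#include <random>
#include <thread>
#include <vector>
#include <cstdio>

// rand() has one hidden global state, so all threads and all "streams" of
// randomness interfere with each other.  A <random> engine is a plain object,
// so each thread can simply own a private one.
void worker(int id, unsigned seed) {
    std::mt19937 engine(seed);                       // private, per-thread state
    std::uniform_int_distribution<int> dist(1, 6);
    std::printf("thread %d rolled %d\n", id, dist(engine));
}

int main() {
    std::vector<std::thread> threads;
    for (int i = 0; i < 4; ++i)
        threads.emplace_back(worker, i, 1000u + i);  // deterministic per-thread seeds
    for (auto& t : threads)
        t.join();
    return 0;
}
```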
The most visible problem is that it lacks a distribution engine: rand gives you a number in the interval [0, RAND_MAX]. It is uniform in this interval, which means that each number in this interval has the same probability of appearing. But most often you need a random number in a specific interval, let's say [0, 1017]. A commonly used (and naive) formula is rand() % 1018. The issue with this is that unless RAND_MAX + 1 is an exact multiple of 1018 you won't get a uniform distribution.
Another issue is the Quality of Implementation of rand. There are other answers here detailing this better than I could, so please read them.
In modern C++ you should definitely use the library from <random>, which comes with multiple well-defined random engines and various distributions for integer and floating-point types.
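A minimal sketch of the <random> replacement for the [0, 1017] example above (mt19937 and random_device are just one reasonable choice of engine and seed source):

```cpp
#include <random>
#include <cstdio>

// The modern replacement for rand() % 1018: an explicit engine object plus a
// distribution that produces an unbiased result over exactly [0, 1017].
int main() {
    std::random_device rd;                              // nondeterministic seed source
    std::mt19937 engine(rd());                          // well-defined engine with local state
    std::uniform_int_distribution<int> dist(0, 1017);   // closed range [0, 1017], no modulo bias

    for (int i = 0; i < 5; ++i)
        std::printf("%d\n", dist(engine));
    return 0;
}
```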
What is bad about rand/srand is that rand:

- uses an unspecified algorithm for the sequence of numbers it generates, yet
- allows that algorithm to be initialized with srand for repeatable "randomness".
These two points, taken together, hamper the ability of implementations to improve on rand's implementation (e.g., to use a cryptographic random number generator [RNG] or an otherwise "better" algorithm for producing pseudorandom numbers). For example, JavaScript's Math.random and FreeBSD's arc4random don't have this problem, since they don't allow applications to seed them for repeatable "randomness"; it's for exactly this reason that the V8 JavaScript engine was able to change its Math.random implementation to a variant of xorshift128+ while preserving backward compatibility. (On the other hand, letting applications supply additional data to supplement "randomness", as in BCryptGenRandom, is less problematic; even so, however, this is generally seen only in cryptographic RNGs.)
Also:
- The fact that the algorithm and the seeding procedure for rand and srand are unspecified means that even reproducible "randomness" is not guaranteed between rand/srand implementations, between versions of the same standard library, between operating systems, etc.
- If srand is not called before rand is, rand behaves as though srand(1) were first called. In practice, this means that rand can only be implemented as a pseudorandom number generator (PRNG) rather than as a nondeterministic RNG, and that rand's PRNG algorithm can't differ in a given implementation depending on whether the application calls srand or not (see the sketch just after this list).
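A small check of the srand(1) equivalence described in the last bullet; on a conforming implementation every line should print "(same)":

```cpp
#include <cstdio>
#include <cstdlib>

// The C standard requires that rand() used without any srand() call behave as
// if srand(1) had been called first, so the two sequences below must match.
int main() {
    int before[5];
    for (int i = 0; i < 5; ++i)
        before[i] = std::rand();      // no srand() call yet

    std::srand(1);                    // explicitly seed with 1
    for (int i = 0; i < 5; ++i) {
        int after = std::rand();
        std::printf("%d %d %s\n", before[i], after,
                    before[i] == after ? "(same)" : "(different!)");
    }
    return 0;
}
```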
EDIT (Jul. 8, 2020):
There is one more important thing that's bad about rand and srand. Nothing in the C standard for these functions specifies a particular distribution that the "pseudo-random numbers" delivered by rand have to follow, not even the uniform distribution or a distribution that approximates it. Contrast this with C++'s uniform_int_distribution and uniform_real_distribution classes, as well as the specific pseudorandom generator algorithms specified by C++, such as linear_congruential_engine and mt19937.
EDIT (begun Dec. 12, 2020):
Yet another bad thing about rand and srand: srand takes a seed that can only be as big as an unsigned. unsigned must be at least 16 bits, and in most mainstream C implementations unsigned is either 16 or 32 bits depending on the implementation's data model (notably not 64 bits even if the C implementation adopts a 64-bit data model). Thus, no more than 2^N different sequences of numbers can be selected this way (where N is the number of bits in an unsigned), even if the underlying algorithm implemented by rand can produce many more different sequences than that (say, 2^128 or even 2^19937 as in C++'s mt19937).