Can Mathematica's random number generation be improved?

Update 06/10/2016

Now I'm using Mathematica 11.0, things seem to have improved in some cases:

list1 = RandomInteger[{1, 10}, {10^5}];
list2 = RandomReal[{0, 1}, {10^5}];
MapThread[
    RandomVariate[BinomialDistribution[#1, #2]] &, {list1, 
     list2}]; // AbsoluteTiming

(* 38.59 seconds on MMA10 *)
(*  2.94 seconds on MMA11, same machine *)

Still not as fast as my compiled version (0.02 seconds!), but a definite improvement. Looking at the other distributions:

list = RandomReal[{0, 100}, {10^5}];
RandomVariate[PoissonDistribution[#]] & /@ list; // AbsoluteTiming

(* 13.16 seconds on MMA10 *)
(*  3.24 seconds on MMA11, same machine *)

list = RandomReal[{0, 100}, {10^6}];
RandomVariate[ExponentialDistribution[#]] & /@ list; // AbsoluteTiming

(*  5.06 seconds on MMA10 *)
(* 12.83 seconds on MMA11, same machine *)

list = RandomReal[{0, 100}, {10^6}];
RandomVariate[ChiSquareDistribution[#]] & /@ list; // AbsoluteTiming

(* 11.10 seconds on MMA10 *)
(*  6.32 seconds on MMA11, same machine *)

Oh dear, not good news for ExponentialDistribution[]!


Update 20/02/2015

I've now adapted my code to work with a binomial distribution, since that seemed to be the most problematic case for the OP. The relevant C++ code is at the bottom of this answer.

Needs["CCompilerDriver`"]

BinomialVariateLib = 
  CreateLibrary[{ToString[NotebookDirectory[]] <> 
     "binomialvariate.cpp"}, "BinomialVariateLib", "Debug" -> False, 
   "CompileOptions" -> "-std=c++11 -O3"];
BinomialVariate = 
  LibraryFunctionLoad[BinomialVariateLib, 
   "BinomialVariate", {{Real, 1}, {Real, 1}}, {Real, 1}];

list1 = RandomInteger[{1, 10}, {10^5}];
list2 = RandomReal[{0, 1}, {10^5}];

l1 = BinomialVariate[list1, list2]; // AbsoluteTiming
l2 = MapThread[
    RandomVariate[BinomialDistribution[#1, #2]] &, {list1, 
     list2}]; // AbsoluteTiming

(* 0.020 seconds *)
(* 38.59 seconds *)

I make that a speed-up of 2000x, with no multi-thread/parallel computing invoked. Trying with a larger list of $10^{6}$ as the OP does in another answer, my method takes 0.16 seconds (I didn't bother timing the MMA version, apparently it takes minutes). This appears to be a similar performance to the R code given by the OP in a separate answer.

It's even better on Windows 8.1 with the Visual Studio compiler, I get a 2700x speed-up. Perhaps the Intel compiler would be even quicker. Unfortunately the OP couldn't get my code to work on a Mac. I guess it has something to do with the use of std::random and std::chrono, but these could be overcome by using a non-C++11 solution. Alternatively, I believe it is possible to install GCC on Mac OS? The above example is compiled with GCC.

Note: @ciao's answer using a parametric mixture distribution is very quick, even more so than my unoptimized C++ below. It takes about 0.13 seconds on my machine to generate the distribution and then draw $10^{6}$ samples from it. At a guess, putting together a similar method in C++ will be faster still, but perhaps not worth the effort. It may also not be appropriate for the OP's case if the distribution needs to be changed each time.


Original answer

One example where drawing random numbers from distributions with different parameters is for adding Poisson noise to an image, where the level of noise depends on the underlying signal.

I encountered this very problem, and because only certain distributions can be compiled in Mathematica, I asked a question about it: Speed up adding Poisson noise to an image.

Eventually I came up with a solution using C++ to deal with the distributions, and pass the data back to MMA using LibraryLink. Adapting that solution, the following improvement is possible for the Poisson distribution. The relevant C++ code is at the bottom of this answer.

Needs["CCompilerDriver`"]

PoissonVariateLib = 
  CreateLibrary[{ToString[NotebookDirectory[]] <> 
     "poissonvariate.cpp"}, "PoissonVariateLib", "Debug" -> False, 
   "CompileOptions" -> "-std=c++11 -O3"];
PoissonVariate = 
  LibraryFunctionLoad[PoissonVariateLib, 
   "PoissonVariate", {{Real, 1}}, {Real, 1}];

list = RandomReal[{0, 100}, {10^5}];

PoissonVariate@list; // AbsoluteTiming
RandomVariate[PoissonDistribution[#]] & /@ list; // AbsoluteTiming

(* 0.08 seconds *)
(* 13.16 seconds *)

This is 150x speed-up over MMA! I've not looked into optimizing the C++ at all, and nor am I making much use of multiple CPU cores by parallelizing any of my code at either the C++ or MMA level.

However, this answer should demonstrate that if time is a critical factor for you, then LibraryLink or MathLink to C/C++ is a pretty good option to consider.

The same process can be done for other distributions in your example, by using what's available in std::random in C++. First the exponential distribution (remember to adapt the C++ code!):

(* Trying it with a bigger list this time *)
list = RandomReal[{0, 100}, {10^6}];

ExponentialVariate@list; // AbsoluteTiming
RandomVariate[ExponentialDistribution[#]] & /@ list; // AbsoluteTiming

(* 0.22 seconds *)
(* 5.06 seconds *)

Another impressive speed-up. Changing the code again for the Chi-Squared distribution, we find another good speed-up:

(* Trying it with a bigger list this time *)
list = RandomReal[{0, 100}, {10^6}];

ChiSquaredVariate@list; // AbsoluteTiming
RandomVariate[ChiSquareDistribution[#]] & /@ list; // AbsoluteTiming

(* 0.11 seconds *)
(* 11.1 seconds *)

C++ code

This code is for the file binomialvariate.cpp, as per my updated answer. The code for the Poisson, exponential and chi-squared distributions (and indeed many other distributions) should be straightforward to implement from this, or from my answer here.

#include "WolframLibrary.h"
#include "stdio.h"
#include "stdlib.h"
#include <random>
#include <chrono>

EXTERN_C DLLEXPORT mint WolframLibrary_getVersion(){return WolframLibraryVersion;}
EXTERN_C DLLEXPORT int WolframLibrary_initialize( WolframLibraryData libData) {return 0;}
EXTERN_C DLLEXPORT void WolframLibrary_uninitialize( WolframLibraryData libData) {}

EXTERN_C DLLEXPORT int BinomialVariate(WolframLibraryData libData, mint Argc, MArgument *Args, MArgument Res){

    int err; // error code

    MTensor m1; // input tensor 2
    MTensor m2; // input tensor 2
    MTensor m3; // output tensor

    mint const* dims1; // dimensions of the tensor

    double* data1; // actual data of the input tensor
    double* data2;
    double* data3; // data for the output tensor

    mint i; // bean counters

    m1 = MArgument_getMTensor(Args[0]);
    dims1 = libData->MTensor_getDimensions(m1);
    m2 = MArgument_getMTensor(Args[1]);
    err = libData->MTensor_new(MType_Real, 1, dims1, &m3);
    data1 = libData->MTensor_getRealData(m1);
    data2 = libData->MTensor_getRealData(m2);
    data3 = libData->MTensor_getRealData(m3);

    unsigned seed = std::chrono::system_clock::now().time_since_epoch().count();
    std::mt19937 generator (seed); // RNG

    for(i=0;i<dims1[0];i++) {
        std::binomial_distribution<int> distribution(data1[i], data2[i]); 
        data3[i] = distribution(generator); 
    }

    MArgument_setMTensor(Res, m3);
    return LIBRARY_NO_ERROR;
}

This is a poor use of the random functions in Mathematica. As clearly stated in the documentation, generation of variates one at a time has significant overhead, and generating them en masse has significant benefits, particularly with statistical distributions:

For statistical distributions, the speed advantage of generating many numbers at once can be even greater. In addition to the efficiency benefit inherited from the uniform number generators used, many statistical distributions also benefit from vectorized evaluation of elementary and special functions. For instance, WeibullDistribution benefits from vector evaluations of the elementary functions Power, Times, and Log.

E.g., generating 100K Weibull RV has a speed difference of over 300X.

If you need such variates generated individually, or because they cannot be generated en masse, explore other options. Using a ParameterMixture distribution can speed things dramatically when the parameters themselves are RV (as in your examples), you can build custom generators within Mathematica for specific use cases/custom distributions that can exhibit marked performance benefits, or if you have external facilities that already generate them in the form and with the speed you desire, just call them from within Mathematica.

Per Blochwave's comment, an example of the technique I use for such things. This is the binomial case:

dist = ParameterMixtureDistribution[
   BinomialDistribution[a, b], {a \[Distributed] DiscreteUniformDistribution[{1, 10}], 
                                b \[Distributed] UniformDistribution[]}];

pdf = PDF[dist, #] & /@ Range[0, 10];

variates = RandomChoice[pdf -> Range[0, 10], 10^6]; // Timing

(* {0.202801,Null} *)

That's on an old netbook. The R example on the same netbook takes ~1.6 seconds...

Similar techniques can be used for continuous derived distributions, substituting inversion (or technique of choice) for the RandomChoice.

As with any language, one must think about the tools the language makes available, and how to use them efficiently (want the exact symbolic PDF from R to do such things? Oops, out of luck...). Any pair of languages lend themselves to arbitrary efficiency drag races where one might appear to be clearly superior, until one uses the tools intelligently. One does not expect the Ferrari to be an efficient garbage scow, every software tool has its place and its own idiosyncrasies and coding techniques that can be opaque to many.


Here is my solution. The first 3 lines of code were what I needed to make this work on my Mac. It is not in the documentation and you'd need something else on a PC.

Needs["RLink`"]
SetEnvironment["DYLD_LIBRARY_PATH" -> "/Library/Frameworks/R.framework/Resources/lib"];
InstallR["RHomeLocation" -> "/Library/Frameworks/R.framework/Resources"];
REvaluate["R.version.string"]

num = 10^6;
ints = RandomInteger[{1, 10}, num];
reals = RandomReal[{0, 1}, num];
binom = REvaluate["rbinom"];
pois = REvaluate["rpois"];
binom[num, ints, reals]; // Timing
pois[num, reals]; // Timing

I gather that comparing Mathematica to C is considered invalid although I don't understand why. Is it also invalid to compare the performance of Mathematica to R on the same problem? Are all comparisons of the performance of other languages to Mathematica invalid or are there some valid ones?

In the hope that R is considered a more appropriate comparison, The following R is faster than my compiled C by a factor of 5 or 10.

num=10^6; 
ptm <- proc.time(); 
out=rbinom(num,sample(1:10,num,TRUE),runif(num)); 
proc.time()-ptm

The above line of R runs in about 150 milliseconds. The equivalent line of Mathematica takes over 10 minutes.

num = 10^6; 
MapThread[
    RandomVariate[BinomialDistribution[#1, #2]] &,  
        {RandomInteger[{1, 10}, num], RandomReal[{0, 1}, num]}
   ]; // Timing

(* 629.910404, Null *)

The fact that the lousy performance the random number generation is documented is of little help. While blochwave's solution will prove useful it will require some effort to make it work on my system. It took about 5 minutes to read the R documentation and write the code.

Here is the Octave code:

$ cat rnumcall.m    
t=cputime;
num=10^7;
n=randi([1,5],num,1);
p=rand(num,1);
out=binornd(n,p);
cputime-t

octave rnumcall.m
ans =  2.8404

or with num=10^6,

$ octave rnumcall.m    
ans =  0.29141

which is of course faster than Mathematica but now only 2000x as fast rather than R's 4000x.