How can I shorten the calculation time effectively?
This should have the same effect and runs in about 7.6 seconds.
AbsoluteTiming[
m = 10^7;
l = 4;
n = RandomSample[1 ;; m, 2 10^5];
u = Normal[SparseArray[Partition[n, 1] -> 1, {m}]];
sum = Total@ParallelTable[
Total[u[[RandomChoice[1 ;; m, 10^l]]]],
{10^(9 - l)},
Method -> "CoarsestGrained"
]
]
The important points are:
Using
RandomSample[1 ;; m, ...]
andRandomChoice[1 ;; m, ...]
. This allows us to handle also rather largem
.Conversion of the very expensive
Count
operation (which goes through the whole listn
$10^9$ times!) into a simple read operation from the (packed!) arrayu
.Chopping the vector
r
into small pieces so that it never has to be stored in RAM; if I computed correctly,r
would need more than 7 GB of RAM. Since the pseudorandom numbers in the array returned fromRandomChoice
are pairwise independent, we may just callRandomChoice[1 ;; m, 10^l]
multiple times, do the counts and sum then up in the end withTotal
.
Instead of building the long list r
of random numbers, you could sample directly from the distribution of results: replace the third line of your code with
RandomVariate[BinomialDistribution[Length[r], Length[n]/Length[all]]]
to get a result instantaneously. Or is this cheating?