Efficient way to generate and use millions of random numbers in Python

Python's built-in random module (e.g. random.random(), random.randint(); several distributions are also available, and you probably want the Gaussian one) produces about 300K samples/s.
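To check that figure on your own machine, a rough measurement with the standard timeit module might look like the sketch below. The helper name samples_per_second and the sample count are illustrative choices, and the result will vary with hardware and Python version.

```python
import random
import timeit


def samples_per_second(func, n=200_000):
    """Call func() n times and return the observed calls per second."""
    elapsed = timeit.timeit(func, number=n)
    return n / elapsed


print("random.random():      %.0f samples/s" % samples_per_second(random.random))
print("random.randint(1, 6): %.0f samples/s"
      % samples_per_second(lambda: random.randint(1, 6)))
print("random.gauss(0, 1):   %.0f samples/s"
      % samples_per_second(lambda: random.gauss(0.0, 1.0)))
```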

Since you are doing numerical computation, you probably use numpy anyway. It offers a wider choice of distributions and better performance if you generate random numbers one array at a time instead of one number at a time: about 60K arrays/s * 1024 (array length) is ~60M samples/s.
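As a sketch of the array-at-a-time approach, the snippet below draws 1024 samples per call. It assumes a NumPy version that provides numpy.random.default_rng (1.17 or later); the seed, ranges, and variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(12345)  # seed chosen arbitrarily for reproducibility

uniform = rng.random(1024)              # 1024 floats in [0, 1)
gaussian = rng.normal(0.0, 1.0, 1024)   # 1024 samples from N(0, 1)
integers = rng.integers(1, 1001, 1024)  # 1024 ints in [1, 1000]; upper bound is exclusive

# Operate on the whole array at once instead of looping in Python.
print(uniform.mean(), gaussian.std(), integers.min(), integers.max())
```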

You can also read /dev/urandom on Linux and OS X. My hardware/software (an OS X laptop) manages ~10MB/s.
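One way to turn OS-provided random bytes into numbers is sketched below. It uses os.urandom, which asks the operating system for random bytes, rather than opening the device file directly; the helper name random_uint32s is hypothetical.

```python
import os
import struct


def random_uint32s(count):
    """Return `count` unsigned 32-bit integers built from OS random bytes."""
    raw = os.urandom(4 * count)                # one request for the whole batch
    return struct.unpack("<%dI" % count, raw)  # "<" fixes the size of I at 4 bytes


numbers = random_uint32s(1024)
print(len(numbers), numbers[:3])
```

Requesting the bytes in one large batch, as here, avoids paying the system-call overhead once per number.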

Surely there must be faster ways to generate random numbers en masse, e.g.:

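from Crypto.Cipher import AES
from Crypto.Util import Counter
import secrets

# Random key; in CTR mode the counter block takes the place of an IV.
aes = AES.new(secrets.token_bytes(16), AES.MODE_CTR, counter=Counter.new(128))
data = b"0" * 2 ** 20  # 1 MiB of constant plaintext; must be bytes, not str
with open("filler.bin", "wb") as f:
    while True:  # runs until interrupted (or the disk fills up)
        f.write(aes.encrypt(data))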

This generates 200MB/s on a single core of an i5-4670K.

Common ciphers like AES and Blowfish manage 112MB/s and 70MB/s respectively on my stack. Furthermore, modern processors make AES even faster, up to some 700MB/s (the link to test runs on a few hardware combinations is now broken). You could use the weaker ECB mode, provided you feed distinct inputs into it, and achieve up to 3GB/s.

Stream ciphers are better suited to the task, e.g. RC4 tops out at 300MB/s on my hardware. You may get the best results from the most popular ciphers, as more effort has been spent optimising those in both hardware and software.


Generate a random number each time. Since the inner workings of the loop only care about a single random number, generate and use it inside the loop.

Example:

# do this:
import random

for x in range(SOMEVERYLARGENUMBER):
    n = random.randint(1, 1000)  # whatever your range of random numbers is
    # Do stuff with n

# don't do this:
import random

# This list comprehension generates all the random numbers up front, in a list
numbers = [random.randint(1, 1000) for x in range(SOMEVERYLARGENUMBER)]

for n in numbers:
    pass  # Do stuff with n

Obviously, in practical terms it really doesn't matter, unless you're dealing with billions and billions of iterations, but why bother generating all those numbers if you're only going to be using one at a time?


import random
for x in (random.randint(0, 80) for _ in range(1000 * 1000)):
    print(x)

The code between the parentheses is a generator expression: it produces only one item at a time, so it is memory safe.
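To see the difference, the sketch below compares the size of a generator object with that of the equivalent list. Note that sys.getsizeof reports only the container itself, not the integers it refers to, so the list's real footprint is larger than shown. The count of 100,000 is arbitrary.

```python
import random
import sys

N = 100_000

lazy = (random.randint(0, 80) for _ in range(N))   # nothing generated yet
eager = [random.randint(0, 80) for _ in range(N)]  # all N values held in memory

print("generator:", sys.getsizeof(lazy), "bytes")
print("list:     ", sys.getsizeof(eager), "bytes")

total = sum(lazy)  # values are produced one at a time as sum() consumes them
```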


Code to generate 10M random numbers with the built-in random module:

import random

l = 10000000
listrandom = []
for i in range(l):
    value = random.randint(0, l)
    listrandom.append(value)
print(listrandom)

Time taken, including the I/O time spent printing to the screen:

real    0m27.116s
user    0m24.391s
sys     0m0.819s
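Because the figures above include the cost of printing ten million numbers, it can help to time generation on its own. The sketch below does that with time.perf_counter, using a smaller count so it finishes quickly; the function names are illustrative and the timings will differ between machines.

```python
import random
import time


def with_append(n):
    """Build the list with an explicit loop and append()."""
    result = []
    for _ in range(n):
        result.append(random.randint(0, n))
    return result


def with_comprehension(n):
    """Build a list from the same distribution with a list comprehension."""
    return [random.randint(0, n) for _ in range(n)]


for build in (with_append, with_comprehension):
    start = time.perf_counter()
    numbers = build(1_000_000)
    elapsed = time.perf_counter() - start
    print("%s: %.2fs for %d numbers" % (build.__name__, elapsed, len(numbers)))
```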
