How to quickly fill a numpy array with values from separate calls to a function
There is nothing NumPy can do to accelerate the process of repeatedly calling a function not designed to interact with NumPy.
The "fancy usage of numpy" way to optimize this is to manually rewrite your generate function so that it uses NumPy operations to produce entire arrays of output instead of only supporting single values. That's how NumPy works, and how NumPy has to work; any solution that involves calling a Python function over and over again for every array cell is going to be limited by Python overhead. NumPy can only accelerate work that actually happens in NumPy.
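As an illustration, suppose (hypothetically) that generate draws a uniform random number and rescales it. The scalar version has to be called once per array cell, while the rewritten version produces the whole array with a few NumPy calls. Both `generate` and `generate_array` are made-up names for this sketch; the real rewrite depends entirely on what your function computes.

```python
import random

import numpy as np


def generate():
    # Hypothetical scalar function: one value per Python call.
    return 2.0 * random.random() + 1.0


def generate_array(n, rng=None):
    # The same computation expressed as NumPy operations on a whole array,
    # so the per-element work happens in compiled code.
    rng = np.random.default_rng() if rng is None else rng
    return 2.0 * rng.random(n) + 1.0


slow = np.array([generate() for _ in range(1000)])  # 1000 Python calls
fast = generate_array(1000)                          # a handful of NumPy calls
```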
If NumPy's provided operations are too limited to rewrite generate in terms of them, there are options like rewriting generate with Cython, or using @numba.jit on it. These mostly help with computations that involve complex dependencies from one loop iteration to the next; they don't help with external dependencies you can't rewrite.
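A minimal sketch of the kind of loop numba is good at, assuming numba is installed: each value depends on the previous one, so there is no single elementwise NumPy expression for it. The function name `ar1_series` and the recurrence itself are illustrative assumptions. The `try`/`except` fallback only exists so the sketch runs without numba; in that case there is no speed-up.

```python
import numpy as np

try:
    from numba import njit  # assumes numba is available
except ImportError:
    # Fallback: a no-op decorator, so the code runs (slowly) without numba.
    def njit(func):
        return func


@njit
def ar1_series(n, phi, seed):
    # out[i] depends on out[i - 1], so the loop cannot be replaced by
    # one vectorized NumPy call; numba compiles the loop instead.
    np.random.seed(seed)
    out = np.empty(n)
    out[0] = np.random.random()
    for i in range(1, n):
        out[i] = phi * out[i - 1] + np.random.random()
    return out


series = ar1_series(1000, 0.5, 42)
```

Note that this only works because the loop body uses operations numba can compile; if generate calls into arbitrary Python libraries, @numba.jit cannot compile it in this way.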
If you cannot rewrite generate, all you can do is try to optimize the process of getting the return values into your array. Depending on array size, you may be able to save some time by reusing a single array object:
In [32]: %timeit x = numpy.array([random.random() for _ in range(10)])
The slowest run took 5.13 times longer than the fastest. This could mean that an
intermediate result is being cached.
100000 loops, best of 5: 5.44 µs per loop
In [33]: %%timeit x = numpy.empty(10)
....: for i in range(10):
....:     x[i] = random.random()
....:
The slowest run took 4.26 times longer than the fastest. This could mean that an
intermediate result is being cached.
100000 loops, best of 5: 2.88 µs per loop
but the benefit vanishes for larger arrays:
In [34]: %timeit x = numpy.array([random.random() for _ in range(100)])
10000 loops, best of 5: 21.9 µs per loop
In [35]: %%timeit x = numpy.empty(100)
....: for i in range(100):
....:     x[i] = random.random()
....:
10000 loops, best of 5: 22.8 µs per loop
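One more variant worth timing on your own data (it is not part of the measurements above) is np.fromiter with an explicit count, which consumes a generator directly instead of building an intermediate list. The helper name `fill_with_fromiter` is hypothetical. It still calls generate once per element, so the Python-call overhead discussed above remains.

```python
import random

import numpy as np


def fill_with_fromiter(generate, n):
    # Passing count lets NumPy allocate the float64 result once up front
    # rather than growing it as values arrive.
    return np.fromiter((generate() for _ in range(n)), dtype=float, count=n)


x = fill_with_fromiter(random.random, 100)
```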
Conventional "Pythonic"
A list comprehension or the map function would both work:
from random import random
import numpy as np
np.array(list(map(lambda idx: random(), range(10))))
np.array([random() for idx in range(10)])
"Need-for-speed"
Pre-allocating the memory may shave off a microsecond or two:
array = np.empty(10)
for idx in range(10):
    array[idx] = random()
See Nathan's answer for an even better solution.
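Whether pre-allocation pays off depends on the array size (the IPython timings earlier on this page show the gap closing by 100 elements). A small script using the standard-library timeit module lets you check on your own machine; `with_list` and `with_prealloc` are illustrative names, and no particular result is implied.

```python
import timeit
from random import random

import numpy as np


def with_list(n):
    # Build a Python list first, then copy it into a new array.
    return np.array([random() for _ in range(n)])


def with_prealloc(n):
    # Allocate the array once and assign element by element.
    array = np.empty(n)
    for idx in range(n):
        array[idx] = random()
    return array


for n in (10, 100, 1000):
    t_list = timeit.timeit(lambda: with_list(n), number=200)
    t_pre = timeit.timeit(lambda: with_prealloc(n), number=200)
    print(f"n={n}: list {t_list:.4f}s, prealloc {t_pre:.4f}s")
```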
Function Vectorisation
A function can be "vectorised" using numpy:
def rnd(x):
    return random()
fun = np.vectorize(rnd)
array = fun(range(10))
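Per the NumPy documentation, np.vectorize is provided for convenience rather than performance; internally it is essentially a Python for loop. A small refinement, sketched below, is to pass otypes so the output dtype is fixed up front instead of being inferred from the first result. Using np.arange as the driving input is just one choice for this sketch.

```python
from random import random

import numpy as np


def rnd(x):
    # The argument is ignored; it only determines how many calls happen.
    return random()


# otypes declares a float64 result, so NumPy does not have to
# inspect the first call's return value to pick the dtype.
fun = np.vectorize(rnd, otypes=[float])
array = fun(np.arange(10))
```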
Another option would be to make a ufunc from your generate function:
gen_array = np.frompyfunc(generate, 0, 1) # takes 0 args, returns 1
array = gen_array(np.empty(array_length))
This is a bit faster for me than the "need for speed" version from Sigve's answer.
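Ufuncs built with np.frompyfunc always return arrays of dtype object, and a ufunc with zero inputs is an unusual case. A variant that sidesteps both, assuming generate takes no arguments and returns a float, is to give the ufunc one dummy input and convert the result afterwards. This is a swapped-in variant, not benchmarked here; the `generate` below is a hypothetical stand-in for the question's function.

```python
from random import random

import numpy as np


def generate():
    # Hypothetical stand-in for the question's generate function.
    return random()


# One ignored input per call gives the ufunc something to broadcast over.
gen_array = np.frompyfunc(lambda _: generate(), 1, 1)

array_length = 10
# The ufunc yields an object array; astype converts it to float64.
array = gen_array(np.empty(array_length)).astype(float)
```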