What is the preferred way to preallocate NumPy arrays?
In cases where performance is important, np.empty and np.zeros appear to be the fastest ways to initialize NumPy arrays.
Below are test results for each method and a few others. Values are the total time in seconds for 1000 calls.
>>> import numpy as np
>>> from timeit import timeit
>>> timeit("np.empty(1000000)",number=1000, globals=globals())
0.033749611208094166
>>> timeit("np.zeros(1000000)",number=1000, globals=globals())
0.03421245135849915
>>> timeit("np.arange(0,1000000,1)",number=1000, globals=globals())
1.2212416112155324
>>> timeit("np.ones(1000000)",number=1000, globals=globals())
2.2877375495381145
>>> timeit("np.linspace(0,1000000,1000000)",number=1000, globals=globals())
3.0824269766860652
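For anyone who wants to repeat the comparison on their own machine, here is a minimal sketch of the same measurement as a script. The helper name time_constructors and the smaller size and repeat count are my own choices to keep the run short, not part of the original test, and absolute numbers will differ by platform and NumPy version.

```python
import timeit

import numpy as np


def time_constructors(n=100_000, repeats=20):
    """Return {name: total seconds for `repeats` calls} (hypothetical helper)."""
    candidates = {
        "empty": lambda: np.empty(n),
        "zeros": lambda: np.zeros(n),
        "ones": lambda: np.ones(n),
        "arange": lambda: np.arange(0, n, 1),
        "linspace": lambda: np.linspace(0, n, n),
    }
    # timeit.timeit accepts a callable and returns the total elapsed time
    return {name: timeit.timeit(fn, number=repeats) for name, fn in candidates.items()}


results = time_constructors()
# Print the fastest constructor first
for name, seconds in sorted(results.items(), key=lambda kv: kv[1]):
    print(f"{name:10s} {seconds:.6f} s")
```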
In my experience, numpy.empty() is the fastest way to preallocate a huge array. The array I'm talking about has shape (80, 80, 300000) and dtype uint8.
Here is the code:
%timeit np.empty((80,80,300000),dtype='uint8')
%timeit np.zeros((80,80,300000),dtype='uint8')
%timeit np.ones((80,80,300000),dtype='uint8')
And the timing results:
10000 loops, best of 3: 83.7 µs per loop  # much faster
1 loop, best of 3: 273 ms per loop
1 loop, best of 3: 272 ms per loop
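One caveat to keep in mind when reading these numbers: np.empty is fast because it skips initialization, so the contents are arbitrary until you write to them. A small sketch of the usual pattern follows; the shape is deliberately much smaller than the (80, 80, 300000) array above so it runs quickly.

```python
import numpy as np

# Much smaller than the (80, 80, 300000) array timed above
shape = (80, 80, 300)

# Allocate without initializing; the contents are whatever was in memory
buf = np.empty(shape, dtype=np.uint8)

# Every element must be written before it is read
buf.fill(0)  # equivalent to buf[...] = 0

print(buf.shape, buf.dtype)
```

This is only safe when every element really is overwritten before it is read; otherwise np.zeros is the safer choice.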
Preallocation allocates all the memory you need in one call, while resizing the array (through calls to append, insert, concatenate, or resize) may require copying the array to a larger block of memory. So you are correct: preallocation is preferred over (and should be faster than) resizing.
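To make the difference concrete, here is a rough sketch that builds the same array both ways. The function names build_by_append and build_preallocated are made up for illustration. np.append returns a new array on every call, so the first version copies the data on each iteration.

```python
import numpy as np


def build_by_append(n):
    """Grow the array one element at a time (hypothetical helper)."""
    out = np.array([], dtype=np.float64)
    for i in range(n):
        # np.append allocates a new array and copies the old contents each time
        out = np.append(out, i * 0.5)
    return out


def build_preallocated(n):
    """Allocate once, then fill in place (hypothetical helper)."""
    out = np.empty(n, dtype=np.float64)
    for i in range(n):
        out[i] = i * 0.5
    return out


a = build_by_append(1000)
b = build_preallocated(1000)
print(np.array_equal(a, b))  # True
```

Both produce identical results; only the allocation pattern differs.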
There are a number of "preferred" ways to preallocate NumPy arrays depending on what you want to create. There are np.zeros, np.ones, np.empty, np.zeros_like, np.ones_like, and np.empty_like, and many others that create useful arrays, such as np.linspace and np.arange.
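As a quick illustration of how these relate, the sketch below calls the shape-based constructors next to their _like counterparts. The template array is just an example I picked; the _like functions copy its shape and dtype.

```python
import numpy as np

# Example array whose shape and dtype the *_like functions will copy
template = np.arange(6, dtype=np.int32).reshape(2, 3)

z = np.zeros((2, 3))                 # float64 by default
o = np.ones((2, 3), dtype=np.uint8)  # explicit dtype
e = np.empty((2, 3))                 # uninitialized contents

zl = np.zeros_like(template)         # zeros, shape (2, 3), dtype int32
ol = np.ones_like(template)          # ones, shape (2, 3), dtype int32
el = np.empty_like(template)         # uninitialized, shape (2, 3), dtype int32

print(zl.shape, zl.dtype)  # (2, 3) int32
```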
So
ar0 = np.linspace(10, 20, 16).reshape(4, 4)
is just fine if this comes closest to the ar0 you desire.
However, to make the last column all 1's, I think the preferred way would be to just say
ar0[:,-1]=1
Since the shape of ar0[:,-1] is (4,), the 1 is broadcast to match this shape.
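Put together, the two steps above look like this as a runnable snippet:

```python
import numpy as np

ar0 = np.linspace(10, 20, 16).reshape(4, 4)

# The last column is a 1-D view of length 4
print(ar0[:, -1].shape)  # (4,)

# The scalar 1 is broadcast across all four entries of that column
ar0[:, -1] = 1
print(ar0[:, -1])  # [1. 1. 1. 1.]
```

The other columns keep their linspace values; only the last one is overwritten.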