Fastest way to grow a numpy numeric array
np.append() copy all the data in the array every time, but list grow the capacity by a factor (1.125). list is fast, but memory usage is larger than array. You can use array module of the python standard library if you care about the memory.
Here is a discussion about this topic:
How to create a dynamic array
I tried a few different things, with timing.
import numpy as np
The method you mention as slow: (32.094 seconds)
class A: def __init__(self): self.data = np.array([]) def update(self, row): self.data = np.append(self.data, row) def finalize(self): return np.reshape(self.data, newshape=(self.data.shape[0]/5, 5))
Regular ol Python list: (0.308 seconds)
class B: def __init__(self): self.data = [] def update(self, row): for r in row: self.data.append(r) def finalize(self): return np.reshape(self.data, newshape=(len(self.data)/5, 5))
Trying to implement an arraylist in numpy: (0.362 seconds)
class C: def __init__(self): self.data = np.zeros((100,)) self.capacity = 100 self.size = 0 def update(self, row): for r in row: self.add(r) def add(self, x): if self.size == self.capacity: self.capacity *= 4 newdata = np.zeros((self.capacity,)) newdata[:self.size] = self.data self.data = newdata self.data[self.size] = x self.size += 1 def finalize(self): data = self.data[:self.size] return np.reshape(data, newshape=(len(data)/5, 5))
And this is how I timed it:
x = C()
for i in xrange(100000):
x.update([i])
So it looks like regular old Python lists are pretty good ;)