What's the fastest way to save/load a large list in Python 2.7?
Saving with np.save and loading with np.load followed by tolist is significantly faster than the pickle-based approaches timed here:
In [77]: outfile = open("test.pkl","w")
In [78]: l = list(range(1000000))
In [79]: timeit np.save("test",l)
10 loops, best of 3: 122 ms per loop
In [80]: timeit np.load("test.npy").tolist()
10 loops, best of 3: 20.9 ms per loop
In [81]: timeit pickle.dump(l,outfile)
1 loops, best of 3: 1.86 s per loop
In [82]: outfile = open("test.pkl","r")
In [83]: timeit pickle.load(outfile)
1 loops, best of 3: 1.88 s per loop
In [84]: %%timeit
cPickle.dump(l,outfile)
....:
1 loops, best of 3: 273 ms per loop
In [85]: outfile = open("test.pkl","r")
In [72]: %%timeit
cPickle.load(outfile)
....:
1 loops, best of 3: 539 ms per loop
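The IPython session above reuses the same open file objects across repeated timed runs, so its figures are best read as rough indications. A minimal, self-contained sketch for re-running a similar comparison with the standard timeit module (Python 3) follows; the list size, repeat count and temporary file names are arbitrary assumptions, and timings will depend on your machine and on your Python and numpy versions.

```python
import os
import pickle
import tempfile
import timeit

import numpy as np

# Arbitrary benchmark parameters (assumptions; tune them for your own data).
N = 1_000_000
REPEATS = 3

data = list(range(N))
tmpdir = tempfile.mkdtemp()
pkl_path = os.path.join(tmpdir, "test.pkl")
npy_path = os.path.join(tmpdir, "test.npy")


def pickle_save():
    # Reopen the file on every run so each timing is a full write.
    with open(pkl_path, "wb") as f:
        pickle.dump(data, f, protocol=pickle.HIGHEST_PROTOCOL)


def pickle_load():
    with open(pkl_path, "rb") as f:
        return pickle.load(f)


def numpy_save():
    np.save(npy_path, data)


def numpy_load_list():
    # Convert back to a plain Python list, as in the transcript.
    return np.load(npy_path).tolist()


# Write both files once so the load benchmarks have something to read.
pickle_save()
numpy_save()

results = {}
for name, func in [("pickle.dump", pickle_save), ("pickle.load", pickle_load),
                   ("np.save", numpy_save), ("np.load + tolist", numpy_load_list)]:
    # Best of REPEATS single runs, similar in spirit to IPython's "best of 3".
    results[name] = min(timeit.repeat(func, number=1, repeat=REPEATS))
    print(f"{name:>18}: {results[name] * 1000:.1f} ms")
```

Because every timed function opens its own file, no run ever reads from an exhausted file handle.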
In Python 3, pickle itself is much faster, but numpy is far more efficient still if you keep the data as a numpy array:
In [24]: %%timeit
out = open("test.pkl","wb")
pickle.dump(l, out)
....:
10 loops, best of 3: 27.3 ms per loop
In [25]: %%timeit
out = open("test.pkl","rb")
pickle.load(out)
....:
10 loops, best of 3: 52.2 ms per loop
In [26]: timeit np.save("test",l)
10 loops, best of 3: 115 ms per loop
In [27]: timeit np.load("test.npy")
100 loops, best of 3: 2.35 ms per loop
If you need a list, using np.load and calling tolist is still faster than pickle.load:
In [29]: timeit np.load("test.npy").tolist()
10 loops, best of 3: 37 ms per loop
As PadraicCunningham has mentioned, you can pickle the list.
import pickle
lst = [1,2,3,4,5]
with open('file.pkl', 'wb') as pickle_file:
    pickle.dump(lst, pickle_file, protocol=pickle.HIGHEST_PROTOCOL)
This writes the list to a file.
And to load it back:
import pickle
with open('file.pkl', 'rb') as pickle_load:
    lst = pickle.load(pickle_load)
print(lst)  # prints [1, 2, 3, 4, 5]
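The HIGHEST_PROTOCOL bit is optional, but is normally recommended. Protocols define how pickle serialises the object: higher protocols use more compact, faster binary formats, while lower protocols tend to be compatible with older versions of Python (in Python 2 the default is protocol 0, the original text format).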
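To see what the protocol choice changes in practice, a small sketch (Python 3, with an arbitrary 1,000-element list) can compare the size of the same list pickled with protocol 0 and with HIGHEST_PROTOCOL. Exact byte counts depend on the Python version, so none are quoted here.

```python
import pickle

lst = list(range(1000))

# Protocol 0 is the original human-readable text format; HIGHEST_PROTOCOL is
# the newest binary format supported by the running interpreter.
text_bytes = pickle.dumps(lst, protocol=0)
binary_bytes = pickle.dumps(lst, protocol=pickle.HIGHEST_PROTOCOL)

print(f"protocol 0: {len(text_bytes)} bytes")
print(f"protocol {pickle.HIGHEST_PROTOCOL}: {len(binary_bytes)} bytes")

# Both decode to the same list; only the encoding on disk differs.
print(pickle.loads(text_bytes) == pickle.loads(binary_bytes) == lst)  # True
```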
It's worth noting two more things:
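There is also the cPickle module, written in C for speed, which you use in the same way as above. It exists only in Python 2; in Python 3 the standard pickle module uses its C implementation automatically.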
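If the same code has to run on both Python 2 and Python 3, a common idiom (a sketch, not the only option) is to try cPickle first and fall back to pickle:

```python
try:
    # Python 2: the C implementation lives in a separate module.
    import cPickle as pickle
except ImportError:
    # Python 3: cPickle is gone; pickle already uses its C accelerator.
    import pickle

lst = [1, 2, 3, 4, 5]
data = pickle.dumps(lst, pickle.HIGHEST_PROTOCOL)
print(pickle.loads(data))  # [1, 2, 3, 4, 5]
```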
Pickle also has known security problems: a specially crafted pickle can make Python run more or less arbitrary code while it is being deserialised. As a result, you shouldn't use it to load data from an unknown or untrusted source. In extreme cases you can try a safer version like sPickle: https://github.com/ershov/sPickle
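If you cannot avoid loading pickles of uncertain origin, the Python documentation describes restricting which globals an Unpickler may resolve by overriding find_class. A minimal sketch (Python 3) that refuses every global is enough for plain lists of numbers and strings; the names NoGlobalsUnpickler, safe_loads and Evil are illustrative only, and this reduces the risk rather than making pickle safe.

```python
import io
import pickle


class NoGlobalsUnpickler(pickle.Unpickler):
    """Unpickler that refuses to look up any class or function by name."""

    def find_class(self, module, name):
        # Every reference to a global (class, function, ...) goes through
        # find_class, so refusing all of them blocks __reduce__-style payloads.
        raise pickle.UnpicklingError(f"global '{module}.{name}' is forbidden")


def safe_loads(data):
    """Unpickle bytes that may only contain built-in containers and scalars."""
    return NoGlobalsUnpickler(io.BytesIO(data)).load()


class Evil:
    # Harmless stand-in for a malicious payload: a normal pickle.loads would
    # call print(...) here; a real attack could call os.system instead.
    def __reduce__(self):
        return (print, ("this ran during unpickling",))


print(safe_loads(pickle.dumps([1, 2, 3])))  # [1, 2, 3]

try:
    safe_loads(pickle.dumps(Evil()))
except pickle.UnpicklingError as exc:
    print("blocked:", exc)
```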
Other modules I'd recommend looking up are json and marshal.
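For comparison, a minimal sketch of the same save/load round trip with json and marshal (Python 3; the temporary file names are arbitrary):

```python
import json
import marshal
import os
import tempfile

lst = [1, 2, 3, 4, 5]
tmpdir = tempfile.mkdtemp()

# json: plain text, readable from other languages, but it only handles basic
# types (lists, dicts, str, int, float, bool, None).
json_path = os.path.join(tmpdir, "list.json")
with open(json_path, "w") as f:
    json.dump(lst, f)
with open(json_path) as f:
    from_json = json.load(f)

# marshal: fast binary format that CPython uses for .pyc files; its format can
# change between Python versions and it is not safe for untrusted data.
marshal_path = os.path.join(tmpdir, "list.marshal")
with open(marshal_path, "wb") as f:
    marshal.dump(lst, f)
with open(marshal_path, "rb") as f:
    from_marshal = marshal.load(f)

print(from_json, from_marshal)  # [1, 2, 3, 4, 5] [1, 2, 3, 4, 5]
```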