Save / load scipy sparse csr_matrix in portable data format
Although you write that scipy.io.mmwrite and scipy.io.mmread don't work for you, I just want to add how they work. This question is the No. 1 Google hit, and I myself started with np.savez and pickle.dump before switching to the simple and obvious SciPy functions. They work for me and shouldn't be overlooked by those who haven't tried them yet.
from scipy import sparse, io
m = sparse.csr_matrix([[0,0,0],[1,0,0],[0,1,0]])
m # <3x3 sparse matrix of type '<type 'numpy.int64'>' with 2 stored elements in Compressed Sparse Row format>
io.mmwrite("test.mtx", m)
del m
newm = io.mmread("test.mtx")
newm # <3x3 sparse matrix of type '<type 'numpy.int32'>' with 2 stored elements in COOrdinate format>
newm.tocsr() # <3x3 sparse matrix of type '<type 'numpy.int32'>' with 2 stored elements in Compressed Sparse Row format>
newm.toarray() # array([[0, 0, 0], [1, 0, 0], [0, 1, 0]], dtype=int32)
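A self-contained round trip of the same steps, written as a sketch: it uses a temporary directory (an assumption, so nothing is left on disk), peeks at the first line of the file to show that Matrix Market is a plain-text format other tools can read, and converts the COO result of mmread back to CSR explicitly.

```python
import os
import tempfile

from scipy import io, sparse

m = sparse.csr_matrix([[0, 0, 0], [1, 0, 0], [0, 1, 0]])

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "test.mtx")
    io.mmwrite(path, m)
    # Matrix Market is plain text; the first line is a "%%MatrixMarket" banner
    with open(path) as f:
        header = f.readline()
    # mmread returns COO format, so convert explicitly when CSR is needed
    newm = sparse.csr_matrix(io.mmread(path))

print(header.strip())
print(newm.toarray())
```

The integer dtype read back may differ from the one written (int64 vs int32 above), so cast with .astype if the dtype matters.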
Edit: SciPy 0.19 now has scipy.sparse.save_npz and scipy.sparse.load_npz.
from scipy import sparse
sparse.save_npz("yourmatrix.npz", your_matrix)
your_matrix_back = sparse.load_npz("yourmatrix.npz")
For both functions, the file argument may also be a file-like object (e.g. the result of open) instead of a filename.
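A minimal sketch of the file-like case, assuming an in-memory io.BytesIO buffer as a stand-in for any binary file object (a real file opened with open(path, "wb") for saving and open(path, "rb") for loading should behave the same way):

```python
from io import BytesIO

from scipy import sparse

m = sparse.csr_matrix([[0, 0, 0], [1, 0, 0], [0, 1, 0]])

buf = BytesIO()              # any binary file-like object would do
sparse.save_npz(buf, m)      # writes an .npz (zip) archive into the buffer
buf.seek(0)                  # rewind before reading the archive back
m_back = sparse.load_npz(buf)

print(m_back.format)         # load_npz restores the saved sparse format
print(m_back.toarray())
```

Rewinding with seek(0) matters: without it, load_npz would start reading at the end of the data that was just written.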
Got an answer from the Scipy user group:
A csr_matrix has 3 data attributes that matter: .data, .indices, and .indptr. All are simple ndarrays, so numpy.save will work on them. Save the three arrays with numpy.save or numpy.savez, load them back with numpy.load, and then recreate the sparse matrix object from them and its shape (M, N) with:
new_csr = csr_matrix((data, indices, indptr), shape=(M, N))
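To make the three arrays concrete, here is a sketch using the small matrix from the first answer. It also shows why the shape should be stored alongside them: len(indptr) - 1 recovers the number of rows, but the number of columns is only inferred from the largest column index, which is too small when the last columns are all zero.

```python
from scipy.sparse import csr_matrix

m = csr_matrix([[0, 0, 0], [1, 0, 0], [0, 1, 0]])

data = m.data        # nonzero values, row by row
indices = m.indices  # column index of each value in data
indptr = m.indptr    # row i occupies data[indptr[i]:indptr[i + 1]]
M, N = m.shape

new_csr = csr_matrix((data, indices, indptr), shape=(M, N))

# Without shape=, N is guessed as max(indices) + 1, losing the empty last column
inferred = csr_matrix((data, indices, indptr)).shape

print(data, indices, indptr)  # [1 1] [0 1] [0 0 1 2]
print(inferred)               # (3, 2)
```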
So for example:
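import numpy as np
from scipy.sparse import csr_matrix

def save_sparse_csr(filename, array):
    # np.savez appends ".npz" to filename if it is not already there
    np.savez(filename, data=array.data, indices=array.indices,
             indptr=array.indptr, shape=array.shape)

def load_sparse_csr(filename):
    # filename must include the ".npz" extension; "with" closes the archive
    with np.load(filename) as loader:
        return csr_matrix((loader['data'], loader['indices'], loader['indptr']),
                          shape=loader['shape'])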
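A possible variant, sketched under the assumption that smaller files are worth some extra CPU time: swap np.savez for np.savez_compressed, which writes the same .npz layout with zip compression, while np.load reads both kinds. The helper names save_sparse_csr_compressed and load_sparse_csr_compressed are hypothetical. The usage part also shows the extension behaviour: saving to a name without ".npz" produces a file that does have it.

```python
import os
import tempfile

import numpy as np
from scipy.sparse import csr_matrix


def save_sparse_csr_compressed(filename, array):
    # Hypothetical helper: same keys as save_sparse_csr, but zip-compressed.
    # np.savez_compressed also appends ".npz" if it is missing.
    np.savez_compressed(filename, data=array.data, indices=array.indices,
                        indptr=array.indptr, shape=array.shape)


def load_sparse_csr_compressed(filename):
    # Hypothetical helper: np.load reads compressed and plain .npz alike.
    with np.load(filename) as loader:
        return csr_matrix((loader['data'], loader['indices'], loader['indptr']),
                          shape=tuple(loader['shape']))


m = csr_matrix([[0, 0, 0], [1, 0, 0], [0, 1, 0]])

with tempfile.TemporaryDirectory() as tmp:
    base = os.path.join(tmp, "matrix")   # no extension given here
    save_sparse_csr_compressed(base, m)
    path = base + ".npz"                 # the file that was actually written
    exists = os.path.exists(path)
    m_back = load_sparse_csr_compressed(path)

print(exists, m_back.shape)
```

For a 3x3 matrix compression gains nothing; it only pays off for large index and data arrays.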