What is the fastest way to slice a scipy.sparse matrix?

I've found that the advertised fast row indexing of scipy.sparse.csr_matrix can be made a lot quicker by rolling your own row indexer. Here's the idea:

import numpy as np
from scipy import sparse

class SparseRowIndexer:
    def __init__(self, csr_matrix):
        data = []
        indices = []
        indptr = []

        # Iterating over the rows this way is significantly more efficient
        # than csr_matrix[row_index,:] and csr_matrix.getrow(row_index)
        for row_start, row_end in zip(csr_matrix.indptr[:-1], csr_matrix.indptr[1:]):
            data.append(csr_matrix.data[row_start:row_end])
            indices.append(csr_matrix.indices[row_start:row_end])
            indptr.append(row_end - row_start)  # nnz of the row

        # Object-typed arrays so each row can keep its own length
        self.data = np.array(data, dtype=object)
        self.indices = np.array(indices, dtype=object)
        self.indptr = np.array(indptr)
        self.n_columns = csr_matrix.shape[1]

    def __getitem__(self, row_selector):
        data = np.concatenate(self.data[row_selector])
        indices = np.concatenate(self.indices[row_selector])
        indptr = np.append(0, np.cumsum(self.indptr[row_selector]))

        shape = [indptr.shape[0] - 1, self.n_columns]

        return sparse.csr_matrix((data, indices, indptr), shape=shape)

That is, it is possible to exploit the fast fancy indexing of numpy arrays by storing the non-zero values of each row in a separate array (each with its own length) and putting all of those row arrays into an object-typed array that can be indexed efficiently. The column indices are stored the same way. This differs slightly from the standard CSR data structure, which keeps all non-zero values in a single array and needs indptr look-ups to find where each row starts and ends. Those look-ups can slow down random access, although they are efficient when retrieving contiguous rows.
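
For reference, here is a minimal usage sketch (my own illustration, not part of the original benchmark; the small random matrix and the equality check are just placeholders):

import numpy as np
from scipy import sparse

# A small random CSR matrix stands in for the much larger matrix used in the timings below
mat = sparse.random(1000, 500, density=0.01, format='csr', random_state=0)
ilocs = np.random.default_rng(0).integers(0, mat.shape[0], size=100)

row_indexer = SparseRowIndexer(mat)
subset = row_indexer[ilocs]

# The result should match scipy's own fancy row indexing
assert (subset != mat[ilocs]).nnz == 0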

Profiling results

My matrix mat is a 1,900,000x1,250,000 csr_matrix with 400,000,000 non-zero elements. ilocs is an array of 200,000 random row indices.

>>> %timeit mat[ilocs]
2.66 s ± 233 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

compared to:

>>> row_indexer = SparseRowIndexer(mat)
>>> %timeit row_indexer[ilocs]
59.9 ms ± 4.51 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

The SparseRowIndexer seems to be faster when indexing with integer arrays (fancy indexing) than with boolean masks.
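
If your selection starts out as a boolean mask, one option (a sketch on my part, not something benchmarked above) is to convert it to integer indices first so the fancy-indexing path is used:

# mask is a boolean array of length mat.shape[0]
mask = np.zeros(mat.shape[0], dtype=bool)
mask[ilocs] = True

# np.flatnonzero returns the integer indices of the True entries
rows = row_indexer[np.flatnonzero(mask)]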


If you want a sparse matrix as output, the fastest way to do row slicing is to use the csr format, and for column slicing the csc format, as detailed here. In both cases you just have to do what you are currently doing:

matrix[l1:l2,c1:c2]
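
For example, a rough sketch (reusing matrix, l1, l2, c1 and c2 from the snippet above) of converting to the format that matches the slicing direction first:

# Row slices are cheap on CSR, column slices on CSC
rows = matrix.tocsr()[l1:l2, :]
cols = matrix.tocsc()[:, c1:c2]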

If you want another type as output, there may be faster ways. This other answer explains many methods for slicing a matrix and compares their timings. For example, if you want an ndarray as output, the fastest slicing is:

matrix.A[l1:l2,c1:c2] 

or:

matrix.toarray()[l1:l2,c1:c2]

much faster than:

matrix[l1:l2,c1:c2].A  # or .toarray()