Boolean operations on scipy.sparse matrices
Logical operations are not supported for sparse matrices, but converting the result back to 'bool' is not very expensive. With LIL format matrices, the cost of the conversion is lost in run-to-run timing noise; in the run below, the version that includes the conversion even came out slightly faster:
import scipy.sparse
a = scipy.sparse.rand(10000, 10000, density=0.001, format='lil').astype('bool')
b = scipy.sparse.rand(10000, 10000, density=0.001, format='lil').astype('bool')
In [2]: %timeit a+b
10 loops, best of 3: 61.2 ms per loop
In [3]: %timeit (a+b).astype('bool')
10 loops, best of 3: 60.4 ms per loop
Note that your LIL matrices are converted to CSR format before being added together; check the format of the result. If you use CSR format from the start, the conversion overhead becomes more noticeable:
In [14]: %timeit a+b
100 loops, best of 3: 2.28 ms per loop
In [15]: %timeit (a+b).astype(bool)
100 loops, best of 3: 2.96 ms per loop
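A minimal sketch of the LIL-to-CSR point, using Python's `timeit` module instead of IPython's `%timeit`. The 2000x2000 size, the density, and the repeat count are arbitrary choices made to keep it quick; the printed timings will vary with machine and SciPy version.

```python
import timeit

import scipy.sparse

# Two random boolean matrices in LIL format, as in the timings above
a_lil = scipy.sparse.rand(2000, 2000, density=0.001, format='lil').astype(bool)
b_lil = scipy.sparse.rand(2000, 2000, density=0.001, format='lil').astype(bool)

# Adding two LIL matrices returns a CSR matrix: the operands are converted first
print((a_lil + b_lil).format)

# Convert once up front so repeated operations skip the LIL -> CSR conversion
a_csr, b_csr = a_lil.tocsr(), b_lil.tocsr()

for label, x, y in [("lil", a_lil, b_lil), ("csr", a_csr, b_csr)]:
    # Average time of one add-and-recast over 20 runs
    t = timeit.timeit(lambda: (x + y).astype(bool), number=20) / 20
    print(f"{label}: {t * 1e3:.3f} ms per (x + y).astype(bool)")
```

If the same matrices take part in many operations, converting them to CSR once with `tocsr()` pays the conversion cost a single time instead of on every operation.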
CSR (and CSC) matrices have a data attribute, a 1D array that holds the actual non-zero entries of the sparse matrix, so the cost of recasting your sparse matrix depends on the number of non-zero entries of the matrix, not on its size:
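# rand() draws values in [0, 1), so astype('int8') turns them into explicitly stored zeros; nnz still counts these stored entries
a = scipy.sparse.rand(10000, 10000, density=0.0005, format='csr').astype('int8')
b = scipy.sparse.rand(1000, 1000, density=0.5, format='csr').astype('int8')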
In [4]: %timeit a.astype('bool') # a is 10,000x10,000 with 50,000 non-zero entries
10000 loops, best of 3: 93.3 us per loop
In [5]: %timeit b.astype('bool') # b is 1,000x1,000 with 500,000 non-zero entries
1000 loops, best of 3: 1.7 ms per loop
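To see what the data attribute holds, here is a small sketch on a hand-built 3x3 CSR matrix (the values are arbitrary, chosen only for illustration). The data array has one element per stored entry, and astype() produces a new matrix whose data is that array cast to the new dtype, with the same sparsity structure.

```python
import numpy as np
import scipy.sparse

# A small CSR matrix with three stored (non-zero) entries
m = scipy.sparse.csr_matrix(np.array([[0, 2, 0],
                                      [0, 0, -1],
                                      [5, 0, 0]], dtype=np.int8))

# .data holds only the stored entries (row by row), independent of m.shape
print(m.shape, m.nnz, m.data)

# astype() casts the 1D data array; the sparsity structure (indices, indptr)
# is unchanged, so the work scales with m.nnz (plus one pointer per row)
# rather than with m.shape[0] * m.shape[1]
m_bool = m.astype(bool)
print(m_bool.dtype, m_bool.data)
```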
You can express Boolean operations with arithmetic and comparison operations that sparse matrices do support:
a.multiply(b) #AND
a+b #OR
(a>b)+(a<b) #XOR
a>b #a AND NOT b
So Boolean operations are supported in practice, as long as the operands are boolean sparse matrices.
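A sketch that checks these formulations against NumPy's dense logical operators on small random boolean matrices. The 50x50 size, the roughly 10% density, and the seed are arbitrary test choices. `.multiply` is used for AND because for `spmatrix` types `*` means matrix multiplication. The identities assume boolean dtype: with numeric matrices, `a + b` can cancel entries (for example 1 + -1).

```python
import numpy as np
import scipy.sparse

# Random dense boolean matrices (about 10% True) and their CSR counterparts
rng = np.random.default_rng(0)
A = rng.random((50, 50)) < 0.1
B = rng.random((50, 50)) < 0.1
a = scipy.sparse.csr_matrix(A)
b = scipy.sparse.csr_matrix(B)

# Sparse formulations of the Boolean operations
and_ = a.multiply(b)        # AND: elementwise product
or_ = a + b                 # OR: addition of boolean matrices
xor = (a > b) + (a < b)     # XOR: True where exactly one operand is True
and_not = a > b             # a AND (NOT b)

# Compare each sparse result with the dense NumPy equivalent
assert np.array_equal(and_.toarray(), A & B)
assert np.array_equal(or_.toarray(), A | B)
assert np.array_equal(xor.toarray(), A ^ B)
assert np.array_equal((a != b).toarray(), A ^ B)
assert np.array_equal(and_not.toarray(), A & ~B)
print("all sparse Boolean formulations match the dense results")
```

A standalone NOT is left out on purpose: negating a sparse boolean matrix makes almost every entry True, so the result would effectively be dense.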