How do I compute the variance of a column of a sparse matrix in SciPy?
Sicco has the better answer.
However, another way is to convert the sparse matrix to a dense numpy array one column at a time (to keep the memory requirements lower compared to converting the whole matrix at once):
import numpy as np

# mat is the sparse matrix
# Get the number of columns
cols = mat.shape[1]
# One variance per column
arr = np.empty(shape=cols)
for i in range(cols):
    # Densify only column i, then take its variance
    arr[i] = np.var(mat[:, i].toarray())
You can calculate the variance yourself using the mean, with the following formula:
Var[X] = E[X^2] - (E[X])^2
E[X] stands for the mean. So to calculate E[X^2] you would have to square the csc_matrix element-wise (for example with its multiply or power method, not a matrix product) and then use the mean function (with axis=0 to get one value per column). To get (E[X])^2 you simply need to square the result of the mean function applied to the original matrix.
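A minimal sketch of that formula, assuming SciPy and NumPy are installed. The function name column_variances and the small test matrix are made up for illustration, not part of any library. It gives the population variance (ddof=0), which is also what np.var returns by default.

```python
import numpy as np
from scipy import sparse


def column_variances(mat):
    """Population variance (ddof=0) of each column of a SciPy sparse matrix.

    Hypothetical helper: computes E[X^2] - (E[X])^2 without densifying.
    """
    # Work in float64 so integer input does not overflow when squared
    mat = sparse.csc_matrix(mat, dtype=np.float64)
    # E[X]: column means; implicit zeros count towards the mean
    mean = np.asarray(mat.mean(axis=0)).ravel()
    # E[X^2]: element-wise square (not a matrix product), then column means
    mean_sq = np.asarray(mat.multiply(mat).mean(axis=0)).ravel()
    return mean_sq - mean ** 2


# Illustrative data only
dense = np.array([[1.0, 0.0, 2.0],
                  [0.0, 0.0, 3.0],
                  [4.0, 5.0, 0.0]])
variances = column_variances(sparse.csc_matrix(dense))
```

The result should agree with np.var(dense, axis=0) up to rounding. Note that this formula subtracts two numbers of similar size, so it can lose precision when the mean is large compared to the spread of the values.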