How do I compute the variance of a column of a sparse matrix in Scipy?

Sicco has the better answer.

However, another way is to convert the sparse matrix to a dense NumPy array one column at a time, which keeps memory requirements lower than converting the whole matrix at once:

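import numpy as np

# mat is the sparse matrix (CSC format makes column slicing efficient)
cols = mat.shape[1]  # number of columns
arr = np.empty(shape=cols)
for i in range(cols):
    # Densify a single column and take its population variance (ddof=0)
    arr[i] = np.var(mat[:, i].toarray())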

You can calculate the variance yourself using the mean, with the following formula:

E[X^2] - (E[X])^2

E[X] stands for the mean. To get E[X^2], square the csc_matrix element-wise (for example with mat.multiply(mat) or mat.power(2), not a matrix product) and then take the mean of the result. To get (E[X])^2, square the mean of the original matrix. The result is the population variance, which is also what np.var returns by default.
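
A minimal sketch of this formula applied per column, assuming the input is a SciPy sparse matrix (the function name sparse_column_variance and the small example matrix are made up for illustration):

```python
import numpy as np
from scipy import sparse


def sparse_column_variance(mat):
    """Population variance (ddof=0) of each column of a sparse matrix."""
    # Work in float64 CSC format so integer inputs do not overflow or truncate.
    mat = sparse.csc_matrix(mat, dtype=np.float64)
    # E[X]: column means, counting the implicit zeros.
    mean = np.asarray(mat.mean(axis=0)).ravel()
    # E[X^2]: element-wise square (not a matrix product), then column means.
    mean_of_squares = np.asarray(mat.multiply(mat).mean(axis=0)).ravel()
    return mean_of_squares - mean ** 2


# Hypothetical example data: 4 rows, 3 columns, mostly zeros.
mat = sparse.csc_matrix(np.array([
    [1.0, 0.0, 2.0],
    [0.0, 0.0, 3.0],
    [4.0, 0.0, 0.0],
    [0.0, 5.0, 0.0],
]))
variances = sparse_column_variance(mat)
```

The matrix is never converted to a dense array, so the zeros stay implicit. One caveat: subtracting two nearly equal numbers can lose precision when a column's mean is large relative to its spread, so it may be worth comparing against np.var on a dense sample if accuracy matters.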
