PCA memory error in Sklearn: Alternative Dim Reduction?
In the end, I used TruncatedSVD instead of PCA. It handles large (and sparse) matrices without memory issues because, unlike PCA, it does not center the data, so a sparse input is never densified:
from sklearn import decomposition

n_comp = 250
svd = decomposition.TruncatedSVD(n_components=n_comp, algorithm='arpack')
svd.fit(train_features)

# Check how much variance the 250 components retain
print(svd.explained_variance_ratio_.sum())

train_features = svd.transform(train_features)
test_features = svd.transform(test_features)
You could use IncrementalPCA, available in scikit-learn: from sklearn.decomposition import IncrementalPCA. The rest of the interface is the same as PCA, except that you need to pass an extra argument batch_size, which must be >= n_components. A minimal sketch follows.
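Here is a minimal sketch, reusing n_comp and the feature matrices from the TruncatedSVD answer above (the batch_size of 500 is an arbitrary choice, as long as it is at least n_components):

from sklearn.decomposition import IncrementalPCA

n_comp = 250
ipca = IncrementalPCA(n_components=n_comp, batch_size=500)  # batch_size >= n_components

# fit() processes the data in chunks of batch_size, so peak memory
# is bounded by the batch rather than a decomposition of the full matrix
ipca.fit(train_features)

train_features = ipca.transform(train_features)
test_features = ipca.transform(test_features)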
However, if you need a non-linear version like KernelPCA, there does not seem to be support for anything similar. KernelPCA's memory requirement explodes because it materializes the full n_samples x n_samples kernel matrix; see the article about Nonlinear Dimensionality Reduction on Wikipedia.
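To see how quickly that grows, a back-of-envelope estimate (the sample count is hypothetical, and float64 entries are assumed):

n_samples = 200_000  # hypothetical dataset size
# Dense n x n kernel matrix at 8 bytes per float64 entry
kernel_bytes = n_samples ** 2 * 8
print(kernel_bytes / 1e9)  # ~320 GB, before any eigendecomposition workspace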