pca.inverse_transform in sklearn
When I perform the inverse transformation, isn't it by definition supposed to return the original data?
No, you can only expect this if the number of components you specify is the same as the dimensionality of the input data. For any n_components less than this, you will get different numbers than the original dataset after applying the inverse PCA transformation: the following diagrams give an illustration in two dimensions.
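To see this numerically, here is a minimal sketch on made-up 3-D data (the random data, the seed, and the names `errors` and `X_back` are my own choices for illustration). It measures the reconstruction error for each possible n_components; only the full-rank case should come back to the original up to floating-point error.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.random((50, 3))  # toy data: 50 samples, 3 features

errors = {}
for k in (1, 2, 3):
    pca = PCA(n_components=k)
    X_back = pca.inverse_transform(pca.fit_transform(X))
    # Frobenius norm of the difference between original and reconstruction
    errors[k] = np.linalg.norm(X - X_back)
    print(f"n_components={k}: reconstruction error = {errors[k]:.3e}")
```

The error shrinks as components are added and is only numerically zero when n_components equals the number of features.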
It cannot do that: by reducing the dimensions with PCA, you have lost information (check pca.explained_variance_ratio_ for the fraction of the variance each retained component still explains). However, inverse_transform maps back to the original space as well as it can; see the picture below
(generated with
import numpy as np
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA

pca = PCA(1)  # keep a single principal component
X_orig = np.random.rand(10, 2)
# Project onto the first component, then map back to 2-D
X_re_orig = pca.inverse_transform(pca.fit_transform(X_orig))

plt.scatter(X_orig[:, 0], X_orig[:, 1], label='Original points')
plt.scatter(X_re_orig[:, 0], X_re_orig[:, 1], label='InverseTransform')
# Connect each original point to its reconstruction
for i in range(10):
    plt.plot([X_orig[i, 0], X_re_orig[i, 0]], [X_orig[i, 1], X_re_orig[i, 1]])
plt.legend()
plt.show()
)
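If you want a number rather than a picture, the lost information can be tied to explained_variance_ratio_ directly. The sketch below assumes the default PCA settings (no whitening) and uses my own variable names (`kept`, `lost`); it compares the retained variance fraction with the squared reconstruction error relative to the total squared deviation from the mean. The two should add up to 1.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.random((10, 2))  # same shape as the example above, fixed seed

pca = PCA(n_components=1)
X_back = pca.inverse_transform(pca.fit_transform(X))

# Fraction of the variance kept by the retained component
kept = pca.explained_variance_ratio_.sum()

# Fraction of the variance lost, measured from the reconstruction itself
sse = ((X - X_back) ** 2).sum()            # squared reconstruction error
total = ((X - X.mean(axis=0)) ** 2).sum()  # total squared deviation from the mean
lost = sse / total

print(f"kept = {kept:.4f}, lost = {lost:.4f}, sum = {kept + lost:.4f}")
```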
If you had kept the number of dimensions the same (set pca = PCA(2)), you would recover the original points (the new points lie on top of the original ones).
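A quick check of the full-rank case without plotting, as a rough sketch: it assumes the default whiten=False, in which case inverse_transform amounts to multiplying by components_ and adding mean_ back. The names `Z`, `manual`, `recovered` and `same_as_manual` are only for illustration.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.random((10, 2))

pca = PCA(n_components=2)  # as many components as input features
Z = pca.fit_transform(X)
X_back = pca.inverse_transform(Z)

# With whiten=False, the inverse map is a rotation back plus the mean
manual = Z @ pca.components_ + pca.mean_

recovered = np.allclose(X, X_back)
same_as_manual = np.allclose(manual, X_back)
print(recovered, same_as_manual)
```

With PCA(1) instead, `same_as_manual` still holds, but `recovered` does not, because the projection has already dropped one direction.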