PCA code example: train a model that projects vectors onto the lower-dimensional space spanned by the top k principal components

# PCA trains a model to project vectors to a lower dimensional space of the top k principal components

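import tempfile

from pyspark.ml.feature import PCA, PCAModel
from pyspark.ml.linalg import Vectors
from pyspark.sql import SparkSession

# Assumption: the original snippet relies on a pre-existing `spark` session and a
# writable `temp_path`; a local session and a fresh temp directory stand in here.
spark = SparkSession.builder.master("local[1]").appName("pca-example").getOrCreate()
temp_path = tempfile.mkdtemp()

# Three 5-dimensional feature vectors (one sparse, two dense)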
data = [(Vectors.sparse(5, [(1, 1.0), (3, 7.0)]),),
        (Vectors.dense([2.0, 0.0, 3.0, 4.0, 5.0]),),
        (Vectors.dense([4.0, 0.0, 0.0, 6.0, 7.0]),)]
df = spark.createDataFrame(data, ["features"])
pca = PCA(k=2, inputCol="features", outputCol="pca_features")
model = pca.fit(df)
model.transform(df).collect()[0].pca_features
# DenseVector([1.648..., -4.013...])
model.explainedVariance
# DenseVector([0.794..., 0.205...])
# Save the unfitted estimator and check that its parameters survive a reload
pcaPath = temp_path + "/pca"
pca.save(pcaPath)
loadedPca = PCA.load(pcaPath)
loadedPca.getK() == pca.getK()
# True
# Save the fitted model and check that the principal components survive a reload
modelPath = temp_path + "/pca-model"
model.save(modelPath)
loadedModel = PCAModel.load(modelPath)
loadedModel.pc == model.pc
# True
loadedModel.explainedVariance == model.explainedVariance
# True
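
The explained-variance figures can be sanity-checked without Spark. The following is a minimal plain-Python sketch, not Spark's implementation: it assumes the proportions reported by model.explainedVariance are the top eigenvalues of the sample covariance matrix divided by the total variance, and it finds those eigenvalues with power iteration and deflation. The helper names (covariance, top_eigen, explained_variance) are hypothetical and exist only for this illustration.

```python
# Plain-Python sketch (assumption: not how Spark computes it internally).
# Rows are the same three vectors as in the example, written out densely.
rows = [
    [0.0, 1.0, 0.0, 7.0, 0.0],  # dense form of Vectors.sparse(5, [(1, 1.0), (3, 7.0)])
    [2.0, 0.0, 3.0, 4.0, 5.0],
    [4.0, 0.0, 0.0, 6.0, 7.0],
]


def covariance(data):
    """Sample covariance matrix (divides by n - 1) of row-wise observations."""
    n, d = len(data), len(data[0])
    means = [sum(r[j] for r in data) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in data]
    return [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
            for i in range(d)]


def top_eigen(matrix, iterations=500):
    """Largest eigenvalue and unit eigenvector of a symmetric PSD matrix."""
    d = len(matrix)
    v = [1.0 / (i + 1) for i in range(d)]  # arbitrary non-zero start vector
    for _ in range(iterations):
        w = [sum(matrix[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    mv = [sum(matrix[i][j] * v[j] for j in range(d)) for i in range(d)]
    eigenvalue = sum(v[i] * mv[i] for i in range(d))  # Rayleigh quotient
    return eigenvalue, v


def explained_variance(data, k):
    """Proportion of total variance carried by each of the top k components."""
    cov = covariance(data)
    d = len(cov)
    total = sum(cov[i][i] for i in range(d))  # trace = total variance
    ratios = []
    for _ in range(k):
        lam, v = top_eigen(cov)
        ratios.append(lam / total)
        # Deflation: subtract the component just found so the next pass
        # converges to the next-largest eigenvalue.
        cov = [[cov[i][j] - lam * v[i] * v[j] for j in range(d)] for i in range(d)]
    return ratios


ratios = explained_variance(rows, k=2)
print([round(r, 3) for r in ratios])  # approximately [0.794, 0.206]
```

With only three observations the centered data has rank at most 2, so the two proportions sum to 1. The projected coordinates themselves are not reproduced here because the sign of each principal component is arbitrary and can differ between implementations.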