A feature transformer that merges multiple columns into a vector column code example

# A feature transformer that merges multiple columns into a vector column

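import tempfile

from pyspark.ml.feature import VectorAssembler
from pyspark.sql import SparkSession

# Assumed setup: the snippet expects an existing `spark` session and a writable `temp_path`.
spark = SparkSession.builder.getOrCreate()
temp_path = tempfile.mkdtemp()

df = spark.createDataFrame([(1, 0, 3)], ["a", "b", "c"])
vecAssembler = VectorAssembler(inputCols=["a", "b", "c"], outputCol="features")
vecAssembler.transform(df).head().features
# DenseVector([1.0, 0.0, 3.0])

# setParams changes the output column on the assembler itself.
vecAssembler.setParams(outputCol="freqs").transform(df).head().freqs
# DenseVector([1.0, 0.0, 3.0])

# A param map passed to transform overrides parameters for that call only.
params = {vecAssembler.inputCols: ["b", "a"], vecAssembler.outputCol: "vector"}
vecAssembler.transform(df, params).head().vector
# DenseVector([0.0, 1.0])

# Save the assembler and load it back; the loaded copy keeps outputCol="freqs".
vectorAssemblerPath = temp_path + "/vector-assembler"
vecAssembler.save(vectorAssemblerPath)
loadedAssembler = VectorAssembler.load(vectorAssemblerPath)
loadedAssembler.transform(df).head().freqs == vecAssembler.transform(df).head().freqs
# True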
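
To make the merging behaviour above easier to picture without a Spark session, here is a minimal plain-Python sketch of the same idea. It is only an illustrative model, not Spark's implementation: the `assemble` function and the `Value` alias are hypothetical names introduced here, and a row is assumed to be a plain dict. It concatenates the selected columns, in the order given, into one flat list of floats, expanding any sequence-valued column in place.

```python
from numbers import Number
from typing import Mapping, Sequence, Union

# Hypothetical alias: a column value is either a single number or a sequence of numbers.
Value = Union[Number, Sequence[float]]


def assemble(row: Mapping[str, Value], input_cols: Sequence[str]) -> list:
    """Concatenate the selected columns, in order, into one flat list of floats."""
    merged = []
    for name in input_cols:
        value = row[name]
        if isinstance(value, Number):
            # Scalar column: contributes a single entry.
            merged.append(float(value))
        else:
            # Sequence-valued column: contributes all of its entries, in order.
            merged.extend(float(x) for x in value)
    return merged


row = {"a": 1, "b": 0, "c": 3}
print(assemble(row, ["a", "b", "c"]))  # [1.0, 0.0, 3.0]
print(assemble(row, ["b", "a"]))       # [0.0, 1.0]
```

The two calls mirror the `inputCols` choices in the example above: the order of the input columns determines the order of the entries in the output vector.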

Tags:

Misc Example