Reduction of Multiclass Classification to Binary Classification code example

# Reduction of Multiclass Classification to Binary Classification

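from pyspark.sql import Row, SparkSession
from pyspark.ml.classification import LogisticRegression, OneVsRest
from pyspark.ml.linalg import Vectors

# Reuse the active Spark session (or start one); `sc` is predefined in the pyspark shell.
spark = SparkSession.builder.getOrCreate()
sc = spark.sparkContext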
df = sc.parallelize([
  Row(label=0.0, features=Vectors.dense(1.0, 0.8)),
  Row(label=1.0, features=Vectors.sparse(2, [], [])),
  Row(label=2.0, features=Vectors.dense(0.5, 0.5))]).toDF()
# Base binary classifier; OneVsRest fits one copy per class (that class vs. all others).
lr = LogisticRegression(maxIter=5, regParam=0.01)
ovr = OneVsRest(classifier=lr)
model = ovr.fit(df)
[x.coefficients for x in model.models]
# [DenseVector([3.3925, 1.8785]), DenseVector([-4.3016, -6.3163]), DenseVector([-4.5855, 6.1785])]
[x.intercept for x in model.models]
# [-3.64747..., 2.55078..., -1.10165...]
test0 = sc.parallelize([Row(features=Vectors.dense(-1.0, 0.0))]).toDF()
model.transform(test0).head().prediction
# 1.0
test1 = sc.parallelize([Row(features=Vectors.sparse(2, [0], [1.0]))]).toDF()
model.transform(test1).head().prediction
# 0.0
test2 = sc.parallelize([Row(features=Vectors.dense(0.5, 0.4))]).toDF()
model.transform(test2).head().prediction
# 2.0
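
The reduction that OneVsRest performs can also be sketched without Spark. The block below is a minimal illustration, not Spark's implementation: it fits one binary logistic-regression scorer per class (that class against all others) and predicts the class whose scorer returns the highest value. The helper names (`train_binary`, `train_one_vs_rest`, `predict`), the toy data, and the learning rate and epoch count are all assumptions made for this example.

```python
import math

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def train_binary(X, y, lr=0.1, epochs=2000):
    """Fit a binary logistic-regression scorer by batch gradient descent.

    y holds 0/1 targets. lr and epochs are illustrative values, not tuned.
    """
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def train_one_vs_rest(X, labels):
    """Train one binary model per class: that class (1) vs. all others (0)."""
    models = {}
    for c in sorted(set(labels)):
        binary_targets = [1.0 if label == c else 0.0 for label in labels]
        models[c] = train_binary(X, binary_targets)
    return models

def predict(models, x):
    """Return the class whose binary model gives the highest raw score."""
    def score(model):
        w, b = model
        return sum(wj * xj for wj, xj in zip(w, x)) + b
    return max(models, key=lambda c: score(models[c]))

# Toy data (hypothetical): three well-separated clusters in two dimensions.
X = [[0, 0], [1, 0], [0, 1],   # class 0.0
     [5, 0], [6, 1], [5, 1],   # class 1.0
     [0, 5], [1, 6], [1, 5]]   # class 2.0
labels = [0.0] * 3 + [1.0] * 3 + [2.0] * 3

models = train_one_vs_rest(X, labels)
print(len(models))                      # one binary model per class
print(predict(models, [6.0, 0.0]))      # query point near the class 1.0 cluster
```

Taking the highest raw score is one simple way to resolve cases where several binary models, or none of them, claim a point; other tie-breaking rules are possible.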

Tags:

Misc Example