Predicting multilabel data with sklearn

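You are unpacking the output of train_test_split() in the wrong order. It returns the train and test parts of the first array, followed by the train and test parts of the second. Change this line: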

train_x, train_y, test_x, test_y = train_test_split(x, y_enc, test_size=0.33)

To this:

train_x, test_x, train_y, test_y = train_test_split(x, y_enc, test_size=0.33)
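To see the return order for yourself, here is a minimal sketch. The arrays are placeholders I made up (3 feature columns, 4 label columns) so the two kinds of output are easy to tell apart by shape, and random_state is only there to make the split reproducible. The import uses sklearn.model_selection, which is where train_test_split lives in current scikit-learn releases.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder data: 5 samples, 3 feature columns, 4 label columns.
x = np.arange(15).reshape(5, 3)
y_enc = np.zeros((5, 4), dtype=int)

# Order of the outputs: x_train, x_test, y_train, y_test.
train_x, test_x, train_y, test_y = train_test_split(
    x, y_enc, test_size=0.33, random_state=0
)

# Feature splits keep 3 columns, label splits keep 4 columns.
print(train_x.shape, test_x.shape)
print(train_y.shape, test_y.shape)
```

If the names are unpacked in the wrong order, train_y ends up holding test features, which is why fit() then fails or complains about inconsistent sample counts.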

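Also, clf.predict_proba is only available if you create the classifier with SVC(probability=True) instead of SVC(). Note that classification_report expects class predictions rather than probabilities, so for the report you should change clf.predict_proba to clf.predict.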
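Here is a rough sketch of the difference between the two calls. It uses synthetic data from make_multilabel_classification rather than the data in the question (that is my assumption, chosen so there are enough samples for the probability calibration), and the sizes and random_state values are arbitrary.

```python
from sklearn.datasets import make_multilabel_classification
from sklearn.model_selection import train_test_split
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import SVC

# Synthetic multilabel data (illustration only, not the question's data).
X, Y = make_multilabel_classification(
    n_samples=100, n_features=5, n_classes=3, random_state=0
)
train_x, test_x, train_y, test_y = train_test_split(
    X, Y, test_size=0.33, random_state=0
)

# probability=True is what makes predict_proba available on SVC.
clf = OneVsRestClassifier(SVC(probability=True))
clf.fit(train_x, train_y)

labels = clf.predict(test_x)        # 0/1 indicator matrix, one column per label
probas = clf.predict_proba(test_x)  # per-label probability estimates
print(labels.shape, probas.shape)
```

Both results have one row per test sample and one column per label. Only the 0/1 matrix from predict is suitable as input to classification_report.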

Putting it all together:

from sklearn import metrics
from sklearn.preprocessing import MultiLabelBinarizer
from sklearn.multiclass import OneVsRestClassifier
# sklearn.cross_validation was removed in scikit-learn 0.20; use model_selection
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC


x = [[1,2,3],[3,3,2],[8,8,7],[3,7,1],[4,5,6]]
y = [['bar','foo'],['bar'],['foo'],['foo','jump'],['bar','fox','jump']]

# Encode the label lists as a binary indicator matrix, one column per label
mlb = MultiLabelBinarizer()
y_enc = mlb.fit_transform(y)

# Returned order: train/test for x, then train/test for y
train_x, test_x, train_y, test_y = train_test_split(x, y_enc, test_size=0.33)

clf = OneVsRestClassifier(SVC(probability=True))
clf.fit(train_x, train_y)
# classification_report needs class predictions, not probabilities
predictions = clf.predict(test_x)

my_metrics = metrics.classification_report(test_y, predictions)
print(my_metrics)

This gives me no errors when I run it.

I also ran into "ValueError: Multioutput target data is not supported with label binarization" with OneVsRestClassifier. In my case the cause was that the training data was a plain Python list; after converting it with np.array(), it worked.
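A minimal sketch of that conversion, assuming the labels are already in 0/1 indicator form. The y_list values below are hypothetical (they are the binarized version of the labels from the question), and whether a plain list actually triggers the error may depend on your scikit-learn version and your data, so treat this as the shape of the fix rather than a reproduction of the bug.

```python
import numpy as np
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import SVC

x = [[1, 2, 3], [3, 3, 2], [8, 8, 7], [3, 7, 1], [4, 5, 6]]
# Hypothetical indicator labels held in a plain Python list
# (columns: bar, foo, fox, jump).
y_list = [
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 1, 0, 0],
    [0, 1, 0, 1],
    [1, 0, 1, 1],
]

# Cast both inputs to NumPy arrays before fitting.
X = np.array(x)
Y = np.array(y_list)

clf = OneVsRestClassifier(SVC())
clf.fit(X, Y)
pred = clf.predict(X)
print(pred.shape)
```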
