Sklearn - How to predict probability for all target labels

You can do that by simply removing the OneVsRestClassifer and using predict_proba method of the DecisionTreeClassifier. You can do the following:

clf = DecisionTreeClassifier()
clf.fit(X_train, y_train)
pred = clf.predict_proba(X_test)

This will give you a probability for each of your 7 possible classes.

Hope that helps!


You can try using scikit-multilearn - an extension of sklearn that handles multilabel classification. If your labels are not overly correlated you can train one classifier per label and get all predictions - try (after pip install scikit-multilearn):

from skmultilearn.problem_transform import BinaryRelevance    
classifier = BinaryRelevance(classifier = DecisionTreeClassifier())

# train
classifier.fit(X_train, y_train)

# predict
predictions = classifier.predict(X_test)

Predictions will contain a sparse matrix of size (n_samples, n_labels) in your case - n_labels = 7, each column contains prediction per label for all samples.

In case your labels are correlated you might need more sophisticated methods for multi-label classification.

Disclaimer: I'm the author of scikit-multilearn, feel free to ask more questions.