Scikit-Learn Decision Tree: Probability of prediction being a or b?
You can do something like the following:
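from sklearn import tree

# Load data: each row is one sample with two numeric features
X = [[65, 9], [67, 7], [70, 11], [62, 6], [60, 7], [72, 13], [66, 10], [67, 7.5]]
Y = ["male", "female", "male", "female", "female", "male", "male", "female"]

# Build the model
clf = tree.DecisionTreeClassifier()

# Fit it to the training data
clf.fit(X, Y)

# Predict the class of two new samples
prediction = clf.predict([[68, 9], [66, 9]])

# Predict the class probabilities for the same samples
probs = clf.predict_proba([[68, 9], [66, 9]])

# Print the predicted classes, then the probabilities
print(prediction)
print(probs)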
Theory
The result of clf.predict_proba(X) is the predicted class probability for each sample, which is the fraction of training samples of each class in the leaf that the sample falls into.
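As a rough check of that definition, the sketch below recomputes the leaf fractions by hand and compares them with predict_proba. It assumes a depth-1 tree trained on the first feature only, which is a choice made here purely so that at least one leaf stays impure; the names X_height, query and manual are illustrative and not part of the original answer.

```python
import numpy as np
from sklearn import tree

X = np.array([[65, 9], [67, 7], [70, 11], [62, 6], [60, 7], [72, 13], [66, 10], [67, 7.5]])
Y = np.array(["male", "female", "male", "female", "female", "male", "male", "female"])

# Assumption for illustration: keep only the first feature and limit the depth,
# so the tree cannot separate the classes perfectly and a leaf stays impure.
X_height = X[:, [0]]
clf = tree.DecisionTreeClassifier(max_depth=1, random_state=0).fit(X_height, Y)

query = np.array([[68], [61]])

# apply() returns the index of the leaf that each sample ends up in.
leaf_of_train = clf.apply(X_height)
leaf_of_query = clf.apply(query)

# For each query sample, compute the fraction of training samples of each class
# that share its leaf, in the same column order as clf.classes_.
manual = np.array([
    [np.mean(Y[leaf_of_train == leaf] == cls) for cls in clf.classes_]
    for leaf in leaf_of_query
])

print(manual)
print(clf.predict_proba(query))
```

The two printed arrays should agree, since no sample weights or class weights are used here.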
Interpretation of the results:
The first print returns ['male' 'male'], so both rows of [[68,9],[66,9]] are predicted as male.
The second print returns:
[[ 0. 1.]
[ 0. 1.]]
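Each row corresponds to one input sample and each column to one class. The 1 in the second column means that both samples were predicted as male with probability 1. The probabilities are exactly 0 and 1 here because the tree is grown without a depth limit, so every leaf contains training samples of a single class.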
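To see which class each column refers to, check clf.classes_
Here it returns ['female', 'male'], so the first column is the probability of female and the second column the probability of male.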
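One way to avoid reading the columns by position is to pair them with clf.classes_ directly. The following is a minimal sketch using the same data as above; the names samples, labelled, male_col and p_male are introduced here for illustration only.

```python
from sklearn import tree

X = [[65, 9], [67, 7], [70, 11], [62, 6], [60, 7], [72, 13], [66, 10], [67, 7.5]]
Y = ["male", "female", "male", "female", "female", "male", "male", "female"]

clf = tree.DecisionTreeClassifier().fit(X, Y)

samples = [[68, 9], [66, 9]]
probs = clf.predict_proba(samples)

# Pair each probability with its class label, one dict per input sample.
labelled = [dict(zip(clf.classes_, row)) for row in probs]
print(labelled)

# Look up the column of a specific class instead of hard-coding its position.
male_col = list(clf.classes_).index("male")
p_male = probs[:, male_col]
print(p_male)
```

Looking up the column through clf.classes_ keeps the code correct if the set of labels changes, because the classes are stored in sorted order rather than in the order they appear in Y.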