Classification report with Nested Cross Validation in SKlearn (Average/Individual values)
This is just an addition to Sandipan's answer, as I couldn't edit it. If we want a single classification report averaged over a complete run of the cross-validation, instead of one report per fold, we can pool the true and predicted labels from every fold with the following code:
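from sklearn.metrics import classification_report, accuracy_score, make_scorer
from sklearn.model_selection import GridSearchCV, StratifiedKFold, cross_val_score

# Variables for average classification report
originalclass = []
predictedclass = []

# Make our custom scorer
def classification_report_with_accuracy_score(y_true, y_pred):
    originalclass.extend(y_true)    # collect the true labels of this fold
    predictedclass.extend(y_pred)   # collect the predicted labels of this fold
    return accuracy_score(y_true, y_pred)  # return accuracy score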
inner_cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=i)
outer_cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=i)
# Non_nested parameter search and scoring
clf = GridSearchCV(estimator=svr, param_grid=p_grid, cv=inner_cv)
# Nested CV with parameter optimization
nested_score = cross_val_score(clf, X=X_iris, y=y_iris, cv=outer_cv, scoring=make_scorer(classification_report_with_accuracy_score))
# Average values in classification report for all folds in a K-fold Cross-validation
print(classification_report(originalclass, predictedclass))
Now the result for the example in Sandipan's answer would look like this:
             precision    recall  f1-score   support

          0       1.00      1.00      1.00        50
          1       0.96      0.94      0.95        50
          2       0.94      0.96      0.95        50

avg / total       0.97      0.97      0.97       150
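For readers who want to run the pooled approach end to end, here is a minimal self-contained sketch. The iris data, the `SVC` estimator, the parameter grid, the seed of 0 and the helper name `pooled_nested_cv_report` are assumptions made for illustration; they are not taken from the answer above. The label lists live inside the helper so that repeated calls do not keep appending to the same global lists.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score, classification_report, make_scorer
from sklearn.model_selection import GridSearchCV, StratifiedKFold, cross_val_score
from sklearn.svm import SVC


def pooled_nested_cv_report(X, y, seed=0):
    """Run nested CV once and return (per-fold accuracies, pooled report)."""
    true_labels, predicted_labels = [], []

    def collecting_accuracy(y_true, y_pred):
        # Called once per outer fold: remember the labels, return accuracy.
        true_labels.extend(y_true)
        predicted_labels.extend(y_pred)
        return accuracy_score(y_true, y_pred)

    inner_cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=seed)
    outer_cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=seed)

    # Assumed estimator and grid, chosen only to make the sketch runnable.
    clf = GridSearchCV(
        estimator=SVC(kernel="rbf"),
        param_grid={"C": [1, 10, 100], "gamma": [0.01, 0.1]},
        cv=inner_cv,
    )
    scores = cross_val_score(
        clf, X, y, cv=outer_cv, scoring=make_scorer(collecting_accuracy)
    )
    # One report computed over the pooled predictions of all outer folds.
    return scores, classification_report(true_labels, predicted_labels)


X, y = load_iris(return_X_y=True)
scores, report = pooled_nested_cv_report(X, y)
print(scores)
print(report)
```

Note that this pattern relies on the scorer running in the calling process. With `n_jobs` set to run folds in separate worker processes, the lists in the parent process would most likely stay empty, so the sketch keeps the default sequential execution.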
We can define our own scoring function as below:
from sklearn.metrics import classification_report, accuracy_score, make_scorer
def classification_report_with_accuracy_score(y_true, y_pred):
    print(classification_report(y_true, y_pred))  # print classification report
    return accuracy_score(y_true, y_pred)  # return accuracy score
Now, just call cross_val_score with our new scoring function, using make_scorer:
# Nested CV with parameter optimization
nested_score = cross_val_score(clf, X=X_iris, y=y_iris, cv=outer_cv,
                               scoring=make_scorer(classification_report_with_accuracy_score))
print(nested_score)
This prints the classification report for each outer fold as text and, at the same time, returns that fold's accuracy as a number, which cross_val_score collects into nested_score.
When the scikit-learn nested cross-validation example (http://scikit-learn.org/stable/auto_examples/model_selection/plot_nested_cross_validation_iris.html) is run with this new scoring function, the last few lines of the output are as follows:
#             precision    recall  f1-score   support
#
#          0       1.00      1.00      1.00        14
#          1       1.00      1.00      1.00        14
#          2       1.00      1.00      1.00         9
#
#avg / total       1.00      1.00      1.00        37
#[ 0.94736842 1. 0.97297297 1. ]
#Average difference of 0.007742 with std. dev. of 0.007688.
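If the per-fold reports are needed as data rather than printed text, one option is to store them as dictionaries. The sketch below is a variation on the scorer above, not part of the original answer: it assumes scikit-learn 0.20 or later (for the `output_dict` argument of `classification_report`), and the iris data, the `SVC` estimator, the grid, the seed and the names `fold_reports` and `report_collecting_accuracy` are all illustrative choices.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score, classification_report, make_scorer
from sklearn.model_selection import GridSearchCV, StratifiedKFold, cross_val_score
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

fold_reports = []  # one dict per outer fold


def report_collecting_accuracy(y_true, y_pred):
    # Keep the fold's report as a nested dict instead of printing it.
    fold_reports.append(classification_report(y_true, y_pred, output_dict=True))
    return accuracy_score(y_true, y_pred)


inner_cv = StratifiedKFold(n_splits=4, shuffle=True, random_state=0)
outer_cv = StratifiedKFold(n_splits=4, shuffle=True, random_state=0)

# Assumed estimator and grid, used only to make the sketch runnable.
clf = GridSearchCV(
    estimator=SVC(kernel="rbf"),
    param_grid={"C": [1, 10, 100], "gamma": [0.01, 0.1]},
    cv=inner_cv,
)
nested_score = cross_val_score(
    clf, X, y, cv=outer_cv, scoring=make_scorer(report_collecting_accuracy)
)

# Individual values: the f1-score of each class in each fold.
# Average values: the mean of those per-fold f1-scores.
for label in ("0", "1", "2"):
    per_fold_f1 = [fold[label]["f1-score"] for fold in fold_reports]
    print(label, np.round(per_fold_f1, 2), round(float(np.mean(per_fold_f1)), 2))
```

The mean of per-fold scores computed this way can differ slightly from the scores of a single report built on the pooled predictions, because precision, recall and F1 are not linear in the fold sizes.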