scikit-learn: how to print labels for a confusion matrix?

According to the documentation, confusion_matrix does not seem to have an option for printing the row and column labels. However, you can specify the label order with the labels=... argument.

Example:

from sklearn.metrics import confusion_matrix

y_true = ['yes','yes','yes','no','no','no']
y_pred = ['yes','no','no','no','no','no']
print(confusion_matrix(y_true, y_pred))
# Output:
# [[3 0]
#  [2 1]]
print(confusion_matrix(y_true, y_pred, labels=['yes', 'no']))
# Output:
# [[1 2]
#  [0 3]]
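To make the effect of the label order concrete, here is a minimal plain-Python sketch that counts (true, predicted) pairs by hand. The helper name count_matrix is made up for illustration and this is not how sklearn implements confusion_matrix; it only reproduces the two results above, with rows following y_true and columns following y_pred.

```python
from collections import Counter

def count_matrix(y_true, y_pred, labels):
    """Count (true, predicted) pairs; rows follow y_true, columns follow y_pred."""
    pairs = Counter(zip(y_true, y_pred))
    return [[pairs[(t, p)] for p in labels] for t in labels]

y_true = ['yes', 'yes', 'yes', 'no', 'no', 'no']
y_pred = ['yes', 'no', 'no', 'no', 'no', 'no']

# Sorted unique labels give the order ['no', 'yes'], as in the first call above.
default_order = count_matrix(y_true, y_pred, sorted(set(y_true) | set(y_pred)))
# An explicit label list puts 'yes' first, as in the second call above.
custom_order = count_matrix(y_true, y_pred, ['yes', 'no'])

print(default_order)  # [[3, 0], [2, 1]]
print(custom_order)   # [[1, 2], [0, 3]]
```

Row i and column j always refer to the i-th and j-th entries of the label list, so any labels you attach when printing have to use that same list.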

If you want to print the confusion matrix with labels, you may try pandas and set the index and columns of the DataFrame.

import pandas as pd
cmtx = pd.DataFrame(
    confusion_matrix(y_true, y_pred, labels=['yes', 'no']), 
    index=['true:yes', 'true:no'], 
    columns=['pred:yes', 'pred:no']
)
print(cmtx)
# Output:
#           pred:yes  pred:no
# true:yes         1        2
# true:no          0        3

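import numpy as np

# Derive the labels from the data instead of hard-coding them; np.unique returns them sorted
unique_label = np.unique([y_true, y_pred])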
cmtx = pd.DataFrame(
    confusion_matrix(y_true, y_pred, labels=unique_label), 
    index=['true:{:}'.format(x) for x in unique_label], 
    columns=['pred:{:}'.format(x) for x in unique_label]
)
print(cmtx)
# Output:
#           pred:no  pred:yes
# true:no         3         0
# true:yes        2         1
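If pandas is not available, a labelled table can also be put together with plain string formatting. The sketch below is only one possible layout; format_cm is a hypothetical helper, not part of sklearn or pandas, and it assumes the matrix rows are in the same order as the labels list.

```python
def format_cm(matrix, labels):
    """Return a plain-text table; rows are true labels, columns are predictions."""
    row_names = ['true:{}'.format(x) for x in labels]
    col_names = ['pred:{}'.format(x) for x in labels]
    name_width = max(len(name) for name in row_names)
    # Header line: blank corner cell, then the column names.
    lines = [' ' * name_width + ''.join('  ' + name for name in col_names)]
    for name, row in zip(row_names, matrix):
        # Right-align each count under its column name.
        cells = ''.join('  ' + str(value).rjust(len(col))
                        for value, col in zip(row, col_names))
        lines.append(name.ljust(name_width) + cells)
    return '\n'.join(lines)

table = format_cm([[1, 2], [0, 3]], ['yes', 'no'])
print(table)
```

The nested list passed in here is the result of confusion_matrix(y_true, y_pred, labels=['yes', 'no']) from the example above, written out by hand.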

Since the confusion matrix is just a NumPy array, it does not carry any row or column labels. What you can do is convert the matrix into a DataFrame and print that instead.

import pandas as pd
import numpy as np

def cm2df(cm, labels):
    rows = []
    # rows
    for i, row_label in enumerate(labels):
        rowdata = {}
        # columns
        for j, col_label in enumerate(labels):
            rowdata[col_label] = cm[i, j]
        rows.append(pd.DataFrame.from_dict({row_label: rowdata}, orient='index'))
    # DataFrame.append was removed in pandas 2.0, so concatenate the rows once instead
    return pd.concat(rows)[labels]

cm = np.arange(9).reshape((3, 3))
df = cm2df(cm, ["a", "b", "c"])
print(df)

The code snippet is based on https://gist.github.com/nickynicolson/202fe765c99af49acb20ea9f77b6255e

Output:

   a  b  c
a  0  1  2
b  3  4  5
c  6  7  8

It is important that the row and column labels of your confusion matrix correspond exactly to the order in which sklearn has coded the classes. The actual order of the labels is available from the .classes_ attribute of the classifier. You can use the code below to prepare a confusion matrix DataFrame.

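# rfc is assumed to be the fitted classifier; class_label and
# class_label_predicted are the true and predicted labels
labels = rfc.classes_
conf_df = pd.DataFrame(
    confusion_matrix(class_label, class_label_predicted, labels=labels),
    columns=labels, index=labels
)
conf_df.index.name = 'True labels'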

The second thing to note is that your classifier is not predicting labels well. The number of correctly predicted labels is shown on the main diagonal of the confusion matrix. You have non-zero values across the matrix, and some classes have not been predicted at all (the columns that are all zero). It might be a good idea to run the classifier with its default parameters first and then try to optimise them.
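As a rough illustration of how to read those two signals from a matrix, the sketch below uses a small made-up 3-class confusion matrix (not the one from the question) and checks the diagonal and the all-zero columns by hand.

```python
# Hypothetical 3-class confusion matrix (rows: true labels, columns: predictions).
cm = [
    [5, 0, 2],
    [1, 0, 3],
    [0, 0, 4],
]

total = sum(sum(row) for row in cm)
# Correct predictions sit on the main diagonal.
correct = sum(cm[i][i] for i in range(len(cm)))
accuracy = correct / total

# A column that is all zero means that class was never predicted.
never_predicted = [j for j in range(len(cm)) if all(row[j] == 0 for row in cm)]

print(correct, total, accuracy)  # 9 15 0.6
print(never_predicted)           # [1]
```

Here the second class (index 1) is never predicted, which is the same pattern as the all-zero columns described above.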
