How to read the classifier confusion matrix in WEKA
Have you read the Wikipedia page on confusion matrices? The text around the matrix is arranged slightly differently in their example (row labels on the left instead of on the right), but you read it just the same.
The row indicates the true class, the column indicates the classifier output. Each entry, then, gives the number of instances of `<row>` that were classified as `<column>`. In your example, 15 Bs were (incorrectly) classified as As, 150 Bs were correctly classified as Bs, etc.
As a result, all correct classifications are on the top-left to bottom-right diagonal. Everything off that diagonal is an incorrect classification of some sort.
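If it helps to see that convention in code, here's a minimal sketch (plain Java, not Weka's own implementation; the classes and label pairs are made up) of how such a matrix gets filled in:

```java
// Rows are indexed by the true class, columns by the predicted class,
// so correct predictions land on the main diagonal.
public class ConfusionDemo {
    public static void main(String[] args) {
        String[] classes = {"A", "B"};
        // Hypothetical (true, predicted) label pairs.
        String[] actual    = {"A", "B", "B", "A", "B"};
        String[] predicted = {"A", "B", "A", "A", "B"};

        int[][] matrix = new int[classes.length][classes.length];
        for (int i = 0; i < actual.length; i++) {
            int row = indexOf(classes, actual[i]);     // true class -> row
            int col = indexOf(classes, predicted[i]);  // prediction -> column
            matrix[row][col]++;
        }

        // matrix[1][0] counts Bs classified as As (an error),
        // matrix[1][1] counts Bs classified as Bs (correct), etc.
        for (int r = 0; r < classes.length; r++) {
            for (int c = 0; c < classes.length; c++) {
                System.out.printf("%d %ss classified as %s%n",
                        matrix[r][c], classes[r], classes[c]);
            }
        }
    }

    private static int indexOf(String[] arr, String s) {
        for (int i = 0; i < arr.length; i++) {
            if (arr[i].equals(s)) return i;
        }
        throw new IllegalArgumentException("Unknown class: " + s);
    }
}
```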
Edit: The Wikipedia page has since switched the rows and columns around. This happens. When studying a confusion matrix, always check the labels to see whether it's true classes in rows and predicted classes in columns, or the other way around.
I'd put it this way:
The confusion matrix is Weka reporting on how good this J48 model is in terms of what it gets right and what it gets wrong.
In your data, the target variable was either "functional" or "non-functional"; the right side of the matrix tells you that column "a" is functional, and column "b" is non-functional.
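Putting the counts from your output back into Weka's layout, the block you're looking at would be something like this (exact spacing aside):

```
=== Confusion Matrix ===

   a   b   <-- classified as
 130   8 |   a = functional
  15 150 |   b = non-functional
```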
The columns tell you how your model classified your samples - it's what the model predicted:
- The first column contains all the samples which your model thinks are "a" - 145 of them, total
- The second column contains all the samples which your model thinks are "b" - 158 of them
The rows, on the other hand, represent reality:
- The first row contains all the samples which really are "a" - 138 of them, total
- The second row contains all the samples which really are "b" - 165 of them
Knowing the columns and rows, you can dig into the details:
- Top left, 130, are things your model thinks are "a" which really are "a" <- these were correct
- Bottom left, 15, are things your model thinks are "a" but which are really "b" <- one kind of error
- Top right, 8, are things your model thinks are "b" but which really are "a" <- another kind of error
- Bottom right, 150, are things your model thinks are "b" which really are "b" <- these were correct
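If you've seen the true/false positive/negative terminology, it maps straight onto these four cells. Weka's output doesn't use these names; this is just the standard mapping, taking "a" (functional) as the positive class:

```
                predicted "a"             predicted "b"
actual "a"   130 true positives (TP)     8 false negatives (FN)
actual "b"    15 false positives (FP)  150 true negatives (TN)
```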
So top-left and bottom-right of the matrix are showing things your model gets right - here, 130 + 150 = 280 of the 303 samples, or about 92.4%.
Bottom-left and top-right of the matrix are showing where your model is confused.
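If you want to regenerate this report outside the Explorer, a sketch using Weka's Java API looks like the following. The file name `pumps.arff` is a made-up placeholder for your dataset, and the cross-validation setup mirrors the Explorer's default:

```java
import java.util.Random;

import weka.classifiers.Evaluation;
import weka.classifiers.trees.J48;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class J48ConfusionMatrix {
    public static void main(String[] args) throws Exception {
        // Hypothetical path; any ARFF/CSV file Weka can load will do.
        Instances data = DataSource.read("pumps.arff");
        data.setClassIndex(data.numAttributes() - 1); // class attribute is last

        J48 tree = new J48();

        // 10-fold cross-validation, the Explorer's default test option.
        Evaluation eval = new Evaluation(data);
        eval.crossValidateModel(tree, data, 10, new Random(1));

        // Prints the "=== Confusion Matrix ===" block shown above.
        System.out.println(eval.toMatrixString());
        System.out.println(eval.toSummaryString());
    }
}
```

`toMatrixString()` prints the same confusion matrix block the Explorer shows, and `toSummaryString()` includes the "Correctly Classified Instances" line, which is just the diagonal counts again.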