What is "naive" in a naive Bayes classifier?

There's actually a very good example on Wikipedia:

In simple terms, a naive Bayes classifier assumes that the presence (or absence) of a particular feature of a class is unrelated to the presence (or absence) of any other feature, given the class variable. For example, a fruit may be considered to be an apple if it is red, round, and about 4" in diameter. Even if these features depend on each other or upon the existence of the other features, a naive Bayes classifier considers all of these properties to independently contribute to the probability that this fruit is an apple.

Basically, it's "naive" because it assumes the features are independent of one another given the class, an assumption that may or may not hold in the real data.


If your data is composed of a feature vector X = {x1, x2, ..., x10} and your class labels y = {y1, y2, ..., y5}, a Bayes classifier identifies the correct class label as the one that maximizes the following formula:

P(y|X) ∝ P(X|y) * P(y) = P(x1,x2,...,x10|y) * P(y)

(By Bayes' theorem, P(y|X) = P(X|y) * P(y) / P(X), but the denominator P(X) is the same for every class, so it can be dropped when comparing classes.)

So far, nothing is naive. However, P(x1,x2,...,x10|y) is hard to estimate directly, so we assume the features are conditionally independent given the class. This is the naive assumption, and with it we end up with the following formula instead:

P(y|X) ∝ P(x1|y) * P(x2|y) * ... * P(x10|y) * P(y)
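The factorized formula above can be sketched in a few lines of Python. This is a minimal illustration using made-up toy data (the fruit features and labels are invented for the example), estimating P(y) and each P(xi|y) by simple counting, with no smoothing:

```python
from collections import Counter, defaultdict

# Toy training data (made up for illustration): each row is a feature
# vector (color, shape) with a class label.
data = [
    (("red", "round"), "apple"),
    (("red", "round"), "apple"),
    (("green", "round"), "apple"),
    (("yellow", "long"), "banana"),
    (("yellow", "long"), "banana"),
    (("green", "long"), "banana"),
]

# Estimate P(y) and P(xi|y) from counts.
class_counts = Counter(label for _, label in data)
feature_counts = defaultdict(Counter)  # (label, feature index) -> value counts
for features, label in data:
    for i, value in enumerate(features):
        feature_counts[(label, i)][value] += 1

def predict(features):
    """Return the class maximizing P(y) * prod_i P(xi|y)."""
    best_label, best_score = None, -1.0
    for label, count in class_counts.items():
        score = count / len(data)  # P(y)
        for i, value in enumerate(features):
            # P(xi|y), estimated by counting (no smoothing, for brevity)
            score *= feature_counts[(label, i)][value] / count
        if score > best_score:
            best_label, best_score = label, score
    return best_label

print(predict(("red", "round")))  # -> apple
```

Note that a real implementation would add Laplace smoothing so that an unseen feature value does not zero out the whole product, and would multiply in log space to avoid underflow with many features.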