What is the difference between a Bayesian network and a naive Bayes classifier?
Naive Bayes is just a restricted/constrained form of a general Bayesian network where you enforce the constraint that the class node should have no parents and that the nodes corresponding to the attribute variables should have no edges between them. As such, there is nothing that prevents a general Bayesian network from being used for classification - the predicted class is the one with the maximum probability when (conditioned on) all the other variables are set to the prediction instance values in the usual Bayesian inference fashion. A good paper to read on this is "Bayesian Network Classifiers, Machine Learning, 29, 131–163 (1997)". Of particular interest is section 3. Though Naive Bayes is a constrained form of a more general Bayesian network, this paper also talks about why Naive Bayes can and does outperform a general Bayesian network in classification tasks.
Short answer, if you're only interested in solving a prediction task: use Naive Bayes.
A Bayesian network (has a good wikipedia page) models relationships between features in a very general way. If you know what these relationships are, or have enough data to derive them, then it may be appropriate to use a Bayesian network.
A Naive Bayes classifier is a simple model that describes particular class of Bayesian network - where all of the features are class-conditionally independent. Because of this, there are certain problems that Naive Bayes cannot solve (example below). However, its simplicity also makes it easier to apply, and it requires less data to get a good result in many cases.
Example: XOR
You have a learning problem with binary features x1
and x2
and a target variable y = x1 XOR x2
.
In a Naive Bayes classifier, x1
and x2
must be treated independently - so you would compute things like "The probability that y = 1
given that x1 = 1
" - hopefully you can see that this isn't helpful, because x1 = 1
doesn't make y = 1
any more or less likely. Since a Bayesian network does not assume independence, it would be able to solve such a problem.