Can someone please explain 3-CNF for me?

The point of 3CNF is that it is a "normal form" for formulas: as you mention, every formula is equivalent, up to a quadratic (linear?) blowup, to a 3CNF formula. 3CNF formulas are "simple", and so more easy to deal with. In particular, if you ever read about NP-completeness, you will find out that we want our to put our "challenging problems" in as simple a form as possible. This makes it easier both to design and analyze algorithms solving these problems, and to prove that other problems are also difficult by reducing 3CNF to them (showing how to solve 3CNF using an algorithm for them).

We care specifically about 3CNF for historical reasons, it was the first (or one of the first) NP-complete problems (check out Cook's paper or Karp's paper on their respective pages). Also, 2CNF is not "complete" (arbitrary formulas cannot are not equivalent to a 2CNF), and it is easy to determine whether they are satisfiable or not (google if interested).

The conversion from CNF to 3CNF is best explained by an example. We convert each clause separately. The clause $A \lor B \lor C \lor D \lor E$ is equivalent to the 3CNF $$(A \lor B \lor x_1) \land (\lnot x_1 \lor C \lor x_2) \land (\lnot x_2 \lor D \lor E), $$ in the sense that the original formula (in this case, a single clause) is satisfiable iff the converted formula is. You convert each of the clauses, and take their conjunction.

Suppose you have an arbitrary formula using the connectives $\lnot,\lor,\land$. Picture it as a tree, where inner nodes are labeled with $\lnot,\lor,\land$, and each node has either one ($\lnot$) or two ($\lor,\land$) children. I'm sorry I can't provide a picture. The first step is "pushing the negations to the leaves" using de Morgan's rules (q.v.). That rids us of all $\lnot$ nodes, but now leaves may be literals (variables or negations of variables). Now we convert the formula to a CNF recursively.

Denote by $\varphi$ the function we're constructing, that takes a formula as above (with $\land,\lor$ and possibly negated variables) and returns a CNF. For the base case, we have $\varphi(x) = x$ and $\varphi(\lnot x) = \lnot x$. For a formula of the form $A \land B$, we don't have to work hard: we define $\varphi(A \land B) = \varphi(A) \land \varphi(B)$. The hardest case is formulas of the form $A \lor B$: a somewhat economical choice is $$\varphi(A \lor B) = (y \lor \varphi(A)) \land ((\lnot y) \lor \varphi(B)), $$ where $y$ is a new variable, and $y \lor \varphi(A)$ means adding $y$ to all the clauses.

General formulas with arbitrary connectives can be handled by expressing the connectives using $\lor,\land,\lnot$, for example by writing a truth-table; this can be quite wasteful, though.

The entire construction results in a quadratic blowup in formula size (your "occurrences of variables"; there are many other ways to define formula size). However, the result you're quoting is only a linear blowup (indeed, by a factor of $24$). It is entirely possible that such a construction exists, but I'm not aware of it; perhaps one of the readers is. The SAT to 3SAT part has a linear blowup with a factor of $3$.


If you do not allow adding new variables, then no simple conversion is possible. While it is always possible to convert an arbitrary formula into a CNF (using the "truth table" approach), not every formula is convertible to a 3CNF without adding new variables. An example is parity on $4$ variables. Also, the conversion from an arbitrary formula to a CNF can result in exponential blow-up, for example for parity on $n$ variables we go from $\Theta(n^2)$ to $\Theta(n2^n)$.


How can one convert CNF to 3CNF?

(just to add to the previous answer)

41 minute of Ullman lecture - conversion of CNF to 3CNF (with proof): http://www.youtube.com/watch?v=s9P33IgjwUA

The origin of the lecture is http://www.coursera.org and it actually doesn't answer your other questions.


Here you have to be careful. NOT every formula is equivalent to a 3CNF formula. A simple example of that would be $p\vee q\vee r\vee s$, where $p,q,r,s$ are propositional variables. Now, let's answer the questions:

  • Usage of 3CNF: so if not every formula can be written equivalently in 3CNF, why is 3CNF useful? Because 3SAT, the problem of deciding if a 3CNF formula is satisfiable, is an NP-complete problem, just as SAT. So, in particular, if you want to know if a formula $\phi$ can be satisfied, you can construct a formula $\psi$ in 3CNF such that $\phi$ is satisfiable if and only if $\psi$ is satisfiable. That is NOT the same as saying they are equivalent (there might exist some interpretation that renders $\psi$ true but $\phi$ not true). The fact that 3SAT is NP-complete is very useful, since you can reduce 3SAT to other problems and thus show that they in turn are NP-complete (or at least, NP-hard). And it can be MUCH easier to reduce from 3SAT instead of SAT, since 3SAT has a lot more structure: it's a normal form, you already know the form it has.
  • How to do the conversion: "conversion" being what is explained above, i.e., given formula $\phi$, construct formula $\psi$ in 3CNF such that $\phi$ is satisfiable iff $\psi$ is satisfiable. For that you need to use a little trick, and it is explained in the link posted in another answer, around minute 41 of the video: https://www.youtube.com/watch?v=s9P33IgjwUA It is also explained here: https://www.quora.com/Why-is-3-SAT-NP-complete
  • This will be much clearer once you have understood the "conversion" from CNF to 3CNF (we are assuming that the original formula was in CNF).