Sudoku backtracking algorithm

Fast algorhitm for solving sudoku is Algorithm X by Donald Knuth. You represent solving sudoku as exact cover problem and then use Algorithm X for solving EC problem. Then use DLX as efficient implementation of Algorithm X.

There is great explanation on wikipedia on how to apply exact cover for solving sudoku.

I can tell you that DLX is extremely fast fost solving sudoku in is commonly used in fastest algorhitm.

http://www.setbb.com/phpbb/index.php?mforum=sudoku is great forum whit probably best sudoku programmers.


Between filling the squares with only one choice and going full recursive on the board there are more advanced actions you can do. Lets take that "region" is one row, or one column, or one square region (3x3 or 4x4).

Tactic 1

If there are K squares in a region that can take only identical K numbers (for instance two squares that can take only 2 an 5, or three squares that can take only 1, 7 and 8) then all other squares in that region can't take those specific numbers. You need to iterate each region to weed out "taken" numbers, so you can find a square with only one logical choice (for instance third square with 2, 4 and 5 logically can take only 4, or fourth square with 1, 3, 7 and 8 logically can take only 3).

This has to bi solved with iteration if you consider the following example. A region has squares with this possible numbers:

A: 1 2 3
B: 2 3
C: 2 3 4 5
D: 4 5
E: 4 5

The algorithm should detect that squares D and E hold numbers 4 and 5, so 4 and 5 are excluded from other squares in the region. The algorithm then detects that squares B and C hold numbers 2 and 3, and so excludes them from other squares. This leaves square A with only number 1.

Tactic 2

If a number occurs in the region in only one square then logically that square holds that number.

Tactic 3

Tactics 1 and 2 are only special cases of Tactic 3 having K squares with only K identical numbers. You can have K squares and a set of K numbers and those K squares can hold any subset of those K numbers. Consider the following example of a region:

A: 1 2
B: 2 3
C: 1 3
D: 1 2 3 4

Squares A, B and C can hold only numbers 1, 2 and 3. That's K for K. That means that any other square can't logically hold these numbers, which leaves square D with only number 4.

Tactic 2 is special case of Tactic 3 when K = N - 1.

Tactic 4

Take advantage of regions overlap. Suppose that some number can exist only in certain squares of the region. If all those squares belong to another overlapping region then that number should be excluded from all other squares in this other region.

Tactic 5

Cache results. All regions should have a "dirty" flag that denotes that something in the region has changed from the last time the region is processed. You don't have to process the region with this flag not set.


Human beings use all those tactics, and really hate to guess a number, because backtracking is a real pain. Actually, the difficulty of a board is measured with the minimum number of guesses one has to make to solve the board. For most "extreme" boards one good guess is enough.