Determining the number of valid TicTacToe board states in terms of board dimension

Fun question.

I don't have enough karma to add this as a comment, so I'll offer it up as an answer, although it is (currently) an incomplete one.

Some comments

The reference you link to for the $N=3$ case does not say that there are $255,168$ valid 3-by-3 tic-tac-toe boards, but that there are $255,168$ distinct 3-by-3 tic-tac-toe games.

That is, if you make a "game tree" where each (valid) board is a node and each move is an edge, you're asking how many nodes are in the tree. The Wikipedia article states that there are $255,168$ distinct paths through the tree (from the root to a leaf), but there are substantially fewer nodes. In fact we can set an upper bound on the number of distinct 3-by-3 tic-tac-toe boards by ignoring the rules of the game and noting that there are only $3^9$ ways to fill a 9x9 board with 3 tokens (blank, X and O), which is only $19,683$. And many of those have too many Xs or Os to be valid.

(The Wikipedia article is also taking symmetry into account, which if I understand you correctly, you want to ignore.)

I happen to be working on a program that generates this game tree for entirely different reasons.

The program may be buggy, so take the numbers above with a grain of salt, but it looks right to me on inspection. I believe I've seen other pages online that corroborate the $N=3$ case, but at the time I was only trying to validate that my program was working properly, so I didn't bother the record the link.

Currently my program runs out of memory (on my little netbook) when I try to generate the game tree for $N=4$, but I imagine that's fixable with a little bit beefier hardware and/or a little optimization of the program for memory footprint (currently I'm storing a lot of metadata about each board state).

Circumstances permitting, I'll try come back and update this answer with more information if and when I have some to share, but I wanted to put some hard numbers (and specific constraints) on the table because I think it clears up some of the confusion in the comment thread.

Assumptions

To clarify the constraints on the problem, here's what I'm assuming:

  1. We're interested in counting the number of valid "game states" on an $N$-by-$N$ tic-tac-toe board.
  2. We're ignoring symmetry, so that a board with one X in the upper-left corner is considered a distinct board from the one with an X in the lower-right corner.
  3. X always moves first.
  4. The game stops with either player makes a horizontal, vertical or diagonal line of length $N$. Any boards that can only be reached by continuing to play after one of the players has made a line are considered invalid and ignored.
  5. (The game also stops when we run out of free spaces to move, of course.)

Some data based on enumeration

Under these constraints, as you wrote, we can confirm that:

  • for $N=1$ there are $2$ valid and distinct boards
  • for $N=2$ there are $29$ valid and distinct boards

Based on (programmatic) inspection, I think we can say that:

  • for $N=3$ there are $5,478$ valid and distinct boards
  • for $N=4$ there are $9,722,011$ valid and distinct boards

If you break these down by ply (turn):

      N:   1   2     3        4
 -------   -  --  ----  -------
 ply  0:   1   1     1        1
 ply  1:   1   4     9       16
 ply  2:   -  12    72      240
 ply  3:   -  12   252     1680 
 ply  4:   -   0   756    10920
 ply  5:   -   -  1260    43680
 ply  6:   -   -  1520   160160 
 ply  7:   -   -  1140   400400
 ply  8:   -   -   390   895950
 ply  9:   -   -    78  1433520
 ply 10:   -   -     -  1962576
 ply 11:   -   -     -  1962576          
 ply 12:   -   -     -  1543080
 ply 13:   -   -     -   881760
 ply 14:   -   -     -   333792
 ply 15:   -   -     -    83440
 ply 16:   -   -     -     8220 
 =======   =  ==  ====  =======  
  TOTAL:   2  29  5478  9722011
 -------  

I don't see an obvious formula for the sequence $(2,29,5478,9722011,...)$, but a few interesting (IMO) observations about this:

  • $N=3$ is the smallest board for which player O can win

  • $N=3$ is the smallest board that can end in a tie game

  • $N=2$ is likely the only board that cannot be filled (X is guaranteed to win on the third move, leaving one slot unfilled)

  • Both of the even examples have two plies in a row with the exactly same number of boards (for $N=2$ plies 2 and 3 have $12$ boards and for $N=4$ plies 10 and 11 have $1,962,576$). These are also the "widest" plies in their respective trees. (The same is true for $N=1$, but I imagine that's a degenerate case.)

  • (This didn't hold for $N=4$.) It may just be the law of small numbers, but I notice that the "widest" tier of the game tree is the one where there are $N$ empty spaces left on the board. For $N=1$, this is ply 0 with $1$ board, for $N=2$ this is ply 2 with $12$ boards and for $N=3$ this is ply 6, with $1,520$ boards.

  • and of course, at $N=1$, player O doesn't even get to move.

Looking at upper bounds

By the way, as a very rough sanity check, I compared the number of valid boards with $3^{(N^2)}$ (the number of distinct ways to fill an $N\times{}N$ board with 3 symbols):

       N:   1     2        3          4
 -------- ----  ----  -------  ----------
 # Valid:  2    29     5478     9722011
 3^(N^2):  3    81    19683    43046721
 % Valid: 66.7  35.8     27.8        22.6

We can get a tighter upper bound if we look at the number of ways to fill an $N\times{}N$ board with $count(X) = count(O)$ or $count(X) = count(O)+1$.

That's actually not the hard to figure out if you change your thinking a little bit. Instead of alternating between X and O, imagine you put down all the Xs first, then all the Os.

  • On the zeroth ply, there are no Xs or Os, so we always have 1 board. (Note that this is $N^2$ choose $0$, written ${{N^2} \choose 0}$).

  • On the first ply, there is exactly one X, so we have ${{N^2} \choose 1}$ distinct boards.

  • On the second ply, there is exactly one X and exactly one O, so we have ${{N^2} \choose 1} \times {((N^2)-1) \choose 1}$ boards.

  • On the third ply, there are two Xs and one O, so we have ${{N^2} \choose 2} \times {{((N^2)-2)} \choose 1}$ boards.

  • On the fourth ply, there are two Xs and two Os, so we have ${{N^2} \choose 2} \times {{((N^2)-2)} \choose 2}$ boards.

And so on.

In the general case

Note that on ply $p$ there will be $floor({{(p+1)}\over{2}})$ Xs and $floor({{p}\over{2}})$ Os on the board, so let's say:

  • $x = floor({{(p+1)}\over{2}})$

and

  • $o = floor({{p}\over{2}})$

Note that:

  • There are ${{N^2} \choose x}$ ways to place $x$ X marks on an $N\times{}N$ board.

  • There are ${{N^2 - x} \choose o}$ ways to place $o$ O marks on an $N\times{}N$ board that is already filled with $x$ Xs.

Hence ignoring winners there are:

${{N^2} \choose x} \times {{N^2 - x} \choose o}$

distinct boards at ply $p$. Or (plugging the definitions of $x$ and $o$ from above):

${{N^2} \choose {floor({{p+1}\over{2}})}} \times {{N^2 - {floor({{p+1}\over{2}})}} \choose {floor({p \over 2})}}$

So an upper bound on the number of distinct boards would the be summation of that ugly formula over $p = 0$ to $p = N^2$.

I imagine that the clever application of algebra could dramatically simplify that expression (especially when you plug the definition of $n \choose k$ in for the choose notation).

To get an even better count we could take into account the winning boards.