Average Scrabble graph structure: diameter?
Methods
Starting from the base url http://www.cross-tables.com/annotated.php?a=1 I used a combination of Python's urllib
, multiprocessing
and BeautifulSoup
to extract the first 10000 games. The games were parsed and turned into numpy
15x15 Boolean matrices. The matrices were then turned into graphs making an edge if two adjacent cells in the matrix were both active. Graph properties were then analysed with networkx
Of the 10000 games, only 9966 were usable. Some games did not start on the center title, while others ended so quickly and strangely they did not behave properly. Fortunately these games were rare enough that the sample should give a robust estimation of the true distributions.
Methods (Update)
There was a bit more data cleanup needed. I had not taken into consideration challenged moves, leading to games that had >100 tiles used. In the process, I noted false moves and fake games. We may have to live with a bit of uncertainty in the data, such is the cost of true empirical data.
Results
The first interesting piece of information is the board frequency - this gives a nice spatial connection to the graphs we are about to study. Notice that, due to how the game is played (and how we read left to right and top to bottom) the board is asymmetric.
From here we can answer the question,
"What is the distribution of graph diameters and radii for an average Scrabble game?"
A scatter plot versus the graph size reveals a bit more information for the smaller $N$ values:
Results (Update)
Based of a comment, I plotted the radius vs the diameter, giving a mostly linear relationship of 1 to 2 except for a range of games with some variance. Feel free to make some observations on the significance of this in the comments.
Quick Conclusion (TLDR)
From the data studied, there were mostly full games played ~100 tiles with an average graph radius 18 and a diameter of 36. Further work is needed to compare these results to random graphs with the same size and edge counts but different edge distribution.
It turns out that the maximum possible diameter uses the full hundred tiles! My friend Carl illustrated this in 2008 in a game he made up, here: http://www.cross-tables.com/annotated.php?u=2493#0# .
Thanks for the question!
If you require the viable dictionaries that are allowable for scrabble, they are available on the http://www.lexicalwordfinder.com/about/ site (Lexical Word Finder website).
An interesting experiment: Assuming most people do not play expertly, you can use the generated list from Lexical Word finder to play medium words to see what the average graphs look like. It would be nice to also see what the "best" moves at each turn produce as well.
Interesting use of networkx.