Guess the word (aka Lingo)
Java, 249700 points (beats Chinese Perl Goth in my test)
Updated ranklist:
4 5 6 7 8 9 10 11 12 13 Total perl chinese_perl_goth.pl 6700 12300 16900 19200 23000 26100 28500 29600 32100 33900 228300 java Lingo 9400 14700 18900 21000 26300 28700 30300 32400 33800 34200 249700
Here is the old ranklist using pit.rb
:
4 5 6 7 8 9 10 11 12 13 Total ruby player-example.rb 200 400 400 500 1800 1400 1700 1600 3200 4400 15600 ruby player-example2.rb 2700 3200 2500 4300 7300 6300 8200 10400 13300 15000 73200 ruby player-example3.rb 4500 7400 9900 13700 15400 19000 19600 22300 24600 27300 163700 perl chinese_perl_goth.pl 6400 14600 16500 21000 22500 26000 27200 30600 32500 33800 231100 java Lingo 4800 13100 16500 21400 27200 29200 30600 32400 33700 36100 245000 ** Rankings ** 1: java Lingo (245000) 2: perl chinese_perl_goth.pl (231100) 3: ruby player-example3.rb (163700) 4: ruby player-example2.rb (73200) 5: ruby player-example.rb (15600)
Compared to @chineseperlgoth, I lose in shorter words (< 6 chars) but I win in longer words (>= 6 chars).
The idea is similar to @chineseperlgoth, it's just that my main idea is about finding the guess (can be any word of the same length, not necessarily one of the remaining possibilities) which gives the most information for the next guess.
Currently I'm still playing with the formula, but for the scoreboard above, I choose the word which will yield the minimum for:
-num_confusion * entropy
The latest version uses different scoring to find next best guess, which is maximizing the number of "single possibility" after the current guess. This is done by trying all words in pruned wordlist (to save time) on all possible candidates, and see which guess is more probable to produce "single possibility" (that is, after this guess there will only be one possible answer) for the next guess.
So for example this run:
Starting new round, word is boons Got: seora Sent: ?XOXX Got: topsl Sent: XOX?X Got: monks Sent: XO?XO Got: bewig Sent: OXXXX Got: boons Sent: OOOOO Round won with score 100
From the first three guesses, we already got "*oo*s" with an "n" somewhere and we still need to figure out one more letter. Now the beauty of this algorithm is that instead of guessing words which are similar to that form, it instead guesses the word which has no relation to previous guesses at all, trying to give more letters, hopefully revealing the missing letter. In this case it happens to also get the position for the missing "b" correctly, and concludes with the correct final guess "boons".
Here is the code:
import java.util.*;
import java.io.*;
class Lingo{
public static String[] guessBestList = new String[]{
"",
"a",
"sa",
"tea",
"orae",
"seora", // 5
"ariose",
"erasion",
"serotina",
"tensorial",
"psalterion", // 10
"ulcerations",
"culteranismo",
"persecutional"};
public static HashMap<Integer, ArrayList<String>> wordlist = new HashMap<Integer, ArrayList<String>>();
public static void main(String[] args){
readWordlist("wordlist.txt");
Scanner scanner = new Scanner(System.in);
int wordlen = Integer.parseInt(args[0]);
int roundNum = 5;
ArrayList<String> candidates = new ArrayList<String>();
candidates.addAll(wordlist.get(wordlen));
String guess = "";
while(roundNum-- > 0){
guess = guessBest(candidates, roundNum==4, roundNum==0);
System.out.println(guess);
String response = scanner.nextLine();
if(isAllO(response)){
break;
}
updateCandidates(candidates, guess, response);
//print(candidates);
}
}
public static void print(ArrayList<String> candidates){
for(String str: candidates){
System.err.println(str);
}
System.err.println();
}
public static void readWordlist(String path){
try{
BufferedReader reader = new BufferedReader(new FileReader(path));
while(reader.ready()){
String word = reader.readLine();
if(!wordlist.containsKey(word.length())){
wordlist.put(word.length(), new ArrayList<String>());
}
wordlist.get(word.length()).add(word);
}
} catch (Exception e){
System.exit(1);
}
}
public static boolean isAllO(String response){
for(int i=0; i<response.length(); i++){
if(response.charAt(i) != 'O') return false;
}
return true;
}
public static String getResponse(String word, String guess){
char[] wordChar = word.toCharArray();
char[] result = new char[word.length()];
Arrays.fill(result, 'X');
for(int i=0; i<guess.length(); i++){
if(guess.charAt(i) == wordChar[i]){
result[i] = 'O';
wordChar[i] = '_';
}
}
for(int i=0; i<guess.length(); i++){
if(result[i] == 'O') continue;
for(int j=0; j<wordChar.length; j++){
if(result[j] == 'O') continue;
if(wordChar[j] == guess.charAt(i)){
result[i] = '?';
wordChar[j] = '_';
break;
}
}
}
return String.valueOf(result);
}
public static void updateCandidates(ArrayList<String> candidates, String guess, String response){
for(int i=candidates.size()-1; i>=0; i--){
String candidate = candidates.get(i);
if(!response.equals(getResponse(candidate, guess))){
candidates.remove(i);
}
}
}
public static int countMatchingCandidates(ArrayList<String> candidates, String guess, String response){
int result = 0;
for(String candidate: candidates){
if(response.equals(getResponse(candidate, guess))){
result++;
}
}
return result;
}
public static String[] getSample(ArrayList<String> words, int size){
String[] result = new String[size];
int[] indices = new int[words.size()];
for(int i=0; i<words.size(); i++){
indices[i] = i;
}
Random rand = new Random(System.currentTimeMillis());
for(int i=0; i<size; i++){
int take = rand.nextInt(indices.length-i);
result[i] = words.get(indices[take]);
indices[take] = indices[indices.length-i-1];
}
return result;
}
public static String guessBest(ArrayList<String> candidates, boolean firstGuess, boolean lastGuess){
if(candidates.size() == 1){
return candidates.get(0);
}
String minGuess = candidates.get(0);
int wordlen = minGuess.length();
if(firstGuess && guessBestList[wordlen].length()==wordlen){
return guessBestList[wordlen];
}
int minMatches = Integer.MAX_VALUE;
String[] words;
if(lastGuess){
words = candidates.toArray(new String[0]);
} else if (candidates.size()>10){
words = bestWords(wordlist.get(wordlen), candidates, 25);
} else {
words = wordlist.get(wordlen).toArray(new String[0]);
}
for(String guess: words){
double sumMatches = 0;
for(String word: candidates){
int matches = countMatchingCandidates(candidates, guess, getResponse(word, guess));
if(matches == 0) matches = candidates.size();
sumMatches += (matches-1)*(matches-1);
}
if(sumMatches < minMatches){
minGuess = guess;
minMatches = sumMatches;
}
}
return minGuess;
}
public static String[] bestWords(ArrayList<String> words, ArrayList<String> candidates, int size){
int[] charCount = new int[123];
for(String candidate: candidates){
for(int i=0; i<candidate.length(); i++){
charCount[(int)candidate.charAt(i)]++;
}
}
String[] tmp = (String[])words.toArray(new String[0]);
Arrays.sort(tmp, new WordComparator(charCount));
String[] result = new String[size+Math.min(size, candidates.size())];
String[] sampled = getSample(candidates, Math.min(size, candidates.size()));
for(int i=0; i<size; i++){
result[i] = tmp[tmp.length-i-1];
if(i < sampled.length){
result[size+i] = sampled[i];
}
}
return result;
}
static class WordComparator implements Comparator<String>{
int[] charCount = null;
public WordComparator(int[] charCount){
this.charCount = charCount;
}
public Integer count(String word){
int result = 0;
int[] multiplier = new int[charCount.length];
Arrays.fill(multiplier, 1);
for(char chr: word.toCharArray()){
result += multiplier[(int)chr]*this.charCount[(int)chr];
multiplier[(int)chr] = 0;
}
return Integer.valueOf(result);
}
public int compare(String s1, String s2){
return count(s1).compareTo(count(s2));
}
}
}
C++ 267700 Points
A porting from an old MasterMind engine.
Differences from MasterMind:
- More slots
- More symbols
- Bigger solution space (but non so much, because not all symbols combination allowed)
- The response is much informative, so we have more information after each guess
- The response is slower to generate and that's a pity because my algorithm have to do it a lot.
The basic idea is choosing the word that minimize the solution space. The algorithm is really slow for the first guess (I mean 'days'), but the best first guess depends only on the word len, so it's hardcoded in the source. The other guesses are done in matter of seconds.
The Code
(Compile with g++ -O3)
#include <iostream>
#include <iomanip>
#include <fstream>
#include <string>
#include <ctime>
#include <cstdlib>
using namespace std;
class LRTimer
{
private:
time_t start;
public:
void startTimer(void)
{
time(&start);
}
double stopTimer(void)
{
return difftime(time(NULL),start);
}
};
#define MAX_WORD_LEN 15
#define BIT_QM 0x8000
LRTimer timer;
int size, valid, wordLen;
string firstGuess[] = { "", "a", "as", "iao", "ares",
"raise", "sailer", "saltier", "costlier", "clarities",
"anthelices", "petulancies", "incarcerates", "allergenicity" };
class Pattern
{
public:
char letters[MAX_WORD_LEN];
char flag;
int mask;
Pattern()
: letters(), mask(), flag()
{
}
Pattern(string word)
: letters(), mask(), flag()
{
init(word);
}
void init(string word)
{
const char *wdata = word.data();
for(int i = 0; i < wordLen; i++) {
letters[i] = wdata[i];
mask |= 1 << (wdata[i]-'a');
}
}
string dump()
{
return string(letters);
}
int check(Pattern &secret)
{
if ((mask & secret.mask) == 0)
return 0;
char g[MAX_WORD_LEN], s[MAX_WORD_LEN];
int r = 0, q = 0, i, j, k=99;
for (i = 0; i < wordLen; i++)
{
g[i] = (letters[i] ^ secret.letters[i]);
if (g[i])
{
r += r;
k = 0;
g[i] ^= s[i] = secret.letters[i];
}
else
{
r += r + 1;
s[i] = 0;
}
}
for (; k < wordLen; k++)
{
q += q;
if (g[k])
{
for (j = 0; j < wordLen; j++)
if (g[k] == s[j])
{
q |= BIT_QM;
s[j] = 0;
break;
}
}
}
return r|q;
}
int count(int ck, int limit);
int propcheck(int limit);
void filter(int ck);
};
string dumpScore(int ck)
{
string result(wordLen, 'X');
for (int i = wordLen; i--;)
{
result[i] = ck & 1 ? 'O' : ck & BIT_QM ? '?' : 'X';
ck >>= 1;
}
return result;
}
int parseScore(string ck)
{
int result = 0;
for (int i = 0; i < wordLen; i++)
{
result += result + (
ck[i] == 'O' ? 1 : ck[i] == '?' ? BIT_QM: 0
);
}
return result;
}
Pattern space[100000];
void Pattern::filter(int ck)
{
int limit = valid, i = limit;
// cerr << "Filter IN Valid " << setbase(10) << valid << " This " << dump() << "\n";
while (i--)
{
int cck = check(space[i]);
// cerr << setbase(10) << setw(8) << i << ' ' << space[i].dump()
// << setbase(16) << setw(8) << cck << " (" << Pattern::dumpScore(cck) << ") ";
if ( ck != cck )
{
// cerr << " FAIL\r" ;
--limit;
if (i != limit)
{
Pattern t = space[i];
space[i] = space[limit];
space[limit] = t;
}
}
else
{
// cerr << " PASS\n" ;
}
}
valid = limit;
// cerr << "\nFilter EX Valid " << setbase(10) << valid << "\n";
};
int Pattern::count(int ck, int limit)
{
int i, num=0;
for (i = 0; i < valid; ++i)
{
if (ck == check(space[i]))
if (++num >= limit) return num;
}
return num;
}
int Pattern::propcheck(int limit)
{
int k, mv, nv;
for (k = mv = 0; k < valid; ++k)
{
int ck = check(space[k]);
nv = count(ck, limit);
if (nv >= limit)
{
return 99999;
}
if (nv > mv) mv = nv;
}
return mv;
}
int proposal(bool last)
{
int i, minnv = 999999, mv, result;
for (i = 0; i < valid; i++)
{
Pattern& guess = space[i];
// cerr << '\r' << setw(6) << i << ' ' << guess.dump();
if ((mv = guess.propcheck(minnv)) < minnv)
{
// cerr << setw(6) << mv << ' ' << setw(7) << setiosflags(ios::fixed) << setprecision(0) << timer.stopTimer() << " s\n";
minnv = mv;
result = i;
}
}
if (last)
return result;
minnv *= 0.75;
for (; i<size; i++)
{
Pattern& guess = space[i];
// cerr << '\r' << setw(6) << i << ' ' << guess.dump();
if ((mv = guess.propcheck(minnv)) < minnv)
{
// cerr << setw(6) << mv << ' ' << setw(7) << setiosflags(ios::fixed) << setprecision(0) << timer.stopTimer() << " s\n";
minnv = mv;
result = i;
}
}
return result;
}
void setup(string wordfile)
{
int i = 0;
string word;
ifstream infile(wordfile.data());
while(infile >> word)
{
if (word.length() == wordLen) {
space[i++].init(word);
}
}
infile.close();
size = valid = i;
}
int main(int argc, char* argv[])
{
if (argc < 2)
{
cerr << "Specify word length";
return 1;
}
wordLen = atoi(argv[1]);
timer.startTimer();
setup("wordlist.txt");
//cerr << "Words " << size
// << setiosflags(ios::fixed) << setprecision(2)
// << " " << timer.stopTimer() << " s\n";
valid = size;
Pattern Guess = firstGuess[wordLen];
for (int t = 0; t < 5; t++)
{
cout << Guess.dump() << '\n' << flush;
string score;
cin >> score;
int ck = parseScore(score);
//cerr << "\nV" << setw(8) << valid << " #"
// << setw(3) << t << " : " << Guess.dump()
// << " : " << score << "\n";
if (ck == ~(-1 << wordLen))
{
break;
}
Guess.filter(ck);
Guess = space[proposal(t == 3)];
}
// cerr << "\n";
double time = timer.stopTimer();
//cerr << setiosflags(ios::fixed) << setprecision(2)
// << timer.stopTimer() << " s\n";
return 0;
}
My scores
Evaluation with lingo, 100 rounds:
4 9000
5 17700
6 22000
7 25900
8 28600
9 29700
10 31000
11 32800
12 33500
13 34900
Total 265'100
Self evalueted scores
Here are my average points, scored on the whole wordlist. Not completely relyable because some detail of algorithm has changed during the tests.
4 # words 6728 PT AVG 100.98 87170.41 s
5 # words 14847 PT AVG 164.44 42295.38 s
6 # words 28127 PT AVG 212.27 46550.00 s
7 # words 39694 PT AVG 246.16 61505.54 s
8 # words 49004 PT AVG 273.23 63567.45 s
9 # words 50655 PT AVG 289.00 45438.70 s
10 # words 43420 PT AVG 302.13 2952.23 s
11 # words 35612 PT AVG 323.62 3835.00 s
12 # words 27669 PT AVG 330.19 5882.98 s
13 # words 19971 PT AVG 339.60 2712.98 s
According to these numbers, my average score should be near 257'800
PIT SCORE
At last I installed Ruby, so now I have an 'official' score:
4 5 6 7 8 9 10 11 12 13 TOTAL
10700 16300 22000 25700 27400 30300 32000 33800 34700 34800 267700
Perl
There's still some room for improvement but at least it beats the included example players :)
Assumes write access to current directory for caching wordlists (to make it run a bit faster); will create wordlist.lenN.stor
files using Storable
. If this is an issue, remove read_cached_wordlist
and change the code to use just read_wordlist
.
Explanation
First, I build a histogram of letter frequencies in all the words in the current wordlist (build_histogram
). Then I need to choose my next guess - which is done by find_best_word
. The scoring algorithm is just adding the histogram values together, skipping letters already seen. This gives me a word containing the most frequent letters in the wordlist. If there is more than one word with a given score, I choose one randomly.
Having found a word, I send it to the game engine, read the reply then try and do something useful with it :)
I maintain a set of conditions, that is, letters that can occur at a given position in a word. On start, this is just simple (['a'..'z'] x $len)
, but it is updated basing on the hints given in reply (see update_conds
). I build a regex out of those conditions then and filter the wordlist through it.
During tests I have found out then that the aforementioned filtering doesn't handle ?
s too well, hence the second filter (filter_wordlist_by_reply
). This takes advantage of the fact that a letter marked as ?
occurs in the word on a different position, and filters the wordlist accordingly.
These steps are repeated for every iteration of the main loop, unless the solution is found (or it is not possible to read from stdin anymore, which means a failure).
Code
#!perl
use strict;
use warnings;
use v5.10;
use Storable;
$|=1;
sub read_wordlist ($) {
my ($len) = @_;
open my $w, '<', 'wordlist.txt' or die $!;
my @wordlist = grep { chomp; length $_ == $len } <$w>;
close $w;
\@wordlist
}
sub read_cached_wordlist ($) {
my ($len) = @_;
my $stor = "./wordlist.len$len.stor";
if (-e $stor) {
retrieve $stor
} else {
my $wl = read_wordlist $len;
store $wl, $stor;
$wl
}
}
sub build_histogram ($) {
my ($wl) = @_;
my %histo = ();
for my $word (@$wl) {
$histo{$_}++ for ($word =~ /./g);
}
\%histo
}
sub score_word ($$) {
my ($word, $histo) = @_;
my $score = 0;
my %seen = ();
for my $l ($word =~ /./g) {
if (not exists $seen{$l}) {
$score += $histo->{$l};
$seen{$l} = 1;
}
}
$score
}
sub find_best_word ($$) {
my ($wl, $histo) = @_;
my @found = (sort { $b->[0] <=> $a->[0] }
map [ score_word($_, $histo), $_ ], @$wl);
return undef unless @found;
my $maxscore = $found[0]->[0];
my @max;
for (@found) {
last if $_->[0] < $maxscore;
push @max, $_->[1];
}
$max[rand @max]
}
sub build_conds ($) {
my ($len) = @_;
my @c;
push @c, ['a'..'z'] for 1..$len;
\@c
}
sub get_regex ($) {
my ($cond) = @_;
local $" = '';
my $r = join "", map { "[@$_]" } @$cond;
qr/^$r$/
}
sub remove_cond ($$$) {
my ($conds, $pos, $ch) = @_;
return if (scalar @{$conds->[$pos]} == 1);
return unless grep { $_ eq $ch } @{$conds->[$pos]};
$conds->[$pos] = [ grep { $_ ne $ch } @{$conds->[$pos]} ]
}
sub add_cond ($$$) {
my ($conds, $pos, $ch) = @_;
return if (scalar @{$conds->[$pos]} == 1);
return if grep { $_ eq $ch } @{$conds->[$pos]};
push @{$conds->[$pos]}, $ch
}
sub update_conds ($$$$) {
my ($word, $reply, $conds, $len) = @_;
my %Xes;
%Xes = ();
for my $pos (reverse 0..$len-1) {
my $r = substr $reply, $pos, 1;
my $ch = substr $word, $pos, 1;
if ($r eq 'O') {
$conds->[$pos] = [$ch]
}
elsif ($r eq '?') {
for my $a (0..$len-1) {
if ($a == $pos) {
remove_cond $conds, $a, $ch
} else {
unless (exists $Xes{$a} and $Xes{$a} eq $ch) {
add_cond($conds, $a, $ch);
}
}
}
}
elsif ($r eq 'X') {
$Xes{$pos} = $ch;
for my $a (0..$len-1) {
remove_cond $conds, $a, $ch
}
}
}
}
sub uniq ($) {
my ($data) = @_;
my %seen;
[ grep { !$seen{$_}++ } @$data ]
}
sub filter_wordlist_by_reply ($$$) {
my ($wl, $word, $reply) = @_;
return $wl unless $reply =~ /\?/;
my $newwl = [];
my $len = length $reply;
for my $pos (0..$len-1) {
my $r = substr $reply, $pos, 1;
my $ch = substr $word, $pos, 1;
next unless $r eq '?';
for my $a (0..$len-1) {
if ($a != $pos) {
if ('O' ne substr $reply, $a, 1) {
push @$newwl, grep { $ch eq substr $_, $a, 1 } @$wl
}
}
}
}
uniq $newwl
}
my $len = $ARGV[0] or die "no length";
my $wl = read_cached_wordlist $len;
my $conds = build_conds $len;
my $c=0;
do {
my $histo = build_histogram $wl;
my $word = find_best_word $wl, $histo;
die "no candidates" unless defined $word;
say $word;
my $reply = <STDIN>;
chomp $reply;
exit 1 unless length $reply;
exit 0 if $reply =~ /^O+$/;
update_conds $word, $reply, $conds, $len;
$wl = filter_wordlist_by_reply $wl, $word, $reply;
$wl = [ grep { $_ =~ get_regex $conds } @$wl ]
} while 1