Unscrambling jumbled anagrams

Edit In this edited version, the upper-case characters are treated as independent characters (you can easily adjust that) and, more importantly, the selection of all the anagrams is more efficient.

An arithmetic-based approach.

Principle The idea is to use primality to check if a word is included in another. For example, if $abc=2\times3\times 5$, then $cb=5\times 3$ divides $abc$ so it is a "sub-anagram". But $aa=2\times 2$ is not because $4$ does not divide $30$. In short, it is based on the unicity of the prime factorization.

Let's go Extract the dictionary (dic) and encode each character with a prime number in an association:

dic = DictionaryLookup["*"];
chars = CharacterRange[65, 300]~Join~{"'", "-"};
corr = <|#[[1]] -> #[[2]] & /@ Transpose[{chars, Prime@Range[Length@chars]}]|>;

Then, convert the dictionary to integers:

convert[s_String] := Times @@ corr /@ Characters[s]
dicnum = convert /@ dic;

As a verification step, we can extract the anagrams of "listen". It is about 150 times faster than the with the code in the OP:

getwords[n_Integer] := dic[[Flatten@Position[dicnum, n]]]
getwords[convert["listen"]]    
(* {"enlist", "inlets", "listen", "silent", "tinsel"} *)

Now, let's illustrate the idea with "dormitory": it is converted to an integer, that is then decomposed into a list of all possible products using this answer (thank you @J.M. for pointing it out).

n = convert["dormitory"];
<< Combinatorica`
divisors[x_] := DeleteDuplicates[Sort /@ Map[Times @@ # &, 
     SetPartitions[Flatten[ConstantArray @@@ FactorInteger[x]]], {2}]]
div = divisors[n]
(* {{37051291718641225291}, {7, 5293041674091603613}, ... *)

div contains the 8155 decompositions of "dormitory". Now, it suffices to pick the ones that correspond to words in the dictionary and recombine them:

good = GatherBy[Select[div, AllTrue[#, MemberQ[dicnum, #] &] &], Length]
recombine[l_] := Flatten@Table[StringRiffle[#, " "] & /@ 
     Distribute[getwords[#] & /@ good[[l, i]], List], {i, Length@good[[l]]}]

Note that the classification by number of words (l) in the anagram is arbitrary.

Results

And now, the fun:

1 anagram with one word:
```
recombine[1]
(* {"dormitory"} *)
```

3 anagrams with two words:

extract[tab, 2]
(* {"moor dirty", "room dirty", "dirt roomy"} *)

46 anagrams with three words:

extract[tab, 3]
(* "id try moor", "id try room", "id my rotor", "mi try door", ... *)

10 anagrams with four words:

extract[tab, 4]
(* "id om or try", "id or my rot", "id or my tor", "do mi or try", ... *)

And that's it. The method can be easily adapted if you don't want to distinguish ü, û, u, U etc.: just code these letters with the same number. Also, it is exhaustive: all the anagrams are computed.

a bit brute force, but not unreasonably slow..

Anagrams[s_String, n_] := 
  Module[{min = 1, max, all},(*min sets minimum word length*)
    max = StringLength@s - min(n-1);
    all = DictionaryLookup[
        x__ /; min <= StringLength[x] <= max && 
        Complement[Characters[ToLowerCase@x], Characters[s]] == {}];
    Select[Subsets[all, {n}], 
        Sort@Characters[StringJoin @@ (ToLowerCase /@ #)] == 
        Sort@Characters[s] &]]
Anagrams["dormitory", 2]

{{"dirt", "roomy"}, {"dirty", "moor"}, {"dirty", "Moor"}, {"dirty", "Moro"}, {"dirty", "room"}}

Anagrams["dormitory", 3]

{{"dim", "or", "Tory"}, {"dim", "or", "troy"}, {"dim", "or", "Troy"},...(210)

incidentally there is only one with 5 words: {{"I", "do", "Mr", "or", "Ty"}}.

Unscrambling jumbled anagrams

An arithmetic-based approach.

Tags:

Linguistics

String Manipulation

Natural Language

Related

Recent Posts