Unscrambling jumbled anagrams
Edit In this edited version, the upper-case characters are treated as independent characters (you can easily adjust that) and, more importantly, the selection of all the anagrams is more efficient.
An arithmetic-based approach.
Principle The idea is to use primality to check if a word is included in another. For example, if $abc=2\times3\times 5$, then $cb=5\times 3$ divides $abc$ so it is a "sub-anagram". But $aa=2\times 2$ is not because $4$ does not divide $30$. In short, it is based on the unicity of the prime factorization.
Let's go
Extract the dictionary (dic
) and encode each character with a prime number in an association:
dic = DictionaryLookup["*"];
chars = CharacterRange[65, 300]~Join~{"'", "-"};
corr = <|#[[1]] -> #[[2]] & /@ Transpose[{chars, Prime@Range[Length@chars]}]|>;
Then, convert the dictionary to integers:
convert[s_String] := Times @@ corr /@ Characters[s]
dicnum = convert /@ dic;
As a verification step, we can extract the anagrams of "listen"
. It is about 150 times faster than the with the code in the OP:
getwords[n_Integer] := dic[[Flatten@Position[dicnum, n]]]
getwords[convert["listen"]]
(* {"enlist", "inlets", "listen", "silent", "tinsel"} *)
Now, let's illustrate the idea with "dormitory"
: it is converted to an integer, that is then decomposed into a list of all possible products using this answer (thank you @J.M. for pointing it out).
n = convert["dormitory"];
<< Combinatorica`
divisors[x_] := DeleteDuplicates[Sort /@ Map[Times @@ # &,
SetPartitions[Flatten[ConstantArray @@@ FactorInteger[x]]], {2}]]
div = divisors[n]
(* {{37051291718641225291}, {7, 5293041674091603613}, ... *)
div
contains the 8155 decompositions of "dormitory". Now, it suffices to pick the ones that correspond to words in the dictionary and recombine them:
good = GatherBy[Select[div, AllTrue[#, MemberQ[dicnum, #] &] &], Length]
recombine[l_] := Flatten@Table[StringRiffle[#, " "] & /@
Distribute[getwords[#] & /@ good[[l, i]], List], {i, Length@good[[l]]}]
Note that the classification by number of words (l
) in the anagram is arbitrary.
Results
And now, the fun:
1 anagram with one word:
recombine[1] (* {"dormitory"} *)
3 anagrams with two words:
extract[tab, 2] (* {"moor dirty", "room dirty", "dirt roomy"} *)
46 anagrams with three words:
extract[tab, 3] (* "id try moor", "id try room", "id my rotor", "mi try door", ... *)
10 anagrams with four words:
extract[tab, 4] (* "id om or try", "id or my rot", "id or my tor", "do mi or try", ... *)
And that's it. The method can be easily adapted if you don't want to distinguish ü, û, u, U
etc.: just code these letters with the same number. Also, it is exhaustive: all the anagrams are computed.
a bit brute force, but not unreasonably slow..
Anagrams[s_String, n_] :=
Module[{min = 1, max, all},(*min sets minimum word length*)
max = StringLength@s - min(n-1);
all = DictionaryLookup[
x__ /; min <= StringLength[x] <= max &&
Complement[Characters[ToLowerCase@x], Characters[s]] == {}];
Select[Subsets[all, {n}],
Sort@Characters[StringJoin @@ (ToLowerCase /@ #)] ==
Sort@Characters[s] &]]
Anagrams["dormitory", 2]
{{"dirt", "roomy"}, {"dirty", "moor"}, {"dirty", "Moor"}, {"dirty", "Moro"}, {"dirty", "room"}}
Anagrams["dormitory", 3]
{{"dim", "or", "Tory"}, {"dim", "or", "troy"}, {"dim", "or", "Troy"},...(210)
incidentally there is only one with 5 words: {{"I", "do", "Mr", "or", "Ty"}}.