How can I find a more efficient solution to mapping characters to digits?
Update
The following approach is much faster when the number of words is small and the number of mappings is large:
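stringsToNumberMappings[words_List, letterMap_, samples_] := Module[
  {len = Length @ letterMap, lrule, wvector},
  (* map each letter to its position in letterMap *)
  lrule = Thread @ Rule[letterMap, Range @ len];
  (* one column per word: the place value (power of ten) of each letter *)
  wvector = Transpose @ Normal @ Table[
    SparseArray[
      Thread @ Rule[Characters[w] /. lrule, 10^Range[StringLength[w] - 1, 0, -1]],
      len
    ],
    {w, words}
  ];
  (* each row of samples is one digit assignment; the dot product yields the numbers *)
  samples . wvector
]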
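To make the idea behind the dot product concrete, here is a minimal sketch for a single word and a single digit assignment. Each word becomes a vector of place values indexed by letter, so applying a mapping is just a dot product. The names letters, sendWeights, digits and the helper wordWeights are illustrative assumptions and not part of the code above. The helper is included because, as far as I know, SparseArray keeps only one rule when the same position appears more than once, so a word with a repeated letter (such as "noon") would lose place values in the construction above; summing the place values per letter sidesteps that.

```mathematica
(* letters in the order of letterMap for {"send", "more", "money"} *)
letters = {"s", "e", "n", "d", "m", "o", "r", "y"};

(* place value of each letter in "send"; letters not in the word get 0 *)
sendWeights = {1000, 100, 10, 1, 0, 0, 0, 0};

(* one candidate assignment of digits to letters *)
digits = {7, 3, 6, 9, 5, 0, 4, 1};

digits . sendWeights
(* 7369 *)

(* hypothetical helper: sum the place values of every occurrence of a letter *)
wordWeights[w_String, letters_List] := With[
  {chars = Characters[w], place = 10^Range[StringLength[w] - 1, 0, -1]},
  Table[Total[Pick[place, chars, l]], {l, letters}]
]

wordWeights["noon", {"n", "o"}]
(* {1001, 110} *)
```

With n -> 1 and o -> 2, the dot product {1, 2} . {1001, 110} gives 1221, as expected for "noon".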
Comparison with previous answer:
words = {"send", "more", "money"};
letterMap = Flatten[Characters[words]] // DeleteDuplicates;
samples = Table[RandomSample[0 ;; 9, 8], 10^5];
r1 = stringsToNumberMappings[words, letterMap, samples]; //AbsoluteTiming
r2 = StringsToNumbers[words, letterMap, #]& /@ samples; //AbsoluteTiming
r1===r2
{0.010555, Null}
{2.2923, Null}
True
Original answer
I assume the inefficiency you're encountering comes from performing the replacement on lots of words. In that case, you could make use of the Listable attribute of StringReplace:
StringsToNumbers[str_List, letters_List, numbers_List] := With[
  {d = Thread[letters -> ToString /@ numbers]},
  FromDigits /@ StringReplace[str, d]
]
Simple example:
words = {"send", "more", "money"};
letterMap = DeleteDuplicates @ Flatten @ Characters[words];
StringsToNumbers[words, letterMap, {7, 3, 6, 9, 5, 0, 4, 1}]
{7369, 5043, 50631}
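To see what the two steps do individually, the following sketch applies them by hand to the same words. The rule list d is written out explicitly here for illustration, and digitStrings is a name introduced only for this sketch.

```mathematica
(* the same letter-to-digit rules that Thread would build, written out *)
d = {"s" -> "7", "e" -> "3", "n" -> "6", "d" -> "9",
     "m" -> "5", "o" -> "0", "r" -> "4", "y" -> "1"};

(* StringReplace accepts a list of strings and processes each element *)
digitStrings = StringReplace[{"send", "more", "money"}, d]
(* {"7369", "5043", "50631"} *)

(* FromDigits also accepts a string of digits *)
FromDigits /@ digitStrings
(* {7369, 5043, 50631} *)
```

Note that a word whose first letter maps to 0 yields a number with fewer digits, since FromDigits["0431"] is 431.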
Bigger example with a million words:
big = RandomChoice[words, 10^6];
StringsToNumbers[big, letterMap, {7, 3, 6, 9, 5, 0, 4, 1}]; //AbsoluteTiming
{0.908794, Null}
Recently, I found out for myself that ToCharacterCode is very efficient at transforming strings to numbers. In particular, ToCharacterCode turns strings into packed arrays, which is very good for performance.
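As a quick illustration, ToCharacterCode maps each character to its integer code, and Developer`PackedArrayQ can be used to check whether the result is packed. The variable name codes is just for this sketch.

```mathematica
codes = ToCharacterCode["send"]
(* {115, 101, 110, 100} *)

(* check whether the result is stored as a packed array *)
Developer`PackedArrayQ[codes]

(* for a list of strings, one list of codes is returned per string *)
ToCharacterCode[{"send", "more"}]
(* {{115, 101, 110, 100}, {109, 111, 114, 101}} *)
```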
Here is the preparation (stealing a bit from Carl Woll):
SeedRandom[123];
words = {"send", "more", "money"};
letterMap = DeleteDuplicates@Flatten[Characters[words]];
numberMap = RandomSample[Range[0, 9], 8];
big = RandomChoice[words, 10^6];
We create a packed array as a lookup table via SparseArray and perform the actual lookup, together with the conversion to numbers, in one go with the following compiled function:
lookup = Compile[{{lookuptable, _Integer, 1}, {idx, _Integer, 1}},
  Sum[
    10^(Length[idx] - i) Compile`GetElement[lookuptable, Compile`GetElement[idx, i]],
    {i, 1, Length[idx]}
  ],
  CompilationTarget -> "C",
  RuntimeAttributes -> {Listable},
  Parallelization -> True,
  RuntimeOptions -> "Speed"
];
StringsToNumbers2[strings_, letterMap_, numberMap_] := lookup[
  Normal[SparseArray[ToCharacterCode[letterMap] -> numberMap]],
  ToCharacterCode[strings]
];
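The lookup table is indexed by character code, so position 115 (the code of "s") holds the digit assigned to "s", and so on. Below is an uncompiled sketch of the same idea for a single word. The names letters, digits and table, as well as the fixed digit assignment, are assumptions made for this illustration rather than the random numberMap above.

```mathematica
letters = {"s", "e", "n", "d", "m", "o", "r", "y"};
digits = {7, 3, 6, 9, 5, 0, 4, 1};  (* example assignment *)

(* dense table: the entry at a letter's character code is that letter's digit *)
table = Normal[SparseArray[ToCharacterCode[letters] -> digits]];
Length[table]
(* 121, the character code of "y" *)

(* look up the digits of a word and combine them into a number *)
FromDigits[table[[ToCharacterCode["send"]]]]
(* 7369 *)
```

The compiled function performs the same lookup but accumulates the powers of ten directly, which avoids building an intermediate list of digits for every word.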
On my machine, this method is approximately twice as fast as using StringReplace and FromDigits:
result1 = StringsToNumbers[big, letterMap, numberMap]; // RepeatedTiming // First
result2 = StringsToNumbers2[big, letterMap, numberMap]; // RepeatedTiming // First
result1 === result2
0.810
0.40
True