What is wrong with function `LetterCounts` and other functions that operate on strings?
For short strings, `LetterCounts` is slower, and I am not sure why; for longer strings the timings are identical. Do you see similar behavior?
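(* Random string of n uppercase letters *)
randomString[n_] :=
  RandomInteger[{1, 26}, n] /.
    Thread[Range[26] -> CharacterRange["A", "Z"]] // StringJoin

(* Bigram counts built from character codes, sorted by decreasing count *)
counts[str_] :=
  KeyMap[FromCharacterCode,
    Sort[Counts[Partition[ToCharacterCode[str], 2, 1]], Greater]]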
<< GeneralUtilities`
BenchmarkPlot[{LetterCounts[#, 2] &, counts[#] &},
  randomString[#] &,
  10^Range[6],
  "IncludeFits" -> True]
LetterCounts[str, 2]
and
KeyMap[FromCharacterCode,
Sort[Counts[Partition[ToCharacterCode[str], 2, 1]], Greater]];
are not equivalent operations. Try the inputs found in the `LetterCounts` documentation and you'll quickly see differences, so the timing comparison is not very meaningful.
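One way to see the difference for yourself is to feed both expressions a string that contains spaces and punctuation. The following is only a sketch: the sample string and the names `sample`, `viaLetterCounts` and `viaCounts` are made up for illustration, and no output is reproduced here.

```mathematica
(* Hand-written bigram counter from the question *)
counts[str_] :=
  KeyMap[FromCharacterCode,
    Sort[Counts[Partition[ToCharacterCode[str], 2, 1]], Greater]];

(* Illustrative input with spaces and punctuation (an assumption) *)
sample = "The cat, the hat.";

viaLetterCounts = LetterCounts[sample, 2];
viaCounts = counts[sample];

(* counts treats every character, including spaces and punctuation, *)
(* as part of a bigram, so its keys include pairs such as "e ".     *)
Complement[Keys[viaCounts], Keys[viaLetterCounts]]

(* Compare the two results independently of key order *)
KeySort[viaLetterCounts] === KeySort[viaCounts]
```

Any keys returned by `Complement` are bigrams that only the hand-written version reports for this input.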
Edit: To answer the question in the comments, the self-written function
myCharacterCounts[str_, n_] :=
  KeyMap[FromCharacterCode,
    Counts @ Partition[ToCharacterCode @ str, n, 1]]
will run slightly faster than `CharacterCounts[str, n]`, though on my machine the difference is sub-millisecond even for very large strings. But this `myCharacterCounts` function still does not do everything that `CharacterCounts` does.
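To check the timing claim on your own machine, something along these lines could be used. It is a rough sketch: the test string, its length, and the names `testString`, `tCustom` and `tBuiltin` are illustrative assumptions, and the numbers will vary by machine and version.

```mathematica
(* Self-written version from the answer *)
myCharacterCounts[str_, n_] :=
  KeyMap[FromCharacterCode,
    Counts @ Partition[ToCharacterCode @ str, n, 1]];

(* Illustrative test input: 10^5 random uppercase letters (an assumption) *)
testString = StringJoin[RandomChoice[CharacterRange["A", "Z"], 10^5]];

(* RepeatedTiming returns {seconds, result}; keep only the time *)
tCustom = First @ RepeatedTiming[myCharacterCounts[testString, 2]];
tBuiltin = First @ RepeatedTiming[CharacterCounts[testString, 2]];

{tCustom, tBuiltin}

(* Compare the two results independently of key order *)
KeySort[myCharacterCounts[testString, 2]] ===
  KeySort[CharacterCounts[testString, 2]]
```

`RepeatedTiming` averages over several runs, which matters here because the individual evaluations are so short.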
`CharacterCounts` takes options, as in
In[45]:= CharacterCounts["aAbBcC", IgnoreCase -> True]
Out[45]= <|"c" -> 2, "b" -> 2, "a" -> 2|>
and does argument checking, issuing a message for `CharacterCounts[]` or `CharacterCounts[2]`. Argument checking and options handling are generally required for any built-in system function, but are not needed for self-written functions where you know you won't be passing bad arguments or options. This may be enough to account for the timing difference, or maybe `CharacterCounts` is being inefficient somewhere; I can't say.
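To get a feel for what that extra work looks like, here is a minimal sketch of a self-written function with one option and a fallback that issues a message. The name `checkedCharacterCounts`, the message tag `badargs` and the choice to return `$Failed` are all hypothetical; this is not a claim about how `CharacterCounts` is implemented internally.

```mathematica
ClearAll[checkedCharacterCounts];

(* Hypothetical option and message, for illustration only *)
Options[checkedCharacterCounts] = {IgnoreCase -> False};
checkedCharacterCounts::badargs =
  "Expected a string and a positive integer, but got `1`.";

(* Main definition: only matches well-formed arguments *)
checkedCharacterCounts[str_String, n_Integer?Positive, OptionsPattern[]] :=
  Module[{s = If[TrueQ[OptionValue[IgnoreCase]], ToLowerCase[str], str]},
    KeyMap[FromCharacterCode,
      Counts @ Partition[ToCharacterCode @ s, n, 1]]];

(* Fallback: anything else issues a message and returns $Failed *)
checkedCharacterCounts[rest___] :=
  (Message[checkedCharacterCounts::badargs, {rest}]; $Failed);

checkedCharacterCounts["aAbBcC", 1, IgnoreCase -> True]
checkedCharacterCounts[2]
```

Timing this variant against the bare `myCharacterCounts` would give a rough idea of how much the pattern tests and option lookup cost on short inputs.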
I will say that it is often, but not always, possible to beat the timing of built-in functions if you focus on a subset of the functionality and neglect error handling. If your application is time-sensitive, it can be worthwhile to use the custom function instead.