Using Rules inside SequenceCases degrades Performance
Let's start by looking at the code for SequenceCases
:
<< GeneralUtilities`
PrintDefinitions[SequenceCases]
We can see in this code that three different conditions determine whether the expression will be evaluated by sequenceCasesSublist
or sequenceCasesPattern
. Let's evaluate the two tests that matter on {a_?PrimeQ, b_, c_?PrimeQ}
and {a_?PrimeQ, b_, c_?PrimeQ} :> {a, c}
respectively.
{
SymbolicTensors`UtilitiesDump`VariableLengthPatternFreeQ[{a_?PrimeQ, b_, c_?PrimeQ}],
SymbolicTensors`UtilitiesDump`VariableLengthPatternFreeQ[{a_?PrimeQ, b_, c_?PrimeQ} :> {a, c}]
}
{True, True}
{
SymbolicTensors`UtilitiesDump`ListRepresentationQ[{a_?PrimeQ, b_, c_?PrimeQ} :> {a, c}],
SymbolicTensors`UtilitiesDump`ListRepresentationQ[{a_?PrimeQ, b_, c_?PrimeQ}]
}
{False, True}
Because they do not return the same result on this last test, they are actually handled by different functions, and it seems like sequenceCasesSublist
is much faster than sequenceCasesPattern
.
If we run
PrintDefinitions[SymbolicTensors`UtilitiesDump`sequenceCasesPattern]
we can see that the very first thing that sequenceCasesPattern
does is to rewrite the expression in this form:
Replace[SequenceCases[Range[10^4], {a_?PrimeQ, b_, c_?PrimeQ}], {a_?PrimeQ, b_, c_?PrimeQ} :> {a, c}, 1]
which is fast like your ReplaceAll
version, because it can be evaluated with sequenceCasesSublist
. However, in the source code, instead of SequenceCases
the function sequenceCasesPattern
is used, which never runs the initial tests of SequenceCases
; therefore it doesn't realize that it can choose the fast routine. That's why it's slow.
sequenceCasesPattern
, I may add, is implemented with ReplaceList
.
With Mathematica version 11.1.0 there is no slowdown anymore:
$Version
SequenceCases[Range[10^6], {a_?PrimeQ, b_, c_?PrimeQ}]; // AbsoluteTiming
SequenceCases[Range[10^6], {a_?PrimeQ, b_, c_?PrimeQ} :> {a, c}]; // AbsoluteTiming
"11.1.0 for Microsoft Windows (64-bit) (March 13, 2017)" {1.13733, Null} {1.13629, Null}
Here is output from version 11.0.1:
$Version
SequenceCases[Range[10^4], {a_?PrimeQ, b_, c_?PrimeQ}]; // AbsoluteTiming
SequenceCases[Range[10^4], {a_?PrimeQ, b_, c_?PrimeQ} :> {a, c}]; // AbsoluteTiming
"11.0.1 for Microsoft Windows (64-bit) (September 20, 2016)" {0.016909744048664993`, Null} {1.1500872252585508`, Null}
So the problem was fixed in version 11.1.0.