Why (and when) does pattern matching with f[__] perform MUCH more quickly than _f?

Times has the attributes Flat and Orderless. This means that any pattern that matches some combination of the arguments must, in principle, scan every permutation of arguments. Sometimes, the pattern matcher can optimize and avoid a full scan in the presence of explicit values and heads. Patterns of the form f[__] (i.e. f[BlankSequence[]] trigger such explicit head optimization whereas patterns like _f (i.e. Blank[f]) do not -- presumably due to implementation details within the pattern matcher.

Analysis (current as of version 11.0.1)

We can reproduce the behaviour in a Flat Orderless function of our own devising:

SetAttributes[times, {Flat, Orderless}]

times[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, f[1], g[2]] /. 
  times[_f, _g] :> fg // AbsoluteTiming // First
(* 7.62321 *)


times[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, f[1], g[2]] /. 
  times[f[_], g[_]] :> fg // AbsoluteTiming // First
(* 0.000033407 *)

Flat Orderless Matching: The General Case

Let us begin by examining the complexity of the general case when performing pattern matching upon a Flat Orderless function. Consider the following transformation:

times[1, 2, 3, a] /. times[x_, a] :> {x, a}
(* {times[1,2,3], a} *)

Take note that the pattern matcher correctly identified that x matches multiple arguments to times, namely the leading prefix times[1, 2, 3]. We can observe the internal matching operation if we add conditions to the subpatterns that display some output:

times[1, 2, 3, a] /.
  times[x_ /; (Echo[x, "x_"]; True), m_ /; (Echo[m, "m_"]; m===a)] :> {x, a}

pattern trace

Notice how hard the pattern matcher had to work to get the final result. It had to scan through various permutations of subparts within the times[...] expression until it finally found its match.

Helper Function

We will introduce a helper function tp that adjusts a pattern to display some output whenever it is matched:

tp[patt_] := Module[{s}, Condition @@ Hold[s : patt, (Echo[{s}, patt]; True)]]

The Case At Hand

We can use this function to observe how pattern matcher operations grow exponentially as expression size increases for the problematic case at hand:

times[1, 2, f[1], g[2]] /. times[tp[_f], tp[_g]] :> fg;

pattern matching trace

times[1, 2, 3, f[1], g[2]] /. times[tp[_f], tp[_g]] :> fg;

pattern matching trace

times[1, 2, 3, 4, f[1], g[2]] /. times[tp[_f], tp[_g]] :> fg;

pattern matching trace

In contrast, when we match using f[_] and g[_] instead of _f and _g, the number of operations remains constant:

times[1, 2, f[1], g[2]] /. times[tp[f[_]], tp[g[_]]] :> fg;

pattern matching trace

times[1, 2, 3, f[1], g[2]] /. times[tp[f[_]], tp[g[_]]] :> fg;

pattern matching trace

times[1, 2, 3, 4, f[1], g[2]] /. times[tp[f[_]], tp[g[_]]] :> fg;

pattern matching trace

Clearly in the latter case the pattern matcher is applying an optimization. It need only scan the expression linearly to find the explicit heads f and g and then back-track to verify that the entire pattern is matched. We can see this explicitly if we also display the matched prefix:

times[1, 2, 3, 4, f[1], g[2]] /. times[tp[___], tp[f[_]], tp[g[_]]] :> fg;

pattern matching trace

Even a small case of the problematic expression will produce a lot of output if we trace the prefixes successively considered during its scan:

times[1, 2, 3, f[1], g[2]] /. times[tp[___], tp[_f], tp[_g]] :> fg;

pattern matching trace

Note that the matcher is considering numerous combinations until it finally finds the match. In fact, the output resembles the general case that we examined earlier although more rescanning is taking place here.

The pattern matcher is not recognizing that it has the same opportunity to optimize that it had in the previous expression. Apparently, its implementation will recognize that the pattern f[__] (i.e. f[BlankSequence[]] has an explicit head but it fails to make that recognition for _f (i.e. Blank[f]). My guess is that this is an implementation coincidence and that the code is explicitly looking for the (meta)pattern _[BlankSequence[]] but not Blank[_]. The pattern matcher is rumoured to be an interesting piece of code so it might not necessarily be easy for WRI to introduce or maintain optimizations of this sort.

Disclaimer

Beware that it is difficult to trace the operation of the pattern matcher from high-level code. Any change to a pattern can alter the execution strategy chosen by the matcher (e.g. a change such as the trick used here of introducing a condition to display output). The examples shown in this response are meant to illustrate the principles involved rather than offering a strict step-by-step description of pattern matching operation.

This is not an answer, but an extended comment.

I experimented with your code using V11.0.1 on OS X 10.10.2 running on a seven year old iMac. I can confirm that something funny is going on that it happens when the number of elements given to an arithmetic operation reaches a certain size (in my tests at about 14 elements).

Here are my tests and results:

Test data generator

expr[nn_ /; 20 >= nn > 1] := 
  With[
      {symbols = {a, b, c, d, e, f, g, h, i, j, k, l, m, n, o, p, q, r, s, t},
       funcs = {myG[u], myF[a, b]}},
    Times @@ Join[funcs, symbols[[;; nn]]]]

expr[14]

a b c d e f g h i j k l m n myF[a, b] myG[u]

Getting the timings

timingHeadPattern =
  Table[
    With[{expr = expr[i]}, 
      {expr // Length, 
       AbsoluteTiming[expr /. _myF _myG -> combinedForm][[1]]}], 
    {i, 2, 16}];

timingFuncPattern =
  Table[
    With[{expr = expr[i]}, 
      {expr // Length, 
       AbsoluteTiming[expr /. myF[__] myG[__] -> combinedForm][[1]]}], 
    {i, 2, 16}];

Results

ListPlot[{timingHeadPattern, timingFuncPattern},
  PlotMarkers -> {{\[EmptyCircle], 14}, \[FilledSmallCircle]},
  PlotLegends -> {_myF _myG, myF[__] myG[__]},
  PlotRange -> All,
  Frame -> True,
  FrameLabel -> Evaluate[Style[#, 14] & /@ {"factors", "timing (s)"}],
  ImageSize -> 500]

plot

Clearly, two different algorithms are being used. One has essentially constant time and the other what looks like polynomial time.

That's about all I have to say, except that I think this matter should brought to the attention of Wolfram Research tech support.

Why (and when) does pattern matching with f[__] perform MUCH more quickly than _f?

Tags:

Pattern Matching

Performance Tuning

Related

Recent Posts