Choose between `Apply` and `Map`

Odd that this question had no answers until today. Anyway, the performance of the various methods, unsurprisingly, depends on the details of their application. Generally speaking there are several layers of possible performance levels in Mathematica, including (this is by no means an exhaustive list):

  • optimized C libraries, sometimes with automatic parallelism
  • functions compiled within Mathematica, either manually or automatically
  • operations fully internal to the pattern-matching engine
  • high-level rule-based expression transformation

The more efficient a layer you are able to apply to your program, the faster it will run.
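
To give a flavour of the manually compiled layer, which the examples below don't otherwise use, here is a minimal sketch (divideC is just an illustrative name; timings depend on your machine, and CompilationTarget -> "C" would additionally require a C compiler):

divideC = Compile[{{pair, _Real, 1}}, pair[[1]]/pair[[2]]];  (* runs in the Mathematica virtual machine *)
divideC /@ RandomReal[1, {10^6, 2}]; // AbsoluteTiming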

The high-level rule language is highly flexible but not generally fast. An example:

dat = RandomReal[1, {1*^6, 2}];

Replace[dat, {a_, b_} :> Divide[a, b], {1}] // Timing // First // AbsoluteTiming
{0.612035, 0.592804}

This is (or should be) effectively equivalent to this Map application, though Map is slower for a reason I do not understand:

divide[{a_, b_}] := Divide[a, b]

divide /@ dat // AbsoluteTiming // First // Timing
{1.014007, 1.020058}

Both methods above require a pattern-matching operation and an application of Divide for each pair of numbers. Using Apply at level one (@@@) with Divide avoids the pattern-matching step and is faster:

Divide @@@ dat // AbsoluteTiming // First // Timing
{0.405603, 0.401023}

This still requires a separate high-level evaluation for each pair of numbers, however. But Map has a possible trick in store: automatic compilation. For this to be applicable, the data needs to be packed or packable and the function must be compilable. In this case dat is both packable and packed, as output by RandomReal, but divide is not automatically compilable (a quick way to check both conditions is sketched after the next example). If instead we use:

divide2 = Divide[#[[1]], #[[2]]] &;

divide2 /@ dat // AbsoluteTiming // First // Timing
{0.093601, 0.098006}
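
To check that the two conditions for auto-compilation hold here, you can inspect the data and Map's compile threshold directly (a quick sketch; the option name and its default are what I would expect on recent versions):

Developer`PackedArrayQ[dat]      (* True: RandomReal returns a packed array *)
SystemOptions["CompileOptions"]  (* look for "MapCompileLength": Map auto-compiles only for lists at least that long *)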

The pure function divide2 is automatically compilable, and we see an order-of-magnitude improvement. Here there is only the initial high-level evaluation of Map, with its automatic compilation; all the pairs are then processed in lower-level code. By default this runs in the Mathematica virtual machine rather than in low-level C code, so it is not yet optimal. Divide itself is (presumably) written in C, but for that to pay off we need to avoid repeated high-level evaluation of the function and instead pass it all the data at once. It bears the Listable attribute and is specifically optimized for this case. To do that, we can use Transpose, as mentioned in the comments below the question:

Divide @@ Transpose[dat] // AbsoluteTiming // First // Timing
{0., 0.022001}
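
The Listable attribute that makes this work is easy to verify, and a toy example shows the threading behaviour (the small lists here are just for illustration):

Attributes[Divide]                  (* includes Listable *)
Divide[{1., 2., 3.}, {4., 5., 6.}]  (* {0.25, 0.4, 0.5}: Divide threads over both lists at once *)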

The Transpose-based method is so fast that we need larger data to get an accurate estimate:

big = RandomReal[1, {1*^8, 2}];

(big2 = Transpose[big]) // AbsoluteTiming // First // Timing
{0.358802, 0.428024}

Divide @@ big2 // AbsoluteTiming // First // Timing
{0.249602, 0.589034}

This is another order of magnitude faster than the auto-compiled Map operation, and we can see how the time breaks down between the Transpose and the Divide steps. There is very little high-level evaluation here; nearly all of the computation is done at the compiled library level. This is enabled by packed input to functions specifically optimized to handle it. (Apply here does unpack, but only the outer level; Divide is still given two packed lists as input.)
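
If you want to see the packing behaviour for yourself, Developer`PackedArrayQ and the "Packing" message switch make it visible (a sketch; res is just a temporary name and the exact messages vary between versions):

Developer`PackedArrayQ[big2]  (* True: the transposed data is a single packed array *)
On["Packing"];                (* report whenever an array is unpacked *)
res = Divide @@ big2;         (* only the outer List is unpacked; the two rows stay packed *)
Off["Packing"];
Developer`PackedArrayQ[res]   (* True: the result comes back packed *)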

In each of the examples above, I used Divide rather than / for consistency due to this issue:

  • Why are numeric division and subtraction not handled better in Mathematica?
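
For a rough idea of why that matters: / is not Divide but a rewrite in terms of Times and Power, as the FullForm shows:

FullForm[Hold[a/b]]
(* Hold[Times[a, Power[b, -1]]] *)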

The question isn't really which of Map, Apply, etc. is fastest, but how to work out how much work you're asking Mathematica to do. I like the following method:

Trace shows you every detail of every step of a calculation; if you apply Trace to each of your calculations, you'll see that some methods simply require more steps than others. So:

dat = {{a, b}, {c, d}, {e, f}}
Trace[#[[1]]/#[[2]] & /@ dat]
(*
{(#1[[1]]/#1[[2]] &)[{a, b}], (#1[[1]]/#1[[2]] &)[{c, d}],
 (#1[[1]]/#1[[2]] &)[{e, f}]} etc ...
*)

The output from Trace quickly gets impenetrable, so I use ByteCount as a rough-and-ready way of estimating how complex it is: ByteCount gives a crude measure of how many steps there were and how complicated they were.

So, combining the steps:

dat = {{a, b}, {c, d}, {e, f}}
ByteCount@Trace[#[[1]]/#[[2]] & /@ dat]  (* 9600 bytes *)
ByteCount@Trace[#1/#2 & @@@ dat]         (* 4800 bytes *)
ByteCount@Trace[Divide @@@ dat]          (* 2672 bytes *)

You can also try the same exercise with the other versions of dat; you'll see that the time a calculation takes is loosely related to the size of its Trace.

dat = {RandomReal[], RandomReal[]} & /@ Range@(10^6);
ByteCount@Trace[#[[1]]/#[[2]] & /@ dat]  (* 40 001 264 bytes *)
ByteCount@Trace[#1/#2 & @@@ dat]         (* 1 648 001 304 bytes *)
ByteCount@Trace[Divide @@@ dat]          (* 416 001 000 bytes *)
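
If you want to see that correlation directly, you can time the same three forms (the numbers will of course vary with machine and version; the point is only that they tend to show the same loose ordering as the Trace sizes):

AbsoluteTiming[#[[1]]/#[[2]] & /@ dat;]
AbsoluteTiming[#1/#2 & @@@ dat;]
AbsoluteTiming[Divide @@@ dat;]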

Note that the first method (#[[1]]/#[[2]] & /@ dat) is very efficient when working on a packed array. If you "unpack" the array, it loses that efficiency:

dat = {RandomReal[], RandomReal[]} & /@ Range@(10^6);
dat[[1, 1]] = {1, 1};
ByteCount@Trace[#[[1]]/#[[2]] & /@ dat]  (* 3 152 003 128 bytes *)
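
A quick check confirms that the assignment unpacked the array:

Developer`PackedArrayQ[dat]  (* False after the assignment above *)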

P.S. Never, ever forget Transpose: for packed data, this is the fastest method:

With[{t = Transpose[dat]}, t[[1]]/t[[2]]]
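
One caveat: dat was just unpacked above, so to see this at full speed you need packed input again, e.g. (a small sketch; the size is arbitrary):

dat = RandomReal[1, {10^6, 2}];  (* packed once more *)
With[{t = Transpose[dat]}, t[[1]]/t[[2]]]; // AbsoluteTiming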