ParallelMap vs Map performance comparison

The difference between the parallelized and the serial version is not due to auto-compilation. Auto-compilation will be used on the subkernels too. You can easily check that turning it off there will slow the parallel evaluation down considerably.
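One way to run that check is sketched below. It assumes the subkernels start with the same "MapCompileLength" setting as the main kernel; the variable name oldLength is only illustrative. Timings are machine-dependent, so none are shown.

```mathematica
LaunchKernels[];

(* Remember the current threshold, read on the main kernel
   (assumed to match the subkernels' setting) *)
oldLength = "MapCompileLength" /. ("CompileOptions" /. SystemOptions["CompileOptions"]);

(* Baseline: auto-compilation still active on the subkernels *)
First@AbsoluteTiming[ParallelMap[(N@Sin[#]) &, Range[10000000]];]

(* Turn off Map auto-compilation on the subkernels only *)
ParallelEvaluate[
  SetSystemOptions["CompileOptions" -> "MapCompileLength" -> Infinity]];
First@AbsoluteTiming[ParallelMap[(N@Sin[#]) &, Range[10000000]];]

(* Restore the previous threshold on the subkernels *)
With[{len = oldLength},
  ParallelEvaluate[
    SetSystemOptions["CompileOptions" -> "MapCompileLength" -> len]]];
```

If the second timing is clearly larger than the first, auto-compilation was indeed in use on the subkernels.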

Not every program parallelizes well in Mathematica, and finding the bottleneck is often quite difficult. I describe some common reasons here:

  • Why won't Parallelize speed up my code?

I believe that in your example the most likely cause is this bug:

  • Transferring a large amount of data in parallel calculations

You can check that the fix shown there speeds things up considerably.

This bug has been fixed in version 11.1 (by optimizing MemberQ for packed arrays: it no longer unpacks when not necessary). In 11.1, the parallel version is actually faster than the serial version:

In[1]:= LaunchKernels[]

In[2]:= Map[(N@Sin[#]) &, Range[10000000]]; // AbsoluteTiming
Out[2]= {0.81256, Null}

In[3]:= ParallelMap[(N@Sin[#]) &, Range[10000000]]; // AbsoluteTiming
Out[3]= {0.407437, Null}

I suppose that both you and Marius are using an earlier version of Mathematica.


Here are some tips for good performance:

  • As Marius said, vectorization has a much bigger potential for speedup than parallelization.

  • Transferring data between the main kernel and subkernels can be expensive. When you transfer a lot of data, as here, parallelization does not always pay off. This is particularly true if the data is not a packed array. In your case it is a packed array, so we do actually get a speedup (but not a 4x speedup on my 4-core machine).
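Whether the input is packed can be checked directly. The following is a small sketch; the names data and unpacked are only illustrative, and the unpacked copy is created purely for comparison.

```mathematica
data = Range[10000000];
Developer`PackedArrayQ[data]        (* True: Range returns a packed integer array *)

(* An explicitly unpacked copy of the same data, for comparison *)
unpacked = Developer`FromPackedArray[data];
Developer`PackedArrayQ[unpacked]    (* False *)

(* The packed form needs far less memory, which is also what gets transferred *)
ByteCount /@ {data, unpacked}
```

Running ParallelMap on data and on unpacked and comparing the AbsoluteTiming results should give an idea of how much the transfer cost depends on packing.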


EDIT

As Szabolcs has pointed out in his post, auto-compilation in Map is in fact not the cause of the OP's issue. For context I will leave my original post.

ORIGINAL POST

In your example, this happens because Map can auto-compile certain functions, and N@Sin[#] & fits the auto-compilation criteria. You can read a very good post about it here, by Leonid Shifrin. That auto-compilation is the "culprit" here can be verified by turning it off and checking the timing again:

System`SetSystemOptions["CompileOptions" -> "MapCompileLength" -> Infinity];
Map[(N@Sin[#]) &, Range[10000000]] // AbsoluteTiming // First
ParallelMap[(N@Sin[#]) &, Range[10000000]] // AbsoluteTiming // First

gives

13.477214

2.432819

on my machine. So, the parallel version is now faster.

I'm not sure whether you are aware of this, but (at least in my experience) using Listable functions and so-called vectorized operations can give a much larger speed-up than parallelization. Here we would do

Sin@N@Range[10000000] // AbsoluteTiming // First

0.100697

i.e. no need for Map at all.
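As a quick sanity check that the vectorized form computes the same thing as the mapped one, something like the following could be used. This is a sketch; n, mapped and vectorized are illustrative names, and a smaller n than in the timings above is used to keep it fast.

```mathematica
(* Sin carries the Listable attribute, so it threads over lists automatically *)
MemberQ[Attributes[Sin], Listable]   (* True *)

n = 10^6;
mapped = Map[(N@Sin[#]) &, Range[n]];
vectorized = Sin@N@Range[n];

(* Largest absolute difference between the two results *)
Max@Abs[mapped - vectorized]
```

A difference at or near machine precision would confirm that the two approaches are interchangeable here.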