Do SetSharedVariable/SetSharedFunction ruin the benefits of ParallelTable?
Yes, they do. This has been discussed here many times, for example at:
- Why won't Parallelize speed up my code?
Mathematica uses separate processes for parallelization. This means that the parallel kernels cannot share any memory. What SetSharedVariable really does is cause that variable to always be evaluated and set on the main kernel. This involves a callback from the subkernel to the main kernel. Main kernel – subkernel communication is already a major bottleneck in the parallel tools. Forcing it for every single evaluation will typically kill all speed benefits. (Note that otherwise communication may happen only as few times as the number of subkernels. This is the case with Method -> "CoarsestGrained".)
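To get a feel for how much the number of round trips matters, here is a minimal sketch that times the same cheap computation with two scheduling methods. The workload (Sin[1.0 i]) and the size n are arbitrary choices made for illustration, and the actual timings depend on your machine and kernel count.

```mathematica
(* A deliberately cheap per-item computation, so communication overhead dominates *)
n = 20000;

(* One batch per subkernel: very few round trips to the main kernel *)
coarse = First @ AbsoluteTiming[
    ParallelTable[Sin[1.0 i], {i, n}, Method -> "CoarsestGrained"];];

(* One item per dispatch: a round trip for every single item *)
fine = First @ AbsoluteTiming[
    ParallelTable[Sin[1.0 i], {i, n}, Method -> "FinestGrained"];];

(* Compare the two wall-clock times in seconds *)
{coarse, fine}
```

A shared variable behaves more like the fine-grained case, since every access forces a callback regardless of how the items are batched.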
The only exception is when the evaluation on the subkernel takes significantly longer than the callback to the main kernel. For example, take
list={};
SetSharedVariable[list];
ParallelDo[AppendTo[list, f[i]], {i, 100}]
This is effective only if f[i] takes a long time to evaluate (say, 1 second or more) and does not return a lot of data (say, it returns a number instead of an array). The subkernel evaluations should take significantly longer than the communication between kernels.
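When the list is only being collected, the shared variable can often be dropped altogether. The sketch below contrasts the two approaches; f here is a hypothetical stand-in (a short Pause followed by a square), not a function from the question.

```mathematica
(* Hypothetical stand-in for an expensive computation *)
f[i_] := (Pause[0.05]; i^2)
DistributeDefinitions[f];

(* Shared-variable version: every AppendTo is a callback to the main kernel *)
list = {};
SetSharedVariable[list];
ParallelDo[AppendTo[list, f[i]], {i, 100}];

(* Side-effect-free version: subkernels simply return their results *)
list2 = ParallelTable[f[i], {i, 100}];

(* The shared list is filled in completion order, so compare after sorting *)
Sort[list] === Sort[list2]
```

Note that ParallelTable returns the results in index order, whereas the order of the shared list depends on which subkernel finishes first.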
Because of this, the key to effective parallelization in Mathematica is to fully separate the tasks of subkernels and avoid any communication between them. If they need to access the same variable, things get much more difficult.
Functional programming is much more amenable to parallelization because it avoids mutable data structures and side effects. To put it in simple terms, a problem is well parallelizable if you can phrase it in terms of Map (ParallelMap) or ParallelCombine.
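As a rough illustration of what "phrasing it in terms of Map" looks like, the sketch below runs the same computation sequentially, with ParallelMap, and with ParallelCombine. The function g and the input data are made up for this example.

```mathematica
(* Hypothetical pure function: its result depends only on its argument *)
g[x_] := x^2 + 1
DistributeDefinitions[g];

data = Range[1000];

(* Sequential reference *)
seq = Map[g, data];

(* Each element is processed independently on some subkernel *)
par = ParallelMap[g, data];

(* Each subkernel maps over one chunk; the chunks are joined on the main kernel *)
comb = ParallelCombine[Map[g, #] &, data, Join];

(* All three should agree, because g has no side effects *)
seq === par === comb
```

Because g touches no global state, the subkernels never need to talk to each other, only to return their chunk of results once.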
This is more a summary of a lengthy discussion in comments than an answer:
The example as given in the question is actually a worst case for a parallel program: the worker processes do not have a lot of work to do but need to return quite large results (for j=6,7) to the master, and most probably get in each other's way while doing so. Using the following version you can see that almost the entire time is spent sending the j=7 result (most probably including some time waiting for j=6 to be returned):
SetSharedVariable[foo]
ParallelTable[
Module[{res},
Print[{"calc", j} -> AbsoluteTiming[res = j^j^j;]];
Print[{"send", j} -> AbsoluteTiming[foo = res;]];
],
{j, 1, 7}
]
Of course, in this case the two orders of magnitude by which the parallel version is slower than the sequential one are almost entirely due to how inefficiently Mathematica worker kernels communicate data back to the master. On the other hand, code like this would hardly have a chance to see any speedup with whatever technology, platform or language you use.
In general, to see good speedup with parallel code, you need to minimize any communication overhead and synchronization between the parallel parts of your code. This is even more important in Mathematica than in other languages/technologies because of the high level at which it operates and some suboptimal implementation details. Both make you pay an especially high price for any communication/synchronization.
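One way to reduce the communication cost is to send back only a small summary instead of the full result. The sketch below assumes, purely for illustration, that the digit count of j^j^j is all that is needed on the main kernel; that is not something the question states.

```mathematica
(* The large integer is computed and discarded on the subkernel;
   only its number of decimal digits is sent back to the main kernel *)
digits = ParallelTable[IntegerLength[j^j^j], {j, 1, 7}]

(* For comparison: returning the full integers, including the very large
   results for j = 6 and j = 7, without any shared variable *)
full = ParallelTable[j^j^j, {j, 1, 7}];

(* The summaries match, but the first version transfers far less data *)
digits === IntegerLength /@ full
```

Whether such a reduction is possible depends on what the rest of the program needs, but when it is, the transfer cost shrinks from hundreds of thousands of digits to a single machine integer per task.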
I understand that the given example is just a demonstration, but if your code does contain parts that do similar things, I would suggest rethinking your coding strategy if you want to see speedup from parallelization. Following the advice in Szabolcs's answer is a good starting point.
You should also be aware that it is usually much easier to get a surprisingly high speedup by applying the various optimization strategies for sequential Mathematica code that you can find in other questions and answers on this site.