Parallelize evaluation of function with memoization

The problem with SetSharedFunction is that it forces f to be evaluated on the main kernel: this means that if you simply do

SetSharedFunction[f]

then you will lose parallelization (a timing of ParallelTable[f[x], {x, 3}] will give about 9 seconds).
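For concreteness, here is a minimal sketch of that naive setup, using the memoized toy function from the question:

f[x_] := f[x] = (Pause[3]; N[Sin[x]])
SetSharedFunction[f]
AbsoluteTiming@ParallelTable[f[x], {x, 3}]
(* each call is forwarded to the main kernel, so the three Pause[3]s run sequentially there: ~9 s *)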

In my opinion, this property of SetSharedFunction is not clear from the documentation. I learned about it from this answer. It is also not clear whether the behaviour is the same in version 7 (can someone test? I tested my answer on version 8 only).

We can, however, store the memoized values on the main kernel while evaluating the expensive computations on the parallel kernels. Here's one way to do it:

f[x_] :=
 With[{result = g[x]},              (* look up the shared cache on the main kernel *)
  If[result === Null,
   g[x] = (Pause[3]; N[Sin[x]]),    (* not cached yet: compute and store it on the main kernel *)
   result
  ]
 ]

SetSharedFunction[g]

Here I used a special property of shared functions: they return Null on the parallel kernels when they have no value for a given argument.
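To see this property in isolation (h is a fresh shared symbol with no definitions; this Null is exactly what the If above tests for):

SetSharedFunction[h]
ParallelEvaluate[h[1], First[Kernels[]]]

(* ==> Null on the subkernel; on the main kernel, h[1] would simply stay unevaluated *)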

The first time we run this, we get a 6 s timing, as expected:

AbsoluteTiming@ParallelTable[f[x], {x, 3}]

(* ==> {6.0533462, {0.841471, 0.909297, 0.14112}} *)

The second time it will be very fast:

AbsoluteTiming@ParallelTable[f[x], {x, 3}]

(* ==> {0.0260015, {0.841471, 0.909297, 0.14112}} *)

However, as you noticed, evaluating f on the parallel kernels is a bit slow; on the main kernel it's much faster. This is due to communication overhead: each evaluation of f on a subkernel calls the shared function g, which requires a round trip to the main kernel. The slowdown doesn't really matter if f is genuinely expensive (like the 3 seconds in your toy example), but it can be significant if f is very fast to execute (comparable to the apparently ~10 ms communication overhead):

ParallelTable[AbsoluteTiming@f[x], {x, 3}]

(* ==> {{0.0100006, 0.841471}, {0.0110006, 0.909297}, {0.0110007,  0.14112}} *)

Table[AbsoluteTiming@f[x], {x, 3}]

(* ==> {{0., 0.841471}, {0., 0.909297}, {0., 0.14112}} *)

Finally, a note about benchmarking: in general, measuring very short times like the 10 ms here should be done with care. On older versions of Windows the timer resolution is only 15 ms; on Windows 7 the resolution is much better. These timings are from Windows 7.
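When a single measurement is this close to the timer resolution, averaging over many calls gives a more trustworthy per-call figure, e.g.:

First[AbsoluteTiming[Do[f[1], {100}]]]/100  (* average per-call time over 100 cached evaluations *)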


Update:

Based on @Leonid's suggestion in the comments, and @Volker's original solution, we can combine subkernel and main-kernel caching like this:

f[x_] := With[{result = g[x]}, (f[x] = result) /; result =!= Null];  (* value already on the main kernel: cache it locally *)
f[x_] := g[x] = f[x] = (Pause[3]; N[Sin[x]])                         (* otherwise compute, then cache on main and locally *)
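With these two definitions (g still being shared as above), a subkernel first checks the main kernel's cache and, when it finds a value, also stores it locally, so repeated parallel runs avoid even the ~10 ms round trip. A quick check might look like this (timings will vary):

AbsoluteTiming@ParallelTable[f[x], {x, 3}]    (* first run: values computed on the subkernels *)
ParallelTable[AbsoluteTiming@f[x], {x, 3}]    (* later runs: answered from each subkernel's local cache *)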

Packaged-up solution

We can bundle all these ideas into a single memoization function (see the code at the end of the post).

Here is an example use:

Clear[f]
f[x_] := (Pause[3]; Sin[x])

AbsoluteTiming@ParallelTable[AbsoluteTiming@pm[f][Mod[x, 3]], {x, 15}]

(* ==>
{6.0683471, {{3.0181726, Sin[1]}, {0.0110007, Sin[2]}, 
             {3.0181726, 0}, {0., Sin[1]}, {3.0191727, Sin[2]}, 
             {3.0181726, 0}, {0.0110007, Sin[1]}, {0., Sin[2]}, 
             {0., 0}, {0., Sin[1]}, {0., Sin[2]}, 
             {0., 0}, {0., Sin[1]}, {0., Sin[2]}, {0., 0}}}
*)

The function simply needs to be called as pm[f][x] instead of f[x]. Memoized values are associated with f as UpValues, so I expected automatic distribution of definitions to take care of both synchronizing memoized values and clearing them when necessary. Unfortunately, this mechanism doesn't seem to be reliable (sometimes it works, sometimes it doesn't), so I provided a function clearParallelCache[f] that clears all memoized values on all kernels.

Caching happens at two levels: on the main kernel and on the subkernels. Computed or main-kernel-cached values are copied to the subkernels as soon as possible. This is visible in the timings of the example above: sometimes retrieving a cached value takes 10 ms, but eventually it becomes very fast. Note that two subkernels may end up computing the same value (if they start on it at the same time). This can sometimes be avoided by using a different setting for the Method option of Parallelize (depending on the structure of the input data).

For simplicity, I restricted pm to only work with functions that take a single numerical argument (and return anything). This is to avoid having to deal with more complex conditional definitions (especially cases when the function won't evaluate for certain argument types). It could safely be changed to accept e.g. a vector or matrix of values instead.
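For instance, a variant that also accepts numeric arrays would only need a different argument test. Here is a sketch (untested; it reuses the helpers memo, valueOnMain and setValueOnMain from "The code" below):

pm[f_Symbol][x_ /; ArrayQ[x, _, NumericQ]] :=
 With[{result = memo[f, x]},
  If[result === Null,
   With[{value = valueOnMain[f, x]},
    If[value === Null,
     f /: memo[f, x] = setValueOnMain[f, x, f[x]],
     f /: memo[f, x] = value]],
   result]]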


The code

pm[f_Symbol][x_?NumericQ] :=
 With[{result = memo[f, x]},                        (* 1. check this kernel's local cache *)
  If[result === Null,
   With[{value = valueOnMain[f, x]},                (* 2. check the main kernel's cache *)
    If[value === Null,
     f /: memo[f, x] = setValueOnMain[f, x, f[x]],  (* 3. compute, store on main, then locally *)
     f /: memo[f, x] = value]                       (* found on main: copy into the local cache *)
    ],
   result]
  ]

memo[___] := Null   (* default: nothing cached on this kernel *)
DistributeDefinitions[memo];

valueOnMain[f_, x_] := memo[f, x]                      (* shared: always runs on the main kernel *)

setValueOnMain[f_, x_, fx_] := f /: memo[f, x] = fx    (* shared: stores fx on the main kernel *)

SetSharedFunction[valueOnMain, setValueOnMain]

clearParallelCache[f_Symbol] := 
  (UpValues[f] = {}; ParallelEvaluate[UpValues[f] = {}];)
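For example, to start over from a clean state:

clearParallelCache[f]                     (* removes the memoized values from the main kernel and all subkernels *)
ParallelTable[pm[f][Mod[x, 3]], {x, 6}]   (* recomputes everything from scratch *)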

The problem with a normal shared memoized function definition

f[x_] := f[x] = (Pause[3]; N[Sin[x]])

is indeed that any evaluation of f[n] on a parallel subkernel causes the rhs f[x] = (Pause[3]; N[Sin[x]]) to be sent back and evaluated on the master kernel.

There is an undocumented feature that if such a definition is made on a subkernel, the rhs will also be flagged for evaluation on the subkernel. So you could do instead

SetSharedFunction[f]; (* no definitions on master *)
ParallelEvaluate[f[x_] := f[x] = (Pause[3]; N[Sin[x]]), First[Kernels[]]]

If you look at the result, you see

?f
f[x_]:=Parallel`Developer`SendBack[f[x]=(Pause[3];N[Sin[x]])]

Knowing this, you can get the desired effect with a definition on the master kernel:

g[x_] := Parallel`Developer`SendBack[g[x] = (Pause[3]; N[Sin[x]])]
SetSharedFunction[g]

and then

In[9]:= AbsoluteTiming[ParallelTable[g[Mod[i, 4]], {i, 40}]]

Out[9]= {3.13635, {0.841471, 0.909297, 0.14112, 0., 0.841471, 
  0.909297, 0.14112, 0., 0.841471, 0.909297, 0.14112, 0., 0.841471, 
  0.909297, 0.14112, 0., 0.841471, 0.909297, 0.14112, 0., 0.841471, 
  0.909297, 0.14112, 0., 0.841471, 0.909297, 0.14112, 0., 0.841471, 
  0.909297, 0.14112, 0., 0.841471, 0.909297, 0.14112, 0., 0.841471, 
  0.909297, 0.14112, 0.}}

?g
g[0]=0.
g[1]=0.841471
g[2]=0.909297
g[3]=0.14112
g[x_]:=Parallel`Developer`SendBack[g[x]=(Pause[3];N[Sin[x]])]

Each evaluation of g[n] on a subkernel will then first send g[n] to the master. If there is already a specific value for that n, it is sent back; otherwise the expression g[x] = (Pause[3]; N[Sin[x]]) is sent back. The subkernel then evaluates the rhs and makes the definition g[n] = val, which is in turn sent back to the master, where it is performed for all kernels to see.
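As a consequence, once all values are stored on the master, a repeated run only performs cheap lookups:

AbsoluteTiming[ParallelTable[g[Mod[i, 4]], {i, 40}]]  (* second run: no Pause is triggered, only master lookups *)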


I realize this is an old question, but I recently had the same issue and came across what I think is a cleaner solution (link to google groups question). I don't want to take credit for coming up with that solution, but I thought it would be helpful to add it to this site.

I'll use a simple example function to demonstrate.

f[x_] := f[x] = x
ParallelDo[f[i], {i, 1, 3}]

The problem is that, while f[1], f[2], and f[3] have now been computed, they are stored on the subkernels and not on the main one. But we can access the values using ParallelEvaluate:

DownValues[f] = DeleteDuplicates[Flatten[ParallelEvaluate[DownValues[f]]]]

DeleteDuplicates is needed because each subkernel includes the original function definition as a down-value.
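If you use this pattern regularly, the collect-back step can be wrapped in a small helper (a hypothetical convenience function, not part of the linked solution):

collectDefinitions[f_Symbol] :=
 (DownValues[f] = DeleteDuplicates[Flatten[ParallelEvaluate[DownValues[f]]]];)

ParallelDo[f[i], {i, 1, 3}];
collectDefinitions[f]  (* f[1], f[2] and f[3] are now defined on the main kernel *)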