Parallelize GroupBy
Parallelize the application of the selector function, then use Pick to choose the elements for which the selector function returns True or False.
I made up a slow function f and a "long list":
f = (Pause[0.1]; # < 50) &;
longlist = RandomReal[{0, 100}, {10}];
You can see the difference in run time between serial and parallel execution (the parallel timing depends on how many subkernels you have running):
AbsoluteTiming[Map[f, longlist];] (* {1.01678, Null} *)
AbsoluteTiming[result = ParallelMap[f, longlist];] (* {0.31652, Null} *)
You can then use Pick to split your list:
true = Pick[longlist, result, True]
false = Pick[longlist, result, False]
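If you need this repeatedly, the two steps can be wrapped into a GroupBy-like helper. This is a minimal sketch, not a built-in: the name parallelGroupBy is hypothetical, and it assumes the only expensive part is evaluating the selector, so the grouping itself is done serially on the precomputed keys.

```mathematica
(* Hypothetical helper, not a built-in function.
   Evaluates the (slow) selector f on every element in parallel, then groups
   the original elements by the precomputed keys, like GroupBy[list, f]. *)
parallelGroupBy[list_List, f_] :=
 Module[{keys},
  keys = ParallelMap[f, list];                      (* expensive part, in parallel *)
  GroupBy[Transpose[{keys, list}], First -> Last]   (* cheap serial grouping, order preserved *)
 ]
```

With the definitions above, parallelGroupBy[longlist, f] returns an association whose True and False values contain the same elements, in the same order, as the two Pick results. A key is simply absent if no element produced it.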
You could also use Complement to get the second group, but note that Complement returns its result sorted and with duplicates removed, and it seems much slower than Pick:
false = Complement[longlist, true]
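A tiny made-up example where the two approaches disagree, using an unsorted list with a repeated element:

```mathematica
(* Made-up data: an unsorted list with a duplicate, and a precomputed selector *)
list = {5, 1, 4, 1};
sel = {False, False, True, False};

Pick[list, sel, False]                    (* {5, 1, 1}: original order, duplicates kept *)
Complement[list, Pick[list, sel, True]]   (* {1, 5}: sorted, duplicates removed *)
```

So Complement is only a drop-in replacement when the order of the result does not matter and the list has no repeated elements.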
If the list contains numbers, I suggest vectorization instead of parallelization: try to write f so that it is vectorizable, i.e. so that it operates on the whole array at once instead of element by element.
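For the made-up f above, the comparison # < 50 can be vectorized with the built-in UnitStep, which is Listable and works on a whole packed array at once. A minimal sketch that drops the artificial Pause; data, maskBelow, below and above are illustrative names, and the array size is arbitrary:

```mathematica
(* Made-up numeric data, stored as a packed array *)
data = RandomReal[{0, 100}, 10^6];

(* Vectorized form of the test # < 50:
   UnitStep[x - 50] is 1 for x >= 50 and 0 otherwise,
   so 1 - UnitStep[data - 50] is 1 exactly where data < 50 *)
maskBelow = 1 - UnitStep[data - 50];

below = Pick[data, maskBelow, 1];   (* elements with data < 50 *)
above = Pick[data, maskBelow, 0];   (* elements with data >= 50 *)
```

This is roughly the kind of comparison-to-arithmetic rewriting that a package like BoolEval automates, so you do not have to work out the UnitStep expressions by hand.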
My BoolEval package (partially available as a resource function) is also convenient.
For example, if f is Sin (which is vectorized), then you can do
arr = RandomReal[100, 100000000];
<< BoolEval`
mask = BoolEval[Sin[arr] > 0.5]; // RepeatedTiming
(* {1.5, Null} *)
Pick[arr, mask, 1] // Length (* elements satisfying the condition *)
(* 33521172 *)
Pick[arr, mask, 0] // Length (* elements NOT satisfying the condition *)
(* 66478828 *)
If you only want one set of elements, a shorthand is
BoolPick[arr, Sin[arr] > 0.5]
I expect that no Parallel* function will be able to compete with this approach for as long as f is vectorizable.
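If you want to check this on your own machine, the sketch below times a vectorized mask against a ParallelMap-based one for the Sin example, using only built-in functions (UnitStep stands in for BoolEval here, so the package is not required). No timings are shown because they depend on hardware and the number of kernels; arrSmall and its size of 10^5 are arbitrary choices made so the ParallelMap run finishes quickly.

```mathematica
(* Smaller made-up array so that the ParallelMap run finishes in reasonable time *)
arrSmall = RandomReal[100, 10^5];

(* Vectorized: 1 - UnitStep[0.5 - Sin[x]] is 1 exactly where Sin[x] > 0.5 *)
{tVec, maskVec} = AbsoluteTiming[1 - UnitStep[0.5 - Sin[arrSmall]]];

(* Parallel: evaluate the scalar test element by element on the subkernels *)
{tPar, maskPar} = AbsoluteTiming[Boole[ParallelMap[Sin[#] > 0.5 &, arrSmall]]];

{tVec, tPar}         (* compare the two timings *)
maskVec == maskPar   (* both masks should select the same elements *)
```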