Is it possible to Parallelize Select?

Time increases enormously with ParallelMap, because

rlp = Partition[rl, 4];

produces 2500000 sublists of four elements each. Instead, use

rlp = Partition[rl, 2500000];

to produce four sublists of 2500000 elements each.

Also, note that the results of ParallelMap should be combined by

ParallelMap[Select[#, PrimeQ] &, rlp] // Flatten

to achieve the same answer as with Map. With these changes, the respective AbsoluteTimings are 4.38775 and 3.47231, which are comparable. The savings from parallel computation in this case are offset by the overhead of ParallelMap applied to Select. This can be seen more clearly by varying the size of rl; larger numbers of elements favor ParallelMap.

It is difficult to answer the second and third questions without more detail; e.g. what sorts of elements.

Addendum

A bit faster is

AbsoluteTiming[ParallelTable[Select[rlp[[i]], PrimeQ] , {i, 4}] // Flatten][[1]]
(* 2.71114 *)

because ParallelTable is particularly well optimized, in my experience. On the other hand, somewhat slower are the equivalent (in this case),

AbsoluteTiming[ParallelCombine[Select[#, PrimeQ] &, rl]][[1]]
(* 5.9146 *)
AbsoluteTiming[Parallelize[Select[rl, PrimeQ]]][[1]]
(* 5.83129 *)

I mention these cases, because they appear in the "Properties and Relations" section of the Parallelize documentation.


What causes the huge time increase? Passing huge lists between kernels?

This is exactly the reason in your particular example. In this case, the computation time is totally dominated by the time it takes to send the necessary data to each of the subkernels and get the results; the actual computation is rather trivial once there (PrimeQ is ridiculously fast).

In general, if you have a list of computations where the time needed to compute each item is small and the size of the data being sent to and received from the subkernels is large, parallelization probably won't be much help.

However, for cases where parallelization can provide benefits, wrapping Parallelize around Select will do the trick. Here's a modified version of your example that illustrates this:

slowPrimeQ = MatchQ[FactorInteger[#], {{_, 1}}] &;
rl = RandomInteger[2^63 - 1, {10^5}];

First[AbsoluteTiming[p1 = Select[rl, slowPrimeQ]]]
(*42.2424*)

First[AbsoluteTiming[p2 = Parallelize[Select[rl, slowPrimeQ]]]]
(*7.55142*)

p1 === p2
(*True*)

(this was on an 8 core machine)