Quick multiple selections from a list
This seems to give rather decent performance (final version with improvements by jVincent):
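Clear[getSubset];
getSubset[input_List, sub_List] :=
  Module[{inSubQ, sowMatches},
    (* mark every requested key so that membership tests are fast lookups *)
    Scan[(inSubQ[#] := True) &, sub];
    (* sow each matching row, tagged with its first element *)
    sowMatches[x_ /; inSubQ@First@x] := Sow[x, First@x];
    (* reap per key, in the order of sub, then strip one level of nesting *)
    Apply[Sequence, Last@Reap[Scan[sowMatches, input], sub], {2}]
  ];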
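To see why the final Apply[Sequence, ..., {2}] is needed, here is a minimal sketch of how Reap groups results when it is given a list of tags. The symbols a, b and c are placeholders made up for illustration and are assumed to have no values:

```mathematica
(* a, b, c are hypothetical tags, assumed to be unassigned symbols *)
raw = Last @ Reap[Sow[1, a]; Sow[2, b]; Sow[3, a], {a, b, c}]
(* {{{1, 3}}, {{2}}, {}} : one slot per tag, in the order given, empty if nothing was sown *)

Apply[Sequence, raw, {2}]
(* {{1, 3}, {2}, {}} : the extra level of nesting is removed *)
```

In getSubset the sown items are whole rows, so each non-empty slot ends up as a list of the rows sharing that key.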
Benchmarks:
n = 10000;
biggerlist = Map[{#, FromCharacterCode[Mod[# - 1, 26] + 97], #*100} &, Range[n]];
unsortedbiglist = RandomSample[biggerlist, n];
unsortedsubset = RandomSample[Range[n], Round[n/10]];
Row[{First[Timing[selection1=Map[Cases[unsortedbiglist,{#,__}]&,unsortedsubset];]]," seconds"}]
(* 1.170008 seconds *)
(sel1 = getSubset[unsortedbiglist,unsortedsubset])//Short//Timing
(* {0.031,{{{8286,r,828600}},<<998>>,{{6420,x,642000}}}} *)
selection1===sel1
(* True *)
Timing[selection3 = Pick[unsortedbiglist, unsortedbiglist[[All, 1]],
Alternatives @@ unsortedsubset];]
(* {0.218401, Null} -- same as Cases[..., Alternatives@@ ..] *)
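(* selection2 is not defined above; assuming it is the Cases form mentioned in the previous comment *)
selection2 = Cases[unsortedbiglist, {Alternatives @@ unsortedsubset, __}];
selection3 == selection2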
(* True *)
Here is my take on Leonid's method. It's better because it's shorter and uses ~infix~. ;-)
(It's just a little bit faster, too: about 20% on his test.)
getSubset2[input_List, sub_List] := Module[{test},
  (test@# = True) & ~Scan~ sub;
  Apply[Sequence,
    Reap[Cases[input, x:{y_?test, ___} :> x ~Sow~ y], sub][[2]],
    {2}
  ]
]
getSubset2[Range@20 ~Partition~ 4, Prime ~Array~ 7]
{{}, {}, {{5, 6, 7, 8}}, {}, {}, {{13, 14, 15, 16}}, {{17, 18, 19, 20}}}
Although it is slower, I cannot pass by the more direct implementation without comment:
getSubset3[input_List, sub_List] :=
Last @ Reap[# ~Sow~ #[[1]] & ~Scan~ input, sub, Sequence @@ #2 &]
Also slower than getSubset2 but pleasingly clean: Association can be nicely applied to this problem in the form of GroupBy and Lookup.
getSubset4[set_, sub_] := Lookup[set ~GroupBy~ First, sub, {}]
getSubset4[Range@20 ~Partition~ 4, Prime ~Array~ 7]
{{}, {}, {{5, 6, 7, 8}}, {}, {}, {{13, 14, 15, 16}}, {{17, 18, 19, 20}}}
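For anyone curious what happens in between, the two steps can be looked at separately. This is only an illustrative sketch: the names set and groups are mine, and it needs version 10 or later for Association support.

```mathematica
set = Partition[Range[20], 4];

(* group rows by their first element; the result is an Association *)
groups = GroupBy[set, First]
(* <|1 -> {{1, 2, 3, 4}}, 5 -> {{5, 6, 7, 8}}, 9 -> {{9, 10, 11, 12}},
     13 -> {{13, 14, 15, 16}}, 17 -> {{17, 18, 19, 20}}|> *)

(* look up several keys at once; missing keys fall back to {} *)
Lookup[groups, {2, 5, 13}, {}]
(* {{}, {{5, 6, 7, 8}}, {{13, 14, 15, 16}}} *)
```

The third argument of Lookup is what supplies the empty lists for keys that never occur as a first element.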