Finding all elements within a certain range in a sorted list
As you said, essentially you need binary search, since you have a sorted list and binary search has a logarithmic complexity. However, since
- the limiting numbers may not be present in the list
- some numbers may be present more than once
we'd need modified binary search. Here is a possible implementation:
(* maximum number smaller than or equal to the limit *)
bsearchMin[list_List, elem_] :=
Module[{n0 = 1, n1 = Length[list], m},
While[n0 <= n1,
m = Floor[(n0 + n1)/2];
If[list[[m]] == elem,
While[list[[m]] == elem, m++];
Return[m - 1]];
If[list[[m]] < elem, n0 = m + 1, n1 = m - 1]
];
If[list[[m]] < elem, m, m - 1]
];
and
(* minimum number larger than or equal to the limit *)
bsearchMax[list_List, elem_] :=
Module[{n0 = 1, n1 = Length[list], m},
While[n0 <= n1,
m = Floor[(n0 + n1)/2];
If[list[[m]] == elem,
While[list[[m]] == elem, m--];
Return[m + 1]];
If[list[[m]] < elem, n0 = m + 1, n1 = m - 1]
];
If[list[[m]] > elem, m, m + 1]
];
With the help of these:
window[list_, {xmin_, xmax_}] :=
With[{minpos = bsearchMax[list, xmin], maxpos = bsearchMin[list, xmax]},
Take[list, {minpos, maxpos}] /; ! MemberQ[{minpos, maxpos}, -1]
];
window[__] := {};
For example:
lst = {1, 4, 4, 4, 6, 7, 7, 11, 11, 11, 11, 13, 15, 18, 19, 22, 23, 25, 27, 30}
window[lst, {4, 11}]
(* ==> {4, 4, 4, 6, 7, 7, 11, 11, 11, 11} *)
You can Compile
functions bsearchMin
and bsearchMax
, if you expect lots of duplicate elements (this will speed an inner While
loop). Compiling them per se won't improve the speed much (unless you call these very often), since the complexity is logarithmic in any case.
This is certainly generally more efficient than Nearest
for sorted lists (perhaps unless you have lots of repeated elements, but then you can compile), because Nearest
is a general algorithm which can not take into account the sorted nature of the list. I tried on 10^7 elements list, and it took something 0.0003 seconds for that.
Compiled version
Compiled versions speed up bsearchMin
and bsearchMax
, but seem not to improve the performance of window[]
. See discussion in comments section.
bsearchMax = Compile[{{list, _Complex, 1}, {elem, _Real}},
Block[{n0 = 1, n1 = Length[list], m = 0},
While[n0 <= n1,
m = Floor[(n0 + n1)/2];
If[list[[m]] == elem,
While[m >= n0 && list[[m]] == elem, m--]; Return[m + 1] ];
If[list[[m]] < elem, n0 = m + 1, n1 = m - 1]];
If[list[[m]] > elem, m, m + 1]
]
,
RuntimeAttributes -> {Listable},
CompilationTarget -> "C"
]
bsearchMin = Compile[{{list, _Complex, 1}, {elem, _Real}},
Block[{n0=1,n1=Length[list],m = 0},
While[n0<=n1,
m=Floor[(n0+n1)/2];
If[list[[m]]==elem,
While[m<=n1 && list[[m]]==elem,m++]; Return[m-1] ];
If[list[[m]]<elem, n0=m+1, n1=m-1]];
If[list[[m]]<elem,m,m-1]
]
,
RuntimeAttributes -> {Listable},
CompilationTarget -> "C"
]
Using Pick
with Boole
selector
window[list_, {xmin_, xmax_}] :=
Pick[list, Boole[xmin <= # <= xmax] & /@ list, 1]
With
list = Sort@RandomReal[1, 1000000];
{min, max} = Sort@RandomReal[1, 2];
Timings:
Table[ClearSystemCache[];
Timing[window[list, {min, max}];], {50}] // Mean
(* ==> {0.0674, Null} *)
on a laptop with Vista 64bit OS, Intel Core2 Duo T9600 2.80GHz, 8G memory.
UPDATE: Using Pick
with alternative selector arrays:
UnitStep
windowUnitStep1[list_, {xmin_, xmax_}] :=
Pick[list, UnitStep[(list-xmin)(xmax-list)], 1]
or
windowUnitStep2[list_, {xmin_, xmax_}] :=
Pick[list, UnitStep[list-xmin]UnitStep[xmax-list], 1]
both are twice as fast as Boole
.
UnitStep
Compiled (Ruebenko's compiled function win
)
windowUnitStep3[list_, {xmin_, xmax_}] :=
Pick[list, win[list,xmin,xmax], 1]
is twice as fast as uncompiled UnitStep
.
Using GatherBy
with Boole
:
windowGatherBy[list_, {xmin_, xmax_}] := Last@GatherBy[list, Boole[xmin <= # <= xmax] &]
gives similar timings to window
.
Using SparseArray
with Part
or Take
:
The following alternatives attempt to take into account the fact that input data is sorted, thus the first and the last non-zero positions
in SparseArray[UnitStep[(list-min)(max-list)]]
give the first and the last positions of the portion of input list that satisfy the bounds.
windowSparseArray1[list_, xmin_, xmax_] :=
With[{fromTo = SparseArray[UnitStep[(list - xmin) (xmax - list)]][
"NonzeroPositions"][[{1, -1}]]},
list[[fromTo[[1, 1]] ;; fromTo[[2, 1]]]]]
or
windowSparseArray2[list_, xmin_, xmax_] :=
With[{fromTo = SparseArray[UnitStep[(list - xmin) (xmax - list)]][
"NonzeroPositions"][[{1, -1}]]},
Take[list, {fromTo[[1, 1]], fromTo[[2, 1]]}]]
both give rougly 50 percent speed improvement over window
above. Using Ruebenko's compiled UnitStep
to construct the array again doubles the speed:
windowSparseArray3[list_, xmin_, xmax_] :=
With[{fromTo = SparseArray[win[list,xmin,xmax]][
"NonzeroPositions"][[{1, -1}]]},
Take[list, {fromTo[[1, 1]], fromTo[[2, 1]]}]]
I think Nearest[]
is the most effective way. You don't even need to sort your data.
a = RandomReal[1, 100];
nf = Nearest@a;
xmin = 0.01; xmax = 0.6;
x0 = (xmin + xmax)/2; dx = (xmax - xmin)/2;
nf[x0, {\[Infinity], dx}] // Sort
{0.0117819, 0.013102, 0.0177269, 0.0356801, 0.040019, 0.0504563, \
0.0627056, 0.0749593, 0.0758206, 0.106541, 0.107941, 0.112281, \
0.117172, 0.132445, 0.143151, 0.157252, 0.166585, 0.179652, 0.217876, \
0.241301, 0.242821, 0.254276, 0.258477, 0.267544, 0.268951, 0.280489, \
0.290386, 0.305346, 0.315458, 0.318908, 0.337006, 0.338169, 0.339338, \
0.362153, 0.366946, 0.371712, 0.386563, 0.396061, 0.416329, 0.426874, \
0.430932, 0.439427, 0.460844, 0.473224, 0.475559, 0.476573, 0.479037, \
0.480472, 0.503684, 0.513969, 0.521916, 0.535221, 0.541562, 0.54198, \
0.554534, 0.558954, 0.563491, 0.565873, 0.582683, 0.58919, 0.592807, \
0.593541}
For array of 100 000 numbers it took 0.062 seconds on my machine. For million -- 0.688.