Find closest value in a vector with binary search
You can use data.table
to do a binary search:
dt = data.table(w, val = w) # you'll see why val is needed in a sec
setattr(dt, "sorted", "w") # let data.table know that w is sorted
Note that if the column w
isn't already sorted, then you'll have to use setkey(dt, w)
instead of setattr(.)
.
# binary search and "roll" to the nearest neighbour
dt[J(x), roll = "nearest"]
# w val
#1: 4.5 4
In the final expression the val
column will have the you're looking for.
# or to get the index as Josh points out
# (and then you don't need the val column):
dt[J(x), .I, roll = "nearest", by = .EACHI]
# w .I
#1: 4.5 3
# or to get the index alone
dt[J(x), roll = "nearest", which = TRUE]
#[1] 3
R>findInterval(4.5, c(1,2,4,5,6))
[1] 3
will do that with price-is-right matching (closest without going over).
x = 4.5
w = c(1,2,4,6,7)
closestLoc = which(min(abs(w-x)))
closestVal = w[which(min(abs(w-x)))]
# On my phone- please pardon typos
If your vector is lengthy, try a 2-step approach:
x = 4.5
w = c(1,2,4,6,7)
sdev = sapply(w,function(v,x) abs(v-x), x = x)
closestLoc = which(min(sdev))
for maddeningly long vectors (millions of rows!, warning- this will actually be slower for data which is not very, very, very large.)
require(doMC)
registerDoMC()
closestLoc = which(min(foreach(i = w) %dopar% {
abs(i-x)
}))
This example is just to give you a basic idea of leveraging parallel processing when you have huge data. Note, I do not recommend you use it for simple & fast functions like abs().
See match.closest()
from the MALDIquant package:
> library(MALDIquant)
> match.closest(x, w)
[1] 3