Find nearest indices for one array against all values in another array - Python / NumPy
Here's one vectorized approach with np.searchsorted, based on this post -
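import numpy as np

def closest_argmin(A, B):
    # For each value in A, return the index of the nearest value in B
    L = B.size
    sidx_B = B.argsort()
    sorted_B = B[sidx_B]
    # Left-side insertion points of A's values in the sorted copy of B
    sorted_idx = np.searchsorted(sorted_B, A)
    # Values beyond the largest element map to the last valid index
    sorted_idx[sorted_idx==L] = L-1
    # True where the left neighbour is strictly closer than the current one
    mask = (sorted_idx > 0) & \
        ((np.abs(A - sorted_B[sorted_idx-1]) < np.abs(A - sorted_B[sorted_idx])) )
    # Step back by one where mask is True, then map to B's original indices
    return sidx_B[sorted_idx-mask]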
Brief explanation:
1. Get the left-side insertion indices into the sorted array. We do this with np.searchsorted(arr1, arr2, side='left') or just np.searchsorted(arr1, arr2). Now, searchsorted expects a sorted array as the first input, so we need some preparatory work there (the argsort step).
2. Compare the values at those positions with the values at their immediate left neighbours (index - 1) and see which one is closest. We do this at the step that computes mask.
3. Based on whether the left neighbours or the positions themselves are closest, choose the respective ones. This is done by subtracting mask from the indices, with the boolean mask values converted to ints and acting as offsets.
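The steps above can be traced on a small input. This is only an illustrative sketch: the arrays A and B below are made-up sample data, and the intermediate names mirror those in closest_argmin.

```python
import numpy as np

# Hypothetical sample data, chosen only for illustration
B = np.array([10.0, 2.0, 7.0])        # reference values (unsorted)
A = np.array([1.0, 6.0, 8.0, 11.0])   # query values

# Preparatory work: searchsorted needs a sorted first argument
sidx_B = B.argsort()                  # [1, 2, 0]
sorted_B = B[sidx_B]                  # [2., 7., 10.]

# Step 1: left-side insertion points, clipped to the last valid index
sorted_idx = np.searchsorted(sorted_B, A)        # [0, 1, 2, 3]
sorted_idx[sorted_idx == B.size] = B.size - 1    # [0, 1, 2, 2]

# Step 2: is the left neighbour strictly closer than the current position?
left_dist = np.abs(A - sorted_B[sorted_idx - 1])
here_dist = np.abs(A - sorted_B[sorted_idx])
mask = (sorted_idx > 0) & (left_dist < here_dist)   # [F, F, T, F]

# Step 3: step back by one where mask is True, then undo the sort
result = sidx_B[sorted_idx - mask]
print(result)   # [1 2 2 0]
```

Because the comparison is a strict <, a query that is exactly halfway between two reference values keeps the right-hand (larger) neighbour.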
Benchmarking
Original approach -
def org_app(myArray, refArray):
    out1 = np.empty(myArray.size, dtype=int)
    for i, value in enumerate(myArray):
        # find_nearest from posted question
        index = find_nearest(refArray, value)
        out1[i] = index
    return out1
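The find_nearest helper comes from the question and is not reproduced here. For anyone who wants to run org_app, a common form of such a helper looks roughly like the sketch below; this is an assumption about its behaviour, not the question's actual code.

```python
import numpy as np

def find_nearest(array, value):
    # Hypothetical stand-in for the question's helper: returns the index
    # of the element of `array` closest to the scalar `value`.
    return np.abs(np.asarray(array) - value).argmin()

ref = np.array([0.1, 0.5, 0.9])   # made-up reference values
print(find_nearest(ref, 0.6))     # 1
```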
Timings and verification -
In [188]: refArray = np.random.random(16)
...: myArray = np.random.random(1000)
...:
In [189]: %timeit org_app(myArray, refArray)
100 loops, best of 3: 1.95 ms per loop
In [190]: %timeit closest_argmin(myArray, refArray)
10000 loops, best of 3: 36.6 µs per loop
In [191]: np.allclose(closest_argmin(myArray, refArray), org_app(myArray, refArray))
Out[191]: True
50x+ speedup for the posted sample, and hopefully more for larger datasets!
An answer that is much shorter than that of @Divakar, also using broadcasting and even slightly faster:
abs(myArray[:, None] - refArray[None, :]).argmin(axis=-1)
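To see what that one-liner builds internally, here is a small sketch with made-up arrays. It splits the expression into the intermediate distance table and the argmin over it.

```python
import numpy as np

# Hypothetical sample data, chosen only for illustration
myArray = np.array([1.0, 6.0, 8.0, 11.0])   # query values
refArray = np.array([10.0, 2.0, 7.0])       # reference values

# Shape (4, 1) minus shape (1, 3) broadcasts to a (4, 3) table of
# absolute differences: one row per query, one column per reference
diffs = np.abs(myArray[:, None] - refArray[None, :])

# Column index of the smallest difference in each row
nearest = diffs.argmin(axis=-1)
print(nearest)   # [1 2 2 0]
```

Note that the intermediate table holds len(myArray) * len(refArray) elements, so memory use grows with the product of the two sizes. Also, argmin returns the first occurrence of the minimum, so an exact tie resolves to the lowest index in refArray.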