Numpy: find first index of value fast
There is a feature request for this scheduled for Numpy 2.0.0: https://github.com/numpy/numpy/issues/2269
I've made a benchmark for several methods:
argwhere
nonzero
as in the question.tostring()
as in @Rob Reilink's answer- python loop
- Fortran loop
The Python and Fortran code are available. I skipped the unpromising ones like converting to a list.
The results on log scale. X-axis is the position of the needle (it takes longer to find if it's further down the array); last value is a needle that's not in the array. Y-axis is the time to find it.
The array had 1 million elements and tests were run 100 times. Results still fluctuate a bit, but the qualitative trend is clear: Python and f2py quit at the first element so they scale differently. Python gets too slow if the needle is not in the first 1%, whereas f2py
is fast (but you need to compile it).
To summarize, f2py is the fastest solution, especially if the needle appears fairly early.
It's not built in which is annoying, but it's really just 2 minutes of work. Add this to a file called search.f90
:
subroutine find_first(needle, haystack, haystack_length, index)
implicit none
integer, intent(in) :: needle
integer, intent(in) :: haystack_length
integer, intent(in), dimension(haystack_length) :: haystack
!f2py intent(inplace) haystack
integer, intent(out) :: index
integer :: k
index = -1
do k = 1, haystack_length
if (haystack(k)==needle) then
index = k - 1
exit
endif
enddo
end
If you're looking for something other than integer
, just change the type. Then compile using:
f2py -c -m search search.f90
after which you can do (from Python):
import search
print(search.find_first.__doc__)
a = search.find_first(your_int_needle, your_int_array)
Although it is way too late for you, but for future reference: Using numba (1) is the easiest way until numpy implements it. If you use anaconda python distribution it should already be installed. The code will be compiled so it will be fast.
@jit(nopython=True)
def find_first(item, vec):
"""return the index of the first occurence of item in vec"""
for i in xrange(len(vec)):
if item == vec[i]:
return i
return -1
and then:
>>> a = array([1,7,8,32])
>>> find_first(8,a)
2