Efficiently check if an element occurs at least n times in a list
You could use the second argument of index
to find the subsequent indices of occurrences:
def check_list(l, x, n):
i = 0
try:
for _ in range(n):
i = l.index(x, i)+1
return True
except ValueError:
return False
print( check_list([1,3,2,3,4,0,8,3,7,3,1,1,0], 3, 4) )
About index
arguments
The official documentation does not mention in its Python Tutuorial, section 5 the method's second or third argument, but you can find it in the more comprehensive Python Standard Library, section 4.6:
s.index(x[, i[, j]])
index of the first occurrence of x in s (at or after index i and before index j) (8)(8)
index
raisesValueError
when x is not found in s. When supported, the additional arguments to the index method allow efficient searching of subsections of the sequence. Passing the extra arguments is roughly equivalent to usings[i:j].index(x)
, only without copying any data and with the returned index being relative to the start of the sequence rather than the start of the slice.
Performance Comparison
In comparing this list.index
method with the islice(gen)
method, the most important factor is the distance between the occurrences to be found. Once that distance is on average 13 or more, the list.index
has a better performance. For lower distances, the fastest method also depends on the number of occurrences to find. The more occurrences to find, the sooner the islice(gen)
method outperforms list.index
in terms of average distance: this gain fades out when the number of occurrences becomes really large.
The following graph draws the (approximate) border line, at which both methods perform equally well (the X-axis is logarithmic):
Instead of incurring extra overhead with the setup of a range
object and using all
which has to test the truthiness of each item, you could use itertools.islice
to advance the generator n
steps ahead, and then return the next item in the slice if the slice exists or a default False
if not:
from itertools import islice
def check_list(lst, x, n):
gen = (True for i in lst if i==x)
return next(islice(gen, n-1, None), False)
Note that like list.count
, itertools.islice
also runs at C speed. And this has the extra advantage of handling iterables that are not lists.
Some timing:
In [1]: from itertools import islice
In [2]: from random import randrange
In [3]: lst = [randrange(1,10) for i in range(100000)]
In [5]: %%timeit # using list.index
....: check_list(lst, 5, 1000)
....:
1000 loops, best of 3: 736 µs per loop
In [7]: %%timeit # islice
....: check_list(lst, 5, 1000)
....:
1000 loops, best of 3: 662 µs per loop
In [9]: %%timeit # using list.index
....: check_list(lst, 5, 10000)
....:
100 loops, best of 3: 7.6 ms per loop
In [11]: %%timeit # islice
....: check_list(lst, 5, 10000)
....:
100 loops, best of 3: 6.7 ms per loop