Find the smallest positive number not in list

This makes use of the property of sets

>>> l = [1,2,3,5,7,8,12,14]
>>> m = range(1,len(l))
>>> min(set(m)-set(l))
4

I came up with several different ways:

Iterate the first number not in set

I didn't want to get the shortest code (which might be the set-difference trickery) but something that could have a good running time.

This might be one of the best proposed here, my tests show that it might be substantially faster - especially if the hole is in the beginning - than the set-difference approach:

from itertools import count, filterfalse # ifilterfalse on py2

A = [1,14,2,5,3,7,8,12]
print(next(filterfalse(set(A).__contains__, count(1))))

The array is turned into a set, whose __contains__(x) method corresponds to x in A. count(1) creates a counter that starts counting from 1 to infinity. Now, filterfalse consumes the numbers from the counter, until a number is found that is not in the set; when the first number is found that is not in the set it is yielded by next()

Timing for len(a) = 100000, randomized and the sought-after number is 8:

>>> timeit(lambda: next(filterfalse(set(a).__contains__, count(1))), number=100)
0.9200698399945395
>>> timeit(lambda: min(set(range(1, len(a) + 2)) - set(a)), number=100)
3.1420603669976117

Timing for len(a) = 100000, ordered and the first free is 100001

>>> timeit(lambda: next(filterfalse(set(a).__contains__, count(1))), number=100)
1.520096342996112
>>> timeit(lambda: min(set(range(1, len(a) + 2)) - set(a)), number=100)
1.987783643999137

(note that this is Python 3 and range is the py2 xrange)

Use heapq

The asymptotically good answer: heapq with enumerate

from heapq import heapify, heappop

heap = list(A)
heapify(heap)

from heapq import heapify, heappop
from functools import partial

# A = [1,2,3] also works
A = [1,14,2,5,3,7,8,12]

end = 2 ** 61      # these are different and neither of them can be the 
sentinel = 2 ** 62 # first gap (unless you have 2^64 bytes of memory).

heap = list(A)
heap.append(end)
heapify(heap)

print(next(n for n, v in enumerate(
     iter(partial(heappop, heap), sentinel), 1) if n != v))

Now, the one above could be the preferred solution if written in C, but heapq is written in Python and most probably slower than many other alternatives that mainly use C code.

Just sort and enumerate to find the first not matching

Or the simple answer with good constants for O(n lg n)

next(i for i, e in enumerate(sorted(A) + [ None ], 1) if i != e)

This might be fastest of all if the list is almost sorted because of how the Python Timsort works, but for randomized the set-difference and iterating the first not in set are faster.

The + [ None ] is necessary for the edge cases of there being no gaps (e.g. [1,2,3]).

Tags:

Python

List