Why is my Sieve of Eratosthenes so slow?
The Sieve of Eratosthenes looks like this:
def sieve(n):
primality_flags = [True]*(n+1)
primality_flags[0] = primality_flags[1] = False
primes = []
for i, flag in enumerate(primality_flags):
if flag:
primes.append(i)
for j in xrange(2*i, n+1, i):
primality_flags[i] = False
return primes
It processes each number once when the outer loop reaches it, and once for every prime that divides it. About 1/2 the numbers are divisible by 2, about 1/3 are divisible by 3, and so on; asymptotically speaking, the average number of times each number will be processed is 1 + the sum of the reciprocals of the primes up to n. This sum is about log(log(n))
, so the sieve has asymptotic time complexity O(n*log(log(n)))
, assuming arithmetic is constant time. This is really good.
Your function doesn't do that. Your filter
goes over every element in numbers
, regardless of whether it's divisible by prime
. Each element is processed for every prime up until the first prime that divides it, and processing the prime p removes about 1/p of the elements of numbers
. Letting the sequence of primes be p[0], p[1], p[2], etc., and letting the sequence of sizes of numbers
be n[0], n[1], n[2], etc., we have the following approximate recurrence:
n[0] = upperBound - 1
n[1] = n[0] * (p[0]-1)/p[0]
n[2] = n[1] * (p[1]-1)/p[1]
...
n[k+1] = n[k] * (p[k]-1)/p[k]
and your algorithm takes time roughly proportional to the sum of the n
values up until numbers
is empty. I haven't analyzed the behavior of that series, but calculations show the growth is much worse than O(n*log(log(n)))
. (EDIT: An analysis I didn't come up with when composing this answer says it's O((n/log(n))^2).)
Running cProfile shows that most of the time is spent in the filter. Replacing the filter with a list comprehension speeds things up a by about a factor of 2.
numbers = [n for n in numbers if n%prime != 0]
But this doesn't really fix the main problem, which is that you are are recreating the list of numbers with each iteration, and that is slow. The faster implementations http://groups.google.com/group/comp.lang.python/msg/f1f10ced88c68c2d just mark the non primes by replacing them with 0 or similar.