How to speed up Sieve of Eratosthenes python list generator
I realized that there a lot of optimizations on SO, but they are rarely ever explained by others for the prime sieve algorithm, so it makes them difficult to approach by beginners or first time creators of the algorithm. All the solutions here are in python, to be on the same page for speed and optimizations. These solutions will progressively become faster and more complex. :)
Vanilla Solution
def primesVanilla(n):
r = [True] * n
r[0] = r[1] = False
for i in xrange(n):
if r[i]:
for j in xrange(i+i, n, i):
r[j] = False
return r
This is a very straightforward implementation of the Sieve. Please make sure you understand what is going on above before proceeding. The only slight thing to note is that you start marking not-primes at i+i instead of i, but this is rather obvious. (Since you assume that i itself is a prime). To make tests fair, all numbers will be for the list up to 25 million.
real 0m7.663s
user 0m7.624s
sys 0m0.036s
Minor Improvement 1 (Square roots):
I'll try to sort them in terms of straight-forward to less straight-forward changes. Observe that we do not need to iterate to n, but rather only need to go up to the square root of n. The reason being that any composite number under n, must have a prime factor under or equal to the square root of n. When you sieve by hand, you'll notice that all the "unsieved" numbers over the square root of n are by default primes.
Another remark is that you have to be a little bit careful for when the square root turns out to be an integer, so you should add one in this case so it covers it. IE, at n=49, you want to loop until 7 inclusive, or you might conclude that 49 is prime.
def primes1(n):
r = [True] * n
r[0] = r[1] = False
for i in xrange(int(n**0.5+1)):
if r[i]:
for j in xrange(i+i, n, i):
r[j] = False
return r
real 0m4.615s
user 0m4.572s
sys 0m0.040s
Note that it's quite a bit faster. When you think about it, you're looping only until the square root, so what would take 25 million top level iterations now is only 5000 top level.
Minor Improvement 2 (Skipping in inner loop):
Observe that in the inner loop, instead of starting from i+i, we can start from i*i. This follows from a very similar argument to the square root thing, but the big idea is that any composites between i and i*i have already been marked by smaller primes.
def primes2(n):
r = [True] * n
r[0] = r[1] = False
for i in xrange(int(n**0.5+1)):
if r[i]:
for j in xrange(i*i, n, i):
r[j]=False
return r
real 0m4.559s
user 0m4.500s
sys 0m0.056s
Well that's a bit disappointing. But hey, it's still faster.
Somewhat Major Improvement 3 (Even skips):
The idea here is that we can premark all the even indices, and then skip iterations by 2 in the main loop. After that we can start the outer loop at 3, and the inner loop can skip by 2*i instead. (Since going by i instead implies that it'll be even, (i+i) (i+i+i+i) etc.)
def primes3(n):
r = [True] * n
r[0] = r[1] = False
for i in xrange(4,n,2):
r[i] = False
for i in xrange(3, int(n**0.5+1), 2):
if r[i]:
for j in xrange(i*i, n, 2*i):
r[j] = False
return r
real 0m2.916s
user 0m2.872s
sys 0m0.040s
Cool Improvements 4 (wim's idea):
This solution is a rather advanced trick. Slice assignment is faster than looping, so this uses python's slice notation: r[begin:end:skip]
def primes4(n):
r = [True] * n
r[0] = r[1] = False
r[4::2] = [False] * len(r[4::2])
for i in xrange(3, int(1 + n**0.5), 2):
if r[i]:
r[i*i::2*i] = [False] * len(r[i*i::2*i])
return r
10 loops, best of 3: 1.1 sec per loop
Slight Improvement 5
Note that python reslices the r[4::2]
when it calculates the length, so this takes quite a bit of extra time since all we need for it is computing the length. We do use some nasty math to achieve this though.
def primes5(n):
r = [True] * n
r[0] = r[1] = False
r[4::2] = [False] * ((n+1)/2-2)
for i in xrange(3, int(1 + n**0.5), 2):
if r[i]:
r[i*i::2*i] = [False] * ((n+2*i-1-i*i)/(2*i))
return r
10 loops, best of 3: 767 msec per loop
Assignment speed-up (Padraic Cunningham):
Note that we assign an array with all True and then set half (the evens) to be False. We can actually just start with a boolean array that's alternating.
def primes6(n):
r = [False, True] * (n//2) + [True]
r[1], r[2] = False, True
for i in xrange(3, int(1 + n**0.5), 2):
if r[i]:
r[i*i::2*i] = [False] * ((n+2*i-1-i*i)/(2*i))
return r
10 loops, best of 3: 717 msec per loop
Don't quote me on this, but I think without some nasty math methods, there is no obvious improvements to this last version. One cute property that I tried, but did not turn out to be any faster is noting that primes other than 2,3 must be of the form 6k+1 or 6k-1. (Note that if it's 6k, then divisible by 6, 6k+2 | 2, 6k+3 | 3, 6k+ 4 | 2, 6k+5 is congruent to -1 mod 6. This suggests that we can skip by 6 each time and check both sides. Either from a poor implementation on my side, or python internals, I was unable to find any meaningful speed increase. :(
The first thing I saw is the way you generate the initial list (looping and appending) is inefficient and unnecessary. You can just add lists instead of looping and appending per-element.
The second thing I saw is that the type-checking you are doing is unnecessary, that function call is expensive and you can refactor to avoid that completely.
Finally, I think the "big thing" you can get in any sieve implementation is taking advantage of a slice assignment. You should cross out all the factors in one hit instead of looping. Example:
from math import sqrt
def primes(n):
r = [True] * n
r[0] = r[1] = False
r[4::2] = [False] * len(r[4::2])
for i in xrange(int(1 + sqrt(n))):
if r[i]:
r[3*i::2*i] = [False] * len(r[3*i::2*i])
return r
Note I also have a couple of other tricks:
- avoid half of the work by crossing out even numbers immediately.
- only iterating up to sqrt of the length is necessary
On my crappy underpowered macbook this code can generate the 1,000,001 list in about 75 milliseconds:
>>> timeit primes(1000001)
10 loops, best of 3: 75.4 ms per loop
Some timings show in python2 and 3 wim's approach is significantly faster, it can be slightly optimised further by how the list is created:
def primes_wim_opt(n):
r = [False, True] * (n // 2)
r[0] = r[1] = False
r[2] = True
for i in xrange(int(1 + n ** .5)):
if r[i]:
r[3*i::2*i] = [False] * len(r[3*i::2*i])
return r
Python2 timings:
In [9]: timeit primesVanilla(100000)
10 loops, best of 3: 25.7 ms per loop
In [10]: timeit primes_wim(100000)
100 loops, best of 3: 3.59 ms per loop
In [11]: timeit primes1(100000)
100 loops, best of 3: 14.8 ms per loop
In [12]: timeit primes_wim_opt(100000)
100 loops, best of 3: 2.18 ms per loop
In [13]: timeit primes2(100000)
100 loops, best of 3: 14.7 ms per loop
In [14]: primes_wim(100000) == primes_wim_opt(100000) == primes(100000) == primesVanilla(100000) == primes2(100000)
Out[14]: True
Timings for python3 where using the same functions just changing to range:
In [76]: timeit primesVanilla(100000)
10 loops, best of 3: 22.3 ms per loop
In [77]: timeit primes_wim(100000)
100 loops, best of 3: 2.92 ms per loop
In [78]: timeit primes1(100000)
100 loops, best of 3: 10.9 ms per loop
In [79]: timeit primes_wim_opt(100000)
1000 loops, best of 3: 1.88 ms per loop
In [80]: timeit primes2(100000)
100 loops, best of 3: 10.3 ms per loop
In [81]: primes_wim(100000) == primes_wim_opt(100000) == primes(100000) == primesVanilla(100000) == primes2(100000)
Out[80]: True
it can be optimised further by instead using the len of range/xrange instead of slicing:
def primes_wim_opt(n):
is_odd = n % 2 & 1
r = [False, True] * (n // 2 + is_odd)
r[0] = r[1] = False
r[2] = True
for i in range(int(1 + n ** .5)):
if r[i]:
r[3*i::2*i] = [False] * len(range(3*i,len(r), 2 * i))
return r
Python3 it knocks a good chunk off:
In [16]: timeit primes_wim_opt_2(100000)
1000 loops, best of 3: 1.38 ms per loop
And the same for python2 using xrange:
In [10]: timeit primes_wim_opt_2(100000)
1000 loops, best of 3: 1.60 ms per loop
Using (((n - 3 * i) // (2 * i)) + 1)
should also work:
def primes_wim_opt_2(n):
is_odd = n % 2 & 1
r = [False, True] * ((n // 2) + is_odd)
r[0] = r[1] = False
r[2] = True
for i in range(int(1 + n ** .5)):
if r[i]:
r[3*i::2*i] = [False] * (((n - 3 * i) // (2 * i)) + 1)
return r
Which is very slightly faster:
In [12]: timeit primes_wim_opt_2(100000)
1000 loops, best of 3: 1.32 ms per loop
In [6]: timeit primes5(100000)
100 loops, best of 3: 2.47 ms per loop
You can also start at 3 and step 2:
def primes_wim_opt_2(n):
r = [False, True] * (n // 2)
r[0] = r[1] = False
r[2] = True
for i in range(3, int(1 + n ** .5),2):
if r[i]:
r[3*i::2*i] = [False] * (((n - 3 * i) // (2 * i)) + 1)
return r
Which is faster again:
In [2]: timeit primes_wim_opt_2(100000)
1000 loops, best of 3: 1.10 ms per loop
Python2:
In [2]: timeit primes_wim_opt_2(100000)
1000 loops, best of 3: 1.29 ms per loop