Improving the prime sieve code

Improving the prime sieve code#

An inefficiency with our code is that we compute the prime numbers using two data structures. This wastes memory, but also means we need to check if a values already exists in crossed_out twice. The second check is needed to make sure we aren’t needlessly appending an already crossed out prime number.

The function prime_sieve2 delivers the same results as prime_sieve, but more efficiently. The primary source of this saving is by designing our code so that we required only one list.

def prime_sieve2(n):
    # We will use boolean's here instead of ints
    candidates = [True] * (n + 1)
    candidates[0] = candidates[1] = False
    limit = int(math.sqrt(n)) + 1    
    
    for i in range(2, limit): 
        if candidates[i]:
            candidates[i+i::i] = [False] * ((n - i) // i)
            
    return [i for i in range(n+1) if candidates[i]]        

print(len(prime_sieve2(100)) == n_primes[100])
print(len(prime_sieve2(1000)) == n_primes[1000])

True
True

%timeit prime_sieve2(1000)

64 µs ± 1.01 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

Does the time saving matter?#

You should see a substantial reduction in computational overhead of prime_sieve2. We are now finding the primes under 1000 in microseconds (µs) as opposed milliseconds in our naive implementation (1 µs = 1000ms!).

Some of you might be thinking “both procedures give the same results and execution takes under a second. So does this type of optimisation matter?” To answer that question let’s compute primes under 10,000. You should now see a much bigger difference with our naive function returning in around a second while prime_sieve2 is still in µs (on my machine still below half a millsecond).

When first writing this section I first tested prime_sieve with n=1_000_000. Needless to say, I quickly decided to use an example of 10,000 instead! When you fancy a cup of tea or coffee set n=1_000_000 and come back after your break to see an enourmous difference in performance!

TEN_THOUSAND = 10_000
%timeit len(prime_sieve(TEN_THOUSAND))

2.62 s ± 46.9 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

%timeit len(prime_sieve2(TEN_THOUSAND))

609 µs ± 11.3 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

Taking a look at the redesign#

The redesigned implementation only has single list candidates: its contains only bool values. In python that won’t make much, if any, difference to memory usage, as bool is really a special type of int. But its a nice simplification as we the index of the list item gives us our factor. This modification provides our biggest speed up. Each time we ‘cross out’ a number e.g. i we simply set candidate[i] = False. (We don’t bother checking as we are not appending to a second list). At the end of the function any items that are still True are the primes.

Don’t believe me about int and bool? Try executing (True + 4).

The second change might be considered a ‘micro optimisation’. I.e. list slicing in favour of an inner for loop.

candidates[i+i::i] = [False] * ((n - i) // i)

Here we start at the current index + one factor ahead e.g. if i=2 then we start at index i+i=4 and select items in steps of i=2 all the way to the end candidates[i+i::i] (Note we could also have used candidates[i+i:n+1:i]). In general, a slice will be faster than looping over the list. Here’s a simpler example of list slicing with a step:

list_to_slice = [i for i in range(31)]
list_to_slice[4:len(list_to_slice):2]

[4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30]

Improving the prime sieve code

Contents

Improving the prime sieve code#

Does the time saving matter?#

Taking a look at the redesign#