Computing prime numbers#

A prime number is an integer greater than 1 that is not a product of two smaller integer values. I.e it can only be divided by itself and one. As an example, 2 is a prime number, but 4 is not.

Computing prime numbers up to \(n\)#

A naive implementation of an algorithm to compute the prime numbers up to \(n\) is trial division where each number between 2 and \(n\) is checked for primality. This works, but it computationally demanding even for small \(n\).

We won’t go higher than \(n\) = 100 this here as its just too painfully slow

def primes_trial_division(n):
    '''
    Compute primes up to n using a trial
    division approach.
    '''
    primes = []
    for i in range(2, n):
        found = False
        for j in range(2, i):
            found = False
            if i % j == 0:
                found = True
                break
        if not found:
            primes.append(i)
    return primes
len(primes_trial_division(100))
25

A more efficient algorithm#

A more efficient algorithm design, at least for relatively smally \(n\) (e.g. < 1 million), is the Sieve of Eratosthenes. It works as follows:

Consider the primes up to 10. First write out a list of number 2 to 10. Here we’ll represent this as a python list called candidates. As we work out which numbers are not primes we will cross (sieve) them out. For simplicity of illustration we will store those in a list called crossed_out

candidates = [2, 3, 4, 5, 6, 7, 8, 9, 10]
crossed_out = []

The first candidate number in the list is 2. The Sieve of Eratosthenes algorithm crosses out every 2nd number in the list after 2 by counting up from 2 in increments of 2:

candidates = [2, 3, 4, 5, 6, 7, 8, 9, 10]
crossed_out = [4, 6, 8, 10]

The next number in candidates after 2 that has not been crossed out is 3. We now cross out every 3rd number in the list by counting up from 3 increments of 3.

candidates = [2, 3, 4, 5, 6, 7, 8, 9, 10]
crossed_out = [4, 6, 8, 10, 9]

This is the basic process that is repeated in each iteration of the Sieve of Eratosthenes. What might not be obvious is that we do not need to traverse every item in candidates, but instead only iterate up to the \(\sqrt{n}\). At the end of the algorithm the numbers that have not been crossed out are the prime numbers between 2 and \(n\)

primes = [i for i in candidates if i not in crossed_off]  

Let’s implement the sieve in standard python. Our first implementation will be fairly naive. We will look at its inefficiencies and think through how we can improve on them. We’ll need some test data first. The dict n_primes holds the number of primes up to a value of n increasing by a factor of 10 each time. For example, there are 25 primes numbers in the first 100 natural numbers.

n_primes = {10 : 4,                 
            100 : 25,               
            1000 : 168,
            10000 : 1229,
            100000 : 9592,
            1000000 : 78498,
            10000000 : 664579}

A naive implementation#

The function below prime_sieve is a naive implementation of this algorithm. It makes use of two list data structures that we used above when working through the sieve’s logic.

  • candidates: a simple sequence of numbers 0 to n;

  • crossed_out a list of numbers sieved.

Although a naive implementation its not a bad first draft of a function: its readable and works.

The implementation and test below provides expected results and for it takes around 10ms (at least on my machine) to compute all the prime numbers up to 1000.

import math

def prime_sieve(n):
    '''
    Compute the prime numbers between 1 and n.
    Naive implementation of the Sieve of Eratosthenes.
    
    Parameters:
    ----------
    n: int
        The upper limit
    
    Returns:
    -------
    list
        a list of primes
    '''
    # a list of candidate numbers to sieve
    candidates = [i for i in range(n + 1)]
    # a list of numbered eliminated from consideration
    crossed_out = [0, 1]
    # maximum iterations required
    limit = int(math.sqrt(n)) + 1
    
    for i in range(2, limit):
         for j in range(i+i, n+1, i):              
                if not candidates[j] in crossed_out:
                    crossed_out.append(candidates[j])
    
    return [i for i in candidates if i not in crossed_out]   
prime_sieve(10)
[2, 3, 5, 7]
print(len(prime_sieve(100)) == n_primes[100])
print(len(prime_sieve(1_000)) == n_primes[1_000])
True
True
%timeit (prime_sieve(1_000))
20.8 ms ± 542 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)