# Computing prime numbers#

A prime number is an integer greater than 1 that is not a product of two smaller integer values. I.e it can only be divided by itself and one. As an example, 2 is a prime number, but 4 is not.

## Computing prime numbers up to \(n\)#

A naive implementation of an algorithm to compute the prime numbers up to \(n\) is **trial division** where each number between 2 and \(n\) is checked for primality. This works, but it computationally demanding even for small \(n\).

We won’t go higher than \(n\) = 100 this here as its just too painfully slow

```
def primes_trial_division(n):
'''
Compute primes up to n using a trial
division approach.
'''
primes = []
for i in range(2, n):
found = False
for j in range(2, i):
found = False
if i % j == 0:
found = True
break
if not found:
primes.append(i)
return primes
```

```
len(primes_trial_division(100))
```

```
25
```

# A more efficient algorithm#

A more efficient algorithm design, at least for relatively smally \(n\) (e.g. < 1 million), is the **Sieve of Eratosthenes**. It works as follows:

Consider the primes up to 10. First write out a list of number 2 to 10. Here we’ll represent this as a python list called `candidates`

. As we work out which numbers are not primes we will cross (sieve) them out. For simplicity of illustration we will store those in a list called `crossed_out`

```
candidates = [2, 3, 4, 5, 6, 7, 8, 9, 10]
crossed_out = []
```

The first candidate number in the list is 2. The Sieve of Eratosthenes algorithm crosses out every 2nd number in the list after 2 by counting up from 2 in increments of 2:

```
candidates = [2, 3, 4, 5, 6, 7, 8, 9, 10]
crossed_out = [4, 6, 8, 10]
```

The next number in `candidates`

after 2 **that has not been crossed out** is 3. We now cross out every 3rd number in the list by counting up from 3 increments of 3.

```
candidates = [2, 3, 4, 5, 6, 7, 8, 9, 10]
crossed_out = [4, 6, 8, 10, 9]
```

This is the basic process that is repeated in each iteration of the Sieve of Eratosthenes. What might not be obvious is that we do not need to traverse every item in `candidates`

, but instead only iterate up to the \(\sqrt{n}\). At the end of the algorithm the numbers that have not been crossed out are the prime numbers between 2 and \(n\)

```
primes = [i for i in candidates if i not in crossed_off]
```

Let’s implement the sieve in standard python. Our first implementation will be fairly naive. We will look at its inefficiencies and think through how we can improve on them. We’ll need some test data first. The dict `n_primes`

holds the number of primes up to a value of n increasing by a factor of 10 each time. For example, there are 25 primes numbers in the first 100 natural numbers.

```
n_primes = {10 : 4,
100 : 25,
1000 : 168,
10000 : 1229,
100000 : 9592,
1000000 : 78498,
10000000 : 664579}
```

## A naive implementation#

The function below `prime_sieve`

is a naive implementation of this algorithm. It makes use of two list data structures that we used above when working through the sieve’s logic.

`candidates`

: a simple sequence of numbers 0 to`n`

;`crossed_out`

a list of numbers sieved.

Although a naive implementation its not a bad first draft of a function: its readable and works.

The implementation and test below provides expected results and for it takes around 10ms (at least on my machine) to compute all the prime numbers up to 1000.

```
import math
def prime_sieve(n):
'''
Compute the prime numbers between 1 and n.
Naive implementation of the Sieve of Eratosthenes.
Parameters:
----------
n: int
The upper limit
Returns:
-------
list
a list of primes
'''
# a list of candidate numbers to sieve
candidates = [i for i in range(n + 1)]
# a list of numbered eliminated from consideration
crossed_out = [0, 1]
# maximum iterations required
limit = int(math.sqrt(n)) + 1
for i in range(2, limit):
for j in range(i+i, n+1, i):
if not candidates[j] in crossed_out:
crossed_out.append(candidates[j])
return [i for i in candidates if i not in crossed_out]
```

```
prime_sieve(10)
```

```
[2, 3, 5, 7]
```

```
print(len(prime_sieve(100)) == n_primes[100])
print(len(prime_sieve(1_000)) == n_primes[1_000])
```

```
True
True
```

```
%timeit (prime_sieve(1_000))
```

```
20.8 ms ± 542 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
```