```
import numpy as np
```

# Vectors and matrices

You can think of a `numpy.ndarray` as an n-dimensional (or multi-dimensional) array or, if you are more mathematically inclined, as a **vector** or **matrix**.

A vector is simply a 1-dimensional array. In `numpy` there are multiple ways to create one.

## Static creation

To practice using `numpy.ndarray` you can create one using hard-coded values. The example below creates an array using two parameters: a list of float values and an explicit data type, `np.float64`. We can review the dimensions of the array using `.shape`. This returns a tuple, in this case `(3,)`, which tells us the array has 1 dimension of length 3.

```
my_arr = np.array([88.9, 45.6, 20.4], dtype=np.float64)
print(my_arr)
print(my_arr.dtype)
print(my_arr.shape)
print(len(my_arr))
```

```
[88.9 45.6 20.4]
float64
(3,)
3
```

Note that the object passed to `np.array()` is a `list`. In general, casting from a `list` to an `ndarray` is simple.

```
lst = [88.9, 45.6, 20.4]
arr = np.array(lst)
print(arr)
```

```
[88.9 45.6 20.4]
```
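When no `dtype` is given, `numpy` infers one from the list contents. The sketch below is a small illustration of that inference and of converting back to a `list` with `.tolist()`; the variable names are made up for this example.

```python
import numpy as np

int_arr = np.array([1, 2, 3])        # all integers -> an integer dtype
float_arr = np.array([1, 2.5, 3])    # one float is enough -> float64
back_to_list = float_arr.tolist()    # ndarray -> plain Python list

print(int_arr.dtype.kind)   # 'i' (a signed integer type)
print(float_arr.dtype)      # float64
print(back_to_list)         # [1.0, 2.5, 3.0]
```

Mixing integers and floats in the list silently promotes every element to a float, which is worth remembering if you expected integer values.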

Tip

In general it’s unlikely that you will do much hard coding of array values in real data science. I’ve occasionally used it to help debug code, although I normally defer to `np.arange`, which we will see next.

## Dynamic creation

Sequences of numbers can be created using `np.arange`. This is particularly useful when testing code. The parameters work in a similar manner to `range`, i.e. **start, stop, step**.

```
my_arr = np.arange(10)
print(my_arr)
```

```
[0 1 2 3 4 5 6 7 8 9]
```

```
my_arr = np.arange(5, 10)
print(my_arr)
```

```
[5 6 7 8 9]
```
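The examples above use **start** and **stop** only. As a short sketch of the third parameter, **step** can skip values or, when negative, count down. As with `range`, the stop value is excluded.

```python
import numpy as np

evens = np.arange(0, 10, 2)       # start=0, stop=10 (excluded), step=2
countdown = np.arange(5, 0, -1)   # a negative step counts down

print(evens)       # [0 2 4 6 8]
print(countdown)   # [5 4 3 2 1]
```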

A use case I often encounter is the need to create an array of a fixed size that is empty. There are a number of ways to do this. Using `np.empty` will just allocate memory, and you will get whatever values are already there. For example:

```
my_arr = np.empty(shape=8, dtype=np.int64)
print(my_arr)
```

```
[ 140419899542592 94636582418192 94636576010160
140419899542576 140419795699824 140419795727552
-2301443869655676957 140419879893104]
```

I occasionally find ‘empty’ arrays confusing, particularly when debugging code, as they can reuse previously allocated memory that contains values used earlier in the algorithm or model I’m running. I find it easier (and less confusing) to create an array and fill it with a fixed, known value. There are some easy, efficient ways to do this as well.

Assume you need to create a vector of length 5 that will hold integers in the range -128 to 127 (signed 8-bit integers).

```
zeros = np.zeros(shape=5, dtype=np.int8)
ones = np.ones(shape=5, dtype=np.int8)
neg_ones = -np.ones(shape=5, dtype=np.int8)
neg_fives = np.full(shape=5, fill_value=-5, dtype=np.int8)
```

```
print(zeros)
print(ones)
print(neg_ones)
print(neg_fives)
```

```
[0 0 0 0 0]
[1 1 1 1 1]
[-1 -1 -1 -1 -1]
[-5 -5 -5 -5 -5]
```
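If you are unsure which values an integer type can hold, `np.iinfo` reports its limits. The sketch below checks the limits of `np.int8` and then fills an array with the largest value; the variable names are illustrative only.

```python
import numpy as np

info = np.iinfo(np.int8)
print(info.min, info.max)   # -128 127

# Fill a length-5 vector with the largest value int8 can store.
filled = np.full(shape=5, fill_value=info.max, dtype=np.int8)
print(filled)               # [127 127 127 127 127]
```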

## Loading data from file

In many health data science applications, data will be held in an external file, e.g. a Comma Separated Values (CSV) file, where data fields are delimited by a comma. `numpy` has several built-in functions for loading this data. If your data contain no missing values then `loadtxt` is very simple to use.

The file `minor_illness_ed_attends.csv` contains the rate of attendance per 10,000 of population. The first row is a header and will be skipped on read in.

```
file_name = 'data/minor_illness_ed_attends.csv'
ed_data = np.loadtxt(file_name, skiprows=1, delimiter=',')
```

```
ed_data.shape
```

```
(74,)
```

There are 74 elements in our vector. The first 10 are:

```
ed_data[:10]
```

```
array([2.11927795, 3.49057545, 3.98922908, 2.36860477, 3.24124863,
       2.8672584 , 3.11658522, 2.74259499, 3.61523885, 3.61523885])
```
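`loadtxt` suits data with no missing values. When some fields are empty, `np.genfromtxt` is one alternative: by default it fills missing float fields with `nan`. The sketch below uses a small, made-up CSV held in memory (the column names and values are hypothetical, not taken from the attendance file).

```python
import io
import numpy as np

# Hypothetical in-memory CSV standing in for a file on disk.
# The second data row has an empty second field.
csv_text = "rate_a,rate_b\n1.0,2.0\n3.0,\n5.0,6.0\n"

data = np.genfromtxt(io.StringIO(csv_text), delimiter=',', skip_header=1)

print(data.shape)             # (3, 2)
print(np.isnan(data[1, 1]))   # True, the empty field was read as nan
```

Note that `genfromtxt` names its header argument `skip_header`, whereas `loadtxt` uses `skiprows`.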

### Saving arrays to file

I once set a piece of university MSc coursework where students were required to save the contents of an array to file. Shortly afterwards, I received a pretty extensive telling off from a student as “I hadn’t taught them how to save arrays and a ‘friend’ had spent several hours attempting the task”. I felt about 5 inches tall after this, and to avoid future pain for learners I now reveal the method I had inadvertently kept secret from my class. I believe this book is perhaps the only place in the universe where it is documented.

```
np.savetxt('my_array.csv', ed_data)
```
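As a quick check that a saved file can be read back, the sketch below writes a small array with a header row and reloads it. The array values, the header text `'rate'` and the temporary directory are assumptions made for this illustration.

```python
import os
import tempfile
import numpy as np

arr = np.array([2.119, 3.491, 3.989])   # illustrative values only

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'my_array.csv')
    # header adds a first row; comments='' stops numpy prefixing it with '# '
    np.savetxt(path, arr, delimiter=',', header='rate', comments='')
    # skip the header row when reading the file back in
    loaded = np.loadtxt(path, skiprows=1, delimiter=',')

print(np.allclose(arr, loaded))   # True
```

For a 1D array the `delimiter` makes no visible difference, as each value is written on its own line, but it matters as soon as you save a 2D array.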

## Matrices

Recall that a 1-dimensional array in `numpy` is a vector. It is trivial to extend what we have learnt to a 2D matrix. Let’s start with a simple \(2 \times 2\) matrix \(A\).

\( A = \begin{bmatrix} 1 & 2\\ 3 & 4 \end{bmatrix}\)

To create the equivalent numpy 2D array:

```
a = np.array([[1, 2], [3, 4]])
print(a)
```

```
[[1 2]
 [3 4]]
```

If we now inspect the `.shape` property of the array we see that the dimensions are represented in a tuple of length 2.

```
print(a.shape)
```

```
(2, 2)
```
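Alongside `.shape`, a couple of related attributes are handy when checking a matrix. The following sketch shows `.ndim` (number of dimensions) and `.size` (total number of elements), and that `len` reports only the length of the first dimension.

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])

print(a.ndim)   # 2, the number of dimensions
print(a.size)   # 4, the total number of elements
print(len(a))   # 2, the number of rows (first dimension only)
```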

Note that `numpy` defaulted to `int64` as the data type (this is the default integer type on most 64-bit platforms).

To be more explicit about type we can specify it.

```
a = np.array([[1, 2], [3, 4]], dtype=np.uint8)
print(a.dtype)
```

```
uint8
```
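One reason to be explicit about type is memory. As a rough sketch, the same four values need eight times as many bytes when stored as `int64` rather than `uint8`. An existing array can also be converted with `.astype`. The variable names here are illustrative.

```python
import numpy as np

small = np.array([[1, 2], [3, 4]], dtype=np.uint8)
large = np.array([[1, 2], [3, 4]], dtype=np.int64)

print(small.nbytes)   # 4  (4 elements x 1 byte)
print(large.nbytes)   # 32 (4 elements x 8 bytes)

# astype returns a new array with the requested data type
as_float = small.astype(np.float64)
print(as_float.dtype)  # float64
```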

To access the element \(ij\) in a 2D matrix use `array[i, j]` notation. For example, the element `i=1`, `j=1` contains the value 4.

The main thing to remember here is that, like other collection types in Python, arrays are zero indexed. So `a[0, 0]` would return 1 in our example.

```
a[1, 1]
```

```
4
```
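To make the zero indexing concrete, this short sketch reads each element of the same \(2 \times 2\) matrix in turn. The first index selects the row and the second the column.

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])

print(a[0, 0])   # 1, first row, first column
print(a[0, 1])   # 2, first row, second column
print(a[1, 0])   # 3, second row, first column
print(a[1, 1])   # 4, second row, second column
```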

The next section will explore how you can slice arrays and use advanced boolean and fancy indexing.