Vectors and matrices


You can think of a numpy.ndarray as an n-dimensional (or multi-dimensional) array or, if you are more mathematically inclined, as a vector or matrix.

A vector is simply a 1-dimensional array. In numpy there are multiple ways to create one.

Static creation#

To practice using numpy.ndarray you can create one from hard-coded values. The example below creates an array using two parameters: a list of float values and an explicit data type, np.float64. We can review the dimensions of the array using .shape, which returns a tuple. In this case it is (3,), which tells us the array has 1 dimension of length 3.

import numpy as np

my_arr = np.array([88.9, 45.6, 20.4], dtype=np.float64)
print(my_arr)
print(my_arr.dtype)
print(my_arr.shape)
print(len(my_arr))

[88.9 45.6 20.4]
float64
(3,)
3

Note that the object passed to np.array() is a list. In general, converting a list to an ndarray is simple.

lst = [88.9, 45.6, 20.4]
arr = np.array(lst)
print(arr)

[88.9 45.6 20.4]
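When a list is converted without an explicit dtype, numpy infers the data type from the values. The sketch below is illustrative only (the variable names are not from the original text). It shows that a list mixing integers and floats is stored as floats, and that .tolist() converts an array back to a plain list.

```python
import numpy as np

# A list of whole numbers is stored with an integer dtype
# (the exact integer width can vary by platform)
int_arr = np.array([1, 2, 3])
print(int_arr.dtype)

# Mixing integers and floats promotes every element to float64
mixed_arr = np.array([1, 2.5, 3])
print(mixed_arr.dtype)
print(mixed_arr)

# .tolist() converts an array back to a plain Python list
back_to_list = mixed_arr.tolist()
print(back_to_list)
```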

Tip

In general it's unlikely that you will do much hard coding of array values in real data science. I've occasionally used it to help debug code, although I normally default to np.arange, which we will see next.

Dynamic creation#

Sequences of numbers can be created using np.arange. This is particularly useful when testing code. The parameters work in a similar manner to Python's built-in range, i.e. start, stop, step.

my_arr = np.arange(10)
print(my_arr)

[0 1 2 3 4 5 6 7 8 9]

my_arr = np.arange(5, 10)
print(my_arr)

[5 6 7 8 9]
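The examples above use only start and stop. As a small additional sketch (the variable names are illustrative), the third parameter, step, controls the spacing between values and can be negative to count down. As with range, the stop value is excluded.

```python
import numpy as np

# start=0, stop=10 (excluded), step=2
evens = np.arange(0, 10, 2)
print(evens)  # [0 2 4 6 8]

# a negative step counts down; stop is still excluded
countdown = np.arange(5, 0, -1)
print(countdown)  # [5 4 3 2 1]
```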

A use case I often encounter is the need to create an array of a fixed size that is empty. There are a number of ways to do this. Using np.empty allocates the memory without initialising it, so the array contains whatever values were already in that memory. For example:

my_arr = np.empty(shape=8, dtype=np.int64)
print(my_arr)

[     140419899542592       94636582418192       94636576010160
      140419899542576      140419795699824      140419795727552
 -2301443869655676957      140419879893104]

I occasionally find ‘empty’ arrays confusing - particularly when debugging code, as they can reuse previously allocated memory that contains values used earlier in the algorithm or model I’m running. I find it easier (and less confusing) to create an array and fill it with a fixed known value. There are some easy, efficient ways to do this as well.

Assume you need to create a vector of length 5 that will hold whole numbers in the range -127 to 127. These fit in a signed 8-bit integer (np.int8 can store values from -128 to 127).

zeros = np.zeros(shape=5, dtype=np.int8)
ones = np.ones(shape=5, dtype=np.int8)
neg_ones = -np.ones(shape=5, dtype=np.int8)
neg_fives = np.full(shape=5, fill_value=-5, dtype=np.int8)

print(zeros)
print(ones)
print(neg_ones)
print(neg_fives)

[0 0 0 0 0]
[1 1 1 1 1]
[-1 -1 -1 -1 -1]
[-5 -5 -5 -5 -5]
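If you are unsure what range of values an integer type can hold, np.iinfo reports it. The sketch below is a side check rather than part of the original example. It confirms the limits of np.int8 and compares the memory used by a 5-element array stored as int8 and as int64.

```python
import numpy as np

# Smallest and largest values a signed 8-bit integer can hold
info = np.iinfo(np.int8)
print(info.min, info.max)  # -128 127

# int8 uses 1 byte per element, int64 uses 8 bytes per element
small = np.zeros(shape=5, dtype=np.int8)
big = np.zeros(shape=5, dtype=np.int64)
print(small.nbytes)  # 5
print(big.nbytes)  # 40
```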

Loading data from file#

In many health data science applications, data will be held in an external file, e.g. a Comma Separated Values (CSV) file, where data fields are delimited by a comma. numpy has several built-in functions for loading this data. If your data contain no missing values, loadtxt is very simple to use.

The file minor_illness_ed_attends.csv contains the rate of attendance per 10,000 of population. The first row is a header and will be skipped when the file is read in.

file_name = 'data/minor_illness_ed_attends.csv'
ed_data = np.loadtxt(file_name, skiprows=1, delimiter=',')

ed_data.shape

(74,)

There are 74 elements in our vector. The first 10 are:

ed_data[:10]

array([2.11927795, 3.49057545, 3.98922908, 2.36860477, 3.24124863,
       2.8672584 , 3.11658522, 2.74259499, 3.61523885, 3.61523885])
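To experiment with loading without needing the data file, you can read CSV text held in memory. The sketch below uses made-up values wrapped in io.StringIO, not the real attendance data. It also shows np.genfromtxt, one of numpy's other loading functions, which by default fills a missing field with nan instead of raising an error.

```python
from io import StringIO

import numpy as np

# Made-up CSV text standing in for a file: one header row, one column
csv_text = "rate\n2.1\n3.5\n4.0\n"
rates = np.loadtxt(StringIO(csv_text), skiprows=1, delimiter=',')
print(rates.shape)  # (3,)

# Made-up two-column data where the second row has a missing field
csv_missing = "a,b\n1.0,2.0\n3.0,\n5.0,6.0\n"
table = np.genfromtxt(StringIO(csv_missing), skip_header=1, delimiter=',')
print(table)  # the missing field is read as nan
```

Note that the keyword for skipping the header differs between the two functions: skiprows for loadtxt and skip_header for genfromtxt.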

Saving arrays to file#

I once set a piece of university MSc coursework where students were required to save the contents of an array to file. Shortly afterwards, I received a pretty extensive telling off from a student as “I hadn’t taught them how to save arrays and a ‘friend’ had spent several hours attempting the task”. I felt about 5 inches tall after this and, to avoid future pain for learners, I now reveal the method I had inadvertently kept secret from my class. I believe this book is perhaps the only place in the universe where it is documented.

np.savetxt('my_array.csv', ed_data)
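A quick way to convince yourself that the save worked is to load the file back in and compare. The sketch below does this with a small made-up array and a temporary directory (both are assumptions for illustration, so that no files are left behind). It passes delimiter=',' so that the same code would also work for a 2D array saved as comma separated values.

```python
import os
import tempfile

import numpy as np

# Made-up values standing in for ed_data
to_save = np.array([2.1, 3.5, 4.0])

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'my_array.csv')
    # Write the array to file, one value per line
    np.savetxt(path, to_save, delimiter=',')
    # Read it back in to check the round trip
    reloaded = np.loadtxt(path, delimiter=',')

print(np.allclose(to_save, reloaded))  # True
```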

Matrices#

Recall that a 1-dimensional array in numpy is a vector. It is trivial to extend what we have learnt to a 2D matrix. Let’s start with a simple \(2 \times 2\) matrix \(A\).

\( A = \begin{bmatrix} 1 & 2\\ 3 & 4 \end{bmatrix}\)

To create the equivalent numpy 2D array:

a = np.array([[1, 2], [3, 4]])
print(a)

[[1 2]
 [3 4]]

If we now inspect the .shape property of the array we see that the dimensions are represented in a tuple of length 2.

print(a.shape)

(2, 2)
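With a square matrix it is not obvious which entry of the tuple is which. As a small sketch using a made-up \(2 \times 3\) matrix, the shape is reported as (rows, columns), and .ndim gives the number of dimensions.

```python
import numpy as np

# A matrix with 2 rows and 3 columns
b = np.array([[1, 2, 3], [4, 5, 6]])

print(b.shape)  # (2, 3)
print(b.ndim)  # 2
```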

Note that numpy defaulted to int64 as the data type. This is the usual default for integers, although it can vary by platform and numpy version.

To be more explicit about type we can specify it.

a = np.array([[1, 2], [3, 4]], dtype=np.uint8)
print(a.dtype)

uint8

To access the element \(ij\) in a 2D matrix use array[i, j] notation, where i is the row and j is the column. For example, the element i=1, j=1 contains the value 4.

The main thing to remember here is that, like other collection types in Python, arrays are zero indexed. So a[0, 0] would return 1 in our example.

a[1, 1]
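To make the zero indexing concrete, the sketch below recreates the same \(2 \times 2\) array and reads each of its four elements in turn (the variable names are illustrative).

```python
import numpy as np

a = np.array([[1, 2], [3, 4]], dtype=np.uint8)

# array[i, j]: i selects the row, j selects the column
top_left = a[0, 0]
top_right = a[0, 1]
bottom_left = a[1, 0]
bottom_right = a[1, 1]

print(top_left, top_right)  # 1 2
print(bottom_left, bottom_right)  # 3 4
```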

The next section will explore how you can slice arrays and use advanced boolean and fancy indexing.