```
import numpy as np
```

# Array slicing and indexing

Slicing and indexing are powerful ways to select and access elements within an array. The complexity of what you can achieve with `numpy` using only a small amount of code is quite remarkable. However, both approaches require careful study to avoid potential unexpected behaviour in your code (that’s a polite way of saying ‘bugs’). We will cover this behaviour in detail, but for now it’s enough to say that slices can be considered **views** of an array rather than separate objects.

## Slicing

You can access subsets of arrays using **slicing** notation: `array[start:end:step]`

`start` is included and `end` is excluded, i.e. `[start, end)`

Tip: if `start` or `end` are omitted, numpy uses the corresponding index for the start or end of the array.

Reminder: Don’t forget that arrays are zero indexed.

Let’s start off simple with a couple of vector examples. We’ll gradually work our way up to higher dimensions (and headaches!).

### Example 1

Given the array `[10, 11, 12, 13, 14, 15]`, select the array elements at index 3 through 4.

```
complete_vector = np.array([10, 11, 12, 13, 14, 15])
```

```
slice_vector = complete_vector[3:5]
print(f'complete vector: {complete_vector}')
print(f'slice of vector: {slice_vector}')
```

```
complete vector: [10 11 12 13 14 15]
slice of vector: [13 14]
```

### Example 2

Given `[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]`, select the last four elements of the array.

We can do this by using a negative `start` and omitting the `end` parameter.

```
complete_vector = np.arange(10)
```

```
slice_vector = complete_vector[-4:]
print(f'original vector: {complete_vector}')
print(f'slice of vector: {slice_vector}')
```

```
original vector: [0 1 2 3 4 5 6 7 8 9]
slice of vector: [6 7 8 9]
```

### Example 3

Starting from the third element of the array `[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]`, slice the array to return every other element.

```
complete_vector = np.arange(10)
```

```
slice_vector = complete_vector[2::2]
print(f'original vector: {complete_vector}')
print(f'slice of vector: {slice_vector}')
```

```
original vector: [0 1 2 3 4 5 6 7 8 9]
slice of vector: [2 4 6 8]
```

### Example 4: A matrix

```
original = np.array([[1, 2], [3, 4], [5, 6]])
print('Original matrix:')
print(original)
```

```
Original matrix:
[[1 2]
 [3 4]
 [5 6]]
```

```
middle_row = original[1,:]
bottom_row = original[2,:]
print(f'Middle row: {middle_row}')
print(f'\nBottom row: {bottom_row}')
```

```
Middle row: [3 4]

Bottom row: [5 6]
```

```
first_column = original[:,0]
second_column = original[:,1]
print(f'First column: {first_column}')
print(f'\nSecond columns: {second_column}')
```

```
First column: [1 3 5]

Second columns: [2 4 6]
```
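It can help to check the shape of what a matrix slice returns. The sketch below is an illustration only (the names `column_1d` and `column_2d` are mine, not part of the example above). It suggests that an integer index drops a dimension, while a slice of length one keeps it.

```python
import numpy as np

original = np.array([[1, 2], [3, 4], [5, 6]])

# an integer index removes that dimension: the result is a 1D vector
column_1d = original[:, 0]
print(column_1d.shape)

# a slice of length one keeps the dimension: the result is a 3 x 1 matrix
column_2d = original[:, 0:1]
print(column_2d.shape)

# the trailing ':' is optional when selecting a whole row
print(np.array_equal(original[1, :], original[1]))
```

Here `column_1d` has shape `(3,)` and `column_2d` has shape `(3, 1)`. The difference matters when a later calculation expects a matrix rather than a vector.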

### Example 5: A three dimensional array

If you haven’t worked extensively with arrays or lists before then it is usually 3 or more dimensions where the headaches start to kick in. Recall our three dimensional array example:

```
td = np.array([
    [[11,12], [13,14]],
    [[21,22], [23,24]],
    [[31,32], [33,34]]
])
print(td.shape)
```

```
(3, 2, 2)
```

The best way to get to grips with slicing a 3D array is to look at the shape and think about how it all links together. The array `td` has a shape `(3, 2, 2)`. I think of this as 3 rows, each of which contains 2 vectors of length 2. I’ve laid the code listing above out in this manner. It might help your understanding if you come back to this as we work through the array and slicing it. Let’s consider this statement dimension by dimension.

If we take element 0 of the first dimension we get

```
print(td[0])
print(td[0].shape)
```

```
[[11 12]
 [13 14]]
(2, 2)
```

That is, element 0 contains a \(2 \times 2\) matrix or, to put it another way, it contains two vectors each of length 2. Now for the second dimension:

```
print(td[0][0])
```

```
[11 12]
```

We are now accessing the first vector in this row. To access the second we would just use `td[0][1]`. Finally we can use our third dimension to select a scalar value. For example, to access the 2nd value in the 1st vector of the 1st row use:

```
print(td[0][0][1])
```

```
12
```
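The chained form `td[0][0][1]` indexes one dimension at a time. As a small sketch (the names `chained` and `combined` are illustrative), the same element can also be reached with a single comma-separated index, which is the `td[i, j, k]` notation used for slicing in the rest of this example.

```python
import numpy as np

td = np.array([
    [[11, 12], [13, 14]],
    [[21, 22], [23, 24]],
    [[31, 32], [33, 34]]
])

chained = td[0][0][1]   # index one dimension at a time
combined = td[0, 0, 1]  # index all three dimensions at once

print(chained, combined)
print(chained == combined)
```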

If that is giving you a headache then I suggest you practice accessing different rows, vectors and scalar values in the array before proceeding!

Building a solid understanding and intuition of 3 and 4 dimensional arrays is very useful for machine learning! Particularly for neural networks where you may find yourself battling to get your training data into the correct shape!

Going back to slicing, we can now put our understanding of the dimensions to use. Here’s a reminder of the full array:

```
print(td)
```

```
[[[11 12]
  [13 14]]

 [[21 22]
  [23 24]]

 [[31 32]
  [33 34]]]
```

**Task**: slice `td` so that we have the vector `[13, 23, 33]`, i.e. the `[i, 1, 0]` elements where `i` is the row. To do this we use the following `numpy` array slicing syntax:

```
td[:, 1, 0]
```

```
array([13, 23, 33])
```

Note how there are three indices in the notation `td[i, j, k]`. To get all of the rows we specified `i` as `:`. To select the 2nd array in each row we set `j=1`. At this stage we have selected `[13 14]`, `[23 24]`, and `[33 34]`. We then set `k=0` to get the first element of these arrays.

**Task**: Modify this slice to select from only the first two rows.

We start with `td[:, 1, 0]` that we know gives us `[13, 23, 33]`. We selected all rows by setting `i=:`. Imagine for a moment that `td` is actually only a 1D array. What would you do to slice an array to get the first two elements only? You would use `td[:2]`. Therefore our 3D equivalent is:

```
td[:2, 1, 0]
```

```
array([13, 23])
```

**Task**: Slice `td` to return the second array in each row.

This is again a modification of our original code.

```
td[:, 1, 0]
```

Originally we were restricting our slice to the 1st element, i.e. `k=0`, of each array returned. If we want all elements we replace the `0` with `:`

```
td[:, 1, :]
```

```
array([[13, 14],
       [23, 24],
       [33, 34]])
```

Getting to grips with multi-dimensional arrays takes time and practice. So do persevere.
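If you are unsure what a slice will return, printing its shape is a quick sanity check. A minimal sketch using the three slices from the tasks above:

```python
import numpy as np

td = np.array([
    [[11, 12], [13, 14]],
    [[21, 22], [23, 24]],
    [[31, 32], [33, 34]]
])

print(td[:, 1, 0].shape)   # one scalar from every row
print(td[:2, 1, 0].shape)  # one scalar from the first two rows only
print(td[:, 1, :].shape)   # the whole second vector from every row
```

The shapes are `(3,)`, `(2,)` and `(3, 2)`: each integer index removes a dimension, while each `:` keeps one.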

### Slices are views of memory

If you have only ever coded in Python before and have no experience of a lower level language like C++ or Rust then the behaviour of arrays and slices can catch you out. Let’s take a quick detour into standard Python and look at updating a slice of a Python list.

```
original_list = [1, 2, 3, 4, 5]
print(f'original list before slice update: {original_list}')
# slice and update the 1st element
list_slice = original_list[1:3]
list_slice[0] = 999
print(f'slice: {list_slice}')
print(f'original list: {original_list}')
```

```
original list before slice update: [1, 2, 3, 4, 5]
slice: [999, 3]
original list: [1, 2, 3, 4, 5]
```

Our standard Python list slicing code created a copy of the items at index 1 through 2. So when we updated `list_slice` it had no effect on `original_list`. Now let’s take a look at equivalent code in `numpy`:

```
original_array = np.array([1, 2, 3, 4, 5])
print(f'original array before slice update: {original_array}')
# slice and update the 1st element
array_slice = original_array[1:3]
array_slice[0] = 999
print(f'slice: {array_slice}')
print(f'original array: {original_array}')
```

```
original array before slice update: [1 2 3 4 5]
slice: [999   3]
original array: [  1 999   3   4   5]
```

`numpy` arrays are efficient because `numpy` works with a known block of memory. An array slice is a **view** of that memory. When you update the slice you update the **original** data in the array. If you are not careful this can lead to unexpected silent bugs in your code. It is important to realise that this happens even when you pass an array or slice to a function.

```
def zero_zero_to_seven(input_array):
    input_array[0,0] = 7
```

```
test_array = np.ones((5,5))
zero_zero_to_seven(test_array[1:2,:])
zero_zero_to_seven(test_array[3:,3:])
zero_zero_to_seven(test_array)
print(test_array)
```

```
[[7. 1. 1. 1. 1.]
 [7. 1. 1. 1. 1.]
 [1. 1. 1. 1. 1.]
 [1. 1. 1. 7. 1.]
 [1. 1. 1. 1. 1.]]
```
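If you are ever unsure whether two arrays share data, `numpy` can tell you. The sketch below uses `np.shares_memory` and the `.base` attribute of an array; the variable names are illustrative.

```python
import numpy as np

original_array = np.array([1, 2, 3, 4, 5])
array_slice = original_array[1:3]

# True: the slice and the original array overlap in memory
print(np.shares_memory(original_array, array_slice))

# a view keeps a reference to the array that owns the data
print(array_slice.base is original_array)

# an array that owns its own data has no base
print(original_array.base is None)
```

All three lines print `True`. A check like this is handy when debugging a function that appears to be changing its inputs.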

`numpy` behaves this way by design. It means that passing a slice is not expensive (as no memory is copied). If you want to avoid this behaviour then you can **redefine** an array, i.e. build a new array from the input rather than update it in place, as follows:

```
def array_times_seven(input_array):
    # redefine the array
    output_array = input_array * 7
    return output_array
```

```
test_array = np.ones((5, 5))
new_array = array_times_seven(test_array[3:,3:])
print(new_array)
print(test_array)
```

```
[[7. 7.]
 [7. 7.]]
[[1. 1. 1. 1. 1.]
 [1. 1. 1. 1. 1.]
 [1. 1. 1. 1. 1.]
 [1. 1. 1. 1. 1.]
 [1. 1. 1. 1. 1.]]
```

In this case `new_array` is indeed a new memory allocation separate from `test_array`.
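Another option, if you simply want an independent version of a slice, is the `.copy()` method of an array. A minimal sketch, reusing the vector from earlier in this section:

```python
import numpy as np

original_array = np.array([1, 2, 3, 4, 5])

# .copy() allocates new memory, so the result is no longer a view
array_slice = original_array[1:3].copy()
array_slice[0] = 999

print(f'slice: {array_slice}')
print(f'original array: {original_array}')
```

This time the update to the slice leaves `original_array` untouched, at the cost of copying the data.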

## Fancy and Boolean indexing

Beyond slicing, two more powerful ways to return elements of an `ndarray` are **fancy indexing** and **boolean indexing**.

### Fancy indexing

Fancy indexing allows an array-like object (e.g. a list) to specify which elements of an array to select. For example, if we needed the elements at index 2, 4 and 7 of a vector:

```
complete_vector = np.arange(start=10, stop=0, step=-1)
indexes = [2, 4, 7]
sub_vector = complete_vector[indexes]
print(f'original vector: {complete_vector}')
print(f'sub vector: {sub_vector}')
```

```
original vector: [10 9 8 7 6 5 4 3 2 1]
sub vector: [8 6 3]
```

The way to think about fancy indexing for matrices (and higher dimensional arrays) is as a set of arrays specifying the row and column coordinates of elements. For example, given a \(3 \times 3\) matrix

\( A = \begin{bmatrix} 0 & 1 & 2 \\ 3 & 4 & 5 \\ 6 & 7 & 8 \\ \end{bmatrix} \)

return the elements at `[0, 2]`, `[1, 1]` and `[2, 2]`.

Our expected answer is `[2, 4, 8]`.

```
complete_matrix = np.arange(9).reshape(-1, 3)
row = [0, 1, 2]
col = [2, 1, 2]
sub_matrix = complete_matrix[row, col]
print(f'original matrix: \n{complete_matrix}')
print(f'sub matrix: \n{sub_matrix}')
```

```
original matrix:
[[0 1 2]
 [3 4 5]
 [6 7 8]]
sub matrix:
[2 4 8]
```

The coordinate approach can be mixed with other commands we have already seen. For example, the shorthand to index on all rows would use `:`

```
complete_matrix = np.arange(15).reshape(-1, 5)
# select the following columns
cols = [0, 1, 2, 4]
# no need for a list for all rows. Use : instead
sub_matrix = complete_matrix[:, cols]
print(f'original matrix: \n{complete_matrix}')
print(f'\nsub matrix: \n{sub_matrix}')
```

```
original matrix:
[[ 0  1  2  3  4]
 [ 5  6  7  8  9]
 [10 11 12 13 14]]

sub matrix:
[[ 0  1  2  4]
 [ 5  6  7  9]
 [10 11 12 14]]
```

It is noteworthy that the sub array returned by fancy indexing is its own array. Unlike a slice, it is not a view of the original array:

```
#set all elements to 100.
sub_matrix[:] = 100
print(f'original matrix: \n{complete_matrix}')
print(f'\nsub matrix: \n{sub_matrix}')
```

```
original matrix:
[[ 0  1  2  3  4]
 [ 5  6  7  8  9]
 [10 11 12 13 14]]

sub matrix:
[[100 100 100 100]
 [100 100 100 100]
 [100 100 100 100]]
```
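A caveat worth knowing: the copy is made when fancy indexing is used to *read* from an array. When the fancy index appears on the left-hand side of an assignment, `numpy` writes into the original array. A minimal sketch, reusing the matrix from above:

```python
import numpy as np

complete_matrix = np.arange(15).reshape(-1, 5)
cols = [0, 1, 2, 4]

# reading with a fancy index returns a copy ...
sub_matrix = complete_matrix[:, cols]
sub_matrix[:] = 100
print(complete_matrix[0])  # the original is unchanged

# ... but assigning through a fancy index updates the original
complete_matrix[:, cols] = 100
print(complete_matrix[0])  # only column 3 keeps its old value
```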

### Boolean indexing

In boolean indexing we provide a boolean mask for an array `a` of length \(l\). The mask is a `list` or `ndarray` that is also of length \(l\) and contains only `True` or `False` elements. The sub array returned only contains the elements from `a` that have a matching `True` in the mask array.

```
complete_vector = np.arange(5)
mask = [False, False, False, True, True]
sub_vector = complete_vector[mask]
print(f'original vector: {complete_vector}')
print(f'sub vector: {sub_vector}')
```

```
original vector: [0 1 2 3 4]
sub vector: [3 4]
```

We can generate an array of booleans using some conditional logic. For example by checking which elements are greater than a threshold.

```
THRESHOLD = 90
original_vector = np.array([99, 40, 55, 103, 92, 86])
original_vector >= THRESHOLD
```

```
array([ True, False, False, True, True, False])
```

It is then just a case of selecting the elements using the boolean array.

```
mask = original_vector >= THRESHOLD
original_vector[mask]
```

```
array([ 99, 103, 92])
```
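Conditions can also be combined. The sketch below assumes a second, hypothetical threshold `UPPER` (not part of the example above) and uses the element-wise `&` operator. Note the parentheses around each comparison: they are needed because `&` binds more tightly than `>=` and `<`.

```python
import numpy as np

THRESHOLD = 90
UPPER = 100  # hypothetical upper limit, for illustration only
original_vector = np.array([99, 40, 55, 103, 92, 86])

# element-wise AND of two boolean arrays
mask = (original_vector >= THRESHOLD) & (original_vector < UPPER)
print(original_vector[mask])
```

This selects `99` and `92`. The element-wise OR operator `|` works in the same way.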

More often I’ve found that I need to get the index of the elements that contain a value less than, greater than or equal to a threshold of some kind. In `numpy` this can be achieved using `np.where`

```
np.where(original_vector >= THRESHOLD)
```

```
(array([0, 3, 4]),)
```
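When called with a single condition, `np.where` returns a tuple containing one index array per dimension, which is why the output above is wrapped in a tuple. As a sketch, for a vector you can take element 0 of the tuple and use it as a fancy index (the name `indexes` is illustrative):

```python
import numpy as np

THRESHOLD = 90
original_vector = np.array([99, 40, 55, 103, 92, 86])

# one index array per dimension; a vector has only one
indexes = np.where(original_vector >= THRESHOLD)[0]
print(indexes)

# the indexes can then be used for fancy indexing
print(original_vector[indexes])
```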

## Summing up

If you are aiming to get the maximum benefit from `numpy` then you will need to make extensive use of slicing and indexing. The examples I’ve given here are the ones that I’ve found most useful in my coding. One takeaway you should pay particularly close attention to is that a slice is a **view** of an array. It’s the same data, and updates to a slice are actually updating the original data. Fancy and boolean indexing, on the other hand, define a new array. Data is copied and stored in a new location in memory.