Array slicing and indexing#

Slicing and indexing are powerful ways to select and access elements within an array. The complexity of what you can achieve with numpy using only a small amount of code is quite remarkable. However, both approaches require careful study to avoid potential unexpected behaviour in your code (that’s a polite way of saying ‘bugs’). We will cover this behaviour in detail, but for now it’s enough to say that slices can be considered views of an array rather than separate objects.

Slicing#

You can access subsets of arrays using slicing notation

array[start:end:step]
start is included and end is excluded, i.e. the half-open interval [start, end)

Tip: if start or end are omitted numpy uses the corresponding index for the start or end of the array

Reminder: Don’t forget that arrays are zero indexed.
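To make the half-open interval and the defaults concrete, here is a minimal sketch. The array name `v` is purely illustrative and is not used in the examples that follow.

```python
import numpy as np

v = np.arange(10, 16)   # [10 11 12 13 14 15]

# start is included, end is excluded
print(v[1:4])    # [11 12 13]

# an omitted start defaults to the beginning of the array
print(v[:3])     # [10 11 12]

# an omitted end defaults to the end of the array
print(v[3:])     # [13 14 15]

# a step of 2 takes every other element
print(v[::2])    # [10 12 14]
```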

Let’s start off simple with a couple of vector examples. We’ll gradually work our way up to higher dimensions (and headaches!).

Example 1#

Given the array [10, 11, 12, 13, 14, 15], select the elements at index 3 through 4

import numpy as np

complete_vector = np.array([10, 11, 12, 13, 14, 15])

slice_vector = complete_vector[3:5]

print(f'complete vector: {complete_vector}')
print(f'slice of vector: {slice_vector}')

complete vector: [10 11 12 13 14 15]
slice of vector: [13 14]

Example 2#

Given [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
Select the last four elements of the array
We can do this by omitting the end parameter

complete_vector = np.arange(10)

slice_vector = complete_vector[-4:]

print(f'original vector: {complete_vector}')
print(f'slice of vector: {slice_vector}') 

original vector: [0 1 2 3 4 5 6 7 8 9]
slice of vector: [6 7 8 9]

Example 3#

Starting from the third element of the array [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
slice the array to return every other element

complete_vector = np.arange(10)

slice_vector = complete_vector[2::2]

print(f'original vector: {complete_vector}')
print(f'slice of vector: {slice_vector}') 

original vector: [0 1 2 3 4 5 6 7 8 9]
slice of vector: [2 4 6 8]

Example 4: A matrix#

original = np.array([[1, 2], [3, 4], [5, 6]])
print('Original matrix:')
print(original)

Original matrix:
[[1 2]
 [3 4]
 [5 6]]

middle_row = original[1,:]
bottom_row = original[2,:]
print(f'Middle row: {middle_row}')
print(f'\nBottom row: {bottom_row}')

Middle row: [3 4]

Bottom row: [5 6]

first_column = original[:,0]
second_column = original[:,1]
print(f'First column: {first_column}')
print(f'\nSecond column: {second_column}')

First column: [1 3 5]

Second column: [2 4 6]
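One detail worth checking for yourself is the shape of what comes back. As a small sketch (reusing the `original` matrix from above; the names `row_1d` and `row_2d` are illustrative), indexing with an integer drops that dimension, while a slice of length one keeps it:

```python
import numpy as np

original = np.array([[1, 2], [3, 4], [5, 6]])

# an integer index removes the row dimension: the result is a vector
row_1d = original[1, :]
print(row_1d, row_1d.shape)    # [3 4] (2,)

# a slice keeps the row dimension: the result is a 1 x 2 matrix
row_2d = original[1:2, :]
print(row_2d, row_2d.shape)    # [[3 4]] (1, 2)
```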

Example 5: A three dimensional array#

If you haven’t worked extensively with arrays or lists before then it is usually 3 or more dimensions where the headaches start to kick in. Recall our three dimensional array example:

td = np.array([
              [[11,12], [13,14]], 
              [[21,22], [23,24]],
              [[31,32], [33,34]]
              ])

print(td.shape)

(3, 2, 2)

The best way to get to grips with slicing a 3D array is to look at the shape and think about how it all links together. The array td has a shape (3, 2, 2). I think of this as 3 rows, each of which contains 2 vectors of length 2. I’ve laid the code listing above out in this manner. It might help your understanding if you come back to this as we work through the array and slicing it. Let’s consider this statement dimension by dimension.

If we take element 0 of the first dimension we get

print(td[0])
print(td[0].shape)

[[11 12]
 [13 14]]
(2, 2)

That is, element 0 contains a \(2 \times 2\) matrix or, to put it another way, it contains two vectors each of length 2. Now for the second dimension

print(td[0][0])

[11 12]

We are now accessing the first vector in this row. To access the second we would just use td[0][1]. Finally we can use our third dimension to select a scalar value. For example to access the 2nd value in the 1st vector of the 1st row use:

print(td[0][0][1])

12

If that is giving you a headache then I suggest you practice accessing different rows, vectors and scalar values in the array before proceeding!
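If you want something to practise with, the sketch below may help. It assumes the same `td` array as above and shows that numpy also accepts all of the indices inside a single pair of brackets, separated by commas. This is the form used by the slicing examples that follow.

```python
import numpy as np

td = np.array([
              [[11, 12], [13, 14]],
              [[21, 22], [23, 24]],
              [[31, 32], [33, 34]]
              ])

# chained indexing: one dimension at a time
print(td[0][0][1])    # 12

# the same element using a single set of brackets
print(td[0, 0, 1])    # 12

# 1st value in the 2nd vector of the 3rd row
print(td[2, 1, 0])    # 33
```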

Building a solid understanding and intuition of 3 and 4 dimensional arrays is very useful for machine learning! Particularly for neural networks where you may find yourself battling to get your training data into the correct shape!

Going back to slicing, we can now make use of our understanding of the dimensions. Here’s a reminder of the full array:

print(td)

[[[11 12]
  [13 14]]

 [[21 22]
  [23 24]]

 [[31 32]
  [33 34]]]

Task: slice td so that we have the vector [13, 23, 33] i.e. the [i, 1, 0] elements where i is the row. To do this we use the following numpy array slicing syntax

td[:, 1, 0]

array([13, 23, 33])

Note how there are three indices in the notation td[i, j, k]. To get all of the rows we specified i as :. To select the 2nd array in each row we set j=1. At this stage we have selected [13 14], [23 24], and [33 34]. We then set k=0 to get the first element of these arrays.

Task: Modify this slice to select from only the first two rows.

We start with td[:, 1, 0] that we know gives us [13, 23, 33]. We selected all rows by setting i=:. Imagine for a moment that td is actually only a 1D array. What would you do to slice an array to get the first two elements only? You would use td[:2]. Therefore our 3D equivalent is:

td[:2, 1, 0]

array([13, 23])

Task: Slice td to return the second array in each row.

This is again a modification of our original code.

td[:, 1, 0]

Originally we were restricting our slice to the 1st element i.e. k=0 of each array returned. If we want all elements we replace the 0 with :

td[:, 1, :]

array([[13, 14],
       [23, 24],
       [33, 34]])
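If the slicing notation still feels opaque, it can help to write one of the tasks as an explicit loop and compare. The sketch below assumes the same `td` array; the name `looped` is illustrative. It also prints the shape of each slice from the three tasks, which is a quick way to see which dimensions survived.

```python
import numpy as np

td = np.array([
              [[11, 12], [13, 14]],
              [[21, 22], [23, 24]],
              [[31, 32], [33, 34]]
              ])

# td[:, 1, 0] visits every row i and picks element [i, 1, 0]
looped = np.array([td[i, 1, 0] for i in range(td.shape[0])])
print(looped)                                # [13 23 33]
print(np.array_equal(looped, td[:, 1, 0]))   # True

# the shape tells you which dimensions survived the slice
print(td[:, 1, 0].shape)    # (3,)
print(td[:2, 1, 0].shape)   # (2,)
print(td[:, 1, :].shape)    # (3, 2)
```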

Getting to grips with multi-dimensional arrays takes time and practice. So do persevere.

Slices are views of memory#

If you have only ever coded in python before and have no experience of a lower level language like C++ or Rust then the behaviour of arrays and slices can catch you out. Let’s take a quick detour into standard python and look at updating a slice of a python list.

original_list = [1, 2, 3, 4, 5]
print(f'original list before slice update: {original_list}')

# slice and update the 1st element
list_slice = original_list[1:3]
list_slice[0] = 999
print(f'slice: {list_slice}')
print(f'original list: {original_list}')

original list before slice update: [1, 2, 3, 4, 5]
slice: [999, 3]
original list: [1, 2, 3, 4, 5]

Our standard python list slicing code created a copy of items at index 1 through 2. So when we updated list_slice it had no effect on original_list. Now let’s take a look at equivalent code in numpy

original_array = np.array([1, 2, 3, 4, 5])
print(f'original array before slice update: {original_array}')

# slice and update the 1st element
array_slice = original_array[1:3]
array_slice[0] = 999
print(f'slice: {array_slice}')
print(f'original array: {original_array}')

original array before slice update: [1 2 3 4 5]
slice: [999   3]
original array: [  1 999   3   4   5]

numpy arrays are efficient because numpy works with a known block of memory. An array slice is a view of that memory. When you update the slice you update the original data in the array. If you are not careful this can lead to unexpected silent bugs in your code. It is important to realise that this happens even when you pass an array or slice to a function.

def zero_zero_to_seven(input_array):
    input_array[0,0] = 7

test_array = np.ones((5,5))

zero_zero_to_seven(test_array[1:2,:])
zero_zero_to_seven(test_array[3:,3:])
zero_zero_to_seven(test_array)

print(test_array)

[[7. 1. 1. 1. 1.]
 [7. 1. 1. 1. 1.]
 [1. 1. 1. 1. 1.]
 [1. 1. 1. 7. 1.]
 [1. 1. 1. 1. 1.]]
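If you are ever unsure whether two arrays point at the same data, numpy can tell you. The sketch below uses `np.shares_memory` and the `.base` attribute of an array; the variable name `array_slice` is illustrative.

```python
import numpy as np

test_array = np.ones((5, 5))
array_slice = test_array[3:, 3:]

# the slice points at the same block of memory as the original
print(np.shares_memory(test_array, array_slice))    # True

# a view records the array that owns its data in .base
print(array_slice.base is test_array)               # True
```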

numpy behaves this way by design. It means that passing a slice is not expensive (as no memory is copied). If you want to avoid this behaviour then you can redefine an array as follows:

def array_times_seven(input_array):
    # redefine the array
    output_array = input_array * 7
    return output_array

test_array = np.ones((5, 5))
new_array = array_times_seven(test_array[3:,3:])
print(new_array)
print(test_array)

[[7. 7.]
 [7. 7.]]
[[1. 1. 1. 1. 1.]
 [1. 1. 1. 1. 1.]
 [1. 1. 1. 1. 1.]
 [1. 1. 1. 1. 1.]
 [1. 1. 1. 1. 1.]]

In this case new_array is indeed a new memory allocation separate from test_array.
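Another option, if you want to keep an in-place update but protect the original, is to take an explicit copy of the slice first. This is a sketch using the `copy()` method of an array; `zero_zero_to_seven` is reused from above and `safe_slice` is an illustrative name.

```python
import numpy as np

def zero_zero_to_seven(input_array):
    input_array[0, 0] = 7

test_array = np.ones((5, 5))

# .copy() allocates new memory, so the function only updates the copy
safe_slice = test_array[3:, 3:].copy()
zero_zero_to_seven(safe_slice)

print(safe_slice[0, 0])    # 7.0
print(test_array[3, 3])    # 1.0
```

The trade-off is the one mentioned above: a copy costs memory and time, so it is worth doing only when you need the original left untouched.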

Fancy and Boolean indexing#

Beyond slicing, two more powerful ways to return elements of an ndarray are fancy indexing and boolean indexing.

Fancy indexing#

Fancy indexing allows an array-like object (e.g. a list) to specify the elements of an array to select. For example, if we needed the elements at index 2, 4 and 7 of a vector (i.e. the 3rd, 5th and 8th elements):

complete_vector = np.arange(start=10, stop=0, step=-1)
indexes = [2, 4, 7]
sub_vector = complete_vector[indexes]

print(f'original vector: {complete_vector}')
print(f'sub vector: {sub_vector}')

original vector: [10  9  8  7  6  5  4  3  2  1]
sub vector: [8 6 3]

The way to think about fancy indexing for matrices (and higher dimensions) is as a set of arrays specifying the row and column coordinates of elements. For example, given a \(3 \times 3\) matrix

\( A = \begin{bmatrix} 0 & 1 & 2 \\ 3 & 4 & 5 \\ 6 & 7 & 8 \\ \end{bmatrix} \)

return the elements at [0, 2], [1, 1] and [2, 2].

Our expected answer is [2, 4, 8]

complete_matrix = np.arange(9).reshape(-1, 3)

row = [0, 1, 2]
col = [2, 1, 2]

sub_matrix = complete_matrix[row, col] 

print(f'original matrix: \n{complete_matrix}')
print(f'sub matrix: \n{sub_matrix}')

original matrix: 
[[0 1 2]
 [3 4 5]
 [6 7 8]]
sub matrix: 
[2 4 8]
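To see why this is a set of coordinates, it may help to pair the two lists up by hand. The sketch below assumes the same `complete_matrix`, `row` and `col` as above; `paired` is an illustrative name. It builds the same result one coordinate at a time.

```python
import numpy as np

complete_matrix = np.arange(9).reshape(-1, 3)

row = [0, 1, 2]
col = [2, 1, 2]

# pair up the row and column lists one coordinate at a time
paired = np.array([complete_matrix[r, c] for r, c in zip(row, col)])
print(paired)    # [2 4 8]

# identical to the fancy indexing result
print(np.array_equal(paired, complete_matrix[row, col]))    # True
```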

The coordinate approach can be mixed with other commands we have already seen. For example, the shorthand to index on all rows would use :

complete_matrix = np.arange(15).reshape(-1, 5)

# select the following columns
cols = [0, 1, 2, 4]

# no need for a list for all rows. Use : instead
sub_matrix = complete_matrix[:, cols] 

print(f'original matrix: \n{complete_matrix}')
print(f'\nsub matrix: \n{sub_matrix}')

original matrix: 
[[ 0  1  2  3  4]
 [ 5  6  7  8  9]
 [10 11 12 13 14]]

sub matrix: 
[[ 0  1  2  4]
 [ 5  6  7  9]
 [10 11 12 14]]

It is noteworthy that the sub array returned by fancy indexing is its own array. Unlike slices it is not a view of the original array.

#set all elements to 100.
sub_matrix[:] = 100
print(f'original matrix: \n{complete_matrix}')
print(f'\nsub matrix: \n{sub_matrix}')

original matrix: 
[[ 0  1  2  3  4]
 [ 5  6  7  8  9]
 [10 11 12 13 14]]

sub matrix: 
[[100 100 100 100]
 [100 100 100 100]
 [100 100 100 100]]
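A related point that can surprise people: although the result of fancy indexing is a copy, assigning through a fancy index on the left-hand side of `=` does write into the original array. A minimal sketch, reusing `complete_matrix` and `cols` from above:

```python
import numpy as np

complete_matrix = np.arange(15).reshape(-1, 5)
cols = [0, 1, 2, 4]

# the result of fancy indexing does not share memory with the original
sub_matrix = complete_matrix[:, cols]
print(np.shares_memory(complete_matrix, sub_matrix))    # False

# assignment through the fancy index updates the original in place
complete_matrix[:, cols] = 100
print(complete_matrix)
```

After the assignment only column 3 of `complete_matrix` keeps its original values (3, 8 and 13).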

Boolean indexing#

In boolean indexing we provide a boolean mask for an array a of length \(l\). The mask is a list or ndarray that is also of length \(l\) and contains only True or False elements. The sub array returned only contains the elements from a that have a matching True in the mask array.

complete_vector = np.arange(5)
mask = [False, False, False, True, True]
sub_vector = complete_vector[mask]

print(f'original vector: {complete_vector}')
print(f'sub vector: {sub_vector}')

original vector: [0 1 2 3 4]
sub vector: [3 4]

We can generate an array of booleans using some conditional logic. For example by checking which elements are greater than or equal to a threshold.

THRESHOLD = 90
original_vector = np.array([99, 40, 55, 103, 92, 86])
original_vector >= THRESHOLD

array([ True, False, False,  True,  True, False])

It is then just a case of selecting the elements using the boolean array.

mask = original_vector >= THRESHOLD
original_vector[mask]

array([ 99, 103,  92])
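The mask does not need its own variable. As a short sketch (the name `above` is illustrative), the condition can go straight inside the brackets. Like fancy indexing, the result is a copy, so changing it leaves the original alone.

```python
import numpy as np

THRESHOLD = 90
original_vector = np.array([99, 40, 55, 103, 92, 86])

# build the mask and select in one step
above = original_vector[original_vector >= THRESHOLD]
print(above)    # [ 99 103  92]

# the selection is a copy: updating it does not touch the original
above[0] = -1
print(original_vector)    # [ 99  40  55 103  92  86]
```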

More often I’ve found that I need to get the indices of the elements that contain a value less than, greater than or equal to a threshold of some kind. In numpy this can be achieved using np.where.

np.where(original_vector >= THRESHOLD)

(array([0, 3, 4]),)
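Note that the result is a tuple containing one index array per dimension, which is why the output above has the trailing comma. For a vector you will usually want the first (and only) item. The sketch below, with the illustrative name `indexes`, pulls it out and reuses it for fancy indexing.

```python
import numpy as np

THRESHOLD = 90
original_vector = np.array([99, 40, 55, 103, 92, 86])

# np.where returns a tuple with one index array per dimension
indexes = np.where(original_vector >= THRESHOLD)[0]
print(indexes)                     # [0 3 4]

# the index array can then be used for fancy indexing
print(original_vector[indexes])    # [ 99 103  92]
```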

Summing up#

If you are aiming to get the maximum benefit from numpy then you will need to make extensive use of slicing and indexing. The examples I’ve given here are the ones that I’ve found most useful in my coding. One takeaway you should pay particularly close attention to is that a slice is a view of an array. It’s the same data, and updates to a slice are actually updating the original data. Fancy and boolean indexing on the other hand define a new array. Data is copied and stored in a new location in memory.