Coding scientific functions

Coding scientific functions#

These exercises are designed to refresh and in some cases challenge your python knowlege and problem solving skills. They are a mix of classic scientific coding problems and coding interview type questions found on sites such as HackerRank.

Note that the solutions to these exercises are in standard Python. If you have knowledge of scientific python libraries such as numpy or scipy and are confident in using them to complete the exercises then that is fine. However, don’t feel you need to overcomplicate your code. There are seperate exercises for numpy.

Even if you are confident in your basic python skills try to work through all of the questions.

In general the complexity of questions increases as you progress through the questions.

Some extra challenges relevant to algorithms

Aim to produce efficient solutions to problems i.e. those that require a minimal amount of memory, computations and run time. I have provided example solutions, but maybe you can do better!
Time your solutions to identify if you have improved run time.
Keep your solutions readable. Write good quality human readable code and give your functions PEP8 docstrings.
Test your problems with more than one data input - this can help identify weaknesses and errors in your code.
Add some defensive error checking and/or exception handling for function inputs.

Imports#

Exercise 1#

(source: hackerrank.com)

Given five positive integers, find the minimum and maximum values that can be calculated by summing exactly four of the five integers. Print the respective minimum and maximum values.

Example:

arr = [9, 3, 5, 7, 1]

output:

16
24

# your code here ...
arr = [9, 3, 5, 7, 1]

# example solution
def sum_smallest_n(arr, n=4):
    '''
    Return the sum of the smallest n of an array
    
    Params:
    -------
    arr: list
        assumes list of numerical data 
    
    n: int, optional (default=4)
        Number of items to return 
        
    Returns:
    -------
    int
    '''
    return sum(sorted(arr)[:n])

def sum_largest_n(arr, n=4):
    '''
    Return the sum of the largest n of an array
    
    Params:
    -------
    arr: list
        assumes list of numerical data 
    
    n: int, optional (default=4)
        Number of items to return 
        
    Returns:
    -------
    int
    '''
    return sum(sorted(arr, reverse=True)[:n])

arr = [9, 3, 5, 7, 1]
print(sum_smallest_n(arr))
print(sum_largest_n(arr))

16
24

Exercise 2#

The inverse of a 2×2 matrix \(A = \begin{pmatrix} a & b\\ c & d \end{pmatrix}\) is given by

\[\begin{split}A^{-1} = \dfrac{1}{ad - bc} \begin{pmatrix} d & -b\\ -c & a \end{pmatrix}\end{split}\]

Task:

Code a function that, given a 2D array A = [[a,b],[c,d]], prints out the 2x2 inverse matrix.
Do not use the built in inversion functions.

Test data:

matrix = [[5.0, 2.0], [-7.0, -3.0]]
expected = [[3.0, 2.0], [-7.0, -5.0]] 

Hints:

How do I represent a matrix in standard python?
- For example the matrix \(A = \begin{pmatrix} 1 & 2\\ 3 & 4 \end{pmatrix}\) would be list of lists a = [[1, 2], [3, 4]]

Example solution#

There are several ways you might implement this equation in standard python.

def inverse_2x2_matrix(matrix):
    '''
    Calculate and return the inverse of a 2x2.
    Uses standard python.  Lowest overhead of all answers I can think of!
        
    Given matrix A = [[a, b], [d, c]]
    
    det A = ad - bc
 
    A^-1 = (1 / det A) * [[d, -b], [-c, a]]
    
    Params:
    -------
    matrix: array-like
        assumes 2x2 list i.e. [[a, b], [c, d]]
        
    Returns:
    ------
    list of lists
    '''
    a, b, c, d = matrix[0][0], matrix[0][1], matrix[1][0], matrix[1][1]
    det = (a * d) - (b * c)
    return [[(1 / det) * d, (1 / det) * -b], [(1 / det) * -c, (1 / det) * a]]


def inverse_2x2_matrix_map(matrix):
    '''
    Calculate and return the inverse of a 2x2.
    Uses standard python + lambda expression + map 
    
    Given matrix A [[a, b], [d, c]]
    
    det A = ad - bc
    inverse = (1 / det A) * [[d, -b], [-c, a]]
    
    Params:
    -------
    matrix: array-like
        assumes 2x2 matrix i.e. [[a, b], [c, d]]
        
    Returns:
    ------
    list of lists
    '''
    a, b, c, d = matrix[0][0], matrix[0][1], matrix[1][0], matrix[1][1]
    det = (a * d) - (b * c)
    return [list(map(lambda x: (1 / det) * x, row)) for row in [[d, -b], 
                                                                [-c, a]]]

#nice example as determinant = -1
matrix = [[5.0, 2.0], [-7.0, -3.0]]

inverse_2x2_matrix(matrix)

[[3.0, 2.0], [-7.0, -5.0]]

inverse_2x2_matrix_map(matrix)

[[3.0, 2.0], [-7.0, -5.0]]

Exercise 3#

A classic Markovian model for representing the steady state performance of queuing and service system is the M/M/1.

In the Markovian model “customers” arrive to the model following a Poisson process with rate parameter \(\lambda\) (e.g. an average of 5 arrivals per hour). The M/M/1 is a first come first served system with a single server. Service time is exponentially distributed with mean \(1/\mu\) i.e. customers are served at rate \(\mu\).

The equation for calculating the average waiting time to be served (\(W_q\)) in an M/M/1 queue is:

\[W_q = \dfrac{\lambda}{\mu\left(\mu-\lambda\right)}\]

While the probability that there are \(n\) customers in the system (\(P_n\)) is given by:

\[P_n = \left(1 - \dfrac{\lambda}{\mu}\right)\left(\dfrac{\lambda}{\mu}\right)^n\]

Example:

Given:

\[\lambda = 2 \textnormal{ per hour}\]

\[\mu = 3 \textnormal{ per hour}\]

Calculate \(W_q\) and \(P_0, P_1\)

\[W_q = \dfrac{2}{3\left(3 - 2\right)} = \dfrac{2}{3} \textnormal{ per hour}\]

\[P_0 = \left(1 - \dfrac{\lambda}{\mu}\right) = 1 - \dfrac{2}{3} = \dfrac{1}{3}\]

\[P_1 = \left(\dfrac{1}{3}\right)\left(\dfrac{2}{3}\right) = \dfrac{2}{9}\]

Task:

Code functions in python to calculate \(W_q\) and \(P_n\) for the M/M/1 queue model.

Test data:

arrival_rate = 2.0
service_rate = 3.0
expected_wq = 2/3
expected_p0 = 1/3
expected_p1 = 2/9
expected_p2 = 4/27

Hints:

Note that the word “lambda” is a special word in Python and should not be used as a variable name. When naming a variable options are to use _lambda or arrival_rate

Example solution#

def waiting_time(arrival_rate, service_rate):
    '''Average waiting time of a M/M/1 queue'''
    return arrival_rate / ((service_rate) * (service_rate - arrival_rate))

def prob_n_customers_in_system(n, arrival_rate, service_rate):
    '''Prob n customers are in an M/M/1 system'''
    rho = arrival_rate / service_rate
    p0 = 1.0 - rho
    return p0 * (rho ** n)

# answer should be 2/3 or 0.66 recurring.
arrival_rate = 2.0
service_rate = 3.0

waiting_time(arrival_rate, service_rate)

0.6666666666666666

# p0 should be 1/3 (0.3333...)
prob_n_customers_in_system(0, arrival_rate, service_rate)

0.33333333333333337

# p1 should be 2/9. (0.2222...)
prob_n_customers_in_system(1, arrival_rate, service_rate)

0.22222222222222224

# p1 should be 4/27. (0.148148...)
prob_n_customers_in_system(2, arrival_rate, service_rate)

0.14814814814814817

Exercise 4#

(source: hackerrank.com)

A left rotation operation on an array shifts each of the array’s elements unit to the left.

Example 1

# original list
to_roll = [1, 2, 3, 4, 5]

#number of rotations/rolls left
n = 2

#call the function that rotates the array.
roll_left(to_roll, n)

output:

[3, 4, 5, 1, 2]

Note that the lowest index item moves to the highest index in a rotation. This is called a circular array.

Example 2:

# original list
to_roll = [1, 2, 3, 4, 5]

#number of rotations/rolls left
n = 8

# call the function that rotates the array.
roll_left(to_roll, n)

output:

[4, 5, 1, 2, 3]

Task:

Given an array of integers and a number, perform left rotations on the array.
Return the updated array and print to screen.

Test data

# test 1
to_roll = [1, 2, 3, 4, 5]
n = 7 
expected = [3, 4, 5, 1, 2]

# test 2
to_roll = [41, 73, 89, 7, 10, 1, 59, 58, 84, 77, 77, 97, 58, 1, 86, 58, 26, 
           10, 86, 51]
n = 10
expected = [77, 97, 58, 1, 86, 58, 26, 10, 86, 51, 41, 73, 89, 7, 10, 1, 59, 
            58, 84, 77]

# your code here ...

Example solution#

def roll_left(to_roll, n):
    '''
    Circular rolling of a python list to the left using
    the modulo operator and list slicing.
    
    Returns a new list that has been rolled left.
    
    Params:
    ------
    to_roll: list
        The python list to roll
        
    n: int
        The number of rolls
        
    Returns:
    --------
    list
    
    Example usage:
    --------------
    ```python
    >>> to_roll = [1, 2, 3, 4, 5]
    >>> roll_left(to_rotate, 1)
    [2, 3, 4, 5, 1]
    
    >>> to_roll = [1, 2, 3, 4, 5]
    >>> roll_left(to_rotate, 3)
    [4, 5, 1, 2, 3]
    
    >>> to_roll = [1, 2, 3, 4, 5]
    >>> roll_left(to_rotate, 6)
    [2, 3, 4, 5, 1]
    ```
    '''
    split_index =  n % len(to_roll)
    return to_roll[split_index:] + to_roll[:split_index]

to_roll = [1, 2, 3, 4, 5]
roll_left(to_roll, 8)

[4, 5, 1, 2, 3]

to_roll = [41, 73, 89, 7, 10, 1, 59, 58, 84, 77, 77, 97, 58, 1, 86, 58, 26, 
           10, 86, 51]
roll_left(to_roll, 10)

[77, 97, 58, 1, 86, 58, 26, 10, 86, 51, 41, 73, 89, 7, 10, 1, 59, 58, 84, 77]

Exercise 5#

The Fibonacci numbers, denoted \(F_n\), form a sequence, called the Fibonacci sequence, such that each number is the sum of the two preceding ones, starting from 0 and 1. That is

\(F_{0}=0, F_{1}=1\)

and

\(F_{n}=F_{n-1}+F_{n-2}\) for \(n > 1\).

The beginning of the sequence is:

0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144, …

Task:

Code a function called fib(n).
The parameter n represents the Fibaonacci number that will be generated.
For example, the value when fib(50) returns 12_586_269_025

# your code here

Example solution#

For this exercise there are several options you can implement. You will get good performance from a basic for loop iteration, but you could also opt for recursive solution. Although the latter will need to be augmented by memoization for it to be efficient.

def fib_iter(n):
    '''
    Generate nth fibonacci number though iteration.
    
    Params:
    ------
    n: int
        The number in the sequence to generate.
        Assumes n > 1
        
    Returns:
    -------
    int
    '''
    last = 0
    nxt = 1
    
    for i in range(1, n):
        last, nxt = nxt, last+nxt
    return nxt

expected = 12_586_269_025
fib_iter(50)

12586269025

As an alternative you could implement a solution using recursion. Given the mathematical formula for the fibonnaci sequence you would forgiven for thinking this is the obvious way to solve it. However, note that with recursion we end up duplicating computational effort. I.e. we repeated resolve the same problem. We can manage this using memoization where we cache fibonnaci numbers already solved (in a dict for fast lookup). The solution below implements this using a custom python decorator, but you could also use functools.lru_cache

def cache(func):
    '''
    Memoization Decorator.  
    Creates a cache (lookup) of the historical calls to @func.
    
    Returns @func wrapped in with a memory to avoid recalling func for 
    cached values.
    
    Params:
    ------
    func: object
        Python function to decorate
        
    Returns:
    -------
    function: cache decorator
    '''
    history = {}
    def cache_decorator(*args):
        try:
            return history[args]
        except KeyError:
            val = func(*args)
            history[args] = val
            return val
    return cache_decorator

@cache
def fib(n):
    '''
    Calculates the nth fibinaci number using recursion.
    
    Note this function is extremely slow for n > 35 unless
    memoization is used.
    
    Params:
    ------
    n: int
        Assumes >= 1
    
    Returns:
    -------
    int
    '''
    if n < 2:
        return n
    else:
        return fib(n-1) + fib(n-2)

expected = 12586269025
fib(50)

12586269025

Exercise 6#

(source hackerrank.com)

Given a 6x6 matrix \(A\)

An hourglass is a subset of values with indices falling in this pattern in \(A\)’s graphical representation:

a b c
  d 
e f g

In total, there are 16 hourglasses in \(A\). An hourglass sum is the sum of all the values it contains.

For example the first and second hour glasses in \(A\) are therefore.

1 1    1 1 0
      0
1 1    1 1 0

subset 1 total = 1+1+1+1+1+1+1 = 7

subset 2 total = 1+1+0+0+1+1+0 = 4

Task:

Code a function that accepts an matrix as a parameter (you can assume it is always 6x6)
The function must calculate all hourglass sums and return the maximum (int).

Test data

# expected answer = 19
matrix = [[1, 1, 1, 0, 0, 0],
          [0, 1, 0, 0, 0, 0],
          [1, 1, 1, 0, 0, 0],
          [0, 0, 2, 4, 4, 0],
          [0, 0, 0, 2, 0, 0],
          [0, 0, 1, 2, 4, 0]]


# expected answer = 13
matrix2 = [[1, 1, 1, 0, 0, 0],
           [0, 1, 0, 0, 0, 0],
           [1, 1, 1, 0, 0, 0],
           [0, 9, 2, -4, -4, 0],
           [0, 0, 0, -2, 0, 0],
           [0, 0, -1, -2, -4, 0]]

# your code here ...

Example solution#

matrix = [[1, 1, 1, 0, 0, 0],
          [0, 1, 0, 0, 0, 0],
          [1, 1, 1, 0, 0, 0],
          [0, 0, 2, 4, 4, 0],
          [0, 0, 0, 2, 0, 0],
          [0, 0, 1, 2, 4, 0]]

matrix2 = [[1, 1, 1, 0, 0, 0],
           [0, 1, 0, 0, 0, 0],
           [1, 1, 1, 0, 0, 0],
           [0, 9, 2, -4, -4, 0],
           [0, 0, 0, -2, 0, 0],
           [0, 0, -1, -2, -4, 0]]

def max_hourglass_sum(matrix):
    '''
    For 6 x 6 array calculates all the 3 x 3 hourglass sums
    and returns the maximum.
    
    E.g. Given the 6x6 matrix $A$
    
    1 1 1 0 0 0
    0 1 0 0 0 0
    1 1 1 0 0 0
    0 0 2 4 4 0
    0 0 0 2 0 0
    0 0 1 2 4 0
    
    The first and second hourglasses are
    
    1 1 1    1 1 0
      1        0
    1 1 1    1 1 0
    
    subset_1 total = 1+1+1+1+1+1+1 = 7
    subset_2 total = 1+1+0+0+1+1+0 = 4
       
    Params:
    ------
    matrix: list
        A 6 x 6 matrix implemented as a list of lists
        
    Returns:
    --------
    int
    
    Example usage:
    -------------
    ```python
    >>> matrix = [[1, 1, 1, 0, 0, 0],
    ...           [0, 1, 0, 0, 0, 0],
    ...           [1, 1, 1, 0, 0, 0],
    ...           [0, 0, 2, 4, 4, 0],
    ...           [0, 0, 0, 2, 0, 0],
    ...           [0, 0, 1, 2, 4, 0]]
    >>> max_hourglass_sum(matrix)
    19
    ```
    '''
    n_cols = len(matrix[0])
    n_rows = len(matrix)
    maximum = float('-inf')

    for row in range(n_rows-2):
        for col in range(n_cols - 2):
            
            # hourglass subset
            top = matrix[row][col:col+3]
            middle = [matrix[row+1][col+1]]
            bottom = matrix[row+2][col:col+3]
            
            subset_sum = sum(top + middle + bottom)
            if subset_sum > maximum:
                maximum = subset_sum

    return maximum

print(max_hourglass_sum(matrix))
print(max_hourglass_sum(matrix2))

19
13

Exercise 7#

The longest increasing subsequence problem is to find a subsequence of a given sequence in which the subsequence’s elements are in sorted order, lowest to highest, and in which the subsequence is as long as possible.

Examples

Given the vector [3, 1, 8, 2, 5] the longest increasing subsequence is [1, 2, 5] and has length 3.
Given the vector [5, 2, 8, 6, 3, 6, 9, 5] the longest increasing subsequence is [2, 3, 6, 9] and has length 4

Task

Code a function that accepts a sequence as parameter. The function should return the length of the longest increasing subsequence.

Hints:

You are being asked to return the length not the actual subsequence.

# your code here ...

Example solution#

def longest_inc_subseq(seq):
    '''
    Solve the longest increasing subsequence (lis) problem in a single pass and
    return its length.
    
    Uses a memory recording the longest increasing subsequence up to that point.
    This is a dynamic programming solution to the problem.
    
    A nice video on this is here: 
    https://www.youtube.com/watch?v=aPQY__2H3tE&t=162s
    
    Given a list of values A then the lis up to item n is   
    
    memory[n] = 1 + max([memory[k] for k in range(n) if A[k] < A[n]])
    
    Example:
    
    Given the sequence [3, 1, 8, 2, 5] then the iterations are:
   
    0: memory[0] = 1 
    1: memory[1] = 1 + max(0) = 1 + 0 = 1
    2: memory[2] = 1 + max(0, memory[0], memory[1]) = 1 + 1 = 2
    3. memory[3] = 1 + max(0, memory[1]) = 1 + 1 = 2
    4. memory[4] = 1 + max(0, memory[0], memory[1], memory[3]) = 1 + 2 = 3
   
    Returns:
    --------
    int
    '''
    # memory of lis up to that element in the list
    memory = [1] * len(seq)
    for i in range(1, len(memory)):
        # select the subproblems where value in seq < current value 
        subproblems = [memory[k] for k in range(i) if seq[k] < seq[i]]
        
        # save the lis up to this point - 0 added in as default case.
        memory[i] = 1 + max(subproblems + [0])
        
    return max(memory)

seqs = [[3, 1, 8, 2, 5], [5, 2, 8, 6, 3, 6, 9, 5]]
expected = [3, 4]
[longest_inc_subseq(a) for a in seqs]

[3, 4]

Exercise 8:#

(source: hackerrank.com)

A string is said to be a special string if either of two conditions is met:

All of the characters are the same, e.g. aaa.
All characters except the middle one are the same, e.g. aadaa.

A special substring is any substring of a string which meets one of those criteria. Given a string, determine how many special substrings can be formed from it.

Example

s = 'mnonopoo'

s contains the following 12 special substrings:

{'m', 'n', 'o', 'n', 'o', 'p', 'o', 'o', 'non', 'ono', 'opo', 'oo'}

Task:

Write a function called substr_count(s) that accepts a str parameter s and calculates the number of instances of a special string within it.
Use the example above and the test data below to test your function.
Extra Challenge: can you solve this problem efficiently with only one or two passes of s?

Test data

# expected answer = 7
# {'a', 's', 'a', 's', 'd', 'asa', 'sas'}
s = 'asasd'

# expected answer = 10
# {'a', 'b', 'c', 'b', 'a', 'b', 'a'. 'bcb', 'bab', 'aba'}
s = 'abcbaba'

# expected answer = 10
# {'a', 'a', 'a', 'a', 'aa', 'aa', 'aa', 'aaa', 'aaa', 'aaaa'}
s = 'aaaa'

# expected answer = 393074
# len(big_s) = 327308 (!)
f = open("big_special_str.txt", "r")
big_s = f.read()

I have provided the code below to download the instance of big s for you

RESPONSE_SUCCESS = 200
BIG_S_URL = 'https://raw.githubusercontent.com/health-data-science-OR/' \
            + 'hpdm139-datasets/main/big_special_str.txt'

def get_big_s():
    '''
    downloads large test problem
    
    Returns:
    -------
    str - if successful
    None - if unsuccessful
    '''
    response = requests.get(BIG_S_URL)

    if response.status_code == RESPONSE_SUCCESS:
        # return the string
        return response.text
    else:
        print('connection error for big s')
        return None

# test data
test_data = ['mnonopoo', 'asasd', 'abcbaba', 'aaaa', get_big_s()]
expected = [12, 7, 10, 10, 393074]

Example solution#

# example solution...
def compact_format(s):
    '''
    Converts a string into a compact list format 
    that includes the letter and an integer indicating
    the number of times is appears consecutively before
    a charactor change.
    
    Params:
    ------
    s: str
        A string to convert
        
    Returns:
    ------
    list
    
    Example usage:
    ------------
    
    ```python
    >>> s = 'asasd'
    >>> compact_format(s)
    [['a', 1], ['s', 1], ['a', 1], ['s', 1], ['d', 1]]
    
    >>> s = 'aaaa'
    >>> compact_format(s)
   [['a', 4]]
    ```
    
    '''
    current_char = s[0]
    count = 1
    compact = []

    for i in range(1, len(s)):
        if current_char == s[i]:
            count += 1
        else:
            compact.append([current_char, count])
            current_char = s[i]
            count = 1

    #final letter
    compact.append([current_char, count])
    return compact

def pairwise_comparisons(n):
    '''
    The number of all pairwise combinations that can be performed.
    '''
    return int(((n*(n-1))/2))

def substr_count(s):
    '''
    Count the special substring instances in the string
    
    e.g. 'aaaa' = {'a', 'a', 'a', 'a', 'aa', 'aa', 'aa', 'aaa', 'aaa', 'aaaa'}
    
    function returns = 10
    
    e.g. 'abcbaba' = {'a', 'b', 'c', 'b', 'a', 'b', 'a'. 'bcb', 'bab', 'aba'}

    function returns = 10
    
    Function performs a single pass of s and then a second pass of a smaller
    compact representation.  Technically this could all be achieved in a single
    pass.  This solution is slightly more readable at the loss of a bit of 
    efficiency.
    
    Params:
    -------
    s: str
        The string to parse.  
    
    Returns:
    --------
    int
    
    Example usage:
    ------------
    
    ```python
    >>> s = 'aaaa'
    >>> substr_count(s)
    10
    ```
    
    '''
    count = len(s)
     
    # pre-processing of string into compact format
    cs = compact_format(s)

    # loop comparison of s[1] to s[n-1]
    for i in range(1, len(cs)-1):
        
        count += pairwise_comparisons(cs[i][1])
        
        #only 1 middle char + same char in head & tail
        if cs[i][1] == 1 and cs[i-1][0] == cs[i+1][0] : 
            # e.g.1 cacc.  Therefore count + 1 (cac)
            # e.g.2 cccacc  add 2 (ccacc and cac)
            # e.g.3 ccccacc add 2 (ccacc and cac)
            # this is the minimum of each of the two 
            count += min(cs[i-1][1], cs[i+1][1])
            
    # first and last missed in above loop
    count += pairwise_comparisons(cs[0][1])
    if(len(cs) > 1):
        count += pairwise_comparisons(cs[len(cs)-1][1])
        
    return count

# this is what compact format outputs
s = 'asasd'
compact_format(s)

[['a', 1], ['s', 1], ['a', 1], ['s', 1], ['d', 1]]

# test function
test_data = ['mnonopoo', 'asasd', 'abcbaba', 'aaaa', get_big_s()]
expected = [12, 7, 10, 10, 393074]
results = [substr_count(s) for s in test_data]

print(expected)
print(results)

[12, 7, 10, 10, 393074]
[12, 7, 10, 10, 393074]

Coding scientific functions

Contents

Coding scientific functions#

Imports#

Exercise 1#

Exercise 2#

Example solution#

Exercise 3#

Example solution#

Exercise 4#

Example solution#

Exercise 5#

Example solution#

Exercise 6#

Example solution#

Exercise 7#

Example solution#

Exercise 8:#

Example solution#