# Regression using arrays#

```
import numpy as np
```

`numpy` arrays are the fundamental building block of the Python SciPy stack. Scientific computing in Python nearly always makes use of `numpy.ndarray` at some level.

In this example we will load an array from a file and conduct a simple linear regression. The method of Ordinary Least Squares (OLS) is used to fit a linear model (think \(y = \beta_1 x + \beta_0 + \epsilon\)).

The file `data/lysis.csv` contains the monthly proportion of a hospital's stroke patients that have been treated with a potentially life-saving drug to remove a blood clot from their brain.

There are a total of 54 months in the dataset.

In month 42 the hospital introduced a new process for fast tracking patients to treatment. Here we will quantify the difference in treatment rates before and after the fast track intervention and determine whether the result is statistically significant.

To do this we will need to:

1. Import the `OLS` class from the `statsmodels.api` namespace.
2. Read the `lysis.csv` file into a `numpy.ndarray` variable (watching out for headers in the file).
3. Create a numpy array the same size as the lysis array to represent our intervention variable. It should contain 0 when the month (index) is less than the month the intervention was introduced (42), and 1 from that month onwards. The end array should look like:

```
dummy == [0,0,0, ... ,1,1,1]
```
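The dummy described in step 3 can also be built in one line by comparing each month index against the intervention month. This is a sketch of that alternative, not the code the tutorial itself uses later:

```python
import numpy as np

# A compact alternative: a boolean comparison against the intervention
# month, cast to int, gives 0 before month 42 and 1 from month 42 onwards.
dummy = (np.arange(54) >= 42).astype(int)
print(dummy[:3], dummy[-3:])  # [0 0 0] [1 1 1]
```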

Once we have conducted the analysis we will print the results summary.

## Import `statsmodels`#

The code will use the `OLS` class that can be found in `statsmodels.api`.

```
# this contains the analysis function we will use
import statsmodels.api as sm
```

## Read in the data#

We will use the `genfromtxt` function, but in this case you could equally use `loadtxt` as there is no missing data.
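To illustrate the `loadtxt` alternative, here is a sketch that reads from an in-memory string standing in for `data/lysis.csv` (the column name `lysis_rate` is assumed for illustration); `skiprows` plays the role of `genfromtxt`'s `skip_header`:

```python
import io
import numpy as np

# In-memory stand-in for data/lysis.csv: a one-line header, then one
# proportion per month. skiprows=1 skips the header row.
csv_text = "lysis_rate\n0.0189\n0.0303\n0.0189\n"
data = np.loadtxt(io.StringIO(csv_text), skiprows=1)
print(data.shape)  # (3,)
```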

```
lysis = np.genfromtxt('data/lysis.csv', skip_header=1)
print(lysis.shape)
# peek at first five data points.
print(lysis[:5])
```

```
(54,)
[0.01886793 0.03030303 0.01886793 0.01886793 0.06060606]
```

## Setup variables#

The intervention takes place in month 42. So we’ll include a dummy variable that contains 0 before this and 1 afterwards. The coefficient of the variable is the effect size of the intervention.

By default `OLS` does not include an intercept in the model. To add an intercept, the `OLS` class requires a constant term (a variable with no variation, where all values are 1). We can use the `add_constant` function to add the constant.

```
# create intervention variables
intervention_month = 42
intervention = np.zeros(lysis.shape[0], dtype=int)
intervention[intervention_month:] = 1
# add a constant term - exog is a matrix.
exog = sm.add_constant(intervention)
exog.shape
```

```
(54, 2)
```

```
# peek at both columns and 3 rows
exog[:3, :]
```

```
array([[1., 0.],
[1., 0.],
[1., 0.]])
```

## Regression code#

The next code block illustrates how to run the model. `statsmodels` refers to the X and y variables as exogenous and endogenous, respectively.

In the output, take a look at the coefficients and confidence intervals for `const` (the intercept) and `x1` (the intervention variable). These can be interpreted as the mean treatment rate before the intervention and the mean increase in treatment rate due to the intervention, respectively.

```
# create an instance of the OLS class
model = sm.OLS(endog=lysis, exog=exog)
print(model)
```

```
<statsmodels.regression.linear_model.OLS object at 0x7fa39c8f27f0>
```

```
# fit the model
results = model.fit()
# show the results summary
print(results.summary())
```

```
                            OLS Regression Results
==============================================================================
Dep. Variable:                      y   R-squared:                       0.538
Model:                            OLS   Adj. R-squared:                  0.529
Method:                 Least Squares   F-statistic:                     60.63
Date:                Mon, 16 Oct 2023   Prob (F-statistic):           2.77e-10
Time:                        14:47:23   Log-Likelihood:                 106.92
No. Observations:                  54   AIC:                            -209.8
Df Residuals:                      52   BIC:                            -205.9
Df Model:                           1
Covariance Type:            nonrobust
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
const          0.0420      0.005      7.998      0.000       0.031       0.053
x1             0.0868      0.011      7.786      0.000       0.064       0.109
==============================================================================
Omnibus:                       12.833   Durbin-Watson:                   2.146
Prob(Omnibus):                  0.002   Jarque-Bera (JB):               22.219
Skew:                           0.678   Prob(JB):                     1.50e-05
Kurtosis:                       5.835   Cond. No.                         2.55
==============================================================================

Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
```

## Check regression results#

```
mean_before = lysis[:intervention_month].mean()
mean_after = lysis[intervention_month:].mean()
diff = mean_after - mean_before
print(f'Before: {mean_before:.4f}')
print(f'After: {mean_after:.4f}')
print(f'Diff: {diff:.4f}')
```

```
Before: 0.0420
After: 0.1288
Diff: 0.0868
```