# Numpy

Numpy is a Python library that supports multi-dimensional arrays.

You need to import Numpy before you can use it. It is traditional to give the library a shorter name using the import-as mechanism:

```
import numpy as np
```

Once this import is done, you can use functions from the `numpy` library using `np` as the qualifier.

## Getting started

To get started, open up a terminal and navigate (`cd`) to your
`cmsc12100-aut-17-username` directory. Run `git pull upstream master`
to collect the lab materials and `git pull` to sync with your personal
repository. The `lab6` directory contains a file named `lab6.py`.

This file includes a function, `read_file`, that takes the name of a
CSV file as an argument and returns a list of the column names and a
two-dimensional array of data. The file also includes a call to this
function that loads the training data from the city dataset for PA #5.
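To make the shape of that return value concrete, here is a minimal sketch of a reader with the same interface. It is not the code in `lab6.py`: the name `read_file_sketch`, the toy CSV contents, and the assumption that every data column is numeric are all ours, for illustration only.

```python
import csv
import os
import tempfile

import numpy as np


def read_file_sketch(filename):
    """Hypothetical stand-in for read_file: returns (column names, 2D array)."""
    with open(filename, newline="") as f:
        reader = csv.reader(f)
        col_names = next(reader)  # the first row holds the column names
        # Assumption: every remaining value can be converted to a float.
        rows = [[float(x) for x in row] for row in reader]
    return col_names, np.array(rows)


# Write a tiny made-up CSV file so the sketch can be run on its own.
with tempfile.NamedTemporaryFile("w", suffix=".csv", delete=False, newline="") as tmp:
    tmp.write("A,B\n1,2\n3,4\n")
    path = tmp.name

col_names, data = read_file_sketch(path)
os.remove(path)

print(col_names)    # ['A', 'B']
print(data.shape)   # (2, 2)
```

Each row of the file becomes one row of the array, and each column name lines up with one column index.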

Fire up `ipython3` and run `lab6.py` to get started. This run will
print some output, which you can ignore for now.

## One-dimensional arrays in numpy

We’ll start by looking at one-dimensional arrays in Numpy. Unlike
Python lists, all of the values in a Numpy array must have the same
type. We can create a one-dimensional numpy array from a list using
the function `np.array`. For example,

```
In [10]: a1 = np.array([10, 20, 30, 40])
In [11]: a1
Out[11]: array([10, 20, 30, 40])
```
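To see the same-type rule in action, the short sketch below inspects the `dtype` attribute, which records the single element type an array holds. The variable names are illustrative. Putting one float in the input list makes Numpy store every element as a float.

```python
import numpy as np

ints = np.array([10, 20, 30, 40])
mixed = np.array([10, 20.5, 30])   # one float among the integers

# Every element of an array shares one type, recorded in dtype.
print(np.issubdtype(ints.dtype, np.integer))   # True
print(mixed.dtype)                             # float64
print(mixed)                                   # every value is now a float
```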

We can compute the length and shape of this array as follows:

```
In [12]: len(a1)
Out[12]: 4
In [13]: a1.shape
Out[13]: (4,)
```

And we can access or update the ith element of the array using the `[]` notation:

```
In [14]: a1[0]
Out[14]: 10
In [15]: a1[2]
Out[15]: 30
In [17]: a1[2] = 50
In [18]: a1
Out[18]: array([10, 20, 50, 40])
```

Operations on numpy arrays are element-wise. For example, the expression:

```
In [23]: a1*2
Out[23]: array([ 20, 40, 100, 80])
```

yields a new numpy array where the ith element of the result is equal
to the ith element of `a1` times 2. Note that a given operator
(e.g., `*`) can have a different meaning depending on the data type to
which it is applied. For example, try making `a1` a list, rather than
a Numpy array, and repeat the same operation.

Similarly,

```
In [25]: a1
Out[25]: array([10, 20, 50, 40])
In [26]: a2 = np.array([100, 200, 300, 400])
In [27]: a1+a2
Out[27]: array([110, 220, 350, 440])
```

yields a new array where the ith element is the sum of the ith
elements of `a1` and `a2`. Again, the plus operator has a very
different meaning for lists. Try applying the `+` operator to two
lists to compare what happens.
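If you want to check your experiments, the sketch below runs `*` and `+` on a list and on a Numpy array holding the same values. The names `values` and `arr` are ours.

```python
import numpy as np

values = [10, 20, 50, 40]
arr = np.array(values)

# On a list, * repeats the list and + concatenates.
print(values * 2)        # [10, 20, 50, 40, 10, 20, 50, 40]
print(values + values)   # [10, 20, 50, 40, 10, 20, 50, 40]

# On a Numpy array, both operators work element by element.
print(arr * 2)           # [ 20  40 100  80]
print(arr + arr)         # [ 20  40 100  80]
```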

Numpy also provides useful methods for operating on arrays, such as
`sum` and `mean`:

```
In [28]: a1.sum()
Out[28]: 120
In [29]: a1.mean()
Out[29]: 30.0
```

which add up the values in the array and compute its mean, respectively. These operations can also be written using notation that looks more like a function call:

```
In [32]: np.mean(a1)
Out[32]: 30.0
In [33]: np.sum(a1)
Out[33]: 120
```

**Task 1:** Write a function:

```
def var(y):
```

that computes the variance of `y`, where `y` is a numpy array. We
will define variance to be:

\[ \frac{1}{N} \sum_{i=1}^{N} (y_i - \bar y)^2 \]

where \(N\) is the number of elements in *y* and \(\bar y\) denotes
the mean of all *y*’s. Your solution should **not** include an
explicit loop.

Then run it on `graffiti`, which contains the graffiti column from the
city data set, and on `garbage`, which contains the garbage column
from the city data set. Here’s the output of our implementation:

```
GRAFFITI 409854.475818
GARBAGE 3159.33473311
```
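As an illustration of the loop-free style the task asks for, here is a sketch of a different statistic, the mean absolute deviation. It is not a solution to Task 1, and the function name `mean_abs_dev` is ours. It combines element-wise arithmetic with the `mean` method in place of an explicit loop.

```python
import numpy as np


def mean_abs_dev(y):
    """Average absolute distance of the values in y from their mean."""
    # y - y.mean() subtracts the mean from every element at once,
    # np.abs works element-wise, and .mean() averages the result.
    return np.abs(y - y.mean()).mean()


result = mean_abs_dev(np.array([10, 20, 50, 40]))
print(result)   # 15.0
```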

## Two-dimensional arrays

One-dimensional arrays are useful, but the real power of numpy becomes more apparent when working with data that looks more like a matrix. For example, here’s a matrix represented using a list-of-lists:

```
m = [[0, 1, 4, 9],
     [16, 25, 36, 49],
     [64, 81, 100, 121],
     [144, 169, 196, 225],
     [256, 289, 324, 361],
     [400, 441, 484, 529]]
```

We can convert this data into a two-dimensional array as follows:

```
In [34]: b = np.array(m)
```

where the value of `b` will be:

```
In [34]: b
array([[ 0, 1, 4, 9],
[ 16, 25, 36, 49],
[ 64, 81, 100, 121],
[144, 169, 196, 225],
[256, 289, 324, 361],
[400, 441, 484, 529]])
```

Accessing elements of a 2D numpy array can be done using the same
syntax as a 2D list, that is, the expression `b[i][j]` will yield
the jth element of the ith row of `b`. More conveniently, you can
use a tuple to access the elements of a numpy array. That is, the
expression `b[i, j]` will also yield the jth element of the ith row
of `b`.
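The sketch below checks that the two forms agree. To stay self-contained, it rebuilds the matrix of squares with `np.arange` and `reshape` rather than from the list `m`; that shortcut is our choice, not something the lab requires.

```python
import numpy as np

# Rebuild the same 6x4 matrix of squares used for b above.
b = np.arange(24).reshape(6, 4) ** 2

print(b[2][3])   # 121
print(b[2, 3])   # 121

# b[2] alone is the whole row, which b[2][3] then indexes a second time.
print(b[2])      # [ 64  81 100 121]
```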

Numpy arrays also support slicing and other more advanced forms of
indexing. For example, the expression `b[1:4]` will yield:

```
In [35]: b[1:4]
array([[ 16, 25, 36, 49],
[ 64, 81, 100, 121],
[144, 169, 196, 225]])
```

that is, rows 1, 2, and 3 from `b`. The expression `b[1:4, 2:4]` will
yield columns 2 and 3 from rows 1, 2, and 3 of `b`:

```
In [36]: b[1:4, 2:4]
array([[ 36, 49],
[100, 121],
[196, 225]])
```

As with list slicing, a colon (`:`) can be used to indicate
that you wish to include all the indices in a particular dimension.
For example, `b[:, 2:4]` will yield a slice of `b` with columns 2
and 3 from all the rows. Recall that a slice excludes its endpoint.
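A quick way to confirm what a slice selected is to look at its shape. The sketch below rebuilds `b` so that it runs on its own; the names `cols` and `first_rows` are ours.

```python
import numpy as np

b = np.arange(24).reshape(6, 4) ** 2   # same values as b above

cols = b[:, 2:4]          # all rows, columns 2 and 3 (4 is excluded)
print(cols.shape)         # (6, 2)
print(cols[0])            # [4 9]

first_rows = b[:2, :]     # rows 0 and 1, all columns
print(first_rows.shape)   # (2, 4)
```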

In addition to slicing, you can also specify a list of indices as an
index. For example, the expression `b[:, [1,3]]` will yield
columns 1 and 3 from `b`:

```
In [37]: b[:, [1,3]]
array([[ 1, 9],
[ 25, 49],
[ 81, 121],
[169, 225],
[289, 361],
[441, 529]])
```

One thing to keep in mind with Numpy arrays is that you will lose a dimension if you specify a single column or row as an index. For example, notice that the results of the following two expressions are both one-dimensional arrays:

```
In [38]: b[1,:]
Out[38]: array([16, 25, 36, 49])
In [39]: b[:,1]
Out[39]: array([ 1, 25, 81, 169, 289, 441])
```

If you wish to retain the dimension, you can use list indexing:

```
In [40]: b[:,[1]]
Out[40]:
array([[ 1],
[ 25],
[ 81],
[169],
[289],
[441]])
In [41]: b[[1], :]
Out[41]: array([[16, 25, 36, 49]])
```
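Comparing shapes makes the lost dimension easy to spot. The sketch below rebuilds `b` so that it runs on its own, and it also shows that a slice of length one keeps the dimension, just as list indexing does.

```python
import numpy as np

b = np.arange(24).reshape(6, 4) ** 2   # same values as b above

print(b[:, 1].shape)     # (6,)   a single index drops the dimension
print(b[:, [1]].shape)   # (6, 1) a one-element list keeps it
print(b[:, 1:2].shape)   # (6, 1) so does a slice of length one
```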

**Task 2:** Write expressions to extract the following subarrays of `b`,
which is defined for you in `lab6.py`:

- rows 0, 1, and 2
- rows 0, 1, and 5
- columns 0, 1, and 2
- columns 0, 1, and 3
- columns 0, 1, and 2 from rows 2 and 3.

**Task 3:** We have imported the `linear_regression` function from PA #5
in `lab6.py`. Write code to call `linear_regression` using
columns 2 (RODENTS) and 3 (GARBAGE) as the value for `X` and column
7 (CRIME_TOTALS) as the value for `Y`. This function expects a
two-dimensional numpy array for the value of `X` and a
one-dimensional numpy array for the value of `Y`.

Hint: you can do this task in a single line of code.

The result should be:

```
array([ 66.60834501 0.58072845 16.82863941])
```

**Task 4:** Write code to call `linear_regression` using column 0
(GRAFFITI) as the value for `X` and column 7 (CRIME_TOTALS) as the
value for `Y`.

The result should be:

```
array([ 593.77754996 0.76632378])
```

## Other useful operations

You can find the number of dimensions, shape, and the number of
elements in a numpy array using the `ndim`, `shape`, and `size`
properties, respectively.

```
In [42]: b.ndim
Out[42]: 2
In [43]: b.shape
Out[43]: (6, 4)
In [44]: b.size
Out[44]: 24
```

As noted above, you can compute the mean of the elements using the
`mean` method.

```
In [54]: b.mean()
Out[54]: 180.16666666666666
```

You can also compute the per-column mean and the per-row mean using
the `mean` method by specifying an *axis*: axis 0 averages down the
rows, giving one mean per column, and axis 1 averages across the
columns, giving one mean per row:

```
In [55]: b.mean(0)
Out[55]: array([ 146.66666667, 167.66666667, 190.66666667, 215.66666667])
In [56]: b.mean(1)
Out[56]: array([ 3.5, 31.5, 91.5, 183.5, 307.5, 463.5])
```
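The axis convention is easier to verify on an array small enough to average by hand. In the sketch below, the array `small` is our own example, and the `axis=` keyword form is equivalent to passing the axis positionally as above.

```python
import numpy as np

small = np.array([[1, 2, 3],
                  [4, 5, 6]])

# axis=0 collapses the rows: one result per column.
print(small.mean(axis=0))   # [2.5 3.5 4.5]

# axis=1 collapses the columns: one result per row.
print(small.mean(axis=1))   # [2. 5.]

# Other reductions, such as sum, accept the same argument.
print(small.sum(axis=0))    # [5 7 9]
```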

## When Finished

When finished with the lab, please check in your work (assuming you are inside the lab directory):

```
git add lab6.py
git commit -m "Finished with lab6"
git push
```

No, we’re not grading this; we just want to look for common errors.