Introduction¶

import numpy as np

NumPy is a Python library that supplies two major features: better support for large multi-dimensional arrays and high-level mathematical functions (most of which operate on NumPy arrays). We’re going to focus mostly on how to use arrays in NumPy.

How are arrays different from what we’ve seen so far?

Native array type¶

Python doesn’t include an array type. Instead, we have to use lists, lists-of-lists, lists-of-lists-of-lists, etc. Although these can be used to represent multi-dimensional arrays, they can get pretty messy. For example, here's a matrix as a list-of-lists:

m1 = [[  0,   1,   4], [  9,  16,  25], [ 36,  49,  64], [ 81, 100, 121]]

Let's say I want to create a new matrix, which is just m1 with all its numbers multiplied by 2. Maybe this will work?

print(m1*2)

[[0, 1, 4], [9, 16, 25], [36, 49, 64], [81, 100, 121], [0, 1, 4], [9, 16, 25], [36, 49, 64], [81, 100, 121]]

That just concatenated the list with itself! If I want to actually multiply each element by 2 (in place) I have to do this:

m2 = []
for i in range(len(m1)):
    l = []
    for j in range(len(m1[0])):
        l.append(m1[i][j]*2)
    m2.append(l)

print(m2)

[[0, 2, 8], [18, 32, 50], [72, 98, 128], [162, 200, 242]]

Or, if I want to be a bit fancier:

m2 = [ [x*2 for x in row] for row in m1]

print(m2)

[[0, 2, 8], [18, 32, 50], [72, 98, 128], [162, 200, 242]]

However, with NumPy arrays, this is much simpler:

ma1 = np.array(m1)
print(ma1*2)

[[  0   2   8]
 [ 18  32  50]
 [ 72  98 128]
 [162 200 242]]

Other operations are also much simpler:

print(ma1-1)

[[ -1   0   3]
 [  8  15  24]
 [ 35  48  63]
 [ 80  99 120]]

print(ma1**2)

[[    0     1    16]
 [   81   256   625]
 [ 1296  2401  4096]
 [ 6561 10000 14641]]

print(np.sin(ma1))

[[ 0.          0.84147098 -0.7568025 ]
 [ 0.41211849 -0.28790332 -0.13235175]
 [-0.99177885 -0.95375265  0.92002604]
 [-0.62988799 -0.50636564  0.99881522]]

Basically, NumPy allows us to work with arrays more "naturally"

However, NumPy arrays do have some restrictions that lists don’t: all elements must be of the same type and, once created, an array cannot grow in size.

Better Performance¶

NumPy’s implementation of arrays (and the operations on them) are very efficient (almost as good as writing them in C).

For example, let's create an array with 10,000,000 integers and a list with 10,000,000 integers.

X = np.arange(10000000)
L = range(10000000)

If we add up all the integers in each, the array is an order of magnitude faster:

timeit X.sum()

100 loops, best of 3: 5.6 ms per loop

timeit sum(L)

10 loops, best of 3: 84.6 ms per loop

Creating and accessing arrays¶

As we saw earlier, we can create an array simply calling array() with a list or list-of-lists as a parameter. We can use as many levels of list nesting as dimensions in the array.

# One-dimensional array
a1 = np.array([1,2,3])
print(a1)

[1 2 3]

# Two-dimensional array
a2 = np.array([ [1,2,3] , [4,5,6] ])
print(a2)

[[1 2 3]
 [4 5 6]]

# Three-dimensional array
a3 = np.array([ [ [1,2,3] , [4,5,6] ] , [ [10,20,30] , [40,50,60] ] ])
print(a3)

[[[ 1  2  3]
  [ 4  5  6]]

 [[10 20 30]
  [40 50 60]]]

Accessing individual elements is the done the same way as with lists

print(a1[2])

3

print(a2[1][2])

6

print(a3[1][1][2])

60

We can also create ranges of numbers. Numpy provides arange() and linspace(), which are more versatile than Python’s range().

arange creates ranges of numbers, including floats

print(np.arange(5))

[0 1 2 3 4]

print(np.arange(1, 3, 0.4))

[ 1.   1.4  1.8  2.2  2.6]

linspace is similar, but we specify the amount of numbers in the range, instead of the interval

print(np.linspace(0,100,7) )

[   0.           16.66666667   33.33333333   50.           66.66666667
   83.33333333  100.        ]

We can also take a range of numbers and create a multidimensional array of a given shape

print(np.arange(12).reshape(3,4))

[[ 0  1  2  3]
 [ 4  5  6  7]
 [ 8  9 10 11]]

Basic indexing/slicing¶

Indexing arrays is similar to indexing lists. We just use the brackets operator. However, indexing multidimensional arrays is a bit simpler (no need for multiple brackets, just use a tuple). Additionally, we can specify slices in each dimension.

Let's take this array with 12 numbers

a = np.array([  0,   1,   4,   9,  16,  25,  36,  49,  64,  81, 100, 121])

Indexing and slicing is the same as with lists

print(a[4])

16

print(a[1:7])

[ 1  4  9 16 25 36]

print(a[3:10:2])

[ 9 25 49 81]

Indexing/slicing gets interesting when we have multi-dimensional arrays

b = np.array([[  0,   1,   4,   9],
              [ 16,  25,  36,  49],
              [ 64,  81, 100, 121],
              [144, 169, 196, 225],
              [256, 289, 324, 361],
              [400, 441, 484, 529]])

Instead of multiple brackets [x][y][z] we can just use a tuple

print(b[2,3])

121

print(b[1,1])

25

We can use slicing with rows...

print(b[1:4])

[[ 16  25  36  49]
 [ 64  81 100 121]
 [144 169 196 225]]

Or with as many dimensions as we want:

print(b[1:4,0:2])

[[ 16  25]
 [ 64  81]
 [144 169]]

print(b[1:4,1])

[ 25  81 169]

Note how this is actually the "second column of rows 1,2,3"

Operators¶

As we saw in the first example, we can apply operators like + to arrays. When the operands are an array and a number, the operator (with the number) is applied to every element of the array. Note that this also works with relational operators like >, <, ==, etc.

We can use arithmetic operators...

print(a1 + 2)

[3 4 5]

print(a1 * 2)

[2 4 6]

...and relational operators.

print(b>100)

[[False False False False]
 [False False False False]
 [False False False  True]
 [ True  True  True  True]
 [ True  True  True  True]
 [ True  True  True  True]]

We also use these operators on two arrays:

c = np.array([10,20,30])

print(a1 + c)

[11 22 33]

What do you think is the result of using the * operator?

d = np.array([[1, 2], [3, 4]])
e = np.array([[1, 0], [0, 1]])

print(d)

[[1 2]
 [3 4]]

print(e)

[[1 0]
 [0 1]]

print(d*e)

[[1 0]
 [0 4]]

It's element-wise multiplication, not matrix multiplication!

If we want to do a matrix product, you need to use the dot() method

print(np.dot(d,e))

[[1 2]
 [3 4]]

We can also apply NumPy math functions (again: elementwise) on the array

f = np.array([[1,-1],[np.pi,-np.pi]])
print(f)

[[ 1.         -1.        ]
 [ 3.14159265 -3.14159265]]

print(np.cos(f))

[[ 0.54030231  0.54030231]
 [-1.         -1.        ]]

Fancy Indexing¶

NumPy provides fancier ways of indexing that are more powerful than those for regular Python lists.

We can specify the indexes we want with a list

print( a[ [1,3,6] ] )

[ 1  9 36]

Do the same but, if we provide a multidimensional array, the values at those indices are returned in an array of that same size.

print( a[ np.array([ [1,3] , [10, 7] ]) ] )

[[  1   9]
 [100  49]]

Specify an array of booleans. This returns a flattened array (1-dimensional) with all the values in the array where the specified boolean array had a True value.

c = np.array([100,200,300])
print( c[np.array([True, False, True])] )

[100 300]

This is mostly useful in combination with relational operators

print( b[b>100] )

[121 144 169 196 225 256 289 324 361 400 441 484 529]

When we use this with assignments, it acts as a filter for what indices the assignment should apply to.

Set all elements greater than 100 to 0

b[b>100] = 0
print(b)

[[  0   1   4   9]
 [ 16  25  36  49]
 [ 64  81 100   0]
 [  0   0   0   0]
 [  0   0   0   0]
 [  0   0   0   0]]

b[b%2==1] = -1
print(b)

[[  0  -1   4  -1]
 [ 16  -1  36  -1]
 [ 64  -1 100   0]
 [  0   0   0   0]
 [  0   0   0   0]
 [  0   0   0   0]]

Reduction Methods¶

Reduction methods allow us to “reduce” an array to a single value. For example, the mean of all the values, the stdev of all the values, etc. We can also do this only for specific dimensions (e.g., the mean of each row, of each column, etc.)

# Let's reset b...
b = (np.arange(24)**2).reshape(6,4)
print(b)

[[  0   1   4   9]
 [ 16  25  36  49]
 [ 64  81 100 121]
 [144 169 196 225]
 [256 289 324 361]
 [400 441 484 529]]

Mean of all elements of b

b.mean()

180.16666666666666

Mean of the 0-th dimension (or "axis 0" in NumPy lingo) i.e. for each column, add up all the values and divide by the number of rows

print(b.mean(0))

[ 146.66666667  167.66666667  190.66666667  215.66666667]

Mean of the 1-th dimension (or "axis 1" in NumPy lingo) i.e. for each row, add up all the values and divide by the number of columns

print(b.mean(1))

[   3.5   31.5   91.5  183.5  307.5  463.5]

Standard deviation

b.std()

164.84883648023995

Bringing it all together¶

If we put all of the above together, we can do some pretty elaborate computations with arrays/matrices in just a few lines

# Normalize to mean 0, stdev 1
b2 = b - b.mean()
z = b2 / b2.std()

print(z)

[[-1.09292046 -1.0868543  -1.06865581 -1.03832499]
 [-0.99586185 -0.94126637 -0.87453858 -0.79567845]
 [-0.704686   -0.60156122 -0.48630411 -0.35891468]
 [-0.21939291 -0.06773883  0.09604759  0.27196633]
 [ 0.4600174   0.6602008   0.87251652  1.09696457]
 [ 1.33354495  1.58225765  1.84310269  2.11608005]]

All values one standard deviations above the mean:

print( b[b>(b.mean() + b.std())] )

[361 400 441 484 529]

Or:

print( b[b2/b2.std() >= 1] )

[361 400 441 484 529]

Filter out all outliers (more than one stdev away from the mean)

print( b[abs(b2/b2.std()) < 1] )

[ 16  25  36  49  64  81 100 121 144 169 196 225 256 289 324]

Linear Algebra¶

NumPy also includes a linear algebra library.

import numpy.linalg as la

X = np.array([ [1,3] , [2, 8] ])
print(X)

[[1 3]
 [2 8]]

Computing the determinant of a matrix:

# |X| = (1*8) - (3*2) = 2
la.det(X)

2.0

Computing the inverse of a matrix:

iX = la.inv(X)
print(iX)

[[ 4.  -1.5]
 [-1.   0.5]]

print(np.dot(X,iX))

[[ 1.  0.]
 [ 0.  1.]]

Solving a system of equations:

\begin{align} 1\cdot x_0 + 3\cdot x_1 & = 11 \\ 2\cdot x_0 + 8\cdot x_1 & = 28 \end{align}\begin{equation*} \mathbf{X} = \left( \begin{matrix} 1 & 3 \\ 2 & 8\end{matrix} \right) \;\;\;\; \mathbf{Y} = \left( \begin{matrix} 11 \\ 28\end{matrix} \right) \end{equation*}

The solution should be $x_0=2, x_1=3$

Y = np.array([ [11], [28] ])
print(Y)

[[11]
 [28]]

We can solve the system with matrix operations:

print(np.dot(iX,Y))

[[ 2.]
 [ 3.]]

Or using linalg.solve:

print(la.solve(X,Y))

[[ 2.]
 [ 3.]]

In this case, they both perform pretty much the same, but linalg.solve is preferable since it can detect when a system of linear equations can be solved more efficiently that by doing the above matrix operation.

What about matrices?¶

Numpy has a matrix type, which has the advantage that some of the operators "make more sense" with matrices. e.g., * will do matrix multiplication, not element-wise multiplication

d = np.matrix([[1, 2], [3, 4]])
e = np.matrix([[1, 0], [0, 1]])

print(d*e)

[[1 2]
 [3 4]]

However, most NumPy developers recommend using the more general array type (anything you can do with a matrix, you can do with an array; e.g., matrix multiplication is just the dot() method)