import numpy as np
NumPy is a Python library that supplies two major features: better support for large multi-dimensional arrays and high-level mathematical functions (most of which operate on NumPy arrays). We’re going to focus mostly on how to use arrays in NumPy.
How are arrays different from what we’ve seen so far?
Python doesn’t include an array type. Instead, we have to use lists, lists-of-lists, lists-of-lists-of-lists, etc. Although these can be used to represent multi-dimensional arrays, they can get pretty messy. For example, here's a matrix as a list-of-lists:
m1 = [[ 0, 1, 4], [ 9, 16, 25], [ 36, 49, 64], [ 81, 100, 121]]
Let's say I want to create a new matrix, which is just m1
with all its numbers multiplied by 2. Maybe this will work?
print(m1*2)
That just concatenated the list with itself! If I want to actually multiply each element by 2 (in place) I have to do this:
m2 = []
for i in range(len(m1)):
l = []
for j in range(len(m1[0])):
l.append(m1[i][j]*2)
m2.append(l)
print(m2)
Or, if I want to be a bit fancier:
m2 = [ [x*2 for x in row] for row in m1]
print(m2)
However, with NumPy arrays, this is much simpler:
ma1 = np.array(m1)
print(ma1*2)
Other operations are also much simpler:
print(ma1-1)
print(ma1**2)
print(np.sin(ma1))
Basically, NumPy allows us to work with arrays more "naturally"
However, NumPy arrays do have some restrictions that lists don’t: all elements must be of the same type and, once created, an array cannot grow in size.
NumPy’s implementation of arrays (and the operations on them) are very efficient (almost as good as writing them in C).
For example, let's create an array with 10,000,000 integers and a list with 10,000,000 integers.
X = np.arange(10000000)
L = range(10000000)
If we add up all the integers in each, the array is an order of magnitude faster:
timeit X.sum()
timeit sum(L)
As we saw earlier, we can create an array simply calling array()
with a list or list-of-lists as a parameter. We can use as many levels of list nesting as dimensions in the array.
# One-dimensional array
a1 = np.array([1,2,3])
print(a1)
# Two-dimensional array
a2 = np.array([ [1,2,3] , [4,5,6] ])
print(a2)
# Three-dimensional array
a3 = np.array([ [ [1,2,3] , [4,5,6] ] , [ [10,20,30] , [40,50,60] ] ])
print(a3)
Accessing individual elements is the done the same way as with lists
print(a1[2])
print(a2[1][2])
print(a3[1][1][2])
We can also create ranges of numbers. Numpy provides arange()
and linspace()
, which are more versatile than Python’s range().
arange
creates ranges of numbers, including floats
print(np.arange(5))
print(np.arange(1, 3, 0.4))
linspace
is similar, but we specify the amount of numbers in the range, instead of the interval
print(np.linspace(0,100,7) )
We can also take a range of numbers and create a multidimensional array of a given shape
print(np.arange(12).reshape(3,4))
Indexing arrays is similar to indexing lists. We just use the brackets operator. However, indexing multidimensional arrays is a bit simpler (no need for multiple brackets, just use a tuple). Additionally, we can specify slices in each dimension.
Let's take this array with 12 numbers
a = np.array([ 0, 1, 4, 9, 16, 25, 36, 49, 64, 81, 100, 121])
Indexing and slicing is the same as with lists
print(a[4])
print(a[1:7])
print(a[3:10:2])
Indexing/slicing gets interesting when we have multi-dimensional arrays
b = np.array([[ 0, 1, 4, 9],
[ 16, 25, 36, 49],
[ 64, 81, 100, 121],
[144, 169, 196, 225],
[256, 289, 324, 361],
[400, 441, 484, 529]])
Instead of multiple brackets [x][y][z] we can just use a tuple
print(b[2,3])
print(b[1,1])
We can use slicing with rows...
print(b[1:4])
Or with as many dimensions as we want:
print(b[1:4,0:2])
print(b[1:4,1])
Note how this is actually the "second column of rows 1,2,3"
As we saw in the first example, we can apply operators like + to arrays. When the operands are an array and a number, the operator (with the number) is applied to every element of the array. Note that this also works with relational operators like >, <, ==, etc.
We can use arithmetic operators...
print(a1 + 2)
print(a1 * 2)
...and relational operators.
print(b>100)
We also use these operators on two arrays:
c = np.array([10,20,30])
print(a1 + c)
What do you think is the result of using the *
operator?
d = np.array([[1, 2], [3, 4]])
e = np.array([[1, 0], [0, 1]])
print(d)
print(e)
print(d*e)
It's element-wise multiplication, not matrix multiplication!
If we want to do a matrix product, you need to use the dot() method
print(np.dot(d,e))
We can also apply NumPy math functions (again: elementwise) on the array
f = np.array([[1,-1],[np.pi,-np.pi]])
print(f)
print(np.cos(f))
NumPy provides fancier ways of indexing that are more powerful than those for regular Python lists.
We can specify the indexes we want with a list
print( a[ [1,3,6] ] )
Do the same but, if we provide a multidimensional array, the values at those indices are returned in an array of that same size.
print( a[ np.array([ [1,3] , [10, 7] ]) ] )
Specify an array of booleans. This returns a flattened array (1-dimensional) with all the values in the array where the specified boolean array had a True value.
c = np.array([100,200,300])
print( c[np.array([True, False, True])] )
This is mostly useful in combination with relational operators
print( b[b>100] )
When we use this with assignments, it acts as a filter for what indices the assignment should apply to.
Set all elements greater than 100 to 0
b[b>100] = 0
print(b)
b[b%2==1] = -1
print(b)
Reduction methods allow us to “reduce” an array to a single value. For example, the mean of all the values, the stdev of all the values, etc. We can also do this only for specific dimensions (e.g., the mean of each row, of each column, etc.)
# Let's reset b...
b = (np.arange(24)**2).reshape(6,4)
print(b)
Mean of all elements of b
b.mean()
Mean of the 0-th dimension (or "axis 0" in NumPy lingo) i.e. for each column, add up all the values and divide by the number of rows
print(b.mean(0))
Mean of the 1-th dimension (or "axis 1" in NumPy lingo) i.e. for each row, add up all the values and divide by the number of columns
print(b.mean(1))
Standard deviation
b.std()
If we put all of the above together, we can do some pretty elaborate computations with arrays/matrices in just a few lines
# Normalize to mean 0, stdev 1
b2 = b - b.mean()
z = b2 / b2.std()
print(z)
All values one standard deviations above the mean:
print( b[b>(b.mean() + b.std())] )
Or:
print( b[b2/b2.std() >= 1] )
Filter out all outliers (more than one stdev away from the mean)
print( b[abs(b2/b2.std()) < 1] )
NumPy also includes a linear algebra library.
import numpy.linalg as la
X = np.array([ [1,3] , [2, 8] ])
print(X)
Computing the determinant of a matrix:
# |X| = (1*8) - (3*2) = 2
la.det(X)
Computing the inverse of a matrix:
iX = la.inv(X)
print(iX)
print(np.dot(X,iX))
Solving a system of equations:
\begin{align} 1\cdot x_0 + 3\cdot x_1 & = 11 \\ 2\cdot x_0 + 8\cdot x_1 & = 28 \end{align}\begin{equation*} \mathbf{X} = \left( \begin{matrix} 1 & 3 \\ 2 & 8\end{matrix} \right) \;\;\;\; \mathbf{Y} = \left( \begin{matrix} 11 \\ 28\end{matrix} \right) \end{equation*}The solution should be $x_0=2, x_1=3$
Y = np.array([ [11], [28] ])
print(Y)
We can solve the system with matrix operations:
print(np.dot(iX,Y))
Or using linalg.solve
:
print(la.solve(X,Y))
In this case, they both perform pretty much the same, but linalg.solve
is preferable since it can detect when a system of linear equations can be solved more efficiently that by doing the above matrix operation.
Numpy has a matrix type, which has the advantage that some of the operators "make more sense" with matrices. e.g., * will do matrix multiplication, not element-wise multiplication
d = np.matrix([[1, 2], [3, 4]])
e = np.matrix([[1, 0], [0, 1]])
print(d*e)
However, most NumPy developers recommend using the more general array type (anything you can do with a matrix, you can do with an array; e.g., matrix multiplication is just the dot() method)