Plotting

The objective of this lab is to give you practice plotting data and, more specifically, working with the Matplotlib library.

Getting started

Do a git pull upstream master in your repository to pick up the files you need for this lab. Once you have collected the lab materials, navigate to the lab4 directory.

If you are on a VM, make sure that Matplotlib is installed on the VM:

sudo apt-get update
sudo apt-get install python3-matplotlib

Matplotlib

Matplotlib is a popular plotting library for Python. It supports a variety of plots that can be tweaked and customized in many different ways. So, while producing simple plots with Matplotlib is very easy, getting all the details right (and figuring out the exact Matplotlib code to do so) can sometimes be challenging.

Thus, when working with Matplotlib, it is common to follow two steps:

  1. Start by experimenting with Matplotlib interactively from a Python interpreter. When doing this, each call to a Matplotlib function will usually alter a plot interactively, which is very convenient when figuring out the exact Matplotlib code for our program.
  2. Once we have figured out the code to produce our plot, we save it to a Python program which, when run, produces the full plot in one go (either displaying it in a window or saving it to a file).

In this lab, we will first go through these two steps in detail with a simple example. Then, we will show you two plots which you should produce following the same methodology.

Plotting interactively with IPython

Matplotlib can be used interactively from any Python interpreter, but IPython in particular has a “pylab” mode that pre-loads all the Matplotlib functions, allowing us to easily use them from the IPython interpreter. To start the interpreter in this mode, run the following:

ipython3 --pylab

We have provided a plotting.py file which includes the data that we will plot in this part of the lab. Run the following to import this data into the interpreter:

from plotting import TEMPS_MIN, TEMPS_AVG, TEMPS_MAX

Each of these variables is a list of 31 floating-point numbers, representing temperatures in Chicago on each day of January 2014. TEMPS_MIN contains the minimum temperatures, TEMPS_AVG the average temperatures, and TEMPS_MAX the maximum temperatures. The first element of each list is the temperature for January 1st, the second element corresponds to January 2nd, etc.

We can plot the average temperatures just by running this:

plot(TEMPS_AVG)

This should open up a Matplotlib window (titled “Figure 1”) with a graph that looks roughly like this:

../../_images/simple1.png

Don’t worry if your graphs don’t look exactly like the ones on this page; they only need to look roughly the same.

Before continuing, close the Matplotlib window (i.e., the window with the graph; never close the window that is running IPython).

Now, let’s try plotting multiple lines. Start by running just this:

plot(TEMPS_AVG)

The same graph as before should appear. If possible, move the Matplotlib window in such a way that you can see both the graph and the IPython interpreter. Now, without closing the Matplotlib window run the following:

plot(TEMPS_MIN)
plot(TEMPS_MAX)

Two additional lines should appear in the Matplotlib window. The result should look like this:

../../_images/simple2.png

As you can see, given a list of values, we can very easily create a line plot just by calling the plot() function. However, the resulting graph is very basic: it has no title, no legend, no axis labels, etc.
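One detail worth knowing about plot(): when it is given a single list, Matplotlib uses the list indices (0, 1, 2, ...) as the x values, so January 1st ends up at x = 0. The sketch below is a minimal illustration with a few made-up temperatures (sample_temps is hypothetical data, not the contents of TEMPS_AVG). It uses the plt. prefix and the non-interactive "Agg" backend so that it can run as a standalone script; in the pylab interpreter you would call plot() directly.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend, so no window is needed
import matplotlib.pyplot as plt

# Hypothetical sample values standing in for a few days of temperatures.
sample_temps = [21.5, 10.0, 3.5, 15.0, 18.5]

fig = plt.figure()

# With a single list, the x values default to the indices 0, 1, 2, ...
(by_index,) = plt.plot(sample_temps)

# Passing x values explicitly makes the x axis use day numbers starting at 1.
days = list(range(1, len(sample_temps) + 1))
(by_day,) = plt.plot(days, sample_temps)

# Each Line2D object remembers the x values it was drawn with.
index_x = list(by_index.get_xdata())
day_x = list(by_day.get_xdata())

plt.close(fig)
```

Whether you want day numbers starting at 0 or at 1 is a presentation choice; for this lab, the default indices are good enough to match the reference graphs roughly.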

Let’s produce a more complete version of this graph. Close the Matplotlib window and run the following on the IPython interpreter. The first call to plot() will open a Matplotlib window. Notice how every call after that (not just the other two plot() calls) modifies the plot interactively:

plot(TEMPS_MAX, color="orange", label="Max Temp")
plot(TEMPS_AVG, color="green", label="Avg Temp")
plot(TEMPS_MIN, color="blue", label="Min Temp")

title("Temperatures in Chicago from 1/1/14 to 1/31/14")
xlabel("Day")
ylabel("Temperature (F)")

axhline(32, color="gray", linestyle="--")

legend()

The resulting graph should look something like this:

../../_images/simple3.png
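The legend() call builds its entries from the label arguments passed to earlier plotting calls. The following sketch, which uses made-up data (sample_max and sample_min are hypothetical) and the plt. prefix so it can run outside the pylab interpreter, suggests why the dashed line drawn by axhline() does not show up in the legend: it was given no label.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend, so no window is needed
import matplotlib.pyplot as plt

# Hypothetical sample data, not the real January temperatures.
sample_max = [30.0, 25.0, 12.0]
sample_min = [15.0, 8.0, -2.0]

fig = plt.figure()
plt.plot(sample_max, color="orange", label="Max Temp")
plt.plot(sample_min, color="blue", label="Min Temp")

# No label is given here, so this line gets no legend entry.
plt.axhline(32, color="gray", linestyle="--")

# Ask the current axes which labeled artists a legend would list.
ax = plt.gca()
handles, labels = ax.get_legend_handles_labels()

plt.legend()
plt.close(fig)
```

If you wanted the freezing line to appear in the legend as well, passing a label argument to axhline() (for example label="Freezing") should be enough.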

Writing plotting code in a Python program

Now, let’s see how the plotting code we wrote works when we include it in a Python program. Edit the plotting.py file to include this function:

def simple_plot():
    plt.plot(TEMPS_MAX, color="orange", label="Max Temp")
    plt.plot(TEMPS_AVG, color="green", label="Avg Temp")
    plt.plot(TEMPS_MIN, color="blue", label="Min Temp")

    plt.title("Temperatures in Chicago from 1/1/14 to 1/31/14")
    plt.xlabel("Day")
    plt.ylabel("Temperature (F)")

    plt.axhline(32, color="gray", linestyle="--")

    plt.legend()

Notice how the calls to the Matplotlib functions are now prefixed with plt. In a Python program, the Matplotlib functions are not pre-loaded the way they are in IPython’s pylab mode, so plotting.py needs to import them explicitly like this:

import matplotlib.pyplot as plt

Now, exit the IPython interpreter, and run ipython3 without the --pylab option:

ipython3

Now, run the plotting.py file:

run plotting.py

And call the simple_plot() function we just added:

simple_plot()

At this point, nothing should happen. The reason for this is that, when we’re not running code interactively, we need to explicitly tell Matplotlib to show us the plot. We can do so like this:

plt.show()

Notice that, when running Matplotlib non-interactively, your code will block whenever you call plt.show(). I.e., you need to close the Matplotlib window for your program to continue running (or, in this case, to return to the IPython interpreter).
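The difference between the two behaviors comes down to Matplotlib's interactive mode, which (as far as we can tell) the --pylab option switches on for you. The sketch below is only an illustration of how to query and toggle that mode with plt.isinteractive(), plt.ion(), and plt.ioff(); it uses the "Agg" backend so it can run without a display, which also means it never opens a window.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend, so no window is needed
import matplotlib.pyplot as plt

# Turn interactive mode off: plots are only displayed by plt.show().
plt.ioff()
interactive_off = plt.isinteractive()

# Turn interactive mode on: with a GUI backend, each plotting call
# would update the figure window right away.
plt.ion()
interactive_on = plt.isinteractive()

# Leave the mode off again, which is the usual setting for scripts.
plt.ioff()
```

In a regular Python program you rarely need these calls; it is usually simpler to build the whole plot and call plt.show() once at the end.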

Instead of calling show() from IPython, let’s add it to our simple_plot() function:

def simple_plot():
    plt.plot(TEMPS_MAX, color="orange", label="Max Temp")
    plt.plot(TEMPS_AVG, color="green", label="Avg Temp")
    plt.plot(TEMPS_MIN, color="blue", label="Min Temp")

    plt.title("Temperatures in Chicago from 1/1/14 to 1/31/14")
    plt.xlabel("Day")
    plt.ylabel("Temperature (F)")

    plt.axhline(32, color="gray", linestyle="--")

    plt.legend()

    plt.show()

However, there is a case where the above code will behave badly. Run this:

plt.plot(range(31), color="red")

Nothing should happen (i.e., there should be no new window with this plot). Now, run this:

run plotting.py
simple_plot()

You will see the temperature plot, but also a red line running through it. This happens because Matplotlib draws every plotting command on the current figure, so the plot includes all the plotting commands we ran before show(). One way of ensuring that a plot contains only the elements we want is to call the figure() function, which creates a new, empty figure: all the Matplotlib code that follows that call (until show() is called) is drawn on that figure:

def simple_plot():
    plt.figure()
    plt.plot(TEMPS_MAX, color="orange", label="Max Temp")
    plt.plot(TEMPS_AVG, color="green", label="Avg Temp")
    plt.plot(TEMPS_MIN, color="blue", label="Min Temp")

    plt.title("Temperatures in Chicago from 1/1/14 to 1/31/14")
    plt.xlabel("Day")
    plt.ylabel("Temperature (F)")

    plt.axhline(32, color="gray", linestyle="--")

    plt.legend()

    plt.show()

Verify this is working correctly by running this:

run plotting.py
simple_plot()
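To see more directly what figure() does, the following sketch reproduces the stray-plot situation with made-up data and then counts the lines on each figure. It is a minimal illustration rather than part of the lab solution; the variable names (stray_fig, new_fig) are hypothetical, and the "Agg" backend is used so that it runs without opening windows.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend, so no window is needed
import matplotlib.pyplot as plt

plt.close("all")  # start from a clean slate

# A stray call: with no figure open, pyplot creates one implicitly.
plt.plot(range(31), color="red")
stray_fig = plt.gcf()

# figure() creates a new, empty figure and makes it the current one,
# so the calls that follow no longer draw on the stray figure.
new_fig = plt.figure()
plt.plot([30.0, 25.0, 12.0], color="orange")

lines_in_stray = len(stray_fig.axes[0].lines)
lines_in_new = len(new_fig.axes[0].lines)
is_separate = new_fig is not stray_fig

plt.close("all")
```

Each figure ends up with exactly one line, which is why calling figure() at the start of simple_plot() keeps the red line out of the temperature plot.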

The last modification we will make to our code is to add the ability to save the plot to a file instead of showing it in a window. We can do this by using the figure’s savefig() method. Modify the simple_plot() function so it looks like this:

def simple_plot(save_to=None):
    fig = plt.figure()
    plt.plot(TEMPS_MAX, color="orange", label="Max Temp")
    plt.plot(TEMPS_AVG, color="green", label="Avg Temp")
    plt.plot(TEMPS_MIN, color="blue", label="Min Temp")

    plt.title("Temperatures in Chicago from 1/1/14 to 1/31/14")
    plt.xlabel("Day")
    plt.ylabel("Temperature (F)")

    plt.axhline(32, color="gray", linestyle="--")

    plt.legend()

    if save_to is None:
        plt.show()
    else:
        fig.savefig(save_to)

Notice how we’ve added a save_to parameter that defaults to None. When we supply a string argument, the figure is saved to the file named by that string:

run plotting.py
simple_plot("temperatures.png")

There should now be a temperatures.png file in the same directory as plotting.py. If you open this file, it should contain the same graph that was displayed previously in a Matplotlib window.

When save_to is not specified, we simply see the plot in a window as before:

simple_plot()

Including a parameter like this can make your function easier to debug, since you can easily switch from saving to a file to viewing the plot in a new window.
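When saving many plots from one program, two extra details can help: savefig() accepts optional arguments such as dpi and bbox_inches, and closing a figure after saving releases the memory it uses. The sketch below is one possible way to combine them. The function save_sample_plot and the sample data are hypothetical, and the file is written to a temporary directory so the sketch does not touch your lab files.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # headless backend, so no window is needed
import matplotlib.pyplot as plt


def save_sample_plot(path):
    """Save a tiny sample plot to path and close the figure afterwards."""
    fig = plt.figure()
    plt.plot([30.0, 25.0, 12.0], label="Sample")
    plt.legend()
    # The file extension (.png, .pdf, ...) selects the output format.
    # dpi sets the resolution; bbox_inches="tight" trims extra margins.
    fig.savefig(path, dpi=150, bbox_inches="tight")
    # Closing the figure frees the memory it was using.
    plt.close(fig)


out_dir = tempfile.mkdtemp()
png_path = os.path.join(out_dir, "sample.png")
save_sample_plot(png_path)
saved_ok = os.path.exists(png_path) and os.path.getsize(png_path) > 0
```

For a single plot like the one in this lab, the plain fig.savefig(save_to) call shown above is perfectly adequate.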

Plotting Weather and Crime Data

Now that you’ve worked with some simple Matplotlib code, it’s time to produce a slightly more elaborate graph. The data for this graph is contained in a CSV file called weather_crime.csv that contains the average temperature in Chicago and the number of reported thefts in Chicago for every day between 1/1/2012 and 12/31/2013. The first few rows of the file look like this:

year,month,day,thefts,temp
2012,1,1,666,37.0
2012,1,2,267,24.0
2012,1,3,338,20.0
2012,1,4,345,33.0
2012,1,5,382,36.0
2012,1,6,396,47.0

Your code must read this data in and you must produce a graph that plots the number of thefts and the temperature over those two years:

../../_images/weather_crime.png

This graph informally shows a well-known correlation between certain types of crimes and the weather. In particular, there are fewer thefts in the winter, plausibly because people (including criminals) tend to stay indoors.

Although this plot is more elaborate than the simple plot we saw earlier, you should be able to produce it with plot and other Matplotlib functions covered in class.

As you attempt to reproduce this graph, we encourage you to follow the same two steps we followed earlier: first play around with Matplotlib in the IPython interpreter (remember to run it with the --pylab option), and then write a function in plotting.py that produces this graph.
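Before you can plot anything, the CSV data has to be turned into lists. The following is a minimal sketch of one way to do that with the standard csv module; the function name read_columns is hypothetical, and the sketch reads the first few rows shown above from a string instead of opening weather_crime.csv, so it can run on its own.

```python
import csv
import io
from datetime import date

# The first rows of weather_crime.csv as shown above. A real program would
# open the file instead, e.g. with open("weather_crime.csv", newline="").
SAMPLE = """year,month,day,thefts,temp
2012,1,1,666,37.0
2012,1,2,267,24.0
2012,1,3,338,20.0
"""


def read_columns(csv_file):
    """Return parallel lists of dates, theft counts, and temperatures."""
    dates, thefts, temps = [], [], []
    # DictReader uses the header row to name the fields of each row.
    for row in csv.DictReader(csv_file):
        dates.append(date(int(row["year"]), int(row["month"]), int(row["day"])))
        thefts.append(int(row["thefts"]))
        temps.append(float(row["temp"]))
    return dates, thefts, temps


dates, thefts, temps = read_columns(io.StringIO(SAMPLE))
```

Matplotlib can take a list of date objects as x values, so lists like these can be passed to plot() directly; how you arrange the two series on the graph is left to you.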

Creating an Error Bar Graph

This final exercise is a bit more challenging, because it involves producing a type of graph we did not see in class:

../../_images/errorbar.png

This graph shows the average, maximum, and minimum temperatures for each month between January 2012 and December 2012. To produce this graph, you must take the data in weather_crime.csv and compute the average, maximum, and minimum temperatures for each month before you can call the Matplotlib functions. You may find it helpful to read the Matplotlib documentation on creating error bars and the Matplotlib gallery page on error bars.

As before, first write your code in IPython, and then write a function in plotting.py that produces this graph.
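As a starting point, here is a hedged sketch of how asymmetric error bars can be drawn with errorbar(). It assumes that the bars should extend from the monthly minimum to the monthly maximum around the average, which may or may not match the reference graph exactly. The readings list is made-up data, not values from weather_crime.csv.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend, so no window is needed
import matplotlib.pyplot as plt

# Hypothetical (month, temperature) readings, not the real CSV data.
readings = [(1, 20.0), (1, 25.0), (1, 45.0), (2, 25.0), (2, 35.0)]

# Group the temperatures by month.
by_month = {}
for month, temp in readings:
    by_month.setdefault(month, []).append(temp)

months = sorted(by_month)
avgs = [sum(by_month[m]) / len(by_month[m]) for m in months]

# errorbar() expects distances from the plotted value, not absolute values:
# how far the minimum lies below the average, and the maximum above it.
below = [avg - min(by_month[m]) for m, avg in zip(months, avgs)]
above = [max(by_month[m]) - avg for m, avg in zip(months, avgs)]

fig = plt.figure()
# A two-row yerr gives separate lower and upper bar lengths per point.
plt.errorbar(months, avgs, yerr=[below, above], fmt="o", capsize=4)
plt.close(fig)
```

The main thing to notice is that yerr takes lengths measured from each average, so the minimum and maximum temperatures have to be converted into differences first.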

When finished

git add plotting.py
git commit -m "Finished with lab4"
git push