Classes and Objects
===================
Introduction
------------
The goal of this lab is to understand how objects work, including how
an application is designed following an object-oriented paradigm. In
Lab 3, you wrote some functions of your own and got some practice
calling them in order to create larger programs that were made
possible through the composition of those functions. In this lab, you
will take that idea a step further by using *objects* to organize
and compose data. First, we will add some functionality to the
Divvy classes we saw in the lectures, and then we will
write new classes from scratch to model a dataset from the
chisubmit system you use to submit assignments.
Python is an object-oriented language, which means that everything in
Python is really a structure called an *object*. So, for example, when
we create a dictionary::
d = {"foo": 42, "bar": 37}
What we're really doing is creating an instance of the ``dict`` class
which we store in a variable called ``d``. The "type of an object" is
called its *class*. So, when we refer to "the ``dict`` class" we refer
to the specification of a datatype that is used to map keys to
values. In fact, we can also create a dictionary like this::
d = dict([("foo", 42), ("bar", 37)])
Or an empty dictionary like this::
d = dict()
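As a quick check, the following sketch confirms that the literal syntax and the ``dict`` constructor build equal objects of the same class. The variable names are chosen only for this example.

```python
# Three ways of building dictionaries; the first two have the same contents.
d_literal = {"foo": 42, "bar": 37}
d_from_pairs = dict([("foo", 42), ("bar", 37)])
d_empty = dict()

# Every one of them is an instance of the dict class.
print(type(d_literal))              # <class 'dict'>
print(d_literal == d_from_pairs)    # True
print(isinstance(d_empty, dict))    # True

# Because a dictionary is an object, it has methods we can call on it.
print(d_literal.get("foo"))         # 42
```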
In lecture, we've referred to some data types (like ``int``
and ``float``) as "primitive data types" that specify a domain of values
(like integers, real numbers, boolean values, etc.). In Python, these data types are
actually *also* objects, even if we don't tend to think of them as such (by
contrast, some other programming languages, like Java, handle primitive data types
as non-object types). For example, if you create a ``float`` variable::
x = 0.33
Variable ``x`` is actually an instance of Python's ``float`` class, which has
a few handy methods, like ``as_integer_ratio``, which returns the floating
point number as a numerator/denominator tuple::
>>> x = 0.25
>>> x.as_integer_ratio()
(1, 4)
Play around with this type a bit. Notice anything interesting with certain
floating point numbers?
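The same holds for other built-in types. The short sketch below (the values are arbitrary) shows that integers and booleans are objects with methods of their own.

```python
n = 10
x = 0.25

# Integers are instances of the int class and have their own methods.
print(type(n))                  # <class 'int'>
print(n.bit_length())           # 4, because 10 is 1010 in binary

# Floats can report whether they hold a whole number.
print(x.is_integer())           # False
print((2.0).is_integer())       # True

# Even True and False are objects (instances of the bool class).
print(isinstance(True, bool))   # True
```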
In this lab, you will be able to get more practice working with
classes. It is divided into two parts: first you will modify
some existing classes, and then you will design some classes
of your own.
To get started, open up a terminal and navigate (``cd``) to
your |repo_name| directory. Run ``git pull upstream
master`` to collect the lab materials and ``git pull`` to sync with
your personal repository.
Working with the Divvy Data
---------------------------
As you probably know, `Divvy `_ is
Chicago's enormously popular bike sharing system. In 2014, Divvy
published (anonymized) data on all the Divvy bicycle trips taken in
2013 (this data was published as part of the `2013 Divvy Data
Challenge `_). The dataset
contains two files: one with information about each Divvy station, and
one with information about each Divvy trip.
In the first part of the lab, we will be using four classes that model
the Divvy dataset:
* ``Location``: A class representing a geographic location.
* ``DivvyStation``: A class representing an individual Divvy station.
* ``DivvyTrip``: A class representing an individual Divvy trip.
* ``DivvyData``: A class representing the entire dataset, which includes a list of
stations and a list of trips.
An important aspect of object orientation is the ability to create
relationships between different classes, to model real-world
relationships. For example, a Divvy trip has an origin station and a
destination station. Instead of trying to pack all the information
about the stations in the ``DivvyTrip`` class, we instead have a
separate ``DivvyStation`` class that is used to represent individual
stations. The ``DivvyTrip`` class then only needs to have two
attributes of type ``DivvyStation``: one for the origin station and
one for the destination station.
These relations are referred to as *composition* relationships, because
they allow us to define a class that is *composed of* other classes.
A useful way to think of this kind of relationship is that,
if you can describe the relationship as "has a" (e.g., "A DivvyStation
*has a* Location"), it is probably a composition relationship.
All the composition relations between the Divvy classes are summarized in the following figure:
.. image:: img/divvy-classes.png
:width: 350px
:align: center
1,2. The ``DivvyData`` class represents the entire Divvy dataset, so it
contains (1) a dictionary that maps station identifiers to ``DivvyStation``
objects, and (2) a list of ``DivvyTrip`` objects.
3. As discussed above, a ``DivvyTrip`` has two ``DivvyStation``
objects associated with it. This relationship is implemented simply by setting
two attributes, ``from_station`` and ``to_station``, in the
``DivvyTrip`` constructor.
4. Finally, each Divvy station has a location, which we represent
using an instance of the ``Location`` class. Again, this is done
simply by setting an attribute, ``location``, to a ``Location`` object
in the ``DivvyStation`` constructor.
The ``DivvyData`` class also has a ``bikeids`` attribute with a set of all the bike
identifiers in the dataset. This is not a composition relationship, but it is
something you may find useful in some of this lab's tasks.
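To make the idea of composition concrete, here is a deliberately simplified sketch. The ``Point``, ``Station``, and ``Trip`` classes below are hypothetical stand-ins, not the classes in ``divvy.py``, whose constructors may take different parameters. The station names and coordinates are made up.

```python
class Point:
    """Hypothetical stand-in for Location."""
    def __init__(self, latitude, longitude):
        self.latitude = latitude
        self.longitude = longitude


class Station:
    """Hypothetical stand-in for DivvyStation: a station *has a* Point."""
    def __init__(self, name, location):
        self.name = name
        self.location = location  # a Point object


class Trip:
    """Hypothetical stand-in for DivvyTrip: a trip *has two* Stations."""
    def __init__(self, from_station, to_station):
        self.from_station = from_station  # a Station object
        self.to_station = to_station      # a Station object


# Made-up stations and coordinates, for illustration only.
a = Station("Station A", Point(41.88, -87.63))
b = Station("Station B", Point(41.79, -87.60))
trip = Trip(a, b)

# Following the "has a" chain: trip -> station -> location -> latitude.
print(trip.to_station.location.latitude)   # 41.79
```

Notice that ``Trip`` does not copy any station data; it only holds references to the two ``Station`` objects.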
For more details on the Divvy classes, please read the
`Composition `_
section of the textbook's chapter on
`Classes and Objects `_.
Before you get started
----------------------
Before you get started, you will need to download the Divvy dataset files.
To do so, go into the ``lab5/data`` directory in your terminal, and run the
following::
./get_files.sh
This will download the files; if the download is successful, you should see
this at the end of the command's output::
Extracting Divvy data...
divvy_2013_stations.csv
divvy_2013_trips.csv
divvy_2013_trips_medium.csv
divvy_2013_trips_small.csv
divvy_2013_trips_tiny.csv
We have provided the full dataset (``divvy_2013_trips.csv``) but also some
smaller files that will take less time to load.
Next, you will want to have an IPython session open. Make sure you start
IPython from the ``lab5`` directory, and that you load the ``autoreload``
extension, and then import the ``divvy`` module::
In [1]: %load_ext autoreload
In [2]: %autoreload 2
In [3]: import divvy
Computing the total duration of each bike
-----------------------------------------
We are going to start by adding a simple method to the DivvyData class,
which is contained in the ``divvy.py`` file in your ``lab5`` directory.
You'll see this class already has methods to compute the total distance
of all the trips (``get_total_distance``), and the total duration of all
the trips (``get_total_duration``). Make sure you understand how these
methods work before continuing!
You are going to add a new method to the DivvyData class that computes,
for every bike in the Divvy dataset, the sum of the duration of all the
trips taken by that bike::
def get_bike_times(self):
"""
Computes, for every bike in the Divvy dataset, the sum of the
duration of all the trips taken by that bike.
Returns a dictionary mapping bike identifiers (integer) to
a duration in seconds (integer)
"""
To implement this method, you will need to access the ``bikeid`` attribute
of the DivvyTrip objects in the DivvyData class. This attribute contains
the identifier of the bike used for that trip.
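A common pattern for this kind of task is to accumulate a running total per key in a dictionary. The sketch below shows the pattern on made-up ``(key, amount)`` tuples rather than on ``DivvyTrip`` objects. The ``total_per_key`` function and the sample data are hypothetical, and adapting the idea to ``get_bike_times`` is left to you.

```python
def total_per_key(pairs):
    """Sum the amounts associated with each key."""
    totals = {}
    for key, amount in pairs:
        # Start at 0 the first time we see a key, then keep adding.
        totals[key] = totals.get(key, 0) + amount
    return totals


# Made-up data: key "a" appears twice, key "b" once.
sample = [("a", 300), ("b", 120), ("a", 45)]
print(total_per_key(sample))   # {'a': 345, 'b': 120}
```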
You can test your implementation from IPython by creating a DivvyData object
with our "tiny" dataset, and then testing a few bikes individually. For example::
In [5]: data = divvy.DivvyData("data/divvy_2013_stations.csv",
...: "data/divvy_2013_trips_tiny.csv")
...:
In [6]: dt = data.get_bike_times()
In [7]: dt[27]
Out[7]: 10105
In [8]: dt[44]
Out[8]: 1852
Later on, we'll see a more thorough way to test your implementation.
However, if your implementation works with the examples above, just
move on to the next task for now.
Computing the number of times a bike has been moved
---------------------------------------------------
If you've lived in Chicago long enough, you may have spotted the Divvy
vans that occasionally come to a Divvy station to place bikes
on the station's dock if the station is running low on bikes.
Not just that, they'll also take bikes away if the station has
too many bikes and seems to be underutilized.
So, you will sometimes see trips like this in the dataset:
* Trip #1234: Customer A took Bike 44 from station 10 to station 20
* Trip #1235: Customer B took Bike 44 from station 20 to station 30
* Trip #1236: Customer C took Bike 44 from station 50 to station 70
This means that the bike was moved by a Divvy van from station 30
to station 50!
You will add a method to the DivvyData class that finds any such
movements for all the bikes in the dataset::
def get_bike_movements(self):
"""
Returns a dictionary mapping bike identifiers (integer)
to a list of tuples, where each tuple represents that bike
being moved from one station to another.
Each tuple contains three values: the station the bike was
moved from (DivvyStation object), the station the bike was
moved to (DivvyStation object), and the difference in capacity
between the two stations (more specifically, the capacity
of the station the bike was moved to minus the capacity
of the station the bike was moved from). Note that this
will be an integer that can be either positive or negative.
Note that the dictionary must also include entries for
the bikes that have not been moved at all (those entries
will just map to an empty list)
"""
To implement this method, you will need to access the ``dpcapacity``
attribute of the ``DivvyStation`` objects. You will also want to
use the ``bikeids`` attribute of ``DivvyData``. Finally,
take into account that the ``trips`` attribute in ``DivvyData``
has the trips sorted by their start time.
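The detection logic can be sketched on plain tuples before you apply it to the real classes. In the hypothetical ``find_gaps`` function below, each trip is a made-up ``(bike, from_station, to_station)`` tuple of integers, already sorted by start time. The real method works with ``DivvyTrip`` and ``DivvyStation`` objects and must also report the capacity difference, which this sketch omits.

```python
def find_gaps(trips, bikes):
    """Map each bike to a list of (moved_from, moved_to) station pairs."""
    gaps = {bike: [] for bike in bikes}   # unmoved bikes keep an empty list
    last_seen = {}                        # bike -> station where its last trip ended

    for bike, origin, destination in trips:
        previous = last_seen.get(bike)
        # A bike that starts somewhere other than where it last ended was moved.
        if previous is not None and previous != origin:
            gaps[bike].append((previous, origin))
        last_seen[bike] = destination
    return gaps


# The three trips of bike 44 from the example above, plus one trip of bike 7.
trips = [(44, 10, 20), (44, 20, 30), (7, 5, 6), (44, 50, 70)]
movements = find_gaps(trips, {7, 44})
print(movements[44])   # [(30, 50)]
print(movements[7])    # []
```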
You can test your implementation from IPython using the ``data``
object we created earlier::
In [11]: bm = data.get_bike_movements()
In [13]: bm[409]
Out[13]:
[(,
,
0)]
Notice how the ``DivvyStation`` objects are shown using the
string representation returned by ``DivvyStation``'s ``__repr__``
method.
If you print the entire ``bm`` dictionary, you should actually see that
none of the bikes have any movements (except bike 409). Remember we loaded
a "tiny" subset of the full dataset, so this is not unexpected.
Testing your Divvy methods more thoroughly
------------------------------------------
As part of the ``divvy.py`` file, we have included some code that uses
these two methods to answer these questions:
* What is the average amount of time a bike is used?
* What is the most used bike in the Divvy system?
* What is the average number of times a bike is moved?
* Do vans tend to move bikes from high capacity stations to low capacity
stations, from low to high, or neither?
Once you've implemented the two methods specified above, just run the ``divvy.py``
file from the terminal as follows::
python3 divvy.py data/divvy_2013_stations.csv data/divvy_2013_trips_medium.csv
Notice how we're using our "medium"-sized dataset for this. If you implemented
the methods correctly, the output should end with this::
The average total usage of a bike is 1d 8h 0m 39s
The most used bike is 444, used a total of 3d 17h 41m 10s
The average number of times a bike was moved was 16.17
On average, a bike is moved to a station with 1.05 more docks
(Standard deviation: 10.29)
If your code works with the medium data set, try running it with the full dataset::
python3 divvy.py data/divvy_2013_stations.csv data/divvy_2013_trips.csv
Please note that this will take a few seconds to run. The output should end with the following::
The average total usage of a bike is 3d 18h 36m 38s
The most used bike is 199, used a total of 9d 11h 30m 17s
The average number of times a bike was moved was 122.30
On average, a bike is moved to a station with 0.12 more docks
(Standard deviation: 9.28)
These results are interesting, but don't forget to check out the code that
produces them! You can find it towards the bottom of the ``divvy.py`` file.
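The durations in this output are shown as days, hours, minutes, and seconds, presumably derived from totals in seconds like the ones ``get_bike_times`` returns. One possible way to do that conversion is sketched below. ``format_duration`` is a hypothetical helper, and the code in ``divvy.py`` may do this differently.

```python
def format_duration(total_seconds):
    """Render a whole number of seconds as 'Dd Hh Mm Ss'."""
    minutes, seconds = divmod(total_seconds, 60)
    hours, minutes = divmod(minutes, 60)
    days, hours = divmod(hours, 24)
    return "{}d {}h {}m {}s".format(days, hours, minutes, seconds)


print(format_duration(115239))   # 1d 8h 0m 39s
```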
Designing your own classes
--------------------------
In this part of the lab, we will be using a different dataset: an anonymized
chisubmit dataset from a different course in the department. This dataset
contains:
- Assignments: An assignment is a piece of work that must be handed in
by some deadline. The class had six assignments, ``pa1`` through ``pa6``.
- Students: The students registered in the class (their names and CNetIDs
have been anonymized using the Python `names `_ package).
- Teams: This class allowed students to submit assignments individually or
in pairs. In the chisubmit model, teams are the ones that submit assignments, so
if a student works individually, then they're actually a "team of one" and they
make submissions under that team.
- Submissions: Each team can have zero or more submissions, with at most one
submission per assignment (while chisubmit does allow multiple submissions
per assignment, the dataset we're providing only includes the final submission
of each assignment, i.e., the one that was sent to the graders).
This dataset is stored in three JSON files, ``assignments.json``, ``students.json``,
and ``teams.json`` in the ``lab5/data`` directory (the ``teams.json`` file
also contains information about the submissions). While we provide the code to
load the data from these files, you should nonetheless take a look at the contents
of the files.
You will do your work in the ``course.py`` file, which already includes an implementation
of an ``Assignment`` class and a ``Student`` class. We also provide the code that
creates all the ``Assignment`` and ``Student`` objects based on the dataset.
Our code also loads the data for teams and submissions into a list of dictionaries
called ``teams_json``, where a team dictionary looks like this::
{
"students": [
"jdunlap",
"ghood"
],
"team_id": "jdunlap-ghood",
"submissions": [
{
"assignment_id": "pa2",
"submitted_at": "2017-01-18 00:27:35.886530+00:00",
"extensions_used": 0
},
{
"assignment_id": "pa3",
"submitted_at": "2017-01-24 02:00:11.428773+00:00",
"extensions_used": 0
}
]
}
Notice how a team dictionary also includes the list of submissions for that team.
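Before designing your classes, it can help to explore this structure directly. The sketch below copies the sample team dictionary shown above into a variable (here called ``team``, a name chosen only for this example) and pulls out a few values.

```python
team = {
    "students": ["jdunlap", "ghood"],
    "team_id": "jdunlap-ghood",
    "submissions": [
        {"assignment_id": "pa2",
         "submitted_at": "2017-01-18 00:27:35.886530+00:00",
         "extensions_used": 0},
        {"assignment_id": "pa3",
         "submitted_at": "2017-01-24 02:00:11.428773+00:00",
         "extensions_used": 0},
    ],
}

# The students are stored as a list of identifiers (strings).
print(len(team["students"]))        # 2

# Each submission is itself a dictionary.
submitted = [s["assignment_id"] for s in team["submissions"]]
print(submitted)                    # ['pa2', 'pa3']

# Adding up one field across all of the team's submissions.
print(sum(s["extensions_used"] for s in team["submissions"]))   # 0
```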
Your task is to design and implement a ``Team`` class and a ``Submission`` class, and
create ``Team`` and ``Submission`` objects based on the data loaded into
``teams_json``. Make sure you add your code in the parts of ``course.py``
labelled ``YOUR CODE HERE``.
When implementing these classes, take the following into account:
- There are several composition relationships in these classes:
- A ``Team`` object has one or two ``Student`` objects.
- A ``Team`` object has zero or more ``Submission`` objects.
- A ``Submission`` object has an ``Assignment`` object.
Note that the above relationships are one-way. For example, we would model
the relationship between a ``Submission`` and an ``Assignment`` by adding
an attribute in ``Submission`` that contains the ``Assignment`` object for
a given submission. However, that doesn't mean that ``Assignment`` must
also have an attribute with a list of all the submissions for a given assignment
(this would be a valid way of modeling the relationship in a bidirectional way,
but we're just not requiring it here).
- Your ``Team`` class must include at least the following methods:
- ``includes_dropped()``: returns ``True`` if at least one of the students
in the team has dropped the class, ``False`` otherwise.
- ``extensions_used()``: returns the number of extensions used by this team
across all its submissions.
And the following attribute:
- ``submissions``: A list of ``Submission`` objects.
- Your ``Submission`` class must include at least the following method:
- ``deadline_delta()``: The difference in seconds between the submission time and the
assignment deadline (the difference will be positive if the submission was made
after the deadline, and negative if it was made before the deadline). You can use
the subtraction operator on the ``datetime`` type (e.g., ``d1 - d2``). This will
return a ``timedelta`` object. You will need to read the
`timedelta `_
documentation to obtain the total number of seconds.
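As a small illustration of ``datetime`` subtraction, the sketch below uses one of the timestamps from the sample team dictionary and a made-up deadline. Parsing with ``datetime.fromisoformat`` (available in Python 3.7 and later) is an assumption made for this example; the provided loading code may convert the timestamps differently. Converting the result to a total number of seconds is left for you to look up.

```python
from datetime import datetime, timedelta, timezone

# Timestamp taken from the sample team dictionary.
submitted_at = datetime.fromisoformat("2017-01-18 00:27:35.886530+00:00")

# Hypothetical deadline, invented for this example.
deadline = datetime(2017, 1, 18, 4, 0, 0, tzinfo=timezone.utc)

# Subtracting two datetime objects produces a timedelta object.
delta = submitted_at - deadline
print(type(delta).__name__)     # timedelta

# The delta is negative because the submission came before the deadline.
print(delta < timedelta(0))     # True

# Negative deltas are stored with a negative number of days.
print(delta.days)               # -1
```

Because of how negative values are stored, inspecting individual attributes of a ``timedelta`` can be misleading, which is one more reason to read the documentation carefully.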
A good rule of thumb
--------------------
Object-oriented design can be tricky, and one aspect that can be challenging is
deciding what should go in a given class (be it the ``Team`` and ``Submission`` classes
in this lab or the ``Voter``, ``VoterGenerator``, and ``Precinct`` classes from the assignment).
When designing a class, you should try to remember this rule of thumb:
**If a class X has an attribute Y, the phrase "X has a Y" must make sense**
Try it out with some of the classes we've seen in this lab:
* A ``Location`` **has a** longitude and latitude
* A ``DivvyStation`` **has a** name
* A ``DivvyTrip`` **has a** bikeid (of the bike used in that trip)
* A ``DivvyTrip`` **has a** tripduration
Composition relationships are simply "has a" relationships where the "Y" (in "X has a Y") is, itself,
an object (or objects). For example:
* A ``DivvyStation`` **has a** location (which is represented as a ``Location`` object)
* A ``DivvyTrip`` **has two** stations (origin and destination, each a ``DivvyStation`` object)
* A ``Team`` **has** submissions (i.e., a list of ``Submission`` objects)
So, as you design your classes for PA #4, make sure you use this rule of thumb:
* *"A Precinct has an arrival time"*? Doesn't make sense: precincts don't have arrival times,
voters do.
* *"A Precinct has arrival times"*? That phrase makes a bit more sense, but it's not the Precinct
which directly has arrival times. It's the voters voting at that precinct who have arrival times.
* *"A VoterGenerator has a voting duration"*? The VoterGenerator is responsible for randomly
generating the voting duration of a voter, but the VoterGenerator itself doesn't have a voting
duration.
* etc.
If you add an attribute and the phrase "X has a Y" doesn't make sense, you're probably heading
down the wrong path. If so, and you can't figure out the correct attributes, please ask on
Piazza or come to office hours.
Testing
-------
Because we do not know the exact way in which you will design your classes, we
cannot provide tests for your classes. We suggest you test your code in IPython
as you work through this part of the lab. You should start your IPython
session as follows::
In [1]: %load_ext autoreload
In [2]: %autoreload 2
In [3]: import course
In [4]: a, s, t = course.load_data("data/assignments.json", "data/students.json", "data/teams.json")
In [5]: assignments = course.create_assignment_objects(a)
In [6]: students = course.create_student_objects(s)
``t`` will contain the list of team dictionaries, which you can use to test the creation
of your ``Team`` and ``Submission`` objects. Note that you will be able to access
the ``Student`` and ``Assignment`` objects through the ``assignments`` and ``students``
variables.
Once you have a complete implementation, you can run ``course.py`` from the command line::
python3 course.py
If your implementation is correct, it should print out the following::
The number of teams with dropped students is 6
On average, non-late submissions are made 6h 13m 59s before the deadline
When Finished
-------------
.. include:: includes/finished-labs-1.txt
.. code::
git add divvy.py
git add course.py
git commit -m "Finished with lab5"
git push
.. _math: https://docs.python.org/2/library/math.html
.. _library's API: https://docs.python.org/2/library/math.html
.. include:: includes/finished-labs-2.txt