Visualizing Avian Biodiversity Using Treemaps¶
Due: Friday, December 4th at 3pm CST.
The purpose of this assignment is to give you practice working with recursive data structures and writing recursive functions.
You must work alone on this assignment.
Introduction¶
The Cornell Lab of Ornithology manages a website called eBird, which collects bird sighting data from birdwatchers around the world. eBird claims to be one of the world’s largest biodiversity-related projects, collecting over 100 million bird sightings per year. As they explain on their website, the data they collect “inform novel conservation actions, from site-specific information about the occurrence and abundance of birds, to looking at patterns of species abundance across continent-spanning flyways. eBird data contribute to hundreds of conservation decisions and peer-reviewed papers, thousands of student projects, and help inform research worldwide. Applications range from ecological and ornithological research, to conservation application, to advancing research in the fields of socioeconomics, artificial intelligence, and computer science.”
A question we might want to investigate about our Chicago community is, “Which birds are most prevelant in our area, and how does this depend on the time of year?”
We can use the eBird data 1 to address this. eBird makes their basic dataset, which contains raw data of bird observations, available to anyone under their terms of use.
Here is a bar chart showing the number of times each species of bird was seen in Cook County in October 2020:
From this we can see the most commonly seen 2 bird species in Cook County in October 2020 was the American Robin. But there are some shortcomings to this visualization.
For example, while this figure is fairly large, it still only shows the 40 most commonly sighted bird species in Cook County. In total, there were 230 different species sighted. We cannot determine from this figure even approximately what fraction of bird sightings were of the American Robin.
Additionally, we might notice that there are a number of species of sparrow that appear on this list, including the House Sparrow, the White-throated Sparrow, and the Song Sparrow, among others. All together, sparrows outnumber robins by quite a bit, but that is not clear from this figure.
So, we might appreciate a visualization in which related species are grouped together. The scientific study of how to group together related species is taxonomy. All living organisms are organized into a taxonomic hierarchy. For example, the class Aves of all birds is divided into a number of orders, such as Passeriformes, Anseriformes, and Piciformes. Each order is divided into a number of families, each family is divided into a number of genera (plural of genus), and each genus is divided into a number of species. This hierarchical structure is naturally represented as a tree, where each subdivision is a node in the tree.
The National Center for Biotechnology Information (NCBI) provides a Taxonomy Database.
Below we show a tree containing only the birds species that appear in our data set (further abridged). Each level of the tree represents a taxonomic rank (class, order, family, genus, or species).
Each node at a particular level represents a taxon at that level — that is, a member of that taxonomic rank.
Each internal (non-leaf) node has a key consisting of the taxonomic rank and the name of that taxon (for example, “family Passerellidae”), while each leaf has a key which is the the common name of a species. Each node has a value which is the number of times that species (or genus, family, etc.) was sighted.
class Aves: 67629
│
├──order Anseriformes: 7188
│ │
│ └──family Anatidae: 7188
│ │
│ ├──genus Anas: 2337
│ │ │
│ │ ├──American Black Duck: 47
│ │ │
│ │ ├──Blue-winged Teal: 84
│ │ │
│ │ ├──Green-winged Teal: 158
│ │ │
│ │ ├──Mallard (Domestic type): 1779
│ │ │
│ │ ├──Northern Pintail: 101
│ │ │
│ │ └──Northern Shoveler: 168
│ │
│ ├──genus Mareca: 297
│ │ │
│ │ ├──American Wigeon: 83
│ │ │
│ │ └──Gadwall: 214
│ │
│ ├──genus Melanitta: 128
│ │ │
│ │ ├──Black Scoter: 32
│ │ │
│ │ ├──Surf Scoter: 81
│ │ │
│ │ └──White-winged Scoter: 15
│ │
│ ├──genus Bucephala: 102
│ │ │
│ │ ├──Bufflehead: 82
│ │ │
│ │ └──Common Goldeneye: 20
│ │
│ ├──genus Branta: 2329
│ │ │
│ │ ├──Cackling Goose: 31
│ │ │
│ │ └──Canada Goose: 2298
│ │
│ ├──genus Aythya: 522
│ │ │
│ │ ├──Canvasback: 32
│ │ │
│ │ ├──Greater Scaup: 89
│ │ │
│ │ ├──Lesser Scaup: 69
│ │ │
│ │ ├──Redhead: 207
│ │ │
│ │ └──Ring-necked Duck: 125
│ │
│ ├──genus Mergus: 238
│ │ │
│ │ ├──Common Merganser: 31
│ │ │
│ │ └──Red-breasted Merganser: 207
│ │
│ ├──genus Anser: 62
│ │ │
│ │ ├──no rank unclassified Anser: 2
│ │ │ │
│ │ │ └──Domestic goose sp. (Domestic type): 2
│ │ │
│ │ ├──Greater White-fronted Goose: 10
│ │ │
│ │ └──Snow Goose: 50
│ │
│ ├──genus Lophodytes: 174
│ │ │
│ │ └──Hooded Merganser: 174
│ │
│ ├──genus Clangula: 47
│ │ │
│ │ └──Long-tailed Duck: 47
│ │
│ ├──genus Cygnus: 120
│ │ │
│ │ ├──Mute Swan: 82
│ │ │
│ │ ├──Trumpeter Swan: 36
│ │ │
│ │ └──Tundra Swan: 2
│ │
│ ├──genus Oxyura: 172
│ │ │
│ │ └──Ruddy Duck: 172
│ │
│ └──genus Aix: 660
│ │
│ └──Wood Duck: 660
│
...
[17 additional orders]
While this tree representation helps us to make comparisons, it is only practical to view a portion of the tree at any given time. It would be useful see a visual representation of the data, which is the role information visualization plays in computing and in data science.
How can we visualize this information in an effective way? We can use treemaps, which are an excellent tool for visualizing hierarchical data. Here, for example, is a treemap corresponding to the tree above.
Each rectangle represents a single species. The area of the rectangle is proportional to the number of sightings of that species. Species in the same genus are grouped together into a larger rectangle, genera in the same family are grouped together, etc.
Looking at the data in this form, we can immediately see that, for instance, the American Robin (near the center of the treemap) is more prevalent in the data than other species, but that sparrows (the group of boxes in the upper-left corner) as a whole are three to four times more prevalent than robins.
In general, treemaps are a space-constrained method for visualizing hierarchical structures that present a sense of “mass” and proportionality in a way that the typical tree diagram does not. Treemaps allow the viewer to compare leaves and sub-trees even at varying depths in the tree, and to spot patterns and exceptions. Ben Shneiderman designed treemaps during the 1990s as a way to visualize the contents of a file system. This technique has since been used to visualize many different types of data, including stock portfolios, oil production, a gene ontology, stimulus spending, and more. The original idea has been extended in many interesting ways.
In this assignment, you will write code to generate treemaps to visualize this avian biodiversity data from samples taken throughout the year.
- 1
eBird. 2017. eBird: An online database of bird distribution and abundance [web application]. eBird, Cornell Lab of Ornithology, Ithaca, New York. Available: http://www.ebird.org. (Accessed: Date November 21, 2020).
- 2
We are relying on just the number of sightings recorded on eBird. There may be other factors to adjust for to determine the actual prevalence of each species, such as how easy a bird is to see, whether someone reports it, and how much effort a person put in to find it. eBird does also provide data on “effort”, but we will not use that for this assignment.
Getting started¶
Before you start working on the assignment’s tasks, please take a
moment to follow the steps described in Coursework Basics page
to get the files for this assignment (these steps will be the same as
the ones you followed to get the files previous assignments. Please
note that you will not be able to start working on the assignment
until you fetch the assignment files (which will appear in a pa6/
directory in your repository).
The pa6/
directory contains several Python files, but you will
only be modifying one of them: treemap.py
. The rest of the
files are described throughout the assignment.
You can also consult the README for a description of each file.
As usual, you will be able to test your code from IPython and by using py.test
.
When using IPython, make sure to enable autoreload before importing the Python
files we will use in this assignment:
In [1]: %load_ext autoreload
In [2]: %autoreload 2
In [3]: import treemap, drawing
In [4]: import tree
The drawing
module
As we’ll describe later on, we provide a drawing
module for visualizing treemaps.
This module is not required for completing the assignment, and will only work on
the virtual desktop. If you are working on the assignment through SSH, the above
import
commands may produce the following messages:
Failed to connect to Mir: Failed to connect to server socket: No such file or directory
Unable to init server: Could not connect: Connection refused
Failed to connect to Mir: Failed to connect to server socket: No such file or directory
Unable to init server: Could not connect: Connection refused
(ipython3:35882): Gdk-CRITICAL **: gdk_cursor_new_for_display: assertion 'GDK_IS_DISPLAY (display)' failed
(ipython3:35882): Gdk-CRITICAL **: gdk_cursor_new_for_display: assertion 'GDK_IS_DISPLAY (display)' failed
You can safely ignore these messages. Because SSH is a text-only interface, these messages
are simply warning you that the drawing
module will not work.
Like previous assignments, we also include a series of automated tests. These
tests require an additional set of files, which you can download by running
the following from the Linux command-line inside the pa6/
directory:
$ ./get_files.sh
Representing trees¶
We provide a class Tree
in the file tree.py
that you can use to represent tree data. This is essentially the same Tree
class we saw in class, Lab #6, and Short Exercises #6 (although it is slightly different from the Tree
class in the textbook). The class is useful for representing the avian biodiversity data, but it is not specific to the biodiversity data.
The public interface for this class includes:
Constructor:
t = Tree(key, value)
creates a new tree with the given key and value and no children.
Attributes:
t.key
andt.value
hold data associated with the root node of the treet
. Initially, a node has a key and value, but we will add additional attributes in the course of the assignment that hold additional information about each node.t.children
is a list of the child subtrees oft
.
Methods:
t.add_child(t2)
adds an existing treet2
as a child subtree oft
.t.num_children()
returns the number of child subtrees oft
.t.print()
prints the treet
, and is useful for debugging purposes.
We can use this class to represent our avian biodiversity data as described above: the key
attribute holds the name of the class, order, etc. that node represents and the value
attribute to hold the number of sightings of that taxon.
In Task 2, you will add an additional attribute, path
which will hold a tuple listing the keys on the path from the root to that node. We use this to determine which colors to use in the treemap. We elaborate on this requirement in Task 2.
If you are wondering why we are using generic names — key and value — rather than taxon or number of sightings, it is because this approach allows your treemap implementation to generalize to situations beyond avian biodiversity data.
eBird and NCBI data¶
Using the data from eBird and NCBI, we have constructed trees of bird sighting data like the one in the introduction. We have stored these trees in a JSON file, birds.json
, in the data/
directory.
We have provided a function, load_trees
for loading them into Tree
instances. You can use it as follows.
In [5]: data = treemap.load_trees('data/birds.json')
This defines data
to be a dictionary mapping strings to trees. For instance birds_by_month['Oct']
is the tree similar to the one in the introduction. However, this tree lacks values for the internal (non-leaf) nodes, which you will add in Task 2.
You can print this tree out in its entirety by running
In [6]: data['Oct'].print()
Here are the valid keys for this dictionary, and the corresponding trees:
Key |
Tree |
---|---|
|
Birds sighted in Cook County, IL in November 2019 |
|
Birds sighted in Cook County, IL in December 2019 |
|
Birds sighted in Cook County, IL in January 2020 |
|
Birds sighted in Cook County, IL in February 2020 |
|
Birds sighted in Cook County, IL in March 2020 |
|
Birds sighted in Cook County, IL in April 2020 |
|
Birds sighted in Cook County, IL in May 2020 |
|
Birds sighted in Cook County, IL in June 2020 |
|
Birds sighted in Cook County, IL in July 2020 |
|
Birds sighted in Cook County, IL in August 2020 |
|
Birds sighted in Cook County, IL in September 2020 |
|
Birds sighted in Cook County, IL in October 2020 |
|
Birds sighted in Cook County, IL from November 2019 to October 2020 |
|
Small sample tree useful for testing purposes |
|
Small sample tree useful for testing purposes |
|
Small sample tree useful for testing purposes |
Notice that in addition to the trees representing actual bird sighting data, we include three sample trees (with keys s1
, s2
, and s3
) that are smaller, simpler trees that are useful for testing, and which we will be using for demonstration throughout the assignment.
In addition to birds.json
, we have provided another JSON file, sparrows.json
, also in the data/
directory, which will be useful for testing Task 1. You can load this file using the same load_trees
function:
In [7]: data_flat = treemap.load_trees('data/sparrows.json')
This defines data_flat
to be a dictionary mapping strings to trees. The valid keys for sparrows.json
are the same as the valid keys for birds.json
. You can try printing one of these trees:
In [8]: data_flat['Oct'].print()
: 9142
│
├──American Tree Sparrow: 117
│
├──Chipping Sparrow: 185
│
├──Clay-colored Sparrow: 45
│
├──Field Sparrow: 213
│
├──Dark-eyed Junco: 1793
│
├──Eastern Towhee: 264
│
├──Fox Sparrow: 630
│
├──Harris's Sparrow: 53
│
├──White-crowned Sparrow: 1100
│
├──White-throated Sparrow: 1868
│
├──Lark Bunting: 1
│
├──LeConte's Sparrow: 37
│
├──Nelson's Sparrow: 121
│
├──Lincoln's Sparrow: 389
│
├──Song Sparrow: 1204
│
├──Swamp Sparrow: 820
│
├──Savannah Sparrow: 288
│
└──Vesper Sparrow: 14
Each tree in sparrows.json
has a very simple structure: It consists of a root node with a number of children, all of which are leaves (have no children). We again include three small sample trees ('s1'
, 's2'
, 's3'
) which we will use in our demonstrations and which you can use for testing. Each of these three sample trees is derived from the corresponding tree in birds.json
by discarding all nodes other than the root and its children (except that unlike in birds.json
, all of the nodes in sparrows.json
have their values pre-set, and in particular the value of the root node is the sum of the values of its children).
For the trees in sparrows.json
other than the sample trees, the leaves of the tree represent species from the family Passerellidae, which largely consists of sparrows. The leaf’s key is the common name of the species, and its value is the number of sightings during the specified month/year. The root node has its key set to the empty string, and its value is the sum of the values of its children (again, all the nodes have their values pre-set).
We will use the trees in sparrows.json
in Task 1.
Drawing rectangles¶
Recall that a treemap looks like this:
Notice how all the individual shapes in the diagram are all rectangles. In fact, drawing a treemap mostly involves figuring out the exact rectangles we need to draw on a “canvas”. Our canvas is a coordinate system from (0.0, 0.0) (the top-left corner) to (1.0, 1.0) (the bottom-right corner). This is often referred to as the bounding rectangle, because it describes the bounds of where we can “draw” things (in this case, rectangles)
An individual rectangle is specified by providing its origin point (the position of the top-left corner of the rectangle) and its size (its width along the x axis and its height along the y axis).
We provide a simple Rectangle
class in treemap.py
and
a draw_rectangles
function in drawing.py
that will
draw a list of rectangles. For example, the following
code draws a rectangle with origin (0.25, 0.5), width 0.75, height 0.25,
and label “Example Rectangle”.
(The fourth argument to the Rectangle
constructor needs to be a tuple of strings and is used for determining the color of the rectangle. You can ignore it for now.)
In [9]: r = treemap.Rectangle((0.25, 0.5), (0.75, 0.25), "Example Rectangle", ("ex",))
In [10]: drawing.draw_rectangles([r])
Reminder: The drawing
module
As noted earlier, the drawing
module will only work through the virtual desktop.
This does not mean you have to do your work over the virtual desktop! We provide this
module as a way to optionally visualize the treemaps, but none of the tasks you have
to do rely on having access to the drawing
module or to the virtual desktop.
This will open a window that displays something like this:
Note: you must close this window before you can continue entering code in IPython.
The following code draws two rectangles:
In [11]: r1 = treemap.Rectangle((0.20, 0.20), (0.20, 0.60), "Example Rectangle 1", ("ex1",))
In [12]: r2 = treemap.Rectangle((0.60, 0.20), (0.20, 0.60), "Example Rectangle 2", ("ex2",))
In [13]: drawing.draw_rectangles([r1, r2])
And will show something like this:
So, we need to figure out exactly what rectangles to generate (and with what positions and sizes) for a given tree.
Drawing one-level treemaps¶
Before we turn our attention to hierarchical data, let us consider how we would layout data that comes to us without a hierarchy. We could think of this data as just being a list of (key, value), pairs, but for compatibility with the later tasks, we will think of it as a tree with one root node, a list of child nodes, each of which is a leaf (so, there is only one level to this tree below the root). Here again is such a tree:
In [8]: data_flat['Oct'].print()
: 9142
│
├──American Tree Sparrow: 117
│
├──Chipping Sparrow: 185
│
├──Clay-colored Sparrow: 45
│
├──Field Sparrow: 213
│
├──Dark-eyed Junco: 1793
│
├──Eastern Towhee: 264
│
├──Fox Sparrow: 630
│
├──Harris's Sparrow: 53
│
├──White-crowned Sparrow: 1100
│
├──White-throated Sparrow: 1868
│
├──Lark Bunting: 1
│
├──LeConte's Sparrow: 37
│
├──Nelson's Sparrow: 121
│
├──Lincoln's Sparrow: 389
│
├──Song Sparrow: 1204
│
├──Swamp Sparrow: 820
│
├──Savannah Sparrow: 288
│
└──Vesper Sparrow: 14
Recall we want one rectangle per leaf node, and we want the area of that rectangle to be proportional to the value of the node (the number of birds seen of that species). One way to accomplish this would be to make a single stack of rectangles:
From this image, you can get some idea of the relative sizes of the rectangles, but the rectangles are very wide and short. Many are too short to fit a label. And, it can be hard to visually gauge the relative sizes of rectangles that are so elongated.
It would be preferable to have rectangles that are closer to squares, like this:
A type of treemap called a squarified treemap aims to use rectangles that approximate squares. It was introduced by Mark Bruls, Kees Huizing, and Jarke J. van Wijk in 2000. They developed an algorithm for laying out the rectangles, described below. You will implement this algorithm.
The input to the algorithm is a bounding rectangle for the treemap, and a tree. For now, we will assume the tree is of the kind described above, where all children of the root node are leaves. Each leaf represents on data point.
Example
We’ll use the following tree as a running example:
: 20
│
├──Three: 3
│
├──Two: 2
│
├──Six (a): 6
│
├──Six (b): 6
│
├──One (a): 1
│
├──One (b): 1
│
└──One (c): 1
We’ll use a bounding box with width 1 and height 0.8.
Step 1: Take the list of data points (leaves), and sort the list in descending order of value.
Example
In our example, this gives us the list
Six (a): 6
Six (b): 6
Three: 3
Two: 2
One (a): 1
One (b): 1
One (c): 1
Step 2: We are about to layout a single row of rectangles, either along the left edge of our bounding box, or along the top edge. If the bounding box is wider than it is tall, or if the width and height are equal, choose the left edge. Otherwise, choose the top edge.
Example
In our example, the bounding box has width 1 and height 0.8, so it is wider than it is tall, and we’ll use the left edge.
Step 3: Determine how many rectangles to include in the row. Our goal is choose the number of rectangles such that each rectangle is not too distorted. We do this by first trying to fit a single rectangle in the row, and calculating its distortion. Then we try fitting two rectangles, then three, and so on as long as the distortion keeps improving (that is, the rectangles become closer to square). We stop fitting additional rectangles in the row once the distortion starts getting worse instead of better.
To be more precise: We will make rectangles corresponding to the first \(k\) data points in the list, for some value of \(k\). We will try \(k = 1, 2, 3, \dots\) in order.
For each value of \(k\):
Take the first \(k\) ordered data points, and sum their values. Then, divide by the sum of all the values in the entire list. This fraction is the fraction of the area of the bounding box that should be taken up by this row of rectangles.
For each of the \(k\) data points, divide its value by the sum of the \(k\) values in the row. This is the fraction of the area of the row that should be taken up by the rectangle for this data point. The rectangles in the row are laid out from top to bottom (if the row is against the left edge of the bounding box) or from left to right (if the row is against the top edge).
The calculations above allow us to determine the height and width of each rectangle in the row. Compute the aspect ratio of each rectangle by dividing its longer side by its shorter side. The aspect ratio is a number greater than or equal to 1, where 1 means that the rectangles is a square, while a very high aspect ratio means the rectangle is very distorted.
Define the distortion of this row to be the maximum aspect ratio of any rectangle in the row.
If \(k = 1\), or if the distortion for this row is less than or equal to the distortion for the previous \(k\), then move on to the next \(k\). If there is no next value of \(k\) because this row already uses all the data points in the list, then stop the process, and keep this value of \(k\) (which gives the lowest distortion).
On the other hand, if the distortion is higher than it was for the previous \(k\), we will not move on to the next \(k\) nor keep the current \(k\). Instead, stop the process and choose the previous value of \(k\) (the one that gave the lowest distortion).
Example
Let’s try this on our example. We start with \(k = 1\).
We take the first data point in our list,
Six (a): 6
. This row has a total value of 6, while the whole data set has a total value of 20. So, this row will take up a 6/20 fraction of the area of the bounding box. Since the bounding box has width 1, this row will have width 0.3.We have one rectangle taking up the whole row. So, the height of that rectangle is the full height of the row (which is the full height of the bounding box), 0.8.
The aspect ratio of this rectangle is \(0.8/0.3 = 8/3 \approx 2.667\).
Here is a table with information about this rectangle, including its label, the (x, y) coordinates of its origin, its width and height, and its aspect ratio:
label
x
y
width
height
aspect ratio
Six (a)
0.000
0.000
0.300
0.800
2.667
And here is a visualiztion of this rectangle:
There is only one rectangle in the row, so the distortion is its aspect ratio: \(8/3 \approx 2.667\).
We then move on to \(k = 2\).
We take the first 2 data points in our list,
Six (a): 6
andSix (b): 6
. This row has a total value of \(6 + 6 = 12\), while the whole data set has a total value of 20. So, this row will take up a 12/20 fraction of the area of the bounding box. Since the bounding box has width 1, this row will have width 0.6.This row has two rectangles, each with value 6. The total value of of the row is 12, so each rectangle takes up a 6/12 fraction of the height of the row. Since the row has height 0.8, each rectangle has height 0.4.
The aspect ratio of each of these rectangles is \(0.6/0.4 = 3/2 = 1.5\).
label
x
y
width
height
aspect ratio
Six (a)
0.000
0.000
0.600
0.400
1.500
Six (b)
0.000
0.400
0.600
0.400
1.500
The distortion of this row is the maximum of these two aspect ratios (which are both the same), \(1.5\).
Since the distortion for \(k = 2\), which is \(1.5\), is better (lower) than the distortion for \(k = 1\), which was approximately \(2.667\), we continue to the next \(k\).
We next try \(k = 3\).
We take the first 3 data points in our list,
Six (a): 6
,Six(b): 6
, andThree: 3
. This row has a total value of \(6 + 6 + 3 = 15\), while the whole data set has a total value of 20. So, this row will take up a 15/20 fraction of the area of the bounding box; it will have a width of 0.75.This row has three rectangles, with values 6, 6, and 3. Since the total value of the row is 15, the rectangles take up a 6/15, 6/15, and 3/15 fraction of the height, respectively. Since the whole row has height 0.8, the rectangles have heights 0.32, 0.32, and 0.16.
Since the rectangles have sizes 0.75 by 0.32, 0.75 by 0.32, and 0.75 by 0.16, the aspect ratios are \(0.75/0.32 = 2.34375\), \(0.75/0.32 = 2.34375\), and \(0.75/0.16 = 4.6875\).
label
x
y
width
height
aspect ratio
Six (a)
0.000
0.000
0.750
0.320
2.344
Six (b)
0.000
0.320
0.750
0.320
2.344
Three
0.000
0.640
0.750
0.160
4.687
The distortion of the row is the maximum of these three aspect ratios, which is \(4.6875\).
Since the distortion for \(k = 3\), which is \(4.6875\), is worse (higher) than the distortion for \(k = 2\), which was \(1.5\), we stop progressing through values of \(k\), and instead choose the previous value of \(k\) — that is, \(k = 2\) — which gave us the best (lowest) distortion.
Step 4: At this point, we have filled in one row of rectangles. We have a list of remaining data points that have not been made into rectangles. There is a rectangle of leftover space within our bounding box which has not yet been filled. We now recursively fill that leftover space using the list of remaining data points. Those data points have already been sorted in descending order of value, so we can start with Step 2.
When there is no space remaining (equivalently, when there are no data points with nonzero values), we have a complete treemap and we are done!
Example
In our example, we have used our first two data points, so our remaining data points are
Three: 3
Two: 2
One (a): 1
One (b): 1
One (c): 1
And, our remaining space is a rectangle with origin (0.6, 0.0), width 0.4, and height 0.8. This rectangle becomes our bounding box in the recursive call.
Since this bounding box is taller than it is wide, our new row will be placed along the top edge.
We start by laying out \(k = 1\) rectangles in our new row.
The first data point in our list is
Three: 3
. This row has a total value of 3, while the whole data set has a total value of \(3 + 2 + 1 + 1 + 1 = 8\) (because this recursive call uses the new, shorter list of data points). So, this row will take up a 3/8 fraction of the area of the bounding box. Since the bounding box has height 0.8, this row will have height 0.3.We have one rectangle taking up the whole row. So, the width of that rectangle is the full width of the row (which is the full width of the bounding box), 0.4.
The aspect ratio of this rectangle is \(0.4/0.3 = 4/3 \approx 1.333\).
label
x
y
width
height
aspect ratio
Three
0.600
0.000
0.400
0.300
1.333
Here’s a visualization of this treemap (new row in blue):
There is only one rectangle in the row, so the distortion is its aspect ratio: \(4/3 \approx 1.333\).
We then try \(k = 2\).
We take the first 2 data points in our list,
Three: 3
andTwo: 2
. This row has a total value of \(3 + 2 = 5\), while the whole data set has a total value of 8. So, this row will take up a 5/8 fraction of the area of the bounding box. Since the bounding box has height 0.8, this row will have height 0.5.This row has two rectangles, with values 3 and 2. The total value of of the row is 5, so the rectangles take a 3/5 and 2/5 fraction of the width of the row, respectively. Since the row has width 0.4, the rectangles have widths 0.24 and 0.16.
The aspect ratios these rectangles are \(0.5/0.24 = 25/12 \approx 2.083\), and \(0.5/0.16 = 25/8 = 3.125\).
label
x
y
width
height
aspect ratio
Three
0.600
0.000
0.240
0.500
2.083
Two
0.840
0.000
0.160
0.500
3.125
The distortion of this row is the maximum of these two aspect ratios \(3.125\).
Since the distortion for \(k = 2\), which is \(3.125\), is worse (higher) than the distortion for \(k = 1\), which was approximately \(1.333\), we do not continue to the next value of \(k\), but instead stop this process and revert back to the previous \(k\), which was \(k = 1\), and which featured the lowest distortion.
At this point, our list of remaining data points for which we have not yet created rectangles is
Two: 2
One (a): 1
One (b): 1
One (c): 1
and our rectangle of unused space has origin (0.6, 0.3), width 0.4, and height 0.5. This list and this leftover rectangle become the the inputs to the next recursive call.
Try continuing this process to work out the sizes and locations of the remaining rectangles. The final treemap will consists of these rectangles:
label |
x |
y |
width |
height |
---|---|---|---|---|
|
0.000 |
0.000 |
0.600 |
0.400 |
|
0.000 |
0.400 |
0.600 |
0.400 |
|
0.600 |
0.000 |
0.400 |
0.300 |
|
0.600 |
0.300 |
0.400 |
0.200 |
|
0.600 |
0.500 |
0.267 |
0.150 |
|
0.600 |
0.650 |
0.267 |
0.150 |
|
0.867 |
0.500 |
0.133 |
0.300 |
Visualized, it will look like this:
Task 1: One-level treemaps¶
Your first task is to fill in this function in treemap.py
:
def compute_rectangles(t, bounding_rec_width=1.0, bounding_rec_height=1.0):
'''
Computes the rectangles for drawing a treemap of the provided tree.
Inputs:
t: (Tree) a tree
bounding_rec_height, bounding_rec_width: (float) the size of
the bounding rectangle.
Returns: a list of Rectangle objects.
'''
This function takes a tree and, optionally, the height and width of the initial
bounding rectangle, and returns a list of Rectangle
objects.
Each Rectangle
instance should have its label
attribute set to equal the key of the corresponding leaf.
For this task, you should assume the tree has the shape described in the section on drawing one-level treemaps, above. That is, you should assume the tree consists of a root with a number of children, all of which are leaves. Each leaf has a string key and a positive integer value. The root has the empty string as its key, and its value is the sum of the values of the children.
You will modify this function in Task 3 to work with a tree of any shape.
We provide two helper functions which you can (and should) use in your solution. The first, sorted_trees
, sorts a list of trees in descending order of value:
def sorted_trees(tree_list):
'''
Sort a list of Tree instances by the value of their roots in descending
order. Ties are broken by the key of the root, in (forward) alphabetical
order. Returns a new sorted list without modifying the input list.
Inputs:
tree_list: list of Tree instances, each with an integer value.
Returns: list of Tree instances, sorted.
'''
The second, compute_row
, does much of the heavy lifting of determining whether a row should go against the left or top edge, and laying out rectangles in that row:
def compute_row(bounding_rec, row_data, total_sum):
'''
Lay out the given data points as rectangles in one row of a
treemap. The row will be against the left or top edge of the
bounding rectangle, depending on whether the rectangle is at least
as wide as it is tall.
Inputs:
bounding_rec: (Rectangle) the bounding rectangle
row_data: (list of Tree) the list of data points that should be
included in this row.
total_sum: (integer) the total value of all the rectangles that
will eventually fill the bounding rectangle (not just those
in this row).
Returns: a tuple (row_layout, leftover), where
row_layout: list of pairs (rec, t), where rec is a Rectangle, and
t is the Tree that Rectangle corresponds to, and
leftover: a Rectangle representing the leftover space in the
bounding rectangle that was not used by this row.
'''
This function, compute_row
, will lay out a rectangle for each data point it is given — no more and no less. It essentially handles Steps 2, 3a, and 3b of the algorithm. After applying this function, you will still need to compute the aspect ratio of each rectangle, compute the distortion, and ultimately determine the appropriate \(k\), the number of rectangles in the row. Additionally, this function also does not set the label
attribute of the rectangles it returns; you will need to do that yourself.
You can use a loop to iterate through values of \(k\) (you should not use recursion for this). However, your solution to Task 1 must use recursion to fill in the rectangle that is left over after placing one row. Figuring out the recursion can be challenging. Make sure to think through the design of the recursion following the steps we gave in class.
Hint: compute_rectangles
itself should not be recursive. Instead,
write a recursive helper function that you will call from compute_rectangles
.
Note
In our example, we made tables showing the size and positions of each rectangle. While making tables like the these is not part of this assignment, writing strategic print statements to display analogous data while you are initially writing and debugging your program will help you to isolate errors that are due to your generation of the rectangles, as opposed to errors drawing the rectangles you generate. We highly recommend you print out such information, and we will ask you to show us these sorts of print-outs when helping you to debug your code.
Testing Task 1¶
We suggest you start by testing this function informally from IPython
by printing out the rectangles produced by the function. You should use the trees provided in sparrows.json
, which have the required structure. Here are
some sample calls using this dataset.
In [14]: data_flat = treemap.load_trees('data/sparrows.json')
In [15]: recs = treemap.compute_rectangles(data_flat['s1'])
In [16]: for r in recs:
...: print(r)
...:
RECTANGLE 0.0000 0.0000 0.8000 0.5000 B40
RECTANGLE 0.0000 0.5000 0.8000 0.5000 C40
RECTANGLE 0.8000 0.0000 0.2000 1.0000 D20
In [17]: recs = treemap.compute_rectangles(data_flat['s2'])
In [18]: for r in recs:
...: print(r)
...:
RECTANGLE 0.0000 0.0000 0.6000 0.5000 Six (a)
RECTANGLE 0.0000 0.5000 0.6000 0.5000 Six (b)
RECTANGLE 0.6000 0.0000 0.4000 0.3750 Three
RECTANGLE 0.6000 0.3750 0.4000 0.2500 Two
RECTANGLE 0.6000 0.6250 0.2667 0.1875 One (a)
RECTANGLE 0.6000 0.8125 0.2667 0.1875 One (b)
RECTANGLE 0.8667 0.6250 0.1333 0.3750 One (c)
In [19]: recs = treemap.compute_rectangles(data_flat['s3'])
In [20]: for r in recs:
...: print(r)
...:
RECTANGLE 0.0000 0.0000 0.5000 0.5000 6
RECTANGLE 0.0000 0.5000 0.5000 0.5000 6
RECTANGLE 0.5000 0.0000 0.5000 0.3333 4
RECTANGLE 0.5000 0.3333 0.5000 0.2500 3
RECTANGLE 0.5000 0.5833 0.4000 0.2083 2x
RECTANGLE 0.5000 0.7917 0.4000 0.2083 2y
RECTANGLE 0.9000 0.5833 0.1000 0.4167 1
Note that data_flat['s2']
is the tree that we used in the explanation of the algorithm.
Once you are confident that you’re producing the correct rectangles,
you can start visualizing them using the draw_rectangles
function:
In [21]: drawing.draw_rectangles(recs)
Note that sometimes, a rectangles might be too small to fit a label. When that happens, you may see a message about it in the terminal.
To run the automated tests for this task only, you can run the following:
$ py.test -x -v -k rectangles_flat
Task 2: Values and paths¶
We now turn our attention to trees with multiple levels and more complex structure, like the ones in birds.json
.
The trees in birds.json
only include a value for the
value
attribute in the leaf nodes (in all other nodes, the value
is set to None
). Furthermore, all nodes are lacking the path
attribute mentioned earlier, which will be used for assigning colors to our treemap. In Task 2, you will
complete the following two recursive functions in treemap.py
to
respectively compute the value
for all internal nodes and to set the
path
for all nodes.
Your solution to each must be recursive. Non-recursive solutions (i.e., functions that do not call themselves with an input that is in some way smaller) will not receive credit for this task. Furthermore, each function should be generalizable, working for a tree of any height. As a guideline, in our implementation each of these functions requires fewer than ten lines of code.
def compute_internal_values(t):
'''
Assign a value to the internal nodes. The value of the leaves should
already be set. The value of an internal node is the sum of the value
of its children.
Inputs:
t (Tree): a tree
Returns:
The input tree t should be modified so that every internal node's
value is set to be the sum of the values of its children.
The return value will be the value of the root of t (that is,
t.value)
'''
def compute_paths(t, prefix=()):
'''
Assign the path attribute of all nodes to be the full path through the
tree from the root down to, but not including, that node. For example,
following the path
"class Aves" --> "order Passeriformes" --> "family Passerellidae"
from the root to a node should result in the path atttribute
("class Aves", "order Passeriformes")
being assigned to the node with key "family Passerellidae".
For the root node, the path attribute should be an empty tuple.
Inputs:
t (Tree): a tree
prefix (tuple of strings): Prefix to add to verbose label
Outputs:
Nothing. The input tree t should be modified to contain a path
attribute for all nodes.
'''
Testing Task 2¶
You can informally test these functions from IPython by printing
out the tree after calling one of these functions, and checking whether
whether the value
or path
are set correctly.
You can use the trees provided in birds.json
We suggest you start with the smaller, sample trees, which are easier to check manually.
For example, you can check compute_internal_values
as follows:
In [22]: data = treemap.load_trees("data/birds.json")
In [23]: treemap.compute_internal_values(data['s1'])
Out[23]: 100
In [24]: data['s1'].print()
A100: 100
│
├──B40: 40
│ │
│ ├──E10: 10
│ │
│ ├──F10: 10
│ │ │
│ │ ├──J6: 6
│ │ │ │
│ │ │ ├──L1: 1
│ │ │ │
│ │ │ ├──M3: 3
│ │ │ │
│ │ │ └──N2: 2
│ │ │
│ │ └──K4: 4
│ │
│ └──G20: 20
│
├──C40: 40
│ │
│ ├──H30: 30
│ │
│ └──I10: 10
│
└──D20: 20
Caution
Note that compute_internal_values
modifies the tree in place (as does compute_paths
). If you want a fresh copy of the tree for further testing, you need to re-load the data.
You can check compute_paths
as follows (note how the print
method has an optional argument to print the paths)
In [25]: data = treemap.load_trees("data/birds.json")
In [26]: treemap.compute_paths(data['s1'])
In [27]: data['s1'].print(paths=True)
A100: ()
│
├──B40: ('A100',)
│ │
│ ├──E10: ('A100', 'B40')
│ │
│ ├──F10: ('A100', 'B40')
│ │ │
│ │ ├──J6: ('A100', 'B40', 'F10')
│ │ │ │
│ │ │ ├──L1: ('A100', 'B40', 'F10', 'J6')
│ │ │ │
│ │ │ ├──M3: ('A100', 'B40', 'F10', 'J6')
│ │ │ │
│ │ │ └──N2: ('A100', 'B40', 'F10', 'J6')
│ │ │
│ │ └──K4: ('A100', 'B40', 'F10')
│ │
│ └──G20: ('A100', 'B40')
│
├──C40: ('A100',)
│ │
│ ├──H30: ('A100', 'C40')
│ │
│ └──I10: ('A100', 'C40')
│
└──D20: ('A100',)
To run the automated tests for this task only, you can run the following:
$ py.test -x -v -k values
$ py.test -x -v -k paths
Drawing multiple-level treemaps¶
We now want to draw a treemap for a tree that may have multiple levels, like the trees in our original data set. In a finished treemap, there should be one rectangle per leaf of the tree (e.g. one rectangle per bird species), and each rectangle should have area proportional to the corresponding leaf’s value (e.g. proportional to the number of sightings of that bird).
One approach would be to put all of these leaves into one big list, and use the one-level treemap algorithm described earlier. But, the resulting visualization would abandon the structure of the tree. We would prefer that the rectangles of closely-related leaves appear close together in the treemap.
Here is an algorithm to accomplish this.
Example
We will use this tree as a running example:
Root: None
│
├──Three: None
│ │
│ ├──Three ~ Two: 2
│ │
│ └──Three ~ One: 1
│
├──Two: 2
│
├──Six (a): None
│ │
│ ├──Six (a) ~ Four: None
│ │ │
│ │ ├──Six (a) ~ Four ~ Three: 3
│ │ │
│ │ └──Six (a) ~ Four ~ One: 1
│ │
│ └──Six (a) ~ Two: 2
│
├──Six (b): 6
│
├──One (a): 1
│
├──One (b): 1
│
└──One (c): 1
Step 1: Compute the values of the internal nodes (the values of the leaves should already be set). Each internal node should have its value set equal to the sum of the values of its children.
We will additionally set the path attribute of each node at this time.
Example
After computing the internal values, our example tree is as follows.
Root: 20
│
├──Three: 3
│ │
│ ├──Three ~ Two: 2
│ │
│ └──Three ~ One: 1
│
├──Two: 2
│
├──Six (a): 6
│ │
│ ├──Six (a) ~ Four: 4
│ │ │
│ │ ├──Six (a) ~ Four ~ Three: 3
│ │ │
│ │ └──Six (a) ~ Four ~ One: 1
│ │
│ └──Six (a) ~ Two: 2
│
├──Six (b): 6
│
├──One (a): 1
│
├──One (b): 1
│
└──One (c): 1
Step 2: Focus on the root and its children. Lay out rectangles for the children according to the one-level algorithm.
Example
Ignoring all nodes other than the root and its children, our example tree looks like the following.
Root: 20
│
├──Three: 3
│
├──Two: 2
│
├──Six (a): 6
│
├──Six (b): 6
│
├──One (a): 1
│
├──One (b): 1
│
└──One (c): 1
The child nodes are the same as in our example for the one-level algorithm. Recall that the rectangles generated were as follows.
label |
x |
y |
width |
height |
---|---|---|---|---|
|
0.000 |
0.000 |
0.600 |
0.400 |
|
0.000 |
0.400 |
0.600 |
0.400 |
|
0.600 |
0.000 |
0.400 |
0.300 |
|
0.600 |
0.300 |
0.400 |
0.200 |
|
0.600 |
0.500 |
0.267 |
0.150 |
|
0.600 |
0.650 |
0.267 |
0.150 |
|
0.867 |
0.500 |
0.133 |
0.300 |
And, this is visualized as the following.
Note that each rectangle created in this step corresponds not necessarily to a leaf, but to a subtree.
Step 3 Deal with each rectangle created in Step 2. What we do depends on whether or not the subtree associated with the rectangle is a leaf.
If the subtree is a leaf, then we are done with this rectangle: this is one of the rectangles we will draw.
If the subtree is not a leaf, then we further subdivide it. We lay out the sub-rectangles within this rectangle recursively: this rectangle serves as the bounding rectangle, and we apply the (multiple-level) treemap algorithm to the corresponding subtree.
Example
In our example, the children of the root are the following:
Six (a): 6
Six (b): 6
Three: 3
Two: 2
One (a): 1
One (b): 1
One (c): 1
Of these, Six (b): 6
, Two: 2
, One (a): 1
, One (b): 1
, and One (c): 1
are leaves. So, the corresponding rectangles can be added to the list of rectangles to draw:
label |
x |
y |
width |
height |
---|---|---|---|---|
|
0.000 |
0.400 |
0.600 |
0.400 |
|
0.600 |
0.300 |
0.400 |
0.200 |
|
0.600 |
0.500 |
0.267 |
0.150 |
|
0.600 |
0.650 |
0.267 |
0.150 |
|
0.867 |
0.500 |
0.133 |
0.300 |
Here they are visualized:
The other children are not leaves and need to be dealt with recursively. The first of these children is the following subtree:
Six (a): 6
│
├──Six (a) ~ Four: 4
│ │
│ ├──Six (a) ~ Four ~ Three: 3
│ │
│ └──Six (a) ~ Four ~ One: 1
│
└──Six (a) ~ Two: 2
It corresponds to the rectangle with origin (0., 0.), width 0.6 and height 0.4. Instead of drawing this rectangle, we fill it with sub-rectangles corresponding to the leaves of the subtree. To do that, we recursively use the (multiple-level) treemap algorithm on this subtree, using this rectangle as the bounding rectangle. This results in the following rectangles getting added to the list of rectangles to draw:
label |
x |
y |
width |
height |
---|---|---|---|---|
|
0.000 |
0.000 |
0.400 |
0.300 |
|
0.000 |
0.300 |
0.400 |
0.100 |
|
0.400 |
0.000 |
0.200 |
0.400 |
Visualizing all the rectangles we have listed for drawing so far:
There is one other child we haven’t dealt with, the following subtree.
Three: 3
│
├──Three ~ Two: 2
│
└──Three ~ One: 1
It corresponds to the rectangle with origin (0.6, 0.), width 0.4, and height 0.3. Again, we do not draw this rectangle, but instead recursively fill it with sub-rectangles corresponding to the leaves of the subtree. These are the resulting rectangles that get added to the list of rectangles to draw:
label |
x |
y |
width |
height |
---|---|---|---|---|
|
0.600 |
0.000 |
0.267 |
0.300 |
|
0.867 |
0.000 |
0.133 |
0.300 |
The full list of rectangles to draw is as follows.
label |
x |
y |
width |
height |
---|---|---|---|---|
|
0.000 |
0.000 |
0.400 |
0.300 |
|
0.000 |
0.300 |
0.400 |
0.100 |
|
0.400 |
0.000 |
0.200 |
0.400 |
|
0.000 |
0.400 |
0.600 |
0.400 |
|
0.600 |
0.000 |
0.267 |
0.300 |
|
0.867 |
0.000 |
0.133 |
0.300 |
|
0.600 |
0.300 |
0.400 |
0.200 |
|
0.600 |
0.500 |
0.267 |
0.150 |
|
0.600 |
0.650 |
0.267 |
0.150 |
|
0.867 |
0.500 |
0.133 |
0.300 |
Here is a visualization of the full list of rectangles.
Note: The order in which you compute the rectangles to draw may be different from the order in which they are described here; that is okay.
Task 3: Multiple-level treemaps¶
Your final task is to modify the compute_rectangles
function from Task 1 to work with a tree of any shape.
First, add the following two lines of code in compute_rectangles
immediately after the docstring.
compute_internal_values(t)
compute_paths(t)
Since we are including code to call compute_internal_counts
and compute_verbose_labels
within compute_rectangles
, you do not need to call them before calling compute_rectangles
.
In Task 1, you set the label
attribute of each rectangle to be the key of the corresponding node. In this task, you must also set the color_code
attribute of each rectangle to equal the path
attribute of the corresponding node. This attribute is used by our drawing function to determine the color of each rectangle. Rectangles with the same color code are drawn in the same color. Rectangles that have nearby color codes (color codes that start the same way) are given similar colors. Since we are setting the color code of a rectangle to be the path attribute of the corresponding node, this means that nodes that are closely related in the tree will give rise to rectangles that are close in color.
Update your code to implement the multiple-level treemap algorithm from the previous section.
You must use recursion to accomplish this task. Also, you may not make any assumptions about the number of levels in the tree.
The implementation of this task is not too complex (our solution adds less than 10 lines to our code from Task 1), but again figuring out the recursion can be challenging. Make sure to think through the design of the recursion.
Testing Task 3¶
We again suggest you start by testing this function informally from IPython
by printing out the rectangles produced by the function. You can use the trees provided in birds.json
, which have multiple levels. Your updated function should also still work on the trees in sparrows.json
. Here are
some sample calls using the birds.json
dataset.
In [14]: data = treemap.load_trees('data/birds.json')
In [15]: recs = treemap.compute_rectangles(data['s1'])
In [16]: for r in recs:
...: print(r)
...:
RECTANGLE 0.0000 0.0000 0.4000 0.5000 G20
RECTANGLE 0.4000 0.0000 0.4000 0.2500 E10
RECTANGLE 0.4000 0.2500 0.2400 0.1250 M3
RECTANGLE 0.4000 0.3750 0.1600 0.1250 N2
RECTANGLE 0.5600 0.3750 0.0800 0.1250 L1
RECTANGLE 0.6400 0.2500 0.1600 0.2500 K4
RECTANGLE 0.0000 0.5000 0.6000 0.5000 H30
RECTANGLE 0.6000 0.5000 0.2000 0.5000 I10
RECTANGLE 0.8000 0.0000 0.2000 1.0000 D20
In [17]: recs = treemap.compute_rectangles(data['s2'])
In [18]: for r in recs:
...: print(r)
...:
RECTANGLE 0.0000 0.0000 0.4000 0.3750 Six (a) ~ Four ~ Three
RECTANGLE 0.0000 0.3750 0.4000 0.1250 Six (a) ~ Four ~ One
RECTANGLE 0.4000 0.0000 0.2000 0.5000 Six (a) ~ Two
RECTANGLE 0.0000 0.5000 0.6000 0.5000 Six (b)
RECTANGLE 0.6000 0.0000 0.2667 0.3750 Three ~ Two
RECTANGLE 0.8667 0.0000 0.1333 0.3750 Three ~ One
RECTANGLE 0.6000 0.3750 0.4000 0.2500 Two
RECTANGLE 0.6000 0.6250 0.2667 0.1875 One (a)
RECTANGLE 0.6000 0.8125 0.2667 0.1875 One (b)
RECTANGLE 0.8667 0.6250 0.1333 0.3750 One (c)
In [19]: recs = treemap.compute_rectangles(data['s3'])
In [20]: for r in recs:
...: print(r)
...:
RECTANGLE 0.0000 0.0000 0.2500 0.5000 3
RECTANGLE 0.2500 0.0000 0.2500 0.3333 2
RECTANGLE 0.2500 0.3333 0.2500 0.1667 1
RECTANGLE 0.0000 0.5000 0.4167 0.5000 5
RECTANGLE 0.4167 0.5000 0.0833 0.5000 1
RECTANGLE 0.5000 0.0000 0.2500 0.3333 2a
RECTANGLE 0.7500 0.0000 0.2500 0.3333 2b
RECTANGLE 0.5000 0.3333 0.5000 0.2500 3
RECTANGLE 0.5000 0.5833 0.4000 0.2083 2x
RECTANGLE 0.5000 0.7917 0.4000 0.2083 2y
RECTANGLE 0.9000 0.5833 0.1000 0.4167 **1**
Note that data['s2']
is the tree that we used in the explanation of the multiple-level treemap algorithm. Note also that if you were to keep only the root node and its children in the tree data['s1']
, data['s2']
, or data['s3']
, you would get the tree data_flat['s1']
, data_flat['s2']
, or data_flat['s3']
, respectively. You can compare the rectangles you get from each of these to help you debug.
Once you are confident that you’re producing the correct rectangles,
you can again try visualizing them using the draw_rectangles
function:
In [21]: drawing.draw_rectangles(recs)
To run the automated tests that are new for this task, you can run the following:
$ py.test -x -v -k rectangles_full
To run all the automated tests for drawing rectangles, including the tests from Task 1, you can run the following:
$ py.test -x -v -k rectangles
Putting it all together¶
Once you have completed all the tasks, you will be able to easily generate any treemap you want from the command-line. In particular, you can run the following:
$ python3 treemap.py data/birds.json KEY
Where KEY
is the key for the tree for which you would like to produce a treemap.
Here are some examples and the expected treemap. Take into account that
running treemap.py
will open a new window with the treemap;
you will need to close that window before you can return to the
command line.
NOTE: If you are passing all the tests, do not worry if the produced treemaps do not match ours exactly! Your completeness grade will depend on the result of the tests, not on the exact graphics you produce.
$ python3 treemap.py data/birds.json Oct
$ python3 treemap.py data/birds.json Jul
$ python3 treemap.py data/birds.json Apr
$ python3 treemap.py data/birds.json Feb
$ python3 treemap.py data/birds.json Year
You can use the data/sparrows.json
data file as well:
$ python3 treemap.py data/sparrows.json Oct
Grading¶
Programming assignments will be graded according to a general rubric. Specifically, we will assign points for completeness, correctness, design, and style. (For more details on the categories, see our PA Rubric page.)
The exact weights for each category will vary from one assignment to another. For this assignment, the weights will be:
Completeness: 50%
Correctness: 20%
Design: 15%
Style: 15%
In this assignment, the Design grade will largely be based on how you designed your recursive functions.
Please note that if you do not use recursion in a given task, you can expect a considerable deduction in both the Correctness and Design portions of the rubric. If you do not use recursion at all in your code, you will receive a zero in both Correctness and Design.
You must include header comments in all the methods you implement.
Obtaining your test score¶
Like previous assignments, you can obtain your test score by running py.test
followed by ../common/grader.py
.
Submission¶
You must submit your work through Gradescope (linked from our Canvas
site). In the “Programming Assignment #6” assignment, simply upload
the file treemap.py
(do not upload any
other files!). Please note:
You are allowed to make as many submissions as you want before the deadline.
Please make sure you have read and understood our Late Submission Policy
Your completeness score is determined solely based on the automated tests, but we may adjust your score if you attempt to pass tests by rote (e.g., by writing code that hard-codes the expected output for each possible test input).
Gradescope will report the test score it obtains when running your code. If there is a discrepancy between the score you get when running our grader script, and the score reported by Gradescope, please let us know so we can take a look at it.
Acknowledgments: Gordon Kindlmann originally recommended drawing treemaps as good topic for an assignment.