Visualizing Avian Biodiversity Using Treemaps¶
Due: Friday, December 3rd at 4:30pm CST.
The purpose of this assignment is to give you practice working with recursive data structures and writing recursive functions.
You must work alone on this assignment.
Introduction¶
The Cornell Lab of Ornithology manages a website called eBird, which collects bird sighting data from birdwatchers around the world. eBird claims to be one of the world’s largest biodiversity-related projects, collecting over 100 million bird sightings per year. As they explain on their website, the data they collect “inform novel conservation actions, from site-specific information about the occurrence and abundance of birds, to looking at patterns of species abundance across continent-spanning flyways. eBird data contribute to hundreds of conservation decisions and peer-reviewed papers, thousands of student projects, and help inform research worldwide. Applications range from ecological and ornithological research, to conservation application, to advancing research in the fields of socioeconomics, artificial intelligence, and computer science.”
A question we might want to investigate about our Chicago community is, “Which birds are most prevelant in our area, and how does this depend on the time of year?”
We can use the eBird data 1 to address this. eBird makes their basic dataset, which contains raw data of bird observations, available to anyone under their terms of use.
Here is a bar chart showing the number of times each species of bird was seen in Cook County in October 2020:
From this we can see the most commonly seen 2 bird species in Cook County in October 2020 was the American Robin. But there are some shortcomings to this visualization.
For example, while this figure is fairly large, it still only shows the 40 most commonly sighted bird species in Cook County. In total, there were 230 different species sighted. We cannot determine from this figure even approximately what fraction of bird sightings were of the American Robin.
Additionally, we might notice that there are a number of species of sparrow that appear on this list, including the House Sparrow, the White-throated Sparrow, and the Song Sparrow, among others. All together, sparrows outnumber robins by quite a bit, but that is not clear from this figure.
So, we might appreciate a visualization in which related species are grouped together. The scientific study of how to group together related species is taxonomy. All living organisms are organized into a taxonomic hierarchy. For example, the class Aves of all birds is divided into a number of orders, such as Passeriformes, Anseriformes, and Piciformes. Each order is divided into a number of families, each family is divided into a number of genera (plural of genus), and each genus is divided into a number of species. This hierarchical structure is naturally represented as a tree, where each subdivision is a node in the tree.
The National Center for Biotechnology Information (NCBI) provides a Taxonomy Database.
Below we show a tree containing only the birds species that appear in our data set (further abridged). Each level of the tree represents a taxonomic rank (class, order, family, genus, or species).
Each node at a particular level represents a taxon at that level — that is, a member of that taxonomic rank.
Each internal (non-leaf) node has a key consisting of the taxonomic rank and the name of that taxon (for example, “family Passerellidae”), while each leaf has a key which is the the common name of a species. Each node has a value which is the number of times that species (or genus, family, etc.) was sighted.
class Aves: 67629
│
├──order Anseriformes: 7188
│ │
│ └──family Anatidae: 7188
│ │
│ ├──genus Anas: 2337
│ │ │
│ │ ├──American Black Duck: 47
│ │ │
│ │ ├──Blue-winged Teal: 84
│ │ │
│ │ ├──Green-winged Teal: 158
│ │ │
│ │ ├──Mallard (Domestic type): 1779
│ │ │
│ │ ├──Northern Pintail: 101
│ │ │
│ │ └──Northern Shoveler: 168
│ │
│ ├──genus Mareca: 297
│ │ │
│ │ ├──American Wigeon: 83
│ │ │
│ │ └──Gadwall: 214
│ │
│ ├──genus Melanitta: 128
│ │ │
│ │ ├──Black Scoter: 32
│ │ │
│ │ ├──Surf Scoter: 81
│ │ │
│ │ └──White-winged Scoter: 15
│ │
│ ├──genus Bucephala: 102
│ │ │
│ │ ├──Bufflehead: 82
│ │ │
│ │ └──Common Goldeneye: 20
│ │
│ ├──genus Branta: 2329
│ │ │
│ │ ├──Cackling Goose: 31
│ │ │
│ │ └──Canada Goose: 2298
│ │
│ ├──genus Aythya: 522
│ │ │
│ │ ├──Canvasback: 32
│ │ │
│ │ ├──Greater Scaup: 89
│ │ │
│ │ ├──Lesser Scaup: 69
│ │ │
│ │ ├──Redhead: 207
│ │ │
│ │ └──Ring-necked Duck: 125
│ │
│ ├──genus Mergus: 238
│ │ │
│ │ ├──Common Merganser: 31
│ │ │
│ │ └──Red-breasted Merganser: 207
│ │
│ ├──genus Anser: 62
│ │ │
│ │ ├──no rank unclassified Anser: 2
│ │ │ │
│ │ │ └──Domestic goose sp. (Domestic type): 2
│ │ │
│ │ ├──Greater White-fronted Goose: 10
│ │ │
│ │ └──Snow Goose: 50
│ │
│ ├──genus Lophodytes: 174
│ │ │
│ │ └──Hooded Merganser: 174
│ │
│ ├──genus Clangula: 47
│ │ │
│ │ └──Long-tailed Duck: 47
│ │
│ ├──genus Cygnus: 120
│ │ │
│ │ ├──Mute Swan: 82
│ │ │
│ │ ├──Trumpeter Swan: 36
│ │ │
│ │ └──Tundra Swan: 2
│ │
│ ├──genus Oxyura: 172
│ │ │
│ │ └──Ruddy Duck: 172
│ │
│ └──genus Aix: 660
│ │
│ └──Wood Duck: 660
│
...
[17 additional orders]
While this tree representation helps us to make comparisons, it is only practical to view a portion of the tree at any given time. It would be useful see a visual representation of the data, which is the role information visualization plays in computing and in data science.
How can we visualize this information in an effective way? We can use treemaps, which are an excellent tool for visualizing hierarchical data. Here, for example, is a treemap corresponding to the tree above.
Each rectangle represents a single species. The area of the rectangle is proportional to the number of sightings of that species. Species in the same genus are grouped together into a larger rectangle, genera in the same family are grouped together, etc.
Looking at the data in this form, we can immediately see that, for instance, the American Robin (near the center of the treemap) is more prevalent in the data than other species, but that sparrows (the group of boxes in the upper-left corner) as a whole are three to four times more prevalent than robins.
In general, treemaps are a space-constrained method for visualizing hierarchical structures that present a sense of “mass” and proportionality in a way that the typical tree diagram does not. Treemaps allow the viewer to compare leaves and sub-trees even at varying depths in the tree, and to spot patterns and exceptions. Ben Shneiderman designed treemaps during the 1990s as a way to visualize the contents of a file system. This technique has since been used to visualize many different types of data, including stock portfolios, oil production, a gene ontology, stimulus spending, and more. The original idea has been extended in many interesting ways.
In this assignment, you will write code to generate treemaps to visualize this avian biodiversity data from samples taken throughout the year.
- 1
eBird. 2017. eBird: An online database of bird distribution and abundance [web application]. eBird, Cornell Lab of Ornithology, Ithaca, New York. Available: http://www.ebird.org. (Accessed: Date November 21, 2020).
- 2
We are relying on just the number of sightings recorded on eBird. There may be other factors to adjust for to determine the actual prevalence of each species, such as how easy a bird is to see, whether someone reports it, and how much effort a person put in to find it. eBird does also provide data on “effort”, but we will not use that for this assignment.
Getting started¶
Before you start, make sure to review the Coursework Basics page. Then, using the invitation URL provided on Ed Discussion, create a repository for your PA #6 files.
Next, make sure you’ve
set the GITHUB_USERNAME
variable by running the following (replacing replace_me
with your GitHub username):
GITHUB_USERNAME=replace_me
(remember you can double-check whether the variable is properly set by
running echo $GITHUB_USERNAME
).
And finally, run these commands (if you don’t recall what some of these commands do, they are explained at the end of the Git Tutorial):
cd ~/cmsc12100
mkdir pa6-$GITHUB_USERNAME
cd pa6-$GITHUB_USERNAME
git init
git remote add origin git@github.com:uchicago-cmsc12100-aut-21/pa6-$GITHUB_USERNAME.git
git remote add upstream git@github.com:uchicago-cmsc12100-aut-21/pa6.git
git pull upstream main
git branch -M main
git push -u origin main
You will find the files you need for the programming assignment directly in the root of your
repository, including a README.txt
file that explains what each file is. Make sure you read that file.
Your PA #6 repository contains several Python files, but you will
only be modifying one of them: treemap.py
. The rest of the
files are described throughout the assignment.
You can also consult the README for a description of each file.
As usual, you will be able to test your code from IPython and by using py.test
.
When using IPython, make sure to enable autoreload before importing the Python
files we will use in this assignment:
In [1]: %load_ext autoreload
In [2]: %autoreload 2
In [3]: import treemap, drawing
In [4]: import tree
The drawing
module
As we’ll describe later on, we provide a drawing
module for visualizing treemaps.
This module is not required for completing the assignment, and will only work on
the virtual desktop. If you are working on the assignment through SSH, the above
import
commands may produce the following messages:
Failed to connect to Mir: Failed to connect to server socket: No such file or directory
Unable to init server: Could not connect: Connection refused
Failed to connect to Mir: Failed to connect to server socket: No such file or directory
Unable to init server: Could not connect: Connection refused
(ipython3:35882): Gdk-CRITICAL **: gdk_cursor_new_for_display: assertion 'GDK_IS_DISPLAY (display)' failed
(ipython3:35882): Gdk-CRITICAL **: gdk_cursor_new_for_display: assertion 'GDK_IS_DISPLAY (display)' failed
You can safely ignore these messages. Because SSH is a text-only interface, these messages
are simply warning you that the drawing
module will not work.
Like previous assignments, we also include a series of automated tests. These tests require an additional set of files, which you can download by running the following from the Linux command-line inside your PA #6 repository:
$ ./get_files.sh
Representing trees¶
We provide a class Tree
in the file tree.py
that you can use to represent tree data. This is essentially the same Tree
class we saw in class, Team Tutorial #6, and Short Exercises #6 (although it is slightly different from the Tree
class in the textbook). The class is useful for representing the avian biodiversity data, but it is not specific to the biodiversity data.
The public interface for this class includes:
Constructor:
t = Tree(key, value)
creates a new tree with the given key and value and no children.
Attributes:
t.key
andt.value
hold data associated with the root node of the treet
. Initially, a node has a key and value, but we will add additional attributes in the course of the assignment that hold additional information about each node.t.children
is a list of the child subtrees oft
.
Methods:
t.add_child(t2)
adds an existing treet2
as a child subtree oft
.t.num_children()
returns the number of child subtrees oft
.t.print()
prints the treet
, and is useful for debugging purposes.
We can use this class to represent our avian biodiversity data as described above: the key
attribute holds the name of the class, order, etc. that node represents and the value
attribute to hold the number of sightings of that taxon.
In Task 2, you will add an additional attribute, path
which will hold a tuple listing the keys on the path from the root to that node. We use this to determine which colors to use in the treemap. We elaborate on this requirement in Task 2.
If you are wondering why we are using generic names — key and value — rather than taxon or number of sightings, it is because this approach allows your treemap implementation to generalize to situations beyond avian biodiversity data.
eBird and NCBI data¶
Using the data from eBird and NCBI, we have constructed trees of bird sighting data like the one in the introduction. We have stored these trees in a JSON file, birds.json
, in the data/
directory.
We have provided a function, load_trees
for loading them into Tree
instances. You can use it as follows.
In [5]: data = treemap.load_trees('data/birds.json')
This defines data
to be a dictionary mapping strings to trees. For instance birds_by_month['Oct']
is the tree similar to the one in the introduction. However, this tree lacks values for the internal (non-leaf) nodes, which you will add in Task 2.
You can print this tree out in its entirety by running
In [6]: data['Oct'].print()
Here are the valid keys for this dictionary, and the corresponding trees:
Key |
Tree |
---|---|
|
Birds sighted in Cook County, IL in November 2019 |
|
Birds sighted in Cook County, IL in December 2019 |
|
Birds sighted in Cook County, IL in January 2020 |
|
Birds sighted in Cook County, IL in February 2020 |
|
Birds sighted in Cook County, IL in March 2020 |
|
Birds sighted in Cook County, IL in April 2020 |
|
Birds sighted in Cook County, IL in May 2020 |
|
Birds sighted in Cook County, IL in June 2020 |
|
Birds sighted in Cook County, IL in July 2020 |
|
Birds sighted in Cook County, IL in August 2020 |
|
Birds sighted in Cook County, IL in September 2020 |
|
Birds sighted in Cook County, IL in October 2020 |
|
Birds sighted in Cook County, IL from November 2019 to October 2020 |
|
Small sample tree useful for testing purposes |
|
Small sample tree useful for testing purposes |
|
Small sample tree useful for testing purposes |
Notice that in addition to the trees representing actual bird sighting data, we include three sample trees (with keys 's1'
, 's2'
, and 's3'
) that are smaller, simpler trees that are useful for testing, and which we will be using for demonstration throughout the assignment.
In addition to birds.json
, we have provided another JSON file, sparrows.json
, also in the data/
directory, which will be useful for testing Task 1. You can load this file using the same load_trees
function:
In [7]: data_flat = treemap.load_trees('data/sparrows.json')
This defines data_flat
to be a dictionary mapping strings to trees. The valid keys for sparrows.json
are the same as the valid keys for birds.json
. You can try printing one of these trees:
In [8]: data_flat['Oct'].print()
: 9142
│
├──American Tree Sparrow: 117
│
├──Chipping Sparrow: 185
│
├──Clay-colored Sparrow: 45
│
├──Field Sparrow: 213
│
├──Dark-eyed Junco: 1793
│
├──Eastern Towhee: 264
│
├──Fox Sparrow: 630
│
├──Harris's Sparrow: 53
│
├──White-crowned Sparrow: 1100
│
├──White-throated Sparrow: 1868
│
├──Lark Bunting: 1
│
├──LeConte's Sparrow: 37
│
├──Nelson's Sparrow: 121
│
├──Lincoln's Sparrow: 389
│
├──Song Sparrow: 1204
│
├──Swamp Sparrow: 820
│
├──Savannah Sparrow: 288
│
└──Vesper Sparrow: 14
Each tree in sparrows.json
has a very simple structure: It consists of a root node with a number of children, all of which are leaves (have no children). We again include three small sample trees ('s1'
, 's2'
, 's3'
) which we will use in our demonstrations and which you can use for testing. Each of these three sample trees is derived from the corresponding tree in birds.json
by discarding all nodes other than the root and its children (except that unlike in birds.json
, all of the nodes in sparrows.json
have their values pre-set, and in particular the value of the root node is the sum of the values of its children).
For the trees in sparrows.json
other than the sample trees, the leaves of the tree represent species from the family Passerellidae, which consists largely of sparrows. The leaf’s key is the common name of the species, and its value is the number of sightings during the specified month/year. The root node has its key set to the empty string, and its value is the sum of the values of its children (again, all the nodes have their values pre-set).
We will use the trees in sparrows.json
in Task 1.
Drawing rectangles¶
Recall that a treemap looks like this:
Notice how all the individual shapes in the diagram are all rectangles. In fact, drawing a treemap mostly involves figuring out the exact rectangles we need to draw on a “canvas”. Our canvas is a coordinate system from (0.0, 0.0) (the top-left corner) to (1.0, 1.0) (the bottom-right corner). This is often referred to as the bounding rectangle, because it describes the bounds of where we can “draw” things (in this case, rectangles)
An individual rectangle is specified by providing its origin point (the position of the top-left corner of the rectangle) and its size (its width along the x axis and its height along the y axis).
We provide a simple Rectangle
class in treemap.py
and
a draw_rectangles
function in drawing.py
that will
draw a list of rectangles. For example, the following
code draws a rectangle with origin (0.25, 0.5), width 0.75, height 0.25,
and label “Example Rectangle”.
(The fourth argument to the Rectangle
constructor needs to be a tuple of strings and is used for determining the color of the rectangle. You can ignore it for now.)
In [9]: r = treemap.Rectangle((0.25, 0.5), (0.75, 0.25), "Example Rectangle", ("ex",))
In [10]: drawing.draw_rectangles([r])
Reminder: The drawing
module
As noted earlier, the drawing
module will only work through the virtual desktop.
This does not mean you have to do your work over the virtual desktop! We provide this
module as a way to optionally visualize the treemaps, but none of the tasks you have
to do rely on having access to the drawing
module or to the virtual desktop.
This will open a window that displays something like this:
Note: you must close this window before you can continue entering code in IPython.
The following code draws two rectangles:
In [11]: r1 = treemap.Rectangle((0.20, 0.20), (0.20, 0.60), "Example Rectangle 1", ("ex1",))
In [12]: r2 = treemap.Rectangle((0.60, 0.20), (0.20, 0.60), "Example Rectangle 2", ("ex2",))
In [13]: drawing.draw_rectangles([r1, r2])
And will show something like this:
So, we need to figure out exactly what rectangles to generate (and with what positions and sizes) for a given tree.
Drawing one-level treemaps¶
Before we turn our attention to hierarchical data, let us consider how we would layout data that comes to us without a hierarchy. We could think of this data as just being a list of (key, value), pairs, but for compatibility with the later tasks, we will think of it as a tree with one root node, a list of child nodes, each of which is a leaf (so, there is only one level to this tree below the root). Here again is such a tree:
In [8]: data_flat['Oct'].print()
: 9142
│
├──American Tree Sparrow: 117
│
├──Chipping Sparrow: 185
│
├──Clay-colored Sparrow: 45
│
├──Field Sparrow: 213
│
├──Dark-eyed Junco: 1793
│
├──Eastern Towhee: 264
│
├──Fox Sparrow: 630
│
├──Harris's Sparrow: 53
│
├──White-crowned Sparrow: 1100
│
├──White-throated Sparrow: 1868
│
├──Lark Bunting: 1
│
├──LeConte's Sparrow: 37
│
├──Nelson's Sparrow: 121
│
├──Lincoln's Sparrow: 389
│
├──Song Sparrow: 1204
│
├──Swamp Sparrow: 820
│
├──Savannah Sparrow: 288
│
└──Vesper Sparrow: 14
Recall we want one rectangle per leaf node, and we want the area of that rectangle to be proportional to the value of the node (the number of birds seen of that species). One way to accomplish this would be to make a single stack of rectangles:
From this image, you can get some idea of the relative sizes of the rectangles, but the rectangles are very wide and short. Many are too short to fit a label. And, it can be hard to visually gauge the relative sizes of rectangles that are so elongated.
It would be preferable to have rectangles that are closer to squares, like this:
A type of treemap called a squarified treemap aims to use rectangles that approximate squares. It was introduced by Mark Bruls, Kees Huizing, and Jarke J. van Wijk in 2000. They developed an algorithm for laying out the rectangles, described below. You will implement this algorithm.
The input to the algorithm is a bounding rectangle for the treemap, and a tree. For now, we will assume the tree is of the kind described above, where all children of the root node are leaves. Each leaf represents on data point.
Example
We’ll use the following tree as a running example:
: 20
│
├──Three: 3
│
├──Two: 2
│
├──Six (a): 6
│
├──Six (b): 6
│
├──One (a): 1
│
├──One (b): 1
│
└──One (c): 1
We’ll use a bounding box with width 1 and height 0.8.
Step 1: Take the list of data points (leaves), and sort the list in descending order of value.
Example
In our example, this gives us the list
Six (a): 6
Six (b): 6
Three: 3
Two: 2
One (a): 1
One (b): 1
One (c): 1
Step 2: We are about to layout a single row of rectangles, either along the left edge of our bounding box, or along the top edge. If the bounding box is wider than it is tall, or if the width and height are equal, choose the left edge. Otherwise, choose the top edge.
Example
In our example, the bounding box has width 1 and height 0.8, so it is wider than it is tall, and we’ll use the left edge.
Step 3: Determine how many rectangles to include in the row. Our goal is choose the number of rectangles such that each rectangle is not too distorted. We do this by first trying to fit a single rectangle in the row, and calculating its distortion. Then we try fitting two rectangles, then three, and so on as long as the distortion keeps improving (that is, the rectangles become closer to square). We stop fitting additional rectangles in the row once the distortion starts getting worse instead of better.
To be more precise: We will make rectangles corresponding to the first \(k\) data points in the list, for some value of \(k\). We will try \(k = 1, 2, 3, \dots\) in order.
For each value of \(k\):
Take the first \(k\) ordered data points, and sum their values. Then, divide by the sum of all the values in the entire list. This fraction is the fraction of the area of the bounding box that should be taken up by this row of rectangles.
For each of the \(k\) data points, divide its value by the sum of the \(k\) values in the row. This is the fraction of the area of the row that should be taken up by the rectangle for this data point. The rectangles in the row are laid out from top to bottom (if the row is against the left edge of the bounding box) or from left to right (if the row is against the top edge).
The calculations above allow us to determine the height and width of each rectangle in the row. Compute the aspect ratio of each rectangle by dividing its longer side by its shorter side. The aspect ratio is a number greater than or equal to 1, where 1 means that the rectangles is a square, while a very high aspect ratio means the rectangle is very distorted.
Define the distortion of this row to be the maximum aspect ratio of any rectangle in the row.
If \(k = 1\), or if the distortion for this row is less than or equal to the distortion for the previous \(k\), then move on to the next \(k\). If there is no next value of \(k\) because this row already uses all the data points in the list, then stop the process, and keep this value of \(k\) (which gives the lowest distortion).
On the other hand, if the distortion is higher than it was for the previous \(k\), we will not move on to the next \(k\) nor keep the current \(k\). Instead, stop the process and choose the previous value of \(k\) (the one that gave the lowest distortion).
Example
Let’s try this on our example. We start with \(k = 1\).
We take the first data point in our list,
Six (a): 6
. This row has a total value of 6, while the whole data set has a total value of 20. So, this row will take up a 6/20 fraction of the area of the bounding box. Since the bounding box has width 1, this row will have width 0.3.We have one rectangle taking up the whole row. So, the height of that rectangle is the full height of the row (which is the full height of the bounding box), 0.8.
The aspect ratio of this rectangle is \(0.8/0.3 = 8/3 \approx 2.667\).
Here is a table with information about this rectangle, including its label, the (x, y) coordinates of its origin, its width and height, and its aspect ratio:
label
x
y
width
height
aspect ratio
Six (a)
0.000
0.000
0.300
0.800
2.667
And here is a visualiztion of this rectangle:
There is only one rectangle in the row, so the distortion is its aspect ratio: \(8/3 \approx 2.667\).
We then move on to \(k = 2\).
We take the first 2 data points in our list,
Six (a): 6
andSix (b): 6
. This row has a total value of \(6 + 6 = 12\), while the whole data set has a total value of 20. So, this row will take up a 12/20 fraction of the area of the bounding box. Since the bounding box has width 1, this row will have width 0.6.This row has two rectangles, each with value 6. The total value of of the row is 12, so each rectangle takes up a 6/12 fraction of the height of the row. Since the row has height 0.8, each rectangle has height 0.4.
The aspect ratio of each of these rectangles is \(0.6/0.4 = 3/2 = 1.5\).
label
x
y
width
height
aspect ratio
Six (a)
0.000
0.000
0.600
0.400
1.500
Six (b)
0.000
0.400
0.600
0.400
1.500
The distortion of this row is the maximum of these two aspect ratios (which are both the same), \(1.5\).
Since the distortion for \(k = 2\), which is \(1.5\), is better (lower) than the distortion for \(k = 1\), which was approximately \(2.667\), we continue to the next \(k\).
We next try \(k = 3\).
We take the first 3 data points in our list,
Six (a): 6
,Six(b): 6
, andThree: 3
. This row has a total value of \(6 + 6 + 3 = 15\), while the whole data set has a total value of 20. So, this row will take up a 15/20 fraction of the area of the bounding box; it will have a width of 0.75.This row has three rectangles, with values 6, 6, and 3. Since the total value of the row is 15, the rectangles take up a 6/15, 6/15, and 3/15 fraction of the height, respectively. Since the whole row has height 0.8, the rectangles have heights 0.32, 0.32, and 0.16.
Since the rectangles have sizes 0.75 by 0.32, 0.75 by 0.32, and 0.75 by 0.16, the aspect ratios are \(0.75/0.32 = 2.34375\), \(0.75/0.32 = 2.34375\), and \(0.75/0.16 = 4.6875\).
label
x
y
width
height
aspect ratio
Six (a)
0.000
0.000
0.750
0.320
2.344
Six (b)
0.000
0.320
0.750
0.320
2.344
Three
0.000
0.640
0.750
0.160
4.687
The distortion of the row is the maximum of these three aspect ratios, which is \(4.6875\).
Since the distortion for \(k = 3\), which is \(4.6875\), is worse (higher) than the distortion for \(k = 2\), which was \(1.5\), we stop progressing through values of \(k\), and instead choose the previous value of \(k\) — that is, \(k = 2\) — which gave us the best (lowest) distortion.
Step 4: At this point, we have filled in one row of rectangles. We have a list of remaining data points that have not been made into rectangles. There is a rectangle of leftover space within our bounding box which has not yet been filled. The problem of filling that rectangle of leftover space with the list of remaining data points is equivalent to our original problem but on a simpler input. So, we can recursively fill that leftover space using the remaining data points. Those data points have already been sorted in descending order of value, so we can start with Step 2.
When there is no space remaining (equivalently, when there are no data points with nonzero values), we have a complete treemap and we are done!
Example
In our example, we have used our first two data points, so our remaining data points are
Three: 3
Two: 2
One (a): 1
One (b): 1
One (c): 1
And, our remaining space is a rectangle with origin (0.6, 0.0), width 0.4, and height 0.8. This rectangle becomes our bounding box in the recursive call.
Since this bounding box is taller than it is wide, our new row will be placed along the top edge.
We start by laying out \(k = 1\) rectangles in our new row.
The first data point in our list is
Three: 3
. This row has a total value of 3, while the whole data set has a total value of \(3 + 2 + 1 + 1 + 1 = 8\) (because this recursive call uses the new, shorter list of data points). So, this row will take up a 3/8 fraction of the area of the bounding box. Since the bounding box has height 0.8, this row will have height 0.3.We have one rectangle taking up the whole row. So, the width of that rectangle is the full width of the row (which is the full width of the bounding box), 0.4.
The aspect ratio of this rectangle is \(0.4/0.3 = 4/3 \approx 1.333\).
label
x
y
width
height
aspect ratio
Three
0.600
0.000
0.400
0.300
1.333
Here’s a visualization of this treemap (new row in blue):
There is only one rectangle in the row, so the distortion is its aspect ratio: \(4/3 \approx 1.333\).
We then try \(k = 2\).
We take the first 2 data points in our list,
Three: 3
andTwo: 2
. This row has a total value of \(3 + 2 = 5\), while the whole data set has a total value of 8. So, this row will take up a 5/8 fraction of the area of the bounding box. Since the bounding box has height 0.8, this row will have height 0.5.This row has two rectangles, with values 3 and 2. The total value of of the row is 5, so the rectangles take a 3/5 and 2/5 fraction of the width of the row, respectively. Since the row has width 0.4, the rectangles have widths 0.24 and 0.16.
The aspect ratios these rectangles are \(0.5/0.24 = 25/12 \approx 2.083\), and \(0.5/0.16 = 25/8 = 3.125\).
label
x
y
width
height
aspect ratio
Three
0.600
0.000
0.240
0.500
2.083
Two
0.840
0.000
0.160
0.500
3.125
The distortion of this row is the maximum of these two aspect ratios \(3.125\).
Since the distortion for \(k = 2\), which is \(3.125\), is worse (higher) than the distortion for \(k = 1\), which was approximately \(1.333\), we do not continue to the next value of \(k\), but instead stop this process and revert back to the previous \(k\), which was \(k = 1\), and which featured the lowest distortion.
At this point, our list of remaining data points for which we have not yet created rectangles is
Two: 2
One (a): 1
One (b): 1
One (c): 1
and our rectangle of unused space has origin (0.6, 0.3), width 0.4, and height 0.5. This list and this leftover rectangle become the the inputs to the next recursive call.
Try continuing this process to work out the sizes and locations of the remaining rectangles. The final treemap will consists of these rectangles:
label |
x |
y |
width |
height |
---|---|---|---|---|
|
0.000 |
0.000 |
0.600 |
0.400 |
|
0.000 |
0.400 |
0.600 |
0.400 |
|
0.600 |
0.000 |
0.400 |
0.300 |
|
0.600 |
0.300 |
0.400 |
0.200 |
|
0.600 |
0.500 |
0.267 |
0.150 |
|
0.600 |
0.650 |
0.267 |
0.150 |
|
0.867 |
0.500 |
0.133 |
0.300 |
Visualized, it will look like this:
Task 1: One-level treemaps¶
Your first task is to fill in this function in treemap.py
:
def compute_rectangles(t, bounding_rec_width=1.0, bounding_rec_height=1.0):
'''
Computes the rectangles for drawing a treemap of the provided tree.
Inputs:
t: (Tree) a tree
bounding_rec_width, bounding_rec_height: (float) the width and height
of the bounding rectangle.
Returns: a list of Rectangle objects.
'''
This function takes a tree and, optionally, the height and width of the initial
bounding rectangle, and returns a list of Rectangle
objects.
Each Rectangle
instance should have its label
attribute set to equal the key of the corresponding leaf.
For this task, you should assume the tree has the shape described in the section on drawing one-level treemaps, above. That is, you should assume the tree consists of a root with a number of children, all of which are leaves. Each leaf has a string key and a positive integer value. The root has the empty string as its key, and its value is the sum of the values of the children.
You will modify this function in Task 3 to work with a tree of any shape.
We provide two helper functions which you can (and should) use in your solution. The first, sorted_trees
, sorts a list of trees in descending order of value:
def sorted_trees(tree_list):
'''
Sort a list of Tree instances by the value of their roots in descending
order. Ties are broken by the key of the root, in (forward) alphabetical
order. Returns a new sorted list without modifying the input list.
Inputs:
tree_list: list of Tree instances, each with an integer value.
Returns: list of Tree instances, sorted.
'''
The second, compute_row
, does much of the heavy lifting of determining whether a row should go against the left or top edge, and laying out rectangles in that row:
def compute_row(bounding_rec, row_data, total_sum):
'''
Lay out the given data points as rectangles in one row of a
treemap. The row will be against the left or top edge of the
bounding rectangle, depending on whether the rectangle is at least
as wide as it is tall.
Inputs:
bounding_rec: (Rectangle) the bounding rectangle
row_data: (list of Tree) the list of data points that should be
included in this row.
total_sum: (integer) the total value of all the rectangles that
will eventually fill the bounding rectangle (not just those
in this row).
Returns: a tuple (row_layout, leftover_rec), where
row_layout: list of pairs (rec, t), where rec is a Rectangle, and
t is the Tree that Rectangle corresponds to, and
leftover_rec: a Rectangle representing the leftover space in the
bounding rectangle that was not used by this row.
'''
This function, compute_row
, will lay out a rectangle for each data point it is given — no more and no less. It essentially handles Steps 2, 3a, and 3b of the algorithm. After applying this function, you will still need to compute the aspect ratio of each rectangle, compute the distortion, and ultimately determine the appropriate \(k\), the number of rectangles in the row. Additionally, this function also does not set the label
attribute of the rectangles it returns; you will need to do that yourself.
You can use a loop to iterate through values of \(k\) in Step 3 (you should not use recursion for that part of the algorithm).
In Step 4 of the algorithm, after placing one row, we have a rectangle of leftover space and a list of remaining data points which we will use to fill that space. Since this is a simpler version of our original problem (we again need to fill a bounding rectangle based on a list of data points, but now we have a smaller bounding rectangle and fewer data points), this is a natural place to use recursion. Your function will need just a single recursive call. Figuring out the recursion can be challenging, but it gets easier with practice, and this is a good opportunity to get some practice. Make sure to think through the design of the recursion following the steps we discussed in class.
Hint: compute_rectangles
itself should not be recursive. Instead,
write a recursive helper function that you will call from compute_rectangles
.
When you have a function that only makes a single recusive call, there’s often a reasonable way to write that function using iteration instead of recursion. For example, recall the factorial function: it can be written recursively with a single recursive call (the computation of factorial(5)
relies only on a call to factorial(4)
, for instance), but we also saw a version of the factorial function that used iteration instead. Both versions of the factorial function were straightforward and were about the same length. Like the factorial function, the compute_rectangles
function that you are writing in this task requires only a single recursive call. Also like the factorial function, the compute_rectangles
function can be written using iteration instead of recursion. We encourage you to use recursion for this task as this is a naturally recursive problem, and it’s good practice. But, an iterative solution will also be accepted for this task (though note that recursive solutions will be required for Tasks 2 and 3).
Note
In our example, we made tables showing the size and positions of each rectangle. While making tables like the these is not part of this assignment, we encourage you to write strategic print statements to display analogous data while you are initially writing and debugging your program. This will help you to isolate errors that are due to your generation of the rectangles. We highly recommend you print out such information, and we will ask you to show us these sorts of print-outs when helping you to debug your code.
Testing Task 1¶
We suggest you start by testing this function informally from IPython
by printing out the rectangles produced by the function. You should use the trees provided in sparrows.json
, which have the required structure. Here are
some sample calls using this dataset.
Note that the string represenation of a Rectangle
object consists of the word “RECTANGLE” followed by the x- and y-coordinates of the rectangle’s origin, then the rectangle’s width, height, and label.
In [14]: data_flat = treemap.load_trees('data/sparrows.json')
In [15]: recs = treemap.compute_rectangles(data_flat['s1'])
In [16]: for r in recs:
...: print(r)
...:
RECTANGLE 0.0000 0.0000 0.8000 0.5000 B40
RECTANGLE 0.0000 0.5000 0.8000 0.5000 C40
RECTANGLE 0.8000 0.0000 0.2000 1.0000 D20
In [17]: recs = treemap.compute_rectangles(data_flat['s2'])
In [18]: for r in recs:
...: print(r)
...:
RECTANGLE 0.0000 0.0000 0.6000 0.5000 Six (a)
RECTANGLE 0.0000 0.5000 0.6000 0.5000 Six (b)
RECTANGLE 0.6000 0.0000 0.4000 0.3750 Three
RECTANGLE 0.6000 0.3750 0.4000 0.2500 Two
RECTANGLE 0.6000 0.6250 0.2667 0.1875 One (a)
RECTANGLE 0.6000 0.8125 0.2667 0.1875 One (b)
RECTANGLE 0.8667 0.6250 0.1333 0.3750 One (c)
In [19]: recs = treemap.compute_rectangles(data_flat['s3'])
In [20]: for r in recs:
...: print(r)
...:
RECTANGLE 0.0000 0.0000 0.5000 0.5000 6
RECTANGLE 0.0000 0.5000 0.5000 0.5000 6
RECTANGLE 0.5000 0.0000 0.5000 0.3333 4
RECTANGLE 0.5000 0.3333 0.5000 0.2500 3
RECTANGLE 0.5000 0.5833 0.4000 0.2083 2x
RECTANGLE 0.5000 0.7917 0.4000 0.2083 2y
RECTANGLE 0.9000 0.5833 0.1000 0.4167 1
Note that data_flat['s2']
is the tree that we used in the explanation of the algorithm.
Once you are confident that you’re producing the correct rectangles,
you can start visualizing them using the draw_rectangles
function:
In [21]: drawing.draw_rectangles(recs)
Note that sometimes, a rectangles might be too small to fit a label. When that happens, you may see a message about it in the terminal.
To run the automated tests for this task only, you can run the following:
$ py.test -x -v -k rectangles_flat
Task 2: Values and paths¶
We now turn our attention to trees with multiple levels and more complex structure, like the ones in birds.json
.
The trees in birds.json
only include a value for the
value
attribute in the leaf nodes (in all other nodes, the value
is set to None
). Furthermore, all nodes are lacking the path
attribute mentioned earlier, which will be used for assigning colors to our treemap. In Task 2, you will
complete the following two recursive functions in treemap.py
to
respectively compute the value
for all internal nodes and to set the
path
for all nodes.
Your solution to each must be recursive. Non-recursive solutions (i.e., functions that do not call themselves with an input that is in some way smaller) will not receive credit for this task. Furthermore, each function should be generalizable, working for a tree of any height. As a guideline, in our implementation each of these functions requires fewer than ten lines of code.
def compute_internal_values(t):
'''
Assign a value to the internal nodes. The value of the leaves should
already be set. The value of an internal node is the sum of the value
of its children.
Inputs:
t (Tree): a tree
Returns: The input tree t should be modified so that every internal node's
value is set to be the sum of the values of its children. The return
value is the value of the root of t (that is, t.value).
'''
def compute_paths(t, prefix=()):
'''
Assign the path attribute of all nodes. The path attribute of a node
should be a tuple of strings made up of the keys of the nodes along a
full path through the tree from the root down to, but not including,
that node. For example, following the path
"class Aves" --> "order Passeriformes" --> "family Passerellidae"
from the root to a node should result in the path atttribute
("class Aves", "order Passeriformes")
being assigned to the node with key "family Passerellidae".
For the root node, the path attribute should be an empty tuple.
Inputs:
t (Tree): a tree
prefix (tuple of strings): Prefix to add to path
Returns: Nothing. The input tree t should be modified to contain a path
attribute for all nodes.
'''
Testing Task 2¶
You can informally test these functions from IPython by printing
out the tree after calling one of these functions, and checking whether
whether the value
or path
are set correctly.
You can use the trees provided in birds.json
We suggest you start with the smaller, sample trees, which are easier to check manually.
For example, you can check compute_internal_values
as follows:
In [22]: data = treemap.load_trees("data/birds.json")
In [23]: treemap.compute_internal_values(data['s1'])
Out[23]: 100
In [24]: data['s1'].print()
A100: 100
│
├──B40: 40
│ │
│ ├──E10: 10
│ │
│ ├──F10: 10
│ │ │
│ │ ├──J6: 6
│ │ │ │
│ │ │ ├──L1: 1
│ │ │ │
│ │ │ ├──M3: 3
│ │ │ │
│ │ │ └──N2: 2
│ │ │
│ │ └──K4: 4
│ │
│ └──G20: 20
│
├──C40: 40
│ │
│ ├──H30: 30
│ │
│ └──I10: 10
│
└──D20: 20
Caution
Note that compute_internal_values
modifies the tree in place (as does compute_paths
). If you want a fresh copy of the tree for further testing, you need to re-load the data.
You can check compute_paths
as follows (note how the print
method has an optional argument to print the paths)
In [25]: data = treemap.load_trees("data/birds.json")
In [26]: treemap.compute_paths(data['s1'])
In [27]: data['s1'].print(paths=True)
A100: ()
│
├──B40: ('A100',)
│ │
│ ├──E10: ('A100', 'B40')
│ │
│ ├──F10: ('A100', 'B40')
│ │ │
│ │ ├──J6: ('A100', 'B40', 'F10')
│ │ │ │
│ │ │ ├──L1: ('A100', 'B40', 'F10', 'J6')
│ │ │ │
│ │ │ ├──M3: ('A100', 'B40', 'F10', 'J6')
│ │ │ │
│ │ │ └──N2: ('A100', 'B40', 'F10', 'J6')
│ │ │
│ │ └──K4: ('A100', 'B40', 'F10')
│ │
│ └──G20: ('A100', 'B40')
│
├──C40: ('A100',)
│ │
│ ├──H30: ('A100', 'C40')
│ │
│ └──I10: ('A100', 'C40')
│
└──D20: ('A100',)
To run the automated tests for this task only, you can run the following:
$ py.test -x -v -k values
$ py.test -x -v -k paths
Drawing multiple-level treemaps¶
We now want to draw a treemap for a tree that may have multiple levels, like the trees in our original data set. In a finished treemap, there should be one rectangle per leaf of the tree (e.g. one rectangle per bird species), and each rectangle should have area proportional to the corresponding leaf’s value (e.g. proportional to the number of sightings of that bird).
One approach would be to put all of these leaves into one big list, and use the one-level treemap algorithm described earlier. But, the resulting visualization would abandon the structure of the tree. We would prefer that the rectangles of closely-related leaves appear close together in the treemap.
Here is an algorithm to accomplish this.
Example
We will use this tree as a running example:
Root: None
│
├──Three: None
│ │
│ ├──Three ~ Two: 2
│ │
│ └──Three ~ One: 1
│
├──Two: 2
│
├──Six (a): None
│ │
│ ├──Six (a) ~ Four: None
│ │ │
│ │ ├──Six (a) ~ Four ~ Three: 3
│ │ │
│ │ └──Six (a) ~ Four ~ One: 1
│ │
│ └──Six (a) ~ Two: 2
│
├──Six (b): 6
│
├──One (a): 1
│
├──One (b): 1
│
└──One (c): 1
Step 1: Compute the values of the internal nodes (the values of the leaves should already be set). Each internal node should have its value set equal to the sum of the values of its children.
We will additionally set the path attribute of each node at this time.
Example
After computing the internal values, our example tree is as follows.
Root: 20
│
├──Three: 3
│ │
│ ├──Three ~ Two: 2
│ │
│ └──Three ~ One: 1
│
├──Two: 2
│
├──Six (a): 6
│ │
│ ├──Six (a) ~ Four: 4
│ │ │
│ │ ├──Six (a) ~ Four ~ Three: 3
│ │ │
│ │ └──Six (a) ~ Four ~ One: 1
│ │
│ └──Six (a) ~ Two: 2
│
├──Six (b): 6
│
├──One (a): 1
│
├──One (b): 1
│
└──One (c): 1
Step 2: Focus on the root and its children. Lay out rectangles for the children according to the one-level algorithm.
Example
Ignoring all nodes other than the root and its children, our example tree looks like the following.
Root: 20
│
├──Three: 3
│
├──Two: 2
│
├──Six (a): 6
│
├──Six (b): 6
│
├──One (a): 1
│
├──One (b): 1
│
└──One (c): 1
The child nodes are the same as in our example for the one-level algorithm. Recall that the rectangles generated were as follows.
label |
x |
y |
width |
height |
---|---|---|---|---|
|
0.000 |
0.000 |
0.600 |
0.400 |
|
0.000 |
0.400 |
0.600 |
0.400 |
|
0.600 |
0.000 |
0.400 |
0.300 |
|
0.600 |
0.300 |
0.400 |
0.200 |
|
0.600 |
0.500 |
0.267 |
0.150 |
|
0.600 |
0.650 |
0.267 |
0.150 |
|
0.867 |
0.500 |
0.133 |
0.300 |
And, this is visualized as the following.
Note that each rectangle created in this step corresponds not necessarily to a leaf, but to a subtree.
Step 3 Deal with each rectangle created in Step 2. What we do depends on whether or not the subtree associated with the rectangle is a leaf.
If the subtree is a leaf, then we are done with this rectangle: this is one of the rectangles we will draw.
If the subtree is not a leaf, then we further subdivide it. We lay out the sub-rectangles within this rectangle recursively: this rectangle serves as the bounding rectangle, and we apply the (multiple-level) treemap algorithm to the corresponding subtree.
Example
In our example, the children of the root are the following:
Six (a): 6
Six (b): 6
Three: 3
Two: 2
One (a): 1
One (b): 1
One (c): 1
Of these, Six (b): 6
, Two: 2
, One (a): 1
, One (b): 1
, and One (c): 1
are leaves. So, the corresponding rectangles can be added to the list of rectangles to draw:
label |
x |
y |
width |
height |
---|---|---|---|---|
|
0.000 |
0.400 |
0.600 |
0.400 |
|
0.600 |
0.300 |
0.400 |
0.200 |
|
0.600 |
0.500 |
0.267 |
0.150 |
|
0.600 |
0.650 |
0.267 |
0.150 |
|
0.867 |
0.500 |
0.133 |
0.300 |
Here they are visualized:
The other children are not leaves and need to be dealt with recursively. The first of these children is the following subtree:
Six (a): 6
│
├──Six (a) ~ Four: 4
│ │
│ ├──Six (a) ~ Four ~ Three: 3
│ │
│ └──Six (a) ~ Four ~ One: 1
│
└──Six (a) ~ Two: 2
It corresponds to the rectangle with origin (0., 0.), width 0.6 and height 0.4. Instead of drawing this rectangle, we fill it with sub-rectangles corresponding to the leaves of the subtree. To do that, we recursively use the (multiple-level) treemap algorithm on this subtree, using this rectangle as the bounding rectangle. This results in the following rectangles getting added to the list of rectangles to draw:
label |
x |
y |
width |
height |
---|---|---|---|---|
|
0.000 |
0.000 |
0.400 |
0.300 |
|
0.000 |
0.300 |
0.400 |
0.100 |
|
0.400 |
0.000 |
0.200 |
0.400 |
Visualizing all the rectangles we have listed for drawing so far:
There is one other child we haven’t dealt with, the following subtree.
Three: 3
│
├──Three ~ Two: 2
│
└──Three ~ One: 1
It corresponds to the rectangle with origin (0.6, 0.), width 0.4, and height 0.3. Again, we do not draw this rectangle, but instead recursively fill it with sub-rectangles corresponding to the leaves of the subtree. These are the resulting rectangles that get added to the list of rectangles to draw:
label |
x |
y |
width |
height |
---|---|---|---|---|
|
0.600 |
0.000 |
0.267 |
0.300 |
|
0.867 |
0.000 |
0.133 |
0.300 |
The full list of rectangles to draw is as follows.
label |
x |
y |
width |
height |
---|---|---|---|---|
|
0.000 |
0.000 |
0.400 |
0.300 |
|
0.000 |
0.300 |
0.400 |
0.100 |
|
0.400 |
0.000 |
0.200 |
0.400 |
|
0.000 |
0.400 |
0.600 |
0.400 |
|
0.600 |
0.000 |
0.267 |
0.300 |
|
0.867 |
0.000 |
0.133 |
0.300 |
|
0.600 |
0.300 |
0.400 |
0.200 |
|
0.600 |
0.500 |
0.267 |
0.150 |
|
0.600 |
0.650 |
0.267 |
0.150 |
|
0.867 |
0.500 |
0.133 |
0.300 |
Here is a visualization of the full list of rectangles.
Note: The order in which you compute the rectangles to draw may be different from the order in which they are described here; that is okay.
Task 3: Multiple-level treemaps¶
Your final task is to modify the compute_rectangles
function from Task 1 to work with a tree of any shape.
First, add the following two lines of code in compute_rectangles
immediately after the docstring.
compute_internal_values(t)
compute_paths(t)
Since we are including code to call compute_internal_values
and compute_paths
within compute_rectangles
, you do not need to call them before calling compute_rectangles
.
In Task 1, you set the label
attribute of each rectangle to be the key of the corresponding node. In this task, you must also set the color_code
attribute of each rectangle to equal the path
attribute of the corresponding node. This attribute is used by our drawing function to determine the color of each rectangle. Rectangles with the same color code are drawn in the same color. Rectangles that have nearby color codes (color codes that start the same way) are given similar colors. Since we are setting the color code of a rectangle to be the path attribute of the corresponding node, this means that nodes that are closely related in the tree will give rise to rectangles that are close in color.
Update your code to implement the multiple-level treemap algorithm from the previous section.
You must use recursion to accomplish this task. Also, you may not make any assumptions about the number of levels in the tree.
The implementation of this task is not too complex (our solution adds less than 10 lines to our code from Task 1), but again figuring out the recursion can be challenging. Make sure to think through the design of the recursion.
Testing Task 3¶
We again suggest you start by testing this function informally from IPython
by printing out the rectangles produced by the function. You can use the trees provided in birds.json
, which have multiple levels. Your updated function should also still work on the trees in sparrows.json
. Here are
some sample calls using the birds.json
dataset.
In [14]: data = treemap.load_trees('data/birds.json')
In [15]: recs = treemap.compute_rectangles(data['s1'])
In [16]: for r in recs:
...: print(r)
...:
RECTANGLE 0.0000 0.0000 0.4000 0.5000 G20
RECTANGLE 0.4000 0.0000 0.4000 0.2500 E10
RECTANGLE 0.4000 0.2500 0.2400 0.1250 M3
RECTANGLE 0.4000 0.3750 0.1600 0.1250 N2
RECTANGLE 0.5600 0.3750 0.0800 0.1250 L1
RECTANGLE 0.6400 0.2500 0.1600 0.2500 K4
RECTANGLE 0.0000 0.5000 0.6000 0.5000 H30
RECTANGLE 0.6000 0.5000 0.2000 0.5000 I10
RECTANGLE 0.8000 0.0000 0.2000 1.0000 D20
In [17]: recs = treemap.compute_rectangles(data['s2'])
In [18]: for r in recs:
...: print(r)
...:
RECTANGLE 0.0000 0.0000 0.4000 0.3750 Six (a) ~ Four ~ Three
RECTANGLE 0.0000 0.3750 0.4000 0.1250 Six (a) ~ Four ~ One
RECTANGLE 0.4000 0.0000 0.2000 0.5000 Six (a) ~ Two
RECTANGLE 0.0000 0.5000 0.6000 0.5000 Six (b)
RECTANGLE 0.6000 0.0000 0.2667 0.3750 Three ~ Two
RECTANGLE 0.8667 0.0000 0.1333 0.3750 Three ~ One
RECTANGLE 0.6000 0.3750 0.4000 0.2500 Two
RECTANGLE 0.6000 0.6250 0.2667 0.1875 One (a)
RECTANGLE 0.6000 0.8125 0.2667 0.1875 One (b)
RECTANGLE 0.8667 0.6250 0.1333 0.3750 One (c)
In [19]: recs = treemap.compute_rectangles(data['s3'])
In [20]: for r in recs:
...: print(r)
...:
RECTANGLE 0.0000 0.0000 0.2500 0.5000 3
RECTANGLE 0.2500 0.0000 0.2500 0.3333 2
RECTANGLE 0.2500 0.3333 0.2500 0.1667 1
RECTANGLE 0.0000 0.5000 0.4167 0.5000 5
RECTANGLE 0.4167 0.5000 0.0833 0.5000 1
RECTANGLE 0.5000 0.0000 0.2500 0.3333 2a
RECTANGLE 0.7500 0.0000 0.2500 0.3333 2b
RECTANGLE 0.5000 0.3333 0.5000 0.2500 3
RECTANGLE 0.5000 0.5833 0.4000 0.2083 2x
RECTANGLE 0.5000 0.7917 0.4000 0.2083 2y
RECTANGLE 0.9000 0.5833 0.1000 0.4167 **1**
Note that data['s2']
is the tree that we used in the explanation of the multiple-level treemap algorithm. Note also that if you were to keep only the root node and its children in the tree data['s1']
, data['s2']
, or data['s3']
, you would get the tree data_flat['s1']
, data_flat['s2']
, or data_flat['s3']
, respectively. You can compare the rectangles you get from each of these to help you debug.
Once you are confident that you’re producing the correct rectangles,
you can again try visualizing them using the draw_rectangles
function:
In [21]: drawing.draw_rectangles(recs)
To run the automated tests that are new for this task, you can run the following:
$ py.test -x -v -k rectangles_full
To run all the automated tests for drawing rectangles, including the tests from Task 1, you can run the following:
$ py.test -x -v -k rectangles
Putting it all together¶
Once you have completed all the tasks, you will be able to easily generate any treemap you want from the command-line. In particular, you can run the following:
$ python3 treemap.py data/birds.json KEY
Where KEY
is the key for the tree for which you would like to produce a treemap.
Here are some examples and the expected treemap. Take into account that
running treemap.py
will open a new window with the treemap;
you will need to close that window before you can return to the
command line.
NOTE: If you are passing all the tests, do not worry if the produced treemaps do not match ours exactly! Your completeness grade will depend on the result of the tests, not on the exact graphics you produce. Remember that while working on the assignment via SSH, you will not be able to produce images at all, but that is okay.
$ python3 treemap.py data/birds.json Oct
$ python3 treemap.py data/birds.json Jul
$ python3 treemap.py data/birds.json Apr
$ python3 treemap.py data/birds.json Feb
$ python3 treemap.py data/birds.json Year
You can use the data/sparrows.json
data file as well:
$ python3 treemap.py data/sparrows.json Oct
Grading¶
The assignment will be graded according to the Rubric for Programming Assignments. Make sure to read that page carefully; the remainder of this section explains aspects of the grading that are specific to this assignment.
In particular, your completeness score will be determined on the basis of the automated tests, which provide a measure of how many of the tasks you have completed:
Grade |
Percent tests passed |
---|---|
Exemplary |
at least 95% |
Satisfactory |
at least 80% |
Needs Improvement |
at least 60% |
Ungradable |
less than 60% |
For example, if your implementation passes 85% of the tests, you will earn an S (satisfactory) score for completeness.
The code quality score will be based on the general criteria in the Rubric for Programming Assignments but, in this programming assignment, we will also be paying special attention to the following:
Use of recursion: The goal of this assignment is for you to gain familiarity with recursion, and tasks 2 and 3 require you to use recursion. If you do not use recursion for a function that requires it, the automated tests for that function will not count, and your code quality score will also be negatively affected. If you do not use recursion at all in your code, you will receive a U for Code Quality.
Appropriate use of recursive helper functions: The function
compute_rectangles
should not itself be recursive. Instead,compute_rectangles
should call a recursive helper function. You have the ability to decide what will be the inputs and outputs of that recursive helper function, and a careful choice will make the rest of your work easier and will make your code cleaner.Repeating code: You should never copy-paste code if you need to repeat a task. There are various ways to avoid copy-pasting, depending on the situation: you might put the code in a helper function, you might move the code inside/outside a loop, or, if the code is inside multiple branches of a conditional, you might move that code outside the conditional.
Computing the same value multiple times: As part of the algorithm, you compute a row layout for each value of \(k\), but do you then recompute the row layout for the best value of \(k\)? How can you avoid repeating that work?
Sorting a list that is already sorted: Sorting a list takes more time than just iterating through the list (specifically, sorting a list of length \(n\) takes time \(O(n \log n)\)). So, you should call
sorted_trees
whenever you need to sort a list of trees, but don’t call it more than you have to. In particular, don’t re-sort a list that is already sorted.
While these are the main things we care about in this assignment, please remember that it is not possible for us to give you an exhaustive list of every single thing that could affect your code quality score (and that thinking in those terms is generally counterproductive to learning how to program; see our How to Learn in this Class page for more details).
In general, avoiding all of the above issues will increase the likelihood of getting an E; if your code has a few (but not most) of the above issues, that will likely result in an S; if your code suffers from most of those issues, it will likely get an N. That said, to reiterate, we could also find other issues in your code that we did not anticipate, but that end up affecting your code quality score. When that happens, we encourage you to see these as opportunities to improve your code in future assignments (or as specific things to change if you decide to resubmit the assignment).
And don’t forget that style also matters! You could avoid all of the above issues, and still not get an E because of style issues in your code. Make sure to review the general guidelines in the Rubric for Programming Assignments, as well as our Style Guide.
Cleaning up¶
Before you submit your final solution, you should, remove
any
print
statements that you added for debugging purposes andall in-line comments of the form: “YOUR CODE HERE” and “REPLACE …”
Also, check your code against the style guide. Did you use good variable names? Do you have any lines that are too long?
Make sure you have included header comments, that is, the triple-quote strings that describe the purpose, inputs, and return values of each function, for every function you have written.
As you clean up, you should periodically save your file and run your code through the tests to make sure that you have not broken it in the process.
Submission¶
You will be submitting your work through Gradescope (linked from our Canvas site). The process will be the same as with previous coursework: Gradescope will fetch your files directly from your PA #6 repository on GitHub, so it is important that you remember to commit and push your work! You should also get into the habit of making partial submissions as you make progress on the assignment; remember that you’re allowed to make as many submissions as you want before the deadline.
To submit your work, go to the “Gradescope” section on our Canvas site. Then, click on “Programming Assignment #6”. If you completed previous assignments, Gradescope should already be connected to your GitHub account. If it isn’t, you will see a “Connect to GitHub” button. Pressing that button will take you to a GitHub page asking you to confirm that you want to authorize Gradescope to access your GitHub repositories. Just click on the green “Authorize gradescope” button.
Then, under “Repository”, make sure to select your uchicago-cmsc12100-aut-21/pa6-$GITHUB_USERNAME.git
repository.
Under “Branch”, just select “main”.
Finally, click on “Upload”. An autograder will run, and will report back a score. Please note that this autograder runs the exact same tests (and the exact same grading script) described in Testing Your Code. If there is a discrepancy between the tests when you run them on your computer, and when you submit your code to Gradescope, please let us know.
Acknowledgments: Gordon Kindlmann originally recommended drawing treemaps as good topic for an assignment.