Visualizing Avian Biodiversity Using Treemaps¶

Due: Friday, December 3rd at 4:30pm CST.

The purpose of this assignment is to give you practice working with recursive data structures and writing recursive functions.

You must work alone on this assignment.

Introduction¶

The Cornell Lab of Ornithology manages a website called eBird, which collects bird sighting data from birdwatchers around the world. eBird claims to be one of the world’s largest biodiversity-related projects, collecting over 100 million bird sightings per year. As they explain on their website, the data they collect “inform novel conservation actions, from site-specific information about the occurrence and abundance of birds, to looking at patterns of species abundance across continent-spanning flyways. eBird data contribute to hundreds of conservation decisions and peer-reviewed papers, thousands of student projects, and help inform research worldwide. Applications range from ecological and ornithological research, to conservation application, to advancing research in the fields of socioeconomics, artificial intelligence, and computer science.”

A question we might want to investigate about our Chicago community is, “Which birds are most prevelant in our area, and how does this depend on the time of year?”

We can use the eBird data 1 to address this. eBird makes their basic dataset, which contains raw data of bird observations, available to anyone under their terms of use.

Here is a bar chart showing the number of times each species of bird was seen in Cook County in October 2020:

Top 40 most commonly sighted birds in Cook County, IL, October 2020¶

From this we can see the most commonly seen 2 bird species in Cook County in October 2020 was the American Robin. But there are some shortcomings to this visualization.

For example, while this figure is fairly large, it still only shows the 40 most commonly sighted bird species in Cook County. In total, there were 230 different species sighted. We cannot determine from this figure even approximately what fraction of bird sightings were of the American Robin.

Additionally, we might notice that there are a number of species of sparrow that appear on this list, including the House Sparrow, the White-throated Sparrow, and the Song Sparrow, among others. All together, sparrows outnumber robins by quite a bit, but that is not clear from this figure.

So, we might appreciate a visualization in which related species are grouped together. The scientific study of how to group together related species is taxonomy. All living organisms are organized into a taxonomic hierarchy. For example, the class Aves of all birds is divided into a number of orders, such as Passeriformes, Anseriformes, and Piciformes. Each order is divided into a number of families, each family is divided into a number of genera (plural of genus), and each genus is divided into a number of species. This hierarchical structure is naturally represented as a tree, where each subdivision is a node in the tree.

The National Center for Biotechnology Information (NCBI) provides a Taxonomy Database.

Below we show a tree containing only the birds species that appear in our data set (further abridged). Each level of the tree represents a taxonomic rank (class, order, family, genus, or species).

Each node at a particular level represents a taxon at that level — that is, a member of that taxonomic rank.
Each internal (non-leaf) node has a key consisting of the taxonomic rank and the name of that taxon (for example, “family Passerellidae”), while each leaf has a key which is the the common name of a species. Each node has a value which is the number of times that species (or genus, family, etc.) was sighted.

class Aves: 67629
  │
  ├──order Anseriformes: 7188
  │  │
  │  └──family Anatidae: 7188
  │     │
  │     ├──genus Anas: 2337
  │     │  │
  │     │  ├──American Black Duck: 47
  │     │  │
  │     │  ├──Blue-winged Teal: 84
  │     │  │
  │     │  ├──Green-winged Teal: 158
  │     │  │
  │     │  ├──Mallard (Domestic type): 1779
  │     │  │
  │     │  ├──Northern Pintail: 101
  │     │  │
  │     │  └──Northern Shoveler: 168
  │     │
  │     ├──genus Mareca: 297
  │     │  │
  │     │  ├──American Wigeon: 83
  │     │  │
  │     │  └──Gadwall: 214
  │     │
  │     ├──genus Melanitta: 128
  │     │  │
  │     │  ├──Black Scoter: 32
  │     │  │
  │     │  ├──Surf Scoter: 81
  │     │  │
  │     │  └──White-winged Scoter: 15
  │     │
  │     ├──genus Bucephala: 102
  │     │  │
  │     │  ├──Bufflehead: 82
  │     │  │
  │     │  └──Common Goldeneye: 20
  │     │
  │     ├──genus Branta: 2329
  │     │  │
  │     │  ├──Cackling Goose: 31
  │     │  │
  │     │  └──Canada Goose: 2298
  │     │
  │     ├──genus Aythya: 522
  │     │  │
  │     │  ├──Canvasback: 32
  │     │  │
  │     │  ├──Greater Scaup: 89
  │     │  │
  │     │  ├──Lesser Scaup: 69
  │     │  │
  │     │  ├──Redhead: 207
  │     │  │
  │     │  └──Ring-necked Duck: 125
  │     │
  │     ├──genus Mergus: 238
  │     │  │
  │     │  ├──Common Merganser: 31
  │     │  │
  │     │  └──Red-breasted Merganser: 207
  │     │
  │     ├──genus Anser: 62
  │     │  │
  │     │  ├──no rank unclassified Anser: 2
  │     │  │  │
  │     │  │  └──Domestic goose sp. (Domestic type): 2
  │     │  │
  │     │  ├──Greater White-fronted Goose: 10
  │     │  │
  │     │  └──Snow Goose: 50
  │     │
  │     ├──genus Lophodytes: 174
  │     │  │
  │     │  └──Hooded Merganser: 174
  │     │
  │     ├──genus Clangula: 47
  │     │  │
  │     │  └──Long-tailed Duck: 47
  │     │
  │     ├──genus Cygnus: 120
  │     │  │
  │     │  ├──Mute Swan: 82
  │     │  │
  │     │  ├──Trumpeter Swan: 36
  │     │  │
  │     │  └──Tundra Swan: 2
  │     │
  │     ├──genus Oxyura: 172
  │     │  │
  │     │  └──Ruddy Duck: 172
  │     │
  │     └──genus Aix: 660
  │        │
  │        └──Wood Duck: 660
  │
  ...
  [17 additional orders]

While this tree representation helps us to make comparisons, it is only practical to view a portion of the tree at any given time. It would be useful see a visual representation of the data, which is the role information visualization plays in computing and in data science.

How can we visualize this information in an effective way? We can use treemaps, which are an excellent tool for visualizing hierarchical data. Here, for example, is a treemap corresponding to the tree above.

Treemap of bird sightings in Cook County, IL in October 2020¶

Each rectangle represents a single species. The area of the rectangle is proportional to the number of sightings of that species. Species in the same genus are grouped together into a larger rectangle, genera in the same family are grouped together, etc.

Looking at the data in this form, we can immediately see that, for instance, the American Robin (near the center of the treemap) is more prevalent in the data than other species, but that sparrows (the group of boxes in the upper-left corner) as a whole are three to four times more prevalent than robins.

In general, treemaps are a space-constrained method for visualizing hierarchical structures that present a sense of “mass” and proportionality in a way that the typical tree diagram does not. Treemaps allow the viewer to compare leaves and sub-trees even at varying depths in the tree, and to spot patterns and exceptions. Ben Shneiderman designed treemaps during the 1990s as a way to visualize the contents of a file system. This technique has since been used to visualize many different types of data, including stock portfolios, oil production, a gene ontology, stimulus spending, and more. The original idea has been extended in many interesting ways.

In this assignment, you will write code to generate treemaps to visualize this avian biodiversity data from samples taken throughout the year.

1: eBird. 2017. eBird: An online database of bird distribution and abundance [web application]. eBird, Cornell Lab of Ornithology, Ithaca, New York. Available: http://www.ebird.org. (Accessed: Date November 21, 2020).
2: We are relying on just the number of sightings recorded on eBird. There may be other factors to adjust for to determine the actual prevalence of each species, such as how easy a bird is to see, whether someone reports it, and how much effort a person put in to find it. eBird does also provide data on “effort”, but we will not use that for this assignment.

Getting started¶

Before you start, make sure to review the Coursework Basics page. Then, using the invitation URL provided on Ed Discussion, create a repository for your PA #6 files.

Next, make sure you’ve set the GITHUB_USERNAME variable by running the following (replacing replace_me with your GitHub username):

GITHUB_USERNAME=replace_me

(remember you can double-check whether the variable is properly set by running echo $GITHUB_USERNAME).

And finally, run these commands (if you don’t recall what some of these commands do, they are explained at the end of the Git Tutorial):

cd ~/cmsc12100
mkdir pa6-$GITHUB_USERNAME
cd pa6-$GITHUB_USERNAME
git init
git remote add origin git@github.com:uchicago-cmsc12100-aut-21/pa6-$GITHUB_USERNAME.git
git remote add upstream git@github.com:uchicago-cmsc12100-aut-21/pa6.git
git pull upstream main
git branch -M main
git push -u origin main

You will find the files you need for the programming assignment directly in the root of your repository, including a README.txt file that explains what each file is. Make sure you read that file.

Your PA #6 repository contains several Python files, but you will only be modifying one of them: treemap.py. The rest of the files are described throughout the assignment. You can also consult the README for a description of each file.

As usual, you will be able to test your code from IPython and by using py.test. When using IPython, make sure to enable autoreload before importing the Python files we will use in this assignment:

In [1]: %load_ext autoreload

In [2]: %autoreload 2

In [3]: import treemap, drawing

In [4]: import tree

The drawing module

As we’ll describe later on, we provide a drawing module for visualizing treemaps. This module is not required for completing the assignment, and will only work on the virtual desktop. If you are working on the assignment through SSH, the above import commands may produce the following messages:

Failed to connect to Mir: Failed to connect to server socket: No such file or directory
Unable to init server: Could not connect: Connection refused
Failed to connect to Mir: Failed to connect to server socket: No such file or directory
Unable to init server: Could not connect: Connection refused

(ipython3:35882): Gdk-CRITICAL **: gdk_cursor_new_for_display: assertion 'GDK_IS_DISPLAY (display)' failed

(ipython3:35882): Gdk-CRITICAL **: gdk_cursor_new_for_display: assertion 'GDK_IS_DISPLAY (display)' failed

You can safely ignore these messages. Because SSH is a text-only interface, these messages are simply warning you that the drawing module will not work.

Like previous assignments, we also include a series of automated tests. These tests require an additional set of files, which you can download by running the following from the Linux command-line inside your PA #6 repository:

$ ./get_files.sh

Representing trees¶

We provide a class Tree in the file tree.py that you can use to represent tree data. This is essentially the same Tree class we saw in class, Team Tutorial #6, and Short Exercises #6 (although it is slightly different from the Tree class in the textbook). The class is useful for representing the avian biodiversity data, but it is not specific to the biodiversity data.

The public interface for this class includes:

Constructor:

t = Tree(key, value) creates a new tree with the given key and value and no children.

Attributes:

t.key and t.value hold data associated with the root node of the tree t. Initially, a node has a key and value, but we will add additional attributes in the course of the assignment that hold additional information about each node.
t.children is a list of the child subtrees of t.

Methods:

t.add_child(t2) adds an existing tree t2 as a child subtree of t.
t.num_children() returns the number of child subtrees of t.
t.print() prints the tree t, and is useful for debugging purposes.

We can use this class to represent our avian biodiversity data as described above: the key attribute holds the name of the class, order, etc. that node represents and the value attribute to hold the number of sightings of that taxon.

In Task 2, you will add an additional attribute, path which will hold a tuple listing the keys on the path from the root to that node. We use this to determine which colors to use in the treemap. We elaborate on this requirement in Task 2.

If you are wondering why we are using generic names — key and value — rather than taxon or number of sightings, it is because this approach allows your treemap implementation to generalize to situations beyond avian biodiversity data.

eBird and NCBI data¶

Using the data from eBird and NCBI, we have constructed trees of bird sighting data like the one in the introduction. We have stored these trees in a JSON file, birds.json, in the data/ directory.

We have provided a function, load_trees for loading them into Tree instances. You can use it as follows.

In [5]: data = treemap.load_trees('data/birds.json')

This defines data to be a dictionary mapping strings to trees. For instance birds_by_month['Oct'] is the tree similar to the one in the introduction. However, this tree lacks values for the internal (non-leaf) nodes, which you will add in Task 2.

You can print this tree out in its entirety by running

In [6]: data['Oct'].print()

Here are the valid keys for this dictionary, and the corresponding trees:

Valid keys for the dictionary loaded from `birds.json`¶
Key	Tree
`'Nov'`	Birds sighted in Cook County, IL in November 2019
`'Dec'`	Birds sighted in Cook County, IL in December 2019
`'Jan'`	Birds sighted in Cook County, IL in January 2020
`'Feb'`	Birds sighted in Cook County, IL in February 2020
`'Mar'`	Birds sighted in Cook County, IL in March 2020
`'Apr'`	Birds sighted in Cook County, IL in April 2020
`'May'`	Birds sighted in Cook County, IL in May 2020
`'Jun'`	Birds sighted in Cook County, IL in June 2020
`'Jul'`	Birds sighted in Cook County, IL in July 2020
`'Aug'`	Birds sighted in Cook County, IL in August 2020
`'Sep'`	Birds sighted in Cook County, IL in September 2020
`'Oct'`	Birds sighted in Cook County, IL in October 2020
`'Year'`	Birds sighted in Cook County, IL from November 2019 to October 2020
`'s1'`	Small sample tree useful for testing purposes
`'s2'`	Small sample tree useful for testing purposes
`'s3'`	Small sample tree useful for testing purposes

Notice that in addition to the trees representing actual bird sighting data, we include three sample trees (with keys 's1', 's2', and 's3') that are smaller, simpler trees that are useful for testing, and which we will be using for demonstration throughout the assignment.

In addition to birds.json, we have provided another JSON file, sparrows.json, also in the data/ directory, which will be useful for testing Task 1. You can load this file using the same load_trees function:

In [7]: data_flat = treemap.load_trees('data/sparrows.json')

This defines data_flat to be a dictionary mapping strings to trees. The valid keys for sparrows.json are the same as the valid keys for birds.json. You can try printing one of these trees:

In [8]: data_flat['Oct'].print()

: 9142
  │
  ├──American Tree Sparrow: 117
  │
  ├──Chipping Sparrow: 185
  │
  ├──Clay-colored Sparrow: 45
  │
  ├──Field Sparrow: 213
  │
  ├──Dark-eyed Junco: 1793
  │
  ├──Eastern Towhee: 264
  │
  ├──Fox Sparrow: 630
  │
  ├──Harris's Sparrow: 53
  │
  ├──White-crowned Sparrow: 1100
  │
  ├──White-throated Sparrow: 1868
  │
  ├──Lark Bunting: 1
  │
  ├──LeConte's Sparrow: 37
  │
  ├──Nelson's Sparrow: 121
  │
  ├──Lincoln's Sparrow: 389
  │
  ├──Song Sparrow: 1204
  │
  ├──Swamp Sparrow: 820
  │
  ├──Savannah Sparrow: 288
  │
  └──Vesper Sparrow: 14

Each tree in sparrows.json has a very simple structure: It consists of a root node with a number of children, all of which are leaves (have no children). We again include three small sample trees ('s1', 's2', 's3') which we will use in our demonstrations and which you can use for testing. Each of these three sample trees is derived from the corresponding tree in birds.json by discarding all nodes other than the root and its children (except that unlike in birds.json, all of the nodes in sparrows.json have their values pre-set, and in particular the value of the root node is the sum of the values of its children).

For the trees in sparrows.json other than the sample trees, the leaves of the tree represent species from the family Passerellidae, which consists largely of sparrows. The leaf’s key is the common name of the species, and its value is the number of sightings during the specified month/year. The root node has its key set to the empty string, and its value is the sum of the values of its children (again, all the nodes have their values pre-set).

We will use the trees in sparrows.json in Task 1.

Drawing rectangles¶

Recall that a treemap looks like this:

Notice how all the individual shapes in the diagram are all rectangles. In fact, drawing a treemap mostly involves figuring out the exact rectangles we need to draw on a “canvas”. Our canvas is a coordinate system from (0.0, 0.0) (the top-left corner) to (1.0, 1.0) (the bottom-right corner). This is often referred to as the bounding rectangle, because it describes the bounds of where we can “draw” things (in this case, rectangles)

An individual rectangle is specified by providing its origin point (the position of the top-left corner of the rectangle) and its size (its width along the x axis and its height along the y axis).

We provide a simple Rectangle class in treemap.py and a draw_rectangles function in drawing.py that will draw a list of rectangles. For example, the following code draws a rectangle with origin (0.25, 0.5), width 0.75, height 0.25, and label “Example Rectangle”. (The fourth argument to the Rectangle constructor needs to be a tuple of strings and is used for determining the color of the rectangle. You can ignore it for now.)

In [9]: r = treemap.Rectangle((0.25, 0.5), (0.75, 0.25), "Example Rectangle", ("ex",))

In [10]: drawing.draw_rectangles([r])

Reminder: The drawing module

As noted earlier, the drawing module will only work through the virtual desktop. This does not mean you have to do your work over the virtual desktop! We provide this module as a way to optionally visualize the treemaps, but none of the tasks you have to do rely on having access to the drawing module or to the virtual desktop.

This will open a window that displays something like this:

Note: you must close this window before you can continue entering code in IPython.

The following code draws two rectangles:

In [11]: r1 = treemap.Rectangle((0.20, 0.20), (0.20, 0.60), "Example Rectangle 1", ("ex1",))

In [12]: r2 = treemap.Rectangle((0.60, 0.20), (0.20, 0.60), "Example Rectangle 2", ("ex2",))

In [13]: drawing.draw_rectangles([r1, r2])

And will show something like this:

So, we need to figure out exactly what rectangles to generate (and with what positions and sizes) for a given tree.

Drawing one-level treemaps¶

Before we turn our attention to hierarchical data, let us consider how we would layout data that comes to us without a hierarchy. We could think of this data as just being a list of (key, value), pairs, but for compatibility with the later tasks, we will think of it as a tree with one root node, a list of child nodes, each of which is a leaf (so, there is only one level to this tree below the root). Here again is such a tree:

In [8]: data_flat['Oct'].print()

: 9142
  │
  ├──American Tree Sparrow: 117
  │
  ├──Chipping Sparrow: 185
  │
  ├──Clay-colored Sparrow: 45
  │
  ├──Field Sparrow: 213
  │
  ├──Dark-eyed Junco: 1793
  │
  ├──Eastern Towhee: 264
  │
  ├──Fox Sparrow: 630
  │
  ├──Harris's Sparrow: 53
  │
  ├──White-crowned Sparrow: 1100
  │
  ├──White-throated Sparrow: 1868
  │
  ├──Lark Bunting: 1
  │
  ├──LeConte's Sparrow: 37
  │
  ├──Nelson's Sparrow: 121
  │
  ├──Lincoln's Sparrow: 389
  │
  ├──Song Sparrow: 1204
  │
  ├──Swamp Sparrow: 820
  │
  ├──Savannah Sparrow: 288
  │
  └──Vesper Sparrow: 14

Recall we want one rectangle per leaf node, and we want the area of that rectangle to be proportional to the value of the node (the number of birds seen of that species). One way to accomplish this would be to make a single stack of rectangles:

From this image, you can get some idea of the relative sizes of the rectangles, but the rectangles are very wide and short. Many are too short to fit a label. And, it can be hard to visually gauge the relative sizes of rectangles that are so elongated.

It would be preferable to have rectangles that are closer to squares, like this:

A type of treemap called a squarified treemap aims to use rectangles that approximate squares. It was introduced by Mark Bruls, Kees Huizing, and Jarke J. van Wijk in 2000. They developed an algorithm for laying out the rectangles, described below. You will implement this algorithm.

The input to the algorithm is a bounding rectangle for the treemap, and a tree. For now, we will assume the tree is of the kind described above, where all children of the root node are leaves. Each leaf represents on data point.

Example

We’ll use the following tree as a running example:

: 20
  │
  ├──Three: 3
  │
  ├──Two: 2
  │
  ├──Six (a): 6
  │
  ├──Six (b): 6
  │
  ├──One (a): 1
  │
  ├──One (b): 1
  │
  └──One (c): 1

We’ll use a bounding box with width 1 and height 0.8.

Step 1: Take the list of data points (leaves), and sort the list in descending order of value.

Example

In our example, this gives us the list

Six (a): 6
Six (b): 6
Three: 3
Two: 2
One (a): 1
One (b): 1
One (c): 1

Step 2: We are about to layout a single row of rectangles, either along the left edge of our bounding box, or along the top edge. If the bounding box is wider than it is tall, or if the width and height are equal, choose the left edge. Otherwise, choose the top edge.

Example

In our example, the bounding box has width 1 and height 0.8, so it is wider than it is tall, and we’ll use the left edge.

Step 3: Determine how many rectangles to include in the row. Our goal is choose the number of rectangles such that each rectangle is not too distorted. We do this by first trying to fit a single rectangle in the row, and calculating its distortion. Then we try fitting two rectangles, then three, and so on as long as the distortion keeps improving (that is, the rectangles become closer to square). We stop fitting additional rectangles in the row once the distortion starts getting worse instead of better.

To be more precise: We will make rectangles corresponding to the first $k$ data points in the list, for some value of $k$. We will try $k = 1, 2, 3, \dots$ in order.

For each value of $k$:

Take the first $k$ ordered data points, and sum their values. Then, divide by the sum of all the values in the entire list. This fraction is the fraction of the area of the bounding box that should be taken up by this row of rectangles.
For each of the $k$ data points, divide its value by the sum of the $k$ values in the row. This is the fraction of the area of the row that should be taken up by the rectangle for this data point. The rectangles in the row are laid out from top to bottom (if the row is against the left edge of the bounding box) or from left to right (if the row is against the top edge).
The calculations above allow us to determine the height and width of each rectangle in the row. Compute the aspect ratio of each rectangle by dividing its longer side by its shorter side. The aspect ratio is a number greater than or equal to 1, where 1 means that the rectangles is a square, while a very high aspect ratio means the rectangle is very distorted.
Define the distortion of this row to be the maximum aspect ratio of any rectangle in the row.
If $k = 1$, or if the distortion for this row is less than or equal to the distortion for the previous $k$, then move on to the next $k$. If there is no next value of $k$ because this row already uses all the data points in the list, then stop the process, and keep this value of $k$ (which gives the lowest distortion).

On the other hand, if the distortion is higher than it was for the previous $k$, we will not move on to the next $k$ nor keep the current $k$. Instead, stop the process and choose the previous value of $k$ (the one that gave the lowest distortion).

Example

Let’s try this on our example. We start with $k = 1$.

We take the first data point in our list, Six (a): 6. This row has a total value of 6, while the whole data set has a total value of 20. So, this row will take up a 6/20 fraction of the area of the bounding box. Since the bounding box has width 1, this row will have width 0.3.
We have one rectangle taking up the whole row. So, the height of that rectangle is the full height of the row (which is the full height of the bounding box), 0.8.
The aspect ratio of this rectangle is $0.8/0.3 = 8/3 \approx 2.667$.

Here is a table with information about this rectangle, including its label, the (x, y) coordinates of its origin, its width and height, and its aspect ratio:

label

x

y

width

height

aspect ratio

Six (a)

0.000

0.000

0.300

0.800

2.667

And here is a visualiztion of this rectangle:
There is only one rectangle in the row, so the distortion is its aspect ratio: $8/3 \approx 2.667$.

label	x	y	width	height	aspect ratio
`Six (a)`	0.000	0.000	0.300	0.800	2.667

We then move on to $k = 2$.

We take the first 2 data points in our list, Six (a): 6 and Six (b): 6. This row has a total value of $6 + 6 = 12$, while the whole data set has a total value of 20. So, this row will take up a 12/20 fraction of the area of the bounding box. Since the bounding box has width 1, this row will have width 0.6.
This row has two rectangles, each with value 6. The total value of of the row is 12, so each rectangle takes up a 6/12 fraction of the height of the row. Since the row has height 0.8, each rectangle has height 0.4.
The aspect ratio of each of these rectangles is $0.6/0.4 = 3/2 = 1.5$.

label

x

y

width

height

aspect ratio

Six (a)

0.000

0.000

0.600

0.400

1.500

Six (b)

0.000

0.400

0.600

0.400

1.500
The distortion of this row is the maximum of these two aspect ratios (which are both the same), $1.5$.
Since the distortion for $k = 2$, which is $1.5$, is better (lower) than the distortion for $k = 1$, which was approximately $2.667$, we continue to the next $k$.

label	x	y	width	height	aspect ratio
`Six (a)`	0.000	0.000	0.600	0.400	1.500
`Six (b)`	0.000	0.400	0.600	0.400	1.500

We next try $k = 3$.

We take the first 3 data points in our list, Six (a): 6, Six(b): 6, and Three: 3. This row has a total value of $6 + 6 + 3 = 15$, while the whole data set has a total value of 20. So, this row will take up a 15/20 fraction of the area of the bounding box; it will have a width of 0.75.
This row has three rectangles, with values 6, 6, and 3. Since the total value of the row is 15, the rectangles take up a 6/15, 6/15, and 3/15 fraction of the height, respectively. Since the whole row has height 0.8, the rectangles have heights 0.32, 0.32, and 0.16.
Since the rectangles have sizes 0.75 by 0.32, 0.75 by 0.32, and 0.75 by 0.16, the aspect ratios are $0.75/0.32 = 2.34375$, $0.75/0.32 = 2.34375$, and $0.75/0.16 = 4.6875$.

label

x

y

width

height

aspect ratio

Six (a)

0.000

0.000

0.750

0.320

2.344

Six (b)

0.000

0.320

0.750

0.320

2.344

Three

0.000

0.640

0.750

0.160

4.687
The distortion of the row is the maximum of these three aspect ratios, which is $4.6875$.
Since the distortion for $k = 3$, which is $4.6875$, is worse (higher) than the distortion for $k = 2$, which was $1.5$, we stop progressing through values of $k$, and instead choose the previous value of $k$ — that is, $k = 2$ — which gave us the best (lowest) distortion.

label	y	width	height	aspect ratio
`Six (a)`	0.000	0.750	0.320	2.344
`Six (b)`	0.320	0.750	0.320	2.344
`Three`	0.640	0.750	0.160	4.687

Step 4: At this point, we have filled in one row of rectangles. We have a list of remaining data points that have not been made into rectangles. There is a rectangle of leftover space within our bounding box which has not yet been filled. The problem of filling that rectangle of leftover space with the list of remaining data points is equivalent to our original problem but on a simpler input. So, we can recursively fill that leftover space using the remaining data points. Those data points have already been sorted in descending order of value, so we can start with Step 2.

When there is no space remaining (equivalently, when there are no data points with nonzero values), we have a complete treemap and we are done!

Example

In our example, we have used our first two data points, so our remaining data points are

Three: 3
Two: 2
One (a): 1
One (b): 1
One (c): 1

And, our remaining space is a rectangle with origin (0.6, 0.0), width 0.4, and height 0.8. This rectangle becomes our bounding box in the recursive call.

Since this bounding box is taller than it is wide, our new row will be placed along the top edge.

We start by laying out $k = 1$ rectangles in our new row.

The first data point in our list is Three: 3. This row has a total value of 3, while the whole data set has a total value of $3 + 2 + 1 + 1 + 1 = 8$ (because this recursive call uses the new, shorter list of data points). So, this row will take up a 3/8 fraction of the area of the bounding box. Since the bounding box has height 0.8, this row will have height 0.3.
We have one rectangle taking up the whole row. So, the width of that rectangle is the full width of the row (which is the full width of the bounding box), 0.4.
The aspect ratio of this rectangle is $0.4/0.3 = 4/3 \approx 1.333$.

label

x

y

width

height

aspect ratio

Three

0.600

0.000

0.400

0.300

1.333

Here’s a visualization of this treemap (new row in blue):
There is only one rectangle in the row, so the distortion is its aspect ratio: $4/3 \approx 1.333$.

label	x	y	width	height	aspect ratio
`Three`	0.600	0.000	0.400	0.300	1.333

We then try $k = 2$.

We take the first 2 data points in our list, Three: 3 and Two: 2. This row has a total value of $3 + 2 = 5$, while the whole data set has a total value of 8. So, this row will take up a 5/8 fraction of the area of the bounding box. Since the bounding box has height 0.8, this row will have height 0.5.
This row has two rectangles, with values 3 and 2. The total value of of the row is 5, so the rectangles take a 3/5 and 2/5 fraction of the width of the row, respectively. Since the row has width 0.4, the rectangles have widths 0.24 and 0.16.
The aspect ratios these rectangles are $0.5/0.24 = 25/12 \approx 2.083$, and $0.5/0.16 = 25/8 = 3.125$.

label

x

y

width

height

aspect ratio

Three

0.600

0.000

0.240

0.500

2.083

Two

0.840

0.000

0.160

0.500

3.125
The distortion of this row is the maximum of these two aspect ratios $3.125$.
Since the distortion for $k = 2$, which is $3.125$, is worse (higher) than the distortion for $k = 1$, which was approximately $1.333$, we do not continue to the next value of $k$, but instead stop this process and revert back to the previous $k$, which was $k = 1$, and which featured the lowest distortion.

label	x	y	width	height	aspect ratio
`Three`	0.600	0.000	0.240	0.500	2.083
`Two`	0.840	0.000	0.160	0.500	3.125

At this point, our list of remaining data points for which we have not yet created rectangles is

Two: 2
One (a): 1
One (b): 1
One (c): 1

and our rectangle of unused space has origin (0.6, 0.3), width 0.4, and height 0.5. This list and this leftover rectangle become the the inputs to the next recursive call.

Try continuing this process to work out the sizes and locations of the remaining rectangles. The final treemap will consists of these rectangles:

label	x	y	width	height
`Six (a)`	0.000	0.000	0.600	0.400
`Six (b)`	0.000	0.400	0.600	0.400
`Three`	0.600	0.000	0.400	0.300
`Two`	0.600	0.300	0.400	0.200
`One (a)`	0.600	0.500	0.267	0.150
`One (b)`	0.600	0.650	0.267	0.150
`One (c)`	0.867	0.500	0.133	0.300

Visualized, it will look like this:

Task 1: One-level treemaps¶

Your first task is to fill in this function in treemap.py:

def compute_rectangles(t, bounding_rec_width=1.0, bounding_rec_height=1.0):
    '''
    Computes the rectangles for drawing a treemap of the provided tree.

    Inputs:
        t: (Tree) a tree
        bounding_rec_width, bounding_rec_height: (float) the width and height
            of the bounding rectangle.

    Returns: a list of Rectangle objects.
    '''

This function takes a tree and, optionally, the height and width of the initial bounding rectangle, and returns a list of Rectangle objects. Each Rectangle instance should have its label attribute set to equal the key of the corresponding leaf.

For this task, you should assume the tree has the shape described in the section on drawing one-level treemaps, above. That is, you should assume the tree consists of a root with a number of children, all of which are leaves. Each leaf has a string key and a positive integer value. The root has the empty string as its key, and its value is the sum of the values of the children.

You will modify this function in Task 3 to work with a tree of any shape.

We provide two helper functions which you can (and should) use in your solution. The first, sorted_trees, sorts a list of trees in descending order of value:

def sorted_trees(tree_list):
    '''
    Sort a list of Tree instances by the value of their roots in descending
    order. Ties are broken by the key of the root, in (forward) alphabetical
    order. Returns a new sorted list without modifying the input list.

    Inputs:
        tree_list: list of Tree instances, each with an integer value.

    Returns: list of Tree instances, sorted.
    '''

The second, compute_row, does much of the heavy lifting of determining whether a row should go against the left or top edge, and laying out rectangles in that row:

def compute_row(bounding_rec, row_data, total_sum):
    '''
    Lay out the given data points as rectangles in one row of a
    treemap. The row will be against the left or top edge of the
    bounding rectangle, depending on whether the rectangle is at least
    as wide as it is tall.

    Inputs:
        bounding_rec: (Rectangle) the bounding rectangle
        row_data: (list of Tree) the list of data points that should be
            included in this row.
        total_sum: (integer) the total value of all the rectangles that
            will eventually fill the bounding rectangle (not just those
            in this row).

    Returns: a tuple (row_layout, leftover_rec), where
        row_layout: list of pairs (rec, t), where rec is a Rectangle, and
            t is the Tree that Rectangle corresponds to, and
        leftover_rec: a Rectangle representing the leftover space in the
            bounding rectangle that was not used by this row.
    '''

This function, compute_row, will lay out a rectangle for each data point it is given — no more and no less. It essentially handles Steps 2, 3a, and 3b of the algorithm. After applying this function, you will still need to compute the aspect ratio of each rectangle, compute the distortion, and ultimately determine the appropriate $k$, the number of rectangles in the row. Additionally, this function also does not set the label attribute of the rectangles it returns; you will need to do that yourself.

You can use a loop to iterate through values of $k$ in Step 3 (you should not use recursion for that part of the algorithm).

In Step 4 of the algorithm, after placing one row, we have a rectangle of leftover space and a list of remaining data points which we will use to fill that space. Since this is a simpler version of our original problem (we again need to fill a bounding rectangle based on a list of data points, but now we have a smaller bounding rectangle and fewer data points), this is a natural place to use recursion. Your function will need just a single recursive call. Figuring out the recursion can be challenging, but it gets easier with practice, and this is a good opportunity to get some practice. Make sure to think through the design of the recursion following the steps we discussed in class. Hint: compute_rectangles itself should not be recursive. Instead, write a recursive helper function that you will call from compute_rectangles.

When you have a function that only makes a single recusive call, there’s often a reasonable way to write that function using iteration instead of recursion. For example, recall the factorial function: it can be written recursively with a single recursive call (the computation of factorial(5) relies only on a call to factorial(4), for instance), but we also saw a version of the factorial function that used iteration instead. Both versions of the factorial function were straightforward and were about the same length. Like the factorial function, the compute_rectangles function that you are writing in this task requires only a single recursive call. Also like the factorial function, the compute_rectangles function can be written using iteration instead of recursion. We encourage you to use recursion for this task as this is a naturally recursive problem, and it’s good practice. But, an iterative solution will also be accepted for this task (though note that recursive solutions will be required for Tasks 2 and 3).

Note

In our example, we made tables showing the size and positions of each rectangle. While making tables like the these is not part of this assignment, we encourage you to write strategic print statements to display analogous data while you are initially writing and debugging your program. This will help you to isolate errors that are due to your generation of the rectangles. We highly recommend you print out such information, and we will ask you to show us these sorts of print-outs when helping you to debug your code.

Testing Task 1¶

We suggest you start by testing this function informally from IPython by printing out the rectangles produced by the function. You should use the trees provided in sparrows.json, which have the required structure. Here are some sample calls using this dataset. Note that the string represenation of a Rectangle object consists of the word “RECTANGLE” followed by the x- and y-coordinates of the rectangle’s origin, then the rectangle’s width, height, and label.

In [14]: data_flat = treemap.load_trees('data/sparrows.json')

In [15]: recs = treemap.compute_rectangles(data_flat['s1'])

In [16]: for r in recs:
    ...:     print(r)
    ...:
RECTANGLE 0.0000 0.0000 0.8000 0.5000 B40
RECTANGLE 0.0000 0.5000 0.8000 0.5000 C40
RECTANGLE 0.8000 0.0000 0.2000 1.0000 D20

In [17]: recs = treemap.compute_rectangles(data_flat['s2'])

In [18]: for r in recs:
    ...:     print(r)
    ...:
RECTANGLE 0.0000 0.0000 0.6000 0.5000 Six (a)
RECTANGLE 0.0000 0.5000 0.6000 0.5000 Six (b)
RECTANGLE 0.6000 0.0000 0.4000 0.3750 Three
RECTANGLE 0.6000 0.3750 0.4000 0.2500 Two
RECTANGLE 0.6000 0.6250 0.2667 0.1875 One (a)
RECTANGLE 0.6000 0.8125 0.2667 0.1875 One (b)
RECTANGLE 0.8667 0.6250 0.1333 0.3750 One (c)

In [19]: recs = treemap.compute_rectangles(data_flat['s3'])

In [20]: for r in recs:
    ...:     print(r)
    ...:
RECTANGLE 0.0000 0.0000 0.5000 0.5000 6
RECTANGLE 0.0000 0.5000 0.5000 0.5000 6
RECTANGLE 0.5000 0.0000 0.5000 0.3333 4
RECTANGLE 0.5000 0.3333 0.5000 0.2500 3
RECTANGLE 0.5000 0.5833 0.4000 0.2083 2x
RECTANGLE 0.5000 0.7917 0.4000 0.2083 2y
RECTANGLE 0.9000 0.5833 0.1000 0.4167 1

Note that data_flat['s2'] is the tree that we used in the explanation of the algorithm.

Once you are confident that you’re producing the correct rectangles, you can start visualizing them using the draw_rectangles function:

In [21]: drawing.draw_rectangles(recs)

Note that sometimes, a rectangles might be too small to fit a label. When that happens, you may see a message about it in the terminal.

To run the automated tests for this task only, you can run the following:

$ py.test -x -v -k rectangles_flat

Task 2: Values and paths¶

We now turn our attention to trees with multiple levels and more complex structure, like the ones in birds.json.

The trees in birds.json only include a value for the value attribute in the leaf nodes (in all other nodes, the value is set to None). Furthermore, all nodes are lacking the path attribute mentioned earlier, which will be used for assigning colors to our treemap. In Task 2, you will complete the following two recursive functions in treemap.py to respectively compute the value for all internal nodes and to set the path for all nodes.

Your solution to each must be recursive. Non-recursive solutions (i.e., functions that do not call themselves with an input that is in some way smaller) will not receive credit for this task. Furthermore, each function should be generalizable, working for a tree of any height. As a guideline, in our implementation each of these functions requires fewer than ten lines of code.

def compute_internal_values(t):
    '''
    Assign a value to the internal nodes. The value of the leaves should
    already be set. The value of an internal node is the sum of the value
    of its children.

    Inputs:
        t (Tree): a tree

    Returns: The input tree t should be modified so that every internal node's
        value is set to be the sum of the values of its children. The return
        value is the value of the root of t (that is, t.value).
    '''

def compute_paths(t, prefix=()):
    '''
    Assign the path attribute of all nodes. The path attribute of a node
    should be a tuple of strings made up of the keys of the nodes along a
    full path through the tree from the root down to, but not including,
    that node. For example, following the path
        "class Aves" --> "order Passeriformes" --> "family Passerellidae"
    from the root to a node should result in the path atttribute
        ("class Aves", "order Passeriformes")
    being assigned to the node with key "family Passerellidae".
    For the root node, the path attribute should be an empty tuple.

    Inputs:
        t (Tree): a tree
        prefix (tuple of strings): Prefix to add to path

    Returns: Nothing. The input tree t should be modified to contain a path
        attribute for all nodes.
    '''

Testing Task 2¶

You can informally test these functions from IPython by printing out the tree after calling one of these functions, and checking whether whether the value or path are set correctly. You can use the trees provided in birds.json We suggest you start with the smaller, sample trees, which are easier to check manually.

For example, you can check compute_internal_values as follows:

In [22]: data = treemap.load_trees("data/birds.json")

In [23]: treemap.compute_internal_values(data['s1'])
Out[23]: 100

In [24]: data['s1'].print()

A100: 100
  │
  ├──B40: 40
  │  │
  │  ├──E10: 10
  │  │
  │  ├──F10: 10
  │  │  │
  │  │  ├──J6: 6
  │  │  │  │
  │  │  │  ├──L1: 1
  │  │  │  │
  │  │  │  ├──M3: 3
  │  │  │  │
  │  │  │  └──N2: 2
  │  │  │
  │  │  └──K4: 4
  │  │
  │  └──G20: 20
  │
  ├──C40: 40
  │  │
  │  ├──H30: 30
  │  │
  │  └──I10: 10
  │
  └──D20: 20

Caution

Note that compute_internal_values modifies the tree in place (as does compute_paths). If you want a fresh copy of the tree for further testing, you need to re-load the data.

You can check compute_paths as follows (note how the print method has an optional argument to print the paths)

In [25]: data = treemap.load_trees("data/birds.json")

In [26]: treemap.compute_paths(data['s1'])

In [27]: data['s1'].print(paths=True)

A100: ()
  │
  ├──B40: ('A100',)
  │  │
  │  ├──E10: ('A100', 'B40')
  │  │
  │  ├──F10: ('A100', 'B40')
  │  │  │
  │  │  ├──J6: ('A100', 'B40', 'F10')
  │  │  │  │
  │  │  │  ├──L1: ('A100', 'B40', 'F10', 'J6')
  │  │  │  │
  │  │  │  ├──M3: ('A100', 'B40', 'F10', 'J6')
  │  │  │  │
  │  │  │  └──N2: ('A100', 'B40', 'F10', 'J6')
  │  │  │
  │  │  └──K4: ('A100', 'B40', 'F10')
  │  │
  │  └──G20: ('A100', 'B40')
  │
  ├──C40: ('A100',)
  │  │
  │  ├──H30: ('A100', 'C40')
  │  │
  │  └──I10: ('A100', 'C40')
  │
  └──D20: ('A100',)

To run the automated tests for this task only, you can run the following:

$ py.test -x -v -k values

$ py.test -x -v -k paths

Drawing multiple-level treemaps¶

We now want to draw a treemap for a tree that may have multiple levels, like the trees in our original data set. In a finished treemap, there should be one rectangle per leaf of the tree (e.g. one rectangle per bird species), and each rectangle should have area proportional to the corresponding leaf’s value (e.g. proportional to the number of sightings of that bird).

One approach would be to put all of these leaves into one big list, and use the one-level treemap algorithm described earlier. But, the resulting visualization would abandon the structure of the tree. We would prefer that the rectangles of closely-related leaves appear close together in the treemap.

Here is an algorithm to accomplish this.

Example

We will use this tree as a running example:

Root: None
  │
  ├──Three: None
  │  │
  │  ├──Three ~ Two: 2
  │  │
  │  └──Three ~ One: 1
  │
  ├──Two: 2
  │
  ├──Six (a): None
  │  │
  │  ├──Six (a) ~ Four: None
  │  │  │
  │  │  ├──Six (a) ~ Four ~ Three: 3
  │  │  │
  │  │  └──Six (a) ~ Four ~ One: 1
  │  │
  │  └──Six (a) ~ Two: 2
  │
  ├──Six (b): 6
  │
  ├──One (a): 1
  │
  ├──One (b): 1
  │
  └──One (c): 1

Step 1: Compute the values of the internal nodes (the values of the leaves should already be set). Each internal node should have its value set equal to the sum of the values of its children.

We will additionally set the path attribute of each node at this time.

Example

After computing the internal values, our example tree is as follows.

Root: 20
  │
  ├──Three: 3
  │  │
  │  ├──Three ~ Two: 2
  │  │
  │  └──Three ~ One: 1
  │
  ├──Two: 2
  │
  ├──Six (a): 6
  │  │
  │  ├──Six (a) ~ Four: 4
  │  │  │
  │  │  ├──Six (a) ~ Four ~ Three: 3
  │  │  │
  │  │  └──Six (a) ~ Four ~ One: 1
  │  │
  │  └──Six (a) ~ Two: 2
  │
  ├──Six (b): 6
  │
  ├──One (a): 1
  │
  ├──One (b): 1
  │
  └──One (c): 1

Step 2: Focus on the root and its children. Lay out rectangles for the children according to the one-level algorithm.

Example

Ignoring all nodes other than the root and its children, our example tree looks like the following.

Root: 20
  │
  ├──Three: 3
  │
  ├──Two: 2
  │
  ├──Six (a): 6
  │
  ├──Six (b): 6
  │
  ├──One (a): 1
  │
  ├──One (b): 1
  │
  └──One (c): 1

The child nodes are the same as in our example for the one-level algorithm. Recall that the rectangles generated were as follows.

label	x	y	width	height
`Six (a)`	0.000	0.000	0.600	0.400
`Six (b)`	0.000	0.400	0.600	0.400
`Three`	0.600	0.000	0.400	0.300
`Two`	0.600	0.300	0.400	0.200
`One (a)`	0.600	0.500	0.267	0.150
`One (b)`	0.600	0.650	0.267	0.150
`One (c)`	0.867	0.500	0.133	0.300

And, this is visualized as the following.

Note that each rectangle created in this step corresponds not necessarily to a leaf, but to a subtree.

Step 3 Deal with each rectangle created in Step 2. What we do depends on whether or not the subtree associated with the rectangle is a leaf.

If the subtree is a leaf, then we are done with this rectangle: this is one of the rectangles we will draw.
If the subtree is not a leaf, then we further subdivide it. We lay out the sub-rectangles within this rectangle recursively: this rectangle serves as the bounding rectangle, and we apply the (multiple-level) treemap algorithm to the corresponding subtree.

Example

In our example, the children of the root are the following:

Six (a): 6
Six (b): 6
Three: 3
Two: 2
One (a): 1
One (b): 1
One (c): 1

Of these, Six (b): 6, Two: 2, One (a): 1, One (b): 1, and One (c): 1 are leaves. So, the corresponding rectangles can be added to the list of rectangles to draw:

label	x	y	width	height
`Six (b)`	0.000	0.400	0.600	0.400
`Two`	0.600	0.300	0.400	0.200
`One (a)`	0.600	0.500	0.267	0.150
`One (b)`	0.600	0.650	0.267	0.150
`One (c)`	0.867	0.500	0.133	0.300

Here they are visualized:

The other children are not leaves and need to be dealt with recursively. The first of these children is the following subtree:

Six (a): 6
  │
  ├──Six (a) ~ Four: 4
  │  │
  │  ├──Six (a) ~ Four ~ Three: 3
  │  │
  │  └──Six (a) ~ Four ~ One: 1
  │
  └──Six (a) ~ Two: 2

It corresponds to the rectangle with origin (0., 0.), width 0.6 and height 0.4. Instead of drawing this rectangle, we fill it with sub-rectangles corresponding to the leaves of the subtree. To do that, we recursively use the (multiple-level) treemap algorithm on this subtree, using this rectangle as the bounding rectangle. This results in the following rectangles getting added to the list of rectangles to draw:

label	x	y	width	height
`Six (a) ~ Four ~ Three`	0.000	0.000	0.400	0.300
`Six (a) ~ Four ~ One`	0.000	0.300	0.400	0.100
`Six (a) ~ Two`	0.400	0.000	0.200	0.400

Visualizing all the rectangles we have listed for drawing so far:

There is one other child we haven’t dealt with, the following subtree.

Three: 3
  │
  ├──Three ~ Two: 2
  │
  └──Three ~ One: 1

It corresponds to the rectangle with origin (0.6, 0.), width 0.4, and height 0.3. Again, we do not draw this rectangle, but instead recursively fill it with sub-rectangles corresponding to the leaves of the subtree. These are the resulting rectangles that get added to the list of rectangles to draw:

label	x	y	width	height
`Three ~ Two`	0.600	0.000	0.267	0.300
`Three ~ One`	0.867	0.000	0.133	0.300

The full list of rectangles to draw is as follows.

label	x	y	width	height
`Six (a) ~ Four ~ Three`	0.000	0.000	0.400	0.300
`Six (a) ~ Four ~ One`	0.000	0.300	0.400	0.100
`Six (a) ~ Two`	0.400	0.000	0.200	0.400
`Six (b)`	0.000	0.400	0.600	0.400
`Three ~ Two`	0.600	0.000	0.267	0.300
`Three ~ One`	0.867	0.000	0.133	0.300
`Two`	0.600	0.300	0.400	0.200
`One (a)`	0.600	0.500	0.267	0.150
`One (b)`	0.600	0.650	0.267	0.150
`One (c)`	0.867	0.500	0.133	0.300

Here is a visualization of the full list of rectangles.

Note: The order in which you compute the rectangles to draw may be different from the order in which they are described here; that is okay.

Task 3: Multiple-level treemaps¶

Your final task is to modify the compute_rectangles function from Task 1 to work with a tree of any shape.

First, add the following two lines of code in compute_rectangles immediately after the docstring.

compute_internal_values(t)
compute_paths(t)

Since we are including code to call compute_internal_values and compute_paths within compute_rectangles, you do not need to call them before calling compute_rectangles.

In Task 1, you set the label attribute of each rectangle to be the key of the corresponding node. In this task, you must also set the color_code attribute of each rectangle to equal the path attribute of the corresponding node. This attribute is used by our drawing function to determine the color of each rectangle. Rectangles with the same color code are drawn in the same color. Rectangles that have nearby color codes (color codes that start the same way) are given similar colors. Since we are setting the color code of a rectangle to be the path attribute of the corresponding node, this means that nodes that are closely related in the tree will give rise to rectangles that are close in color.

Update your code to implement the multiple-level treemap algorithm from the previous section.

You must use recursion to accomplish this task. Also, you may not make any assumptions about the number of levels in the tree.

The implementation of this task is not too complex (our solution adds less than 10 lines to our code from Task 1), but again figuring out the recursion can be challenging. Make sure to think through the design of the recursion.

Testing Task 3¶

We again suggest you start by testing this function informally from IPython by printing out the rectangles produced by the function. You can use the trees provided in birds.json, which have multiple levels. Your updated function should also still work on the trees in sparrows.json. Here are some sample calls using the birds.json dataset.

In [14]: data = treemap.load_trees('data/birds.json')

In [15]: recs = treemap.compute_rectangles(data['s1'])

In [16]: for r in recs:
    ...:     print(r)
    ...:
RECTANGLE 0.0000 0.0000 0.4000 0.5000 G20
RECTANGLE 0.4000 0.0000 0.4000 0.2500 E10
RECTANGLE 0.4000 0.2500 0.2400 0.1250 M3
RECTANGLE 0.4000 0.3750 0.1600 0.1250 N2
RECTANGLE 0.5600 0.3750 0.0800 0.1250 L1
RECTANGLE 0.6400 0.2500 0.1600 0.2500 K4
RECTANGLE 0.0000 0.5000 0.6000 0.5000 H30
RECTANGLE 0.6000 0.5000 0.2000 0.5000 I10
RECTANGLE 0.8000 0.0000 0.2000 1.0000 D20

In [17]: recs = treemap.compute_rectangles(data['s2'])

In [18]: for r in recs:
    ...:     print(r)
    ...:
RECTANGLE 0.0000 0.0000 0.4000 0.3750 Six (a) ~ Four ~ Three
RECTANGLE 0.0000 0.3750 0.4000 0.1250 Six (a) ~ Four ~ One
RECTANGLE 0.4000 0.0000 0.2000 0.5000 Six (a) ~ Two
RECTANGLE 0.0000 0.5000 0.6000 0.5000 Six (b)
RECTANGLE 0.6000 0.0000 0.2667 0.3750 Three ~ Two
RECTANGLE 0.8667 0.0000 0.1333 0.3750 Three ~ One
RECTANGLE 0.6000 0.3750 0.4000 0.2500 Two
RECTANGLE 0.6000 0.6250 0.2667 0.1875 One (a)
RECTANGLE 0.6000 0.8125 0.2667 0.1875 One (b)
RECTANGLE 0.8667 0.6250 0.1333 0.3750 One (c)

In [19]: recs = treemap.compute_rectangles(data['s3'])

In [20]: for r in recs:
    ...:     print(r)
    ...:
RECTANGLE 0.0000 0.0000 0.2500 0.5000 3
RECTANGLE 0.2500 0.0000 0.2500 0.3333 2
RECTANGLE 0.2500 0.3333 0.2500 0.1667 1
RECTANGLE 0.0000 0.5000 0.4167 0.5000 5
RECTANGLE 0.4167 0.5000 0.0833 0.5000 1
RECTANGLE 0.5000 0.0000 0.2500 0.3333 2a
RECTANGLE 0.7500 0.0000 0.2500 0.3333 2b
RECTANGLE 0.5000 0.3333 0.5000 0.2500 3
RECTANGLE 0.5000 0.5833 0.4000 0.2083 2x
RECTANGLE 0.5000 0.7917 0.4000 0.2083 2y
RECTANGLE 0.9000 0.5833 0.1000 0.4167 **1**

Note that data['s2'] is the tree that we used in the explanation of the multiple-level treemap algorithm. Note also that if you were to keep only the root node and its children in the tree data['s1'], data['s2'], or data['s3'], you would get the tree data_flat['s1'], data_flat['s2'], or data_flat['s3'], respectively. You can compare the rectangles you get from each of these to help you debug.

Once you are confident that you’re producing the correct rectangles, you can again try visualizing them using the draw_rectangles function:

In [21]: drawing.draw_rectangles(recs)

To run the automated tests that are new for this task, you can run the following:

$ py.test -x -v -k rectangles_full

To run all the automated tests for drawing rectangles, including the tests from Task 1, you can run the following:

$ py.test -x -v -k rectangles

Putting it all together¶

Once you have completed all the tasks, you will be able to easily generate any treemap you want from the command-line. In particular, you can run the following:

$ python3 treemap.py data/birds.json KEY

Where KEY is the key for the tree for which you would like to produce a treemap.

Here are some examples and the expected treemap. Take into account that running treemap.py will open a new window with the treemap; you will need to close that window before you can return to the command line.

NOTE: If you are passing all the tests, do not worry if the produced treemaps do not match ours exactly! Your completeness grade will depend on the result of the tests, not on the exact graphics you produce. Remember that while working on the assignment via SSH, you will not be able to produce images at all, but that is okay.

$ python3 treemap.py data/birds.json Oct

Treemap of bird sightings in Cook County, IL in October 2020¶

$ python3 treemap.py data/birds.json Jul

Treemap of bird sightings in Cook County, IL in July 2020¶

$ python3 treemap.py data/birds.json Apr

Treemap of bird sightings in Cook County, IL in April 2020¶

$ python3 treemap.py data/birds.json Feb

Treemap of bird sightings in Cook County, IL in February 2020¶

$ python3 treemap.py data/birds.json Year

Treemap of bird sightings in Cook County, IL from November 2019 to October 2020¶

You can use the data/sparrows.json data file as well:

$ python3 treemap.py data/sparrows.json Oct

Treemap of species in family Passerellidae sighted in Cook County, IL in October 2020¶

Grading¶

The assignment will be graded according to the Rubric for Programming Assignments. Make sure to read that page carefully; the remainder of this section explains aspects of the grading that are specific to this assignment.

In particular, your completeness score will be determined on the basis of the automated tests, which provide a measure of how many of the tasks you have completed:

Grade	Percent tests passed
Exemplary	at least 95%
Satisfactory	at least 80%
Needs Improvement	at least 60%
Ungradable	less than 60%

For example, if your implementation passes 85% of the tests, you will earn an S (satisfactory) score for completeness.

The code quality score will be based on the general criteria in the Rubric for Programming Assignments but, in this programming assignment, we will also be paying special attention to the following:

Use of recursion: The goal of this assignment is for you to gain familiarity with recursion, and tasks 2 and 3 require you to use recursion. If you do not use recursion for a function that requires it, the automated tests for that function will not count, and your code quality score will also be negatively affected. If you do not use recursion at all in your code, you will receive a U for Code Quality.
Appropriate use of recursive helper functions: The function compute_rectangles should not itself be recursive. Instead, compute_rectangles should call a recursive helper function. You have the ability to decide what will be the inputs and outputs of that recursive helper function, and a careful choice will make the rest of your work easier and will make your code cleaner.
Repeating code: You should never copy-paste code if you need to repeat a task. There are various ways to avoid copy-pasting, depending on the situation: you might put the code in a helper function, you might move the code inside/outside a loop, or, if the code is inside multiple branches of a conditional, you might move that code outside the conditional.
Computing the same value multiple times: As part of the algorithm, you compute a row layout for each value of $k$, but do you then recompute the row layout for the best value of $k$? How can you avoid repeating that work?
Sorting a list that is already sorted: Sorting a list takes more time than just iterating through the list (specifically, sorting a list of length $n$ takes time $O(n \log n)$). So, you should call sorted_trees whenever you need to sort a list of trees, but don’t call it more than you have to. In particular, don’t re-sort a list that is already sorted.

While these are the main things we care about in this assignment, please remember that it is not possible for us to give you an exhaustive list of every single thing that could affect your code quality score (and that thinking in those terms is generally counterproductive to learning how to program; see our How to Learn in this Class page for more details).

In general, avoiding all of the above issues will increase the likelihood of getting an E; if your code has a few (but not most) of the above issues, that will likely result in an S; if your code suffers from most of those issues, it will likely get an N. That said, to reiterate, we could also find other issues in your code that we did not anticipate, but that end up affecting your code quality score. When that happens, we encourage you to see these as opportunities to improve your code in future assignments (or as specific things to change if you decide to resubmit the assignment).

And don’t forget that style also matters! You could avoid all of the above issues, and still not get an E because of style issues in your code. Make sure to review the general guidelines in the Rubric for Programming Assignments, as well as our Style Guide.

Cleaning up¶

Before you submit your final solution, you should, remove

any print statements that you added for debugging purposes and
all in-line comments of the form: “YOUR CODE HERE” and “REPLACE …”

Also, check your code against the style guide. Did you use good variable names? Do you have any lines that are too long?

Make sure you have included header comments, that is, the triple-quote strings that describe the purpose, inputs, and return values of each function, for every function you have written.

As you clean up, you should periodically save your file and run your code through the tests to make sure that you have not broken it in the process.

Submission¶

You will be submitting your work through Gradescope (linked from our Canvas site). The process will be the same as with previous coursework: Gradescope will fetch your files directly from your PA #6 repository on GitHub, so it is important that you remember to commit and push your work! You should also get into the habit of making partial submissions as you make progress on the assignment; remember that you’re allowed to make as many submissions as you want before the deadline.

To submit your work, go to the “Gradescope” section on our Canvas site. Then, click on “Programming Assignment #6”. If you completed previous assignments, Gradescope should already be connected to your GitHub account. If it isn’t, you will see a “Connect to GitHub” button. Pressing that button will take you to a GitHub page asking you to confirm that you want to authorize Gradescope to access your GitHub repositories. Just click on the green “Authorize gradescope” button.

Then, under “Repository”, make sure to select your uchicago-cmsc12100-aut-21/pa6-$GITHUB_USERNAME.git repository. Under “Branch”, just select “main”.

Finally, click on “Upload”. An autograder will run, and will report back a score. Please note that this autograder runs the exact same tests (and the exact same grading script) described in Testing Your Code. If there is a discrepancy between the tests when you run them on your computer, and when you submit your code to Gradescope, please let us know.

Acknowledgments: Gordon Kindlmann originally recommended drawing treemaps as good topic for an assignment.