Polling places

Due: Friday, November 11th, 4:30pm CT

You may work alone or in a pair on this assignment.

The goal of this assignment is to give you practice designing and implementing classes and methods, and working with class instances.

Getting started

If you are going to work individually, follow the invitation URL provided on Ed Discussion and when prompted for a team name, simply enter your CNetID. Because of a quirk in how GitHub manages team assignments, it may complain that your CNetID is already “taken” if you did PA #2 individually (this does not mean anyone is usurping your CNetID; GitHub simply requires that you sign up with a different name, because you already used your CNetID in PA #2). If this happens, simply add a -4 to your CNetID. For example, CNetID jrandom would become jrandom-4. Make sure to use this modified CNetID in place of your CNetID in all the instructions below.

If you are going to work in a pair, the process involves a couple of extra steps. The process is described in our Coursework Basics page. Please head over to that page, and return here once you’ve created your team repository.

Next, you will need to initialize your repository. If you are working in a pair only one of you should complete these steps. If you repeat these steps more than once, you may break your repository.

First, create a TEAM variable in the terminal that contains either your CNetID (if you’re working individually) or your team name (if you’re working in a pair):

TEAM=replace_me_with_your_cnetid_or_team_name

(remember you can double-check whether the variable is properly set by running echo $TEAM)

Finally, run these commands (if you don’t recall what some of these commands do, they are explained in the CAPP Camp Git I material):

cd ~/capp30121
mkdir pa4-$TEAM
cd pa4-$TEAM
git init
git remote add origin git@github.com:uchicago-CAPP30121-aut-2022/pa4-$TEAM.git
git remote add upstream git@github.com:uchicago-CAPP30121-aut-2022/pa4-initial-code.git
git pull upstream main
git branch -M main
git push -u origin main

You will find the files you need for the programming assignment directly in the root of your repository, including a README.txt file that explains what each file is. Make sure you read that file.

If you are working in a pair, the person who did not initialize the repository should setup the TEAM variable (as shown above) and then clone the repository like this:

cd ~/capp30121
git clone git@github.com:uchicago-CAPP30121-aut-2022/pa4-$TEAM.git
cd pa4-$TEAM
git remote add upstream git@github.com:uchicago-CAPP30121-aut-2022/pa4-initial-code.git

If you are working in a pair, you should also take a moment to read the working_pair section of the Coursework Basics page.

Introduction

../../_images/voting_booths.png

Every two years, many people around the United States wait in long lines to vote, in part due to inadequate provisioning of voting booths.

In this assignment, you will write code to simulate the flow of voters through polling places and analyze the interplay the number of voting booths assigned to a precinct and the amount of time voters spend waiting in line.

We will view this problem as an instance of the more general problem of simulating an \(M\)/\(M\)/\(N\) queue, where:

  • the first \(M\) indicates that arrivals (voters arriving at the polls) obey a Markov process;

  • the second \(M\) indicates that the service times obey a Markov process (the amount of time a voter takes to complete a ballot); and finally

  • the \(N\) indicates that there are \(N\) servers (the number of voting booths).

We will tell you how to structure your implementation at a high level, but you will be responsible for the design details.

Representing Voters

In Task 2 of this assignment, you will write a Voter class to model voters. This class will include:

  • the time the voter arrives at the polls,

  • the voting duration, that is, the amount of time the voter takes to vote,

  • the time the voter is assigned to a voting booth (aka, the start time), assuming they vote,

  • the time the voter leaves the voting booth (aka, the departure time), assuming they vote.

For simplicity, we will model time as a floating point value, representing the number of minutes since the polls opened. So, if we say a voter arrived at the polls at time 60.5, that means they arrived an hour and thirty seconds after the polls opened.

We will know the voter’s arrival time and voting duration when we construct the Voter instance representing a voter. Both the arrival time and the voting duration for the voter should be initialized to their proper values when the voter instance is constructed.

The start time for a voter, which is not available at construction time, should be initialized to None when the voter instance is constructed. This value should be reset to the correct value when the voter is assigned to a voting booth and starts to vote.

Similarly, a voter’s departure time cannot be determined when the voter is constructed. It too should be initialized to None when the voter instance is constructed and should be reset to the appropriate time when the voter is assigned to a voting booth.

Your implementation must include at a minimum a constructor and public attributes named:

  • arrival_time

  • voting_duration

  • start_time

  • departure_time

Our utility function for printing voters, described below, relies on the existence of these attributes.

In addition to a constructor, this class could have a method to set the voter’s start and departure times, both of which can be set at the same time. Another reasonable design is to separate the work of handling these updates into multiple methods. Whether you use one method for these updates or more than one is up to you.

Precincts

Cities are broken up into voting precincts. In Tasks 2 and 3, you will complete the definition of a Precinct class that represents a single precinct, and which will be responsible for simulating the behavior of voters in that precinct.

A precinct is open for some number of hours and has some maximum number of voters who can vote in the precinct (you can think of this as the number of voters who are registered to vote in that precinct). We will assume that all registered voters will go to the polls on election day, but not all arrive in time to vote.

In addition to these values, a precinct has two more values–the arrival rate and the voting duration rate–that are used to generate voters. We discuss this process below.

Precinct Configuration

The configuration for a precinct is specified using a dictionary containing the following information:

  • The name of the precinct (name)

  • Number of hours the polls are open (hours_open)

  • Number of voters assigned to the precinct (num_voters)

  • Number of voting booths assigned to the precinct (num_booths)

  • The rate at which voters arrive (arrival_rate)

  • The rate at which voters complete their voting (voting_duration_rate)

  • Whether or not the precinct is large (large_precinct)

The significance of the rates will be explained later.

For example:

{
 "name": "Downtown",
 "hours_open": 1,
 "num_voters": 5,
 "num_booths": 1,
 "voting_duration_rate": 0.1,
 "arrival_rate": 0.11,
 "large_precinct": False
}

Information about a precinct will be stored in a JSON file that contains a dictionary with two items: the key "precinct" maps to a precinct configuration as specified above and the key "seed" maps to a random seed. Similar to PA #1, the random seed will be used to ensure that a simulation involving randomness produces the same result every time it runs (which will allow us to test whether or not your code produces the expected results). We recommend reviewing PA #1 (and how it used random seeds) if you’re unsure of how random seeds are used in the context of running a simulation.

Precinct class

Notice how having a Precinct class allows us to have multiple instances of the Precinct class in our program, each with its own set of parameters. This design will make it easier for us to simulate how voters behave given different settings for the parameters.

We have provided a skeleton for the Precinct class that includes the constructor and a simulate method to simulate a single voting day for that precinct. While you are allowed to add additional methods to this class, you cannot modify the parameter list of the constructor or the simulate method.

Please note that we will consider each Precinct to be independent from each other: voters cannot go from one precinct to another, and there is no information shared between the precincts.

Generating Voters

One of the main responsibilities of the Precinct class is to generate the voters. When we generate a voter, we set their arrival time, so let’s take a closer look at how those arrival times are generated.

The arrival time of our voters will follow a Poisson process. From Wikipedia: a Poisson process, named after the French mathematician Siméon-Denis Poisson (1781-1840), is a stochastic process in which the time to the next event is independent of past events. We can use the following fact to generate a voter sample: if the number of arrivals in the time interval \([0, t]\) follows the Poisson distribution with mean \(t \cdot \lambda\), then the gap between arrivals times follows the exponential distribution with rate parameter \(\lambda\).

This means that to generate a new voter, we need to know the arrival time of the previously generated voter. Our voter simulation code should keep track of a “current time” representing when the most recent voter arrived. We will keep track of time in minutes starting at time zero. To generate a voter, you will need to generate an arrival time and a voting duration.

Arrival time

To generate the arrival time of the next voter, you will generate an inter-arrival time (\(\textit{gap}\)) from the exponential distribution using the specified arrival rate as \(\lambda\) and add it to the current time (\(t\)). You’ll then move time forward by updating the time from \(t\) to \(t + \textit{gap}\). Notice how \(\textit{gap}\) represents the time that elapses between the arrival of the previous voter and the arrival of the next voter.

For example, let’s say we want to generate a voter in a precinct, and the random number generator returns 5.7 for that voter’s gap. Since this is the first voter we generate, their arrival time will be 5.7, and we advance the precinct’s current time from 0 to 5.7. We then generate another voter, and the random number generator returns 2.4 for the voter’s gap. This means that this second voter arrives 2.4 minutes after the previous voter, so the second voter’s arrival time will be 8.1 (5.7 + 2.4) and we advance the precinct’s current time to 8.1.

Voting Duration

When we generate a voter, we also need to assign a voting duration to that voter. The voting duration also follows a Poisson process so you will draw the voter’s voting duration from the exponential distribution using the specified voting duration rate as \(\lambda\).

Task 1: Creating data files

First, you will create the data files used throughout this assignment. Currently, the entire precinct data set is stored in a CSV file. However, as you will see, it will be easier to work with the data if it is stored in separate JSON files, one for each precinct. In this task, you will create these JSON files, which will have the form described in the “Precinct Configuration” section above.

Complete the function precincts_to_json in process_data.py. This function takes the name of the CSV file containing precinct data, and the name of the directory in which to store the JSON files.

In this function, you will need to:

  • Open the CSV file and load the data,

  • Clean the data (described below), and

  • Create a number of JSON files, one for each row of the input CSV file (also described below)

Now might be a good time to review the “Working with Data” chapter of the book. We have also uploaded an extra video about JSON files to Canvas under the “Panopto” tab to help you get started.

Cleaning the data

Often, real-world data sets are messy. There can be values missing from the data, they can contain extraneous information, or have important information missing. Open data/precincts.csv and you will notice some of these issues.

After loading the CSV data (we recommend using the DictReader class to read the data into a list of dictionaries), you will need to process and clean the data.

Each JSON file will contain a dictionary with two items: the key "seed" that maps to a random seed and the key "precinct" that maps to a precinct configuration. For example, here are the expected contents of precinct-0.json:

{
    "seed": 1468604453,
    "precinct": {
        "name": "Downtown",
        "hours_open": 1,
        "num_voters": 5,
        "num_booths": 1,
        "voting_duration_rate": 0.1,
        "arrival_rate": 0.11,
        "large_precinct": false
    }
}

In order to create the dictionary above, you will need to do a number of things, described below.

Handle missing values: Some values in the "name" and "num_booths" columns of the CSV file are missing. If you find a missing value for one of these keys, you should insert a default value rather than leaving the value blank. The default for "name" should be "Unknown" and the default for "num_booths" should be 1.

Cast values to the correct type: If you use the DictReader class (recommended above), the values in each column of the CSV file will be loaded as strings. Instead:

  • The values of seed, hours_open, num_voters, and num_booths should be integers, and

  • The values of voting_duration_rate and arrival_rate should be floats.

Add a key: Add the key “large_precinct” to the precinct. This key should be True if the precinct is “large”, and False otherwise. We will say that a precinct is “large” if it has more than 100 voters.

Remove a key: Remove the key, value pair associated with the key “pop” from the data. This is extraneous information in the CSV file which will not be of use to us.

Create the JSON files

After cleaning the data, you will create a number of JSON files in the directory specified by the output_directory parameter. Name your files as follows:

  • precinct-0.json for the first row of data in the CSV file,

  • precinct-1.json for the second row of data,

  • precinct-2.json for the third row of data,

  • and so on.

Running your code

Create your JSON files calling precincts_to_json in IPython. As usual, we recommend enabling the autoreload extension whenever you start an IPython session.

In [1]: %load_ext autoreload

In [2]: %autoreload 2

In [3]: import process_data

In [4]: process_data.precincts_to_json("data/precincts.csv", "data")

After this call, you should find a number of JSON files (precinct-0.json, precinct-1.json, and so on) in the data directory.

Testing

When you begin your work, we recommend testing your code by visually inspect the JSON files. Open your JSON files in VSCode an verify that they contain the expected results. We have also incluced all of the expected JSON files in the test-data directory. You can compare your results to these files side-by-side.

We have provided the function load_precinct in util.py that takes the name of a JSON precinct file and loads the file into two values: the precinct configuration dictionary (as specified above) and the random seed. You should not make any calls to load_precinct in the code you write yourself, but it can be a useful function when testing your code. In particular, you can check the contents of your JSON files in IPython.

Here is an example of how to use load_precinct to open the data file data/precinct-0.json in IPython, and how to access each value associated with the precinct.

In [3]: import util

In [4]: precinct, seed = util.load_precinct("data/precinct-0.json")

In [5]: precinct
Out[5]:
{'name': 'Downtown',
 'hours_open': 1,
 'num_voters': 5,
 'num_booths': 1,
 'voting_duration_rate': 0.1,
 'arrival_rate': 0.11,
 'large_precinct': False}

In [6]: seed
Out[6]: 1468604453

In [7]: precinct["num_voters"]
Out[7]: 5

In [8]: precinct["arrival_rate"]
Out[8]: 0.11

Once you’ve done some manual testing, you can run additional automated tests with py.test:

$ py.test -xvk data

(Recall that we use $ to indicate the Linux command-line prompt. You should not include it when you run this command.)

As usual, remember that you can find more detailed instructions on how manual and automated testing work in the Testing Your Code page.

Task 2: Implement voter generation

Implement the Voter class. And, in the provided Precinct class, fill in the constructor, and add code to the simulate method for generating a list of voters according to the configuration of the precinct.

In other words, instead of carrying out a full simulation of voters moving through the polling place, you will just generate instances of the Voter class for the voters who will be able to vote in the precinct, and return a list of these instances.

Doing this partial implementation of a voting simulation will ensure that you’ve correctly implemented the voter generation logic we described earlier. The code for generating the voters in your Precinct should be fairly simple (our implementation is only 15 lines long, not including comments).

When writing your code, it is important that you take into account the following notes.

Repeatability

For testing purposes, it is important that you draw the gap and determine the voter’s voting duration in the same order as our code. To reduce the likelihood that the testing process fails simply because the values were drawn in the wrong order, we have provided a function, gen_voter_parameters, in util.py that takes the precinct’s arrival rate and voting duration rate and returns a tuple with the gap as a float and the voting duration as a float (in that order). We recommend familiarizing yourself with this function to understand how arrival gaps and voting duration times are determined. It might be helpful to review the M/D/1 queue material from lecture.

When to stop generating voters

The number of voters generated will be determined by one of two stopping conditions: (1) you have generated the maximum number of the voters specified for the precinct, or (2) you generate a voter who arrives after closing time. In the second case, the late voter should be discarded and no further voters should be generated. We will refer to voters who arrive before the polls close as valid.

Testing

So that you can test your implementation manually, we have provided two sets of voter generation parameters. The first set (see data/precinct-0.json) includes the following:

  • hours open: 1

  • number of voters in the precinct: 5

  • number of booths in the precinct: 1

  • voting duration rate: 0.1 (corresponds to a mean voting duration of 10 minutes)

  • arrival rate: 0.11 (corresponds to mean time between voters of approximately 9 minutes)

  • seed for random number generator: 1468604453

and corresponds to a voter population in which all the voters arrive before the polls close.

The random seed

Remember that the precincts file includes a seed value (returned by load_precincts). This seed must be passed to simulate (using the seed parameter) and you must use the random.seed(seed) function to set the seed inside your method that generates the list of voters. The seed only needs to be set once for each list of voters that you generate.

Remember that PA #1 includes a longer discussion of random seeds, and their significance in testing simulations.

We have also provided two functions that will be useful to you as you test this part your code. The first, a function named run_simulation in test_utils, takes the name of a precinct file and one optional parameter–the number of voting booths assigned to the precinct. This function loads the precinct data from the file, creates an instance of the Precinct class using that data, and then returns the result of calling simulate on the precinct.

The second function, print_voters in util.py takes a list of Voter instances and an optional file name. This function will either print a representation of the voters to screen or to a file, depending on whether a filename is specified (you likely won’t have to write to a file yourselves, but it is something that we’ve found useful on occasion).

You can use these functions to call your voter list generation method and then to print and check the voters that get returned.

In [10]: import util, test_utils

In [11]: voters = test_utils.run_simulation("data/precinct-0.json")

In [12]: util.print_voters(voters)
Arrival Time   Voting Duration   Start Time    Departure Time
-------------------------------------------------------------
6.78           2.11              None          None

7.01           5.13              None          None

25.27          2.68              None          None

26.89          1.10              None          None

33.48         10.08              None          None

Notice how the Start Time and the Departure Time are initially None because you are not yet calculating the time when the voters start voting (they may need to wait in line once they arrive at the precinct, and we describe this in more detail in the next task).

The second set of voter generation parameters, data/precinct-1.json, is the same except the arrival rate is set to 0.04, which corresponds to a mean gap of 25 minutes between arrivals:

In [6]: voters = test_utils.run_simulation("data/precinct-1.json")

In [7]: util.print_voters(voters)
Arrival Time   Voting Duration   Start Time    Departure Time
-------------------------------------------------------------
18.64          2.11              None          None

19.28          5.13              None          None

Notice how, in the above set, only two voters get to vote. If you actually generated the third voter, it would have an arrival time of 69.49, which means they would arrive after the polls close since this precinct is only open for one hour.

Once you’ve done some manual testing, you can run additional automated tests with py.test:

$ py.test -xvk voter

Task 3: Simulate an election day

In this task, you will be simulating an election for a single precinct. You will do this by completing the implementation of the simulate method in Precinct.

Simulating a Precinct

The simulation itself will be similar to the M/D/1 queue example from class. In the M/D/1 example, we have a single server, such as a toll booth, that serves a queue of clients, such as cars, that arrive with a specified distribution and have a fixed service time. In this assignment, your simulation will need to simulate multiple servers (voting booths) and variable service times (voting durations), but the structure of your simulation will be similar.

The biggest difference is that you will be using our VotingBooths class (described below) to model the “multiple servers”. This class will be responsible for determining whether a voting booth is available and, if not, when a booth will become available. This information will be necessary to determine when a given voter will start voting (since it depends on when they can access a booth), and to ensure that the number of booths in use will never be no more than the number of booths assigned to the precinct.

Example

Let’s walk through an example. The file data/precinct-2.json contains the following precinct:

{
 "name": "Little Rodentia",
 "hours_open": 1,
 "num_voters": 10,
 "num_booths": 2,
 "voting_duration_rate": 0.1,
 "arrival_rate": 0.16666666666666666,
 "large_precinct": false
}

And the seed 1438018944.

The precinct has 10 voters who arrive at the precinct once every 6 minutes on average. Voters take 10 minutes on average to vote.

As you work on your implementation, it can be useful to print out the voters returned by your simulate method, even if you have not yet filled in the start and departure times of the voters:

In [15]: voters = test_utils.run_simulation("data/precinct-2.json")

In [16]: util.print_voters(voters)
Arrival Time   Voting Duration   Start Time    Departure Time
-------------------------------------------------------------
 3.29           5.39             None          None

12.39           5.74             None          None

12.68          24.45             None          None

15.78           6.08             None          None

19.76          39.64             None          None

20.52           5.79             None          None

22.33          16.65             None          None

24.23           3.07             None          None

26.75           5.24             None          None

29.04          15.43             None          None

Let’s consider what happens when we simulate this precinct with two voting booths.

Our first voter first voter arrives at 3.29, enters a voting booth immediately at time 3.29, and finishes voting at 8.69 (3.29 + 5.39). (All times in this example have been rounded to two digits for clarity; your code should not round times.)

The second voter also starts voting immediately, because a booth is open, and finishes voting at time 18.13 (12.39 + 5.74).

The third voter arrives at time 12.68 and has a voting duration of 24.45 minutes. Since the first voter departed at 8.69, this voter can start voting immediately, and finishes voting at 12.68.

The fourth voter arrives at 15.78 and has a voting duration of 6.08 minutes. Both booths are occupied when the fourth voter arrives, so this voter must wait until a booth becomes free. This happens when the second voter finishes at 18.13. So, the fourth voter’s start time will be 18.13 and their departure time will be 24.21. Notice that the fourth voter departs before the third voter.

The fifth voter arrives at time 19.79 and will not start voting until the fourth voter finishes at time 24.21. Notice that the third voter is still voting when the fifth voter starts!

And so on.

All ten voters arrive before the polls close so all ten get to vote. Please note that (as typical in the US) if a voter arrives before the polls close, they still get to vote even if they will not start voting until after the polls have closed. In this example, the polls close at time 60 (the precinct configuration dictionary specifies the number of hours the polls are open, while all other times are measured in minutes).

Here’s the result of our simulation:

In [20]: voters = test_utils.run_simulation("data/precinct-2.json", num_booths=2)

In [21]: util.print_voters(voters)
Arrival Time   Voting Duration   Start Time    Departure Time
-------------------------------------------------------------
 3.29           5.39              3.29          8.69

12.39           5.74             12.39         18.13

12.68          24.45             12.68         37.13

15.78           6.08             18.13         24.21

19.76          39.64             24.21         63.85

20.52           5.79             37.13         42.92

22.33          16.65             42.92         59.57

24.23           3.07             59.57         62.64

26.75           5.24             62.64         67.88

29.04          15.43             63.85         79.28

Notice that we provided a value for the optional parameter of test_utils.run_simulation: num_booths is two, since there are two booths in Precinct 2.

Managing the Voting Booths

We have supplied a VotingBooths class to model the voting booths. This class allows you to add and remove voters from booths and to track the state of the booths.

VotingBooths class has a constructor that takes the number of booths and initializes the data structures used to manage the booths.

Priority Queues

The underlying data structure used to represent the booths in the VotingBooths class is a priority queue. You do not need to understand priority queues to use the VotingBooths class. But in case you are curious: a minimum priority queue is a data structure that includes operations for (1) adding items along with a priority to the queue (where the priority is just a number), (2) removing the item with the lowest priority value from the queue, and (3) determining whether the queue is empty. Removing the elements with the lowest priority value may seem counter-intuitive but, in Computer Science, a lower value usually denotes a higher priority (think of it as a ranking: if you’re #1, your ranked above someone who is ranked #2).

The class also provides a collection of methods:

  • is_booth_available: which returns True if there is at least one unoccupied booth and False otherwise.

  • is_some_booth_occupied: which returns True if there is at least one occupied booth and False otherwise.

  • time_next_free: returns a float with the time that the next booth will be free next. This method should only be called when at least one booth is occupied.

  • enter_booth: takes an instance of the Voter class and adds the voter to a booth. This method should only be called when at least one booth is open and it also requires the start_time and voting_duration attributes of the Voter to be set.

  • exit_booth: removes the next voter to depart from their booth and returns their departure time. This method should only be called when at least one booth is occupied.

When simulating an election day, you are responsible for ensuring that all booths are empty at the end of the day.

Your Precinct code may use as many of these methods as needed. Your code may not access the underlying data directly.

Returning the voters

Your simulate function must return a list of all the Voter objects. The start_time and departure_time attributes should be updated for the voters who voted. Furthermore, for a given precinct, the list of voters should be sorted by the voters’ arrival time. Since your Precinct should generate voters in the order in which they arrive, producing a list of voters sorted by arrival time should not require any additional steps. If you do find that you need to sort the list explicitly, it is likely that your solution is headed in the wrong direction; come to office hours or ask for help on Ed if that is the case.

Testing

We have provided a main function in simulate.py that allows you to run simulations from the command line. If you run simulate.py with just the name of a configuration file, it will print a summary of the simulation results using that precinct with one voting machine. For example:

$ python3 simulate.py data/precinct-0.json

PRECINCT 'Downtown'
- 5 voters voted
- Polls closed at 60.0 and last voter departed at 43.56
- Avg wait time: 0.59

You can use the --print-voters options to, instead, print all the voters:

$ python3 simulate.py data/precinct-0.json --print-voters
PRECINCT 'Downtown'
Arrival Time   Voting Duration   Start Time    Departure Time
    6.78         2.11             6.78              8.89

    7.01         5.13             8.89             14.02

    25.27         2.68            25.27             27.94

    26.89         1.10            27.94             29.04

    33.48        10.08            33.48             43.56

We have also provided some automated tests that will test your implementation of the simulate method. You can run these tests like this:

$ py.test -xvk day

Task 4: Finding the average waiting time of a precinct

At this point, we have the ability to simulate precincts under a variety of parameters. An interesting parameter to tweak is the number of voting booths, since it has a direct effect on the average waiting time of the voters.

In this task, you will implement a function that, given a single precinct and a number of booths, computes the average waiting time when simulating that precinct with that number of booths:

def find_avg_wait_time(precinct, num_booths, ntrials, initial_seed=0):

Where:

  • precinct is a precinct configuration dictionary

  • num_booths is the number of booths to use in the simulation. Notice that this will override whatever value is specified in the precinct dictionary. However do not modify the num_booths field of the precinct dictionary.

  • ntrials is a number of trials (this is explained below)

  • initial_seed is an initial seed (also explained below)

Given a single simulation of a precinct, computing the average waiting time of the voters is not difficult. However, as with any simulation, it is unreasonable to accept the result of a single simulation (or “trial”). To avoid having the quirks of a specific set of voters mislead us, we will run through this process a specified number of times (ntrials times) and return the median average wait time across the different trials.

You will need to use a different random seed for each trial. Start with the specified initial_seed and add 1 for each new trial.

Finding the median of a set of trial runs

In this task, you will need to run a series of trials, collect the results from the trials, and then report the median result. One easy way to find the median is to store the results in a list (let’s call it lst), sort the list using the built-in sorted function (sorted(lst)), and then return the middle element of the sorted list (sorted(lst)[len(lst) // 2]). Our test code for this task assumes you use this approach to compute the median.

Testing

To test your implementation of the function, you can call the function from IPython. For example:

In [4]: import util, simulate

In [5]: p, seed = util.load_precinct("data/precinct-3.json")

In [6]: p
Out[6]:
{'name': 'Unknown',
'hours_open': 13,
'num_voters': 500,
'num_booths': 5,
'voting_duration_rate': 0.1,
'arrival_rate': 0.7,
'large_precinct': True}

In [7]: seed
Out[7]: 1438021247

In [8]: simulate.find_avg_wait_time(p, num_booths=1, ntrials=20, initial_seed=seed)
Out[8]: 2203.0502079782727

In [9]: simulate.find_avg_wait_time(p, num_booths=8, ntrials=20, initial_seed=seed)
Out[9]: 4.630351652014112

You can also run the automated tests for this function like this:

$ py.test -xvk avg

Task 5: Finding the optimal number of booths

Ideally, we would like to have enough booths on hand to allow everyone to vote without having to wait too long in line.

At this point, we have all the necessary pieces to answer the following question: “Given a target waiting time \(W\), how many booths does the precinct need to ensure that the average waiting time is less that \(W\)?”.

Your task is to implement the following function to answer this question:

def find_number_of_booths(precinct, target_wait_time, max_num_booths, ntrials, seed=0):

Where:

  • precinct is a precinct configuration dictionary

  • target_wait_time is \(W\) as defined above

  • max_num_booths is the maximum number of booths the precinct is willing to have

  • ntrials is the number of trials to run when finding the average waiting time of a precinct

  • seed is a random seed

Your function will do a simple linear search: you will first simulate the precinct with one booth, and check whether the average waiting time of the voters (as produced by find_avg_wait_time from Task 4) is strictly below target_wait_time. If it isn’t, you will simulate the precinct with two booths, and check the average waiting time, and so on up to max_num_booths or until you find a number of booths that achieves an average waiting time less than target_wait_time. Take into account that it could also be the case that the provided target_wait_time is infeasible with a number of booths less than or equal to max_num_booths.

So, the result of this search with be a tuple with two values: the optimal number of booths, and the average waiting time with that number of booths. If the provided target waiting time is infeasible, return (0, None).

Testing

To test your implementation of the function, you can call the function from IPython. For example:

In [22]: import util, simulate

In [23]: p, seed = util.load_precinct("data/precinct-3.json")

In [27]: simulate.find_number_of_booths(p, ntrials=20, target_wait_time=10, max_num_booths=10, seed=seed)
Out[27]: (8, 4.630351652014112)

In [28]: simulate.find_number_of_booths(p, ntrials=20, target_wait_time=10, max_num_booths=5, seed=seed)
Out[28]: (0, None)

You can also run the function from the command-line by using the --target-wait-time and --max-num-booths options:

$ python3 simulate.py data/precinct-3.json --max-num-booths 10 --target-wait-time 10
Precinct 'Unknown' can achieve average waiting time of 4.63 with 8 booths

$ python3 simulate.py data/precinct-3.json --max-num-booths 5 --target-wait-time 10
The target wait time (10.00) is infeasible in precinct 'Sahara Square' with 5 or less booths

Note: The command-line tool sets the value of ntrials to 20.

And, finally, you can run the automated tests for this function like this:

$ py.test -xvk find

Grading

The assignment will be graded according to the Rubric for Programming Assignments. Make sure to read that page carefully; the remainder of this section explains aspects of the grading that are specific to this assignment.

In particular, your completeness score will be determined solely on the basis of the automated tests, which provide a measure of how many of the tasks you have completed:

Grade

Percent tests passed

Exemplary

at least 95%

Satisfactory

at least 80%

Needs Improvement

at least 50%

Ungradable

less than 50%

For example, if your implementation passes 85% of the tests, you will earn an S (satisfactory) score for completeness.

The code quality score will be based on the general criteria in the Rubric for Programming Assignments but, in this programming assignment, we will also be paying special attention to the following:

  • Appropriate design for the Voter class Did you design one or more methods that are apporpriate for updating voters once they vote?

  • Appropriate use of the VotingBooths class Does your implementation leverage the VotingBooths class to manage the voters properly or did you recreate some of the functionality of this class in your own code?

  • Appropriate use of helper functions Did you design a good method for creating voters? Did you write well-designed helper methods for the parameter sweeps?

While these are the main things we care about in this assignment, please remember that it is not possible for us to give you an exhaustive list of every single thing that could affect your code quality score (and that thinking in those terms is generally counterproductive to learning how to program; see our How to Learn in this Class page for more details).

In general, avoiding all of the above issues will increase the likelihood of getting an E; if your code has a few (but not most) of the above issues, that will likely result in an S; if your code suffers from most of those issues, it will likely get an N. That said, to reiterate, we could also find other issues in your code that we did not anticipate, but that end up affecting your code quality score. When that happens, we encourage you to see these as opportunities to improve your code in future assignments (or as specific things to change if you decide to resubmit the assignment).

And don’t forget that style also matters! You could avoid all of the above issues, and still not get an E because of style issues in your code. Make sure to review the general guidelines in the Rubric for Programming Assignments, as well as our Style Guide.

Cleaning up

Before you submit your final solution, you should, remove

  • any print statements that you added for debugging purposes and

  • all in-line comments of the form: “YOUR CODE HERE” and “REPLACE …”

Also, check your code against the style guide. Did you use good variable names? Do you have any lines that are too long, etc.

Make sure you have included header comments, that is, the triple-quote strings that describe the purpose, inputs, and return values of each function, for every function you have written.

As you clean up, you should periodically save your file and run your code through the tests to make sure that you have not broken it in the process.

Submission

You will be submitting your work through Gradescope (linked from our Canvas site). The process will be the same as with previous coursework: Gradescope will fetch your files directly from your PA #4 repository on GitHub, so it is important that you remember to commit and push your work! You should also get into the habit of making partial submissions as you make progress on the assignment; remember that you’re allowed to make as many submissions as you want before the deadline.

If you are working in a pair, only one of you has to make a submission

To submit your work, go to the “Gradescope” section on our Canvas site. Then, click on “Programming Assignment #4”.

Under “Repository”, make sure to select your uchicago-CAPP30121-aut-2022/pa4-$TEAM.git repository. Under “Branch”, just select “main”.

Finally, click on “Upload”. An autograder will run, and will report back a score. Please note that this autograder runs the exact same tests (and the exact same grading script) described in Testing Your Code. If there is a discrepancy between the tests when you run them on your computer, and when you submit your code to Gradescope, please let us know.

If you are working in a pair, Gradescope will initially create an individual submission. You will need to explicitly add your partner to your submission, so that your work is associated with them too.