Polling places¶
Due: Friday, November 11th, 4:30pm CT
You may work alone or in a pair on this assignment.
The goal of this assignment is to give you practice designing and implementing classes and methods, and working with class instances.
Getting started¶
If you are going to work individually, follow the invitation URL
provided on Ed Discussion and when prompted for a team name, simply
enter your CNetID. Because of a quirk in how GitHub manages team
assignments, it may complain that your CNetID is already “taken” if
you did PA #2 individually (this does not mean anyone is usurping
your CNetID; GitHub simply requires that you sign up with a different
name, because you already used your CNetID in PA #2). If this happens,
simply add a -4
to your CNetID. For example, CNetID jrandom
would become jrandom-4
. Make sure to use this modified CNetID
in place of your CNetID in all the instructions below.
If you are going to work in a pair, the process involves a couple of extra steps. The process is described in our Coursework Basics page. Please head over to that page, and return here once you’ve created your team repository.
Next, you will need to initialize your repository. If you are working in a pair only one of you should complete these steps. If you repeat these steps more than once, you may break your repository.
First, create a TEAM
variable in the terminal that contains either
your CNetID (if you’re working individually) or your team name (if
you’re working in a pair):
TEAM=replace_me_with_your_cnetid_or_team_name
(remember you can double-check whether the variable is properly set by
running echo $TEAM
)
Finally, run these commands (if you don’t recall what some of these commands do, they are explained in the CAPP Camp Git I material):
cd ~/capp30121
mkdir pa4-$TEAM
cd pa4-$TEAM
git init
git remote add origin git@github.com:uchicago-CAPP30121-aut-2022/pa4-$TEAM.git
git remote add upstream git@github.com:uchicago-CAPP30121-aut-2022/pa4-initial-code.git
git pull upstream main
git branch -M main
git push -u origin main
You will find the files you need for the programming assignment
directly in the root of your repository, including a README.txt
file that explains what each file is. Make sure you read that file.
If you are working in a pair, the person who did not initialize the repository should setup
the TEAM
variable (as shown above) and then clone the repository like this:
cd ~/capp30121
git clone git@github.com:uchicago-CAPP30121-aut-2022/pa4-$TEAM.git
cd pa4-$TEAM
git remote add upstream git@github.com:uchicago-CAPP30121-aut-2022/pa4-initial-code.git
If you are working in a pair, you should also take a moment to read the working_pair section of the Coursework Basics page.
Introduction¶
Every two years, many people around the United States wait in long lines to vote, in part due to inadequate provisioning of voting booths.
In this assignment, you will write code to simulate the flow of voters through polling places and analyze the interplay the number of voting booths assigned to a precinct and the amount of time voters spend waiting in line.
We will view this problem as an instance of the more general problem of simulating an \(M\)/\(M\)/\(N\) queue, where:
the first \(M\) indicates that arrivals (voters arriving at the polls) obey a Markov process;
the second \(M\) indicates that the service times obey a Markov process (the amount of time a voter takes to complete a ballot); and finally
the \(N\) indicates that there are \(N\) servers (the number of voting booths).
We will tell you how to structure your implementation at a high level, but you will be responsible for the design details.
Representing Voters¶
In Task 2 of this assignment, you will write a Voter
class to model voters.
This class will include:
the time the voter arrives at the polls,
the voting duration, that is, the amount of time the voter takes to vote,
the time the voter is assigned to a voting booth (aka, the start time), assuming they vote,
the time the voter leaves the voting booth (aka, the departure time), assuming they vote.
For simplicity, we will model time as a floating point value, representing the number of minutes since the polls opened. So, if we say a voter arrived at the polls at time 60.5, that means they arrived an hour and thirty seconds after the polls opened.
We will know the voter’s arrival time and voting duration when we
construct the Voter
instance representing a voter. Both the
arrival time and the voting duration for the voter should be
initialized to their proper values when the voter instance is
constructed.
The start time for a voter, which is not available at construction
time, should be initialized to None
when the voter instance is
constructed. This value should be reset to the correct value when the
voter is assigned to a voting booth and starts to vote.
Similarly, a voter’s departure time cannot be determined when the
voter is constructed. It too should be initialized to None
when
the voter instance is constructed and should be reset to the
appropriate time when the voter is assigned to a voting booth.
Your implementation must include at a minimum a constructor and public attributes named:
arrival_time
voting_duration
start_time
departure_time
Our utility function for printing voters, described below, relies on the existence of these attributes.
In addition to a constructor, this class could have a method to set the voter’s start and departure times, both of which can be set at the same time. Another reasonable design is to separate the work of handling these updates into multiple methods. Whether you use one method for these updates or more than one is up to you.
Precincts¶
Cities are broken up into voting precincts.
In Tasks 2 and 3, you will complete the definition of
a Precinct
class that represents a single precinct, and which will be responsible for simulating
the behavior of voters in that precinct.
A precinct is open for some number of hours and has some maximum number of voters who can vote in the precinct (you can think of this as the number of voters who are registered to vote in that precinct). We will assume that all registered voters will go to the polls on election day, but not all arrive in time to vote.
In addition to these values, a precinct has two more values–the arrival rate and the voting duration rate–that are used to generate voters. We discuss this process below.
Precinct Configuration¶
The configuration for a precinct is specified using a dictionary containing the following information:
The name of the precinct (
name
)Number of hours the polls are open (
hours_open
)Number of voters assigned to the precinct (
num_voters
)Number of voting booths assigned to the precinct (
num_booths
)The rate at which voters arrive (
arrival_rate
)The rate at which voters complete their voting (
voting_duration_rate
)Whether or not the precinct is large (
large_precinct
)
The significance of the rates will be explained later.
For example:
{
"name": "Downtown",
"hours_open": 1,
"num_voters": 5,
"num_booths": 1,
"voting_duration_rate": 0.1,
"arrival_rate": 0.11,
"large_precinct": False
}
Information about a precinct will be stored in a JSON file that contains a
dictionary with two items: the key "precinct"
maps to a precinct
configuration as specified above and the key "seed"
maps to a
random seed. Similar to PA #1, the random seed will be used to ensure
that a simulation involving randomness produces the same result every
time it runs (which will allow us to test whether or not your code produces
the expected results). We recommend reviewing PA #1 (and how it used
random seeds) if you’re unsure of how random seeds are used in the
context of running a simulation.
Precinct
class¶
Notice how having a Precinct
class allows us to have multiple
instances of the Precinct
class in our program, each with its own
set of parameters. This design will make it easier for us to simulate
how voters behave given different settings for the parameters.
We have provided a skeleton for the Precinct
class that
includes the constructor and a simulate
method to simulate a
single voting day for that precinct. While you are allowed to add
additional methods to this class, you cannot modify the parameter
list of the constructor or the simulate
method.
Please note that we will consider each Precinct
to be independent
from each other: voters cannot go from one precinct to another, and
there is no information shared between the precincts.
Generating Voters¶
One of the main responsibilities of the Precinct
class is to generate
the voters.
When we generate a voter, we set their arrival time, so
let’s take a closer look at how those arrival times are generated.
The arrival time of our voters will follow a Poisson process. From Wikipedia: a Poisson process, named after the French mathematician Siméon-Denis Poisson (1781-1840), is a stochastic process in which the time to the next event is independent of past events. We can use the following fact to generate a voter sample: if the number of arrivals in the time interval \([0, t]\) follows the Poisson distribution with mean \(t \cdot \lambda\), then the gap between arrivals times follows the exponential distribution with rate parameter \(\lambda\).
This means that to generate a new voter, we need to know the arrival time of the previously generated voter. Our voter simulation code should keep track of a “current time” representing when the most recent voter arrived. We will keep track of time in minutes starting at time zero. To generate a voter, you will need to generate an arrival time and a voting duration.
Arrival time
To generate the arrival time of the next voter, you will generate an inter-arrival time (\(\textit{gap}\)) from the exponential distribution using the specified arrival rate as \(\lambda\) and add it to the current time (\(t\)). You’ll then move time forward by updating the time from \(t\) to \(t + \textit{gap}\). Notice how \(\textit{gap}\) represents the time that elapses between the arrival of the previous voter and the arrival of the next voter.
For example, let’s say we want to generate a voter in a precinct,
and the random number generator returns 5.7
for that
voter’s gap. Since this is the first voter we generate, their arrival
time will be 5.7
, and we advance the precinct’s current time from 0
to
5.7
. We then generate another voter, and the random number generator
returns 2.4
for the voter’s gap. This means that this second voter
arrives 2.4
minutes after the previous voter, so the second voter’s
arrival time will be 8.1
(5.7 + 2.4
) and we advance the precinct’s
current time to 8.1
.
Voting Duration
When we generate a voter, we also need to assign a voting duration to that voter. The voting duration also follows a Poisson process so you will draw the voter’s voting duration from the exponential distribution using the specified voting duration rate as \(\lambda\).
Task 1: Creating data files¶
First, you will create the data files used throughout this assignment. Currently, the entire precinct data set is stored in a CSV file. However, as you will see, it will be easier to work with the data if it is stored in separate JSON files, one for each precinct. In this task, you will create these JSON files, which will have the form described in the “Precinct Configuration” section above.
Complete the function precincts_to_json
in process_data.py
. This function
takes the name of the CSV file containing precinct data, and the name of the directory
in which to store the JSON files.
In this function, you will need to:
Open the CSV file and load the data,
Clean the data (described below), and
Create a number of JSON files, one for each row of the input CSV file (also described below)
Now might be a good time to review the “Working with Data” chapter of the book. We have also uploaded an extra video about JSON files to Canvas under the “Panopto” tab to help you get started.
Cleaning the data¶
Often, real-world data sets are messy. There can be values missing from the data, they can contain
extraneous information, or have important information missing.
Open data/precincts.csv
and you will notice some of these issues.
After loading the CSV data (we recommend using the DictReader
class to read
the data into a list of dictionaries), you will need to process and clean the data.
Each JSON file will contain a
dictionary with two items: the key "seed"
that maps to a
random seed and the key "precinct"
that maps to a precinct
configuration. For example, here are the expected contents of precinct-0.json
:
{
"seed": 1468604453,
"precinct": {
"name": "Downtown",
"hours_open": 1,
"num_voters": 5,
"num_booths": 1,
"voting_duration_rate": 0.1,
"arrival_rate": 0.11,
"large_precinct": false
}
}
In order to create the dictionary above, you will need to do a number of things, described below.
Handle missing values: Some values in the "name"
and "num_booths"
columns of the CSV file are missing.
If you find a missing value for one of these keys, you should insert a default value rather than
leaving the value blank. The default
for "name"
should be "Unknown"
and the default for "num_booths"
should be 1.
Cast values to the correct type: If you use the DictReader
class (recommended above),
the values in each column of the CSV file will be loaded as strings. Instead:
The values of
seed
,hours_open
,num_voters
, andnum_booths
should be integers, andThe values of
voting_duration_rate
andarrival_rate
should be floats.
Add a key: Add the key “large_precinct”
to the precinct. This key should be True
if the precinct
is “large”, and False
otherwise.
We will say that a precinct is “large” if it has more than 100 voters.
Remove a key: Remove the key, value pair associated with the key “pop”
from the data. This is extraneous
information in the CSV file which will not be of use to us.
Create the JSON files¶
After cleaning the data, you will create a number of JSON files in the directory
specified by the output_directory
parameter. Name your files as follows:
precinct-0.json
for the first row of data in the CSV file,precinct-1.json
for the second row of data,precinct-2.json
for the third row of data,and so on.
Running your code¶
Create your JSON files calling precincts_to_json
in IPython. As usual,
we recommend enabling the autoreload
extension
whenever you start an IPython session.
In [1]: %load_ext autoreload
In [2]: %autoreload 2
In [3]: import process_data
In [4]: process_data.precincts_to_json("data/precincts.csv", "data")
After this call, you should find a number of JSON files (precinct-0.json
,
precinct-1.json
, and so on) in the data
directory.
Testing¶
When you begin your work, we recommend testing your code by visually inspect the JSON files. Open your
JSON files in VSCode an verify that they contain the expected results.
We have also incluced all of the expected JSON files in the test-data
directory.
You can compare your results to these files side-by-side.
We have provided the function load_precinct
in util.py
that
takes the name of a JSON precinct file and loads the file into two values:
the precinct configuration dictionary (as specified above) and the random seed.
You should not make any calls to load_precinct
in the code you write yourself,
but it can be a useful function when testing your code.
In particular, you can check the contents of your JSON files in IPython.
Here is an example of how to use load_precinct
to open the data file
data/precinct-0.json
in IPython, and how to access each value associated
with the precinct.
In [3]: import util
In [4]: precinct, seed = util.load_precinct("data/precinct-0.json")
In [5]: precinct
Out[5]:
{'name': 'Downtown',
'hours_open': 1,
'num_voters': 5,
'num_booths': 1,
'voting_duration_rate': 0.1,
'arrival_rate': 0.11,
'large_precinct': False}
In [6]: seed
Out[6]: 1468604453
In [7]: precinct["num_voters"]
Out[7]: 5
In [8]: precinct["arrival_rate"]
Out[8]: 0.11
Once you’ve done some manual testing, you can run additional automated tests with py.test:
$ py.test -xvk data
(Recall that we use $
to indicate the Linux command-line prompt.
You should not include it when you run this command.)
As usual, remember that you can find more detailed instructions on how manual and automated testing work in the Testing Your Code page.
Task 2: Implement voter generation¶
Implement the Voter
class. And, in the provided Precinct
class, fill in the constructor,
and add code to the simulate
method
for generating a list of voters according to the configuration of the precinct.
In other words, instead of carrying out a full simulation of voters moving through the polling place, you will just
generate instances of the Voter
class for the voters who will be
able to vote in the precinct, and return a list of these instances.
Doing this partial implementation of a voting simulation will
ensure that you’ve correctly implemented the voter generation logic we
described earlier. The code for generating the voters in your
Precinct
should be fairly simple (our implementation is only 15
lines long, not including comments).
When writing your code, it is important that you take into account the following notes.
Repeatability
For testing purposes, it is important that you draw the gap and determine
the voter’s voting duration in the same
order as our code. To reduce the likelihood that the testing process
fails simply because the values were drawn in the wrong order, we have
provided a function, gen_voter_parameters
, in util.py
that
takes the precinct’s arrival rate and voting duration rate and returns a tuple with the gap as a float and
the voting duration as a float (in that order). We
recommend familiarizing yourself with this function to understand how
arrival gaps and voting duration times are determined.
It might be helpful to review the M/D/1 queue material from lecture.
When to stop generating voters
The number of voters generated will be determined by one of two stopping conditions: (1) you have generated the maximum number of the voters specified for the precinct, or (2) you generate a voter who arrives after closing time. In the second case, the late voter should be discarded and no further voters should be generated. We will refer to voters who arrive before the polls close as valid.
Testing¶
So that you can test your implementation manually, we have provided two sets of
voter generation parameters. The first set (see
data/precinct-0.json
) includes the following:
hours open: 1
number of voters in the precinct: 5
number of booths in the precinct: 1
voting duration rate: 0.1 (corresponds to a mean voting duration of 10 minutes)
arrival rate: 0.11 (corresponds to mean time between voters of approximately 9 minutes)
seed for random number generator: 1468604453
and corresponds to a voter population in which all the voters arrive before the polls close.
The random seed
Remember that the precincts file includes a seed value
(returned by load_precincts
). This seed must be passed
to simulate
(using the seed
parameter) and
you must use the random.seed(seed)
function to set the seed inside
your method that generates the list of voters.
The seed only needs to be set once for each list of voters that you generate.
Remember that PA #1 includes a longer discussion of random seeds, and their significance in testing simulations.
We have also provided two functions that will be useful to you as you
test this part your code. The first, a function named
run_simulation
in test_utils
, takes the name of a precinct
file and one optional parameter–the number of voting booths assigned
to the precinct. This function loads the
precinct data from the file, creates an instance of the Precinct
class using that data, and then returns the result of calling
simulate
on the precinct.
The second function, print_voters
in util.py
takes a list of
Voter
instances and an optional file name. This function will
either print a representation of the voters to screen or to a file,
depending on whether a filename is specified (you likely won’t have to
write to a file yourselves, but it is something that we’ve found
useful on occasion).
You can use these functions to call your voter list generation method and then to print and check the voters that get returned.
In [10]: import util, test_utils
In [11]: voters = test_utils.run_simulation("data/precinct-0.json")
In [12]: util.print_voters(voters)
Arrival Time Voting Duration Start Time Departure Time
-------------------------------------------------------------
6.78 2.11 None None
7.01 5.13 None None
25.27 2.68 None None
26.89 1.10 None None
33.48 10.08 None None
Notice how the Start Time
and the Departure Time
are initially
None
because you are not yet calculating the time when the voters
start voting (they may need to wait in line once they arrive at the
precinct, and we describe this in more detail in the next task).
The second set of voter generation parameters,
data/precinct-1.json
, is the same except the arrival
rate is set to 0.04, which corresponds to a mean gap of 25 minutes
between arrivals:
In [6]: voters = test_utils.run_simulation("data/precinct-1.json")
In [7]: util.print_voters(voters)
Arrival Time Voting Duration Start Time Departure Time
-------------------------------------------------------------
18.64 2.11 None None
19.28 5.13 None None
Notice how, in the above set, only two voters get to vote. If you actually generated the third voter, it would have an arrival time of 69.49, which means they would arrive after the polls close since this precinct is only open for one hour.
Once you’ve done some manual testing, you can run additional automated tests with py.test:
$ py.test -xvk voter
Task 3: Simulate an election day¶
In this task, you will be simulating an election for a single
precinct. You will do this by completing the implementation of the
simulate
method in Precinct
.
Simulating a Precinct¶
The simulation itself will be similar to the M/D/1 queue example from class. In the M/D/1 example, we have a single server, such as a toll booth, that serves a queue of clients, such as cars, that arrive with a specified distribution and have a fixed service time. In this assignment, your simulation will need to simulate multiple servers (voting booths) and variable service times (voting durations), but the structure of your simulation will be similar.
The biggest difference is that you will be using our VotingBooths
class (described below) to model the “multiple servers”. This class
will be responsible for determining whether a voting booth is
available and, if not, when a booth will become available. This
information will be necessary to determine when a given voter will
start voting (since it depends on when they can access a booth), and
to ensure that the number of booths in use will never be no more than
the number of booths assigned to the precinct.
Example¶
Let’s walk through an example. The file
data/precinct-2.json
contains the following
precinct:
{
"name": "Little Rodentia",
"hours_open": 1,
"num_voters": 10,
"num_booths": 2,
"voting_duration_rate": 0.1,
"arrival_rate": 0.16666666666666666,
"large_precinct": false
}
And the seed 1438018944
.
The precinct has 10 voters who arrive at the precinct once every 6 minutes on average. Voters take 10 minutes on average to vote.
As you work on your implementation, it can be useful to print out the
voters returned by your simulate
method, even if you have not yet
filled in the start and departure times of the voters:
In [15]: voters = test_utils.run_simulation("data/precinct-2.json")
In [16]: util.print_voters(voters)
Arrival Time Voting Duration Start Time Departure Time
-------------------------------------------------------------
3.29 5.39 None None
12.39 5.74 None None
12.68 24.45 None None
15.78 6.08 None None
19.76 39.64 None None
20.52 5.79 None None
22.33 16.65 None None
24.23 3.07 None None
26.75 5.24 None None
29.04 15.43 None None
Let’s consider what happens when we simulate this precinct with two voting booths.
Our first voter first voter arrives at 3.29, enters a voting booth immediately at time 3.29, and finishes voting at 8.69 (3.29 + 5.39). (All times in this example have been rounded to two digits for clarity; your code should not round times.)
The second voter also starts voting immediately, because a booth is open, and finishes voting at time 18.13 (12.39 + 5.74).
The third voter arrives at time 12.68 and has a voting duration of 24.45 minutes. Since the first voter departed at 8.69, this voter can start voting immediately, and finishes voting at 12.68.
The fourth voter arrives at 15.78 and has a voting duration of 6.08 minutes. Both booths are occupied when the fourth voter arrives, so this voter must wait until a booth becomes free. This happens when the second voter finishes at 18.13. So, the fourth voter’s start time will be 18.13 and their departure time will be 24.21. Notice that the fourth voter departs before the third voter.
The fifth voter arrives at time 19.79 and will not start voting until the fourth voter finishes at time 24.21. Notice that the third voter is still voting when the fifth voter starts!
And so on.
All ten voters arrive before the polls close so all ten get to vote. Please note that (as typical in the US) if a voter arrives before the polls close, they still get to vote even if they will not start voting until after the polls have closed. In this example, the polls close at time 60 (the precinct configuration dictionary specifies the number of hours the polls are open, while all other times are measured in minutes).
Here’s the result of our simulation:
In [20]: voters = test_utils.run_simulation("data/precinct-2.json", num_booths=2)
In [21]: util.print_voters(voters)
Arrival Time Voting Duration Start Time Departure Time
-------------------------------------------------------------
3.29 5.39 3.29 8.69
12.39 5.74 12.39 18.13
12.68 24.45 12.68 37.13
15.78 6.08 18.13 24.21
19.76 39.64 24.21 63.85
20.52 5.79 37.13 42.92
22.33 16.65 42.92 59.57
24.23 3.07 59.57 62.64
26.75 5.24 62.64 67.88
29.04 15.43 63.85 79.28
Notice that we provided a value for the optional parameter of test_utils.run_simulation
:
num_booths
is two, since there are two booths in Precinct 2.
Managing the Voting Booths¶
We have supplied a VotingBooths
class to model the voting booths.
This class allows you to add and remove voters from booths and to
track the state of the booths.
VotingBooths
class has a constructor that takes the number of
booths and initializes the data structures used to manage the booths.
Priority Queues
The underlying data structure used to represent the booths in the
VotingBooths
class is a priority queue. You do not need
to understand priority queues to use the VotingBooths
class.
But in case you are curious: a minimum priority queue is a data
structure that includes operations for (1) adding items along with
a priority to the queue (where the priority is just a
number), (2) removing the item with the lowest priority value
from the queue, and (3) determining whether the queue is empty.
Removing the elements with the lowest priority value may seem
counter-intuitive but, in Computer Science, a lower value usually
denotes a higher priority (think of it as a ranking: if you’re #1,
your ranked above someone who is ranked #2).
The class also provides a collection of methods:
is_booth_available
: which returnsTrue
if there is at least one unoccupied booth andFalse
otherwise.is_some_booth_occupied
: which returnsTrue
if there is at least one occupied booth andFalse
otherwise.time_next_free
: returns a float with the time that the next booth will be free next. This method should only be called when at least one booth is occupied.enter_booth
: takes an instance of theVoter
class and adds the voter to a booth. This method should only be called when at least one booth is open and it also requires thestart_time
andvoting_duration
attributes of theVoter
to be set.exit_booth
: removes the next voter to depart from their booth and returns their departure time. This method should only be called when at least one booth is occupied.
When simulating an election day, you are responsible for ensuring that all booths are empty at the end of the day.
Your Precinct
code may use as many of these methods as needed.
Your code may not access the underlying data directly.
Returning the voters¶
Your simulate
function must return a list of all the Voter
objects. The start_time
and departure_time
attributes should be updated for the voters who voted. Furthermore, for a
given precinct, the list of voters should be sorted by the voters’
arrival time. Since your Precinct
should generate voters in the
order in which they arrive, producing a list of voters sorted by
arrival time should not require any additional steps. If you do find
that you need to sort the list explicitly, it is likely that your
solution is headed in the wrong direction; come to office hours or ask
for help on Ed if that is the case.
Testing¶
We have provided a main function in simulate.py
that
allows you to run simulations from the command line. If you run
simulate.py
with just the name of a configuration file, it will
print a summary of the simulation results using that precinct with one
voting machine. For example:
$ python3 simulate.py data/precinct-0.json
PRECINCT 'Downtown'
- 5 voters voted
- Polls closed at 60.0 and last voter departed at 43.56
- Avg wait time: 0.59
You can use the --print-voters
options to, instead, print
all the voters:
$ python3 simulate.py data/precinct-0.json --print-voters
PRECINCT 'Downtown'
Arrival Time Voting Duration Start Time Departure Time
6.78 2.11 6.78 8.89
7.01 5.13 8.89 14.02
25.27 2.68 25.27 27.94
26.89 1.10 27.94 29.04
33.48 10.08 33.48 43.56
We have also provided some automated tests that will test your
implementation of the simulate
method. You can run these tests
like this:
$ py.test -xvk day
Task 4: Finding the average waiting time of a precinct¶
At this point, we have the ability to simulate precincts under a variety of parameters. An interesting parameter to tweak is the number of voting booths, since it has a direct effect on the average waiting time of the voters.
In this task, you will implement a function that, given a single precinct and a number of booths, computes the average waiting time when simulating that precinct with that number of booths:
def find_avg_wait_time(precinct, num_booths, ntrials, initial_seed=0):
Where:
precinct
is a precinct configuration dictionarynum_booths
is the number of booths to use in the simulation. Notice that this will override whatever value is specified in theprecinct
dictionary. However do not modify thenum_booths
field of theprecinct
dictionary.ntrials
is a number of trials (this is explained below)initial_seed
is an initial seed (also explained below)
Given a single simulation of a precinct, computing the average waiting time
of the voters is not difficult. However, as with any simulation,
it is unreasonable to accept the result of a single simulation (or “trial”).
To avoid having the quirks of a specific set of voters mislead us, we will run through
this process a specified number of times (ntrials
times) and return the median
average wait time across the different trials.
You will need to use a different random seed for each trial.
Start with the specified initial_seed
and add 1 for each new
trial.
Finding the median of a set of trial runs¶
In this task, you will need to run a series of trials,
collect the results from the trials, and then report the median
result. One easy way to find the median is to store the results in a
list (let’s call it lst
), sort the list using the built-in
sorted
function (sorted(lst)
), and then return the middle
element of the sorted list (sorted(lst)[len(lst) // 2]
). Our test
code for this task assumes you use this approach to compute the
median.
Testing¶
To test your implementation of the function, you can call the function from IPython. For example:
In [4]: import util, simulate
In [5]: p, seed = util.load_precinct("data/precinct-3.json")
In [6]: p
Out[6]:
{'name': 'Unknown',
'hours_open': 13,
'num_voters': 500,
'num_booths': 5,
'voting_duration_rate': 0.1,
'arrival_rate': 0.7,
'large_precinct': True}
In [7]: seed
Out[7]: 1438021247
In [8]: simulate.find_avg_wait_time(p, num_booths=1, ntrials=20, initial_seed=seed)
Out[8]: 2203.0502079782727
In [9]: simulate.find_avg_wait_time(p, num_booths=8, ntrials=20, initial_seed=seed)
Out[9]: 4.630351652014112
You can also run the automated tests for this function like this:
$ py.test -xvk avg
Task 5: Finding the optimal number of booths¶
Ideally, we would like to have enough booths on hand to allow everyone to vote without having to wait too long in line.
At this point, we have all the necessary pieces to answer the following question: “Given a target waiting time \(W\), how many booths does the precinct need to ensure that the average waiting time is less that \(W\)?”.
Your task is to implement the following function to answer this question:
def find_number_of_booths(precinct, target_wait_time, max_num_booths, ntrials, seed=0):
Where:
precinct
is a precinct configuration dictionarytarget_wait_time
is \(W\) as defined abovemax_num_booths
is the maximum number of booths the precinct is willing to haventrials
is the number of trials to run when finding the average waiting time of a precinctseed
is a random seed
Your function will do a simple linear search: you will first simulate the precinct
with one booth, and check whether the average waiting time of the voters
(as produced by find_avg_wait_time
from Task 4)
is strictly below target_wait_time
. If it isn’t, you will simulate the precinct
with two booths, and check the average waiting time, and so on up to max_num_booths
or until you
find a number of booths that achieves an average waiting time less than
target_wait_time
. Take into account that it could also be the case that the provided
target_wait_time
is infeasible with a number of booths less than or
equal to max_num_booths
.
So, the result of this search with be a tuple with two values: the optimal number of booths,
and the average waiting time with that number of booths. If the provided
target waiting time is infeasible, return (0, None)
.
Testing¶
To test your implementation of the function, you can call the function from IPython. For example:
In [22]: import util, simulate
In [23]: p, seed = util.load_precinct("data/precinct-3.json")
In [27]: simulate.find_number_of_booths(p, ntrials=20, target_wait_time=10, max_num_booths=10, seed=seed)
Out[27]: (8, 4.630351652014112)
In [28]: simulate.find_number_of_booths(p, ntrials=20, target_wait_time=10, max_num_booths=5, seed=seed)
Out[28]: (0, None)
You can also run the function from the command-line by using the --target-wait-time
and --max-num-booths
options:
$ python3 simulate.py data/precinct-3.json --max-num-booths 10 --target-wait-time 10
Precinct 'Unknown' can achieve average waiting time of 4.63 with 8 booths
$ python3 simulate.py data/precinct-3.json --max-num-booths 5 --target-wait-time 10
The target wait time (10.00) is infeasible in precinct 'Sahara Square' with 5 or less booths
Note: The command-line tool sets the value of ntrials
to 20
.
And, finally, you can run the automated tests for this function like this:
$ py.test -xvk find
Grading¶
The assignment will be graded according to the Rubric for Programming Assignments. Make sure to read that page carefully; the remainder of this section explains aspects of the grading that are specific to this assignment.
In particular, your completeness score will be determined solely on the basis of the automated tests, which provide a measure of how many of the tasks you have completed:
Grade |
Percent tests passed |
---|---|
Exemplary |
at least 95% |
Satisfactory |
at least 80% |
Needs Improvement |
at least 50% |
Ungradable |
less than 50% |
For example, if your implementation passes 85% of the tests, you will earn an S (satisfactory) score for completeness.
The code quality score will be based on the general criteria in the Rubric for Programming Assignments but, in this programming assignment, we will also be paying special attention to the following:
Appropriate design for the
Voter
class Did you design one or more methods that are apporpriate for updating voters once they vote?Appropriate use of the
VotingBooths
class Does your implementation leverage theVotingBooths
class to manage the voters properly or did you recreate some of the functionality of this class in your own code?Appropriate use of helper functions Did you design a good method for creating voters? Did you write well-designed helper methods for the parameter sweeps?
While these are the main things we care about in this assignment, please remember that it is not possible for us to give you an exhaustive list of every single thing that could affect your code quality score (and that thinking in those terms is generally counterproductive to learning how to program; see our How to Learn in this Class page for more details).
In general, avoiding all of the above issues will increase the likelihood of getting an E; if your code has a few (but not most) of the above issues, that will likely result in an S; if your code suffers from most of those issues, it will likely get an N. That said, to reiterate, we could also find other issues in your code that we did not anticipate, but that end up affecting your code quality score. When that happens, we encourage you to see these as opportunities to improve your code in future assignments (or as specific things to change if you decide to resubmit the assignment).
And don’t forget that style also matters! You could avoid all of the above issues, and still not get an E because of style issues in your code. Make sure to review the general guidelines in the Rubric for Programming Assignments, as well as our Style Guide.
Cleaning up¶
Before you submit your final solution, you should, remove
any
print
statements that you added for debugging purposes andall in-line comments of the form: “YOUR CODE HERE” and “REPLACE …”
Also, check your code against the style guide. Did you use good variable names? Do you have any lines that are too long, etc.
Make sure you have included header comments, that is, the triple-quote strings that describe the purpose, inputs, and return values of each function, for every function you have written.
As you clean up, you should periodically save your file and run your code through the tests to make sure that you have not broken it in the process.
Submission¶
You will be submitting your work through Gradescope (linked from our Canvas site). The process will be the same as with previous coursework: Gradescope will fetch your files directly from your PA #4 repository on GitHub, so it is important that you remember to commit and push your work! You should also get into the habit of making partial submissions as you make progress on the assignment; remember that you’re allowed to make as many submissions as you want before the deadline.
If you are working in a pair, only one of you has to make a submission
To submit your work, go to the “Gradescope” section on our Canvas site. Then, click on “Programming Assignment #4”.
Under “Repository”, make sure to select your
uchicago-CAPP30121-aut-2022/pa4-$TEAM.git
repository. Under “Branch”, just
select “main”.
Finally, click on “Upload”. An autograder will run, and will report back a score. Please note that this autograder runs the exact same tests (and the exact same grading script) described in Testing Your Code. If there is a discrepancy between the tests when you run them on your computer, and when you submit your code to Gradescope, please let us know.
If you are working in a pair, Gradescope will initially create an individual submission. You will need to explicitly add your partner to your submission, so that your work is associated with them too.