Polling places

Due: Friday, Nov 4th, 4pm

You may work alone or in a pair on this assignment.

../../_images/voting_booths.png

Every two years, many people around the country wait in long lines to vote due, in part, to inadequate provisioning of voting booths. In this assignment you will write code to simulate the flow of voters through precincts. The information generated by your simulator could be used by an election official as a crude lower bound to help determine the number of voting booths needed at a precinct.

We will view this problem as an instance of the more general problem of simulating an \(M\)/\(M\)/\(N\) queue, where:

  • the first \(M\) indicates that arrivals (voters arriving at the polls) obey a Markov process;
  • the second \(M\) indicates that the service times obey a Markov process (the amount of time a voter takes to complete a ballot); and finally
  • the \(N\) indicates that there are \(N\) servers (the number of voting booths).

We will tell you how to structure your implementation at a high level, but you will be responsible for the design details.

Getting started

See these start-up instructions if you intend to work alone.

See these start-up instructions if you intend to work with the same partner as in a previous assignment.

See these start-up instructions if you intend to work in a NEW pair.

The pa4 directory includes a file named simulate.py, which you will modify, a file named util.py that contains a few useful functions, and a directory named data, which contains sample data. See the file README.txt in pa4/data for an explanation of the contents of the data directory.

Please put all of your code for this assignment in simulate.py. Do not add extra files for the required classes.

Precinct Configurations

The configurations for a precinct, except the number of voting booths, is specified in a JSON file that contains the:

  • number of voters assigned to the precinct (num_voters),
  • number of hours the polls are open (hours_open),
  • voting duration rate (voting_duration_rate),
  • arrival rate (arrival_rate), and a
  • seed for the random number generator (seed).

The number of voting booths (number_of_booths) will be supplied as a command-line argument.

We have provided a function, setup_config in util.py that takes the name of a configuration file and a value for the number of voting booths and returns a dictionary containing the configuration parameters. The key for each value is shown in parenthesis next to its description above.

In addition to being called to set up the simulation in the main function of simulate.py, this function will also be useful when testing your code.

Here is an example use of this function:

In [1]: import util

In [2]: util.setup_config("data/config-0.json", 3)
Out[2]:
{'arrival_rate': 0.11,
 'hours_open': 1,
 'num_voters': 5,
 'number_of_booths': 3,
 'seed': 1468604453,
 'voting_duration_rate': 0.1}

Representing Voters

Your implementation must contain a class for representing a voter. This data structure must include the time the voter arrives at the polls, voting duration, that is, the amount of time the voter takes to vote, and the time the voter is assigned to a voting booth (aka, the start time). We found it convenient for debugging purposes to store a unique identifier and the time the voter leaves the voting booth (aka, the departure time) as well.

The start time for a voter, which is not available at construction time, should be initialized to None when the voter object is constructed. This value should be reset to the correct value when the voter is assigned to a voting booth. Similarly, voters’ departure times cannot be determined until we know their start times.

Your implementation must include at a minimum: a constructor and properties named arrival_time, voting_duration, and start_time for the corresponding values. Our utility function for printing voters, described below, relies on the existence of these properties.

This class is straightforward. It merely serves as a way to encapsulate information about a voter. Our implementation is roughly 30 lines, including the code for generating unique identifiers for the voters and for handling the optional attributes.

Please note, the voter sample class, described next, is the only class that is allowed to call the voter constructor.

Generating a voter sample

Your implementation must include a class for generating voter samples using a Poisson process. From Wikipedia: a Poisson process, named after the French mathematician Simeon-Denis Poisson (1781-1840), is the stochastic process in which events occur continuously and independently of one another. We can use the following fact to generate a voter sample: if the number of arrivals in a given time interval [0,t] follows the Poisson distribution, with mean \(t*\lambda\), then the gap between arrivals times follows the exponential distribution with rate parameter \(\lambda\).

You may not generate the voters en masse when you construct a voter sample. Instead, you must generate voters one at a time as requests are received from the client of the sample.

To generate a voter, you will need to generate an arrival time and a voting duration. To generate the arrival time of the next voter, you will draw an inter-arrival time (\(gap\)) from the exponential distribution using the specified arrival rate as \(\lambda\) and add it to the current time (\(t\)). You’ll then move time forward by setting \(t\) to \(t+gap\). You will draw the voter’s voting duration by drawing a value from the exponential distribution using the specified voting duration rate as \(\lambda\).

For testing purposes, it is important that you draw the gap and the voting duration in the same order as our code. To reduce the likelihood that the testing process fails simply because the values were drawn in the wrong order, we have provided a function, gen_voter_parameters, in util.py that takes the arrival rate and the voting duration rate as arguments and returns a pair of floats to use for the gap and voting duration (in that order).

We have one detail left to describe: the stopping conditions. Your simulator should generate new voters until (1) you have generated the number of the voters specified for the precinct or (2) you generate a voter who arrives after closing time. In the second case, the late voter should be discarded and no further voters should be generated. We will refer to voters who arrive before the polls close as valid.

Here is a more formal statement of the stopping conditions: if \(M\) is the number of voters who arrive before the polls close and \(N\) is the number of voters in the precinct, then your implementation should generate exactly:

  • \(N\) voters if \(M=N\)
  • \(M+1\) voters if \(M<N\)

Our implementation has a constructor, a method named has_next that returns a boolean to indicate whether there are valid voters left to be generated in the sample, and a method named next that returns the next voter in the sample. To simplify debugging our next method fails on an assertion if it is called after the last valid voter has been generated. You can use this organization or another as you prefer as long as your implementation generates the expected voters on demand (that is, as needed).

Printing a voter sample

We have provided a function named print_voters in util.py that takes a list of voter instances and an optional file name. This function will either print a representation of the voters to screen or to a file, depending on whether a filename is specified.

Testing

The code for generating the voter sample is not too long, but can be tricky to get right. We strongly urge you to test your voter sample class carefully before you move on to the next step.

When we were working on our code, we wrote a short function (8 lines) for testing sample creation. This function takes the name of a configuration file and creates a sample, pulls the voters one by one from it and constructs a list, and then calls util.print_voters with the list. You may want to consider writing a similar function to aid in your testing.

To help with testing your voter sample class, we have provided two sets of sample parameters. The first set, (see data/config-0.json), includes the following:

  • number of voters in the sample: 5
  • hours open: 1
  • arrival rate: 0.11 (corresponds to mean time between voters of approximately 9 minutes)
  • voting duration rate: 0.1 (corresponds to a mean voting duration of 10 minutes)
  • seed for random number generator: 1468604453

and corresponds to a sample in which all the voters arrive before the polls close.

Here is the result of calling our sample testing function with data/config-0.json.

Arrival Time   Voting Duration   Start Time    Departure Time
      6.78         2.11             None            None

      7.01         5.13             None            None

     25.27         2.68             None            None

     26.89         1.10             None            None

     33.48        10.08             None            None

The second set, data/config-1.json, is the same except the arrival rate is set to 0.04, which corresponds to a mean gap of 25 minutes. In this sample, only two of the voters arrive before the polls close. Here is the result of calling our sample testing function with data/config-1.json.

Arrival Time   Voting Duration   Start Time    Departure Time
     18.64         2.11             None            None

     19.28         5.13             None            None

Representing the Voting Booths

Cities are broken up into voting precincts, each of which has a number of voting booths. We’ll be simulating election day for a single precinct.

Your implementation must include a class for representing a precinct with a specified number of voting booths. At a minimum, your implementation must provide a means for adding voters to voting booths and removing voters from voting booths in increasing order of departure time.

You will need a data structure to keep track of the voters who are currently occupying voting booths and to determine which voter will depart next. Your implementation should use the PriorityQueue class the from the python queue library for this purpose. A minimum priority queue is a data structure that includes operations for (1) adding items along with their priorities to the queue, (2) removing the item with the lowest priority from the queue, and (3) determining whether the queue is empty. The PriorityQueue library also allows you to specify an upper bound on the number of items allowed to be in the queue at the same time and provides a method for determining whether the queue is full. Please look through the queue library for details on how to use the this class.

For this assignment, the priorities will be the voters’ departure times.

Simulate election day

Finally, you must complete the function simulate_election_day in simulate.py that simulates the flow of voters through a precinct on election day. Your function should take a precinct configuration and return a list of the valid voters, updated with their assigned starting times, in the order they arrived at the precinct

In lecture, we will describe a program that simulates a single server, such as a toll booth, that serves a queue of clients, such as cars, that arrive with a specified distribution and have a fixed service time. You need to simulate multiple servers (voting booths) and variable service times (voting durations), but the structure of your simulation will be similar.

There are three main differences. First, instead of generating arrival times for the clients directly in the simulation code, you will construct a voter sample and pull voters out of the sample as needed. Second, you will use the precinct data structure to determine whether a voting booth is available and if not, when a booth will become available. And third, you will need to limit the number of booths in use to be no more than the number of booths assigned to the precinct.

Your implementation of the election day simulator must use the voter sample and precinct data structures. It must not call the voter constructor or access the precinct priority queue directly.

Example

Let’s walk through an example. The file data/config-2.json contains the following configuration:

  • num_voters: 10
  • hours_open: 1
  • arrival_rate: 0.16666666666666666
  • voting_duration_rate: 0.1
  • seed: 1438018944

The precinct has 10 voters who arrive at the precinct once every 6 minutes on average and take 10 minutes on average to vote.

Let’s consider what happens when we simulate this precinct using two voting booths. The first call to util.gen_voter_parameters generates (3.29, 5.39) as the gap and voting duration, which means that our first voter arrives at 3.29, enters a voting booth immediately at time 3.29, and finishes voting at 8.69 (3.29 + 5.39). (All times have been rounded to two digits for clarity.)

The second call to util.gen_voter_parameters yields (9.10, 5.74), which means that the second voter arrives at time 12.39 (3.29 + 9.10). She starts voting immediately, because a booth is open. She finishes at time 18.13 (12.39 + 5.74).

The third voter arrives at time 12.68 (gap: 0.29) and has a voting duration of 24.45 minutes. Since the first voter departed at 8.69, this voter can start voting immediately. He finishes voting at 37.13.

The fourth voter arrives at 15.78 (gap: 3.10) and has a voting duration of 6.08 minutes. Both booths are occupied when she arrives, so this voter must wait until a booth becomes free, which happens when the second voter finishes at time 18.13. Her start time will be 18.13 and her finish time will be 24.21. Notice that this voter finishes before the third voter.

The fifth voter arrives at time 19.76 (gap: 3.99) and will not start voting until the fourth voter finishes at time 24.21. Notice that the third voter is still voting when the fifth voter starts!

And so on.

All ten voters arrive before the polls close and so all ten get to vote even though the last two will not start voting until after the polls have closed at time 60.

Here’s the output of our simulation:

Arrival Time   Voting Duration   Start Time    Departure Time
      3.29         5.39             3.29              8.69

     12.39         5.74            12.39             18.13

     12.68        24.45            12.68             37.13

     15.78         6.08            18.13             24.21

     19.76        39.64            24.21             63.85

     20.52         5.79            37.13             42.92

     22.33        16.65            42.92             59.57

     24.23         3.07            59.57             62.64

     26.75         5.24            62.64             67.88

     29.04        15.43            63.85             79.28

Testing

We have provided four sample configurations along with the output of running our implementation on each configuration with one voting booth. We have also included the example above (see data/config-2-2-voters.txt). These test cases should be used merely a starting point; our grading code uses these with configurations with different numbers of voting booths along with some precinct configurations that describe corner cases.

Our function, util.print_voters, takes a filename as an optional parameter. If you print your voters to a file using this function can compare your results with ours using the Linux diff command. For example, if you simulated data/config-0.json with one voting booth and saved the resulting voters in a file named config-0-1-myresults.txt, you could compare your results with ours with the command:

diff config-0-1-myresults.txt data/config-0-1-voters.txt

Please note that diff is silent when the two files are the same. It will generate output that attempts to identify the differences when the files are different. This output can be a bit cryptic. You may want to try including the -y flag or -u flag to get output from diff in an alternative format that might be find easier to understand.

Putting it all together

We have provided a main function that parses the command-line arguments (configuration file name and number of voting booths), calls simulate_election_day, and prints the resulting voters to standard out.

Please note that if you run simulate.py in ipython3 without any arguments, it will generate a usage error message:

In [9]: run simulate-skeleton.py
usage: python3 simulate-skeleton.py <configuration filename> [number of voting booths]

You can safely ignore this message.

If your code does not generate any debugging output or sends its debugging output to standard error (include file=sys.stderr as an argument in your calls to print), then you can compare your results to ours using a Linux pipe and diff. For example,:

python3 simulate.py data/config-0.json | diff - data/config-0-1-voters.txt

Using dash (-) as an argument to diff indicates that the data coming from the pipe should be used in place of data from a file.

Submission

See these submission instructions if you are working alone.

See these submission instructions if you are working in the same pair as for PA #2 or PA #3.

See these submission instructions if you are working in a NEW pair.