Homework #4
Due: Thursday, November 11th at 11:59pm
In this homework assignment, you will analyze the performance of your twitter application from Project #1.
Getting started
For each assignment, a Git repository will be created for you on GitHub. However, before that repository can be created for you, you need to have a GitHub account. If you do not yet have one, you can get an account here: https://github.com/join.
To actually get your private repository, you will need this invitation URL:
When you click on an invitation URL, you will have to complete the following steps:
You will need to select your CNetID from a list. This will allow us to know which student is associated with each GitHub account. This step is only done for the very first invitation you accept.
Note
If you are on the waiting list for this course, a repository will not be made for you until you are admitted into the course. In the meantime, I will post the starter code on Ed so you can work on the assignment.
You must click “Accept this assignment” or your repository will not actually be created.
After accepting the assignment, GitHub may take a few minutes to create your repository. You should receive an email from GitHub when your repository is ready. Normally, it’s ready within seconds and you can just refresh the page.
You now need to clone your repository (i.e., download it to your machine).
Make sure you’ve set up SSH access on your GitHub account.
For each repository, you will need to get the SSH URL of the repository. To get this URL, log into GitHub and navigate to your project repository (take into account that you will have a different repository per project). Then, click on the green “Code” button, and make sure the “SSH” tab is selected. Your repository URL should look something like this: git@github.com:mpcs52960-aut21/hw4-GITHUB-USERNAME.git.
If you do not know how to use git clone to clone your repository, then follow this guide that GitHub provides: Cloning a Repository
If you run into any issues, or need us to make any manual adjustments to your registration, please let us know via Ed Discussion.
Part 1: Repository Configuration
In this assignment, we are going to see how the number of threads and the block size affect the performance of your parallel implementation from Project 1. We will compare this to your sequential version by producing speedup graphs. However, we first need to configure the homework 4 directory to include your project 1 code. Follow the steps below:
Notice that your homework repository includes a hw4/proj1 directory. We are still working with the project 1 module, so we need to keep the same structure. Also, notice there is a new folder inside the hw4/proj1 directory called benchmark. You will mostly be working within this directory for this assignment and will examine it further in Part 2.

Inside the hw4/proj1 directory, copy over the following directories from your proj1 repository:

feed
lock
server
twitter

and any additional code/files/directories that are needed to make your twitter client run correctly.
Update your twitter.go program such that if the number of threads and block size are not provided as command-line arguments, then the program defaults to running your sequential version.
Precompiled Project 1
If your project #1 does not pass all the required tests, then you can use the precompiled solution on a CS Linux machine. Please note that you must run the program only on a CS Linux machine since it was compiled on that architecture. You cannot use this program on your local machine. You must log in to a CS machine to work on this homework assignment.
Note
The course website provides documentation for logging in remotely (I would highly recommend using the VSCode way):
If you prefer a visual login mechanism, then Techstaff has also set up a Virtual Desktop that mimics being at a CS machine in CSIL:
After performing the above three steps, you will need to do the following:
Grab an updated version of benchmark.go by going into the benchmark directory and retrieving it here:

$: cd benchmark
$: wget https://classes.cs.uchicago.edu/archive/2021/fall/52060-1/assignments/hw4/benchmark.go
Grab the pre-compiled version of the twitter client and place it into the twitter directory:

$: cd twitter
$: wget https://classes.cs.uchicago.edu/archive/2021/fall/52060-1/assignments/hw4/twitter
$: chmod 775 twitter
In the next part, we will explain how to use the precompiled version.
Now you are ready to start fully benchmarking the performance of your program.
Part 2: Performance Measurement
Inside the hw4/proj1/benchmark directory, you will see a file called benchmark.go. This program copies over all of the test cases you saw from project 1 (i.e., extra-small, small, medium, large, and extra-large). The benchmark program allows you to execute one of these test cases using your sequential or parallel version and outputs the elapsed time for executing that test. Please read over the usage statement for this program to understand how to use it:
Usage: benchmark version testSize threads blockSize
version = (p) - parallel version, (s) - sequential version
testSize = Any of the following commands can be used for the testSize argument
xsmall = Run the extra small test size
small = Run the small test size
medium = Run the medium test size
large = Run the large test size
xlarge = Run the extra large test size
threads (required for p version only) = the number of threads to pass to twitter.go
blockSize (required for p version only) = the block size to pass to twitter.go
Sample Runs
Here’s how to run your sequential version on the extra-small test case:
$: go run benchmark.go s xsmall
0.27
The only output is the execution time (in seconds) to run the extra small test.
Here’s how to run your parallel version on the medium test case with 4 threads and a block size of 2:
$: go run benchmark.go p medium 4 2
0.95
Notice that you only need to specify these arguments after the test command (i.e., medium) when running the parallel version. The additional arguments are the number of threads followed by the block size.
If you are using the precompiled version, then you will need to add the -c flag before the version argument. For example:
$: go run benchmark.go -c s xsmall
0.27
$: go run benchmark.go -c p medium 4 2
0.95
Play around with running the benchmark program before moving on to the next subsection.
Generation of speedup graphs
We will use the benchmark.go program to produce speedup graphs for the different test-cases by varying the number of threads and block-size. Each speedup graph is based around a single test case (e.g., xsmall) where each line represents running that test-case with a specific block size. The set of threads will be {2,4,6,8,12} and will remain the same for all speedup graphs. Here is the breakdown for producing the speedup graphs:
You will have a total of 4 speedup graphs, one for each test-case size: xsmall, small, medium, and large. We are not using xlarge in this assignment.

Each line in a graph represents a block-size that you will run for each thread count in the set of threads (i.e., {2,4,6,8,12}). Specifically, each line represents making the block-size equal to exactly 1, 10%, 25%, and 50% of the total number of requests inside each test-case. Here are the block-size amounts (i.e., the lines you will have) for each graph:

xsmall (Total=20): 1, 2, 5, 10
small (Total=100): 1, 10, 25, 50
medium (Total=10,000): 1, 1000, 2500, 5000
large (Total=25,000): 1, 2500, 6250, 12500
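The block-size lines above follow directly from the stated percentages. As a quick sanity check, this sketch reproduces the table from the totals (the helper name is just for illustration):

```go
package main

import "fmt"

// blockSizes returns the block-size lines for one graph: exactly 1, then
// 10%, 25%, and 50% of the total number of requests in the test case.
func blockSizes(total int) []int {
	return []int{1, total * 10 / 100, total * 25 / 100, total * 50 / 100}
}

func main() {
	cases := []struct {
		name  string
		total int
	}{
		{"xsmall", 20}, {"small", 100}, {"medium", 10000}, {"large", 25000},
	}
	for _, c := range cases {
		// e.g. xsmall [1 2 5 10]
		fmt.Println(c.name, blockSizes(c.total))
	}
}
```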
Similar to homework #3, you must run each line execution 5 times in a row. For example, running the xsmall line for block-size=1 and threads=2:

$ go run benchmark.go p xsmall 2 1
0.30
$ go run benchmark.go p xsmall 2 1
0.27
$ go run benchmark.go p xsmall 2 1
0.27
$ go run benchmark.go p xsmall 2 1
0.26
$ go run benchmark.go p xsmall 2 1
0.27
and use the average time (0.274) for the speedup calculation, which again is

\[Speedup = \frac{\text{wall-clock time of serial execution}}{\text{wall-clock time of parallel execution}}\]

Here’s my extra-small graph that I ran on my local machine:
Your graph may look vastly different from mine, and that’s okay! You may or may not have speedups for all lines, and the speedups may vary from thread to thread. Your lines may just look odd, and that’s all okay. You will analyze these graphs in the next part.
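The averaging and speedup arithmetic above can be sketched as follows. The parallel times are the five xsmall samples shown earlier; the sequential time of 0.41 is purely hypothetical, used only to show where the formula’s numerator comes from:

```go
package main

import "fmt"

// average returns the mean of the sampled wall-clock times.
func average(times []float64) float64 {
	sum := 0.0
	for _, t := range times {
		sum += t
	}
	return sum / float64(len(times))
}

// speedup = wall-clock time of serial execution /
//           wall-clock time of parallel execution
func speedup(serial, parallel float64) float64 {
	return serial / parallel
}

func main() {
	// Five runs of: go run benchmark.go p xsmall 2 1
	parallelTimes := []float64{0.30, 0.27, 0.27, 0.26, 0.27}
	avg := average(parallelTimes) // 0.274, matching the example above
	serial := 0.41                // hypothetical sequential average for xsmall
	fmt.Printf("avg=%.3f speedup=%.2f\n", avg, speedup(serial, avg))
}
```

Each point on a line in your graph is one such speedup value: the averaged sequential time for that test case divided by the averaged parallel time for that (threads, block-size) pair.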
The name for each graph file will be the name of the test case. For example, the extra-small speedup graph will be named xsmall.png.

For each speedup graph, the y-axis will list the speedup measurement and the x-axis will list the number of threads, similar to the graph shown below. Make sure to title the graph and label each axis. Make sure to adjust your y-axis range so that we can accurately see the values. That is, if most of your values fall within a range of [0,1], then don’t make your speedup range [0,14].
You must write a script that produces all four graphs on the debug partition of the Peanut cluster. Use the original benchmark-proj1.sh file as your template and keep the configuration settings the same except for choosing the debug partition.

All your work for this section must be placed in the benchmark directory along with the generated speedup graphs.
Note
You do not have to use the elapsed time provided by the benchmark program. You can still use time or, if you are using Python, some other mechanism such as timeit. You must be consistent with your choice of timing mechanism. This means you cannot use the elapsed time from the benchmark program for one sample run and then another timing mechanism for other sample runs. This is not a stable timing environment, so you must stick with the same mechanism for producing all graphs.
A few additional notes:
Feel free to change the #SBATCH --time=5:00 configuration to a longer time if needed.

Make sure you create the slurm/out directories before submitting your script using sbatch.
Part 3: Performance Analysis
Please submit a report (pdf document, text file, etc.) summarizing your results from the experiments and the conclusions you draw from them. Your report should also include the graphs as specified above and an analysis of the graphs. That is, somebody should be able to read the report alone and understand what code you developed, what experiments you ran and how the data supports the conclusions you draw. The report must also include the following:
A brief description of the project (i.e., an explanation of what you implemented in feed.go, server.go, and twitter.go). A paragraph or two recap will suffice.
Instructions on how to run your testing script. We should be able to just say sbatch benchmark-proj1.sh; however, if we need to do another step, then please let us know in the report.

As stated previously, you need to explain the results of your graphs. Based on your implementation, why are you getting those results? Answer the following questions:
What effect does the block size have on performance? Does changing the block size actually affect the speedup?
Does the problem size (i.e., the test size) along with the block-size affect performance?
Does the hardware have any effect on the performance of the benchmarks?
If you are using the precompiled version, although you cannot see the code, you can still answer the above questions since the implementation structure is the same for everyone.
Based on the topics we discussed in class, identify the areas in your implementation that could hypothetically see increases in performance if you were to use a different synchronization technique or improved queuing techniques. Explain why you would see those increases.
Do not just restate what the graph is showing. For example,
“We can see that for the extra-small case there is no speedup when the block-size is 2 and threads is equal to 2.”
Yes, that’s obvious from looking at the graph, but make sure you analyze why that is happening. Do not just state the obvious because we can see the graph. We want you to go deeper and try to explain, based on your implementation, why there’s no speedup.
If you are using the precompiled solution because your project #1 does not work then you must answer these additional questions:
Why do you think your solution does not work properly? What could potentially be the problem if given more time?
Based on your solution, do you expect your results to be similar to those produced by the pre-compiled program?
Grading
For this assignment, there are no automated tests; you will be graded solely on the analysis in your report and on the script that produces your speedup graphs (i.e., does it actually produce the graphs):
Speedup Graphs & Testing script: 50%
Performance Analysis Writeup: 50%
Submission
Before submitting, make sure you’ve added, committed, and pushed all your code to GitHub. You must submit your final work through Gradescope (linked from our Canvas site) on the “Homework #4” assignment page in one of two ways:
Uploading from GitHub directly (recommended): You can link your GitHub account to your Gradescope account and upload the correct repository based on the homework assignment. When you submit your homework, a pop-up window will appear. Click on “GitHub” and then “Connect to GitHub” to connect your GitHub account to Gradescope. Once you connect (you will only need to do this once), you can select the repository you wish to upload and the branch (which should always be “main” or “master”) for this course.
Uploading via a Zip file: You can also upload a zip file of the homework directory. Please make sure you upload the entire directory and keep the initial structure the same as the starter code; otherwise, you run the risk of not passing the automated tests.
A few other notes:
You are allowed to make as many submissions as you want before the deadline.
Please make sure you have read and understood our Late Submission Policy.