Homework #5
Due: Thursday, December 2nd at 11:59pm
Graduating students: if you have extensions left, you will only be able to use one, because all grades must be turned in by Friday, December 3rd at 11:59pm.
In this homework assignment you will analyze the performance of your image processing system from Project #2.
Getting started
For each assignment, a Git repository will be created for you on GitHub. However, before that repository can be created for you, you need to have a GitHub account. If you do not yet have one, you can get an account here: https://github.com/join.
To actually get your private repository, you will need this invitation URL:
When you click on an invitation URL, you will have to complete the following steps:
You will need to select your CNetID from a list. This will allow us to know what student is associated with each GitHub account. This step is only done for the very first invitation you accept.
You must click “Accept this assignment” or your repository will not actually be created.
After accepting the assignment, GitHub will take a few minutes to create your repository. You should receive an email from GitHub when your repository is ready. Normally, it’s ready within seconds and you can just refresh the page.
You now need to clone your repository (i.e., download it to your machine).
Make sure you’ve set up SSH access on your GitHub account.
For each repository, you will need to get the SSH URL of the repository. To get this URL, log into GitHub and navigate to your project repository (take into account that you will have a different repository per project). Then, click on the green “Code” button, and make sure the “SSH” tab is selected. Your repository URL should look something like this: git@github.com:mpcs52060-aut/hw5-GITHUB-USERNAME.git.
If you do not know how to use git clone to clone your repository, then follow this guide that GitHub provides: Cloning a Repository
If you run into any issues, or need us to make any manual adjustments to your registration, please let us know via Ed Discussion.
Part 1: Repository Configuration
Before getting started on the performance analysis, we first need to configure the homework 5 directory to include your project 2 code. Follow the steps below:
Notice that your homework repository includes a hw5/proj2 directory. We are still working with the project 2 module, so we need to keep the same structure. Also, notice there is a new folder inside the hw5/proj2 directory called benchmark.
Inside the hw5/proj2 directory, copy over all the directories from your proj2 repository that are needed to make your editor run correctly.
Place the data directory inside your hw5/proj2 directory. If you are running this on a Linux CS machine, then you can perform the following commands to grab the data directory and unzip it:
$ cd hw5/proj2
$ wget https://www.dropbox.com/s/cwse3i736ejcxpe/data.zip
$ unzip data
Make sure to delete data.zip so you’re not taking up disk space.
Precompiled Project 2
If your project #2 does not pass all the required tests, then you can use the precompiled solution on a CS Linux machine. Please note that you must run the program only on a CS Linux machine since it was compiled on that architecture. You cannot use this program on your local machine. You must log in to a CS machine to work on this homework assignment. After performing the above three steps, you will need to do the following:
The precompiled solution is already inside the editor directory. It’s named editor. You will just need to change the permissions as such:
$ cd editor
$ chmod 775 editor
To test that the precompiled version works, run it using the following commands:
$ cd editor
$ ./editor small bsp 2
4.27
Part 2: Performance Measurements and Speedup Graphs
We will run timing measurements on both the sequential and parallel versions of the editor.go program. The data directory will be used to measure the performance of the parallel versions versus the sequential version. For this assignment, we will keep things simple and only look at measuring single data directories: small, mixture, and big. The measurements gathered will be used to create speedup graphs (similar to project 1). Each speedup graph is based around a single parallel version (e.g., pipeline) where each line represents running a specific data directory. The set of threads will be {2,4,6,8,12} and will remain the same for all speedup graphs. Here is the breakdown for producing the speedup graphs:
You will have a total of 2 speedup graphs, one for each parallel version: bsp and pipeline.
Each line in the graph represents a data directory size (i.e., small, mixture, and big) that you will run for each thread number in the set of threads (i.e., {2,4,6,8,12}).
Similar to homework #4, you must run each line execution 5 times in a row. For example, running the small line for bsp with threads == 2:
$ go run proj2/editor small bsp 2
4.33
$ go run proj2/editor small bsp 2
4.35
$ go run proj2/editor small bsp 2
4.27
$ go run proj2/editor small bsp 2
4.30
$ go run proj2/editor small bsp 2
4.29
and use the average time (4.31) for the speedup calculation, which again is
\[\text{Speedup} = \frac{\text{wall-clock time of serial execution}}{\text{wall-clock time of parallel execution}}\]
Here’s my bsp speedup graph that I ran on my local machine:
Your graph may look vastly different from mine and that’s okay! You may or may not have speedups for all lines, and the speedups may vary from thread to thread. Your lines may just look odd and that’s all okay. You will analyze these graphs in the next part.
The name of each graph file will be based on the name of the parallel version (i.e., speedup-bsp.png and speedup-pipeline.png).
For each speedup graph, the y-axis will list the speedup measurement and the x-axis will list the number of threads, similar to the example graph. Make sure to title the graph and label each axis. Make sure to adjust your y-axis range so that we can accurately see the values. That is, if most of your values fall within the range [0,1], then don’t make your speedup range [0,14].
You must write a script that produces both graphs on the debug Peanut cluster. Use the original benchmark-proj1.sh file as your template, but name the actual Slurm file for homework 5 benchmark-proj2.sh, and keep the configuration settings the same except for choosing the debug partition.
All your work for this section must be placed in the benchmark directory along with the generated speedup graphs.
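The averaging and speedup arithmetic described above can be sketched in Go. This is only a minimal illustration, not part of the starter code: the function names mean and speedup are hypothetical, and the sequential time of 8.62 seconds is a made-up placeholder rather than a real measurement. Only the five parallel timings come from the bsp example above.

```go
package main

import "fmt"

// mean returns the arithmetic average of the timing samples (in seconds).
// It returns 0 for an empty slice to avoid dividing by zero.
func mean(samples []float64) float64 {
	if len(samples) == 0 {
		return 0
	}
	sum := 0.0
	for _, s := range samples {
		sum += s
	}
	return sum / float64(len(samples))
}

// speedup divides the sequential wall-clock time by the parallel wall-clock time.
func speedup(sequential, parallel float64) float64 {
	return sequential / parallel
}

func main() {
	// The five bsp timings for the small directory with 2 threads (from the example above).
	parallelRuns := []float64{4.33, 4.35, 4.27, 4.30, 4.29}
	avgParallel := mean(parallelRuns)

	// Placeholder sequential average: an assumption for illustration only.
	avgSequential := 8.62

	fmt.Printf("average parallel time: %.2f\n", avgParallel)
	fmt.Printf("speedup: %.2f\n", speedup(avgSequential, avgParallel))
}
```

A speedup above 1 means the parallel version finished faster than the sequential one; a value below 1 means the parallel overhead outweighed the gains for that data directory and thread count.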
Note
You do not have to use the elapsed time provided by the benchmark program. You can still use time or, if you are using Python, some other mechanism such as timeit. You must be consistent with your choice of a timing mechanism. This means you cannot use the elapsed time from the benchmark program for one sample run and then another timing mechanism for other sample runs. This is not a stable timing environment, so you must stick with the same mechanism for producing all graphs.
Part 3: Performance Analysis
Please submit a report (pdf document, text file, etc.) summarizing your results from the experiments and the conclusions you draw from them. Your report should also include the graphs as specified above and an analysis of the graphs. That is, somebody should be able to read the report alone and understand what code you developed, what experiments you ran and how the data supports the conclusions you draw. The report must also include the following:
A brief description of the project. A paragraph or two recap will suffice.
Instructions on how to run your testing script. We should be able to just say sbatch benchmark-proj2.sh; however, if we need to do another step, then please let us know in the report.
As stated previously, you need to explain the results of your graphs. Based on your implementation, why are you getting those results? Answer the following questions:
What are the hotspots and bottlenecks in your sequential program?
Which parallel implementation is performing better? Why do you think it is?
Does the problem size (i.e., the data size) affect performance?
The Go runtime uses an N:M scheduler. How would the performance measurements be different if it used a 1:1 or N:1 scheduler?
If you are using the precompiled version, although you cannot see the code, you can still answer the above questions since the implementation structure is the same for everyone.
Based on the topics we discussed in class, identify the areas in your implementation that could hypothetically see increases in performance (if any). Explain why you would see those increases.
Do not just restate what the graph is showing. For example,
“We can see that for the bsp implementation there is no speedup when using the small directory and threads is equal to 2.”
Yes, that’s obvious from looking at the graph, but make sure you analyze why that is happening. Do not just state the obvious, because we can see the graph. We want you to go deeper and try to explain why there’s no speedup.
If you are using the precompiled solution because your project #2 does not work then you must answer these additional questions:
Why do you think your solution does not work properly? What could potentially be the problem if given more time?
Based on your solution, do you expect your implementation’s results to be similar to those produced by the precompiled program?
Grading
For this assignment, there are no automated tests. You will be graded solely on the analysis in your report and on the script that produces your speedup graphs (i.e., does it actually produce the graphs):
Speedup Graphs & Testing script: 50%
Performance Analysis Writeup: 50%
Submission
Before submitting, make sure you’ve added, committed, and pushed all your code to GitHub. You must submit your final work through Gradescope (linked from our Canvas site) in the “Homework #5” assignment page in one of two ways:
Uploading from GitHub directly (recommended way): You can link your GitHub account to your Gradescope account and upload the correct repository based on the homework assignment. When you submit your homework, a pop-up window will appear. Click on “Github” and then “Connect to Github” to connect your GitHub account to Gradescope. Once you connect (you will only need to do this once), you can select the repository you wish to upload and the branch (which should always be “main” or “master”) for this course.
Uploading via a Zip file: You can also upload a zip file of the homework directory. Please make sure you upload the entire directory and keep the initial structure the same as the starter code; otherwise, you run the risk of not passing the automated tests.
A few other notes:
You are allowed to make as many submissions as you want before the deadline.
Please make sure you have read and understood our Late Submission Policy.