Homework #5

Due: Thursday, December 2nd at 11:59pm

Graduating students: even if you have more than one extension left, you will be able to use only one, because all grades must be turned in by Friday, December 3rd at 11:59pm.

In this homework assignment you will analyze the performance of your image processing system from Project #2.

Getting started

For each assignment, a Git repository will be created for you on GitHub. However, before that repository can be created for you, you need to have a GitHub account. If you do not yet have one, you can get an account here: https://github.com/join.

To actually get your private repository, you will need this invitation URL:

When you click on an invitation URL, you will have to complete the following steps:

  1. You will need to select your CNetID from a list. This lets us know which student is associated with each GitHub account. This step is only done for the very first invitation you accept.

  2. You must click “Accept this assignment” or your repository will not actually be created.

  3. After accepting the assignment, GitHub may take a few minutes to create your repository. You should receive an email from GitHub when your repository is ready. Normally, it’s ready within seconds and you can just refresh the page.

  4. You now need to clone your repository (i.e., download it to your machine).
    • Make sure you’ve set up SSH access on your GitHub account.

    • For each repository, you will need to get the SSH URL of the repository. To get this URL, log into GitHub and navigate to your project repository (take into account that you will have a different repository per project). Then, click on the green “Code” button, and make sure the “SSH” tab is selected. Your repository URL should look something like this: git@github.com:mpcs52060-aut/hw5-GITHUB-USERNAME.git.

    • If you do not know how to use git clone to clone your repository, follow the guide that GitHub provides: Cloning a Repository.

If you run into any issues, or need us to make any manual adjustments to your registration, please let us know via Ed Discussion.

Part 1: Repository Configuration

Before getting started on the performance analysis, we first need to configure the homework 5 directory to include your project 2 code. Follow the below steps:

  1. Notice that your homework repository includes a hw5/proj2 directory. We are still working with the project 2 module so we need to keep the same structure. Also, notice there is a new folder inside the hw5/proj2 directory called benchmark.

  2. Inside the hw5/proj2 directory, copy over all the directories from your proj2 repository that are needed to make your editor run correctly.

  3. Place the data directory inside your hw5/proj2 directory. If you are working on a CS Linux machine, you can run the following commands to download and unzip the data directory:

    $ cd hw5/proj2
    $ wget https://www.dropbox.com/s/cwse3i736ejcxpe/data.zip
    $ unzip data.zip
    

Make sure to delete data.zip afterward so it does not take up your disk space.

Precompiled Project 2

If your Project #2 does not pass all the required tests, you can use the precompiled solution. Please note that the precompiled program runs only on a CS Linux machine, since it was compiled for that architecture. You cannot use it on your local machine, so you must log in to a CS machine to work on this homework assignment. After performing the above three steps, you will need to do the following:

  1. The precompiled solution is already inside the editor directory. It’s named editor. You just need to change its permissions as follows:

    $ cd editor
    $ chmod 775 editor
    
  2. To test that the precompiled version works, run it using the following commands:

    $ cd editor
    $ ./editor small bsp 2
    

    4.27

Part 2: Performance Measurements and Speedup Graphs

We will run timing measurements on both the sequential and parallel versions of the editor.go program. The data directory will be used to measure the performance of the parallel versions against the sequential version. For this assignment, we will keep things simple and only measure single data directories: small, mixture, and big. The measurements gathered will be used to create speedup graphs (similar to project 1). Each speedup graph is based on a single parallel version (e.g., pipeline), and each line in it represents one data directory. The set of threads will be {2,4,6,8,12} and will remain the same for all speedup graphs. Here is the breakdown for producing the speedup graphs:

  1. You will have a total of 2 speedup graphs for parallel versions: bsp, and pipeline.

  2. Each line in the graph represents a data directory size (i.e., small, mixture, and big) that you will run for each thread number in the set of threads (i.e., {2,4,6,8,12}).

    Similar to homework #4, you must run each point on a line 5 times in a row. For example, running the small line for bsp with threads == 2:

    $ go run proj2/editor small bsp 2
    4.33
    $ go run proj2/editor small bsp 2
    4.35
    $ go run proj2/editor small bsp 2
    4.27
    $ go run proj2/editor small bsp 2
    4.30
    $ go run proj2/editor small bsp 2
    4.29
    

    and use the average time (4.31) in the speedup calculation, which again is

    \[Speedup = \frac{\text{wall-clock time of serial execution}}{\text{wall-clock time of parallel execution}}\]

    Here’s my bsp speedup graph that I ran on my local machine:

    [Figure: bsp speedup graph (speedup-bsp.png)]

    Your graph may look vastly different from mine, and that’s okay! You may or may not see speedups for all lines, and the speedups may vary from thread count to thread count. Your lines may just look odd, and that’s okay too. You will analyze these graphs in the next part.

  3. The names for each graph file will be the name of the parallel versions (i.e., speedup-bsp.png and speedup-pipeline.png)

  4. For each speedup graph, the y-axis will list the speedup measurement and the x-axis will list the number of threads, similar to the example graph shown above. Make sure to title the graph and label each axis. Also adjust your y-axis range so that we can accurately read the values. That is, if most of your values fall within [0,1], then don’t make your speedup range [0,14].

  5. You must write a script that produces both graphs on the debug partition of the Peanut cluster. Use the original benchmark-proj1.sh file as your template, but name the slurm file for homework 5 benchmark-proj2.sh, and keep the configuration settings the same except for choosing the debug partition.

  6. All your work for this section must be placed in the benchmark directory along with the generated speedup graphs.
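The averaging, speedup, and axis-range steps above can be sketched in Python. This is only one possible shape for the graphing half of a benchmark script, not a required design: the function names (`speedup`, `speedup_line`, `y_limits`, `plot_speedup`) are hypothetical, and the plotting function assumes matplotlib is available on the machine where the script runs.

```python
from statistics import mean

THREADS = [2, 4, 6, 8, 12]  # thread counts required by this assignment


def speedup(sequential_times, parallel_times):
    """Return mean sequential time divided by mean parallel time."""
    return mean(sequential_times) / mean(parallel_times)


def speedup_line(sequential_times, parallel_times_by_thread):
    """Return one graph line: a speedup value for each thread count in THREADS."""
    return [speedup(sequential_times, parallel_times_by_thread[t]) for t in THREADS]


def y_limits(lines, padding=0.1):
    """Pick a y-axis range that hugs the data instead of a fixed wide range."""
    values = [v for line in lines.values() for v in line]
    span = (max(values) - min(values)) or 1.0  # avoid a zero-height axis
    return max(0.0, min(values) - padding * span), max(values) + padding * span


def plot_speedup(version, lines, path):
    """Draw one speedup graph; `lines` maps a data directory name to its speedups."""
    import matplotlib  # assumption: matplotlib is installed
    matplotlib.use("Agg")  # render to a file; cluster nodes have no display
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    for name, values in lines.items():
        ax.plot(THREADS, values, marker="o", label=name)
    ax.set_title(f"Speedup: {version}")
    ax.set_xlabel("Number of threads")
    ax.set_ylabel("Speedup")
    ax.set_xticks(THREADS)
    ax.set_ylim(*y_limits(lines))
    ax.legend()
    fig.savefig(path)
    plt.close(fig)
```

For instance, `round(mean([4.33, 4.35, 4.27, 4.30, 4.29]), 2)` gives the 4.31 average used in the bsp example, and a call such as `plot_speedup("bsp", lines, "speedup-bsp.png")` would write a file with the required name once `lines` holds a `speedup_line` result for each of small, mixture, and big.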

Note

You do not have to use the elapsed time provided by the benchmark program. You can instead use time or, if you are using Python, some other mechanism such as timeit. However, you must be consistent with your choice of timing mechanism. This means you cannot use the elapsed time from the benchmark program for one sample run and then a different timing mechanism for other sample runs. This is not a stable timing environment, so you must stick with the same mechanism for producing all graphs.
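One way to stay consistent is to route every sample, sequential and parallel, through a single timing helper. The sketch below is a minimal illustration, assuming you choose to time runs externally from Python with `time.perf_counter`. The helper names `time_command` and `parallel_command` are hypothetical. The parallel command line follows the invocation shown in Part 2, and the sequential invocation is left to you because it depends on your project 2 code.

```python
import subprocess
import sys
import time


def time_command(cmd, runs=5):
    """Run `cmd` `runs` times in a row and return each wall-clock time in seconds."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        # check=True raises if the program fails, so a crash is not recorded as a timing
        subprocess.run(cmd, check=True, stdout=subprocess.DEVNULL)
        samples.append(time.perf_counter() - start)
    return samples


def parallel_command(data_dir, version, threads):
    """Build the parallel invocation shown in Part 2 of this handout."""
    return ["go", "run", "proj2/editor", data_dir, version, str(threads)]


if __name__ == "__main__":
    # Stand-in command so the sketch runs anywhere; swap in
    # parallel_command("small", "bsp", 2) on a machine with Go and the data directory.
    samples = time_command([sys.executable, "-c", "pass"], runs=3)
    print(len(samples))
```

Keep in mind that timing `go run` from the outside may also include build time, which the program's own elapsed-time output would not; this is one more reason not to mix the two mechanisms.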

Part 3: Performance Analysis

Please submit a report (pdf document, text file, etc.) summarizing your results from the experiments and the conclusions you draw from them. Your report should also include the graphs as specified above and an analysis of the graphs. That is, somebody should be able to read the report alone and understand what code you developed, what experiments you ran and how the data supports the conclusions you draw. The report must also include the following:

  • A brief description of the project. A paragraph or two recap will suffice.

  • Instructions on how to run your testing script. We should be able to just say sbatch benchmark-proj2.sh; however, if we need to do another step then please let us know in the report.

  • As stated previously, you need to explain the results of your graphs. Based on your implementation, why are you getting those results? Answer the following questions:
    • What are the hotspots and bottlenecks in your sequential program?

    • Which parallel implementation is performing better? Why do you think it is?

    • Does the problem size (i.e., the data size) affect performance?

    • The Go runtime scheduler uses an N:M scheduler. However, how would the performance measurements be different if it used a 1:1 or N:1 scheduler?

    • If you are using the precompiled version, although you cannot see the code, you can still answer the above questions since the implementation structure is the same for everyone.

  • Based on the topics we discussed in class, identify the areas in your implementation that could hypothetically see increases in performance (if any). Explain why you would see those increases.

  • Do not just restate what the graph is showing. For example,

    “We can see that for the bsp implementation there is no speedup when using the small directory and threads is equal to 2.”

    Yes, that’s obvious from looking at the graph, but make sure you analyze why it is happening. Do not just state the obvious, because we can see the graph. We want you to go deeper and try to explain why there’s no speedup.

  • If you are using the precompiled solution because your project #2 does not work then you must answer these additional questions:

    • Why do you think your solution does not work properly? What could potentially be the problem if given more time?

    • Based on your solution, do you expect your implementation’s results to be similar to those produced by the precompiled program?

Grading

For this assignment, there are no automated tests. You will be graded solely on the analysis in your report and on the script that produces your speedup graphs (i.e., whether it actually produces the graphs):

  • Speedup Graphs & Testing script: 50%

  • Performance Analysis Writeup: 50%

Submission

Before submitting, make sure you’ve added, committed, and pushed all your code to GitHub. You must submit your final work through Gradescope (linked from our Canvas site) on the “Homework #5” assignment page in one of two ways:

  1. Uploading from GitHub directly (recommended way): You can link your GitHub account to your Gradescope account and upload the correct repository for the homework assignment. When you submit your homework, a pop-up window will appear. Click on “Github” and then “Connect to Github” to connect your GitHub account to Gradescope. Once you connect (you will only need to do this once), you can select the repository you wish to upload and the branch (which should always be “main” or “master” for this course).

  2. Uploading via a Zip file: You can also upload a zip file of the homework directory. Please make sure you upload the entire directory and keep the initial structure the same as the starter code; otherwise, you run the risk of not passing the automated tests.

A few other notes:

  • You are allowed to make as many submissions as you want before the deadline.

  • Please make sure you have read and understood our Late Submission Policy.