Lab 4 - CMSC 154 Lab Spring 2012

Lab 4 Goals

By the end of this lab, you will be able to debug programs that use signals, multiple processes, and multiple threads. You should feel comfortable with - but wary of - parallel programs.

Walkthrough

In the lab4 directory of your SVN repository, there is a walkthrough directory that contains a simple program with a data race on a global variable. First, compile and run the program to see the results of the data race:

$ cd cs154/lab4/walkthrough
$ gcc -g -o main main.c -lpthread
$ ./main
$ ./main
$ ./main

The output of the walkthrough program is usually close to 10,000, but it varies from run to run. While you can inspect the code or attempt to set breakpoints to find these kinds of issues, there are tools that help to automatically track down threading issues. One of the most popular free tools for program analysis is the Valgrind instrumentation framework. For example, the Valgrind tool Helgrind performs thread error detection, which helps to find data race issues like the one in this code. Valgrind instruments your program as it runs, adding checks that look for errors during execution. To run Valgrind against the walkthrough, use the following command:

$ valgrind --tool=helgrind ./main

Valgrind will run for a minute or so, performing an analysis of the program. It will print out a set of errors about the data race. Valgrind numbers different threads of execution and then reports each of the read or write data races within those threads. In particular, it points out that the reads from and writes to the variable v are unsafe.
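
The walkthrough's main.c is not reproduced here, so the following is only a minimal sketch of how a race on a shared counter is commonly removed: every update goes through a mutex. Only the variable name v comes from the text above. The names v_lock, add_to_v, and run_counter, and the thread and iteration counts, are assumptions made for illustration.

```c
#include <pthread.h>
#include <stddef.h>

#define NUM_THREADS 2      /* hypothetical values, not taken from main.c */
#define INCREMENTS  5000

static long v = 0;                                    /* shared counter */
static pthread_mutex_t v_lock = PTHREAD_MUTEX_INITIALIZER;

/* Each thread adds INCREMENTS to v, holding v_lock for every update. */
static void *add_to_v(void *arg)
{
    (void)arg;
    for (int i = 0; i < INCREMENTS; i++) {
        pthread_mutex_lock(&v_lock);
        v++;               /* the read-modify-write is one critical section */
        pthread_mutex_unlock(&v_lock);
    }
    return NULL;
}

/* Starts the workers, waits for them, and returns the final value of v. */
long run_counter(void)
{
    pthread_t threads[NUM_THREADS];

    v = 0;
    for (int i = 0; i < NUM_THREADS; i++)
        pthread_create(&threads[i], NULL, add_to_v, NULL);
    for (int i = 0; i < NUM_THREADS; i++)
        pthread_join(threads[i], NULL);
    return v;
}
```

With the lock in place, run_counter() should return the same total on every run, and Helgrind should no longer report conflicting accesses to v. Without the lock, two threads can read the same old value of v and both write back the same new value, which loses an increment.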

Exercises

First, please create and add a readme.txt file at the top of your lab4 directory with the names of all folks working together. Use this file to record your answers to questions in the exercises. If you work with several people, it's fine to just commit one readme.txt file and set of changes in one directory for this exercise.

$ cd ~/cs154/lab4
$ emacs readme.txt
$ svn add readme.txt
$ svn ci -m "Readme with member names"

Exercise 1

Unix signals arrive asynchronously, and it's easy to forget to block or unblock signals at the correct times, to set up your signal handler incorrectly, or simply to misunderstand what the system will do in different circumstances. Compile and run the program in the ex1 directory:

$ gcc -g -o main main.c
$ ./main

It should print out "Error: pid N had not been enqueued!" What's happening that causes the program to attempt to dequeue an item before the item was enqueued?

You will have to use your understanding of signals, signal masking, and printf debugging statements to understand what is happening. While gdb is a useful tool for many problems and has support for printing signals as they arrive (through the handle SIGCHLD print command), gdb changes the timing of signal arrival so significantly that you will be unable to use it for this assignment.

Determine what is wrong with the program that causes this erratic behavior, make the change needed to fix it, and commit the fixed version of main.c. You do not need to write anything in readme.txt about this exercise.

HINT: the course text, CS:APP, has information on signal masking in section 8.5.6.

Exercise 2

Semaphores are commonly used to protect access to a limited number of resources. If many threads want to share access to a few items, a semaphore provides an interface for exactly that: it can be acquired or released, and its implementation tracks the number of current owners.
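
To make the acquire/release interface concrete, here is a minimal sketch of a counting semaphore built from a pthreads mutex and condition variable. It is written independently of the lab's code. The type sketch_sem_t and the sketch_sem_* function names are assumptions made for illustration and will not match the names in main.c.

```c
#include <pthread.h>

/* Hypothetical type: count is the number of resources still available. */
typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t  nonzero;   /* signaled when count becomes positive */
    int             count;
} sketch_sem_t;

void sketch_sem_init(sketch_sem_t *s, int initial)
{
    pthread_mutex_init(&s->lock, NULL);
    pthread_cond_init(&s->nonzero, NULL);
    s->count = initial;
}

/* Blocks until a resource is available, then takes it. */
void sketch_sem_acquire(sketch_sem_t *s)
{
    pthread_mutex_lock(&s->lock);
    /* Re-check after every wakeup: pthread_cond_wait may return
     * spuriously, or another thread may have taken the resource first. */
    while (s->count == 0)
        pthread_cond_wait(&s->nonzero, &s->lock);
    s->count--;
    pthread_mutex_unlock(&s->lock);
}

/* Returns a resource and wakes one waiter, if any. */
void sketch_sem_release(sketch_sem_t *s)
{
    pthread_mutex_lock(&s->lock);
    s->count++;
    pthread_cond_signal(&s->nonzero);
    pthread_mutex_unlock(&s->lock);
}

/* Reads the current count under the lock (for inspection only). */
int sketch_sem_value(sketch_sem_t *s)
{
    int value;

    pthread_mutex_lock(&s->lock);
    value = s->count;
    pthread_mutex_unlock(&s->lock);
    return value;
}
```

A caller would initialize the semaphore with the number of resources, for example sketch_sem_init(&s, 2), and bracket each use of a resource with sketch_sem_acquire(&s) and sketch_sem_release(&s). In this sketch the count is only ever read or changed while the mutex is held.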

For this example, we will use a buggy version of a semaphore implemented using the pthreads mutex and condition variable objects. It is your job to find and fix the bug. To build and run the code, use the following commands:

$ gcc -g -o main main.c -lpthread
$ ./main

The code will print out an error message about the invariant that has been violated: a semaphore's count should never be less than zero.

This code assumes that it is only possible to run two copies of the slowIsPrime function at a time, but it splits the work across three threads that call that function. The semaphore is meant to ensure that only two run at once. You should consider using gdb and putting a breakpoint where the error is generated. gdb will stop all of the pthreads when the breakpoint is hit. The info threads command will list all of the pthreads that are still alive, and the thread N command switches the current stack to the thread with number N.

Commit the fixed code to your repository. You do not need to write anything in readme.txt about this exercise.

HINT: there is documentation here on condition variables, which are the element of the program on which you should focus.

HINT2: if you're still stuck, there is further information here.

Exercise 3

A large program often has many different objects, each of which has its own locking requirements. When writing parallel code that uses multiple objects, each with its own associated lock, it's important to watch the order in which those objects are accessed.
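
One common way to avoid lock order problems is to make every thread acquire the locks in the same global order. The sketch below orders two locks by address. It is an illustration only, not the ex3 code: the type guarded_t, the objects odds and evens, and all function names are hypothetical.

```c
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical object that carries its own lock. */
typedef struct {
    pthread_mutex_t lock;
    int             value;
} guarded_t;

static guarded_t odds  = { PTHREAD_MUTEX_INITIALIZER, 500 };
static guarded_t evens = { PTHREAD_MUTEX_INITIALIZER, 500 };

/* Always locks the object at the lower address first, so two threads
 * that name the same pair in opposite orders cannot deadlock. */
static void lock_both(guarded_t *a, guarded_t *b)
{
    if (a == b) {
        pthread_mutex_lock(&a->lock);
    } else if ((uintptr_t)a < (uintptr_t)b) {
        pthread_mutex_lock(&a->lock);
        pthread_mutex_lock(&b->lock);
    } else {
        pthread_mutex_lock(&b->lock);
        pthread_mutex_lock(&a->lock);
    }
}

static void unlock_both(guarded_t *a, guarded_t *b)
{
    pthread_mutex_unlock(&a->lock);
    if (a != b)
        pthread_mutex_unlock(&b->lock);
}

/* Moves n units from one object to the other while holding both locks. */
void move_units(guarded_t *from, guarded_t *to, int n)
{
    lock_both(from, to);
    from->value -= n;
    to->value   += n;
    unlock_both(from, to);
}

static void *odds_to_evens(void *arg)
{
    (void)arg;
    for (int i = 0; i < 500; i++)
        move_units(&odds, &evens, 1);
    return NULL;
}

static void *evens_to_odds(void *arg)
{
    (void)arg;
    for (int i = 0; i < 500; i++)
        move_units(&evens, &odds, 1);
    return NULL;
}

/* Runs two threads that move in opposite directions; returns the total. */
int run_movers(void)
{
    pthread_t t1, t2;

    pthread_create(&t1, NULL, odds_to_evens, NULL);
    pthread_create(&t2, NULL, evens_to_odds, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    return odds.value + evens.value;
}
```

If move_units instead locked from first and to second, one thread could hold the lock on odds while waiting for evens, and the other could hold evens while waiting for odds. Neither could proceed, which is the kind of lock order violation that Helgrind reports.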

In this code, there is a potential deadlock. Compile and then run the program in the ex3 directory:

$ gcc -g -o main main.c -lpthread
$ ./main
$ ./main
$ ./main

It may print out 500 each of odds and evens, but it is equally likely to lock up. If it does, use the CTRL+C key combination to cancel execution.

Run Valgrind with the Helgrind tool on this exercise, as described in the walkthrough section. Valgrind will print out information on lock order violations. Once Valgrind has printed information about Thread #3, you may press CTRL+C to stop execution of the program.

In your readme.txt file, copy the information about the lock order violation and explain in your own words what is happening in the code. Use names of variables in your description. If you have trouble interpreting the output, there is documentation here. You do not need to fix the code.

Bonus Problems

Do not attempt these problems until you have completed all of the normal exercises.