CS 154, Autumn 2019: Lab 3
Introduction
By the end of this lab, you should have a feel for debugging C programs with valgrind. You will get to know valgrind's error messages in controlled circumstances, so that you can better recognize and interpret them in your own code later. You will also learn about valgrind's blind spots: it misses bugs involving arrays allocated in certain ways. Knowing this, you can decide how best to create and use arrays in your Project 3 code if you want valgrind to be able to check them.
The run-time analysis that valgrind performs makes it a powerful complement to debugging with gdb (and printf). Experience from previous years suggests that many of the problems students have with Project 3 can be found and resolved with valgrind's help. If your Project 3 code crashes and you ask the course staff for help, they may first ask, "What does valgrind say about why it crashed?" In addition, memory leaks (allocating but never freeing data on the heap) are a classic symptom of undisciplined coding, and valgrind can detect leaks easily.
In this lab, we are using valgrind's "Memcheck" tool (the default tool), rather than the "Lackey" tool that Project 3 uses internally to generate its memory traces. Refer to the Valgrind manual as needed.
When you say "valgrind", make sure it rhymes with "shall grinned", not "shall grind".
Do this lab on a CSIL Linux machine. Although we are not grading the lab, we can tell from your repository whether you have completed it, and if you later have basic problems with memory use that valgrind could have resolved, we may ask you to complete it.
Basics
It is easy to analyze a program with valgrind: just put "valgrind " before the program's normal command-line invocation. To see an example, first cd into your CNET-cs154-aut-19 checkout (the one that contains hw1, hw2, p1bitmanip, etc.). Then:
$ svn update
$ /usr/bin/time du -sh .
$ /usr/bin/time du -sh .
The svn update ensures that you have the lab3 directory you'll need for this lab. Running du twice shows the speed-up from another kind of caching that we haven't yet studied: the filesystem cache. Now try:
$ /usr/bin/time valgrind du -sh .
Note:

The lines printed by valgrind are prefixed by something like ==12345== (the number varies), where 12345 is the process ID of the program whose execution is being analyzed.
Even with no errors, valgrind prints things at the beginning and end of the program, ending with "ERROR SUMMARY".
Many commonly used programs (like du) still have memory in use at exit: memory that was malloc()ed but never free()d before the program ended. You can see this in the "HEAP SUMMARY ... in use at exit:" lines. Having your own programs free all dynamically-allocated memory before exit is good housekeeping, but you may find that library functions like printf don't always do the same.
Programs run slower under valgrind, sometimes more than 10 times slower. Valgrind's ability to find sneaky memory bugs makes this worth it, though.
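As an illustration of that housekeeping, here is a minimal C99 sketch in which every malloc()ed block has a matching free(), including on the error path. The helpers copy_words and free_words are hypothetical (they are not part of vgme.c or the course code). A program that pairs each copy_words call with a free_words call should have none of its own blocks "in use at exit", although library code may still hold memory of its own.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: free words[0..n-1], then the array itself.
   Safe to call with words == NULL. */
void free_words(char **words, size_t n) {
    if (words == NULL)
        return;
    for (size_t i = 0; i < n; i++)
        free(words[i]);            /* free(NULL) is a harmless no-op */
    free(words);
}

/* Hypothetical helper: return a heap-allocated deep copy of n strings,
   or NULL if any allocation fails (leaving nothing allocated). */
char **copy_words(const char *const *words, size_t n) {
    assert(words != NULL || n == 0);
    /* ask for at least one slot so that n == 0 is not mistaken for failure */
    char **copy = calloc(n ? n : 1, sizeof *copy);
    if (copy == NULL)
        return NULL;
    for (size_t i = 0; i < n; i++) {
        size_t len = strlen(words[i]) + 1;   /* include the '\0' */
        copy[i] = malloc(len);
        if (copy[i] == NULL) {
            free_words(copy, i);             /* undo the partial copy */
            return NULL;
        }
        memcpy(copy[i], words[i], len);
    }
    return copy;
}
```

For example, main could copy a few strings, use them, and call free_words(copy, n) before returning; running that program under valgrind is a quick way to confirm that every allocation was paired with a free.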

In the lab3 subdirectory of your CNET-cs154-aut-19 checkout, create the readme.txt file in which you will put your answers:
$ cd lab3
$ whoami > readme.txt
$ svn add readme.txt
$ svn commit -m "for lab3 answers" readme.txt
Note: your answers in readme.txt can be as terse as you like.
Build, run, and valgrind the vgme program that you will be modifying and re-building in this lab:
$ make
$ ./vgme
$ valgrind ./vgme
The last command should show an example of what valgrind prints when there are zero problems. Note that the CFLAGS definition in this Makefile is the same as in your p3cache. Note also that even though the vgme program is full of obvious bugs, "gcc -g -Wall -Werror" sees no problems. Valgrind finds the problems because it does not just execute the program's code: it instruments the code (supplementing the instructions with additional bookkeeping and debugging information), and then executes the instrumented code on a simulated (software) CPU.
All of the following exercises are very short and simple. This lab will be easy to finish; use the rest of the time to start working on Project 3. Like the other projects, it will take sustained effort and creativity, not a flurry of last-minute hacking.
Exercise 0
Not all developers use valgrind as diligently as you will. Try (from within lab3):
$ valgrind svn update
Record in readme.txt the lines indicating any memory leaks (real and possible). See the manual for information about how memory leaks are reported.
Were memory leaks the only problem you and valgrind found with svn?
Exercise 1
Look at the code in vgme.c for fun0, fun1, and fun2 (find where gxd is set), and run:
$ valgrind ./vgme 0
$ valgrind ./vgme 1
$ valgrind ./vgme 2
Answer in readme.txt:
What problem is valgrind reporting in these three cases?
Try running:
$ valgrind --track-origins=yes ./vgme 0
$ valgrind --track-origins=yes ./vgme 1
$ valgrind --track-origins=yes ./vgme 2
Based on valgrind's output with "--track-origins=yes", which one of the three problems has a different origin from the other two?
Try changing fun0 to:
void fun0() {
  int y;
  printf("%s\n", y > 0 ? "foo" : "bar");
  return;
}
Then run make again and, if that succeeds, run "valgrind ./vgme 0". The code has the same underlying problem, but what is recognizing the problem now? Change fun0 back to the way it was originally before proceeding.
Comparing with fun2, look at the code for fun3 (find where gxs[] is declared) and run "valgrind ./vgme 3". Is any error reported? Have we learned in this class why or why not? In any case, make a mental note of what kind of array declaration prevents valgrind from doing its intended job.
Exercise 2
Look at the code for fun4 and fun5.

In terms of the regions of a process's address space that we have been learning about, what is the difference between the two x[SIZE] arrays?
What is the bug in both functions?
Compare the results of running "valgrind ./vgme 4" and "valgrind ./vgme 5". Why are they so different? Which is more helpful to you during debugging?
Why are there two errors reported for the single offending line of C code in fun5? Make sure you see how the two errors reported for fun5 are different.
Run "valgrind ./vgme 6". How well does valgrind detect the error in this case? Again, make a mental note of what kind of array declaration prevents valgrind from doing its intended job.

Exercise 3
Compare the code for fun5 and fun7.

What is the bug in fun7?
Comparing the results of running "valgrind ./vgme 5" and "valgrind ./vgme 7", what words clue you in to the difference between the bugs in fun5 and fun7?
Hypothetically, if you didn't already know that 4 == sizeof(int), how could you learn that from the valgrind output of either run (combined with reading the code)?

Exercise 4
Look at the code for fun8 and run "valgrind ./vgme 8".
fun8 makes 6 calls to free(), and valgrind prints 5 error messages about (most of) them. Which is the only free() call that didn't generate a valgrind error?
At least four of the 5 broken free() calls can each be identified with one of the process's four memory regions that we talked about in class: which free() calls are they, and which regions are involved?
Try removing -g from the compilation ("CFLAGS = -Wall -Werror -std=c99") and describe what information is missing about where in fun8 the problems arise.
Why was the same() function introduced? Hint: try removing it from one or the other of the two lines:
  free(fun1);
  free(gxs);
and running make.
Exercise 5
Look at code for fun9 and run "valgrind ./vgme 9".

What is the bug? What is the line in the valgrind output that identifies it? (Make sure you've turned -g back on in the Makefile.)
Does "valgrind --leak-check=full ./vgme 9" help isolate exactly where the memory bug is?

Exercise 6
Run ./vgme 10 and note how it crashes. Then look at the code for fun10 and run "valgrind ./vgme 10". fun10 relies on nodeNew, a simple example of a "constructor" or allocator function for a C struct. Answer in your readme.txt:

What is the bug? What is the line in the valgrind output that identifies it?
nodeNew follows a common convention for such constructors: with good input (here, str can be parsed as an int) it returns a pointer to the new struct, but with bad input it returns NULL. Callers of such constructors are responsible for checking that a non-NULL address was returned! Otherwise you will soon get a segfault. If you ask for help saying "My code segfaults", we may ask you to "Use the tool you learned about in Lab 3 to find the bug and fix it".
Fix the code for fun10 by fixing the logic inside the while (strings[si]) loop so that a NULL return from nodeNew is detected and the list is NULL-terminated there. Commit your fix to this code.
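To see this NULL-returning convention in isolation, here is a minimal C99 sketch built on a hypothetical item struct and item_new constructor (neither comes from vgme.c, and this is not the nodeNew code). The constructor returns NULL when strtol cannot parse the whole string as an int, and the caller checks for NULL before dereferencing the result, which is exactly the check whose absence leads to a segfault.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Hypothetical struct for illustration; not the node type in vgme.c. */
typedef struct item {
    int value;
} item;

/* Hypothetical constructor following the same convention: returns a
   new item if str parses completely as a decimal int, otherwise NULL. */
item *item_new(const char *str) {
    if (str == NULL)
        return NULL;
    char *end;
    errno = 0;
    long v = strtol(str, &end, 10);
    if (end == str || *end != '\0' || errno == ERANGE
        || v < INT_MIN || v > INT_MAX)
        return NULL;                 /* bad input: nothing allocated */
    item *it = malloc(sizeof *it);
    if (it == NULL)
        return NULL;                 /* out of memory: also NULL */
    it->value = (int)v;
    return it;
}

/* Caller: sum the ints in strs[0..n-1], stopping at the first string
   that item_new rejects. Returns how many strings were used; the
   total goes in *sum. */
size_t sum_items(const char *const *strs, size_t n, long *sum) {
    assert(sum != NULL);
    *sum = 0;
    for (size_t i = 0; i < n; i++) {
        item *it = item_new(strs[i]);
        if (it == NULL)              /* check BEFORE using it->value */
            return i;
        *sum += it->value;
        free(it);
    }
    return n;
}
```

The important line is the if (it == NULL) test in sum_items; without it, a string such as "oops" would make the loop dereference a NULL pointer.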
Don't forget to write your answers (again, as terse as you'd like) to the questions above into your readme.txt (in the lab3 directory), and svn commit this.

(This lab was created by Gordon Kindlmann for cs154-2014, and updated for 2019)