A Beginner's GNU GCC Project

Wesley Pegden

1 The Project

1.1 Project Selection

In choosing the area of work for my open source project, I wanted a project which would be challenging, yet reasonable. I wanted to avoid getting stuck in a situation where I was unable to make progress, while at the same time avoiding a project which would be too simple to be worthwhile.

I became interested in GCC due to its wide use and its beginner-friendly attitude towards contribution. Sam pointed me to GCC's Beginner Project page, which featured a list of projects including anything from bug hunting to rewriting functions. I became interested in the ``code cleanliness'' section of the project list, as this is an area where the range of skills required by potential projects can vary greatly. Thus, while I would have the opportunity to try something relatively ambitious, I would also be able to contribute something regardless of the extent of my success.

1.2 Project Goals

I began with the goal of removing excess preprocessor ``#if 0'' statements. Looking through the GCC code, it became obvious that a script which automatically searched the GCC source code to assist with the identification and removal of the #if 0 statements could be very useful. Thus I aimed initially to create a script which would find instances of the preprocessor statements and copy them to a separate text file along with the line number on which each statement begins; which would be easy to configure to act on all or part of the tree; which would be easily modifiable to search for instances of some other type of code; and which would add convenience specific to GCC which a generic ``maintainer tools'' package wouldn't provide.

Later in the project, as a result of discussion in a Monday meeting, I decided to try to implement a means of getting age information on the code from CVS. This would add value to the program by making it immediately clear to a user of the script which instances of the preprocessor statements could be discarded because of their age alone. It would also satisfy my third goal for the project, making this a script which, while adaptable to most any code base, is more convenient in a way made possible by its specificity to GNU GCC.

2 The Script

2.1 Input/Output

The script accepts one argument (and ignores any arguments after the first). That argument is the name of a text file which contains a list of source files for the script to act on. For every file in the list which contains the ``#if 0'' statements, the script creates a file (e.g. source.c.if0s) which contains all of the instances of the statements, complete with the code they nest, the line number the statement begins on, and the date the #if 0 line was last modified.

2.2 Structure

The script is organized into five functions, laid out as follows:

main(filelist){
    loop
        read a filename from filelist
        send it to doit(filename)
    repeat
}

doit(filename){
    loop
        read a line from filename
        run endstringer(line) to return the line without initial spaces
        if endstringer(line) begins with ``#if 0''
            write line number (e.g. linenum) to filename.if0s
            call cvsannotate(filename) to create filename.ann
            run getdate(linenum, filename.ann) to get line history information
            write line history to filename.if0s
            copy lines to filename.if0s until #endif is reached
        endif
    repeat
}

cvsannotate(filename){
    create the directories for the target filename.ann
    run cvs annotate and send output to the file filename.ann
}

getdate(linenum, filename.ann){
    read the ``linenum'' numbered line from filename.ann
    copy the date information in the line to a string
    return the string
}

endstringer(linestring){
    count off characters of linestring until the first non-whitespace character
    copy the remaining characters to a string
    return the string
}

Thus doit actually finds the #if 0 statements, cvsannotate downloads the history information for the file, and getdate and endstringer consolidate commonly performed tasks into their own functions for the sake of convenience and clarity.
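The scanning step in doit could be sketched in C roughly as follows. This is an illustration under stated assumptions, not the script's actual code: the names begins_with and extract_if0_blocks are hypothetical, lines are assumed to fit in the 4096-byte buffer, and the counting of nested #if directives goes beyond the pseudocode above, which simply copies until an #endif is reached. The CVS lookup is left out.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Return nonzero if line, ignoring leading whitespace, begins with prefix. */
static int begins_with(const char *line, const char *prefix)
{
    while (isspace((unsigned char)*line))
        line++;
    return strncmp(line, prefix, strlen(prefix)) == 0;
}

/* Copy every "#if 0" ... "#endif" block from in to out, each preceded by
 * the number of the line on which it starts. Returns the number of blocks.
 * Assumption: no input line is longer than the buffer, otherwise the line
 * count would drift. Nested #if/#ifdef/#ifndef are counted (an addition to
 * the pseudocode) so that copying stops at the matching #endif. */
int extract_if0_blocks(FILE *in, FILE *out)
{
    char line[4096];
    long linenum = 0;
    int depth = 0;   /* 0 means we are outside any "#if 0" block */
    int blocks = 0;

    assert(in != NULL && out != NULL);
    while (fgets(line, sizeof line, in) != NULL) {
        linenum++;
        if (depth == 0) {
            if (begins_with(line, "#if 0")) {
                fprintf(out, "line %ld:\n", linenum);
                fputs(line, out);
                depth = 1;
                blocks++;
            }
        } else {
            fputs(line, out);
            if (begins_with(line, "#if"))          /* also #ifdef, #ifndef */
                depth++;
            else if (begins_with(line, "#endif"))
                depth--;
        }
    }
    return blocks;
}
```

A caller would open the source file for reading and the corresponding .if0s file for writing, then pass both streams to extract_if0_blocks.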

2.3 Mechanisms and Strategies Used

2.3.1 String Manipulation

String manipulation is the dominant task facing the script. In reading strings for use as filenames, comparison strings, etc., there were a few things which I had to be able to do. First, I had to be able to remove newline characters from the end of a line of the filelist so that the line could be used as the name of the file. Initially this seemed like a trivial task, but an unforeseen quirk caused some grief.

Originally, I dealt with newline characters like this:

given a string,
    look at the last character:
        if it is a newline (`\n') character,
            terminate the string before it
    look at the second to last character:
        if it is a newline (`\n') character,
            terminate the string before it
    return the string

This had two problems, both of which arose from the fact that I was recycling strings. First, there could be stray characters left over after the newline character, which this check would not catch. (Incidentally, the reason I bothered to check two different positions at all was that, in the case of the last line, the string can have a newline character at the end without the `\0' string terminating character, making it possible for the `\n' not to be second to last as you would expect in the rest of the file.) Second, and this problem was quite tricky, I had assumed that there were two ways of writing the newline character, `\n' and `\021'. Much to my surprise, C does not consider `\n' == `\021' to be a true statement. (In ASCII, the octal escape equal to `\n' is `\012'; `\021' is a different control character.) Therefore, when checking only for `\n', I would occasionally miss a character of the second form, breaking the program when it attempted to send the name (with the stray character) as part of a command. Thus, my final code looked like this:

given a string,
    starting at the first character, count forward to the first `\021' OR `\n'
    replace this character with `\0'
    return the string
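The final approach above could be written in C roughly as follows. This is a minimal sketch: the function name chop_line_end is hypothetical, and the pair of characters it stops at is taken directly from the pseudocode.

```c
#include <assert.h>
#include <string.h>

/* Truncate s at the first '\n' or '\021', whichever comes first.
 * Anything after that character, such as stale bytes left in a recycled
 * buffer, is cut off as well. If neither character occurs, s is left
 * unchanged. Returns s for convenience. */
char *chop_line_end(char *s)
{
    assert(s != NULL);
    /* strcspn returns the length of the leading run of characters that are
     * NOT in the reject set, i.e. the index of the first match, or the
     * length of the string when there is no match. */
    s[strcspn(s, "\n\021")] = '\0';
    return s;
}
```

Scanning forward from the start, rather than inspecting the last one or two characters, is what makes this robust against leftover characters in a reused buffer.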

I also had to be able to compare lines to a string, ignoring any initial tabs, spaces, etc. This was pretty straightforward. My function endstringer(*str) looks like this:

endstringer(*str){
    count forward until isspace(str[count]) returns false,
    change str to str+count
}

Thus, this function takes a string pointer and moves it to the first non-whitespace character of the string.
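A minimal C sketch of this idea, together with the ``#if 0'' comparison it supports, might look like the following. The names skip_leading_space and starts_if0 are hypothetical stand-ins rather than the script's own identifiers, and the prefix test is deliberately as simple as the description: it would also accept a line such as ``#if 01'' and would not recognize ``# if 0''.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Return a pointer to the first non-whitespace character of s.
 * The loop stops at the terminating '\0' because isspace('\0') is false. */
const char *skip_leading_space(const char *s)
{
    assert(s != NULL);
    /* The cast avoids undefined behavior when char is signed and the
     * character value is negative. */
    while (isspace((unsigned char)*s))
        s++;
    return s;
}

/* Return nonzero if line, ignoring leading whitespace, begins with "#if 0". */
int starts_if0(const char *line)
{
    return strncmp(skip_leading_space(line), "#if 0", 5) == 0;
}
```

Returning a pointer into the original string avoids copying; the caller must keep the original buffer alive while the result is in use.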

2.3.2 Examples of Command and File Manipulation

A lot of what the script does is run commands to create directories and get cvs information, among other things. While most of the errors I encountered with my commands ended up being string manipulation issues (see above), one particular example in my code stands out as something that was able to accomplish its task simply and without the filename parsing that would have been necessary otherwise.

As GCC contains many source files in subdirectories, there can be many filenames in ``filelist'' which begin with a subdirectory. When I saved cvs annotation information for the file, I wanted to do so in a separate directory which preserved the tree. While I could have parsed out the subdirectories and filename from each line of the ``filelist'' file, I avoided that clutter by taking advantage of the -p flag for mkdir, which creates the parent directories of the target directory if they do not exist.

If my script were given the filename ``~/subdir/file.c'', it would create the directory ``~/subdir'' as follows:

    mkdir -p filename
    rm -rf filename

which is equivalent in this case to...

    mkdir -p ~/subdir/file.c
    rm -rf ~/subdir/file.c

So mkdir -p actually creates a directory named file.c, along with all of its parents, while rm -rf removes that directory, leaving only its parents.
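From a C program, the two commands could be issued along these lines. This is only a sketch under assumptions: build_cmd and make_parent_dirs are hypothetical names, the commands are run through system(), and the path is wrapped in single quotes so that spaces survive. Because of that quoting the shell will not expand a leading ``~'', so the caller would need to pass an already expanded path, and paths containing a single quote are rejected.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Write "<verb> '<path>'" into buf. Returns 0 on success, or -1 if the
 * path contains a single quote or the command does not fit in buf. */
int build_cmd(char *buf, size_t size, const char *verb, const char *path)
{
    int n;

    assert(buf != NULL && verb != NULL && path != NULL);
    if (strchr(path, '\'') != NULL)
        return -1;
    n = snprintf(buf, size, "%s '%s'", verb, path);
    return (n < 0 || (size_t)n >= size) ? -1 : 0;
}

/* Create every parent directory of path: first make path itself as a
 * directory with mkdir -p, then remove that last component again with
 * rm -rf, leaving only its parents. Returns 0 on success, -1 on failure. */
int make_parent_dirs(const char *path)
{
    char cmd[1024];

    if (build_cmd(cmd, sizeof cmd, "mkdir -p", path) != 0 || system(cmd) != 0)
        return -1;
    if (build_cmd(cmd, sizeof cmd, "rm -rf", path) != 0 || system(cmd) != 0)
        return -1;
    return 0;
}
```

Since the directory created in the first step is empty, rmdir would remove it just as well and would be the more cautious choice in the second step.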

3 The Process

3.1 Coding

Writing the script was an educational experience in and of itself. Because I was writing in a setting where I was able and encouraged to accept suggestions from as many directions as possible, there was great value in keeping the program logically organized and clearly understandable and explainable. It was easier to modify how I removed initial spaces in a string, for example, when that task was its own function. Similarly, if one wanted to change the output format for the date of last change of a line of code, the fact that getdate() is its own function makes that task much simpler.

3.2 Communication

In weekly meetings where we discussed possible improvements on the script, the learning experience was two-fold. Firstly, I heard good suggestions for modifications to the program. Secondly, and more importantly, I think, I realized that, assuming my script is used by people trying to find old useless code in GCC, there are going to be better ways of doing things that they will undoubtedly come up with and want to implement. When showing my program to the class on Mondays, for example, I noticed it was easiest to get suggestions for improvements to sections which were logically organized and well presented. The meetings, then, served as a reminder that my code should be easily communicable by being structured in a way that is conducive to being understood and modified.

3.3 Interaction with the GNU GCC Maintainers

In my interactions with the GNU GCC Maintainers, I noticed some interesting things about the structure of one particular microcosm of the open-source community. First, it was immediately obvious that the GNU GCC people were eager for useful contributions. They welcomed my offer to upload the script to the ``contrib'' directory of the GCC tree. They also, however, have in place a hierarchy which allows for sufficient review of changes to give GNU GCC a unified direction of development, even as it accepts contributions from vastly different individuals. As I am still in the process of completing the appropriate paperwork to submit my changes, I have noticed the balance necessary between organization and centralization on the one hand and contribution from a wide range of different sources on the other.

4 The Future

While it has been very gratifying to write a script for GCC which can be used as a tool to assist with development of the compiler, it is even more so to have contributed something which can become a part of the open source community. In reality, the development process for my GCC script has (hopefully) just begun, as the script can now be changed over time to be adapted to more specific (or more general) tasks, or simply be improved to work better. A strength of open source, it is clear, is that the positive aspects of my work are what define my contribution, as its shortcomings can and will be overcome by future developers that take part in the evolution of the code.

Wesley Pegden 2002-12-12