Wesley Pegden
In choosing the area of work for my open source project, I wanted a project which would be challenging, yet reasonable. I wanted to avoid getting stuck in a situation where I was unable to make progress, while at the same time avoiding a project which would be too simple to be worthwhile.
I became interested in GCC due to its wide use and its beginner-friendly attitude towards contribution. Sam pointed me to GCC's Beginner Project page, which featured a list of projects including anything from bug hunting to rewriting functions. I became interested in the ``code cleanliness'' section of the project list, as this is an area where the range of skills required by potential projects can vary greatly. Thus, while I would have the opportunity to try something relatively ambitious, I would also be able to contribute something regardless of the extent of my success.
I began with the goal of removing excess preprocessor ``#if 0'' statements. Looking through the GCC code, it became obvious that a script which automatically searched the GCC source to help identify and remove the #if 0 statements could be very useful. Thus I aimed initially to create a script which would find instances of the preprocessor statements and copy them to a separate text file with the line number of the beginning of each statement; which would be easy to configure to act on all or part of the tree; which would be easily modifiable to search for instances of some other type of code; and which would add convenience specific to GCC that a generic ``maintainer tools'' package wouldn't provide.
Later in the project, as a result of discussion in a Monday meeting, I decided to try to implement a means of getting age information on the code from CVS. This would add value to the program by making it immediately clear to a user of the script which instances of the preprocessor statements could be discarded because of their age alone. It would also satisfy my third goal for the project, making this a script which, while adaptable to most any code base, is more convenient in a way made possible by its specificity to GNU GCC.
The script accepts one argument (and ignores any arguments after the first). That argument is the name of a text file which contains a list of source files for the script to act on. For every file in the list which contains the ``#if 0'' statements, the script creates a file (e.g. source.c.if0s) which contains all of the instances of the statements, complete with the code they nest, the line number the statement begins on, and the date the #if 0 line was last modified.
The script is organized into five functions, laid out as follows:
loop
read a filename from filelist
send it to doit(filename)
repeat
}
doit(filename){
loop
read a line from filename
run endstringer(line) to return the line without initial spaces
if endstringer(line) begins with ``#if 0''
write line number (e.g. linenum) to filename.if0s
call cvsannotate(filename) to create filename.ann
run getdate(linenum, filename.ann) to get line history information
write line history to filename.if0s
copy lines to filename.if0s until #endif is reached
endif
loop
}
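The search step of doit() can be sketched in Python as follows. This is an illustration, not the script itself: the function name `find_if0_blocks` is hypothetical, and the nesting counter is my own assumption about how inner #if/#endif pairs should be handled, since the outline above only says to copy lines until #endif is reached.

```python
def find_if0_blocks(lines):
    """Return (line_number, block_lines) for each outermost '#if 0' block.

    Line numbers are 1-based. Leading whitespace is ignored when testing
    for directives, mirroring the role of endstringer() in the outline.
    """
    blocks = []
    i = 0
    while i < len(lines):
        if lines[i].lstrip().startswith("#if 0"):
            start = i + 1          # 1-based line number of the '#if 0'
            depth = 0              # assumption: track nested #if/#endif pairs
            block = []
            while i < len(lines):
                stripped = lines[i].lstrip()
                block.append(lines[i])
                if stripped.startswith("#if"):      # #if, #ifdef, #ifndef
                    depth += 1
                elif stripped.startswith("#endif"):
                    depth -= 1
                    if depth == 0:                  # matching #endif reached
                        break
                i += 1
            blocks.append((start, block))
        i += 1
    return blocks
```

Directives written with a space after the hash (``# if 0'') are not matched by this sketch.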
cvsannotate(filename){
login to CVS if not already logged in
create the directories for the target filename.ann
run cvs annotate and send output to the file filename.ann
}
getdate(linenum, filename.ann){
read the ``linenum'' numbered line from filename.ann
copy the date information in the line to a string
return the string
}
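A rough Python sketch of getdate() follows. It assumes that the annotate file holds exactly one annotated line per source line, and that each line has the shape ``revision (author date): text'' with a date such as 03-Nov-02. Both are assumptions about the cvs annotate output, and the names `ANNOTATE_RE` and `get_date` are made up for the example.

```python
import re

# Assumed layout of one annotated line:
#   1.12         (jdoe     03-Nov-02): #if 0
ANNOTATE_RE = re.compile(r"^\S+\s+\((\S+)\s+(\d{2}-[A-Za-z]{3}-\d{2})\):")


def get_date(linenum, annotated_lines):
    """Return the date field of the 1-based line `linenum`, or None if it does not parse."""
    match = ANNOTATE_RE.match(annotated_lines[linenum - 1])
    return match.group(2) if match else None
```

Keeping the parsing in one place means a change to the output format of the date only touches this function.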
endstringer(linestring){
count off characters of linestring until the first non-whitespace character
copy the remaining characters to a string
return the string
}
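As an illustration, endstringer() might look like this in Python. The loop follows the outline above step by step; it is a sketch rather than the script's actual code.

```python
def endstringer(line):
    """Return `line` without its leading whitespace."""
    count = 0
    # Count off characters until the first non-whitespace character.
    while count < len(line) and line[count].isspace():
        count += 1
    # Copy the remaining characters to a new string.
    return line[count:]
```

In Python the built-in `line.lstrip()` gives the same result; the explicit loop is kept only to mirror the outline.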
String manipulation is the dominant task facing the script. In reading strings for use as filenames, comparison strings, etc., there were a few things which I had to be able to do. First, I had to be able to remove newline characters from the end of a line of the filelist so that the line could be used as the name of a file. Initially this seemed like a trivial task, though there was an unforeseen quirk which caused some grief.
Originally, I dealt with newline characters like this:
look at the last character:
if it is a newline (`\n') character,
terminate the string before it
look at the second to last character:
if it is a newline (`\n') character,
terminate the string before it
return the string
starting at the first character, count forward to the first `\021' OR `\n'
replace this character with `\0'
return the string
count forward until isspace(str+count) returns no,
change str to str+count
}
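The forward-scanning version of the newline handling (stop at the first terminator instead of inspecting the last characters) can be sketched in Python like this. The function name `cut_at_terminator` is hypothetical, and the default terminator set is copied from the outline above; slicing stands in for writing `\0' into the string.

```python
def cut_at_terminator(line, terminators=("\n", "\021")):
    """Return `line` truncated just before the first terminator character.

    The default terminators are the two characters named in the outline;
    pass a different tuple to cut at other characters.
    """
    for index, char in enumerate(line):
        if char in terminators:
            return line[:index]    # everything before the terminator
    return line                    # no terminator found: keep the whole line
```

Scanning forward avoids any assumption about how many trailing characters a line of the filelist carries.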
A lot of what the script does is run commands to create directories and get cvs information, among other things. While most of the errors I encountered with my commands ended up being string manipulation issues (see above), one particular example in my code stands out as something that was able to accomplish its task simply and without the filename parsing that would have been necessary otherwise.
As GCC contains many source files in subdirectories, there can be many filenames in ``filelist'' which begin with a subdirectory. When I saved cvs annotation information for the file, I wanted to do so in a separate directory which preserved the tree. While I could have parsed out the subdirectories and filename from each line of the ``filelist'' file, I avoided that clutter by taking advantage of the -p flag for mkdir, which creates the parent directories of the target directory if they do not exist.
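If my script was given the filename ``~/subdir/file.c'', then, it would create the directory ``~/subdir'' by first making a directory at the full path, parents included, and then removing the last component:
mkdir -p filename
rm -rf filename
For this example, that is:
mkdir -p ~/subdir/file.c
rm -rf ~/subdir/file.c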
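The same trick can be sketched in Python. This is a minimal illustration that assumes the directory is first created at the full path and the last component is then removed; the function name `make_parent_dirs` is made up, and `os.rmdir` is swapped in for rm -rf because it only ever removes an empty directory.

```python
import os


def make_parent_dirs(target):
    """Create every missing parent directory of `target` without parsing the path."""
    # Like `mkdir -p target`: creates target and all of its parents.
    os.makedirs(target, exist_ok=True)
    # Remove only the leaf directory again, leaving the parents in place.
    # Unlike `rm -rf`, this fails if the leaf is not an empty directory.
    os.rmdir(target)
```

After the call, a file can be opened for writing at `target` directly. The sketch raises an error if `target` already exists as a regular file, a case the original commands would handle differently.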
Writing the script was an educational experience in and of itself. Because I was writing in a setting where I was able and encouraged to accept suggestions from as many directions as possible, there was great value in keeping the program logically organized, clearly understandable, and easy to explain. It was easier to modify how I removed initial spaces in a string, for example, when that task was its own function. Similarly, if one wanted to change the output format for the date of last change of a line of code, the fact that getdate() is its own function makes that task much simpler.
In weekly meetings where we discussed possible improvements to the script, the learning experience was two-fold. Firstly, I heard good suggestions for modifications to the program. Secondly, and more importantly, I think, I realized that, assuming my script is used by people trying to find old useless code in GCC, there are going to be better ways of doing things that they will undoubtedly come up with and want to implement. When showing my program to the class on Mondays, for example, I noticed it was easiest to get suggestions for improvements to sections which were logically organized and well presented. The meetings, then, served as a reminder that my code should be easily communicable, structured in a way that is conducive to being understood and modified.
In my interactions with the GNU GCC Maintainers, I noticed some interesting things about the structure of one particular microcosm of the open-source community. First, it was immediately obvious that the GNU GCC people were eager for useful contributions. They welcomed my offer to upload the script to the ``contrib'' directory of the GCC tree. They also, however, have in place a hierarchy which allows for sufficient review of changes to give GNU GCC a unified direction of development, even as it accepts contributions from vastly different individuals. Still in the process of completing the appropriate paperwork to submit my changes, I notice the balance that must be struck between organization and centralization on the one hand and contribution from a wide range of sources on the other.
While it has been very gratifying to write a script for GCC which can be used as a tool to assist with development of the compiler, it is even more so to have contributed something which can become a part of the open source community. In reality, the development process for my GCC script has (hopefully) just begun, as the script can now be adapted over time to more specific (or more general) tasks, or simply be improved to work better. A strength of open source, it is clear, is that the positive aspects of my work are what define my contribution, as its shortcomings can and will be overcome by future developers who take part in the evolution of the code.
The translation was initiated by Wesley Pegden on 2002-12-12