Unix Systems Programming: Lab 2 - Awk programming and regular expressions

Due:          Monday, January 21, 2013, 5:00 pm

Purpose and Rationale

The purpose of this lab is to allow students to become comfortable with basic Awk programming and regular expression parsing.

Resources

FAQ (submission instructions and other useful resources)

Please review the lecture 2 notes and read Effective awk Programming, 3rd ed., by Arnold Robbins, available online.

If you are not on our course email list, please subscribe to the cspp51081 email list here: http://mailman.cs.uchicago.edu/mailman/listinfo/cspp51081

All work should be done on a machine in the department's Linux cluster. You can refer to ssh for more information on how to log into a remote machine.

Marks Distribution

Awk script file          10 points
Command line argument     2 points
Output                    4 points
TOTAL                    16 points

LAB 2

Consider the file normal.precip.txt. It contains the average amount of rainfall over a 30-year period for about 275 cities in North America. The first line of the file is a header, which tells you what each column contains. Each line contains a city name, the state the city is in, the average rainfall amounts for January through December, and then an annual average for all months. The file is TAB delimited.

QUESTION: What is the total average amount of rainfall, in inches, for the month of January, for the following states: California (CA), Texas (TX), Alaska (AK)? I want individual totalized averages for each of the above states for the month of January.

As a check on your algorithm, note that the totalized average rainfall for California (CA) for the month of February over the same period was 2.65067. This was derived by adding the rainfall values together for all cities in California for the month of February and taking their average. You may find inconsistencies in the data format that you will need to deal with intelligently.
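The check described above (add up one month's values for every city in a state, then divide by the number of cities) can be sketched as follows. This is only an illustration on a made-up sample, not the real normal.precip.txt: the column positions (state in field 2, February in field 4) and all values are assumptions, and the real file may need extra cleanup before its numbers can be summed.

```shell
# Hypothetical TAB-delimited sample; the layout and values are assumptions, not the lab data.
sample=$(printf 'CITY\tSTATE\tJAN\tFEB\nAAA\tCA\t1.0\t2.0\nBBB\tCA\t3.0\t4.0\nCCC\tNV\t5.0\t6.0\n')

# Average field 4 (FEB in this sample) over rows whose state field is exactly "CA".
feb_avg=$(printf '%s\n' "$sample" | awk -F'\t' '
    NR == 1    { next }                      # skip the header line
    $2 == "CA" { sum += $4; n++ }            # accumulate the total and the row count
    END        { if (n > 0) print sum / n }  # avoid dividing by zero when nothing matched
')
echo "$feb_avg"
```

Comparing the state field for equality, rather than searching the whole line for the letters, keeps rows from other states out of the total.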

Please use Awk to implement the above procedure. There are several steps for this lab:

  1. Figure out the regular expression that will match all the valid chunks of the form described above. Mark has provided a very useful program called showmatch to help with this process; it can be used in the following manner:
    hangao@gawaine:CSPP51081% echo "FOO8AB10BAR" |
    /home/mark/pub/51081/showmatch '[0-9]AB[0-9]'
    FOO8AB10BAR
       ^^^^
    hangao@gawaine:CSPP51081%
    The matched portion of the string is marked with caret characters underneath. This will help you understand how "gawk" works. You may also use the --colour option to egrep to accomplish something similar (except, of course, using egrep, not awk), e.g.:

    $ echo 123 | egrep --colour '1'
    123

  2. If a question is about California, notice that a mere grep of /CA/ will include:
    WINNEMUCCA, NV
    POCATELLO, ID
    Probably NOT what you wanted. The shell script "showmatch" may help you debug in this regard.

  3. Use "gawk" to integrate the patterns and actions we described above.
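The false matches mentioned in step 2 can be reproduced with a small sketch. It uses the two lines quoted above plus one hypothetical California line, and it assumes the "CITY, ST" form shown in those lines; in the real TAB-delimited file the same kind of test would be applied to whichever field holds the state rather than to the whole line.

```shell
# Two lines quoted in the lab text plus one hypothetical California line.
lines=$(printf 'WINNEMUCCA, NV\nPOCATELLO, ID\nSACRAMENTO, CA\n')

# Unanchored: counts every line that contains the letters "CA" anywhere.
loose=$(printf '%s\n' "$lines" | awk '/CA/' | wc -l | tr -d ' ')

# Anchored (an assumption about the layout): "CA" must follow ", " and end the text.
strict=$(printf '%s\n' "$lines" | awk '/, CA$/')

echo "loose matches: $loose"
echo "strict match:  $strict"
```

The unanchored pattern accepts all three lines, while the anchored one keeps only the line whose state code is CA.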

Hints

Take it slowly and complete each portion of the homework before working on the next, as each step depends upon the correct solution to the last. If you think you have the right answer for a step but the test is still failing, mail the list. If you have any questions at all, mail the list.

Deliverables

Carefully follow the steps below.

    1. Create a local lab2 directory and copy the files from the LAB2 directory (http://www.classes.cs.uchicago.edu/archive/2013/winter/51081-1/labs/LAB2) into it.
    2. You should create two files in this directory:
      • ex.awk : The awk script file.
      • ex.rain : The command line argument and the output of your awk script ex.awk
    3. When you are finished with your directory you will create a compressed archive file using tar (this utility stores your directory as a single file, then compresses its size.)
             tar -czvf   username.lab2.tgz   username.lab2

    4. You will email your file to our grader as an attachment. She will send an acknowledgement that your assignment has been received.


    Maria Power
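Before emailing the archive, it can help to confirm that it contains both required files. The sketch below builds a throwaway directory with empty stand-in files, creates the archive as in the deliverables steps, and lists its contents without extracting. "username" is the same placeholder used in the tar command above, and the temporary working directory is an assumption made only so the example is self-contained.

```shell
# "username" is a placeholder; the files here are empty stand-ins for the real deliverables.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
mkdir username.lab2
touch username.lab2/ex.awk username.lab2/ex.rain

# Create the compressed archive (same options as the lab's command, minus -v for quiet output).
tar -czf username.lab2.tgz username.lab2

# List the archive contents without extracting, to confirm both files are inside.
listing=$(tar -tzf username.lab2.tgz)
echo "$listing"
```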