Unix Systems Programming: Lab 2 - Awk programming and regular expressions
Due: Monday, January 21, 2013, 5:00 pm
Purpose and Rationale
The purpose of this lab is to help students become comfortable with basic Awk programming and regular expression parsing.
Resources
FAQ (submission instructions and other useful resources)
Please review the lecture 2 notes and read Effective awk Programming, 3rd ed., by Arnold Robbins, available online.
If you are not in our course email list, please subscribe
to the cspp51081 email list here: http://mailman.cs.uchicago.edu/mailman/listinfo/cspp51081
All work should be done on a machine in the department's Linux cluster. You can refer to ssh for more information on how to log into a remote machine.
Marks Distribution
Awk script file       | 10 points
Command line argument | 2 points
Output                | 4 points
TOTAL                 | 16 points
LAB 2
Consider the file normal.precip.txt. It contains the average amount of rainfall over a 30-year period for about 275 cities in North America. The first line of the file is a header, which tells you what each column contains. Each following line contains a city name, the state the city is in, the average rainfall amounts for January through December, and finally an annual average over all months. The file is TAB delimited.
QUESTION: What is the average amount of rainfall, in inches, for the month of January, for each of the following states: California (CA), Texas (TX), and Alaska (AK)? I want a separate average for each of these states, taken over all of that state's cities.
As a check on your algorithm, note that the average rainfall for California (CA) for the month of February over the same period was 2.65067. This was derived by adding together the February rainfall values for all cities in California and dividing by the number of those cities. You may find inconsistencies in the data format, which you will need to deal with intelligently.
Please use Awk to implement the above procedure.
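The check described above (add up one state's values for a month, then divide by the number of cities) can be sketched in awk roughly as follows. This is only an illustration under stated assumptions: the sample rows, city names, and values are invented, and the column positions (state in field 2, February in field 4) are guesses about the layout rather than facts about normal.precip.txt. The numeric test is just one possible way to skip malformed fields.

```shell
# Invented sample in an ASSUMED layout: CITY<TAB>STATE<TAB>JAN<TAB>FEB
# None of these rows or values come from normal.precip.txt.
sample=$(printf 'CITY\tSTATE\tJAN\tFEB\nALPHA\tCA\t1.0\t2.0\nBETA\tCA\t3.0\t4.0\nGAMMA\tTX\t5.0\t6.0\n')

# col selects the column to average; 4 is FEB in the assumed layout.
averages=$(printf '%s\n' "$sample" | awk -F '\t' -v col=4 '
    NR == 1 { next }                    # skip the header line
    $col ~ /^[0-9]*\.?[0-9]+$/ {        # use only fields that look like plain numbers
        sum[$2] += $col                 # running total per state
        n[$2]++                         # number of cities counted per state
    }
    END {
        # Report only the states of interest, in a fixed order.
        count = split("CA TX AK", want, " ")
        for (i = 1; i <= count; i++)
            if (n[want[i]] > 0)
                printf "%s %.2f\n", want[i], sum[want[i]] / n[want[i]]
    }')

printf '%s\n' "$averages"
```

On the invented sample this prints an average for CA and TX only, since no AK row is present. Running the same idea against the real file for February and comparing with 2.65067 is one way to confirm that the column index and the state test are right before switching to January.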
There are several steps for this lab:
- Figure out the regular expression that will match all the valid chunks of the form described above. Mark has provided a very useful program to help with this process, called showmatch, which can be used in the following manner:
hangao@gawaine:CSPP51081% echo "FOO8AB10BAR" |
/home/mark/pub/51081/showmatch '[0-9]AB[0-9]'
FOO8AB10BAR
   ^^^^
hangao@gawaine:CSPP51081%
The matched portion of the string is marked with caret characters underneath. This will help you understand how "gawk" matches. You may also use the --colour option of egrep to accomplish something similar (using egrep rather than awk, of course), e.g.:
$ echo 123 | egrep --colour '1'
123
- If a question is about California, notice that a mere grep of /CA/ will include:
WINNEMUCCA, NV
POCATELLO, ID
This is probably NOT what you wanted. The shell script "showmatch" may help you debug in this regard.
- Use "gawk" to integrate the patterns and actions we described above.
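The difference between matching anywhere on the line and matching only the state column can be sketched as follows. This is an illustration, not the required solution: it assumes the state abbreviation is the second TAB-separated field, EXAMPLEVILLE is a made-up city, and the rainfall figures are invented.

```shell
# Three sample lines, TAB-separated: city, state, one invented rainfall value.
lines=$(printf 'WINNEMUCCA\tNV\t0.8\nPOCATELLO\tID\t1.1\nEXAMPLEVILLE\tCA\t2.0\n')

# A bare /CA/ pattern is tested against the whole line, so the "CA" inside
# WINNEMUCCA and POCATELLO matches as well.
loose=$(printf '%s\n' "$lines" | awk -F '\t' '/CA/ { print $1 }')

# Comparing the state field exactly keeps only the California line.
strict=$(printf '%s\n' "$lines" | awk -F '\t' '$2 == "CA" { print $1 }')

# The same restriction written as an anchored regular expression on field 2.
anchored=$(printf '%s\n' "$lines" | awk -F '\t' '$2 ~ /^CA$/ { print $1 }')

printf 'whole-line /CA/:\n%s\n' "$loose"
printf 'field 2 == "CA":\n%s\n' "$strict"
printf 'field 2 ~ /^CA$/:\n%s\n' "$anchored"
```

The whole-line pattern reports all three cities, while both field-based tests report only the CA row. Which form is more robust on the real file depends on how consistently the state column is formatted.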
Hints
Take it slowly and complete each portion of the homework before working on the next, as each step depends upon the correct solution to the last. If you think you have the right answer for a step but the test is still failing, mail the list. If you have any questions at all, mail the list.
Deliverables
Carefully follow the steps below.
- Create a local lab2 directory and copy into it the files from the LAB2 directory (http://www.classes.cs.uchicago.edu/archive/2013/winter/51081-1/labs/LAB2)
- You should create two files in this directory:
- ex.awk : The awk script file.
- ex.rain : The command line argument and the output of your awk script ex.awk
- When you are finished with your directory, you will create a compressed archive file using tar (this utility stores your directory as a single file, then compresses it):
tar -czvf username.lab2.tgz username.lab2
- You will email your file to our grader as an attachment. She will send an acknowledgement that your assignment has been received.
Maria Power
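Before emailing the archive, it may be worth listing its contents to confirm that both files made it in. The sketch below builds a stand-in directory in a scratch location purely for demonstration; "username" is a placeholder, the two files are empty stubs, and the -t (list) flag of tar is used for the check.

```shell
# Work in a scratch directory so no real files are touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Stand-in submission directory with empty placeholder files.
mkdir username.lab2
: > username.lab2/ex.awk
: > username.lab2/ex.rain

# c = create, z = gzip-compress, f = archive file name (v, verbose, omitted here).
tar -czf username.lab2.tgz username.lab2

# t = list the archive contents without extracting anything.
listing=$(tar -tzf username.lab2.tgz)
printf '%s\n' "$listing"
```

If ex.awk or ex.rain is missing from the listing, the archive was probably created from the wrong directory.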