Lecture 3: fun with scripting/intro to cgi
------------------------------------------

What is a scripting language? Why are these things important?

When we say "scripting language," we're usually referring to a programming
language that you don't compile ahead of time into a program you then run
many times, the way you do with C, Pascal, C++, or Java. Instead, an
interpreter reads your program every time it runs, parsing all of the
program text. You may think this is inefficient - and in the old days, when
computers were really kind of wimpy, it was. But scripting languages are
somewhat easier to debug, have rich builtin function sets, and don't need a
compile step (they go straight from the text editor to execution), so they
tend to save the time that matters most: programmer time.

And time is money. Say you write a Perl script in ten minutes that you only
need to run once. It doesn't matter if it then takes an hour to process its
data; the same task might have taken two days to write in C, C++, or Java.

Perl, Python, PHP, awk, and Tcl are all scripting languages.

Scripting language ability is probably the most important skill to have in
web programming. You need to be able to come up with text data quickly and
efficiently, to manipulate and massage files, sort them, throw data in and
around databases - yeah, it's mostly about throwing text around.

Even if your web server never executes your scripts, you'll almost
certainly use them for arranging things behind the scenes.

That's why I'm going to teach these languages outside the context of a
web server first.

So let's get started:
---------------------
By far, Unix has the biggest set of scripting tools available. All
of the most popular languages were developed on Unix, and all of them are
free.

Now, some of you have seen this next part before.

A Unix script's first line contains the full path of the scripting language
interpreter, like this:

#!/usr/bin/perl

The first two characters are always #!, which tell the operating system
that the program is a script (and not some piece of machine code). When it
sees them, the operating system runs whatever program follows the #! and
hands it the script's filename as an argument. The interpreter then reads
the whole script, #!/blah/blah and all; since # starts a comment in most
scripting languages, that first line is simply ignored.

Here's an example, using the Bourne shell (shells also can be used as
scripting languages; in fact, Bourne shell is quite powerful):

#!/bin/sh
echo hi there

Let's say that we called this script foo. If I just made a new file foo
like this [show this on screen] we see that it doesn't work: it says

foo not found

That's because my shell can't find foo anywhere in my PATH [echo $PATH].
For most of you, this wouldn't be a problem, because there's a . (meaning
the current directory) at the end of your PATH, indicating that your shell
should also look inside your current directory for programs. This is kind of
a bad idea - who knows what kind of bad stuff could be in the current
directory?

Instead, I ask for anything in the current directory explicitly by putting a
./ in front of it, like this:

./foo

and my shell promptly reports

./foo: Permission denied

That's because the script doesn't have execute permission. If you look at
any system program, you'll see that it looks like this:

$ ls -l /bin/ls
-rwxr-xr-x   1 root     root        29404 Jul  3  1998 /bin/ls

but foo looks like this:

$ ls -l foo
-rw-r--r--   1 bri      student        12 Feb 22 11:20 foo

You see how foo is missing the so-called execute bits (the "x"s on the
left)? Well, we can change that with this:

$ chmod ugo+x foo

Now foo has those bits, and if I run ./foo I get what we'd expect.

When writing a script, it would be easiest to do

$ touch foo
$ chmod ugo+x foo

before starting to write the thing. But you'll forget. When this happens,
save the file and exit your text editor, then run the chmod command, and
then go about editing it again.

Okay, now that we all know how to write scripts, let's learn Perl real
quick. One of my students last year said that I should start with Python.
Hahaha.

Perl:
-----
Perl stands for Practical Extraction and Report Language. It also stands
for Pathologically Eclectic Rubbish Lister.

It is the Swiss Army Chainsaw of programming languages. It's flexible, with
extension functions to do almost anything imaginable. The chainsaw part
comes with its remarkable power and speed, which are due to a number of
factors, not the least of which is that Larry Wall (the guy who wrote it)
really knows his stuff.

Perl may be fast and powerful, but it isn't always the easiest thing in
the world to read. It depends on the programmer, because there are about a
million ways to do any given task in Perl. Which is the best? "The one that
gets the job done before your boss fires you."

Perl became fairly popular in the Unix community about ten years ago.
Systems administrators were generally the first to pick it up for a number
of tasks, then they showed it to their users, who started to do their own
work with it because they noticed that it was generally a lot faster to get
things done in it. When the web boom hit, those same administrators were
the ones doing all the web site administration, and of course, they took
Perl with them.

Programming Perl (O'Reilly) is the de facto standard book on Perl. Mine
here is the smaller first edition - the new one is blue but
still has a camel on it.

However, once you start to get the hang of it, your most important Perl
documentation source comes in the form of the online manual pages. If you
do a "man perl", you'll see that the documentation is split up into a
number of subsections; for example, you can use "man perlvar" to get the
predefined variable names and meanings.

Trust me on this one: learning to search the manual pages can be one of the
best things you ever do. Use "/blah" to search for the string "blah" and
use "n" to repeat the search.

Where to begin:

Let's start simple with something that just prints a message to the screen.

#!/usr/bin/perl
print "hi there\n";

Okay, that looks pretty easy. Variables start with $, so let's modify that
program a little:

#!/usr/bin/perl
$stuff = "hi there\n";
print $stuff;

You can mix variables in with strings:

#!/usr/bin/perl
$stuff = "hi there\n";
print "I'd just like to say: $stuff";

Actually, there are a few more variable types. We've just seen a scalar
variable. There are also list variables, whose names start with @. But
you access individual list elements with $ and [ ]. Indices start at 0,
so $stuff[1] here is "there":

#!/usr/bin/perl
@stuff = ("hi", "there", "sonny", "boy");
print "I'd just like to say: ";
print $stuff[1];
print "\n";

Flow control:

Perl has while, do/while, and for, just like most other imperative
languages. For example,

#!/usr/bin/perl
@stuff = ("hi", "there", "sonny", "boy");
$i = 0;
while ($i <= $#stuff) {
    print "word $i: " . $stuff[$i] . "\n";
    $i++;
}

(Notice that we've introduced two new language features: string
concatenation with the "." operator, and $#stuff, which returns the
last index of the list @stuff. Note also that this is *not* the length of
the list, but rather one less than the length of the list.)
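The for loop mentioned above isn't shown yet; it works just like C's. Here's a minimal sketch of the same loop written that way, plus a quick check of the $#stuff-versus-length point. It adds "use strict" and "use warnings" (and so "my" declarations), which the lecture examples leave out; they're optional, but they catch a lot of typos.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @stuff = ("hi", "there", "sonny", "boy");

# C-style for: initialize; test; step. Same loop as the while above.
for (my $i = 0; $i <= $#stuff; $i++) {
    print "word $i: $stuff[$i]\n";
}

# $#stuff is the last index; @stuff in scalar context is the length.
my $last   = $#stuff;           # 3
my $length = scalar(@stuff);    # 4
print "last index: $last, length: $length\n";
```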

Perl has a foreach loop, which iterates over each element in a list:

#!/usr/bin/perl
@stuff = ("hi", "there", "sonny", "boy");
foreach $word (@stuff) {
    print "some word: $word\n";
}

if/else works as you'd expect, but you must be careful with strings: the
string equality operator is "eq", not "=="; inequality is "ne", not "!=".
Here's a little example:

#!/usr/bin/perl
@stuff = ("hi", "there", "sonny", "boy");
foreach $word (@stuff) {
    if ($word eq "sonny") {
	print "I don't like the way this is going.\n";
    }
    print "some word: $word\n";
}
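Why does eq versus == matter? The == operator converts both sides to numbers first, and a string that doesn't look like a number becomes 0. A small sketch comparing the two on strings that aren't numbers:

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $x = "sonny";
my $y = "boy";

# eq compares the characters of the two strings.
my $same_string = ($x eq $y) ? 1 : 0;    # 0: they differ

# == compares them as numbers. Neither looks like a number, so both
# become 0 and the test comes out true.
my $same_number;
{
    no warnings 'numeric';    # silence the "isn't numeric" warnings here
    $same_number = ($x == $y) ? 1 : 0;    # 1, surprisingly
}
print "eq says $same_string, == says $same_number\n";
```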

For multiple if/else chains, there's an elsif:

#!/usr/bin/perl
@stuff = ("hi", "there", "sonny", "boy");
foreach $word (@stuff) {
    if ($word eq "sonny") {
	print "I don't like the way this is going.\n";
    } elsif ($word eq "boy") {
	print "ugh..\n";
    }
    print "some word: $word\n";
}

Remember how we wanted to see a menu form with about a hundred items in it
in the last class? Alex kept telling me to write a Perl script for it, but I
was feeling lazy. Well, here's one (with 500 items, to be thorough):

#!/usr/bin/perl
print "<html><head></head>\n<body>This is annoying:\n";
print "<form><select size=1>\n";
$i = 0;
while ($i < 500) {
    print "<option>boring item $i</option>\n";
    $i++;
}
print "</select></form></body></html>\n";

Comments start with #.

# hi, I'm a comment

Okay. We've seen strings, numbers, lists, looping, and all that. That all
looks pretty simple.. Now we know Perl, and we're ready to take on anybody,
right?

Har, har.

Time to talk about I/O:
-----------------------
We've seen output so far. Most of the work you'll do in Perl, though, has
to do with input, which can come from almost anything - files, pipes,
databases, you name it. And this also happens to be where things get..
kind of ugly.

Let's start with files, since they're the most confusing.

To open a file in Perl for reading, use "open" to get a filehandle, and
to close it, use "close":

#!/usr/bin/perl
open(FILEHANDLE, "/usr/dict/words");

close(FILEHANDLE);
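None of these examples check whether open actually succeeded, and on many systems the word list lives somewhere else (often /usr/share/dict/words), so open can fail without a peep. Here's a more defensive sketch using the three-argument form of open, a lexical filehandle, and "or die"; the subroutine name first_lines is made up for illustration. It uses chomp, a safer relative of chop (which comes up just below), to drop the newlines.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return the first $n lines of $path, without newlines.
sub first_lines {
    my ($path, $n) = @_;
    # "or die" stops the program with the system's reason ($!) if the
    # file can't be opened.
    open(my $fh, '<', $path) or die "can't open $path: $!\n";
    my @lines;
    while (@lines < $n) {
        my $line = <$fh>;
        last unless defined $line;    # end of file
        chomp $line;
        push @lines, $line;
    }
    close($fh);
    return @lines;
}

# Usage: ./thisscript /usr/dict/words
if (@ARGV) {
    my $i = 1;
    foreach my $word (first_lines($ARGV[0], 3)) {
        print "$i. $word\n";
        $i++;
    }
}
```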

Of course, that doesn't actually read anything, so let's pull out the first
three words from the file with the <FILEHANDLE> operator, which just reads
a line of text from FILEHANDLE:

#!/usr/bin/perl
open(FILEHANDLE, "/usr/dict/words");

$first = <FILEHANDLE>;
$second = <FILEHANDLE>;
$third = <FILEHANDLE>;

close(FILEHANDLE);

print "1. $first\n2. $second\n3. $third\n";

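Whoops. Why do we get extra lines in the output? It's because Perl doesn't
discard the trailing newline at the end of a line when it reads it in. We
can use the chop function for that. (chop removes the last character of a
string no matter what it is; its cousin chomp removes a trailing newline
only if there is one, which makes it the safer choice.)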

#!/usr/bin/perl
open(FILEHANDLE, "/usr/dict/words");
$first = <FILEHANDLE>;		chop $first;
$second = <FILEHANDLE>;		chop $second;
$third = <FILEHANDLE>;		chop $third;
close(FILEHANDLE);
print "1. $first\n2. $second\n3. $third\n";
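The difference between chop and chomp shows up on a line with no trailing newline, which is common for the last line of a file. A quick side-by-side sketch:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# The last line of a file often has no trailing newline.
my $with_newline    = "huh\n";
my $without_newline = "huh";

my ($c1, $c2) = ($with_newline, $without_newline);
chop $c1;     # "huh": removed the "\n"
chop $c2;     # "hu": removed the final "h", oops

my ($m1, $m2) = ($with_newline, $without_newline);
chomp $m1;    # "huh": removed the "\n"
chomp $m2;    # "huh": no newline, so nothing was removed

print "chop: '$c1' '$c2'  chomp: '$m1' '$m2'\n";
```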

Cool. That works. But isn't it a drag to read stuff like this? Yes. Perl
programs that read a whole file in line-by-line tend to look like this,
with a while loop:

#!/usr/bin/perl
open(FILEHANDLE, "/usr/dict/words");
while (<FILEHANDLE>) {
    chop;
    if ($_ eq "huh") {
	print "hey, check it out - \"huh\" is in /usr/dict/words.\n";
    }
}
close(FILEHANDLE);

Whoa, WHOA, you're saying -- what's all this new stuff? What's with the $_
thing? How come you didn't need to give an argument to that chop function
this time? How does this loop know how to end?

The answer is that there are a lot of implicit things in a while loop like
this. First, putting <FILEHANDLE> in the while test makes Perl read one line
from the filehandle each time around the loop. When there's nothing left to
read, <FILEHANDLE> returns the undefined value (undef), which counts as
false, and the loop ends. (Perl quietly checks whether the result is
defined, so even a last line consisting of just "0" won't end the loop
early.)

Second, when "<FILEHANDLE>" sits on its own as the whole while test, Perl places
the result of the read into the variable $_ (yes, dollar underscore). It's
just like saying "$_ = <FILEHANDLE>;" without all of the typing.

Third, chop without any arguments operates on that funny $_ variable.
"chop" is just like saying "chop $_;"

Every now and then, you'll need to use $_ explicitly, as we have done
above. (Of course, we didn't even need to do it there.. but I don't want to
throw too much out at once.)

There are about a billion special variables like $_. Look at the perlvar
manual page for the goodies.

What you just saw may look like the ultimate victory of laziness in
programming. Well, you ain't seen nothin' yet..

You don't even need to use the open function, or even give explicit
filehandles. You can just use <> to read from the files you name on the
command line, one after another, or from the standard input if you don't
name any.

#!/usr/bin/perl
while (<>) {
    chop;
    if ($_ eq "huh") {
	print "hey, check it out - \"huh\" is in $ARGV.\n";
    }
}

($ARGV is another one of those special variables, telling you which file
you're reading from -- it's "-" when you're reading the standard input.) So you could just
run commands like

$ ./foo /usr/dict/words
$ ./foo < /usr/dict/words
$ head -200 /usr/dict/words | ./foo
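Since <> just works through whatever is in @ARGV, a script can also fill in @ARGV itself. Here's a sketch that wraps the "huh" search in a subroutine and reports which files contain the word, using $ARGV; the name files_containing is made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: which of the named files contain $target as a
# whole line? <> reads the files listed in @ARGV one after another, so
# we fill in @ARGV ourselves; "local" puts the old value back afterwards.
sub files_containing {
    my ($target, @files) = @_;
    return () unless @files;    # an empty @ARGV would make <> read stdin
    my %found;
    local @ARGV = @files;
    local $_;
    while (<>) {
        chomp;
        $found{$ARGV} = 1 if $_ eq $target;    # $ARGV names the current file
    }
    return sort keys %found;
}

# Usage: ./thisscript /usr/dict/words some/other/file
print "$_\n" foreach files_containing("huh", @ARGV);
```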

Okay, uh, great. There are two more topics that I want to go over..

Pattern Matching and Substitution:
----------------------------------
I mentioned that Perl was pretty good at sifting through strings. You're
going to want to know how to match strings with more than just the "eq"
operator, and you'll also want to know how to search and replace within
strings.

You match stuff with things called "regular expressions." These things
actually have a deep theoretical background going back to finite-state automata.
That doesn't mean that they're not useful. There's a lot to regular
expressions - read the "perlre" manual page for the details on what Perl's
look like if you need to - or buy the book.

Let's start with the "m" operator. "m" stands for match. Building on our
previous example, let's say we wanted to pick out all of the words in a
file that end in "y". We know that in a regular expression "$" matches the
end of the string, so we want to match y$. This is what it looks like:

#!/usr/bin/perl
while (<>) {
    chop;
    if ($_ =~ m/y$/) {
	print;
	print "\n";
    }
}

Look at the ($_ =~ m/y$/) -- this expression evaluates to 1 (true) if $_
actually matches y$, and to false otherwise.

Notice also that we don't use any arguments to the first print -- yep, that
defaults to $_ as well. But that's not all. We can actually shorten the
match expression. You might suspect by now that "m" works on $_ by default,
so we don't even need the "$_ =~ " part of the expression. That's true. But
in fact, we don't even need the "m"!

#!/usr/bin/perl
while (<>) {
    chop;
    if (/y$/) {
	print;
	print "\n";
    }
}
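The same test works on any variable if you bind it with =~, and grep applies it to a whole list at once. A small sketch (grep isn't covered in the lecture; it runs a block for each element with $_ set to that element):

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @words = ("happy", "sonny", "boy", "huh", "yes");

# =~ binds a match to some variable other than $_.
my $first = $words[0];
if ($first =~ /y$/) {
    print "$first ends in y\n";
}

# grep keeps the elements for which the block is true.
my @ends_in_y = grep { /y$/ } @words;
print "@ends_in_y\n";    # happy sonny boy
```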

Now, say we not only wanted to print out everything that ends in y, but
also wanted to change the y at the end of the word to "ie" because we're
feeling all cutesy. We use the "s" operator.

#!/usr/bin/perl
while (<>) {
    chop;
    if (/y$/) {
	s/y$/ie/;	# this is the same as $_ =~ s/y$/ie/;
	print;
	print "\n";
    }
}

Yes, "s" operates on $_ by default.
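Like m, s can work on a named variable through =~, and it returns how many substitutions it made. A sketch of that, and of the /g ("global") flag, which replaces every match instead of just the first:

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $word = "sonny";
# s/// returns the number of substitutions made (false if none).
my $changed = ($word =~ s/y$/ie/);    # $word is now "sonnie"

my $fruit = "banana";
# Copy $fruit, then substitute in the copy. Without /g only the first
# match is replaced; with /g, all of them are.
(my $once = $fruit) =~ s/a/A/;
(my $all  = $fruit) =~ s/a/A/g;
print "$word $changed $once $all\n";    # sonnie 1 bAnana bAnAnA
```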

Okay, let's take a look at another operator, tr, which translates
characters. Say, for some reason or other, we want to take the above
program and convert the vowels to uppercase, and make everything else
lowercase. Here's how we'd do it:

#!/usr/bin/perl
while (<>) {
    chop;
    if (/y$/) {
	$_ = "\L$_";		# converts $_ to lower case
	s/y$/ie/;		# this is the same as $_ =~ s/y$/ie/;
	tr/aeiou/AEIOU/;	# same as $_ =~ tr/aeiou/AEIOU/;
	print;
	print "\n";
    }
}

Notice how I changed $_ to all lower case, by putting a \L before it
inside double quotes. This is a crazy-looking operator, and there are more
like it; check the "perlop" manual page.
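If \L looks too strange, the lc and uc functions do the same job as \L and \U. tr also returns how many characters it matched, which makes it handy for counting. A sketch on a named variable:

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $word = "SoNNy";

# lc and uc are the function versions of \L and \U.
my $lower = lc($word);    # "sonny", same as "\L$word"
my $upper = uc($word);    # "SONNY", same as "\U$word"

# With an empty replacement list, tr/// leaves the string alone and
# just returns how many characters matched.
my $vowels = ($lower =~ tr/aeiou//);    # 1, just the "o"

# The transformation from the example above, on a named variable.
my $cute = $lower;
$cute =~ s/y$/ie/;
$cute =~ tr/aeiou/AEIOU/;    # "sOnnIE"
print "$lower $upper $vowels $cute\n";
```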

Associative Arrays:
-------------------
We're almost at the end of our wild and crazy ride. We have to talk about
associative arrays now. This is one of Perl's most powerful data types.

You recall what a list looked like, that you specified it with @stuff, and
used names like $stuff[2] to access each individual element of the list by
its index -- remember that the index of a list element is always a number.
(Lists are also called arrays in Perl.)

Associative arrays (these days usually called hashes) are like lists,
except that you don't index them by numbers. You index them by strings
(anything else you use, like a number, gets turned into a string). You
refer to the whole array with %, and use $ and {} for
individual access. For example, if you wanted to make an associative array
of people indexed by their last name, it'd look like this:

#!/usr/bin/perl
%person = (
  "Ward", "Brian",
  "Merck", "Derek",
  "Raman", "Lakshmi",
  "Cousteau", "Jacques",
);
print $person{"Cousteau"} . "\n";
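Two things you'll run into early: most Perl code writes these pairs with =>, the "fat comma", which acts like a comma but also quotes a plain word on its left; and looking up a key that isn't there just gives undef, so exists is the way to ask whether a key is present. A sketch:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Same associative array as above, written with =>.
my %person = (
    Ward     => "Brian",
    Merck    => "Derek",
    Raman    => "Lakshmi",
    Cousteau => "Jacques",
);

# exists tells "missing" apart from "present".
my $who = exists $person{"Nemo"} ? $person{"Nemo"} : "nobody";
print $person{"Cousteau"} . " / $who\n";    # Jacques / nobody
```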

The things you index the array by (in this case, the last names) are
called keys. You can get a list of all the keys in an associative array
with the keys function, and it's handy to combine that with the foreach
loop we talked about before. So this prints out everyone in the array:

#!/usr/bin/perl
%person = (
  "Ward", "Brian",
  "Merck", "Derek",
  "Raman", "Lakshmi",
  "Cousteau", "Jacques",
);

foreach $last_name (keys(%person)) {
    print "$last_name, $person{$last_name}\n";
}

The keys function doesn't sort its output. You can use the sort function to
do that:

#!/usr/bin/perl
%person = (
  "Ward", "Brian",
  "Merck", "Derek",
  "Raman", "Lakshmi",
  "Cousteau", "Jacques",
);

foreach $last_name (sort(keys(%person))) {
    print "$last_name, $person{$last_name}\n";
}

Notice how the parentheses are stacking up? Use the % key in vi to match a
paren.

Okay then. What can you actually do with this stuff? I understand that the
first assignment in another class here was to pick through a million-word
document and count up how many times each word appears.

Well, that's easy, given what we know already (except that I haven't told
you about the split function: split(/\s+/) breaks $_ into a list of words
wherever it finds a run of whitespace).

#!/usr/bin/perl
while (<>) {		# read from stdin or wherever
    chop;		# get rid of newline
    s/[,\.\?\-\"\']//g;	# get rid of some punctuation
    $_ = "\L$_";	# convert to lowercase
    @words = split(/\s+/);
    foreach $w (@words) {
	$count{$w}++;
    }
}
foreach $word (sort(keys(%count))) {
    print "$word: $count{$word}\n";
}

This doesn't sort by the number.. but we could just as easily do it by using
the parameterized sort - a crazy-looking and inconsistent feature.

#!/usr/bin/perl

sub byvalue { $count{$b} <=> $count{$a}; }

while (<>) {		# read from stdin or wherever
    chop;		# get rid of newline
    s/[,\.\?\-\"\']//g;	# get rid of some punctuation
    $_ = "\L$_";	# convert to lowercase
    @words = split(/\s+/);
    foreach $w (@words) {
	$count{$w}++;
    }
}
foreach $word (sort byvalue (keys(%count))) {
    print "$word: $count{$word}\n";
}

Hmm. Hey, this thing seems to catch blank strings too. Well, we can fix
this with a dirty trick: put a "$w && " in front of "$count{$w}++;" so
that the increment only runs when $w contains something.
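The "$w &&" trick also skips a word that is literally "0", since "0" is false in Perl. Here's a sketch of the counter rewritten as a subroutine that takes a list of lines instead of reading <> (the name count_words is made up), skipping only truly empty strings and breaking ties in the by-count sort alphabetically:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical subroutine version of the word counter.
sub count_words {
    my %count;
    foreach my $line (@_) {
        my $text = lc($line);            # same as "\L$line"
        $text =~ s/[,\.\?\-\"\']//g;    # get rid of some punctuation
        # Leading whitespace makes split return an empty first field;
        # that's where the blank strings come from.
        foreach my $w (split(/\s+/, $text)) {
            next if $w eq "";            # skip blanks, but keep "0"
            $count{$w}++;
        }
    }
    return %count;
}

my %count = count_words("  The cat saw the dog.", "The dog ran.");

# Sort by count, biggest first; break ties alphabetically.
foreach my $word (sort { $count{$b} <=> $count{$a} or $a cmp $b } keys %count) {
    print "$word: $count{$word}\n";
}
```

Another option: split with the single-space string ' ' as its pattern is a special case that skips leading whitespace, so it never produces that empty first field.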

I wrote this particular introduction with the web in mind.


On to CGI.


CGI Scripts
-----------

CGI stands for Common Gateway Interface. It's a standard by which web
servers call server-side programs. This was more or less the first kind of
dynamic content generation.

These days, CGI is mostly a dead technology because it's kind of clunky
and inefficient. Server modules killed it. So why am I bothering to teach
it, and moreover, why am I making you do it?

Because it's a great way to learn about how to put things together, and
remember: that's a big part of what this course is about.

When you ask for something on a server that happens to be a CGI program:

1. The server starts a new process.
2. The server sets up some environment variables in that process to pass
   to the CGI.
3. The server starts up the CGI program in that process.
4. The CGI program looks at its input and generates some output - usually
   HTML, which a browser can read.
5. The server spits that output back at the client (web browser).
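To make those five steps concrete, here's a rough simulation in Perl. A subroutine plays the server and a one-line program plays the CGI; both are made up for illustration, and a real server does a lot more (it sets many more variables, and turns the CGI's header lines into a real HTTP response).

```perl
#!/usr/bin/perl
use strict;
use warnings;

# A tiny stand-in CGI program (hypothetical). All it knows about the
# request is what it finds in its environment.
my $fake_cgi = q{
    print "Content-type: text/plain\r\n\r\n";
    print "you asked for: $ENV{QUERY_STRING}\n";
};

# Hypothetical helper that plays the web server's part.
sub run_like_a_server {
    my ($program, $query) = @_;
    # Step 2: environment variables; the child process inherits them,
    # and local puts the old values back when we return.
    local $ENV{REQUEST_METHOD} = "GET";
    local $ENV{QUERY_STRING}   = $query;
    # Steps 1 and 3: start a new process running the program ($^X is the
    # path of the perl we're running) and read its standard output.
    open(my $child, "-|", $^X, "-e", $program)
        or die "can't start the CGI: $!\n";
    # Step 4: the program generates its output; slurp all of it.
    my $output = do { local $/; <$child> };
    close($child);
    # Step 5: a server would turn the header into HTTP headers and send
    # the body on to the browser. Here we just hand both back.
    my ($header, $body) = split(/\r\n\r\n/, $output, 2);
    return ($header, $body);
}

my ($header, $body) = run_like_a_server($fake_cgi, "movie=art");
print "header: $header\nbody: $body";
```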

It's not so bad, really. And since you'll run your CGI programs on your own
web server, you'll have complete access to your logs, which will help you
immeasurably.

So, to add CGI support to your web server:

1. Find the section in your httpd.conf file that looks like
     <Directory /these/are/my/html/files>
   and add "ExecCGI" to the Options.
2. Find the line that says 
     AddHandler cgi-script .cgi
   and uncomment it.
3. Restart your web server.
4. The web server will now run any executable file that ends with .cgi in
   /these/are/my/html/files as a CGI program.

We're now ready to write our first really stupid CGI program (let's call it
dumb.cgi):

#!/usr/bin/perl
print "Content-type: text/html\r\n\r\n";
print "<html><head></head><body><h1>This is kind of stupid</h1></body></html>\r\n"; 

Let's say we screwed something up.

#!/usr/bin/perl

print hi I'm a syntax error.

As you can see, you get a really lovely error message in your browser that
tells you absolutely nothing about what went wrong. To find out, look in
your logs/error_log file. It'll look something like this.

syntax error at /home/www/docs/dumb.cgi line 4, at EOF
Execution of /home/www/docs/dumb.cgi aborted due to compilation errors.
[Sun Apr  8 23:39:59 2001] [error] [client 10.1.2.1] Premature end of script headers: /home/www/docs/dumb.cgi

Okay, great. So now, how do we get form data to a CGI script?

It's pretty easy, as it turns out.

Let's use a simple radiobutton form.

<html><head><title>Simple Form</title></head>
<body>
<H1>Pick something.</H1>
<form action=pick.cgi>
<input type=radio name=movie value=dumb> Dumb Guy Action Flick<br>
<input type=radio name=movie value=chick> Chick Flick<br>
<input type=radio name=movie value=tunes> Music Flick<br>
<input type=radio name=movie value=art> Art Flick<br>
<input type=radio name=movie value=PRON> Porn Flick<br>
<input type=submit>
</form>
</body></html>

Okay, now we have to write our pick.cgi program. Let's start with something
that just shows us what we threw in as input. The CGI specification says
that for a GET request (which is what this form sends, since it doesn't
specify a method), the environment variable QUERY_STRING holds the
parameters. We can get at that with the special associative array %ENV in
Perl:

#!/usr/bin/perl
print "Content-type: text/html\r\n\r\n";
print "<html><head></head><body>\r\n";
print "<h1>Query string: " . $ENV{"QUERY_STRING"} . "</h1>\r\n";
print "</body></html>\r\n";

Well, this is nice, but wouldn't it be better if we could somehow dig out
the value of "movie"? To make a long story short, this is really easy if
you use the Perl CGI module:

#!/usr/bin/perl
use CGI;

$movie = CGI::param('movie');
print <<"DONE";
Content-type: text/html

<html><head></head><body>
<h1>I'll try to recommend something</h1>
DONE

if ($movie eq "dumb") {
    print "Wouldn't you rather drink?<p>";
} elsif ($movie eq "chick") {
    print "Wouldn't you rather poke your eyes out?<p>";
} elsif ($movie eq "tunes") {
    print "Spinal Tap!<p>";
} elsif ($movie eq "art") {
    print "Fear and Loathing in Las Vegas<p>";
} elsif ($movie eq "PRON") {
    print "Make it yourself.<p>";
} else {
    print "I really don't know what you're talking about.<p>";
}

print "</body></html>\r\n";
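CGI::param hides the work of pulling "movie" out of a string like "movie=art". Here's a simplified sketch of that decoding, assuming a GET request: pairs are separated by &, names and values by =, a + stands for a space, and %XX is a hex-encoded character. The name parse_query is made up, and the real CGI module handles much more (POST requests, repeated names, file uploads), so use the module for real work.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical, simplified query-string decoder (GET requests only).
sub parse_query {
    my ($query) = @_;
    my %param;
    foreach my $pair (split(/&/, $query)) {
        next if $pair eq "";
        my ($name, $value) = split(/=/, $pair, 2);
        $value = "" unless defined $value;
        # foreach aliases $_ to each variable, so these edit them in place.
        foreach ($name, $value) {
            tr/+/ /;                                 # + means a space
            s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge;    # %26 becomes "&", etc.
        }
        $param{$name} = $value;
    }
    return %param;
}

my %param = parse_query(defined $ENV{"QUERY_STRING"} ? $ENV{"QUERY_STRING"} : "");
my $movie = defined $param{"movie"} ? $param{"movie"} : "(none)";
print "Content-type: text/plain\r\n\r\n";
print "movie: $movie\n";
```

Because everything comes in through the environment, you can try a script like this from the shell without a browser or server, for example QUERY_STRING='movie=art' perl parse.pl (where parse.pl is whatever you named the file).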

Well that's about all you need to know for the assignment.