Lecture 3: fun with scripting/intro to cgi
------------------------------------------

What is a scripting language? Why are these things important?

When we say "scripting language," we're usually referring to a programming language that, unlike C, Pascal, C++, or Java, you don't compile into machine code once and run many times. Instead, an interpreter reads your program every time it runs, parsing all of the program text. You may think this is inefficient - and in the old days when computers were really kind of wimpy, it was. But because they are interpreted and somewhat easier to debug, have rich builtin function sets, and you don't have to compile anything before you run them (they go straight from the text editor to execution), scripting languages tend to save real time - programmer time. And time is money.

If you write, say, a Perl script that you only need to run once, it doesn't matter if the script you wrote in ten minutes takes an hour to process its data. The same task might have taken two days in C, C++, or Java.

Perl, Python, PHP, awk, and Tcl are all scripting languages.

Scripting language ability is probably the most important skill to have in web programming. You need to be able to come up with text data quickly and efficiently, to manipulate and massage files, sort them, throw data in and around databases - yeah, it's mostly about throwing text around. Even if your web server never executes your scripts, you'll almost certainly use them for arranging things behind the scenes. That's why I'm going to first teach these languages out of the context of a web server.

So let's get started:
---------------------

By far and away, Unix has the biggest set of scripting tools available. All of the most popular languages were developed on Unix, and all of them are free. Now, some of you have seen this next part before.
A Unix script's first line contains the full path of the scripting language interpreter, like this:

    #!/usr/bin/perl

The first two letters are always #!, which tell the operating system that the program is a script (and not some piece of machine code). After seeing this, the operating system runs whatever program follows the #! - and as an extra bonus, it passes the script's filename to that program, which then reads the whole script (#!/blah/blah and all).

Here's an example, using the Bourne shell (shells also can be used as scripting languages; in fact, Bourne shell is quite powerful):

    #!/bin/sh
    echo hi there

Let's say that we called this script foo. If I just made a new file foo like this [show this on screen] we see that it doesn't work: it says

    foo not found

That's because my shell can't find foo anywhere in my PATH [echo $PATH]. For most of you, this wouldn't be a problem, because there's a . (meaning the current directory) at the end of your PATH, indicating that your shell should also look inside your current directory for programs. This is kind of a bad idea - who knows what kind of bad stuff could be in the current directory? Instead, I ask for anything in the current directory explicitly by putting a ./ in front of it, like this:

    ./foo

and my shell promptly reports

    ./foo: Permission denied

That's because the script doesn't have execute permission. If you look at any system program, you'll see that it looks like this:

    $ ls -l /bin/ls
    -rwxr-xr-x 1 root root 29404 Jul 3 1998 /bin/ls

but foo looks like this:

    $ ls -l foo
    -rw-r--r-- 1 bri student 12 Feb 22 11:20 foo

You see how foo is missing the so-called execute bits (the "x"s on the left)? Well, we can change that with this:

    $ chmod ugo+x foo

Now foo has those bits, and if I run ./foo I get what we'd expect.

When writing a script, it would be easiest to do

    $ touch foo
    $ chmod ugo+x foo

before starting to write the thing. But you'll forget.
When this happens, save the file and exit your text editor, then run the chmod command, and then go about editing it again.

Okay, now that we all know how to write scripts, let's learn Perl real quick. One of my students last year said that I should start with Python. Hahaha.

Perl:
-----

Perl stands for Practical Extraction and Report Language. It also stands for Pathologically Eclectic Rubbish Lister. It is the Swiss Army Chainsaw of programming languages. It's flexible, with extension functions to do almost anything imaginable. The chainsaw part comes with its remarkable power and speed - this is due to a number of factors, not the least of which is that Larry Wall (the guy who wrote it) really knows his stuff.

Perl may be fast and powerful, but it isn't always the easiest thing in the world to read. It depends on the programmer, because there are about a million ways to do any given task in Perl. Which is the best? "The one that gets the job done before your boss fires you."

Perl became fairly popular in the Unix community about ten years ago. Systems administrators were generally the first to pick it up for a number of tasks, then they showed it to their users, who started to do their own work with it because they noticed that it was generally a lot faster to get things done in it. When the web boom hit, those same administrators were the ones doing all the web site administration, and of course, they took Perl with them.

The Programming Perl book (O'Reilly) is the de facto standard book on Perl. Mine here is the smaller first edition - the new one is blue but still has a camel on it. However, once you start to get the hang of it, your most important Perl documentation source comes in the form of the online manual pages. If you do a "man perl" you'll see that the documentation is split up into a number of subsections; for example, you can use "man perlvar" to get the predefined variable names and meanings.
Trust me on this one: learning to search the manual pages can be one of the best things you ever do. Use "/blah" to search for the string "blah" and use "n" to repeat the search.

Where to begin:

Let's start simple with something that just prints a message to the screen.

    #!/usr/bin/perl
    print "hi there\n";

Okay, that looks pretty easy. Variables start with $, so let's modify that program a little:

    #!/usr/bin/perl
    $stuff = "hi there\n";
    print $stuff;

You can mix variables in with strings:

    #!/usr/bin/perl
    $stuff = "hi there\n";
    print "I'd just like to say: $stuff";

Actually, there are a lot of variable types. We've just seen a scalar variable. There is also a list variable - the names of these start with @. But you access individual list elements with $ and [ ]:

    #!/usr/bin/perl
    @stuff = ("hi", "there", "sonny", "boy");
    print "I'd just like to say: ";
    print $stuff[1];
    print "\n";

Flow control:

Perl has while, do/while, and for, just like most other procedural languages. For example,

    #!/usr/bin/perl
    @stuff = ("hi", "there", "sonny", "boy");
    $i = 0;
    while ($i <= $#stuff) {
        print "word $i: " . $stuff[$i] . "\n";
        $i++;
    }

(Notice that we've introduced two new language features: string concatenation with the "." operator, and $#stuff, which returns the last index of the list @stuff. Note also that this is *not* the length of the list, but rather, one less than the length of the list.)

Perl has a foreach loop, which iterates over each element in a list:

    #!/usr/bin/perl
    @stuff = ("hi", "there", "sonny", "boy");
    foreach $word (@stuff) {
        print "some word: $word\n";
    }

if/else work as you'd expect; however, you must be careful with strings -- the string equality operator is "eq", not "=="; inequality is "ne", not "!=".
Here's a little example:

    #!/usr/bin/perl
    @stuff = ("hi", "there", "sonny", "boy");
    foreach $word (@stuff) {
        if ($word eq "sonny") {
            print "I don't like the way this is going.\n";
        }
        print "some word: $word\n";
    }

For multiple if/else chains, there's an elsif:

    #!/usr/bin/perl
    @stuff = ("hi", "there", "sonny", "boy");
    foreach $word (@stuff) {
        if ($word eq "sonny") {
            print "I don't like the way this is going.\n";
        } elsif ($word eq "boy") {
            print "ugh..\n";
        }
        print "some word: $word\n";
    }

Remember how we wanted to see a menu form with about a hundred items in it in the last class? Alex kept telling me to do a Perl script to do it but I was feeling lazy. Well here, we can create one.

    #!/usr/bin/perl
    print "\nThis is annoying:\n";
    print "
    \n";

Comments start with #.

    # hi, I'm a comment

Okay. We've seen strings, numbers, lists, looping, and all that. That all looks pretty simple.. Now we know Perl, and we're ready to take on anybody, right? Har, har.

Time to talk about I/O:
-----------------------

We've seen output so far. Most of what you'll be working with in Perl has to do with input, which you'll get from almost anything - files, pipes, databases, you name it. And this also happens to be where things get.. kind of ugly.

Let's start with files, since they're the most confusing. To open a file in Perl for reading, use "open" to get a filehandle, and to close it, use "close":

    #!/usr/bin/perl
    open(FILEHANDLE, "/usr/dict/words");
    close(FILEHANDLE);

Of course, that doesn't actually read anything, so let's pull out the first three words from the file with the <FILEHANDLE> operator, which just reads a line of text from FILEHANDLE:

    #!/usr/bin/perl
    open(FILEHANDLE, "/usr/dict/words");
    $first = <FILEHANDLE>;
    $second = <FILEHANDLE>;
    $third = <FILEHANDLE>;
    close(FILEHANDLE);
    print "1. $first\n2. $second\n3. $third\n";

Whoops. Why do we get extra lines in the output? It's because Perl doesn't discard the trailing newline at the end of a line when it reads it in. We can use the chop operator for that. (chop removes the last character no matter what it is; Perl 5's chomp removes only a trailing newline, so it's the safer choice if the last line might not end in one.)

    #!/usr/bin/perl
    open(FILEHANDLE, "/usr/dict/words");
    $first = <FILEHANDLE>; chop $first;
    $second = <FILEHANDLE>; chop $second;
    $third = <FILEHANDLE>; chop $third;
    close(FILEHANDLE);
    print "1. $first\n2. $second\n3. $third\n";

Cool. That works. But isn't it a drag to read stuff like this? Yes. Perl programs that read a whole file in line-by-line tend to look like this, with a while loop:

    #!/usr/bin/perl
    open(FILEHANDLE, "/usr/dict/words");
    while (<FILEHANDLE>) {
        chop;
        if ($_ eq "huh") {
            print "hey, check it out - \"huh\" is in /usr/dict/words.\n";
        }
    }
    close(FILEHANDLE);

Whoa, WHOA, you're saying -- what's all this new stuff? What's with the $_ thing? How come you didn't need to give an argument to that chop function this time? How does this loop know how to end?
The answer is that there are a lot of implicit things in a while loop like this. First, putting <FILEHANDLE> in the while test makes Perl read a line from the file handle each time around, until there's nothing left to read in the file. When there's nothing left to read, the read comes back undefined, which counts as false, and the loop ends.

Second, when you specify "<FILEHANDLE>" out on its own like that in the while test, it places the result of the read into the variable $_ (yes, dollar underscore). It's just like saying "$_ = <FILEHANDLE>;" without all of the typing.

Third, chop without any arguments operates on that funny $_ variable. "chop" is just like saying "chop $_;"

Every now and then, you'll need to use $_ explicitly, as we have done above. (Of course, we didn't even need to do it there.. but I don't want to throw too much out at once.)

There are about a billion special variables like $_. Look at the perlvar manual page for the goodies.

What you just saw may look like the ultimate victory of laziness in programming. Well, you ain't seen nothin' yet.. You don't even need to use the open function, or even give explicit filehandles. You can just use <> to refer to either the standard input, or some argument that you put on the command line.

    #!/usr/bin/perl
    while (<>) {
        chop;
        if ($_ eq "huh") {
            print "hey, check it out - \"huh\" is in $ARGV.\n";
        }
    }

($ARGV is another one of those special variables, telling you which file you're reading from -- it's "-" if the standard input.)

So you could just run commands like

    $ ./foo /usr/dict/words
    $ ./foo < /usr/dict/words
    $ head -200 /usr/dict/words | ./foo

Okay, uh, great. There are two more topics that I want to go over..

Pattern Matching and Substitution:
----------------------------------

I mentioned that Perl was pretty good at sifting through strings. You're going to want to know how to match strings with more than just the "eq" operator, and you'll also want to know how to search and replace within strings. You match stuff with things called "regular expressions."
These things actually have a deep scientific meaning going back to finite-state automata. That doesn't mean that they're not useful. There's a lot to regular expressions - read the "perlre" manual page for the details on what Perl's look like if you need to - or buy the book.

Let's start with the "m" function. "m" stands for match. From our previous example, let's say we wanted to pick out all of the words in a file that end in "y". We know that in a regular expression "$" matches the end of the string, so we want to match something containing y$. This is what it looks like:

    #!/usr/bin/perl
    while (<>) {
        chop;
        if ($_ =~ m/y$/) {
            print;
            print "\n";
        }
    }

Look at the ($_ =~ m/y$/) -- this expression evaluates to 1 if $_ actually matches y$. Notice also that we don't use any arguments to the first print -- yep, that defaults to $_ as well.

But that's not all. We can actually shorten the match expression. You might suspect by now that "m" works on $_ by default, so we don't even need the "$_ =~ " part of the expression. That's true. But in fact, we don't even need the "m"!

    #!/usr/bin/perl
    while (<>) {
        chop;
        if (/y$/) {
            print;
            print "\n";
        }
    }

Now, say we not only wanted to print out everything that ends in y, but we also want to change the y at the end of the word to "ie" because we're feeling all cutesy. We use the "s" operator.

    #!/usr/bin/perl
    while (<>) {
        chop;
        if (/y$/) {
            s/y$/ie/;    # this is the same as $_ =~ s/y$/ie/;
            print;
            print "\n";
        }
    }

Yes, "s" operates on $_ by default.

Okay, let's take a look at another operator, tr, which translates characters. Say, for some reason or other, we want to take the above program and convert the vowels to uppercase, and make everything else lowercase. Here's how we'd do it:

    #!/usr/bin/perl
    while (<>) {
        chop;
        if (/y$/) {
            $_ = "\L$_";        # converts $_ to lower case
            s/y$/ie/;           # this is the same as $_ =~ s/y$/ie/;
            tr/aeiou/AEIOU/;    # same as $_ =~ tr/aeiou/AEIOU/;
            print;
            print "\n";
        }
    }

Notice how I changed the $_ to all lower case - by putting a \L before it in quotes.
This is a crazy-looking operator, and there are more like it; check the "perlop" manual page.

Associative Arrays:
-------------------

We're almost at the end of our wild and crazy ride. We have to talk about associative arrays now. This is one of Perl's most powerful data types.

You recall what a list looked like, that you specified it with @stuff, and used names like $stuff[2] to access each individual element of the list by its index -- remember that the index of a list element is always a number. (Lists are also called arrays in Perl.)

Associative arrays are like lists, except that you don't index them by numbers. You can index them with anything you like -- but you usually do it by string. You refer to the whole array with %, and use $ and {} for individual access. For example, if you wanted to make an associative array of people indexed by their last name, it'd look like this:

    #!/usr/bin/perl
    %person = (
        "Ward", "Brian",
        "Merck", "Derek",
        "Raman", "Lakshmi",
        "Cousteau", "Jacques",
    );
    print $person{"Cousteau"} . "\n";

The things you index the array by (in this case, the last names) are called keys. You can get a list of all the keys in an associative array with the keys function, and it's handy to use that foreach loop we talked about before in combination. So this prints out everyone in the array:

    #!/usr/bin/perl
    %person = (
        "Ward", "Brian",
        "Merck", "Derek",
        "Raman", "Lakshmi",
        "Cousteau", "Jacques",
    );
    foreach $last_name (keys(%person)) {
        print "$last_name, $person{$last_name}\n";
    }

The keys function doesn't sort its output. You can use the sort function to do that:

    #!/usr/bin/perl
    %person = (
        "Ward", "Brian",
        "Merck", "Derek",
        "Raman", "Lakshmi",
        "Cousteau", "Jacques",
    );
    foreach $last_name (sort(keys(%person))) {
        print "$last_name, $person{$last_name}\n";
    }

Notice how the parentheses are stacking up? Use the % key in vi to match a paren.

Okay then. What can you actually do with this stuff?
I understand that the first assignment in another class here was to pick through a million-word document and count up how many times each word appears. Well that's easy, given what we know already (except that I haven't told you about the split function).

    #!/usr/bin/perl
    while (<>) {                  # read from stdin or wherever
        chop;                     # get rid of newline
        s/[,\.\?\-\"\']//g;       # get rid of some punctuation
        $_ = "\L$_";              # convert to lowercase
        @words = split(/\s+/);
        foreach $w (@words) {
            $count{$w}++;
        }
    }
    foreach $word (sort(keys(%count))) {
        print "$word: $count{$word}\n";
    }

This doesn't sort by the number.. but we could just as easily do it by using the parameterized sort - a crazy-looking and inconsistent feature.

    #!/usr/bin/perl
    sub byvalue { $count{$b} <=> $count{$a}; }

    while (<>) {                  # read from stdin or wherever
        chop;                     # get rid of newline
        s/[,\.\?\-\"\']//g;       # get rid of some punctuation
        $_ = "\L$_";              # convert to lowercase
        @words = split(/\s+/);
        foreach $w (@words) {
            $count{$w}++;
        }
    }
    foreach $word (sort byvalue (keys(%count))) {
        print "$word: $count{$word}\n";
    }

Hmm. Hey, this thing seems to catch blank strings too. Well, we can fix this with a dirty trick: put a "$w && " in front of "$count{$w}++;" so that "$count{$w}++;" only gets run when $w contains something.

I wrote this particular introduction with the web in mind. On to CGI.

CGI Scripts
-----------

CGI stands for Common Gateway Interface. It's a standard by which web servers call server-side programs. This was more or less the first kind of dynamic content generation. These days, CGI is mostly a dead technology because it's kind of clunky and inefficient. Server modules killed it.

So why am I bothering to teach it, and moreover, why am I making you do it? Because it's a great way to learn about how to put things together, and remember: that's a big part of what this course is about.

When you ask for something on a server that happens to be a CGI program:

1. The server starts a new process.
2. The server sets up some environment variables in that process to pass to the CGI.
3. The server starts up the CGI program in that process.
4. The CGI program looks at its input and generates some output - usually HTML, which a browser can read.
5. The server spits that output back at the client (web browser).

It's not so bad, really. And since you'll run your CGI programs on your own web server, you'll have complete access to your logs, which will help you immeasurably.

So, to add CGI support to your web server:

1. Find the section in your httpd.conf file that looks like <Directory /these/are/my/html/files> and add "ExecCGI" to the Options.
2. Find the line that says AddHandler cgi-script .cgi and uncomment it.
3. Restart your web server.
4. The web server will now run any executable file that ends with .cgi in /these/are/my/html/files as a CGI program.

We're now ready to write our first really stupid CGI program (let's call it dumb.cgi):

    #!/usr/bin/perl
    print "Content-type: text/html\r\n\r\n";
    print "

    This is kind of stupid

    \r\n";

Let's say we screwed something up.

    #!/usr/bin/perl
    print hi I'm a syntax error.

As you can see, you get a really lovely error message in your browser that tells you absolutely nothing about what went wrong. To find out, look in your logs/error_log file. It'll look something like this.

    syntax error at /home/www/docs/dumb.cgi line 4, at EOF
    Execution of /home/www/docs/dumb.cgi aborted due to compilation errors.
    [Sun Apr 8 23:39:59 2001] [error] [client 10.1.2.1] Premature end of script headers: /home/www/docs/dumb.cgi

Okay, great. So now, how do we get form data to a CGI script? It's pretty easy, as it turns out. Let's use a simple radiobutton form.

    Simple Form

    Pick something.

    Dumb Guy Action Flick
    Chick Flick
    Music Flick
    Art Flick
    Porn Flick
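A form like this one could also be spat out by a short Perl script instead of typed by hand. This is only a sketch: the field name "movie" and the values come from the pick.cgi script in these notes and the labels come from the form above, but the GET method, the submit button, the exact markup, and the movie_form helper name are all assumptions.

```perl
#!/usr/bin/perl
# Sketch only: movie_form is a hypothetical helper. The method="get",
# the submit button, and the tag layout are assumptions, not the
# original form's markup.
sub movie_form {
    my @values = ("dumb", "chick", "tunes", "art", "PRON");
    my @labels = ("Dumb Guy Action Flick", "Chick Flick", "Music Flick",
                  "Art Flick", "Porn Flick");

    my $html = "<form action=\"pick.cgi\" method=\"get\">\n";
    $html .= "<p>Pick something.</p>\n";

    # One radio button per choice; they all share the name "movie",
    # so the browser sends exactly one movie=... pair.
    for (my $i = 0; $i <= $#values; $i++) {
        $html .= "<input type=\"radio\" name=\"movie\" value=\"$values[$i]\"> $labels[$i]<br>\n";
    }

    $html .= "<input type=\"submit\">\n</form>\n";
    return $html;
}

print movie_form();
```

With a GET form, picking "Art Flick" and submitting would request something like pick.cgi?movie=art, and the part after the question mark is what the server hands to the script as the query string.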
Okay, now we have to write our pick.cgi program. Let's start with something that just shows us what we threw in as input. The CGI interface tells us that the environment variable QUERY_STRING holds the parameters. We can get at that with the special associative array %ENV in Perl:

    #!/usr/bin/perl
    print "Content-type: text/html\r\n\r\n";
    print "\r\n";
    print "

    Query string: " . $ENV{"QUERY_STRING"} . "

    \r\n";
    print "\r\n";

Well, this is nice, but wouldn't it be better if we could somehow dig out the value of "movie"? To make a long story short, this is really easy if you use the Perl CGI module:

    #!/usr/bin/perl
    use CGI;
    $movie = CGI::param('movie');

    print <<"DONE";
    Content-type: text/html

    I'll try to recommend something

    DONE

    if ($movie eq "dumb") {
        print "Wouldn't you rather drink?

    ";
    } elsif ($movie eq "chick") {
        print "Wouldn't you rather poke your eyes out?

    ";
    } elsif ($movie eq "tunes") {
        print "Spinal Tap!

    ";
    } elsif ($movie eq "art") {
        print "Fear and Loathing in Las Vegas

    ";
    } elsif ($movie eq "PRON") {
        print "Make it yourself.

    ";
    } else {
        print "I really don't know what you're talking about.

    ";
    }
    print "\r\n";

Well that's about all you need to know for the assignment.