Web Programming Lecture 1 Notes:

Administrivia:
--------------

blah blah..

What the course is about:
-------------------------

Fundamentals of web servers, web server development.

What are you concerned about when you visit a web site?
 - Ease-of-use and Readability
 - Security
 - Cost (human + physical resources)
 - Privacy (up to how evil you want to be)
 - Performance
 - Reliability
 - Content (up to how incompetent you want to be)

Well, how do we get there?

A web server is like a jigsaw puzzle in one way: it's got a bunch of pieces
that you need to fit together. However, it's unlike any puzzle you've ever
had to work on before; you need to file some pieces at the edges to make
them fit, chop up other pieces, plus you'll end up making some of them
yourself.

If this is starting to sound like a hack, that's good, because that's all
the web is. For some reason, history has shown computer scientists that
it's usually better to do things the wrong way.

History of the web:
-------------------

Does anyone remember the internet before the web? Well, I do.

We had a lot of different kinds of communications protocols, like:
 - DNS (name service)
 - SMTP (mail)
 - NNTP (Usenet/news)
 - telnet (remote access)
 - ftp (file transfer)

A significant thing to notice about all of these is that not only are
they stupid and insecure, but they aren't the easiest things to use. In
particular, the telnet and ftp protocols need to die a quick and horrible
death.

We put software (and sometimes documents) up on our ftp servers, leaving
people to figure out what the server names were - well, sometimes we told
our pals. A tool called archie made looking up ftp servers easier later on.
Sometimes we would post our server addresses on Usenet, along with our
semi-informational rants.

This was all fine and good, except that not a lot of people used the net at
that time, mainly because it generally kind of sucked. That wasn't just
because we were using stupid and insecure protocols, but also because it was
kind of hard to use. People could handle email. Using rn was daunting.

There were all sorts of proposed improvements, like Gopher and WAIS (both
of which no one really ever used). Then, in 1989, Tim Berners-Lee at CERN
decided that he wanted to sort of do his own thing for high-energy
physics-based document management.

The idea was the following: The other protocols would remain in place, but
there'd only be one program to access them. And this program would need
some sort of glue to bind all of the information together - it required a
hypertext language so that you'd be able to "link" to the actual
information.

They showed off their first web browser in late 1990, and released it
early the next year. It sort of dawdled for two years. I used it during
this time (using a browser called tkWWW) to find manual pages for all sorts
of different operating systems. For some reason, I thought this was neat.

Then things started to happen in 1993. A student working at NCSA (yes, down
in Urbana) started fiddling around and wrote a browser called Mosaic. The
first version was on Unix's X Window System. It was immediately popular
for one reason: it had pictures. When they gave demos of it, the funding
people liked it (because of the pictures), and NCSA decided to port it to
the Mac and Windows (remember, Windows was at 3.1 at that time).

The release of all three was in late 1993, and The New York Times responded
with a front-page article describing "the internet's first killer app."
Well, that was that. I remember that day; my boss came down and said, "um,
it looks like we actually have to do those web pages now."

The story from there goes like this:

 - Netscape Communications Corporation formed. Their web browser, Netscape,
   became popular immediately because it gratuitously added all sorts of
   extensions to make pages even prettier. Netscape's intentions were all
   about commercialization: they wanted encrypted transmission so that you
   could send credit card numbers.

 - Linux, which had been around for two years now, managed to get its
   networking code working to a point where it could be used as a web
   server. FreeBSD also got big. It was a lot cheaper to run a web server on
   a crummy PC than on an expensive Sun.

 - Microsoft released Windows 95, the look of which they completely ripped
   off from NeXT. Of course, it still wasn't any good for web servers, but
   it was less pathetic than Windows 3.1 in terms of its TCP/IP performance,
   and it came with PPP support, so having Netscape as a client on that
   really bolstered internet use. By this time, Microsoft had noticed that
   the internet had really passed them by and started work in earnest on
   their own web browser, Internet Explorer.

 - After having a couple of versions of their Secure Sockets Layer (SSL) hit
   and sunk, Netscape introduced one that was a bit more secure after
   getting some scientists to work on the problem for them.
   Commercialization really took root. e-commerce got hot.

 - The air got let out of the dot-com boom and the economy. A bunch of
   dot-coms folded. It's hard to notice these kinds of things, though: why
   were there so many "web designers" out there making pages that all
   looked the same, anyway?

 - Even though the tools for creating it have changed, "dynamic content"
   remains popular. That's what we're going to be talking about in this
   course.

HTTP:
-----

When you make a connection to a web server with your browser, you're
connecting to a port using a particularly silly protocol called HTTP
(Hypertext Transfer Protocol). You can't just yack at a machine and expect
it to know what you're talking about.

HTTP goes over the TCP/IP protocol. A TCP service sits on a particular port
on a server - and a port is just a number (there's usually a name
associated with it in /etc/services on a Unix machine). You use ports to
differentiate between services. 25 is SMTP (for email transport), 22 is
ssh, 23 is telnet, 79 is the finger service, and so on. HTTP usually sits at
port 80, but you can put it anywhere you like (and you will in this course).
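
To make the name-to-number mapping concrete, here's a rough sketch that
parses text in the /etc/services layout (name, then port/protocol, then
optional aliases). The SAMPLE text is a hand-typed excerpt rather than a
copy of any real machine's file, and parse_services is just a name made up
for these notes.

```python
# Sketch: turn /etc/services-style text into a {name: port} table.
# SAMPLE is a hand-typed excerpt (an assumption), not a real machine's file.

SAMPLE = """\
# name   port/protocol   aliases
ssh      22/tcp
telnet   23/tcp
smtp     25/tcp          mail
finger   79/tcp
http     80/tcp          www
ntp      123/udp
"""


def parse_services(text, protocol="tcp"):
    """Return a dict mapping service names (and aliases) to port numbers."""
    table = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()   # drop comments and blank lines
        if not line:
            continue
        fields = line.split()
        if len(fields) < 2 or "/" not in fields[1]:
            continue                           # skip lines that don't fit the layout
        port, proto = fields[1].split("/", 1)
        if proto != protocol:
            continue
        for name in [fields[0]] + fields[2:]:  # the name itself plus any aliases
            table[name] = int(port)
    return table


if __name__ == "__main__":
    ports = parse_services(SAMPLE)
    print(ports["smtp"], ports["http"])
```

On a real Unix box you could feed it open("/etc/services").read() instead
of SAMPLE, assuming the file is there.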

You can connect to a service using the telnet program:
 <example with finger, say, on gargoyle>
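
If you'd rather see what telnet is doing for you, here's a minimal sketch
of the same idea in Python: open a TCP connection to a port, send a line,
and read until the other end hangs up. The names ask and start_toy_service
are made up for these notes, and the toy service is only a stand-in so the
example runs without a network; it is not a real finger daemon.

```python
import socket
import threading


def ask(host, port, line, timeout=5.0):
    """Connect to host:port, send one line, and return everything sent back."""
    with socket.create_connection((host, port), timeout=timeout) as conn:
        conn.sendall(line.encode("ascii") + b"\r\n")
        chunks = []
        while True:
            data = conn.recv(4096)
            if not data:               # the other side closed the connection
                break
            chunks.append(data)
    return b"".join(chunks).decode("ascii", errors="replace")


def start_toy_service():
    """Start a made-up one-shot service on a free loopback port; return the port."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))    # port 0 asks the OS for any free port
    listener.listen(1)
    port = listener.getsockname()[1]

    def serve_once():
        conn, _ = listener.accept()
        with conn:
            request = b""
            while not request.endswith(b"\n"):   # read one line from the client
                data = conn.recv(1024)
                if not data:
                    break
                request += data
            conn.sendall(b"No such user: " + request.strip() + b"\r\n")
        listener.close()

    threading.Thread(target=serve_once, daemon=True).start()
    return port


if __name__ == "__main__":
    print(ask("127.0.0.1", start_toy_service(), "nobody"))
```

Pointing ask at a real machine's port 79 with a user name should behave
much like the telnet session, assuming that machine still runs finger.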

HTTP expects a request. Here's a simple example in HTTP version 1.0 (a 1.1
request also needs a Host: header, which we'll get to below):

  telnet www 80

  GET / HTTP/1.0
  <press enter again>

As you can see, you get a whole bunch of stuff back, starting with some
headers:

HTTP/1.1 200 OK
Date: Mon, 26 Mar 2001 00:48:11 GMT
Server: Apache/1.3.12 (Unix) PHP/4.0.4
Last-Modified: Wed, 27 Sep 2000 17:42:36 GMT
ETag: "6b7eb-5b9-39d2318c"
Accept-Ranges: bytes
Content-Length: 1465
Connection: close
Content-Type: text/html
X-Pad: avoid browser bug

After that, an actual document follows. Note the Content-Type: part of the
header; it tells the browser about the document's format.
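
Here's a small sketch of how a client might pick that response apart: the
status line comes first, then one header per line, then a blank line, then
the document. The function name parse_response_head and the SAMPLE bytes
are invented for illustration; SAMPLE is a cut-down version of the response
above with a made-up body.

```python
def parse_response_head(raw):
    """Split a raw HTTP response into (version, status, reason, headers, body)."""
    head, _, body = raw.partition(b"\r\n\r\n")      # a blank line ends the headers
    lines = head.decode("iso-8859-1").split("\r\n")
    version, status, reason = lines[0].split(" ", 2)
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        # Header names are case-insensitive, so store them in lower case.
        # A repeated header simply overwrites the earlier one in this sketch.
        headers[name.strip().lower()] = value.strip()
    return version, int(status), reason, headers, body


SAMPLE = (
    b"HTTP/1.1 200 OK\r\n"
    b"Content-Length: 13\r\n"
    b"Connection: close\r\n"
    b"Content-Type: text/html\r\n"
    b"\r\n"
    b"<p>hello</p>\n"
)

if __name__ == "__main__":
    version, status, reason, headers, body = parse_response_head(SAMPLE)
    print(status, reason, headers["content-type"], len(body))
```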

The first widely used HTTP version, 1.0, had some severe performance issues. To
speed up loading of a bunch of images, Netscape decided that it would open
up a bunch of connections and pull a bunch of stuff off a server
simultaneously. This hogged resources on the server side, but it also
did badly on the client side; if you were on a slow modem connection and
opened a bunch of connections at once, they'd get in the way of each other,
making the actual page download even slower. In addition, after each
document transfer, the server would disconnect and your browser would need
to go through the pain of reconnecting. That was a serious drag if you
wanted to have lots of pictures on a web page.

Not only that, people noticed that you'd need a different IP address for
each web server name (increasingly called distinct "sites"), even though one
web daemon on a real computer is capable of serving several sites at once.
People started doing stupid things like sticking five ethernet interfaces
in a single machine, running different web server processes off each
interface. (I'd just like to note again that this is dumb.)

To get around these problems, HTTP 1.1 came about. Everyone uses it. It
supports not only "virtual hosts" (the Host: header names the site you want)
but also persistent connections. Here's an example:

 telnet www 80
 GET / HTTP/1.1
 Host: www.cs.uchicago.edu
 <enter again>

Notice that the connection hasn't closed right away (it will in a while).

You get this stuff (explain it):

HTTP/1.1 200 OK
Date: Mon, 26 Mar 2001 01:15:42 GMT
Server: Apache/1.3.12 (Unix) PHP/4.0.4
X-Powered-By: PHP/4.0.4
Transfer-Encoding: chunked
Content-Type: text/html
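
Two pieces of that exchange are easy to sketch in code. The Host: header
in the request is how one server tells its sites apart, and
Transfer-Encoding: chunked means the body arrives as a series of pieces,
each prefixed with its size in hex, ending with a piece of size 0. The
helper names host_of and decode_chunked are made up for these notes, and
this is a rough sketch that ignores trailers and malformed input.

```python
def host_of(request_text):
    """Return the Host: header value from a raw request, or None if absent."""
    for line in request_text.split("\r\n")[1:]:    # skip the request line
        if not line:                               # blank line: headers are over
            break
        name, _, value = line.partition(":")
        if name.strip().lower() == "host":
            return value.strip()
    return None


def decode_chunked(body):
    """Reassemble a chunked body made of '<hex size>CRLF<data>CRLF' pieces."""
    out = b""
    pos = 0
    while True:
        eol = body.index(b"\r\n", pos)
        # The size line may carry ';extensions' after the number; ignore them.
        size = int(body[pos:eol].split(b";", 1)[0].decode("ascii"), 16)
        if size == 0:                              # a zero-size chunk ends the body
            return out
        start = eol + 2
        out += body[start:start + size]
        pos = start + size + 2                     # skip the CRLF after the data


if __name__ == "__main__":
    print(host_of("GET / HTTP/1.1\r\nHost: www.cs.uchicago.edu\r\n\r\n"))
    print(decode_chunked(b"5\r\nhello\r\n7\r\n, world\r\n0\r\n\r\n"))
```

Because the size of each piece is announced up front, the server can start
sending before it knows the total length, and the connection can stay open
afterwards for the next request.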

URLs:
-----

URL stands for Uniform Resource Locator. It's that thing in the bar at the
top of your browser that usually starts with "http://". The thing after the
// is the server name. You can specify a different port (say, 3124) here
with

 http://www.example.com:3124/

Without the port number, the browser looks at port 80. Since you won't be
running your servers on port 80, you're going to need to remember this.
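
As a quick illustration, Python's standard urllib.parse module can pull the
server name and port out of a URL. The wrapper server_and_port is a name
invented here, and it assumes an http:// URL (other schemes have other
default ports).

```python
from urllib.parse import urlsplit


def server_and_port(url):
    """Return (hostname, port) for an http:// URL, defaulting to port 80."""
    parts = urlsplit(url)
    port = parts.port if parts.port is not None else 80
    return parts.hostname, port


if __name__ == "__main__":
    print(server_and_port("http://www.example.com:3124/"))
    print(server_and_port("http://www.example.com/"))
```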

Web servers:
------------

A web server is a daemon that sits around listening to a TCP port. When it
gets an HTTP request, it parses that request and gives a response (which
we've already outlined). Of course, it needs to figure out where to
find whatever the request asked for. In the old days, this was usually just
a file; the server would find the file, and after the HTTP response header,
just spit the file at the connection.
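
The old-days behavior fits in a few lines. What follows is a bare-bones
sketch, not how Apache or any real server does it: respond and serve are
names made up for these notes, index.html as the default document is an
assumption, every file is labeled text/html, and only GET is handled.

```python
import os
import socket
import tempfile


def respond(request, docroot):
    """Build the full response bytes for one raw request (static files only)."""
    request_line = request.split(b"\r\n", 1)[0].decode("iso-8859-1")
    fields = request_line.split()
    if len(fields) != 3 or fields[0] != "GET":
        return b"HTTP/1.0 400 Bad Request\r\n\r\n"
    path = fields[1].split("?", 1)[0]
    if path.endswith("/"):
        path += "index.html"               # assumed default document name
    root = os.path.realpath(docroot)
    full = os.path.realpath(os.path.join(root, path.lstrip("/")))
    # Refuse anything that escapes the document root (e.g. "/../etc/passwd").
    if os.path.commonpath([root, full]) != root or not os.path.isfile(full):
        return b"HTTP/1.0 404 Not Found\r\n\r\n"
    with open(full, "rb") as f:
        body = f.read()
    head = (
        "HTTP/1.0 200 OK\r\n"
        "Content-Type: text/html\r\n"      # a real server would pick this per file
        f"Content-Length: {len(body)}\r\n"
        "\r\n"
    )
    return head.encode("ascii") + body


def serve(port, docroot):
    """Listen on a TCP port forever, answering one request per connection."""
    with socket.create_server(("", port)) as listener:
        while True:
            conn, _ = listener.accept()
            with conn:
                # One recv is enough for small requests; a real server keeps
                # reading until it has seen the blank line.
                conn.sendall(respond(conn.recv(65536), docroot))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as docroot:
        with open(os.path.join(docroot, "index.html"), "wb") as f:
            f.write(b"<p>hello</p>\n")
        print(respond(b"GET / HTTP/1.0\r\n\r\n", docroot).decode("ascii"))
```

Calling serve(3124, "/some/directory") would let you point a browser at
port 3124 on your machine; both arguments there are just examples.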

These days, the server may have to do more. A lot of content isn't in a
static file; it's a file that the server has to look at, figure out if
there's a program inside, and run that program, sending the program's
output back out the port and eventually to whatever made the request.
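
To give a feel for "a program inside a file," here's a toy sketch. The
<?py ... ?> marker is invented for these notes (it is not a real template
language), and render is a made-up name. Each marked expression gets
evaluated and its result is pasted into the page.

```python
import re

# Matches <?py EXPRESSION ?>; this marker syntax is made up for illustration.
TAG = re.compile(r"<\?py\s+(.*?)\s*\?>", re.DOTALL)


def render(page, variables=None):
    """Replace each <?py EXPR ?> block with the result of evaluating EXPR."""
    scope = dict(variables or {})

    def run(match):
        # eval() is fine for a classroom toy, but never run it on pages
        # that someone you don't trust can edit.
        return str(eval(match.group(1), {}, scope))

    return TAG.sub(run, page)


if __name__ == "__main__":
    print(render("<p>2 + 3 = <?py 2 + 3 ?></p>"))
    print(render("<p>Hello, <?py name.upper() ?>!</p>", {"name": "web"}))
```

The server's job is then to notice which files need this treatment, run
them through something like render, and send the result instead of the raw
file.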

The most popular free server is called Apache (at http://www.apache.org/).

HTML:
-----

HTML is a bunch of tags. This part of the lecture will be all slides that I
ripped off from my advisor.