Cryptography, Security, and Stuff
---------------------------------

So this lecture is gonna be about security-related concerns in the context
of web servers. Specifically, I'm gonna talk about two big things:

1. Cryptography: More or less, the basis of trying to keep your secrets to
   yourself. We'll mostly talk about some basic techniques of how to send
   stuff around on a network without anyone being able to pry.

2. General security concerns: The ideas of how to keep your machine and
   server safe so that no one can get into it.

I used "Applied Cryptography" by Bruce Schneier to prepare part of this
lecture and these notes. Good book.

So getting to cryptography, it seems that ever since people learned to
write, they've been interested in keeping other people from looking at
their stuff. For a long time, this has been the domain of governments,
protecting all sorts of stuff. But as information (and a lot of it, at
that) becomes universally available over a common network, it might be time
to rethink this. The book Applied Cryptography puts it this way in the
preface:

"Do average people really need this kind of security? Yes. They may be
planning a political campaign, discussing taxes, or having an illicit
affair."

Yes, this seems to be the way that everyone who's interested in
cryptography thinks.

Older techniques usually involved substitution and transposition. A cipher
is an algorithm that you pump a message through to make it unreadable to
the casual observer. The Caesar cipher is one of the oldest types of
substitution ciphers. You just shift each letter three places to the right
in the alphabet:

Thisisboring
Wklvlverulqj
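
If you want to play with it, here's a minimal sketch of that shift in
Python. The function name caesar and the choice to leave non-letters alone
are my own; the shift of three is the one from the example above.

```python
import string

def caesar(text, shift=3):
    """Shift each letter `shift` places to the right, wrapping around at z."""
    out = []
    for ch in text:
        if ch in string.ascii_lowercase:
            out.append(chr((ord(ch) - ord('a') + shift) % 26 + ord('a')))
        elif ch in string.ascii_uppercase:
            out.append(chr((ord(ch) - ord('A') + shift) % 26 + ord('A')))
        else:
            out.append(ch)  # leave anything that isn't a letter alone
    return "".join(out)

print(caesar("Thisisboring"))      # Wklvlverulqj
print(caesar("Wklvlverulqj", -3))  # Thisisboring
```

Decrypting is just shifting back the other way, which is part of why this
is so weak: there are only 25 shifts to try.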

Of course, this provides about as much security as a screen door on a
submarine. It hides no patterns. Other substitution ciphers are a little
harder for the average human to break, like a polyalphabetic substitution
cipher, which uses a different simple substitution cipher (like Caesar or
whatever) for each character position. The sequence of ciphers repeats
every N characters, and that N is called the period of the cipher.
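
Here's a rough sketch of one such cipher, assuming each position uses a
Caesar shift picked by a key word (this particular flavor is usually called
a Vigenere cipher). The key "dog" and the function names are made up for
illustration, and it only handles lowercase letters.

```python
def poly_encrypt(text, key):
    """Position i gets Caesar-shifted by key letter number i mod N."""
    out = []
    for i, ch in enumerate(text):
        shift = ord(key[i % len(key)]) - ord('a')
        out.append(chr((ord(ch) - ord('a') + shift) % 26 + ord('a')))
    return "".join(out)

def poly_decrypt(text, key):
    """Same walk through the key, shifting back instead of forward."""
    out = []
    for i, ch in enumerate(text):
        shift = ord(key[i % len(key)]) - ord('a')
        out.append(chr((ord(ch) - ord('a') - shift) % 26 + ord('a')))
    return "".join(out)

key = "dog"  # period N = 3: shifts of 3, 14, and 6
secret = poly_encrypt("thisisboring", key)
print(secret)                     # wvovwyecxlbm
print(poly_decrypt(secret, key))  # thisisboring
```

Notice that the three i's in the message come out as o, w, and l, so the
single-letter patterns are hidden. But every Nth letter still uses the same
shift, and that's the handle an attacker grabs.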

For some inexplicable reason, some software companies think that
polyalphabetic substitution ciphers are reasonably secure, and sell them
to unsuspecting consumers. Of course, you can break those codes pretty
quickly, especially with the help of a computer.

Transposition ciphers switch the order of letters around. In the simple
columnar transposition cipher, you arrange the letters in a matrix:

  thisisa
  verybor
  ingmess
  assgeas
  youcans
  ee

Then you read the letters off vertically:

tviayehensoeirgsusymgcibeeasosanarsss
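
A quick sketch of the same thing in Python, assuming the matrix is filled
row by row and the short last row is simply left ragged. The function names
are mine, and the width of 7 matches the matrix above.

```python
def columnar_encrypt(text, width):
    """Write text in rows of `width` letters, then read it off column by column."""
    rows = [text[i:i + width] for i in range(0, len(text), width)]
    return "".join(row[col]
                   for col in range(width)
                   for row in rows
                   if col < len(row))

def columnar_decrypt(cipher, width):
    """Undo columnar_encrypt by cutting the columns back out and reading rows."""
    full_rows, extra = divmod(len(cipher), width)
    # The first `extra` columns are one letter taller than the rest.
    heights = [full_rows + (1 if col < extra else 0) for col in range(width)]
    columns, pos = [], 0
    for h in heights:
        columns.append(cipher[pos:pos + h])
        pos += h
    n_rows = full_rows + (1 if extra else 0)
    return "".join(columns[col][row]
                   for row in range(n_rows)
                   for col in range(width)
                   if row < len(columns[col]))

plain = "thisisaveryboringmessassgeasyoucansee"
secret = columnar_encrypt(plain, 7)
print(secret)
print(columnar_decrypt(secret, 7))
```

The only secret here is the width, so an attacker can just try every
width until something readable falls out.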

These things aren't particularly hard to break. Sometimes they're combined
with substitution.

So far we've covered stuff used up to World War I. Then people got the idea
that machines might help them. Rotor machines were the most common. The
idea here was that you had a bunch of rotor wheels, and on each wheel there
was a simple substitution cipher. The rotors would be wired in series, and
the output of one would be the input to the next.

Normally this would just be a plain old substitution cipher with no extra
frills, but the rotor machines also did one extra thing: after each
substitution, the rotors would shift positions, so the next time, the
substitution was different.
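
To make the stepping idea concrete, here's a toy rotor machine. This is
only a sketch under my own assumptions: the wirings are random scrambles
picked by a seed, the rotors step like an odometer, and it handles
lowercase letters only. It is not a model of any real machine.

```python
import random
import string

ALPHABET = string.ascii_lowercase

def make_rotors(count, seed):
    """Build `count` scrambled alphabets; the seed stands in for the wiring."""
    rng = random.Random(seed)
    return ["".join(rng.sample(ALPHABET, 26)) for _ in range(count)]

def step(offsets):
    """Advance the rotors like an odometer: the first one always moves,
    and each later one moves only when the one before it wraps around."""
    for i in range(len(offsets)):
        offsets[i] = (offsets[i] + 1) % 26
        if offsets[i] != 0:
            break

def run(text, rotors, decrypt=False):
    offsets = [0] * len(rotors)
    out = []
    for ch in text:
        x = ALPHABET.index(ch)
        if not decrypt:
            # Output of one rotor is the input to the next.
            for wiring, off in zip(rotors, offsets):
                x = ALPHABET.index(wiring[(x + off) % 26])
        else:
            # Go back through the rotors in the opposite order.
            for wiring, off in zip(reversed(rotors), reversed(offsets)):
                x = (wiring.index(ALPHABET[x]) - off) % 26
        out.append(ALPHABET[x])
        step(offsets)  # shift the rotors so the next substitution differs
    return "".join(out)

rotors = make_rotors(3, seed=1)
secret = run("aaaaaa", rotors)
print(secret)
print(run(secret, rotors, decrypt=True))  # aaaaaa
```

Feeding in the same letter six times gives six different letters out,
which is exactly what a plain substitution cipher can't do.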

The German Enigma was such a rotor machine. It used three rotors and you
could choose the three out of five. There was a little transposition going
on, and each rotor operated on each letter twice. This was a nasty little
machine, but through espionage and a lot of hard work by cryptographers,
the Polish and British were able to break it. Complicating matters, the
Germans changed their machine during the war. There was one period
later in the war where the British lost the code for two months. During
those two months, the British sank virtually no German U-Boats. When the
British broke the new code, they resumed their previous schedule of sinking
dozens per month. One wonders if the Germans had figured out that the
British were reading their messages at this point - it would seem kind of
obvious, but the war was at a somewhat late stage at this point and the
Germans were already getting desperate. Since most of the relevant people
are dead now, it's unlikely that we'll ever know.

With all this talk about breaking these techniques, one might wonder if
there is actually anything that is in fact unbreakable. There is. It's
called a one-time pad.

The idea is that you have a pad of paper, and each sheet has one random
letter or number on it. You take your message, and for each letter in your
message, you add the next one on your pad. Then you destroy the pad.

Your recipient has the same pad. They use the same letters to decrypt the
message. And then they destroy their pad. If properly carried out, there is
no way you can go from the encrypted message to the decoded message with no
further information. But it's difficult:

- The numbers must be truly random, not pseudo-random like a computer would
  generate. This is hard, to say the least. Pseudo-random numbers have a
  period, kind of like the period of a cipher.
- Only the sender and recipient may have access to the pad.
- Sender and recipient must be on the same pad pages or all bets are off.
- The pad must be at least as long as the message, and no part of it may
  ever be reused.
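
A minimal sketch of the pad arithmetic, assuming lowercase letters and
addition mod 26. The function names are made up. Note that the pad here
comes from Python's secrets module, which is the operating system's random
source; treat that as a stand-in, since whether it counts as "truly random"
is exactly the hard part from the list above.

```python
import secrets
import string

ALPHABET = string.ascii_lowercase

def make_pad(length):
    """One random letter per 'sheet' of the pad."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))

def otp_encrypt(message, pad):
    if len(pad) < len(message):
        raise ValueError("pad must be at least as long as the message")
    # Add each message letter to the next pad letter, wrapping at z.
    return "".join(ALPHABET[(ALPHABET.index(m) + ALPHABET.index(p)) % 26]
                   for m, p in zip(message, pad))

def otp_decrypt(cipher, pad):
    # Subtract the same pad letters to get the message back.
    return "".join(ALPHABET[(ALPHABET.index(c) - ALPHABET.index(p)) % 26]
                   for c, p in zip(cipher, pad))

message = "attackatdawn"
pad = make_pad(len(message))
secret = otp_encrypt(message, pad)
print(secret)
print(otp_decrypt(secret, pad))  # attackatdawn
```

Since every pad is equally likely, a given ciphertext could decrypt to any
message of that length, and that's where the security comes from.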

We do know that some Soviet agents used one-time pads, and that they're
rumored to be used elsewhere. But aside from going back in time and making
a copy of the pads, there's no way to figure out what the actual message is
with only the encrypted message.

So, okay, what is DES, you might ask? And why are some people so suspicious
of the NSA?

DES (Data Encryption Standard) was the first public encryption standard for
computers. Back in the 1970s, NBS (the National Bureau of Standards, which
has since become NIST, the National Institute of Standards and Technology)
decided that businesses and individuals needed a common encryption
standard, and that it ought to be pretty secure. They adopted a proposal
from IBM, who decided that they would give away this standard.

In addition, there were a whole bunch of conditions:

- Algorithm had to be completely public. Anyone should be able to write it.
- Worked with keys; the keys should provide security.
- Had to be flexible, multi-purpose.
- Cheap.
- Someone had to validate that it was indeed secure.
- Exportable.

Well, NBS (NIST) asked the NSA (National Security Agency) for help on that
"validation" part for the specification. So IBM shipped the thing off to NSA
and waited.

The algorithm had these so-called "S-Boxes" (Substitution boxes, something
along the same lines as a rotor but much more complex) that had fixed
numbers. But when NSA shipped it back to IBM, they had changed the S-Boxes,
and declared that the algorithm was pretty secure. What happened?

People became very suspicious of the NSA. They wondered if the NSA hadn't
built in some sort of trap door with those S-Boxes. IBM checked out the new
S-Boxes and they passed all of IBM's tests, and NBS/NIST certified the
algorithm. After all, the NSA basically has the world's best
cryptographers. For years people have been working on ways to break DES,
and brute force has remained really the only effective way to do it
against the 56-bit keys that the NSA certified. But in 1990, Biham and
Shamir figured out a new way to attack DES and other ciphers, a technique
called differential cryptanalysis.

Without going into details, this worked great in theory, but for some
reason, DES was extremely resistant to it. How was this possible? This
technique was remarkably effective at cracking other, older algorithms. The
reason was that the NSA had already figured out differential cryptanalysis
maybe 20 years before Biham and Shamir did. Remember from above: the
NSA has the world's best cryptographers. So the NSA modified the
original S-Boxes to make them more resistant to differential cryptanalysis.

One might wonder if the NSA knows of more powerful techniques of attacking
DES. Well, they won't tell us--just like they won't tell about all of their
homegrown ciphers. It's probably reasonable to say that they didn't know of
anything particularly useful back in 1974 when the original design was
made, because they did certify based on the knowledge of differential
cryptanalysis. But it's been a long time since then.

The NSA does not normally talk about algorithms. They are in the business
of secrets. So why did they certify DES? The most plausible story is that
the NSA didn't know that NBS/NIST intended to make the algorithm public.

There has not been a repeat performance by the NSA. This kind of leaves us
in the lurch, because we don't know if the NSA has ways to attack the other
public algorithms out there like RSA Public-Key Encryption and RC4.

Public-Key Encryption
---------------------
Diffie and Hellman proposed public-key encryption, a new method
altogether, in 1976. NSA says that they thought of it in 1966 but we
don't know if they're telling the truth. The idea is that you have a
public key and a private key. You keep your private key to yourself and
publish your public key.

If someone wants to send you a message, they encrypt with your public key,
and you can read it with your private key. But given the encrypted message
and the public key, you can't get to the decoded message.

Public-key methods are really a lot slower than other techniques, and as a
result, they aren't used for big transmissions of messages. Instead,
they're used to encrypt the keys of other, faster encryption techniques,
like RC4. In fact, this is what you'll find on the web.
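
Here's a toy sketch of that split: the slow public-key step only touches a
short random session key, and the fast cipher (RC4, since that's the one
named above) handles the actual message. The RSA numbers N, E, and D are
made up and absurdly small, and wrapping the key one byte at a time is a
shortcut for illustration, not how anyone does it for real.

```python
import secrets

def rc4(key, data):
    """RC4 stream cipher; the same call both encrypts and decrypts."""
    S = list(range(256))
    j = 0
    for i in range(256):  # key scheduling
        j = (j + S[i] + key[i % len(key)]) % 256
        S[i], S[j] = S[j], S[i]
    i = j = 0
    out = bytearray()
    for byte in data:     # XOR each byte with the next keystream byte
        i = (i + 1) % 256
        j = (j + S[i]) % 256
        S[i], S[j] = S[j], S[i]
        out.append(byte ^ S[(S[i] + S[j]) % 256])
    return bytes(out)

# Hypothetical toy RSA key pair for the recipient (far too small for real use).
N, E, D = 3233, 17, 2753

def wrap_key(session_key):
    """Slow step: encrypt each byte of the session key with the public key."""
    return [pow(b, E, N) for b in session_key]

def unwrap_key(wrapped):
    """The recipient gets the session key back with the private key."""
    return bytes(pow(c, D, N) for c in wrapped)

# Sender: pick a fresh session key, wrap it, and encrypt the bulk data.
session_key = secrets.token_bytes(16)
message = b"a long message goes here " * 100
wrapped = wrap_key(session_key)
ciphertext = rc4(session_key, message)

# Recipient: unwrap the session key, then decrypt the bulk data.
recovered = unwrap_key(wrapped)
assert rc4(recovered, ciphertext) == message
```

The public-key math runs 16 times no matter how long the message is.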

In RSA (Rivest, Shamir, Adleman) public-key encryption, the public key is a
combination of the product n of two really big prime numbers p and q, and a
number e which is relatively prime to (p - 1)*(q - 1). The private key is a
number d chosen so that e*d leaves a remainder of 1 when divided by
(p - 1)*(q - 1). You DO NOT let p and q be known. In fact, you can throw
them away once you have the public and private keys. But since computing
the private key requires knowing what the two primes are, anyone who can
factor n can get the private key. So that's why people are so interested in
factoring really big numbers that are the products of big primes. Hard
problem, as it turns out.
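
A worked toy example may help. The primes 61 and 53 and the exponent 17 are
numbers I picked for illustration; real keys use primes hundreds of digits
long. The three-argument pow with -1 needs Python 3.8 or later.

```python
from math import gcd

p, q = 61, 53             # the two secret primes (tiny, for illustration)
n = p * q                 # 3233, half of the public key
phi = (p - 1) * (q - 1)   # 3120
e = 17                    # other half of the public key
assert gcd(e, phi) == 1   # e must be relatively prime to (p - 1)*(q - 1)
d = pow(e, -1, phi)       # 2753, the private key: e*d leaves remainder 1 mod phi

def encrypt(m):
    """Anyone can do this with the public key (n, e)."""
    return pow(m, e, n)

def decrypt(c):
    """Only the holder of d can undo it."""
    return pow(c, d, n)

message = 65              # messages are numbers smaller than n
cipher = encrypt(message)
print(cipher)             # 2790
print(decrypt(cipher))    # 65
```

With numbers this small anyone can factor 3233 back into 61 and 53 by hand
and recompute d, which is the attack described above in miniature.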

Digital Signatures and Certificates
-----------------------------------
There's a neat thing you can do with RSA and some other kinds of public-key
methods. You can actually encrypt with the private key instead of the
public key, and decrypt with the public key.

This would be good for verifying the authenticity of some remote
person--assuming that you really know that their public key is theirs and
not some imposter's. This is (more or less) how certificates work on the
web:

1. You send your public key to a certificate authority (CA).
2. They use their private key to sign it (using MD5 hashing, or whatever),
   making it into a certificate.
3. You send your certificate out when someone makes a connection.
4. That someone uses the CA's public key (from the CA's own certificate) to
   verify that your certificate really does belong to your site.
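
The four steps above can be sketched with toy numbers. Everything here is
an assumption for illustration: the CA's key (CA_N, CA_E, CA_D) is a tiny
made-up RSA key, the hash is SHA-256 squeezed down to fit that key, and the
"certificate" is just a dictionary rather than a real certificate format.

```python
import hashlib

# Hypothetical toy RSA key for the CA (far too small for real use).
CA_N, CA_E, CA_D = 3233, 17, 2753

def digest(data):
    """Hash the data and squeeze it into the toy key's range."""
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % CA_N

def ca_sign(site_name, site_public_key):
    """Step 2: the CA 'encrypts' the hash with its private key."""
    body = f"{site_name}|{site_public_key}".encode()
    return {"name": site_name, "key": site_public_key,
            "signature": pow(digest(body), CA_D, CA_N)}

def verify(cert):
    """Step 4: anyone holding the CA's public key can check the signature."""
    body = f"{cert['name']}|{cert['key']}".encode()
    return pow(cert["signature"], CA_E, CA_N) == digest(body)

# Step 1: the site hands over its name and public key (made-up values).
cert = ca_sign("www.example.com", (2537, 13))
# Step 3 would be sending `cert` to whoever connects.
print(verify(cert))  # True
```

Only the CA's private number CA_D is needed to make a certificate, and only
the public pair (CA_N, CA_E) is needed to check one.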

If you click on that little lock in Netscape, you get all sorts of
information. You can look at the CAs. Notice that a certificate from a CA
has an expiry time.

These certificates are hardcoded into Netscape. It doesn't need to go out
on the network to find them (and even if it did, how could it trust the
network?). But therefore, after some time, like 2010 for this one for
example, you'll all need to go out and get new browsers with new CAs in them
or Netscape will really give you some nasty messages.

Because they're hardcoded into Netscape, this causes some interesting
problems. First, no one else can be a CA other than the ones currently in
most web browsers. Because they were the first on the scene, this
essentially means this one company, Verisign, is far and away the
biggest CA. This is more or less unfair to any upstart competition--oh
well!

Also, what if someone steals your private key, or what if a site with a
certificate goes rogue? Well, if someone steals your private key, that
means that they'll be able to read the messages bound for that particular
server _only_ -- you can't move the certificate from machine to machine
because the certificate has the server name in it. So if someone steals
your private key, then you can't do anything about anything sent in the
past that they may have snooped, but if you know about it, you can stop
using your certificate and get a new one.

If a site goes "rogue" and really starts doing illegal things with encrypted
data, then the unsuspecting user only has two defenses. The first is with
the CA. Part of the CA's job is to do a background check on the site,
including finding out who is responsible. If they can provide the cops with
this information, then that's good (and also, it'd be really hard to
put up a fixed hostname as specified in a certificate and not be able to
trace where it comes from). But also, certificates don't last very long;
after a year, the certificate expires.

Now, of course, we'd be REALLY screwed if someone stole a CA's private key
that they use to make the certificates.

Netscape SSL in a few short words, then:
----------------------------------------
The CA system is part of SSL, the system that we have Netscape
Communications Corporation to thank for. SSL is the Secure Sockets Layer,
and is supposed to work like normal sockets, except encrypted.

When you first initiate an SSL connection, you first verify the certificate
of the remote site. The remote site can verify you in a similar manner if
you have a certificate of your own, though usually it doesn't bother. And
then through a lot of bickering and hoohah, you agree on a session ID and
agree on some keys (in a thoroughly complicated manner which I won't
describe) and a cipher to use for that session (like RC4, 3DES, or
whatever), and then you start talking with that cipher.
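
If you want to watch a handshake from the client side, here's a sketch
using Python's standard ssl module. Assumptions to flag: the helper name
describe_connection is made up, you'd have to supply a host name yourself,
and a current Python will negotiate TLS (the successor to SSL) with modern
ciphers rather than the ones listed above.

```python
import socket
import ssl

def describe_connection(host, port=443):
    """Do the handshake with `host` and report what was agreed on."""
    # Loads the CA certificates that the system trusts.
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=10) as raw:
        # The handshake happens inside wrap_socket; it raises
        # ssl.SSLCertVerificationError if the certificate doesn't check out.
        with context.wrap_socket(raw, server_hostname=host) as tls:
            name, protocol, bits = tls.cipher()
            return {"protocol": protocol, "cipher": name, "key_bits": bits,
                    "expires": tls.getpeercert()["notAfter"]}

# The default context insists on a valid certificate for the right host name.
context = ssl.create_default_context()
print(context.verify_mode == ssl.CERT_REQUIRED)  # True
print(context.check_hostname)                    # True
```

Calling describe_connection with a real host name needs a network
connection, so it's left uncalled here.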

When you reconnect to the server, you can use the same session ID and then
both client and server can resume talking with the cipher in the state
where it once was without all of the key exchange and stuff.


How to set up a server:
Look at http://www.modssl.org/example/. The Apache and mod_ssl sources are
in my home directory (/home/bri); extract them into /tmp/you, where "you"
is your login name.

$ ./configure --prefix=$HOME/cs552/apache-modssl --with-apache=../apache_1.3.19 --with-ssl=/opt/openssl/openssl-0.9.6

Then wait. Then do as the instructions tell you, with this one extra step.

$ cd ../apache_1.3.19
$ SSL_BASE=/opt/openssl/openssl-0.9.6 ./configure --prefix=$HOME/cs552/apache-modssl --enable-module=so --enable-module=ssl
$ make
$ make certificate
 (fill out the stuff, make SURE that you get the server name right.)
$ make install

Note that this syntax works for bash, and probably not tcsh. If your shell
is tcsh, run bash before doing it. (tcsh sucks.)

Then you can start it with $HOME/cs552/apache-modssl/bin/apachectl startssl
or something.

But first edit the httpd.conf configuration file. Do everything that you did
for the first homework. Change the 8443 to a port other than the one you use
for your regular server. Eliminate all lines with references to port 8080. 

Then you can access your secure server at

    https://mymachine.cs.uchicago.edu:myport/

The "https" is essential.