Cryptography, Security, and Stuff
---------------------------------

So this lecture is gonna be about security-related concerns in the context of web servers. Specifically, I'm gonna talk about two big things:

1. Cryptography: More or less, the basis of trying to keep your secrets to yourself. We'll mostly talk about some basic techniques for sending stuff around on a network without anyone being able to pry.

2. General security concerns: The ideas of how to keep your machine and server safe so that no one can get into them.

I used "Applied Cryptography" by Bruce Schneier to prepare part of this lecture and these notes. Good book.

So getting to cryptography, it seems that ever since people learned to write, they've been interested in keeping other people from looking at their stuff. For a long time, this was the domain of governments, protecting all sorts of stuff. But as information (and a lot of it, at that) becomes universally available over a common network, it might be time to rethink this. The book Applied Cryptography puts it this way in the preface: "Do average people really need this kind of security? Yes. They may be planning a political campaign, discussing taxes, or having an illicit affair." Yes, this seems to be the way that everyone who's interested in cryptography thinks.

Older techniques usually involved substitution and transposition. A cipher is an algorithm that you pump a message through to make it unreadable to the casual observer. The Caesar Cipher is one of the oldest types of substitution ciphers. You just shift each letter three places to the right in the alphabet:

    Thisisboring
    Wklvlverulqj

Of course, this provides about as much security as a screen door on a submarine. It hides no patterns. Other substitution ciphers are a little harder for the average human to break, like a polyalphabetic substitution cipher, which uses a different simple substitution cipher (like Caesar or whatever) for each character position.
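Here's a rough sketch of both ideas in Python, just to make them concrete. The function names and the shift list [3, 1, 4] are made up for illustration; the only thing taken from the notes is the shift-by-three example. The polyalphabetic version simply cycles through a list of Caesar shifts, one per character position.

```python
def shift_char(ch, shift):
    """Shift one ASCII letter by `shift` places, wrapping around the alphabet."""
    if ch.isascii() and ch.isalpha():
        base = ord("A") if ch.isupper() else ord("a")
        return chr((ord(ch) - base + shift) % 26 + base)
    return ch  # leave anything that isn't a letter alone


def caesar(text, shift=3):
    """Simple substitution: the same shift for every letter."""
    return "".join(shift_char(c, shift) for c in text)


def polyalphabetic(text, shifts):
    """Use a different Caesar shift for each position, cycling through `shifts`."""
    return "".join(
        shift_char(c, shifts[i % len(shifts)]) for i, c in enumerate(text)
    )


plain = "Thisisboring"
caesar_ct = caesar(plain, 3)
poly_ct = polyalphabetic(plain, [3, 1, 4])  # hypothetical key, three shifts long

print(caesar_ct)                 # Wklvlverulqj
print(poly_ct)
print(caesar(caesar_ct, -3))     # shifting back recovers the message
```

Decrypting is the same operation with the shifts negated, which is part of why these ciphers are so easy to attack once you guess how many shifts are in the list.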
The sequence of ciphers in a polyalphabetic cipher repeats every N characters, and that N is called the period of the cipher. For some inexplicable reason, some software companies think that polyalphabetic substitution ciphers are reasonably secure, and sell them to unsuspecting consumers. Of course, you can break those codes pretty quickly, especially with the help of a computer.

Transposition ciphers switch the order of letters around. In the simple columnar transposition cipher, you arrange the letters in a matrix:

    thisisa
    verybor
    ingmess
    assgeas
    youcans
    ee

Then you read the letters off vertically, column by column:

    tviayehensoeirgsusymgcibeeasosanarsss

These things aren't particularly hard to break. Sometimes they're combined with substitution.

So far we've covered stuff used up to World War I. Then people got the idea that machines might help them. Rotor machines were the most common. The idea here was that you had a bunch of rotor wheels, and on each wheel there was a simple substitution cipher. The rotors would be wired in series, and the output of one would be the input to the next. Normally this would just be a plain old substitution cipher with no extra frills, but the rotor machines also did one extra thing: after each substitution, the rotors would shift positions, so the next time, the substitution was different.

The German Enigma was such a rotor machine. It used three rotors, and you could choose the three out of five. There was also a plugboard that slightly permuted the letters, and a reflector that made each rotor operate on each letter twice. This was a nasty little machine, but through espionage and a lot of hard work by cryptographers, the Polish and British were able to break it.

Frustrating matters was that the Germans changed their machine during the war. There was one period later in the war where the British lost the code for two months. During those two months, the British sank virtually no German U-Boats. When the British broke the new code, they resumed their previous schedule of sinking dozens per month.
One wonders if the Germans had figured out that the British were reading their messages at this point. It would seem kind of obvious, but the war was at a somewhat late stage by then and the Germans were already getting desperate. Since most of the relevant people are dead now, it's unlikely that we'll ever know.

With all this talk about breaking these techniques, one might wonder if there is actually anything that is in fact unbreakable. There is. It's called a one-time pad. The idea is that you have a pad of paper, and each sheet has one random letter or number on it. You take your message, and for each letter in your message, you add the next one on your pad (wrapping around the alphabet). Then you destroy the pad. Your recipient has the same pad. They use the same letters to decrypt the message. And then they destroy their pad.

If properly carried out, there is no way you can go from the encrypted message to the decoded message with no further information. But it's difficult:

- The numbers must be truly random, not pseudo-random like a computer would generate. This is hard, to say the least. Pseudo-random numbers have a period, kind of like the period of a cipher.
- Only the sender and recipient may have access to the pad.
- Sender and recipient must be on the same pad pages or all bets are off.
- The pad has to be as long as the message.

We do know that some Soviet agents used one-time pads, and they're rumored to be used elsewhere. But aside from going back in time and making a copy of the pads, there's no way to figure out what the actual message is with only the encrypted message.

So, okay, what is DES, you might ask? And why are some people so suspicious of the NSA?

DES (Data Encryption Standard) was the first public encryption standard for computers. Back in the 1970s, NBS (the National Bureau of Standards, now NIST, the National Institute of Standards and Technology) decided that businesses and individuals needed to have a common encryption standard, and that it ought to be pretty secure.
NBS/NIST adopted a proposal from IBM, who decided that they would give away the algorithm. In addition, there were a whole bunch of conditions:

- Algorithm had to be completely public. Anyone should be able to write it.
- Worked with keys; the keys should provide the security.
- Had to be flexible, multi-purpose.
- Cheap.
- Someone had to validate that it was indeed secure.
- Exportable.

Well, NBS (NIST) asked the NSA (National Security Agency) for help on that "validation" part for the specification. So IBM shipped the thing off to NSA and waited. The algorithm had these so-called "S-Boxes" (Substitution boxes, something along the same lines as a rotor but much more complex) that were filled with fixed numbers. But when NSA shipped it back to IBM, they had changed the S-Boxes, and declared that the algorithm was pretty secure.

What happened? People became very suspicious of the NSA. They wondered if the NSA hadn't built in some sort of trap door with those S-Boxes. IBM checked out the new S-Boxes and they passed all of IBM's tests. But NBS/NIST certified it--NSA basically has the world's best cryptographers--and for years people have been working on ways to break DES. Brute-force attacks have become really the only effective way to do it with the 56-bit keys that the NSA certified.

But in 1990, Biham and Shamir figured out a new way to attack DES and other ciphers, a technique called differential cryptanalysis. Without going into details, this worked great in theory, but for some reason, DES was extremely resistant to it. How was this possible? This technique was remarkably effective at cracking other, older algorithms.

The reason was that the NSA had already figured out differential cryptanalysis maybe 20 years before Biham and Shamir did. Remember, from above: the NSA has the world's best cryptographers. Therefore, the NSA modified the original S-Boxes to make them more resistant to differential cryptanalysis.
One might wonder if the NSA knows of more powerful techniques of attacking DES. Well, they won't tell us--just like they won't tell us about all of their homegrown ciphers. It's probably reasonable to say that they didn't know of anything particularly useful back in 1974 when the original design was made, because they did certify it based on their knowledge of differential cryptanalysis. But it's been a long time since then.

The NSA does not normally talk about algorithms. They are in the business of secrets. So why did they certify DES? The most plausible story is that the NSA didn't know that NBS/NIST intended to make the algorithm public. There has not been a repeat performance by the NSA. This kind of leaves us in the lurch, because we don't know if the NSA has ways to attack the other public algorithms out there, like RSA Public-Key Encryption and RC4.

Public-Key Encryption
---------------------

Diffie and Hellman proposed public-key encryption, a new method altogether, in 1976. NSA says that they thought of it in 1966, but we don't know if they're telling the truth.

The idea is that you have a public key and a private key. You keep your private key to yourself and publish your public key. If someone wants to send you a message, they encrypt it with your public key, and you can read it with your private key. But given only the encrypted message and the public key, you can't get to the decoded message.

Public-key methods are really a lot slower than other techniques, and as a result, they aren't used for big transmissions of messages. Instead, they're used to encrypt the keys of other, faster encryption techniques, like RC4. In fact, this is what you'll find on the web.

In RSA (Rivest, Shamir, Adleman) public-key encryption, the public key is a combination of the product n of two really big prime numbers p and q, and a number e which is relatively prime to (p - 1)*(q - 1). The private key is derived from (p - 1)*(q - 1) and e: it's the number d such that e*d leaves a remainder of 1 when divided by (p - 1)*(q - 1). You do NOT let p and q become known.
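To make the arithmetic concrete, here's a toy sketch in Python with absurdly small primes. The values p = 61, q = 53, e = 17 and the function names are illustration choices only, not anything from a real key or a real library. Real RSA uses primes hundreds of digits long plus padding schemes, so treat this as a sketch of the math and nothing more.

```python
from math import gcd


def make_keys(p, q, e):
    """Build a toy RSA key pair from two primes p, q and a public exponent e."""
    n = p * q
    phi = (p - 1) * (q - 1)
    if gcd(e, phi) != 1:
        raise ValueError("e must be relatively prime to (p - 1)*(q - 1)")
    # d is the inverse of e modulo phi (three-argument pow with -1
    # needs Python 3.8 or later).
    d = pow(e, -1, phi)
    return (n, e), (n, d)


def encrypt(m, public_key):
    """Encrypt an integer m (must be smaller than n) with the public key."""
    n, e = public_key
    return pow(m, e, n)


def decrypt(c, private_key):
    """Decrypt an integer c with the private key."""
    n, d = private_key
    return pow(c, d, n)


# Toy values, assumed for illustration only.
public_key, private_key = make_keys(61, 53, 17)
message = 65
ciphertext = encrypt(message, public_key)
recovered = decrypt(ciphertext, private_key)

print("public key (n, e): ", public_key)
print("private key (n, d):", private_key)
print("ciphertext:", ciphertext)
print("recovered: ", recovered)
```

Notice that encrypt and decrypt only ever touch n, e, and d. The primes p and q are needed just once, to compute d.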
In fact, you can throw p and q away once you have the public and private keys. But since computing the private key requires knowing what the two primes are, if you can factor n, you have the private key. So that's why people are so interested in factoring really big numbers that are the products of big primes. Hard problem, as it turns out.

Digital Signatures and Certificates
-----------------------------------

There's a neat thing you can do with RSA and some other kinds of public-key methods. You can actually encrypt with the private key instead of the public key, and decrypt with the public key. This is good for verifying the authenticity of some remote person--assuming that you really know that their public key is theirs and not some impostor's.

This is (more or less) how certificates work on the web:

1. You send your public key to a certificate authority (CA).
2. They use their private key to sign it (over a hash of it made with MD5, or whatever), making it into a certificate.
3. You send your certificate out when someone makes a connection.
4. That someone uses the CA's public key (in the CA's own certificate) to verify that the certificate really does belong to your site.

If you click on that little lock in Netscape, you get all sorts of information. You can look at the CAs. Notice that a certificate from a CA has an expiry time. These certificates are hardcoded into Netscape. It doesn't need to go out on the network to find them (and even if it did, how could it trust the network?). But therefore, after some time, like 2010 for this one for example, you'll all need to go out and get new browsers with new CAs in them, or Netscape will really give you some nasty messages.

Because they're hardcoded into Netscape, this causes some interesting problems. First, no one else can be a CA other than the ones currently in most web browsers. Because they were the first on the scene, this essentially means this one company, Verisign, is by far the biggest CA.
This is more or less unfair to any upstart competition--oh well!

Also, what if someone steals your private key, or what if a site with a certificate goes rogue?

Well, if someone steals your private key, that means that they'll be able to read the messages bound for that particular server _only_ -- you can't move the certificate from machine to machine because the certificate has the server name in it. So if someone steals your private key, you can't do anything about whatever they may have snooped in the past, but if you know about it, you can stop using your certificate and get a new one.

If a site goes "rogue" and really starts doing illegal things with encrypted data, then the unsuspecting user has only two defenses. The first is the CA. Part of the CA's job is to do a background check on the site, including finding out who is responsible. If they can provide the cops with this information, then that's good (and also, it'd be really hard to put up a fixed hostname as specified in a certificate and not be traceable). The second is that certificates don't last very long; after a year, the certificate expires.

Now, of course, we'd be REALLY screwed if someone stole the private key that a CA uses to make its certificates.

Netscape SSL in a few short words, then:
----------------------------------------

The CA system is part of SSL, the system that we have Netscape Communications Corporation to thank for. SSL is the Secure Sockets Layer, and is supposed to work like normal sockets, except encrypted.

When you initiate an SSL connection, you first verify the certificate of the remote site. The remote site can verify you in a similar manner, if it asks you for a certificate of your own. And then through a lot of bickering and hoohah, you agree on a session ID and agree on some keys (in a thoroughly complicated manner which I won't describe) and a cipher to use for that session (like RC4, 3DES, or whatever), and then you start talking with that cipher.
When you reconnect to the server, you can use the same session ID, and then both client and server can resume talking with the cipher in the state where it left off, without all of the key exchange and stuff.

How to set up a server:

Look at http://www.modssl.org/example/. The apache and modssl source are in my home directory (/home/bri); extract them into /tmp/you where "you" is your login name.

$ ./configure --prefix=$HOME/cs552/apache-modssl --with-apache=../apache_1.3.19 --with-ssl=/opt/openssl/openssl-0.9.6

Then wait. Then do as the instructions tell you, with this one extra step.

$ cd ../apache_1.3.19
$ SSL_BASE=/opt/openssl/openssl-0.9.6 ./configure --prefix=$HOME/cs552/apache-modssl --enable-module=so --enable-module=ssl
$ make
$ make certificate
  (fill out the stuff, make SURE that you get the server name right.)
$ make install

Note that this syntax works for bash, and probably not tcsh. If your shell is tcsh, run bash before doing it. (tcsh sucks.)

Then you can start it with

$ $HOME/cs552/apache-modssl/bin/apachectl startssl

or something. But first edit the httpd.conf configuration file. Do everything that you did for the first homework. Change the 8443 to a port other than the one you use for your regular server. Eliminate all lines with references to port 8080.

Then you can access your secure server at

https://mymachine.cs.uchicago.edu:myport/

The "https" is essential.