MPCS_51221-Lab

Lab 2 Details for MPCS 56600

Each lab will consist of a small problem and details of how to proceed. Each lab is intended to give every student hands-on experience with the core concepts and technologies covered during the course. A student may concentrate, as a team member, on one technology over another for the final project, but labs are designed to give each and every student exposure to all the technologies that come into play. You need to submit labs to the TAs for grading--see submission instructions below. Generally, unless otherwise specified, you will have one week to complete each assigned lab.

See the syllabus for information on grading. Turning in lab assignments on time is required, without exception, and all late deliveries will be penalized, regardless of cause. Submit your assignments to the subversion repository according to the directions on the syllabus page.

You may write these solutions in any programming language of your choice. Our suggestion is now is not the time to learn a new programming language along with the concepts themselves. So our suggestion is to use whatever programming language you know best.

Lab 2 Due: 5:00 pm, Thursday, July 5, 2018

Problem 1: Brute Force Cryptanalysis:

BACKGROUND:

Like all programming problems, learning a new technology is not an exercise in reading but rather an exercise in thinking and typing. This lab is designed to give you more hands-on experience in some fundamental skills in cryptography. You will generally find the References section below helpful in addition to the required and recommended reading.

One of the best ways to go about understand how ciphertext is created is by deciphering it. This lab is designed to give you experience in doing precisely that.

ADVICE: READ through the entire lab before beginning work...

WHAT YOU NEED TO DO:

STEP 1:

A brute force approach to deciphering a rotation cipher is straightforward. You simply run through the set of rotation factors [0-26], and print out each result. Then, look at the 26 different results, and find the one that looks like English. Let's encrypt the string "Now is the winter of our discontent made glorious summer". Let's use a rotation of 8. Our encrypted text is thus:

vwe qa bpm eqvbmz wn wcz lqakwvbmvb uilm otwzqwca acuumz

Using a brute force method outlined in class, we would roll through the various possibilities, incrementing the rotation factor from 01 to 26, and derive these results:

01: WXF RB CQN FRWCNA XO XDA MRBLXWCNWC VJMN PUXARXDB BDVVNA
02: XYG SC DRO GSXDOB YP YEB NSCMYXDOXD WKNO QVYBSYEC CEWWOB
03: YZH TD ESP HTYEPC ZQ ZFC OTDNZYEPYE XLOP RWZCTZFD DFXXPC
04: ZAI UE FTQ IUZFQD AR AGD PUEOAZFQZF YMPQ SXADUAGE EGYYQD
05: ABJ VF GUR JVAGRE BS BHE QVFPBAGRAG ZNQR TYBEVBHF FHZZRE
06: BCK WG HVS KWBHSF CT CIF RWGQCBHSBH AORS UZCFWCIG GIAASF
07: CDL XH IWT LXCITG DU DJG SXHRDCITCI BPST VADGXDJH HJBBTG
08: DEM YI JXU MYDJUH EV EKH TYISEDJUDJ CQTU WBEHYEKI IKCCUH
09: EFN ZJ KYV NZEKVI FW FLI UZJTFEKVEK DRUV XCFIZFLJ JLDDVI
10: FGO AK LZW OAFLWJ GX GMJ VAKUGFLWFL ESVW YDGJAGMK KMEEWJ
11: GHP BL MAX PBGMXK HY HNK WBLVHGMXGM FTWX ZEHKBHNL LNFFXK
12: HIQ CM NBY QCHNYL IZ IOL XCMWIHNYHN GUXY AFILCIOM MOGGYL
13: IJR DN OCZ RDIOZM JA JPM YDNXJIOZIO HVYZ BGJMDJPN NPHHZM
14: JKS EO PDA SEJPAN KB KQN ZEOYKJPAJP IWZA CHKNEKQO OQIIAN
15: KLT FP QEB TFKQBO LC LRO AFPZLKQBKQ JXAB DILOFLRP PRJJBO
16: LMU GQ RFC UGLRCP MD MSP BGQAMLRCLR KYBC EJMPGMSQ QSKKCP
17: MNV HR SGD VHMSDQ NE NTQ CHRBNMSDMS LZCD FKNQHNTR RTLLDQ
18: NOW IS THE WINTER OF OUR DISCONTENT MADE GLORIOUS SUMMER
19: OPX JT UIF XJOUFS PG PVS EJTDPOUFOU NBEF HMPSJPVT TVNNFS
20: PQY KU VJG YKPVGT QH QWT FKUEQPVGPV OCFG INQTKQWU UWOOGT
21: QRZ LV WKH ZLQWHU RI RXU GLVFRQWHQW PDGH JORULRXV VXPPHU
22: RSA MW XLI AMRXIV SJ SYV HMWGSRXIRX QEHI KPSVMSYW WYQQIV
23: STB NX YMJ BNSYJW TK TZW INXHTSYJSY RFIJ LQTWNTZX XZRRJW
24: TUC OY ZNK COTZKX UL UAX JOYIUTZKTZ SGJK MRUXOUAY YASSKX
25: UVD PZ AOL DPUALY VM VBY KPZJVUALUA THKL NSVYPVBZ ZBTTLY

If we scan through the output of gobbledygook, notice line number 18. That looks like English! And very fine English, as it's Shakespeare's English. That's basically it as far as a brute force method is concerned on a rotation cipher.

Your first assignment:

We just got in the following cipher text which we believe was encrypted using a Caesarean rotation cipher:

maxzxxlxmatmetbwmaxzhewxgxzzlunmgxoxkvtvdexw

1. Code a program in your programming language of choice (Java/haskell/C#/python/Go/ruby, etc.) and decipher the encrypted text above using a brute force method.
2. One of the 26 lines of output should look like English (without spaces...)
3. At the end of your program, print out the correct rotation factor (you can hard-code this after discovering the correct rotation factor)
4. At the end of the program, print out the plaintext (you can hard-code this after discovering the correct rotation factor)
5. And at the end of the program, print out who said this (you can hard-code this after discovering the correct rotation factor)

Submit your source code for your solution as described below under "Submitting."

Problem 2: Statistical Cryptanalysis:

BACKGROUND:

WHAT YOU NEED TO DO:

STEP 1:

We have received a text document (here) that contains encrypted data we believe were created with a rotation cipher. You will use a letter frequency analysis approach to decrypt this text. But first, keep reading...

You may use the letter frequency tables below (these are drawn from this wikipedia article on this subject):

                                                                                                         Relative frequencies of letters in text                          Relative frequencies ordered by frequency

According to Robert Lewand (Cryptological Mathematics, The Mathematical Association of America, 2000), arranged from most to least common in appearance, the English letters are as follows:

    etaoinshrdlcumwfgypbvkjxqz

Meaning that the English letter 'e' is most common, and the English letter 'z' is least common across multiple inputs.

Using the common letter frequencies above, and ignoring all numbers, spaces and punctuation and other non-alphabetic characters, your job will be to decrypt a message to plain text using statistical letter frequency analysis. Luckily, our ciphertext is long enough to provide sufficient statistical significance required for this type of frequency analysis.

There are a couple of approaches to this problem. One is to create two lists, one, a list of the letter frequencies in our actual encrypted ciphertext data that we are analyzing (which you would need to produce from the encrypted data), let's call this list L_c, and a second list of the standard letter frequencies above (etaoinshrdlcumwfgypbvkjxqz), let's call this L_s, and create a new string that replaces the most common letter in the encrypted data (x) with the corresponding (x) standard order letter, such that:   L_c[x] = L_s[x]_. So for example, assume that our plain text is this (it's not, but assume):

Meet in lobby at ten: "meetinlobbyatten" (we will ignore spaces and punctuation for the time being).

And let's assume an array of 26 lowercase letters [a-z], and let's further assume that this text has been enciphered with a ROT₉ encryption, where each character in the ciphertext is rotated 9 characters in the plaintext modulus 26, yielding the following ciphertext (ignoring case):

    meetinlobbyatten => dvvkzecfssprkkve

Since we know the rotation, we know that the first letter 'd' (ASCII 100d) in the ciphertext is really the letter 'm' (ASCII 100 + 9 = 109 = 'm'), as we see in the first letter of the plaintext. The next letter in the ciphertext (again, ignoring spaces and punctuation) is 'v' (ASCII 118d), which corresponds to the letter 'e' in the plaintext (ASCII (101-118) mod26 [we are assuming an array of 26 lowercase letters a-z]).

If we roll through the ciphertext creating a frequency distribution of just the the letters in the ciphertext, we come up with the following order:

{'v': 3, 'k': 3, 'e': 2, 's': 2, 'd': 1, 'z': 1, 'c': 1, 'f': 1, 'p': 1, 'r': 1}

There is a nice pdf that discusses this type of letter frequency analysis (but on text more complicated than ours here, which is not statistically significant to provide fodder for our statistical analysis algorithm) located here. The author, Steven Gordon, has even written a shell script that will perform multiple types of encryption and decryption, and it is located here.

Download it, save it to a file called "crypt", and make it executable, and play around with it, such as:

1. Perform a brute force attack on our sample ciphertext (notice line 17 of the brute force method):

            for i in `seq 0 25`; do echo -n "$i "; crypto caesar dec dvvkzecfssprkkve $i ; done ;

2. Or, performing the decryption knowing the offset value:

            crypto caesar dec dvvkzecfssprkkve 17

Play around with the tool and read the pdf mentioned above. And then, move on to:

A second, and preferable way to approach to this type of problem is perform a statistical analysis such as the following:

1. Produce the frequencies of the letters in the encrypted text (using again the same simple ciphertext of "dvvkzecfssprkkve", ranked from most frequent to least frequent, along with their counts:

    {'v': 3, 'k': 3, 'e': 2, 's': 2, 'd': 1, 'z': 1, 'c': 1, 'f': 1, 'p': 1, 'r': 1}

2. Find the distance (shift) from the most common letter in the ciphertext (in the case of a tie, just choose the first) to the most common letter in the English Frequency Table, which is as we know, the letter 'e'. In this case, the shift from 'v' to 'e' mod 26 is 9 (note 17 + 9 = 26). This gives us a ROT₉(x) for this ciphertext. (See the table below for the full shift values ROT₉).

3. Create two lists (or combine/zip two lists, a 2-dimensional array, or a map, your call...), one with the normal English alphabet of lowercase letters a-z as in the first row below, let's call that List₁. Next, create a ROT₉ list and call this List₂, as in the second row below. The "match point" in List₂ is at the shift distance 9 (again, the shift from 'v' to 'e' mod 26 is 9), specifically, letter 'v' in List₁ is mapped to the letter 'e' in List₂, and the rest of List₂ is just wrapped around to complete: viz. 'w' is mapped to 'f', 'x' is mapped to 'g', 'y' is mapped to 'h', 'z' is mapped to 'i', and we wrap around to map 'a' to 'j', etc.

Thus we will have two lists that we wish to combine or map:

4. Then simply roll through the ciphertext character by character, and replace each character encountered in the ciphertext with it's corresponding English letter indexed by the position of the ciphertext letter from the plaintext alphabet in List₁.

So let's work through the first four substitutions in the ciphertext. Our simple ciphertext again is "dvvkzecfssprkkve", and the first four letters in the ciphertext are "dvvk". Here's how we would do this:

Using the table above, we look up 'd' in List₁, and find it's index, which is 3. We would then replace the text at ciphertext[0] with the letter at List₂[3], which is 'm'. That's the first letter of our plaintext. Next, look up 'v' in List₁, and find it's index, which is 21. We would then replace the text at ciphertext[1] with the letter at List₂[21], which is 'e'. That's the second letter of our plaintext. We'd do the same thing with the second 'v' in our first for ciphertext letters, replacing the character at ciphertext[2] with the letter at List₂[21], which is 'e'. That's our third letter. Finally, for our four letters, we'd look up 'k' in List₁, and find it's index, which is 10. We would then replace the text at ciphertext[3] with the letter at List₂[10], which is 't'. That's our fourth letter. Our decrypted string so far is "meet". That's pretty good. We'd continue on doing the same thing until we have gone through the entire ciphertext, and our result would be:

"meetinlobbyatten"

Here is a picture of the ROT₉(x) table for our example (P=Plaintext, C=Ciphertext):

By doing this form of statistical substitution, we leave intact any punctuation in the ciphertext, which will carry over into the decrypted plaintext just fine.

Ok, now that we know what we're doing, we got an intercept in this morning, and below is the following ciphertext, and we believe it was enciphered using a caesarean cipher:

"r glivcp gvvi kf gvvi mvijzfe fw vcvtkifezt trjy nflcu rccfn feczev grpdvekj kf sv jvek uzivtkcp wifd fev grikp kf refkyvi nzkyflk xfzex kyiflxy r wzeretzrc zejkzklkzfe. uzxzkrc jzxerklivj gifmzuv grik fw kyv jfclkzfe, slk kyv drze svevwzkj riv cfjk zw r kiljkvu kyziu grikp zj jkzcc ivhlzivu kf givmvek uflscv-jgveuzex. nv gifgfjv r jfclkzfe kf kyv uflscv-jgveuzex gifscvd ljzex r gvvi-kf-gvvi evknfib. kyv evknfib kzdvjkrdgj kirejrtkzfej sp yrjyzex kyvd zekf re fexfzex tyrze fw yrjy-srjvu giffw-fw-nfib, wfidzex r ivtfiu kyrk treefk sv tyrexvu nzkyflk ivufzex kyv giffw-fw-nfib. kyv cfexvjk tyrze efk fecp jvimvj rj giffw fw kyv jvhlvetv fw vmvekj nzkevjjvu, slk giffw kyrk zk trdv wifd kyv crixvjk gffc fw tgl gfnvi. rj cfex rj r drafizkp fw tgl gfnvi zj tfekifccvu sp efuvj kyrk riv efk tffgvirkzex kf rkkrtb kyv evknfib, kyvp'cc xvevirkv kyv cfexvjk tyrze reu flkgrtv rkkrtbvij. kyv evknfib zkjvcw ivhlzivj dzezdrc jkiltkliv. dvjjrxvj riv sifrutrjk fe r svjk vwwfik srjzj, reu efuvj tre cvrmv reu ivafze kyv evknfib rk nzcc, rttvgkzex kyv cfexvjk giffw-fw-nfib tyrze rj giffw fw nyrk yrggvevu nyzcv kyvp nviv xfev."

Using the statistical approach outlined above:

1. Code a program in your programming language of choice (Java/haskell/C#/python/Go/ruby, etc.) and decipher the encrypted text above using the statistical method discussed in this section, and have your program print out the plaintext.
2. Print out who said this and in what work (you can hard-code this after deciphering the plaintext)
3. Print out the rotation factor (you can hard-code this after discovering the rotation factor)
4. Submit your source code for your solution per the instructions below.

Problem 3: Hash Functions:

BACKGROUND:

WHAT YOU NEED TO DO:

STEP 1:

Identify the SHA-256 hash functions in your language's libraries. You can find examples of SHA-256 functions in multiple languages in my pub directory located here: ~mark/pub/56600/src.lecture/lecture.2/sha256/examples/*.

You are to take the following two sets of data:

{
     0000000036dc2ce23cdd934eff4bae120155de8b8712de8489c8870b06e334ff,
     c0ba5ba1c5ba02fb02c4ea93ec16cae7ea1994b5b3f9ef0e293b841081eb3003,
     the current calculated timestamp in seconds in Unix Epoch format (i.e., numeric)
     1.00000,
     5251252
}

{
     00000000d84724559f1691d916b2ed63a44884d495d155197647ce7667116b16,
     69a14e6b050d10d6621faee3dac6682809feb0ffa76320b33c5c09f1059f06c7,
     the current calculated timestamp in seconds in Unix Epoch format (i.e., numeric)
     1.00000,
     124867778
}

You are to produce two separate bytestreams of these two sets of data. Then you are to write a program in your programming language of choice (Java/haskell/C#/python/Go/ruby, etc.) that will serialize the data into a bytestream for each set of data (using whatever capabilities your language has for doing so), and do the following:

1. Double SHA-256 hash the first set of bytestream info (will look something like sha256(sha256(set1_bytestream))) and store the resulting digest in a variable called sha_left. Print out sha_left.
2. Double SHA-256 the second set of bytestream info (will look something like sha256(sha256(set2_bytestream))) and store the resulting digest in a variable called sha_right. Print out sha_right.
3. Single SHA-256 hash your two hashed results, sha_left and sha_right, which will look something like sha256(sha_left + sha_right). [Note you are concatenating two 32-byte hex strings of numbers here, not mathematically adding them.]. Store the resulting digest in a variable called root. Print out root. [Note that when testing, your output will change between runs because the timestamp will likely be different...]

Save your program, test it, and submit it as per the instructions below.

References:

ASCII Table: https://www.asciitable.com
Letter Frequency Help: https://sandilands.info/sgordon/classical-ciphers-frequency-analysis-examples
Steven Gordon's pdf : https://sandilands.info/sgordon/doc/classical-ciphers-frequency-analysis-examples

Submitting:

Use the folder named "lab2" in your Subversion repository. See the syllabus for more info about submission using Subversion. Upload your Lab 2 code and any supporting materials to the repo. Please include a README text file to explain what parts of the work you are submitting, where the different parts of the work are located and please provide a little info on how to compile and run your code.