Lab 2
Details for MPCS 56600
Each
lab will consist of a small problem and details of
how to proceed. Each lab is intended to give every student hands-on
experience with the core concepts and technologies covered during the course.
A student may concentrate, as a team member, on one technology over
another for the final project, but labs are designed to give each and
every student exposure to all the technologies that come into
play. You need to submit labs to the TAs for
grading--see submission instructions below.
Generally, unless otherwise specified, you will have one week to
complete each assigned lab.
See
the syllabus for information on grading. Turning in lab
assignments on time is required, without exception, and all late deliveries will be penalized,
regardless of cause.
Submit your
assignments to the subversion repository according to the
directions on the syllabus page.
You
may write these solutions in any programming language of your
choice. Our suggestion is now is not the time to learn a new
programming language along with the concepts themselves. So our
suggestion is to use whatever programming language you know best.
Lab 2
Due: 5:00 pm, Thursday, July 5, 2018
Problem
1: Brute Force Cryptanalysis:
BACKGROUND:
Like
all
programming problems, learning a new technology is not an exercise in
reading but rather an exercise in thinking and typing. This lab is designed
to
give you more hands-on experience in some fundamental skills in cryptography. You
will generally find the References section below helpful in addition to
the required and recommended reading.
One of the best ways to go about understand how ciphertext is created is
by deciphering it. This lab is designed to give you experience in
doing precisely that.
ADVICE: READ through the entire lab before beginning work...
WHAT YOU NEED TO DO:
STEP 1:
A brute force approach to deciphering a rotation cipher
is straightforward. You simply run through the set of rotation
factors [0-26], and print out each result. Then, look at the 26
different results, and find the one that looks like English. Let's
encrypt the string "Now is the winter of our discontent made glorious
summer". Let's use a rotation of 8. Our encrypted text is thus:
vwe qa bpm eqvbmz wn wcz lqakwvbmvb uilm otwzqwca acuumz
Using a brute force method outlined in class, we would roll through the
various possibilities, incrementing the rotation factor from 01 to 26, and derive these results:
01: WXF RB CQN FRWCNA XO XDA MRBLXWCNWC VJMN PUXARXDB BDVVNA
02: XYG SC DRO GSXDOB YP YEB NSCMYXDOXD WKNO QVYBSYEC CEWWOB
03: YZH TD ESP HTYEPC ZQ ZFC OTDNZYEPYE XLOP RWZCTZFD DFXXPC
04: ZAI UE FTQ IUZFQD AR AGD PUEOAZFQZF YMPQ SXADUAGE EGYYQD
05: ABJ VF GUR JVAGRE BS BHE QVFPBAGRAG ZNQR TYBEVBHF FHZZRE
06: BCK WG HVS KWBHSF CT CIF RWGQCBHSBH AORS UZCFWCIG GIAASF
07: CDL XH IWT LXCITG DU DJG SXHRDCITCI BPST VADGXDJH HJBBTG
08: DEM YI JXU MYDJUH EV EKH TYISEDJUDJ CQTU WBEHYEKI IKCCUH
09: EFN ZJ KYV NZEKVI FW FLI UZJTFEKVEK DRUV XCFIZFLJ JLDDVI
10: FGO AK LZW OAFLWJ GX GMJ VAKUGFLWFL ESVW YDGJAGMK KMEEWJ
11: GHP BL MAX PBGMXK HY HNK WBLVHGMXGM FTWX ZEHKBHNL LNFFXK
12: HIQ CM NBY QCHNYL IZ IOL XCMWIHNYHN GUXY AFILCIOM MOGGYL
13: IJR DN OCZ RDIOZM JA JPM YDNXJIOZIO HVYZ BGJMDJPN NPHHZM
14: JKS EO PDA SEJPAN KB KQN ZEOYKJPAJP IWZA CHKNEKQO OQIIAN
15: KLT FP QEB TFKQBO LC LRO AFPZLKQBKQ JXAB DILOFLRP PRJJBO
16: LMU GQ RFC UGLRCP MD MSP BGQAMLRCLR KYBC EJMPGMSQ QSKKCP
17: MNV HR SGD VHMSDQ NE NTQ CHRBNMSDMS LZCD FKNQHNTR RTLLDQ
18: NOW IS THE WINTER OF OUR DISCONTENT MADE GLORIOUS SUMMER
19: OPX JT UIF XJOUFS PG PVS EJTDPOUFOU NBEF HMPSJPVT TVNNFS
20: PQY KU VJG YKPVGT QH QWT FKUEQPVGPV OCFG INQTKQWU UWOOGT
21: QRZ LV WKH ZLQWHU RI RXU GLVFRQWHQW PDGH JORULRXV VXPPHU
22: RSA MW XLI AMRXIV SJ SYV HMWGSRXIRX QEHI KPSVMSYW WYQQIV
23: STB NX YMJ BNSYJW TK TZW INXHTSYJSY RFIJ LQTWNTZX XZRRJW
24: TUC OY ZNK COTZKX UL UAX JOYIUTZKTZ SGJK MRUXOUAY YASSKX
25: UVD PZ AOL DPUALY VM VBY KPZJVUALUA THKL NSVYPVBZ ZBTTLY
If we scan through the output of gobbledygook, notice line number 18.
That looks like English! And very fine English, as it's
Shakespeare's English. That's basically it as far as a brute force
method is concerned on a rotation cipher.
Your first assignment:
We just got in the following cipher text which we believe was encrypted using a Caesarean rotation cipher:
maxzxxlxmatmetbwmaxzhewxgxzzlunmgxoxkvtvdexw
1. Code a program in your programming language of choice
(Java/haskell/C#/python/Go/ruby, etc.) and decipher the encrypted text above using a brute force method.
2. One of the 26 lines of output should look like English (without spaces...)
3. At the end of your program, print out the correct rotation factor (you can hard-code this after discovering the correct rotation factor)
4. At the end of the
program, print out the plaintext (you can hard-code this after
discovering the correct rotation factor)
5. And at the end of the program, print out who said this (you can hard-code this after discovering the correct rotation factor)
Submit your source code for your solution as described below under "Submitting."
Problem 2: Statistical Cryptanalysis:
BACKGROUND:
Like
all
programming problems, learning a new technology is not an exercise in
reading but rather an exercise in thinking and typing. This lab is designed
to
give you more hands-on experience in some fundamental skills in cryptography. You
will generally find the References section below helpful in addition to
the required and recommended reading.
WHAT YOU NEED TO DO:
STEP 1:
We have received a text document (here)
that contains encrypted data we believe were created with a rotation
cipher. You will use a letter frequency analysis approach to decrypt this
text. But first, keep reading...
You may use the letter frequency tables below (these are
drawn from this wikipedia article on this subject):
Relative frequencies of letters in text Relative frequencies ordered by frequency
According to Robert Lewand (Cryptological Mathematics, The
Mathematical Association of America, 2000), arranged from most to least
common in appearance, the English letters are as follows:
etaoinshrdlcumwfgypbvkjxqz
Meaning that the English letter 'e' is most common, and the English letter 'z' is least common across multiple inputs.
Using the common letter frequencies above, and ignoring all numbers,
spaces and punctuation and other non-alphabetic characters, your job will be to decrypt a
message to plain text using statistical letter frequency analysis. Luckily,
our ciphertext is long enough to provide sufficient statistical significance required for this type of frequency analysis.
There are a couple of approaches to this problem. One is to create
two lists, one, a list of the letter frequencies in our actual
encrypted ciphertext data that we are analyzing (which you would need to produce
from the encrypted data), let's call this list Lc, and a second list of the standard letter frequencies above (etaoinshrdlcumwfgypbvkjxqz), let's call this Ls, and create a new string that replaces the most common letter in the encrypted data (x) with the corresponding (x) standard order letter, such that: Lc[x] = Ls[x]. So for example, assume that our plain text is this (it's not, but assume):
Meet in lobby at ten: "meetinlobbyatten" (we will ignore spaces and punctuation for the time being).
And let's assume an array of 26
lowercase letters [a-z], and let's further assume that this text has been enciphered with a ROT9
encryption, where each character in the ciphertext is rotated 9
characters in the plaintext modulus 26, yielding the following
ciphertext (ignoring case):
meetinlobbyatten =>
dvvkzecfssprkkve
Since we know the rotation, we know that the first letter 'd' (ASCII
100d) in the ciphertext is really the letter 'm' (ASCII 100 + 9 = 109 =
'm'), as we see in the first letter of the plaintext. The next
letter in the ciphertext (again, ignoring spaces and punctuation) is 'v'
(ASCII 118d), which corresponds to the letter 'e' in the plaintext
(ASCII (101-118) mod26 [we are assuming an array of 26
lowercase letters a-z]).
If we roll through the ciphertext creating a frequency distribution of
just the the letters in the ciphertext, we come up with the following
order:
{'v': 3, 'k': 3, 'e': 2, 's': 2, 'd': 1, 'z': 1, 'c': 1, 'f': 1, 'p': 1, 'r': 1}
There
is a nice pdf that discusses this type of letter frequency analysis
(but on text more complicated than ours here, which is not statistically
significant to provide fodder for our statistical analysis algorithm)
located here.
The author, Steven Gordon, has even written a shell script that will
perform multiple types of encryption and decryption, and it is located
here.
Download it, save it to a file called "crypt", and make it
executable, and play around with it, such as:
1. Perform a brute force attack on our sample ciphertext (notice line 17 of the brute force method):
for i in `seq 0 25`; do echo -n "$i "; crypto caesar dec
dvvkzecfssprkkve $i ; done ;
2. Or, performing the decryption knowing the offset value:
crypto caesar dec dvvkzecfssprkkve 17
Play around with the tool and read the pdf mentioned above. And then, move on to:
A second, and preferable way to approach to this type of problem is perform a statistical analysis such as the following:
1. Produce the frequencies of the letters in the encrypted text (using again the same simple ciphertext of "dvvkzecfssprkkve", ranked from most frequent to least frequent, along with their counts:
{'v': 3, 'k': 3, 'e': 2, 's': 2, 'd': 1, 'z': 1, 'c': 1, 'f': 1, 'p': 1, 'r': 1}
2. Find the distance (shift) from the most common letter in the
ciphertext (in the case of a tie, just choose the first) to the most
common letter in the English Frequency Table, which is as we know, the
letter 'e'. In this case, the shift from 'v' to 'e' mod 26 is
9 (note 17 + 9 = 26). This gives us a ROT9(x) for this ciphertext. (See the table below for the full shift values ROT9).
3. Create two lists (or combine/zip two lists, a 2-dimensional
array, or a map, your call...), one with the normal English alphabet of
lowercase letters a-z as in the first row below, let's call that List1. Next, create a ROT9 list and call this List2, as in the second row below. The "match point" in List2 is at the shift distance 9 (again, the shift from 'v' to 'e' mod 26 is
9), specifically, letter 'v' in List1 is mapped to the letter 'e' in List2, and the rest of List2 is just wrapped around to complete: viz. 'w' is mapped to 'f', 'x' is mapped to 'g', 'y' is mapped to 'h', 'z' is mapped to 'i', and we wrap around to map 'a' to 'j', etc.
Thus we will have two lists that we wish to combine or map:
4. Then simply roll through the ciphertext character by character,
and replace each character encountered in the ciphertext with it's
corresponding English letter indexed by the position of the ciphertext
letter from the plaintext alphabet in List1.
So let's work through the first four substitutions in the
ciphertext. Our simple ciphertext again is "dvvkzecfssprkkve", and the first
four letters in the ciphertext are "dvvk". Here's how we would do
this:
Using the table above, we look up 'd' in List1, and find it's index, which is 3. We would then replace the text at ciphertext[0] with the letter at List2[3], which is 'm'. That's the first letter of our plaintext. Next, look up 'v' in List1, and find it's index, which is 21. We would then replace the text at ciphertext[1] with the letter at List2[21],
which is 'e'. That's the second letter of our plaintext.
We'd do the same thing with the second 'v' in our first for ciphertext
letters, replacing the character at ciphertext[2] with the letter at List2[21], which is 'e'. That's our third letter. Finally, for our four letters, we'd look up 'k' in List1, and find it's index, which is 10. We would then replace the text at ciphertext[3] with the letter at List2[10], which is 't'. That's our fourth letter.
Our decrypted string so far is "meet". That's pretty good.
We'd continue on doing the same thing until we have gone through the
entire ciphertext, and our result would be:
"meetinlobbyatten"
Here is a picture of the ROT9(x) table for our example (P=Plaintext, C=Ciphertext):
By doing this form of statistical substitution, we leave intact any punctuation in the ciphertext, which will carry over into the decrypted plaintext just fine.
Ok, now that we know what we're doing, we got an intercept in this morning, and below is the following ciphertext, and we believe it was enciphered using a caesarean cipher:
"r
glivcp gvvi kf gvvi mvijzfe fw vcvtkifezt trjy nflcu rccfn feczev
grpdvekj kf sv jvek uzivtkcp wifd fev grikp kf refkyvi nzkyflk xfzex
kyiflxy r wzeretzrc zejkzklkzfe. uzxzkrc jzxerklivj gifmzuv grik fw kyv
jfclkzfe, slk kyv drze svevwzkj riv cfjk zw r kiljkvu kyziu grikp zj
jkzcc ivhlzivu kf givmvek uflscv-jgveuzex. nv gifgfjv r jfclkzfe kf kyv
uflscv-jgveuzex gifscvd ljzex r gvvi-kf-gvvi evknfib. kyv evknfib
kzdvjkrdgj kirejrtkzfej sp yrjyzex kyvd zekf re fexfzex tyrze fw
yrjy-srjvu giffw-fw-nfib, wfidzex r ivtfiu kyrk treefk sv tyrexvu
nzkyflk ivufzex kyv giffw-fw-nfib. kyv cfexvjk tyrze efk fecp jvimvj rj
giffw fw kyv jvhlvetv fw vmvekj nzkevjjvu, slk giffw kyrk zk trdv wifd
kyv crixvjk gffc fw tgl gfnvi. rj cfex rj r drafizkp fw tgl gfnvi zj
tfekifccvu sp efuvj kyrk riv efk tffgvirkzex kf rkkrtb kyv evknfib,
kyvp'cc xvevirkv kyv cfexvjk tyrze reu flkgrtv rkkrtbvij. kyv evknfib
zkjvcw ivhlzivj dzezdrc jkiltkliv. dvjjrxvj riv sifrutrjk fe r svjk
vwwfik srjzj, reu efuvj tre cvrmv reu ivafze kyv evknfib rk nzcc,
rttvgkzex kyv cfexvjk giffw-fw-nfib tyrze rj giffw fw nyrk yrggvevu
nyzcv kyvp nviv xfev."
Using the statistical approach outlined above:
1. Code a program in your programming language of choice
(Java/haskell/C#/python/Go/ruby, etc.) and decipher the encrypted text
above using the statistical method discussed in this section, and have
your program print out the plaintext.
2. Print out who said this and in what work (you can hard-code this after deciphering the plaintext)
3. Print out the rotation factor (you can hard-code this after discovering the rotation factor)
4. Submit your source code for your solution per the instructions below.
Problem 3: Hash Functions:
BACKGROUND:
Like
all
programming problems, learning a new technology is not an exercise in
reading but rather an exercise in thinking and typing. This lab is designed
to
give you more hands-on experience in some fundamental skills in cryptography. You
will generally find the References section below helpful in addition to
the required and recommended reading.
WHAT YOU NEED TO DO:
STEP 1:
Identify the
SHA-256 hash functions in your language's libraries. You can find
examples of SHA-256 functions in multiple languages in my pub directory located here:
~mark/pub/56600/src.lecture/lecture.2/sha256/examples/*.
You are to take the following two sets of data:
{
0000000036dc2ce23cdd934eff4bae120155de8b8712de8489c8870b06e334ff,
c0ba5ba1c5ba02fb02c4ea93ec16cae7ea1994b5b3f9ef0e293b841081eb3003,
the current calculated timestamp in seconds in Unix Epoch format (i.e., numeric)
1.00000,
5251252
}
{
00000000d84724559f1691d916b2ed63a44884d495d155197647ce7667116b16,
69a14e6b050d10d6621faee3dac6682809feb0ffa76320b33c5c09f1059f06c7,
the current calculated timestamp in seconds in Unix Epoch format (i.e., numeric)
1.00000,
124867778
}
You are to produce two separate bytestreams of these two sets of data. Then you are to write a program in your programming language of choice (Java/haskell/C#/python/Go/ruby, etc.) that
will serialize the data into a bytestream for each set of data (using
whatever capabilities your language has for doing so), and do
the following:
1.
Double SHA-256 hash the first set of bytestream info (will look something
like sha256(sha256(set1_bytestream))) and store the resulting digest in a variable
called sha_left. Print out sha_left.
2. Double SHA-256 the second set of bytestream info (will look
something like sha256(sha256(set2_bytestream))) and store the resulting digest in a
variable called sha_right. Print out sha_right.
3. Single SHA-256 hash your two hashed results, sha_left and sha_right, which will look
something like sha256(sha_left + sha_right). [Note you are concatenating
two 32-byte hex strings of numbers here, not mathematically adding
them.]. Store the resulting digest in a variable called
root. Print out root. [Note that when testing, your output
will change between runs because the timestamp will likely be
different...]
Save your program, test it, and submit it as per the instructions below.
References:
ASCII Table: https://www.asciitable.com
Letter Frequency Help: https://sandilands.info/sgordon/classical-ciphers-frequency-analysis-examples
Steven Gordon's pdf : https://sandilands.info/sgordon/doc/classical-ciphers-frequency-analysis-examples
Submitting:
Use the folder named "lab2" in your Subversion repository. See the syllabus for more info about submission using Subversion.
Upload your Lab 2 code and any supporting materials to the repo.
Please include a README text file to explain what parts of the work you are submitting, where the different parts of the work are located and
please provide a little info on how to compile and run your code.