Homework 0: Shift Cipher
Due Tuesday, June 20, 2023 at 11:59pm
In this short assignment, you will implement a Shift Cipher and Letter Frequency Analysis. The objective of this homework is threefold: first, to get your hands dirty with C programming, second, to troubleshoot your coding environment before we embark on larger projects, and third, to test our submission process. This assignment is a lot shorter than it seems.
- Read the entire assignment first before you start
- Start early and do not do all of the assignment in one sitting; coding is fun but fighting for hours with broken code is not
- Do not hesitate to seek help if you are stuck
Synopsis
Shift Cipher: A shift cipher, also known as a Caesar cipher, is a simple substitution cipher in which each letter in the plaintext is shifted a certain number of places down or up the alphabet. We have provided a program that encrypts a message as part of the starter code. In the first part of the assignment, you will write a C program that decrypts a ciphertext given a shift amount.
Letter Frequency Analysis: The reason we can't use a shift cipher to encrypt, say, your bank information is that it is easily breakable. Letter frequency analysis is one method for breaking a shift cipher. Each language has a characteristic distribution of letter frequencies. In English, for example, 'E' is the most common letter. Since a shift cipher preserves the distribution of letters, the most common letter in the encrypted text is likely to be 'E' in the original text; we can then extrapolate the shift amount from this. In this part, you will write a C program that calculates and displays the frequency of each letter in a given text file.
Written: You are asked to do some experiments and answer some simple questions.
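To make the key-guessing idea in the Letter Frequency Analysis paragraph concrete, here is a minimal sketch. The function name `guess_key` is hypothetical and not part of the starter code, and the sketch assumes an ASCII-like encoding in which the letters are contiguous. It only works well on English text long enough for 'E' to actually be the most common letter.

```c
#include <stddef.h>

/* Hypothetical helper: guess the shift amount of an English ciphertext by
 * assuming that its most common letter stands for 'E'. */
int guess_key(const char *ciphertext)
{
    int counts[26] = {0};

    /* Count each letter, treating upper and lower case as the same letter. */
    for (size_t i = 0; ciphertext[i] != '\0'; i++) {
        unsigned char c = (unsigned char)ciphertext[i];
        if (c >= 'a' && c <= 'z') {
            counts[c - 'a']++;
        } else if (c >= 'A' && c <= 'Z') {
            counts[c - 'A']++;
        }
    }

    /* Find the most frequent letter; ties go to the earlier letter. */
    int best = 0;
    for (int i = 1; i < 26; i++) {
        if (counts[i] > counts[best]) {
            best = i;
        }
    }

    /* If plaintext 'E' (index 4) became letter `best`, the shift is
     * best - 4, taken modulo 26 so that it stays in the range 0..25. */
    return (best - ('E' - 'A') + 26) % 26;
}
```

For instance, "meet me here" shifted by 15 becomes "btti bt wtgt"; its most common letter is 't', which is 15 places after 'e', so the sketch returns 15.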
Learning Objectives:
- Write, compile, and test your first C program
- Learn the structures of `if` statements, `while` loops, and a function
- Learn basic C I/O
- Get comfortable with the terminal
- Familiarize yourself with the course infrastructure.
Getting started
Since this is your first assignment in C, we have provided a fairly extensive guide for getting started on this assignment. In the subsequent assignments, this section will be much more succinct.
Getting the starter code
In the repository that you created for this class (if you have not, see this page), run the following commands to add the starter code repository to your upstream:
Then, pull from upstream:
If you run `git status` now, you may see a warning message `upstream is gone`. You can ignore that warning.
You can then upload the starter code to your own repository:
Pay attention to any error messages that you might encounter and please ask for help if you run into any problems.
Your local repository and remote repository on GitHub should now contain `README.md` and the `hw0` directory.
What have we provided to you?
We have provided a large amount of starter code to get you started; note, again, that future assignments will not receive such treatment. Please read all provided code carefully and make sure you understand what it does before you start coding.
- `WRITTEN.txt`: this is where you will write all your written responses.
- `encrypt.py`: a reference implementation of the encryption program in Python
- `decrypt.py`: a reference implementation of the decryption program in Python
- `freq.py`: a reference implementation of the frequency analysis in Python
- `encrypt.c`: a finished implementation of the encryption program in C
- `tests`: a directory that contains unit tests for your `decrypt` and `freq` programs
- `big-tests`: a directory that contains large text documents for overall testing
- `secrets`: a directory that contains some secret messages for you to decrypt
Running the provided code
The provided programs are all complete and ready to run. At this point, you
should be very familiar with how to run a Python program, but to reiterate:
when you are inside the `hw0` directory, typing `python3 encrypt.py` will launch a
Python interpreter that executes the program `encrypt.py`. The program does not
print a prompt; it waits for the user to type in the terminal and writes the
encrypted letters to standard output until there is no more input.
For example, when I type `hello` `<enter key>` `bye` `<enter key>` `<Ctrl-D>`,
it produces the following transcript in the terminal.
<Ctrl-D>
`<Ctrl-D>` is an instruction that tells the terminal that we have finished typing,
or that we have reached the "end of file" of the standard input.
Note that Mac users should also use `<Ctrl>` instead of the usual `<Cmd>` key.
Without typing `<Ctrl-D>`, a program that expects to read until the end of
the file will keep waiting because there could be more input coming.
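The "read until end of file" behavior described above can be sketched in C as follows. The function name `copy_until_eof` is hypothetical; the sketch only illustrates the usual loop shape, in which `fgetc` returns the special value `EOF` once the input ends, for example after `<Ctrl-D>` in the terminal.

```c
#include <stdio.h>

/* Hypothetical helper: copy every character from `in` to `out` until the end
 * of file is reached, and return how many characters were copied. */
long copy_until_eof(FILE *in, FILE *out)
{
    long copied = 0;
    int c; /* int, not char, so that EOF can be told apart from real data */

    while ((c = fgetc(in)) != EOF) {
        fputc(c, out);
        copied++;
    }
    return copied;
}
```

Calling `copy_until_eof(stdin, stdout)` from `main` would echo whatever is typed and return only when the input ends.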
To run the C program, we need to compile it first. Running the following command
will produce an executable called `encrypt`. You can check that `encrypt` is
indeed in your directory by running `ls`.
Then, the following command runs ("executes") the executable.
To be consistent in testing, one may wish to use the content of a file as input and save the output to a file instead of just displaying on the terminal. We use redirection to achieve this.
% python3 encrypt.py < tests/all-letters.txt
iwt fjxrz qgdlc udm yjbeh dktg iwt apon sdv.
IWT FJXRZ QGDLC UDM YJBEH DKTG IWT APON SDV.
The symbol `<` redirects the content of the file `tests/all-letters.txt` to the
standard input; in other words, it opens the file and behaves as if you had typed
the content of the file into the terminal character by character.
Similarly, the symbol `>` works in the other direction: it saves what would be
displayed in the terminal to a file that you specify.
test.txt
Naturally, you can use both `<` and `>` at the same time, which uses the content
of a file as the input and saves the content of the output to a file.
Producing reference output
Your first task in this assignment is to save the reference encryption
and frequency output of all tests in the `tests` and `big-tests` directories,
using the `.enc` and `.freq` extensions respectively. You need to produce the
following files containing the encryption and the frequency output of the
corresponding tests.
tests/all-letters.enc
tests/all-letters.freq
tests/all-non-letter.enc
tests/all-non-letter.freq
tests/empty.enc
tests/empty.freq
tests/hello.enc
tests/hello.freq
big-tests/pride-and-prejudice.enc
big-tests/pride-and-prejudice.freq
big-tests/the-great-gatsby.enc
big-tests/the-great-gatsby.freq
Program Specification
Read the specification carefully. In this course, the programs' output is
well-specified, and we will compare your output against the reference
solution's character by character. Your solution can fail a test just because
it outputs an extra space at the end, for example. In addition to looking
for differences visually, use `diff` to compare your output against the
reference.
Your second task in this assignment is to write two short programs,
`decrypt.c` and `freq.c`, as specified below.
The default shift amount, the `KEY`, in this assignment is 15 unless otherwise
specified.
Part 1: Decryption
In a file named `decrypt.c` in the `hw0` directory, write a C program that reads
all text from the standard input, shifts the characters down as follows, and
writes the decrypted text to standard output.
For each character read from the standard input, your program should output one of the following:
- if the character is a lowercase letter, the letter shifted towards 'a' by a
predetermined amount (`KEY`), wrapping to the back of the alphabet if shifting past 'a';
- if the character is an UPPERCASE letter, the letter shifted towards 'A' by a
predetermined amount (`KEY`), wrapping to the back of the alphabet if shifting past 'A';
- otherwise, the character as is.
A lowercase letter is one of the following:1
An UPPERCASE letter is one of the following:
Your program should terminate with exit code 0 after reading and outputting all characters.
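The wrap-around rule is the part that is easiest to get wrong, so here is one possible sketch of it for a single character. The function name `unshift_char` is hypothetical, and the code assumes an ASCII-like encoding in which 'a' to 'z' and 'A' to 'Z' are contiguous. It is not the only way to write this, and it leaves out the loop that reads the input.

```c
#define KEY 15 /* default shift amount stated in this assignment */

/* Hypothetical helper: undo a shift of KEY on one character. */
int unshift_char(int c)
{
    if (c >= 'a' && c <= 'z') {
        /* Adding 26 before the modulo keeps the value non-negative, which
         * gives the wrap to the back of the alphabet. */
        return 'a' + (c - 'a' - KEY + 26) % 26;
    }
    if (c >= 'A' && c <= 'Z') {
        return 'A' + (c - 'A' - KEY + 26) % 26;
    }
    return c; /* not a letter: output the character as is */
}
```

With `KEY` set to 15, 'i' maps back to 't' and 'a' wraps around to 'l', which matches the sample ciphertext shown earlier, where "the" is encrypted as "iwt".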
Your program should compile without warnings or errors by the following command:
$ clang -std=c11 -Wall -Wextra -pedantic -o decrypt decrypt.c
What do all these flags mean?
- `-std=c11` specifies the C standard we are using; in this course, we use C11.
- `-Wall` turns on "all" warnings
- `-Wextra` turns on more warnings
- `-pedantic` turns on even more warnings
- `-o decrypt` tells the compiler to generate an executable named `decrypt`
Part 2: Letter Frequency Analysis
In a file named `freq.c` in the `hw0` directory, write a C program that reads
all characters from the standard input and prints the frequency of each letter
occurring in the read text.
A letter is either a lowercase letter or an UPPERCASE letter as defined above; a letter occurs in the text if either its lower case or its UPPER case occurs. The frequency of a letter in percentage is 100 times the quotient of the number of its occurrences divided by the number of letters.
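As an illustration of the definition above, the following sketch computes the percentage for each letter. The function name `letter_percentages` is hypothetical, and for brevity it takes the text as a string rather than reading standard input. It also assumes an ASCII-like encoding, and it reports 0 for every letter when the text contains no letters, which is a choice this specification does not state.

```c
#include <stddef.h>

/* Hypothetical helper: fill percent[0..25] with the frequency, in percent, of
 * each letter in `text` (index 0 is A, index 25 is Z). Returns the number of
 * letters counted. */
long letter_percentages(const char *text, double percent[26])
{
    long counts[26] = {0};
    long total = 0;

    /* A letter occurs if either its lower case or its UPPER case occurs. */
    for (size_t i = 0; text[i] != '\0'; i++) {
        char c = text[i];
        if (c >= 'a' && c <= 'z') {
            counts[c - 'a']++;
            total++;
        } else if (c >= 'A' && c <= 'Z') {
            counts[c - 'A']++;
            total++;
        }
    }

    for (int i = 0; i < 26; i++) {
        /* Assumption: avoid dividing by zero when there are no letters. */
        percent[i] = (total > 0) ? 100.0 * counts[i] / total : 0.0;
    }
    return total;
}
```

For the text "AaBb!", four letters are counted, and both A and B come out at 50 percent; the '!' is ignored.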
The output written to the standard output must follow the following format:
- For each letter, in alphabetical order, print a line consisting of:
  - the UPPER case of the letter
  - a single additional space
  - a right-justified six-character field giving the frequency of that letter in percent, rounded to the nearest hundredth. The field consists of up to three digits before the decimal point, the decimal point, and two digits after the decimal point. Trailing zeros are included; leading zeros are not, except for the single zero before the decimal point in numbers less than 1.
A sample output is as follows:
A   7.67
B   1.58
C   2.27
D   3.93
E  11.97
F   2.13
G   1.80
H   6.32
I   6.67
J   0.12
K   0.93
L   4.47
M   2.91
N   6.44
O   8.22
P   1.52
Q   0.10
R   6.23
S   6.59
T   8.75
U   3.39
V   1.00
W   2.37
X   0.13
Y   2.46
Z   0.04
Hint
Do not do the last part yourself; `printf` has a built-in format for that.
Look it up.
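For reference once you have looked it up, this is a small sketch of the kind of format the hint points to. The function name `format_freq_line` is hypothetical; it uses `snprintf` to build the line in a buffer, but the same format string works with `printf`.

```c
#include <stdio.h>

/* Hypothetical helper: write one output line for `letter` (already upper
 * case) with frequency `percent` into `buf`. Returns what snprintf returns,
 * that is, the length of the formatted line. */
int format_freq_line(char *buf, size_t size, char letter, double percent)
{
    /* %6.2f: at least six characters wide, right-justified, two digits after
     * the decimal point, rounded by the library. */
    return snprintf(buf, size, "%c %6.2f\n", letter, percent);
}
```

With this format, a frequency of 11.97 for E is padded with one space inside the field, and 0.04 for Z is padded with two, so the decimal points line up.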
Your program should terminate with exit code 0 after printing the frequencies.
Your program should compile without warnings or errors by the following command:
$ clang -std=c11 -Wall -Wextra -pedantic -o freq freq.c
Testing
Given the reference output produced in Producing reference output,
you can use these files to compare against the output generated by your programs
using `diff`.
For example, to test the C implementation of `encrypt` against the reference
implementation in Python, one can run the following:
$ clang -std=c11 -Wall -Wextra -pedantic -o encrypt encrypt.c
$ ./encrypt < tests/all-letters.txt > test-output.enc
$ diff tests/all-letters.enc test-output.enc
$ rm test-output.enc # this step is optional but keep the directory tidy
If `diff` produces no output, there is no difference between your
program's output and the reference output, which means your program is correct
for that test. Repeat this step for all tests until you are confident.
Written
The last task in this assignment is to complete some tasks and answer some
questions as instructed in `WRITTEN.txt`.
Submission checklist
Everything below is inside the `hw0` directory of your coursework repository.
- `decrypt.c` contains your decryption program, and it compiles with the specified flags without warnings or errors
- `freq.c` contains your frequency analysis program, and it compiles with the specified flags without warnings or errors
- `KEY` is set to 15 for both implementations of `encrypt` and `decrypt`
- `WRITTEN.txt` is finished
- all files specified in Producing reference output are present
- all changes are committed and pushed to your GitHub repository
Submit your program to Gradescope by selecting your coursework directory and the correct branch.
Grading
| | Percentage |
|---|---|
| Correctness | 70% |
| Style | 20% |
| Written | 10% |
Warning: If your program cannot be compiled using the commands above without errors or warnings, you will receive 0 points for correctness since there will be no executable for us to run.
1. Sorry è, é, ê, ë, and many other letters with diacritics and non-Latin letters.