
Homework 0: Shift Cipher

Due Tuesday, June 20, 2023 at 11:59pm

In this short assignment, you will implement a Shift Cipher and Letter Frequency Analysis. The objective of this homework is threefold: first, to get your hands dirty with C programming; second, to troubleshoot your coding environment before we embark on larger projects; and third, to test our submission process. This assignment is a lot shorter than it seems.

  • Read the entire assignment first before you start
  • Start early and do not do all of the assignment in one sitting; coding is fun but fighting for hours with broken code is not
  • Do not hesitate to seek help if you are stuck

Synopsis

Shift Cipher: A shift cipher, also known as a Caesar cipher, is a simple substitution cipher in which each letter in the plaintext is shifted a fixed number of places up or down the alphabet. As part of the starter code, we have provided a program that encrypts a message. In the first part of the assignment, you will write a C program that decrypts a ciphertext given a shift amount.

Letter Frequency Analysis: The reason we can't use a shift cipher to encrypt, say, your bank information is that it is easily broken. Letter frequency analysis is one method for breaking a shift cipher. Each language has a distinctive letter-frequency profile; in English, for example, 'E' is the most common letter. Since a shift cipher preserves the distribution of letters (it only relabels them), the most common letter in the encrypted text likely corresponds to 'E' in the original text, and we can infer the shift amount from this. In this part, you will write a C program that calculates and displays the frequency of each letter in the text it reads from the standard input.
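As a rough sketch of the key-recovery idea above (not something this assignment asks you to write), the C function below guesses the key from a table of letter counts. The name guess_key and the layout of the counts array (index 0 for A/a through index 25 for Z/z) are illustrative assumptions. It assumes the most frequent ciphertext letter was 'E' in the plaintext, which is only likely, not guaranteed; for short texts the guess can easily be wrong.

```c
#include <stddef.h>

/* Hypothetical helper (not part of the starter code).
 * counts[i] is the number of times the i-th letter (A/a = 0, ..., Z/z = 25)
 * occurs in the ciphertext. Assumes the most frequent ciphertext letter
 * was 'E' in the plaintext and returns the implied shift amount (0..25). */
int guess_key(const long counts[26])
{
    size_t most = 0; /* index of the most frequent letter seen so far */
    for (size_t i = 1; i < 26; i++) {
        if (counts[i] > counts[most]) {
            most = i;
        }
    }
    /* 'E' has index 4; the key is how far 'E' was shifted forward,
     * wrapped into the range 0..25 (adding 26 keeps it non-negative). */
    return (int)((most + 26 - 4) % 26);
}
```

For example, if 'T' is the most common ciphertext letter, the guess is 15, the default KEY in this assignment, since shifting 'E' forward by 15 places gives 'T'.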

Written: You are asked to do some experiments and answer some simple questions.

Learning Objectives:

  • Write, compile, and test your first C program
  • Learn the structure of if statements, while loops, and functions
  • Learn basic C I/O
  • Get comfortable with the terminal
  • Familiarize yourself with the course infrastructure

Getting started

Since this is your first assignment in C, we have provided a fairly extensive guide for getting started on this assignment. In the subsequent assignments, this section will be much more succinct.

Getting the starter code

In the repository that you created for this class (if you have not created one yet, see this page), run the following command to add the starter code repository as a remote named upstream:

git remote add upstream git@github.com:cmsc143-smr-2023/cs143-coursework-starter.git

Then, pull from upstream:

git pull upstream main
If you run git status now, you may see a warning that the upstream is gone. You can ignore that warning.

You can then upload the starter code to your own repository:

git push

Pay attention to any error messages that you might encounter and please ask for help if you run into any problems.

Your local repository and your remote repository on GitHub should now contain a README.md file and an hw0 directory.

What have we provided to you?

We have provided a large amount of starter code to get you started; note, again, that future assignments will not receive such treatment. Please read all provided code carefully and make sure you understand what it does before you start coding.

  • WRITTEN.txt: this is where you will write all your written responses.
  • encrypt.py: a reference implementation of the encryption program in Python
  • decrypt.py: a reference implementation of the decryption program in Python
  • freq.py: a reference implementation of the frequency analysis in Python
  • encrypt.c: a finished implementation of the encryption program in C
  • tests: a directory that contains unit tests for your decrypt and freq programs
  • big-tests: a directory that contains large text documents for overall testing
  • secrets: a directory that contains some secret messages for you to decrypt

Running the provided code

The provided programs are all complete and ready to run. By now you should be familiar with running a Python program, but to reiterate: when you are inside the hw0 directory, typing python3 encrypt.py launches the Python interpreter, which executes the program encrypt.py. The program does not print a prompt; it waits for you to type in the terminal and writes the encrypted letters to standard output until there is no more input.

For example, when I type hello <enter key> bye <enter key> <Ctrl-D>, it produces the following transcript in the terminal.

$ python3 encrypt.py
hello
wtaad
bye
qnt

<Ctrl-D>

<Ctrl-D> tells the terminal that we have finished typing, that is, that we have reached the "end of file" (EOF) of the standard input. Mac users should also use the <Ctrl> key here, not the usual <Cmd> key. Without <Ctrl-D>, a program that reads until the end of file will keep waiting, because more input could still be coming.
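A program that reads "until the end of file" typically loops until its read function reports EOF. The sketch below uses a hypothetical helper, copy_until_eof, to show the common C idiom with getc from stdio.h: the result is stored in an int rather than a char so that EOF, a negative value, can be told apart from every valid character. When input comes from the terminal, pressing <Ctrl-D> at the start of a line is what makes getc return EOF.

```c
#include <stdio.h>

/* Hypothetical helper: copy every character from `in` to `out` until
 * end of file, and return how many characters were copied. */
long copy_until_eof(FILE *in, FILE *out)
{
    long n = 0;
    int c; /* int, not char, so that it can hold EOF */
    while ((c = getc(in)) != EOF) {
        putc(c, out);
        n++;
    }
    return n;
}
```

Calling copy_until_eof(stdin, stdout) from main would give a program that simply echoes its input back until you press <Ctrl-D>.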

To run the C program, we need to compile it first. Running the following command will produce an executable called encrypt. You can check that encrypt is indeed in your directory by running ls.

$ clang -o encrypt encrypt.c

Then, the following command runs ("executes") the executable.

$ ./encrypt
hello
wtaad
bye
qnt

For consistent testing, you may want to use the contents of a file as input and save the output to a file instead of just displaying it in the terminal. We use redirection to achieve this.

$ python3 encrypt.py < tests/all-letters.txt
iwt fjxrz qgdlc udm yjbeh dktg iwt apon sdv.
IWT FJXRZ QGDLC UDM YJBEH DKTG IWT APON SDV.

The symbol < redirects the contents of the file tests/all-letters.txt to the standard input; in other words, it opens the file and feeds its contents to the program as if you had typed them into the terminal character by character.

Similarly, the symbol > redirects in the other direction: it saves what would be displayed in the terminal to a file that you specify.

$ ./encrypt > test.txt
hello
bye
<Ctrl-D>

This saves the following content to the file test.txt instead of displaying it:

wtaad
qnt

You can also use < and > at the same time to read the input from one file and save the output to another, for example:

$ python3 encrypt.py < tests/all-letters.txt > tests/all-letters.enc

Producing reference output

Your first task in this assignment is to save the reference encryption and frequency output of every test in the tests and big-tests directories, using the .enc and .freq extensions respectively. You need to produce the following files, each containing the encryption or frequency output of the corresponding test:

tests/all-letters.enc
tests/all-letters.freq
tests/all-non-letter.enc
tests/all-non-letter.freq
tests/empty.enc
tests/empty.freq
tests/hello.enc
tests/hello.freq
big-tests/pride-and-prejudice.enc
big-tests/pride-and-prejudice.freq
big-tests/the-great-gatsby.enc
big-tests/the-great-gatsby.freq

Program Specification

Read the specification carefully. In this course, the programs' output is well-specified, and we will compare your output against the reference solution's character by character. Your solution can fail a test just because it outputs an extra space at the end, for example. In addition to looking for differences visually, use diff to compare your output against the reference.

Your second task is to write two short programs, decrypt.c and freq.c, as specified below.

Unless otherwise specified, the shift amount (KEY) in this assignment is 15.

Part 1: Decryption

In a file named decrypt.c in the hw0 directory, write a C program that reads all text from the standard input, shifts the characters down as follows, and writes the decrypted text to the standard output.

For each character read from the standard input, your program should output:

  1. if the character is a lowercase letter, the letter shifted KEY places towards 'a', wrapping around to the end of the alphabet if the shift goes past 'a';
  2. if the character is an UPPERCASE letter, the letter shifted KEY places towards 'A', wrapping around to the end of the alphabet if the shift goes past 'A';
  3. otherwise, the character itself, unchanged.

A lowercase letter is one of the following (see footnote 1):

abcdefghijklmnopqrstuvwxyz

An UPPERCASE letter is one of the following:

ABCDEFGHIJKLMNOPQRSTUVWXYZ
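The wrap-around in rules 1 and 2 is a natural fit for C's % operator. The sketch below uses a hypothetical helper, shift_forward, that moves a letter forward by 0 to 25 places within its own case; it assumes the letters are contiguous in the character set, which holds for ASCII but is not guaranteed by the C standard. Because the alphabet has 26 letters, shifting back by KEY places gives the same result as shifting forward by 26 - KEY places.

```c
/* Hypothetical helper: shift c forward by `amount` places (assumed to be
 * in 0..25) within its own case, wrapping past 'z'/'Z' back to 'a'/'A'.
 * Only the 26 ASCII letters defined above count as letters; every other
 * character is returned unchanged. Keeping `amount` non-negative keeps
 * the % result non-negative as well. */
int shift_forward(int c, int amount)
{
    if (c >= 'a' && c <= 'z') {
        return 'a' + (c - 'a' + amount) % 26;
    }
    if (c >= 'A' && c <= 'Z') {
        return 'A' + (c - 'A' + amount) % 26;
    }
    return c;
}
```

For example, with KEY = 15, shift_forward('w', 26 - 15) returns 'h', undoing the encryption of 'h' shown in the encrypt.py transcript earlier.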

Your program should terminate with exit code 0 after reading and outputting all characters.

Your program should compile without warnings or errors by the following command:

clang -std=c11 -Wall -Wextra -pedantic -o decrypt decrypt.c

What do all these flags mean?

  • -std=c11 specifies the C standard we are using; in this course, we use C11.
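  • -Wall turns on a broad set of common warnings (despite the name, not literally all of them)
  • -Wextra turns on additional warnings not covered by -Wall
  • -pedantic warns about code that does not strictly conform to the ISO C standard, such as compiler-specific extensions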
  • -o decrypt tells the compiler to generate an executable named decrypt

Part 2: Letter Frequency Analysis

In a file named freq.c in the hw0 directory, write a C program that reads all characters from the standard input and prints the frequency of each letter in the text it reads.

A letter is either a lowercase letter or an UPPERCASE letter as defined above; both cases count as occurrences of the same letter (for example, both 'e' and 'E' count toward E). The frequency of a letter, as a percentage, is 100 times the number of its occurrences divided by the total number of letters in the text.
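Written as a formula, with n_ℓ the number of occurrences of letter ℓ (counting both cases) and N the total number of letters:

```latex
f_\ell = 100 \cdot \frac{n_\ell}{N}, \qquad N = \sum_{k=\mathrm{A}}^{\mathrm{Z}} n_k

f_{\mathrm{L}} = 100 \cdot \frac{3}{10} = 30.00, \qquad f_{\mathrm{O}} = 100 \cdot \frac{2}{10} = 20.00
```

The second line works through "Hello, World!", which contains N = 10 letters, of which 3 are L and 2 are O. The formula is undefined when N = 0, so check the reference output for the empty test (tests/empty.freq) to see how empty input is expected to be handled.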

The output written to the standard output must use the following format:

  • For each letter, in alphabetical order, print a line consisting of:
    • the UPPER case of the letter
    • a single additional space
    • a right-justified number in a six-character field giving the frequency of that letter as a percentage, rounded to the nearest hundredth. The field holds up to three digits before the decimal point, the decimal point, and exactly two digits after it; unused positions on the left are filled with spaces. Trailing zeros are included, but leading zeros are not, except for the single 0 before the decimal point when the number is less than 1 (e.g., 0.12).

A sample output is as follows:

A   7.67
B   1.58
C   2.27
D   3.93
E  11.97
F   2.13
G   1.80
H   6.32
I   6.67
J   0.12
K   0.93
L   4.47
M   2.91
N   6.44
O   8.22
P   1.52
Q   0.10
R   6.23
S   6.59
T   8.75
U   3.39
V   1.00
W   2.37
X   0.13
Y   2.46
Z   0.04

Hint

Do not format the number by hand; printf has a built-in format for this. Look it up.

Your program should terminate with exit code 0 after printing the frequencies.

Your program should compile without warnings or errors by the following command:

clang -std=c11 -Wall -Wextra -pedantic -o freq freq.c
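Regarding the hint above: printf's %f conversion accepts a minimum field width and a precision, so %6.2f rounds to two digits after the decimal point and pads on the left with spaces to a width of six characters. The sketch below uses a hypothetical helper, format_freq_line, that writes one output line into a buffer with snprintf so the result is easy to inspect; in freq.c the same format string could be passed straight to printf.

```c
#include <stdio.h>

/* Hypothetical helper: write one line of frequency output into buf.
 * %c prints the letter, followed by exactly one space; %6.2f rounds the
 * percentage to the nearest hundredth and right-justifies it in a field
 * at least 6 characters wide. Returns snprintf's result (the number of
 * characters that the full line needs, excluding the terminating '\0'). */
int format_freq_line(char *buf, size_t size, char letter, double percent)
{
    return snprintf(buf, size, "%c %6.2f\n", letter, percent);
}
```

For example, formatting 'E' with 11.97 produces "E  11.97", matching the sample output.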

Testing

Given the reference output produced in Producing reference output, you can use these files to check the output generated by your programs with diff.

For example, to test the C implementation of encrypt against the reference implementation in Python, one can run the following:

$ clang -std=c11 -Wall -Wextra -pedantic -o encrypt encrypt.c
$ ./encrypt < tests/all-letters.txt > test-output.enc
$ diff tests/all-letters.enc test-output.enc
$ rm test-output.enc # optional, but keeps the directory tidy
If diff produces no output, there is no difference between your program's output and the reference output, so your program passes that test. Repeat this for every test until you are confident your program is correct.

Written

The last task in this assignment is to complete the experiments and answer the questions in WRITTEN.txt.

Submission checklist

Everything below is inside the hw0 directory of your coursework repository.

  • decrypt.c contains your decryption program, and it compiles with the specified flags without warnings or errors
  • freq.c contains your frequency analysis program, and it compiles with the specified flags without warnings or errors
  • KEY is set to 15 for both implementations of encrypt and decrypt
  • WRITTEN.txt is finished
  • all files specified in Producing reference output are present
  • all changes are committed and pushed to your GitHub repository

Submit your program to Gradescope by selecting your coursework directory and the correct branch.

Grading

Component     Percentage
Correctness   70%
Style         20%
Written       10%

Warning: If your program cannot be compiled using the commands above without errors or warnings, you will receive 0 points for correctness (a program that fails to compile leaves no executable for us to run).


  1. Sorry, è, é, ê, ë, and the many other letters with diacritics, as well as all non-Latin letters.