Homework 1: Strings and Substrings
Due Monday, June 26, 2023 at 11:59pm
In this assignment, you will write a total of 5 functions in C. Although it doesn't seem like much, some of these functions require quite a bit of thinking and understanding.
- Read the entire assignment first before you start
- Start early and do not do all of the assignment in one sitting; coding is fun but fighting for hours with broken code is not
- Do not hesitate to seek help if you are stuck
Synopsis
readline.h: contains one function, readline, that reads a single line from a file. A file is separated into lines by the newline character ('\n'); this function reads the first line of arbitrary length into memory.
substring.h: contains four functions related to substrings. A substring is the characters between two pointers within a string. The functions that operate on substrings include 1) printing a substring, 2) calculating the length of a substring, 3) splitting a string into substrings by separators, and 4) joining substrings into a string by separators.
Written: You will answer some simple questions at the end.
Learning Objectives:
- Allocating, accessing, and manipulating C strings of arbitrary length
- Get comfortable around pointers
- Practice heap allocation and manual memory management
Getting started
We will keep using the coursework repository, the same repository where you wrote hw0. First of all, you should run git status to see if you have any uncommitted changes; if so, commit and push them before proceeding.
Run git pull upstream main. Doing this almost certainly triggers an automerge by git. When vim is launched to show the merge commit message, press <esc> :x <enter> to save and quit the editor (look at the bottom-left corner of your screen if you don't know where you're typing).
Run ls to confirm that you have received a hw1 directory.
Pay attention to any error messages that you might encounter and please ask for help if you run into any problems.
What have we provided to you?
Complete:
- Makefile: a file that defines how to build your C program. See below for more detail.
- WRITTEN.txt: this is where you will write the written part of the homework.
- cat.c: a finished C program that concatenates all files specified on the command line, similar to the terminal program cat. cat.c uses your implementation of readline, and you can use this program to test your readline implementation.
- diffw.c: a finished C program that compares two files while ignoring white space, similar to diff -w. This file uses readline and the split function from your implementation of substring.h. You can use this program to test your split function.
- readline-test.c: a finished unit test for readline.h using Criterion. See below for more detail.
- readline.h: a header file that declares the readline function's signature.
- substring-test.c: a finished unit test for substring.h using Criterion. See below for more detail.
- substring.h: a header file that declares all functions in the substring module.
Incomplete:
- substring.c: this is where you will write your implementation of the substring module.
- readline.c: this is where you will write your implementation of readline.
- tests/: this directory contains some simple tests. You need to create your own tests.
You should read through cat.c and diffw.c and understand fully what they do before writing your implementation.
Makefile
A Makefile is a powerful tool used primarily for managing the build process of software projects. It works with the make utility, which automatically builds executable programs and libraries from source code by reading files called Makefiles.
Makefiles can be very complicated and very powerful. For this assignment, you only need to know the following three commands:
- make all, or simply make, builds the cat and diffw executables.
- make test builds readline-test and substring-test, linking the Criterion testing framework.
- make clean cleans up all files produced by make all and make test.
As with hw0, all your C code needs to be built with -Wall -Wextra -pedantic -std=c11. This is also what is specified in the Makefile. In addition, -Werror is added to turn warnings into errors.
You should run make all and make test now to see if you are able to create all the executables without errors, although running these executables will almost certainly not work yet.
Running unit tests
In this course, we use the Criterion framework to create unit tests for modules. In *-test.c, you will see a series of test definitions. Each definition specifies a test case. The body itself is arbitrary C code that asserts some property about the module being tested via cr_assert and cr_expect.
Running the produced executable without arguments will simply run all the test cases. However, we recommend that you always run the tests with the following two arguments for clearer results:
- -j1 tells Criterion to run one test at a time. Normally, Criterion tries to run multiple tests concurrently.
- -f tells Criterion to stop as soon as a test fails instead of running the rest of the tests. This gives you an opportunity to figure out what went wrong before proceeding.
Specification
readline
A file is separated into lines by the newline character '\n'. A line contains at least one character. New lines begin at the start of the file and after each newline character; the newline character is included in the line that it terminates.
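To make the definition of a line concrete, here is a minimal sketch that counts the lines in an in-memory string using exactly these rules. The helper count_lines is hypothetical and is not part of the assignment; it works on a NUL-terminated buffer rather than a file, purely for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not part of the assignment): counts the lines in a
 * NUL-terminated buffer according to the definition above. */
size_t count_lines(const char *text) {
    assert(text != NULL);          /* abort on a violated precondition */
    size_t lines = 0;
    int in_line = 0;               /* nonzero while inside an unterminated line */
    for (const char *p = text; *p != '\0'; p++) {
        if (!in_line) {            /* first character of a new line */
            lines++;
            in_line = 1;
        }
        if (*p == '\n') {          /* the newline belongs to the line it ends */
            in_line = 0;
        }
    }
    return lines;
}
```

Under this definition, "a\nb" has two lines (the last one has no newline), "a\n" has one line, "\n\n" has two lines that each contain only a newline, and an empty buffer has no lines at all.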
readline reads the first unread line of the input stream from file such that repeated invocations of readline eventually consume the entire content of file.
The characters comprising the line are made into a C string (i.e. a NUL-terminated array). The C string is allocated on the heap and the pointer to the data is assigned to *line_p. It is the caller's responsibility to free the allocated data. The length of the string, which is the number of characters in the array except the terminating '\0' but including '\n' if any, is returned to the caller.
When readline is called with no more data to read, it sets *line_p to NULL and returns 0.
When readline is called with NULL file or NULL line_p, it may abort.
readline must not leak memory.
You will implement this function in readline.c. You may use any function from the standard library except getline.
substring
A substring is a structure consisting of a start pointer, pointing to the first character of the substring, and an end pointer, pointing to the first byte past the last character of the substring.
In this module, start and end are presumed to be valid pointers.
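The start/end representation can be sketched as follows. The type name demo_substring and both helper functions are hypothetical stand-ins; the real declaration in substring.h may use different names or qualifiers. The sketch only shows that a substring is a view into an existing string and that nothing is copied.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the type declared in substring.h; the real
 * declaration may differ. */
struct demo_substring {
    const char *start;  /* first character of the substring */
    const char *end;    /* first byte past the last character */
};

/* Builds a view of the characters s[from] .. s[to - 1] without copying. */
struct demo_substring demo_view(const char *s, size_t from, size_t to) {
    assert(s != NULL && from <= to && to <= strlen(s));
    struct demo_substring sub = { s + from, s + to };
    return sub;
}

/* Returns nonzero if the view holds exactly the characters of expected. */
int demo_matches(struct demo_substring sub, const char *expected) {
    assert(expected != NULL);
    size_t n = (size_t)(sub.end - sub.start);  /* distance between the pointers */
    return strlen(expected) == n && memcmp(sub.start, expected, n) == 0;
}
```

For the string "abc def", demo_view(text, 4, 7) covers "def": start points at 'd' and end points at the terminating '\0'. When start and end are equal, the substring is empty.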
substring_print prints all characters of the substring to file. This function may abort if file is NULL.
substring_length returns the length of the substring in number of characters.
split separates str into an array of substrings by separators sep.
The order of characters in sep has no meaning: a character is a separator if it appears in the string sep.
The separators are not included in the substrings.
For example, splitting "/abc def/ghi" based on separators "/ " (or " /") yields three substrings "abc", "def", and "ghi".
The array of substrings is allocated on the heap, and a pointer to it is assigned to *subs_p. It is the caller's responsibility to free the array. The length of the array is returned.
split may abort if any of the following is true:
- str is NULL
- sep is NULL or sep is an empty string
- subs_p is NULL
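The separator rule in the example above can be illustrated with a small sketch that only counts the pieces, using strspn and strcspn from <string.h>. The function count_pieces is hypothetical and is not one of the required functions. It also assumes that a run of consecutive separators produces no empty substrings; the example above only confirms this for the leading "/", so treat that as an assumption rather than part of the specification.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: counts the non-empty pieces of str delimited by any
 * character of sep. Assumes runs of separators yield no empty pieces. */
size_t count_pieces(const char *str, const char *sep) {
    assert(str != NULL);
    assert(sep != NULL && sep[0] != '\0');
    size_t count = 0;
    const char *p = str;
    for (;;) {
        p += strspn(p, sep);       /* skip a run of separator characters */
        if (*p == '\0') {
            break;                 /* nothing left but separators */
        }
        p += strcspn(p, sep);      /* skip over the piece itself */
        count++;
    }
    return count;
}
```

Because strspn and strcspn treat sep as a set of characters, count_pieces("/abc def/ghi", "/ ") and count_pieces("/abc def/ghi", " /") both return 3.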
Given an array of substrings subs of length length, and a separator string sep, join returns a heap-allocated string that has all substrings separated by sep. The caller of this function is responsible for freeing the returned string.
join may abort if subs is NULL or sep is NULL, but must handle the case where sep is an empty string.
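One step that is easy to get wrong is sizing the heap allocation for the joined string. The sketch below computes the number of bytes needed from the substring lengths alone. The helper joined_size is hypothetical, and it assumes that joining zero substrings yields an empty string, which the specification above does not state.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: bytes needed for the joined string, including the
 * terminating '\0'. lengths[i] is the length of the i-th substring and
 * sep_length is the length of sep. Assumes zero substrings give "". */
size_t joined_size(const size_t *lengths, size_t count, size_t sep_length) {
    assert(lengths != NULL || count == 0);
    size_t total = 1;                        /* room for the terminating '\0' */
    for (size_t i = 0; i < count; i++) {
        total += lengths[i];
    }
    if (count > 1) {
        total += (count - 1) * sep_length;   /* one separator between neighbors */
    }
    return total;
}
```

For example, joining "abc", "def", and "ghi" with ", " gives "abc, def, ghi", which is 13 characters plus the terminator, so 14 bytes. With an empty sep the same substrings need 10 bytes.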
Crashing vs Aborting
Crashing: An unexpected and abrupt termination of your C program due to errors like segmentation faults or division by zero. It's an indication of a bug and should be avoided at all costs, as it may result in data loss or system instability.
Aborting: A deliberate stoppage of your program when an unrecoverable error or violation of assumptions is detected. It is preferable to crashing, as it's a controlled exit and allows for post-mortem analysis. This is usually done by using the assert(condition) macro in <assert.h>: assert will abort the program if the condition is not met.
In this course, unless otherwise specified, your program must not crash, and crashing in testing will incur a higher penalty than aborting or producing wrong results.
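As a minimal sketch of the difference, the hypothetical function below checks its assumptions with assert before touching the pointer. Without the checks, a NULL argument would be dereferenced and the program would crash; with them, the program aborts with a message naming the failed condition.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical example function: returns the first character of a
 * non-empty string. */
char first_char(const char *s) {
    assert(s != NULL);       /* abort deliberately instead of crashing below */
    assert(s[0] != '\0');    /* an empty string has no first character */
    return s[0];
}
```

Note that assert is compiled out when NDEBUG is defined, so these checks are only active in builds that leave NDEBUG undefined.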
Testing
You should run the unit tests frequently throughout implementation. See Running unit tests for the commands to build and run.
Once your implementations pass all unit tests, you should build cat and diffw to run integration tests and to check for memory leaks.
The tests/ directory includes three simple text files for some rudimentary testing of cat.
To effectively test diffw, you need to create pairs of files to diff between.
You need to add and submit three (3) more test cases for either cat or diffw.
Make sure to check for memory leaks and errors by running cat or diffw under valgrind. You should see valgrind report:
==XXXXXX== All heap blocks were freed -- no leaks are possible
==XXXXXX==
==XXXXXX== For lists of detected and suppressed errors, rerun with: -s
==XXXXXX== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
You should not modify cat.c or diffw.c if you see memory errors or leaks. The two files are written according to the specification above. If you believe that there is an error, please let me know as soon as possible!
Written
You need to answer some questions in WRITTEN.txt.
Submission checklist
Everything below is inside the hw1 directory of your coursework repository.
- readline.c contains your implementation of the readline function.
- substring.c contains your implementation of the substring module, which has four functions.
- make all and make test produce no errors and successfully build cat, diffw, substring-test, and readline-test.
- WRITTEN.txt is finished.
- Three test cases are added to the tests/ directory.
- All changes are committed and pushed to your GitHub repository.
Submit your program to Gradescope by selecting your coursework directory and the correct branch.
Grading
Component | Percentage |
---|---|
Correctness | 70% |
Style | 20% |
Written | 10% |
Warning: If your program cannot be compiled using the commands above without errors or warnings, you will receive 0 points for correctness since there are no executables for us to run.