Homework 1: Strings and Substrings

Due Monday, June 26, 2023 at 11:59pm

In this assignment, you will write a total of 5 functions in C. Although it doesn't seem like much, some of these functions require quite a bit of thinking and understanding.

  • Read the entire assignment first before you start
  • Start early and do not do all of the assignment in one sitting; coding is fun but fighting for hours with broken code is not
  • Do not hesitate to seek help if you are stuck

Synopsis

readline.h: contains one function readline that reads a single line from a file. A file is separated into lines by the newline character ('\n'); this function reads the first line of arbitrary length into memory.

substring.h: contains four functions related to substrings. A substring is the characters between two pointers within a string. The functions that operate on substrings include 1) printing a substring, 2) calculating the length of a substring, 3) splitting a string into substrings by separators, and 4) joining substrings into a string by separators.

Written: You will answer some simple questions at the end.

Learning Objectives:

  • Allocating, accessing, and manipulating C strings of arbitrary length
  • Get comfortable around pointers
  • Practice heap allocation and manual memory management

Getting started

We will keep using the coursework repository, the same repository where you wrote hw0. First, run git status to see if you have any uncommitted changes; if so, commit and push them before proceeding.

Run git pull upstream main. This will almost certainly trigger an automatic merge by git. When vim opens showing the merge commit message, press <esc> : x <enter> to save and quit the editor (look at the bottom-left corner of your screen if you don't know where you're typing).

Run ls to confirm that you have received a hw1 directory.

Pay attention to any error messages that you might encounter and please ask for help if you run into any problems.

What have we provided to you?

Complete:

  • Makefile: a file that defines how to build your C program. See below for more detail.
  • WRITTEN.txt: this is where you will write the written part of the homework.
  • cat.c: a finished C program that concatenates all files specified in the command line, similar to the terminal program cat. cat.c uses your implementation of readline, and you can use this program to test your readline implementation.
  • diffw.c: a finished C program that compares two files while ignoring whitespace, similar to diff -w. This file uses readline and the split function from your implementation of substring.h. You can use this program to test your split function.
  • readline-test.c: a finished unit test for readline.h using Criterion. See below for more detail.
  • readline.h: a header file that declares the readline function's signature
  • substring-test.c: a finished unit test for substring.h using Criterion. See below for more detail.
  • substring.h: a header file that declares all functions in the substring module.

Incomplete:

  • substring.c: this is where you will write your implementation of the substring module.
  • readline.c: this is where you will write your implementation of readline
  • tests/: this directory contains some simple tests. You need to create your own tests.

You should read through cat.c and diffw.c and understand fully what they do before writing your implementation.

Makefile

A Makefile is a powerful tool used primarily for managing the build process of software projects. It works with the make utility, which automatically builds executable programs and libraries from source code by reading files called Makefiles.

Makefiles can be very complicated and are capable of much more than we use here. For this assignment, you only need to know the following three commands:

  • make all, or simply make, builds cat and diffw executables.
  • make test builds readline-test and substring-test, linking the Criterion testing framework.
  • make clean cleans up all files produced by make all and make test.

As with hw0, all your C code needs to be built with -Wall -Wextra -pedantic -std=c11. This is also what is specified in the Makefile. In addition, -Werror is added to turn warnings into errors.

You should run make all and make test now to confirm that all the executables build without errors, although running these executables will almost certainly not work yet.

Running unit tests

In this course, we use the Criterion framework to create unit tests for modules. In *-test.c, you will see definitions like

Test(test_suite_name, test_name)
{
      ...body...
}

This specifies a test case. The body itself is any arbitrary C code that asserts some property about the module being tested via cr_assert and cr_expect.

Running the produced executable without arguments will simply run all the test cases. However, we recommend you always run the tests with the following arguments for clearer results.

./readline-test -j1 -f
  • -j1 tells it that we want to run one test at a time. Normally, Criterion tries to run multiple tests concurrently.
  • -f tells it that we want to stop as soon as we fail a test instead of running the rest of the tests. This gives you an opportunity to figure out what went wrong before proceeding.

Specification

readline

int readline(FILE *file, char **line_p);

A file is separated into lines by the newline character '\n'. A line contains at least one character. New lines begin at the start of the file and after each newline character; the newline character is included in the line that it terminates.

readline reads the first unread line of the input stream from file such that repeated invocations of readline eventually consume the entire content of file.

The characters comprising the line are made into a C string (i.e. a NUL-terminated array). The C string is allocated on the heap and the pointer to the data is assigned to *line_p. It is the caller's responsibility to free the allocated data. The length of the string, which is the number of characters in the array except the terminating '\0' but including '\n' if any, is returned to the caller.

When readline is called with no more data to read, it sets *line_p to NULL and returns 0.

When readline is called with NULL file or NULL line_p, it may abort. readline must not leak memory.

You will implement this function in readline.c. You may use any function from the standard library except getline.

substring

A substring is a structure consisting of a start pointer, pointing to the first character of the substring, and an end pointer, pointing to the first byte past the last character of the substring. In this module, start and end are presumed to be valid pointers.

void substring_print(const struct substring sub, FILE *file);
substring_print prints all characters of the substring to file. This function may abort if file is NULL.

int substring_length(const struct substring sub);
substring_length returns the length of the substring in the number of characters.

int split(char *str, const char *sep, struct substring **subs_p);
split separates str into an array of substrings by separators sep. The order of characters in sep has no meaning: a character is a separator if it appears in the string sep. The separators are not included in the substrings.

For example, splitting "/abc def/ghi" based on separators "/ " (or " /") yields three substrings "abc", "def", and "ghi".

The array of substrings is allocated on the heap, and a pointer to it is assigned to *subs_p. It is the caller's responsibility to free the array. The length of the array is returned.

split may abort if any of the following is true:

  • str is NULL
  • sep is NULL or sep is an empty string
  • subs_p is NULL

char *join(struct substring *subs, int length, const char *sep);

Given an array of substrings subs of length length, and a separator string sep, join returns a heap-allocated string that has all substrings separated by sep. The caller of this function is responsible for freeing the returned string.

join may abort if subs is NULL or sep is NULL, but must handle the case where sep is an empty string.

Crashing vs Aborting

Crashing: An unexpected and abrupt termination of your C program due to errors like segmentation faults or division by zero. It's an indication of a bug and should be avoided at all costs, as it may result in data loss or system instability.

Aborting: A deliberate stoppage of your program when an unrecoverable error or violation of assumptions is detected. It is preferable to crashing, as it's a controlled exit and allows for post-mortem analysis. This is usually done with the assert(condition) macro from <assert.h>; assert will abort the program if the condition is not met.

In this course, unless otherwise specified, your program must not crash, and crashing in testing will incur a higher penalty than aborting or producing wrong results.

Testing

You should run the unit tests frequently throughout implementation. See Running unit tests for the commands to build and run.

Once your implementations pass all unit tests, you should build cat and diffw to run integration tests and check for memory leaks. The tests/ directory includes three simple text files for some rudimentary testing of cat.

To effectively test diffw, you need to create pairs of files to diff between.

You need to add and submit three (3) more test cases for either cat or diffw.

Make sure to check for memory leaks and errors by running

valgrind --leak-check=full ./cat file.txt
or
valgrind --leak-check=full ./diffw file1.txt file2.txt

You should see valgrind report:

==XXXXXX== All heap blocks were freed -- no leaks are possible
==XXXXXX==
==XXXXXX== For lists of detected and suppressed errors, rerun with: -s
==XXXXXX== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)

You should not modify cat.c or diffw.c if you see memory errors or leaks. The two files are written according to the specification above. If you believe that there is an error, please let me know as soon as possible!

Written

You need to answer some questions in WRITTEN.txt.

Submission checklist

Everything below is inside hw1 directory of your coursework repository.

  • readline.c contains your implementation of the readline function.
  • substring.c contains your implementation of the substring module, which has four functions.
  • make all and make test produce no errors and successfully build cat, diffw, substring-test, and readline-test.
  • WRITTEN.txt is finished.
  • Three test cases are added to tests/ directory.
  • All changes are committed and pushed to your GitHub repository.

Submit your program to Gradescope by selecting your coursework directory and the correct branch.

Grading

Component     Percentage
Correctness   70%
Style         20%
Written       10%

Warning: If your program cannot be compiled using the commands above without errors or warnings, you will receive 0 points in correctness since there will be no executables for us to run.