Lab #4: Strings and Structures¶

Due Date Wednesday, April 20th at 9pm.

The purpose of this lab is to give you practice working with:

strings,
structures and pointers to structures, and
arrays of pointers to structures.

You’ll also get practice with reusing functions.

You must complete this lab on a CS Linux machine. You can use the machines in CSIL or can use VSCode + ssh on your own machine to get to one of the Linux servers.

Getting Started¶

You will be using the same repository for all the labs in this course. To pick up the files you need for this lab (aka, the distribution), you should navigate to your cmsc15200/labs-GITHUB_USERID directory (where GITHUB_USERID) is replaced with your GitHUb account name and then run:

$ git pull upstream main

(We will use $ to indicate the Linux command-line prompt. It is not part of the command that you will run.)

If this command succeeded, you will name have a subdirectory named lab4. You will do all of your work for this lab in this directory. See the file README.md for a description of the files in the distribution.

Compiling and testing your code¶

As in earlier labs, we have provided a Makefile that has targets for compiling your code with tests that you write, for compiling your code with our automated tests, and for cleaning up generated files. To use this file to compile your code, run one of the following commands from the Linux command-line:

$ make student_test_lab4

$ make test_lab4

$ make

The first command will compile your code in lab4.c along with any tests you have added to student_test_lab4.c and will, if possible, generate an executable (student_test_lab4). To run the resulting executable, you can run:

$ ./student_test_lab4

We have provided some simple tests as examples in student_test_lab4.c. You should add tests of your own as your work on your code.

The second command will compile your code with our automated tests and will, if possible, generate an executable (test_lab4). To run all of the tests using the resulting executable, run this command from the command line:

$ ./test_lab4 -f --verbose

The -f flag tells the test code to stop after the first failure. The --verbose flag generates information which specific tests passed and failed.

The tests are broken into suites. To run a specific suite or test, you can add a filter. For example, to run all of the tests in count_occurrences suite, you would run:

$ ./test_lab4 -f --verbose --filter="count_occurrences/*"

To run a specific test, replace the * with the name of the test. For example, to run the first test in the count_occurrences suite, you would run:

$ ./test_lab4 -f --verbose --filter="count_occurrences/test0"

Deliverables¶

You are required to reuse functions written for earlier tasks in later tasks where appropriate.

Task 1¶

Complete the function match_at_index, which takes a test string, a target string, and an index and returns true if the target is a substring of the test string starting at the specified index. Here are a couple of sample calls:

match_at_index("abcdef", "cde", 2);  // yields true
match_at_index("abcdef", "cde", 3);  // yields false

In the first example, the string "cde" is a substring of "abcdef" that starts at index two. The second example yields false. Even though "cde" is a substring of the test string, it is not a substring of the test string starting at index three.

Restrictions: You are not allowed to use any of the functions in the C string library (string.h), including strlen for this task. You will not get full credit, if your implementation does not meet this restriction.

Task 2¶

Complete the function count_occurrences, which takes a test string and a target string and counts the number of times that the test string occurs in the target string. Here is a sample call to this function:

count_occurrences("xabcyabczabcabcw", "abc");  // yields 4

This call yields four, because the substring "abc" occurs four times in the test string ("xabcyabczabcabcw").

Occurrences are allow to overlap. So, this call:

count_occurrences("aaaaaabaaa", "aaa");   // yields 5

yields four: the occurrence starting at index 0, the occurrence starting at index 1, etc.

Restrictions: You are allowed to use strlen for this task, but not any of the other functions in the C string library (string.h) for this task. You will not get full credit, if your implementation does not meet this restriction.

Task 3¶

Complete the function count_non_overlapping, which takes a test string and a target string and counts the number of non-overlapping occurrences of the target string in the test string. Here is a sample call to this function:

count_non_overlapping("xabcyabczabcabcw", "abc");  // yields 4

This call yields four, because the substring "abc" occurs four times in the test string ("xabcyabczabcabcw") and none of these occurrences overlap.

This call, on the other hand:

count_non_overlapping("aaaaaabaaa", "aaa");   // yields 3

yields three: the occurrence of "aaa" starting at index 0, the occurrence starting at index 3, and the occurrence starting at index 7.

Task 4¶

This task and the next use the following type definition for representing pairs of integers as structures:

typedef struct pair {
    int v0;
    int v1;
} pair_t;

We will use the natural comparison order on pairs, that is, given two pairs (v0, v1) and (w0, w1), we will say that

the pair (v0, v1) is less than the pair (w0, w1) if v0 < w0 or v0 == w0 and v1 < w1
the pair (v0, v1) equals the pair (w0, w1) if v0 == w0 and v1 == w1, and
the pair (v0, v1) is greater than the pair (w0, w1) if v0 > w0 or v0 == w0 and v1 > w1

Complete the function compare_pairs, which takes two pointers to pairs and returns:

a value less than zero if the first pointer refers to a pair that is less than the pair referenced by the second pointer,
zero if the first pointer refers to a pair that is equal to the pair referenced by the second pointer, and
a value greater than zero if the first pointer refers to a pair that is greater than the pair referenced by the second pointer.

Here is a sample use of this function:

pair_t p0 = {0, 10};    // create a pair struct for the pair (0, 10)
pair_t p1 = {0, -10};   // create a pair struct for the pair (0, -10)

// pass pointers to pair structures to compare_pairs
int actual = compare_pairs(&p0, &p1);    // yields a value greater than 0

The result is a value that is greater than 0, because the pair (0, 10) is greater than the pair (0, -10).

Task 5¶

We will say that two pointers to structures match, if the structures they refer to are equal as described above.

Complete the function find_match, which takes a list of pointers to pairs and a pointer to a pair (the target) and returns the index of the first element in the array that matches the target. The function should return -1 if no match is found.

Here is a sample use of this function:

// Simple way to create an array of pointers to pairs
// for test cases.
pair_t p0 = {0, 0};
pair_t p1 = {1, 1};
pair_t p2 = {2, 2};
pair_t p3 = {0, 0};
pair_t *pairs[] = {&p0, &p1, &p2, &p3};

int actual = find_match(pairs, 4, &p3);  //yields 0

The result of this call is 0: the pair referenced in element one ((0, 0)) has the same value as pair referenced by the target ((0, 0)).

Notice that we are comparing the values of the pairs that referenced by the pointers rather than comparing the pointers themselves.

Some useful stuff¶

Most labs in this course will contains some material that is intended to help you become more fluent in Linux. Today you’ll have a chance to work with some useful utilities: cut, grep, etc.

There are no deliverables for this section. That does not mean you should skip it!

In this section, you will use redirection, piping, and some new commands to work with a data file from the command line. Take your time working through this exercise and remember that you can find information about a command (what it does, how to use it, etc.) using the man pages:

$ man <command>

First, download this data file using this command from within your lab4 directory:

wget http://classes.cs.uchicago.edu/archive/2022/spring/15200-1/cta.csv.zip

Unzip it (with the unzip command or otherwise) and make sure it exists under the name cta.csv.

A few words about the data file. The data source is the Chicago Data Portal (https://data.cityofchicago.org/). It contains daily ridership numbers per L station in the city’s L system. The “daytype” is either W for weekday, A for Saturday, and U for Sunday/holiday.

The data file is large. Try cat cta.csv and you’ll see the console swamped with text. The command head will print the first 10 lines of a file to the terminal. Try head cta.csv at the prompt. Try man head next. You will see you can specify the number of lines to display using the option -n. To see the first five lines of the file, try head -n 5 cta.csv. Then try head -n 15 cta.csv. Is the command happy with head -n5 or head -n15?

The dual of head is tail, which shows the last 10 lines of file. Read about tail with man tail and you will discover a similar option -n. Try tail -n 5 and tail -n 15 on the data file (as well as tail -n5 and tail -n15 with the data file).

Read about the wc instruction via man wc. Try wc on the CTA datafile and read the results.

Let’s try to discover particular information within the file. For starters, let’s see a list of the unique stations in the data. The following piped command will generate distinct station names, in alphabetical order:

cut -d, -f2 cta.csv | sort | uniq

Use man to read about cut, sort, and uniq. What happens if you leave out the sort in the pipes above? What if you exchange the order of sort and uniq?

There is a small problem with the list of stations generated by the command above; you will notice it includes stationname as a station. One way to address this problem is to edit the data file itself so as to remove the offending line. But why bother to edit the file when another pipe will do? We can strip out stationname with grep -v as follows:

cut -d, -f2 cta.csv | sort | uniq | grep -v stationname

Read about the purpose of this flag with man grep.

Having taken the trouble to collate the list of unique stations in the data file, we might as well save the results in case we want them some other time. One way to do this would be to use the mouse to copy and paste the terminal text and save a text file. Or, better yet: redirection! Try this:

cut -d, -f2 cta.csv | sort | uniq | grep -v stationname > stations.txt

Check your results with cat stations.txt.

Now, let’s assume you’re interested in the extreme ridership days – the highest and the lowest – in the year 2010 for the Western-Forest Park station. The command grep "Western-Forest Park" cta.csv will pull all the station data, but won’t restrict it to 2010. For that, we can pipe the data through another grep, as follows:

grep "Western-Forest Park" cta.csv | grep 2010

Look closely. Did the command yield the right results? If not, modify the command accordingly.

Once you have the 2010 data for Western-Forest Park, discover the lowest daily ridership as follows. Pipe the 2010 data to a use of cut that pulls out the ridership column; sort the data in numeric order (not alphabetic) by calling sort with the appropriate option; pipe the result to head -n1. Then, find the highest ridership similarly with tail.

Grading¶

For this assignment, the weights will be:

Completeness: 80%
Code Quality: 20%

The completeness score will be determined by the automated tests. The code quality score will largely be determined by whether your implementation follows the restrictions described for Tasks 1 through 3 and reuses your functions effectively.

Cleaning up¶

Before your submit your work, we strongly encourage you to get your directory into a clean state. Run make clean to remove any executables that are laying around and then run git status . to make sure you did not forget to add/commit any of the required files.

Also, remove directives, such as:

// YOUR CODE HERE

// Replace 0.0 with an appropriate return value

from your code. Make sure to recompile and rerun the tests after you clean up. It is easy to introduce a syntax error at this stage.

Submission¶

Before you upload your submission, make sure you have added the required information to the header comment in lab4.c:

your name,
the names of anyone beyond the course staff you discussed the assignment with, and
links to any resources that you used other than course materials and the course textbook.

Please remove the explanation for what is required. Note: write None in the relevant field, to indicate that you did not use any sources and/or consult anyone beyond the course staff.

To submit your work, you will need to add, commit, and push lab4.c to GitHub and then upload it to Gradescope under the lab4 assignment. Make sure to choose the right assignment on Gradescope!

You are welcome to upload your code to Gradescope multiple times before the deadline, but it is wildly inefficient to use Gradescope as a compute server. You should test your code on the CS Linux machines and only upload it when you think you are finished or when you are getting close to the deadline and want to do a “safety” submission.

You are responsible for making sure that the code you upload compiles and runs. You will get zero credit for code that does not compile.

Finally, a reminder that we will not accept late work.