Lab #4: Strings and Structures¶
Due Date Wednesday, April 20th at 9pm.
The purpose of this lab is to give you practice working with:
strings,
structures and pointers to structures, and
arrays of pointers to structures.
You’ll also get practice with reusing functions.
You must complete this lab on a CS Linux machine. You can use the machines in CSIL or can use VSCode + ssh on your own machine to get to one of the Linux servers.
Getting Started¶
You will be using the same repository for all the labs in this course.
To pick up the files you need for this lab (aka, the distribution),
you should navigate to your cmsc15200/labs-GITHUB_USERID
directory
(where GITHUB_USERID
) is replaced with your GitHUb account name
and then run:
$ git pull upstream main
(We will use $
to indicate the Linux command-line prompt. It is
not part of the command that you will run.)
If this command succeeded, you will name have a subdirectory named
lab4
. You will do all of your work for this lab in this
directory. See the file README.md
for a description of the files
in the distribution.
Compiling and testing your code¶
As in earlier labs, we have provided a Makefile
that has targets for
compiling your code with tests that you write, for compiling your code
with our automated tests, and for cleaning up generated files. To use
this file to compile your code, run one of the following commands
from the Linux command-line:
$ make student_test_lab4
$ make test_lab4
$ make
The first command will compile your code in lab4.c
along with any
tests you have added to student_test_lab4.c
and will, if possible,
generate an executable (student_test_lab4
). To run the resulting
executable, you can run:
$ ./student_test_lab4
We have provided some simple tests as examples in
student_test_lab4.c
. You should add tests of your own as your
work on your code.
The second command will compile your code with our automated tests and
will, if possible, generate an executable (test_lab4
). To run all
of the tests using the resulting executable, run this command from the
command line:
$ ./test_lab4 -f --verbose
The -f
flag tells the test code to stop after the first failure.
The --verbose
flag generates information which specific tests
passed and failed.
The tests are broken into suites. To run a specific suite or test,
you can add a filter. For example, to run all of the tests in count_occurrences
suite, you would run:
$ ./test_lab4 -f --verbose --filter="count_occurrences/*"
To run a specific test, replace the *
with the name of the test.
For example, to run the first test in the count_occurrences
suite, you would run:
$ ./test_lab4 -f --verbose --filter="count_occurrences/test0"
Deliverables¶
You are required to reuse functions written for earlier tasks in later tasks where appropriate.
Task 1¶
Complete the function match_at_index
, which takes a test string, a
target string, and an index and returns true if the target is a
substring of the test string starting at the specified index. Here
are a couple of sample calls:
match_at_index("abcdef", "cde", 2); // yields true
match_at_index("abcdef", "cde", 3); // yields false
In the first example, the string "cde"
is a substring of
"abcdef"
that starts at index two. The second example yields
false. Even though "cde"
is a substring of the test string, it is
not a substring of the test string starting at index three.
Restrictions: You are not allowed to use any of the functions in the C string library (string.h
), including strlen
for this task. You will not get full credit, if your implementation does not meet this restriction.
Task 2¶
Complete the function count_occurrences
, which takes a test string
and a target string and counts the number of times that the test
string occurs in the target string. Here is a sample call to this
function:
count_occurrences("xabcyabczabcabcw", "abc"); // yields 4
This call yields four, because the substring "abc"
occurs four
times in the test string ("xabcyabczabcabcw"
).
Occurrences are allow to overlap. So, this call:
count_occurrences("aaaaaabaaa", "aaa"); // yields 5
yields four: the occurrence starting at index 0, the occurrence starting at index 1, etc.
Restrictions: You are allowed to use strlen
for this task, but not any of the other functions in the C string library (string.h
) for this task. You will not get full credit, if your implementation does not meet this restriction.
Task 3¶
Complete the function count_non_overlapping
, which takes a test
string and a target string and counts the number of non-overlapping
occurrences of the target string in the test string. Here is a sample
call to this function:
count_non_overlapping("xabcyabczabcabcw", "abc"); // yields 4
This call yields four, because the substring "abc"
occurs four
times in the test string ("xabcyabczabcabcw"
) and none of these
occurrences overlap.
This call, on the other hand:
count_non_overlapping("aaaaaabaaa", "aaa"); // yields 3
yields three: the occurrence of "aaa"
starting at index 0, the
occurrence starting at index 3, and the occurrence starting at
index 7.
Restrictions: You are allowed to use strlen
for this task, but not any of the other functions in the C string library (string.h
) for this task. You will not get full credit, if your implementation does not meet this restriction.
Task 4¶
This task and the next use the following type definition for representing pairs of integers as structures:
typedef struct pair {
int v0;
int v1;
} pair_t;
We will use the natural comparison order on pairs, that is, given two
pairs (v0, v1)
and (w0, w1)
, we will say that
the pair
(v0, v1)
is less than the pair(w0, w1)
ifv0 < w0
orv0 == w0
andv1 < w1
the pair
(v0, v1)
equals the pair(w0, w1)
ifv0 == w0
andv1 == w1
, andthe pair
(v0, v1)
is greater than the pair(w0, w1)
ifv0 > w0
orv0 == w0
andv1 > w1
Complete the function compare_pairs
, which takes two pointers to
pairs and returns:
a value less than zero if the first pointer refers to a pair that is less than the pair referenced by the second pointer,
zero if the first pointer refers to a pair that is equal to the pair referenced by the second pointer, and
a value greater than zero if the first pointer refers to a pair that is greater than the pair referenced by the second pointer.
Here is a sample use of this function:
pair_t p0 = {0, 10}; // create a pair struct for the pair (0, 10)
pair_t p1 = {0, -10}; // create a pair struct for the pair (0, -10)
// pass pointers to pair structures to compare_pairs
int actual = compare_pairs(&p0, &p1); // yields a value greater than 0
The result is a value that is greater than 0, because the pair (0, 10)
is
greater than the pair (0, -10)
.
Task 5¶
We will say that two pointers to structures match, if the structures they refer to are equal as described above.
Complete the function find_match
, which takes a list of pointers
to pairs and a pointer to a pair (the target) and returns the index of
the first element in the array that matches the target. The function
should return -1
if no match is found.
Here is a sample use of this function:
// Simple way to create an array of pointers to pairs
// for test cases.
pair_t p0 = {0, 0};
pair_t p1 = {1, 1};
pair_t p2 = {2, 2};
pair_t p3 = {0, 0};
pair_t *pairs[] = {&p0, &p1, &p2, &p3};
int actual = find_match(pairs, 4, &p3); //yields 0
The result of this call is 0
: the pair referenced in element one
((0, 0)
) has the same value as pair referenced by the target
((0, 0)
).
Notice that we are comparing the values of the pairs that referenced by the pointers rather than comparing the pointers themselves.
Some useful stuff¶
Most labs in this course will contains some material that is intended
to help you become more fluent in Linux. Today you’ll have a chance
to work with some useful utilities: cut
, grep
, etc.
There are no deliverables for this section. That does not mean you should skip it!
In this section, you will use redirection, piping, and some new commands to work with a data file from the command line. Take your time working through this exercise and remember that you can find information about a command (what it does, how to use it, etc.) using the man pages:
$ man <command>
First, download this data file using this command from within your lab4
directory:
wget http://classes.cs.uchicago.edu/archive/2022/spring/15200-1/cta.csv.zip
Unzip it (with the unzip command or otherwise) and make sure it exists
under the name cta.csv
.
A few words about the data file. The data source is the Chicago Data Portal (https://data.cityofchicago.org/). It contains daily ridership numbers per L station in the city’s L system. The “daytype” is either W for weekday, A for Saturday, and U for Sunday/holiday.
The data file is large. Try cat cta.csv
and you’ll see the console swamped with text. The command head
will print the first 10 lines of a file to the terminal. Try head cta.csv
at the prompt. Try man head
next. You will see you can specify the number of lines to display using the option -n
. To see the first five lines of the file, try head -n 5 cta.csv
. Then try head -n 15 cta.csv
. Is the command happy with head -n5
or head -n15
?
The dual of head
is tail
, which shows the last 10 lines of
file. Read about tail
with man tail
and you will discover a
similar option -n
. Try tail -n 5
and tail -n 15
on the
data file (as well as tail -n5
and tail -n15
with the data
file).
Read about the wc
instruction via man wc
. Try wc
on the CTA datafile and read the results.
Let’s try to discover particular information within the file. For starters, let’s see a list of the unique stations in the data. The following piped command will generate distinct station names, in alphabetical order:
cut -d, -f2 cta.csv | sort | uniq
Use man
to read about cut
, sort
, and uniq
. What
happens if you leave out the sort
in the pipes above? What if you
exchange the order of sort
and uniq
?
There is a small problem with the list of stations generated by the command above; you will notice it includes stationname as a station. One way to address this problem is to edit the data file itself so as to remove the offending line. But why bother to edit the file when another pipe will do? We can strip out stationname with grep -v as follows:
cut -d, -f2 cta.csv | sort | uniq | grep -v stationname
Read about the purpose of this flag with man grep
.
Having taken the trouble to collate the list of unique stations in the data file, we might as well save the results in case we want them some other time. One way to do this would be to use the mouse to copy and paste the terminal text and save a text file. Or, better yet: redirection! Try this:
cut -d, -f2 cta.csv | sort | uniq | grep -v stationname > stations.txt
Check your results with cat stations.txt
.
Now, let’s assume you’re interested in the extreme ridership days – the highest and the lowest – in the year 2010 for the Western-Forest Park station. The command grep "Western-Forest Park" cta.csv
will pull all the station data, but won’t restrict it to 2010. For that, we can pipe the data through another grep, as follows:
grep "Western-Forest Park" cta.csv | grep 2010
Look closely. Did the command yield the right results? If not, modify the command accordingly.
Once you have the 2010 data for Western-Forest Park, discover the lowest daily ridership as follows. Pipe the 2010 data to a use of cut that pulls out the ridership column; sort the data in numeric order (not alphabetic) by calling sort
with the appropriate option; pipe the result to head -n1
. Then, find the highest ridership similarly with tail
.
Grading¶
For this assignment, the weights will be:
Completeness: 80%
Code Quality: 20%
The completeness score will be determined by the automated tests. The code quality score will largely be determined by whether your implementation follows the restrictions described for Tasks 1 through 3 and reuses your functions effectively.
Cleaning up¶
Before your submit your work, we strongly encourage you to get your
directory into a clean state. Run make clean
to remove any
executables that are laying around and then run git status .
to
make sure you did not forget to add/commit any of the required files.
Also, remove directives, such as:
// YOUR CODE HERE
// Replace 0.0 with an appropriate return value
from your code. Make sure to recompile and rerun the tests after you clean up. It is easy to introduce a syntax error at this stage.
Submission¶
Before you upload your submission, make sure you have added the
required information to the header comment in lab4.c
:
your name,
the names of anyone beyond the course staff you discussed the assignment with, and
links to any resources that you used other than course materials and the course textbook.
Please remove the explanation for what is required. Note: write
None
in the relevant field, to indicate that you did not use any
sources and/or consult anyone beyond the course staff.
To submit your work, you will need to add, commit, and push lab4.c
to GitHub and then upload it to Gradescope under the lab4
assignment. Make sure to choose the right assignment on Gradescope!
You are welcome to upload your code to Gradescope multiple times before the deadline, but it is wildly inefficient to use Gradescope as a compute server. You should test your code on the CS Linux machines and only upload it when you think you are finished or when you are getting close to the deadline and want to do a “safety” submission.
You are responsible for making sure that the code you upload compiles and runs. You will get zero credit for code that does not compile.
Finally, a reminder that we will not accept late work.