152: Week 9

Goals for this homework

Continue practicing implementation with structs
Better understand hash tables and how the components work together

Set Up

Create a hw9 directory for this assignment.

We have provided some starting files. Like other warmups, and unlike your project, the design is well specified so that you can just fill in the implementation.

As with any container class, there are two very separate jobs for you to do - get the structure you're storing in the hash table ready by providing the set of functions it needs and writing the hash table code.

Here are some files to start you out.

word.h - this is almost identical to the word.h that you had last time. The only difference is a function called get_key that takes in a wordcount struct (actually a void * that we know is a wordcount struct) and returns an integer. Below, we will describe how to calculate the integer. We do not provide word.c - start with your word.c from last week and add in the new function.
hash.h - This declares a set of functions that will be needed by the hash table in order to store and manage the strings it is holding. We will be making a hash table that uses linear probing to handle collisions.
hash.c - The implementation of the hash functions. We have provided the implementation of print_hash_table_entries so that everyone's output is consistent.
main.c - the main function that reads input from the user, creates the hash table, and tests the hash table.

You will also need the following files (you might as well touch and add them to your respository now):

Makefile - makes executable named hash_table

Problem: hash table

Your exercise today is to complete this hash table implementation by providing definitions for the functions specified in the header files. The function generate_hash and get_key merits some additional explanation, which appears below in this document. The rest of the functions are specified in the header files in some detail; please review those files carefully. For our exercise, we assume that we are storing (key,value) pairs such as ("uchicago", "60637"). The key being stored is an unsigned long int, but the value is a void pointer that can be used to store any type of value.

When displaying the hash table (print_hash_table_entries), you must follow our format: the bucket indexes must appear as numbers at the left, enclosed in square brackets, one per line, with the contents of each bucket following it. If the hash-table is chained, each Key,Value pair is shown in parenthesis with a "->" symbol. The key is what get_key returns, and the value is what print_word returns.

You need two functions from word.c - a compare function and a get_key function. You will use compare_words that you already wrote in a previous assignment. Then you will need to write get_key as specified below.

You need to write a function (get_key) to get from the wordcount structs stored in the hash table to a key. In this case, the key is not guaranteed to be unique because the keyspace is larger than the 2^64. However, our compare operation in wordcount does more than compare keys - it compares words - so we're okay.

We will encode the word the same way the number was encoded for the print_word exercise 2 in HW 2. Follow this recipe for computing an unsigned long int key for a string:

1. Initialize an unsigned long int result to 0 (unsigned long res = 0;).
2. Loop over the characters in the string. For every character, set the result to be 27 times itself plus the ASCII numeric value of the character.

For example, the hash code for the word FORK, given that the ASCII values of the characters in FORK are 70, 79, 82 and 75, is as follows:


res = 0
res = 27*0+6     = 6
res = 27*6+15    = 177
res = 27*177+18  = 4797
res = 27*4797+11 = 129530
(As you can see, the numbers get big.)

Now that your wordcount struct has produced an integer key, the hash table needs to take that key and map it to a location within the hash table. You need to convert this long integer into an integer index into the hash table array (generate_hash).

We will go with the simplest of approaches here. To identify the index of FORK in a hash table of size, say, 33, you compute the remainder of the key when divided by 33. In this case 129530%33 is 5, so FORK would be inserted into and/or located in the bucket with index 5 (the sixth bucket in the table).

Once you have implemented the hash function above, you will need to implement methods to handle collisions in your hash table. You are only required to impelment linear probing.

Linear Probing: If a hash table entry is already occupied, search through the hash table entries in linear fashion until you find an empty slot and place the entry there. The logic for insertion and search will have to be implemented in the insert_linear() and search_linear() functions respectively.

The insert functions return an integer value, denoting success or failure of the insert (1 on success, 0 on failure), while the search functions return the item that matches the key. An insert only failes when the table is full - you are not required to expand the size of the table dynamically.

Putting it all together

We want to be able test the correctness and evaluate the relative merits/demerits of each collision handling mechanism. For this, we should read key-value pairs from a file and insert/search them in sequence. The input files should have exactly one word per line, without duplicates. We have provided the functions insert_from_file() and search_from_file() to perform insert and search operations respectively. The function do_all_tests() in main.c can be used to test all three hash table implementations on the same file and print out the average operations required for each hash table type.

Submit

Don't forget to add and commit all of your files!

CS152 S17 Warmup 9

Goals for this homework

Set Up

Problem: hash table

Putting it all together

Submit

Useful Links

Resources