Due Thursday, June 2nd, 11:59pm

Goals for this homework

  • Continue practicing implementation with structs
  • Better understand hash tables and how the components work together

You may work in your duet partnership on this assignment if you wish.

Set Up

Create a hw9 directory for this assignment.

We have provided some starting files. Like other warmups, and unlike your project, the design is well specified so that you can just fill in the implementation.


Here are some files to start you out.
  • htbl_string.h - This declares a set of three functions that will be needed by the hash table in order to store and manage the strings it is holding. We are making a generic hash table, so we need to make functions that it can hold in function pointers.
  • htbl_string.c - The implementation of the string functions. I have provided the implementation of string_print so that everyone's output is consistent.
  • htbl_list.h - This defines the functions necessary for linked list collision in the hash table. This is very similar to your linked list code. In fact, I would suggest having your linked list code handy as you implement it. Think about which parts you need and which you don't. The reason you can't use it directly is that it stores the key for efficiency. It only calls the string_compare function if the keys don't match.
  • htbl.h - this defines the hash table functions.
  • htbl.c - this contains skeleton functions for the hash table functions.
  • test_htbl.c - the main function that reads input from the user, creates the hash table, and tests the hash table.
You will also need the following files (you might as well touch and add them to your respository now):
  • htbl_list.c - you may either start from a blank file or start from your llist.c file.
  • Makefile - makes executable named hash_table

Problem: hash table

Your exercise today is to complete this hash table implementation by providing definitions for the functions specified in the header files. The function good_hash merits some additional explanation, which appears below in this document. The rest of the functions are specified in the header files in some detail; please review those files carefully. When displaying the hash table (htbl_print), you must follow our format: the bucket indexes must appear as numbers at the left, one per line, with the contents of each bucket following it. When we collect your completed work, we should be able to use it as follows:
$ make htbl ... $ ./htbl < ./cnets2016 ... $ ./htbl -s 99 < ./cnets2016 ... $ ./htbl -b -s 99 < ./cnets2016 ...
Note that the code to enable the -s option is part of htbl-main.c. It uses a library function getopt, which is the standard command-line option utility for C programs. The application also supports a command-line option b (for "bad"); run with -b to use a bad hash function (instead of the good one whose details follow). Please note that sometimes the hash function works differently on different machines (i.e., computes a different hash code per string, given what unsigned long int means on that system), leading to different configurations of the hash table on identical input. Therefore, if your output differs from another's, it doesn't necessarily mean anything is wrong. You are welcome to share the output of this lab on piazza, since sharing the output alone is not giving anything away. If you do share, make sure you specify the platform (i.e. the operating system and version and CPU) on which you compiled and ran the code.

You need to write two functions to get from the strings stored in the hash table to the index in which you should store the item. First, you need to calculate a key from the string.

Follow this recipe for computing an unsigned long int key for a string:
1. Initialize an unsigned long int result to 17 (unsigned long res = 17;).
2. Loop over the characters in the string. For every character, set the result to be 37 times itself plus the ASCII numeric value of the character.

(This hashing algorithm recipe is from Effective Java by Joshua Bloch.)
For example, the hash code for the word FORK, given that the ASCII values of the characters in FORK are 70, 79, 82 and 75, is as follows:

res = 17 res = 37*17+70 = 699 res = 37*699+79 = 25942 res = 37*25942+82 = 959936 res = 37*959936+75 = 35517707 (As you can see, the numbers get big.)

This key gets stored with the item so that you don't need to go through this again for the same string. Then, you need a hash function that obtains the index from the key.

To identify the index of FORK in a hash table of size, say, 33, you compute the remainder of the key when divided by 33. In this case 35517707%33 is 5, so FORK would be inserted into and/or located in the bucket with index 5 (the sixth bucket in the table).


You are implementing linked list collision resolution. The linked list implementation belongs in htbl_list.*. I encourage you to copy your own linked list code liberally. Just note that the nodes have not only a void pointer to data and a next point, but also the key. Both store the key and make use of the stored key to increase the efficiency of your searches.

Submit

Don't forget to add and commit all of your files!