Dictionaries

Introduction

The objective of this lab is to give you practice using dictionaries, a useful data type built into Python. A dictionary (dict for short) is a generalization of arrays/lists that associates keys with values. In computer science, this data type is also referred to as an associative array or a map.

You will also learn simple statements to read a file in Python. We will cover file I/O extensively in future lectures.

By the end of the lab, you should be able to:

  • Perform basic operations on dictionaries
  • Apply dictionaries in several usage scenarios
  • Read a file using Python

Getting started

Once you have collected the lab materials, navigate to the lab4 directory and fire up ipython.

Dictionaries

As a recap, when using arrays/lists we give an index and receive a value, like this:

names = ['Alice', 'Bob', 'Charlie']
print(names[1])
Bob

We might think about this list in the following way, i.e., a list associates an index with a value:

0: 'Alice'
1: 'Bob'
2: 'Charlie'

A dict generalizes what we can use as an index (left of the :). Dictionaries can use many different types of values as an index, not just integers. Here is an example that associates items in a grocery store with their prices:

'Apple':   .89
'Banana':  .39
'Bread':  2.50

Each product ('Apple', 'Banana', 'Bread') is associated with a value: the price. 'Apple' works like an index in an array. When using dictionaries, we refer to the index as the key. We can specify this dict in Python with the following syntax:

prices = {'Apple': .89, 'Banana': .39, 'Bread': 2.50}

which adheres to the following pattern:

new_dict = {key1: value1,  key2: value2, ... }

Note the use of curly braces and colons.

We get the value associated with a specific key with standard indexing syntax:

print(prices['Banana'])
0.39

As we’ve described, dictionaries can use many different types for keys and any type for values. For this dict, we have:

Give -- string
Get  -- float
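As a small illustration of keys that are not strings, the sketch below uses integer keys and tuple keys. The dicts room_names and grid_labels are hypothetical and are not part of the lab materials.

```python
# Hypothetical examples: these dicts are not part of the lab files.

# Integer keys: give an int, get a string.
room_names = {101: 'Lobby', 205: 'Library'}
print(room_names[205])

# Tuple keys: give a (row, column) pair, get a string.
grid_labels = {(0, 0): 'top left', (0, 1): 'top right'}
print(grid_labels[(0, 1)])
```

The give/get pattern is the same in each case; only the types on either side change.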

Question: What if we try to print the price of an item that is not in the dict? Try it yourself. If you are unsure whether a key is present, it is good practice to test whether the key exists in the dict before accessing its value, like this:

if 'Chocolate' in prices:
    print(prices['Chocolate'])
else:
    print("No Chocolate!")

We add new key/value pairs to the dict by assigning to a key with standard indexing syntax:

prices['Chocolate'] = 1.50

prices now contains:

'Apple':         .89
'Banana':        .39
'Bread':        2.50
'Chocolate':    1.50
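The same assignment syntax also replaces the value of a key that is already present. The sketch below shows both cases on a separate copy named sale_prices, so the prices used in the rest of the lab stay unchanged. The name sale_prices and the new banana price are made up for illustration.

```python
prices = {'Apple': .89, 'Banana': .39, 'Bread': 2.50}

# Work on a copy so the original prices dict is left alone.
sale_prices = dict(prices)

sale_prices['Chocolate'] = 1.50   # new key: the pair is added
sale_prices['Banana'] = .45       # existing key: the old value .39 is replaced

print(sale_prices['Banana'])
print(len(sale_prices))           # one entry per key, so 4 entries
```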

Looping over Dictionaries

We often want to perform an action on every element in a list. How do we do this task with dictionaries? The three methods .keys(), .values(), and .items() return “list-like structures” (dictionary view objects, which we will cover in more detail in future lectures) that contain the keys, values, and key/value tuples respectively. In the example below, we convert them to lists explicitly.

print(list(prices.keys()))
['Apple', 'Banana', 'Bread', 'Chocolate']
print(list(prices.values()))
[0.89, 0.39, 2.5, 1.5]
print(list(prices.items()))
[('Apple', 0.89), ('Banana', 0.39), ('Bread', 2.5), ('Chocolate', 1.5)]

In Python 3.7 and later, a dict keeps its keys in the order in which they were inserted, which is why 'Chocolate' appears last.

We can also loop over these views as we have looped over lists in the past. The following loop prints a nicely formatted string displaying the cost of each product:

for product in prices.keys():
    price = '$' + str(prices[product])
    print(product + ' costs ' + price)

Apple costs $0.89
Banana costs $0.39
Bread costs $2.5
Chocolate costs $1.5

Looping over the keys of a dict is so common that we can omit .keys(), as in:

for product in prices:
    ...
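When both the key and the value are needed, looping over .items() avoids the extra lookup. The following is a minimal sketch; it assumes that a for loop may unpack each key/value tuple into two variables, which this lab has not yet introduced.

```python
prices = {'Apple': .89, 'Banana': .39, 'Bread': 2.50, 'Chocolate': 1.50}

# Each element of prices.items() is a (key, value) tuple,
# so the loop can unpack it into two variables.
for product, price in prices.items():
    print(product + ' costs $' + str(price))
```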

Reading Files

We often store data in files. How do we open a file and read its contents into a Python variable? The following command opens a file for reading.

f = open(filename, 'r')

f is a file object and filename is a string, such as 'my_file.txt'. We can read this file using the readlines() method, which returns a list of strings, one for each line of the file (including its trailing newline).

lines = f.readlines()
print(lines)

lines is a list of strings. We can now use for loops and string methods to process this data. Once we are done, we need to close the file as follows:

f.close()
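The open, read, and close steps can be tried end to end with the sketch below. To stay self-contained it first writes a small throwaway file. The file name, its contents, and the use of the 'w' mode and the tempfile module are assumptions made for this illustration and are not part of the lab.

```python
import os
import tempfile

# Create a small throwaway file so the example is self-contained.
# The file name and contents are made up for illustration.
folder = tempfile.mkdtemp()
filename = os.path.join(folder, 'my_file.txt')
f = open(filename, 'w')
f.write('first line\nsecond line\n')
f.close()

# Open the file for reading, read every line, then close it.
f = open(filename, 'r')
lines = f.readlines()
f.close()

print(lines)                # each string keeps its trailing '\n'
for line in lines:
    print(line.strip())     # strip() removes surrounding whitespace
```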

Exercises

The following exercises are designed to introduce some of the ways in which dictionaries can be used. You will work with data contained in files.

Grocery Store

In this exercise we will write code to handle purchases in a grocery store. You will create a dict that encodes the products that the store sells and their corresponding prices. Assume that each time a customer buys a set of items, the cash register generates a new file. This file describes the type and quantity of each item (e.g. two apples and one loaf of bread). We will read this file, compute the total cost of the purchase ($4.28) and write the cost out to a second file. We presume that the second file will then be sent to the cashier. We will do this task in stages.

Do the work described in a new file called grocery.py.
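Before writing any file-handling code, it can help to work out one purchase by hand. The sketch below does so for two apples and one loaf of bread, using the four example prices from earlier in this lab. It is only an arithmetic check, not the structure your grocery.py needs to follow.

```python
prices = {'Apple': .89, 'Banana': .39, 'Bread': 2.50, 'Chocolate': 1.50}

# Two apples and one loaf of bread, worked out by hand.
total = 2 * prices['Apple'] + 1 * prices['Bread']

# round() hides tiny floating-point error when displaying money.
print(round(total, 2))
```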

Loading in the prices

In the example discussed above, we associated a price with each item in a grocery store.

'Apple':       .89
'Banana':      .39
'Bread':      2.50
'Chocolate':  1.50

The file prices.txt contains a more thorough list. Perform the following steps to create a dict out of this file.

  1. Look at this file in a text editor and make sure you understand the format in which the data are stored.

  2. Open this file for reading in Python.

  3. Read the lines into a list of strings. Print them out to see what they look like.

  4. Create an empty dict named prices (syntax for a new dict is dict_name = {}).

    We need to convert this list of lines into a dict mapping the first element of each line to the float of the second element of each line. For example the first line:

    'Apple .89\n'
    

    should cause us to add the following key/value pair to our dict. Note that .89 is now a float and that Python will print a leading zero on floats less than one.

    prices['Apple'] = .89
    
  5. Figure out how to do this transformation for a single line using s.split() and float(s), where s is a string. Try calling these functions in your IPython session on various strings to understand their behavior. Your code should look something like this.

    line = lines[0]
    # your code here. Make product and cost variables.
    # They should be like 'Apple' and .89 respectively
    prices[product] = cost
    
  6. Wrap this code with a for loop so that we do this subtask for each line in the file.

  7. Verify that you have a complete price list by printing out the cost of an orange.

  8. Close the file once you’re done.

Put this code in a function read_into_dict(filename), which takes the name of the file to be read and returns the prices dict that we have just constructed.
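For step 5, the behavior of s.split() and float(s) can be explored on a made-up line before applying them to the real data. The string below is hypothetical; it only imitates the format shown for prices.txt.

```python
# A made-up line in the same 'name value\n' format as prices.txt.
s = 'Pear 1.25\n'

parts = s.split()          # splits on whitespace; the trailing '\n' is dropped
print(parts)               # ['Pear', '1.25']

number = float(parts[1])   # converts the string '1.25' to the float 1.25
print(number)              # 1.25

print(float('.89'))        # 0.89 (a leading zero is printed)
```

Note that parts[1] is still a string until float is applied to it.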

Reading in a single receipt

The file receipt.txt contains the items and quantities from a single sale. This file was produced by the cash register.

  1. Read this file into a dict as you did for prices. Note that you should still be able to use read_into_dict for this task, since prices.txt and receipt.txt use the same format.
  2. You will repeat this step for each receipt (there are a few receipts).

We want to compute the cost of each sale. Write a function calc_cost that takes in two dictionaries:

  1. prices - a dict mapping each product to the single unit price
  2. receipt - a dict mapping each product in a sale to the quantity

and returns a float with the total sale cost. Write this function and test it on the three receipts in your lab4 directory.

Printing total cost

Having processed a receipt, and calculated the total cost, your program should print the total cost. For example:

receipt3.txt

    Chocolate 4
    Coffee 1

Output on screen as a result of a print statement:

    10.0

Write a function process_receipt that takes in a prices dict and the name of a receipt file and does the following:

  1. Computes the total price of the sale (use your previous work).
  2. Prints the price so that you can examine the value.

Counting

In the example above, we used dictionaries to represent unchanging values. The prices and quantities always stayed the same. Sometimes, we also need to change values in a dict after they are initially defined. In the following example, we want to count the number of times each letter grade was given for a quiz.

Here are the results of the quiz:

grades = ['B', 'C', 'A', 'A', 'D', 'B', 'B', 'A', 'C']

And the associated counts:

counts = {'A': 3, 'B': 3, 'C': 2, 'D': 1}

Because the grades list is so short, we can make the counts variable by hand. For a class with more students, this task would be tedious. Your first goal for this task is to write a function that takes in a list like grades and returns a dict like counts.

Do your work in a file called grades.py. First, write a function called count_grades that takes a list of grades as an argument and returns a dict of grade counts.

This task is the classic histogram problem. The problem can be solved with the following steps:

  1. Create an empty counts dict (the syntax is just dict_name = {}).
  2. For each item in grades do one of the following:
    1. If that grade is not in the counts dict then add that grade to the dict with a value of 1.
    2. If that grade is already in counts then add 1 to that grade’s value.

You may check if a key is in a dict using the key in dict syntax. For example:

'Apples' in {'Apples': 1, 'Bananas': 4}
True
'Chocolate' in {'Apples': 1, 'Bananas': 4}
False

Next, write a function process_grades that takes the name of a file, reads in this file, and converts the information it contains into a list like the grades variable above. Use count_grades to convert this list into a dict that counts the number of times each grade occurs in the file. Finally, process_grades should print nicely formatted text that follows the format below (your numbers, however, will be different). Use the file grades.txt, which contains letter grades from a recent quiz, to test your functions.

A -- 3
B -- 3
C -- 2
D -- 1

Note that the grades do not have to appear in alphabetical order.

Other Data Types in Dictionaries

You know very well by now that dictionaries keep key/value pairs, but the values do not have to be simple primitives such as an integer, a float, or a string. They can be more complex data types such as a tuple, a list, or another dict. Similarly, we can use various data types for the keys as well, such as numbers, strings, or tuples; mutable types such as lists cannot be used as keys.
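The sketch below shows values that are lists and values that are dicts. The dicts pets and stock are hypothetical data, not taken from the lab files.

```python
# Hypothetical data, not taken from the lab files.

# Values that are lists: give a string, get a list of strings.
pets = {'Alice': ['cat', 'dog'], 'Bob': []}
pets['Bob'].append('fish')          # the list stored under 'Bob' is changed in place
print(pets['Bob'])                  # ['fish']

# Values that are dicts: give a string, get another dict.
stock = {'Apple': {'price': .89, 'quantity': 12}}
print(stock['Apple']['quantity'])   # 12
```

Indexing twice, as in stock['Apple']['quantity'], first looks up the inner dict and then looks up a key inside it.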

Your task here is to write a function list_grades that:

  1. Reads the file grades_names.txt, which contains a list of student names together with their grades.
  2. Generates a dict such that keys are letter grades and values are lists of student names associated with each grade. As in the Grocery Store exercise, you will need to parse each line in the file. You will also need to test whether a grade already exists in the dict. If the grade does not exist, add it as a new key with its value being a single-element list. Otherwise, simply append the name of the student to the corresponding list. Note that, if the same name appears multiple times for a given letter grade, you should append the name multiple times.
  3. Prints a nicely formatted version of the dict like the one below (again, your exact output will be different).

A -- ['Alice', 'Adam', 'Alex']
B -- ['Bob', 'Beth', 'Bill']
C -- ['Charlie', 'Cathy']
D -- ['David']

Hint: you should be able to follow the structure of grades.py and reuse a lot of your code.

A useful library function

Code like:

if key not in d:
    d[key] = 0
d[key] = d[key] + 1

occurs frequently. Python dictionaries have a built-in method called get that simplifies this type of code. The expression d.get(key, 0) returns d[key] if the key occurs in the dict and 0 if it does not. The above example can be rewritten using get as follows:

d[key] = d.get(key, 0) + 1
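One way to convince yourself that the two forms are equivalent is to run both on the same data and compare the results. The sketch below does this with the grades list from the Counting section; the names d1 and d2 are introduced only for this comparison.

```python
grades = ['B', 'C', 'A', 'A', 'D', 'B', 'B', 'A', 'C']

# Version 1: explicit membership test.
d1 = {}
for key in grades:
    if key not in d1:
        d1[key] = 0
    d1[key] = d1[key] + 1

# Version 2: the same update written with get.
d2 = {}
for key in grades:
    d2[key] = d2.get(key, 0) + 1

print(d1 == d2)   # True: both versions build the same dict
print(d2)
```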

In general, code of the form:

if key not in d:
   d[key] = some_default_value
d[key] = some_transformation(d[key])

can be rewritten as:

d[key] = some_transformation(d.get(key, some_default_value))
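As one possible instance of this general form, the sketch below uses an empty list as the default value and "add one name to the list" as the transformation. The pairs list is hypothetical data and does not reflect the actual contents of grades_names.txt.

```python
# Hypothetical (name, grade) pairs, not the contents of grades_names.txt.
pairs = [('Alice', 'A'), ('Bob', 'B'), ('Adam', 'A')]

d = {}
for name, key in pairs:
    # some_default_value is []; the transformation adds one name to the list.
    d[key] = d.get(key, []) + [name]

print(d['A'])   # ['Alice', 'Adam']
print(d['B'])   # ['Bob']
```

Here the + operator builds a new list on every update, which keeps the example close to the pattern above; appending to the existing list in place is an equally reasonable choice.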

When finished

When finished with the lab, please check in your work (assuming you are inside the lab directory):

git add grocery.py
git add grades.py
git commit -m "Finished with lab4"
git push

No, we’re not grading this, we just want to look for common errors.