Dictionaries

Introduction

The objective of this lab is to give you practice using dictionaries, a useful data type built into Python. A dictionary (dict for short) is a generalization of arrays/lists that associates keys with values. In computer science, this data type is also referred to as an associative array or a map.
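As a quick illustration of the key-to-value association described above, here is a minimal sketch of the basic operations. The variable stock and its contents are made up for this example and are not part of the lab materials.

```python
# Hypothetical data, used only to illustrate basic dictionary operations.
stock = {"apples": 3, "pears": 5}

# Look up the value associated with a key.
print(stock["pears"])       # 5

# Add a new key-value pair, then replace an existing value.
stock["plums"] = 2
stock["apples"] = 4

# len counts the key-value pairs; in tests whether a key is present.
print(len(stock))           # 3
print("pears" in stock)     # True
```

Unlike a list, which is indexed by consecutive integers, a dictionary is indexed by whatever keys you choose to store in it.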

By the end of the lab, you should be able to:

Perform basic operations on dictionaries
Apply dictionaries in several usage scenarios

Getting started

Open up a terminal and navigate (cd) to your cs121-aut-16-username directory, where username is your CNetID. Run git pull upstream master to collect the lab materials and git pull to sync with your personal repository.

Once you have collected the lab materials, navigate to the lab4 directory and fire up ipython3.

You will do your work in this lab in the file named lab4/cfpb.py.

Data

We will be using data from the Consumer Financial Protection Bureau’s Consumer Complaint Database. Each complaint has information such as:

the company,

the date the complaint was received,

a unique ID,

the issue,

the product,

the consumer’s complaint narrative,

the company’s public response,

the consumer’s home state,

and the consumer’s ZIP code.

The file cfpb.py includes code that defines a variable, cfpb_16, containing 1000 complaints received in 2016.

Dictionaries as a simple data representation

Dictionaries provide a mechanism for mapping keys (often, but not always, strings) to values. They are often used to represent multi-part data, like the CFPB complaint data discussed above.

We could store this information in a list:

complaint_as_list = [
    'Wells Fargo & Company',
    '07/29/2013',
    '468882',
    'Managing the loan or lease',
    'Consumer Loan',
    '',
    'Closed with explanation',
    'VA',
    '24540',
    ...]

But then we would have to keep track of the fact that the name of the company is at index 0 (complaint_as_list[0]), the date received is at index 1 (complaint_as_list[1]), and so on. Dictionaries allow us to use more meaningful values to access the different parts of a complaint. In particular, we can use strings as keys. Here, for example, is the same complaint represented using a dictionary:

complaint_as_dict = {
    'Company': 'Wells Fargo & Company',
    'Company public response': '',
    'Company response to consumer': 'Closed with explanation',
    'Complaint ID': '468882',
    'Consumer complaint narrative': '',
    'Consumer consent provided?': 'N/A',
    'Consumer disputed?': 'No',
    'Date received': '07/29/2013',
    'Date sent to company': '07/30/2013',
    'Issue': 'Managing the loan or lease',
    'Product': 'Consumer Loan',
    'State': 'VA',
    'Sub-issue': '',
    'Sub-product': 'Vehicle loan',
    'Submitted via': 'Phone',
    'Tags': '',
    'Timely response?': 'Yes',
    'Wells Fargo & Company': 2,
    'ZIP code': '24540'}

Given such a dictionary, we can extract the name of the company using the string "Company" as the index or key: complaint_as_dict["Company"]. We can extract the home state of the complainant using the expression complaint_as_dict["State"].

Notice that while the types of the keys are all the same (strings), the types of values associated with the keys are different. This arrangement is not required but is very common.
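The lookups described above can be tried on a small example. This sketch uses an abridged copy of the complaint (only four of its keys), and the variable names complaint, company, state, and has_narrative are chosen for this illustration.

```python
# Abridged copy of the complaint shown above; only a few keys are kept.
complaint = {
    "Company": "Wells Fargo & Company",
    "Date received": "07/29/2013",
    "Product": "Consumer Loan",
    "State": "VA",
}

# Index the dictionary with a key to get the associated value.
company = complaint["Company"]
state = complaint["State"]
print(company, state)

# Indexing with a key that is not present raises KeyError,
# so check with "in" first when a key might be missing.
has_narrative = "Consumer complaint narrative" in complaint
print(has_narrative)   # False for this abridged copy
```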

We will start with a few tasks that take a list of complaints, each represented as a dictionary like the one shown above, and compute a simple value.

Task 1

Write a function:

def count_complaints(complaints, company_name):

that takes a list of complaint dictionaries and the name of a company as a string and returns the number of complaints received for that company.

Task 2

Write a function:

def find_companies(complaints):

that takes a list of complaints and returns a list (or set) of the companies that received at least one complaint.

Note

Python has a built-in set data structure that will be useful for this task. The expression set() creates an empty set. The add method can be used to add an element to the set. For example:

s = set()
s.add("a")
s.add("b")
s.add("a")
s.add("c")
print(s)

yields the set:

{'a', 'b', 'c'}

You can also pass a list to the set constructor, as in set(["a", "b", "a", "c"]), and it will construct a set from the elements of the list. Note that sets do not preserve order, so {'a', 'b', 'c'} and {'b', 'c', 'a'} are both possible results from evaluating set(["a", "b", "a", "c"]).
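Here is a short sketch of a few more set operations that may be handy. The variable names letters and ordered are chosen for this example.

```python
letters = set(["a", "b", "a", "c"])

# Duplicates are dropped, so only three elements remain.
print(len(letters))                    # 3

# Two sets are equal when they hold the same elements, whatever the order.
print(letters == {"b", "c", "a"})      # True

# Membership tests use the in operator.
print("a" in letters, "z" in letters)  # True False

# sorted returns a list in a predictable order, which is handy for printing.
ordered = sorted(letters)
print(ordered)                         # ['a', 'b', 'c']
```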

Counting with dictionaries

The complaint representation we discussed in the last section is static in the sense that the contents of a complaint dictionary do not change. Dictionaries are also used in more dynamic ways. For example, let’s say we wanted to compute the number of complaints received per company.

Our goal is to compute a dictionary that maps a company name to the number of complaints received about that company. The dictionary would include an entry for every company that received at least one complaint.

We could start this task by using the result of Task 2 to initialize a dictionary that maps each company name to zero:

by_company = {}
for company in find_companies(complaints):
    by_company[company] = 0

We could then loop over the complaints, extract the company from each complaint, and update the associated count:

for complaint in complaints:
    c = complaint["Company"]
    by_company[c] = by_company[c] + 1

This approach requires two passes over the data: one to identify the companies and one to compute the counts. It would be better to do the computation in one pass over the data.

We can use the in operator to check whether a given key already appears in the dictionary, and initialize the value associated with the key if it does not. Using this operator, we can rewrite our counting code as follows:

by_company = {}

for complaint in complaints:
    c = complaint["Company"]
    if c in by_company:
        by_company[c] = by_company[c] + 1
    else:
        by_company[c] = 1

We could simplify this code a bit using not in, as in:

by_company = {}

for complaint in complaints:
    c = complaint["Company"]
    if c not in by_company:
        by_company[c] = 0
    by_company[c] = by_company[c] + 1

Finally, the get method for dictionaries allows us to specify a value to use as a default if a key does not appear in a dictionary:

by_company = {}

for complaint in complaints:
    c = complaint["Company"]
    by_company[c] = by_company.get(c, 0) + 1
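To see that the variants above compute the same result, here is a small runnable sketch. The three complaints and the company names in them are made up for this example, and the names counts_with_in and counts_with_get are chosen for illustration.

```python
# Three made-up complaints that carry only the "Company" key.
complaints = [
    {"Company": "Acme Bank"},
    {"Company": "Example Credit Union"},
    {"Company": "Acme Bank"},
]

# Version that checks for the key with "not in".
counts_with_in = {}
for complaint in complaints:
    c = complaint["Company"]
    if c not in counts_with_in:
        counts_with_in[c] = 0
    counts_with_in[c] = counts_with_in[c] + 1

# Version that uses get with a default of 0.
counts_with_get = {}
for complaint in complaints:
    c = complaint["Company"]
    counts_with_get[c] = counts_with_get.get(c, 0) + 1

print(counts_with_get)
print(counts_with_in == counts_with_get)   # True
```

Note that get only returns the default value; it does not add the key to the dictionary. The assignment on the left-hand side is what creates the entry.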

Task 3

Write a function:

def count_by_state(complaints):

that takes a list of complaint dictionaries and returns a dictionary that maps a state to the number of complaints reported from that state.

Nested Dictionaries

Dictionaries can be nested, that is, the value associated with a key can itself be a dictionary. For example, we might have a dictionary that maps each company name to another dictionary that maps a state to the number of complaints about that company in that state. Here’s an abridged version of this dictionary computed using the 2016 complaint data:

by_company_by_state = {
    'Mid-American Financial Group, Inc': {'OH': 1},
    'Fortren Funding LLC': {'NJ': 2},
    'Tucker Financing Inc.': {'FL': 1},
    'Network Funding, L.P.': {'DC': 1},
    'Absolute Mortgage Company Inc.': {'NJ': 1, 'WA': 1},
    'Payday America Inc.': {'IL': 1, 'MN': 2}
}

The expression by_company_by_state['Absolute Mortgage Company Inc.']['NJ'] would yield 1, the number of complaints made from New Jersey about this company.
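The following sketch shows a few ways to read values out of a nested dictionary. It copies two entries from the abridged dictionary above. The names inner, nj, unknown, and total, and the company "No Such Company", are made up for this illustration.

```python
# Two entries copied from the abridged dictionary above.
by_company_by_state = {
    "Absolute Mortgage Company Inc.": {"NJ": 1, "WA": 1},
    "Payday America Inc.": {"IL": 1, "MN": 2},
}

# The first index selects the inner dictionary, the second selects the count.
inner = by_company_by_state["Payday America Inc."]
print(inner["MN"])                       # 2

# Chained get calls return 0 instead of raising KeyError
# when either the company or the state is missing.
nj = by_company_by_state.get("Payday America Inc.", {}).get("NJ", 0)
unknown = by_company_by_state.get("No Such Company", {}).get("NJ", 0)
print(nj, unknown)                       # 0 0

# Total complaints for one company: sum the inner dictionary's values.
total = sum(by_company_by_state["Payday America Inc."].values())
print(total)                             # 3
```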

Task 4

Write a function:

def count_by_company_by_state(complaints):

that takes a list of complaints and computes the by_company_by_state dictionary described above.

Dictionaries can also map keys to lists or even to lists of dictionaries.
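As a minimal sketch of a dictionary whose values are lists, the example below groups words by their first letter. The data and the names words and by_letter are made up for this illustration and are unrelated to the complaint data.

```python
# Made-up data: group words by their first letter.
words = ["apple", "avocado", "banana", "cherry", "blueberry"]

by_letter = {}
for word in words:
    first = word[0]
    # Create an empty list the first time a letter is seen.
    if first not in by_letter:
        by_letter[first] = []
    by_letter[first].append(word)

print(by_letter["a"])   # ['apple', 'avocado']
print(by_letter["b"])   # ['banana', 'blueberry']
```

Each list keeps its elements in the order in which they were appended.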

Task 5

Write a function:

def complaints_by_company(complaints):

that takes a list of complaint dictionaries and returns a dictionary that maps the name of a company to a list of the complaint dictionaries that concern that company.

When Finished

When you have finished the lab, please check in your work (assuming you are inside the lab directory):

git add cfpb.py
git commit -m "Finished with lab4"
git push

No, we’re not grading this, we just want to look for common errors.