Course Search Engine: Backend¶
Due: Saturday, February 12th at 4:30pm CST
You may work alone or in a pair for this assignment. If you plan to work in a pair, please read the getting started instructions before you start your work!
In this assignment, you will be building the backend for a course search tool. You combine information gathered from the University’s course catalog and the time schedules web page to respond to searches such as: “Find all courses that have Plato in their description, meet on Tuesdays between 10:30am and 3pm, and are within a 10 minute walk of Ryerson.”
Getting started¶
Using the invitation URL provided on Ed Discussion, create a repository for your PA #3 files.
If you are going to work individually, follow the invitation URL provided on Ed Discussion and, when prompted for a team name, simply enter your CNetID.
If you are going to work in a pair, the process involves a couple of extra steps. The process is described in our Coursework Basics page. Please head over to that page, and return here once you’ve created your team repository.
Next, you will need to initialize your repository. If you are working in a pair only one of you should complete these steps. If you repeat these steps more than once, you may break your repository.
First, create a TEAM
variable in the terminal that contains either your CNetID (if you’re working individually) or your team name (if you’re working in a pair):
with your GitHub username):
TEAM=replace_me_with_your_cnetid_or_team_name
(remember you can double-check whether the variable is properly set by running echo $TEAM
)
Finally, run these commands (if you don’t recall what some of these commands do, they are explained at the end of the Git Tutorial)::
cd ~/capp30122
git clone git@github.com:uchicago-CAPP30122-win-2022/pa3-$TEAM.git
cd pa3-$TEAM
You will find the files you need for the programming assignment directly in the root of your repository.
Before describing the specifics of your task, we will briefly explain how to work with URLs and grab pages from the web.
Please note that this means you will not be able to start working on the assignment until you request and setup your pair repository.
Once you follow these instructions, your repository will contain a
directory named pa3
. That directory will contain:
README.txt:
description of the contents of the directory.ui:
a sub-directory containing the Django user interface code. You will modify a file namedcourses.py
in this directory. Do not modify any of the other files.
Data¶
For this assignment, you will use a sqlite3
database, named
ui/course_information.sqlite3
, that we constructed using the index
gathered by our catalog scraper from PA #2 and data scraped from the
timeschedules web page.
You can use the sqlite3
command-line interface to explore the
database and run sample queries. Recall that the command .tables
will list the names of the tables stored in the database, the
.schema
command will list the schema (definition) for particular
table, and the .quit
command will exit sqlite3
. For example,
you can run the following from within your pa3/ui
directory:
$ sqlite3 course_information.sqlite3
SQLite version 3.19.3 2017-06-27 16:48:08
Enter ".help" for usage hints.
sqlite> .tables
catalog_index gps sections
courses meeting_patterns
sqlite> .schema courses
CREATE TABLE courses
(
course_id integer, -- unique ID assigned to each course
dept varchar(4), -- 4 character department code
course_num varchar(5), -- 5 digit course number, represented as a 5 character string
title varchar(100) -- course title, represented as a string with up to 100 characters
);
sqlite> .quit
Note that the string sqlite>
is the sqlite command-line prompt.
Warm up exercise #1¶
Before you start writing code for this assignment, we strongly
recommend getting comfortable with querying the database
using the sqlite3
command-line interface.
Here are some queries that it would be helpful to understand:
Find the titles of all courses with department code “CMSC” in the course table.
Find the department names, course numbers, and section numbers for courses being offered on MWF at 10:30am (represented as 1030)
Find the department names and course numbers for courses being offered in Ryerson on MWF between 10:30am and 3pm (represented as 1500).
Find the department names, course numbers, and course titles for courses being offered on MWF at 9:30am (represented as 930) that have the words “programming” and “abstraction” in their title/course description.
Place these results inside warmup1.sql
. These will not be
graded. However, they provide evidence that you attempted to complete
the warmup exercises.
Using Python and sqlite3¶
The library sqlite3
allows a Python program to connect to
a sqlite3 database and make queries. You will use the following
functions from this library:
sqlite3.connect(db_filename):
takes the name of a database as a
string and opens or creates a sqlite3 database of that name and
returns a connection object.
conn.cursor():
returns a cursor object that can be used to execute
queries on the connected database (conn
) and get information about
the schema for a query result. conn
is a connection object.
c.execute(q, args):
takes q
, a string containing a SQL query
in which parameters are specified with question marks (?), and a list
of values for the parameters (one per ? in the query string) and
returns an object that can be used to access the resulting table.
c
is a cursor object.
r.fetchall():
returns a list of lists with one entry per tuple in
the result. r
is the result of a call to execute
.
conn.close():
closes the connection to the database. conn
is a
connection object.
Warm up Exercise #2¶
Write parameterized versions of the queries from warm up exercise #1
from within ipython3
using the sqlite3
library.
Place these results inside warmup2.py
. These will not be
graded. However, they provide evidence that you attempted to complete
the warmup exercises.
Django Web Interface¶
We have provided a Django web interface for your search tool. To run it, you’ll need to install the appropriate version of Django using the Linux commands:
$ cd; pip install -Iv django==2.0.2 --user
To start the interface go to the directory pa3/ui
in your
repository and run the command:
$ python3 manage.py runserver
It will be helpful to run this command in its own terminal window.
Once the interface is started, you can use it by pointing a browser on
the machine your are using (your VM or a machine in CSIL) to
http://127.0.0.1:8000/
.
When the user hits the submit button, the interface will call the
function find_courses
in the file pa3/ui/courses.py
with a
dictionary containing values corresponding to the completed fields.
Here is a list of the possible keys for the dictionary and the types of their associated values:
dept
department code (string)
day
meeting pattern (list of strings)
time_start
earliest starting time (integer in range 0 to 2359)
time_end
latest ending time (integer in range 0 to 2359)
building_code
building code (string)
walking_time
maximum walking time from specified building (integer). You will always get building and walking_time together.
enrollment
lower and upper bound (inclusive) on the size of the class (pair of integers)
terms
catalog search terms (list of strings)
A few notes:
If the user’s input includes a value for
day
, a class will match it if it meets on any one of the specified meeting patterns. For example, the input{"day":["MWF", "TR"]}
matches classes that meet on a MWF schedule or a TR schedule. The SQL operatorIN
will be useful for this field.time_start
/time_end
should be considered as inclusive bounds. Furthermore, you should not assume that you receive both time fields. For example, if the user interface passes along a dictionary with a field fortime_start
but does not include a field fortime_end
, your program should take the specified starting time as a lower bound and return all courses that meet at that time or after it.The bounds specified in the
enrollment
pair should be considered inclusive. The SQL operatorBEWTEEN x AND y
will be useful for this field.A specific course matches a string of terms only if all of the terms occur in the catalog index with the course_id for that course. For example, the terms [“economics”, “sciences”, “mathematics”] match “CMSC 12200” (course id: 611), because the tuples
(611,"sciences")
,(611, "economics")
, AND(611, mathematics)
occur in thecatalog_index
table.
If you check the “show args_to_ui” box, the interface will show the dictionary that will be passed to find_courses
.
Your task¶
Your task is to implement the function find_courses(args_to_ui),
which takes a dictionary as described above and performs a search to
find courses/sections that match the user’s input.
The result of this search process will be a relation. Your function must return a tuple with the attribute (column) names for the relation and a list of the tuples in the relation. For example, if the input from the user contains:
{
"terms": ["science", "mathematics", "economics"],
"day": ["MWF"],
"time_start": 900,
"time_end": 1500
}
the output of the function would be:
(['dept', 'course_num', 'title', 'section_num', 'day', 'time_start', 'time_end', 'enrollment'],
[('CMSC', '12200', 'Computer Science with Applications II.', '01', 'MWF', 930, 1020, 76),
('CMSC', '12200', 'Computer Science with Applications II.', '02', 'MWF', 1330, 1420, 64)])
When the interface first starts up, it will call find_courses
with
the empty dictionary. If the input dictionary is empty, find_courses
should
return ([], [])
.
We have provided you with a function to ease the task of retrieving
the column names for the table generated by your query. After you have
run your query, pass your cursor object into the function
get_header(cursor)
to retrieve the column names for your generated
table (in list format).
Querying the Database
Your function will need to gather information from
course_information.sqlite3
to process the user’s input. There are
many possible combinations of inputs and you should not hard-code them
all into your implementation. Instead, your code should construct a
query from the parameters specified by the user.
For example, given the dictionary:
{
"time_start": 1030,
"time_end": 1500,
"day": ["MWF"]
}
Your implementation should construct a query similar to:
SELECT courses.dept,
courses.course_num,
courses.title,
sections.section_num,
meeting_patterns.day,
meeting_patterns.time_start,
meeting_patterns.time_end,
sections.enrollment
FROM courses JOIN sections JOIN meeting_patterns
ON courses.course_id = sections.course_id AND
sections.meeting_pattern_id = meeting_patterns.meeting_pattern_id
WHERE meeting_patterns.time_start >= ? AND
meeting_patterns.time_end <= ? AND
meeting_patterns.day in (?)
along with an argument tuple containing (1030, 1500, 'MWF')
.
(Note: the formatting of the query is merely for clarity.)
The join method for strings will be very useful for generating a query string. For example:
l = ["a", "b", "c"]
", ".join(l) #yields the string "a, b, c"
What Attributes Should Be Included in the Output?
The attributes included in the output will depend on the parameters specified by the user. The table below shows the attributes (columns) that should be included for each possible type of input (rows). The output should include the union of the attributes for each of the specified inputs.
dept |
course_num |
title |
section_num |
day |
time_start |
time_end |
enrollment |
building_code |
walking_time |
|
terms |
X |
X |
X |
|||||||
dept |
X |
X |
X |
|||||||
day |
X |
X |
X |
X |
X |
X |
X |
X |
||
enrollment |
X |
X |
X |
X |
X |
X |
X |
X |
||
time_start |
X |
X |
X |
X |
X |
X |
X |
X |
||
time_end |
X |
X |
X |
X |
X |
X |
X |
X |
||
building_code & walking_time |
X |
X |
X |
X |
X |
X |
X |
X |
X |
X |
The attributes should be ordered as shown in the table above.
For example, examples given the dictionary {"dept":"URDU"}
as
input, your find_courses
function should return:
(['dept', 'course_num', 'title'],
[('URDU', '10300', 'First-Year Urdu III.'),
('URDU', '20100', 'Second-Year Urdu I.'),
('URDU', '10200', 'First-Year Urdu II.'),
('URDU', '20200', 'Second-Year Urdu II.'),
('URDU', '10100', 'First-Year Urdu I.'),
('URDU', '20300', 'Second-Year Urdu III.')])
Given the dictionary:
{"terms": ["science", "mathematics", "economics"],
"day": ["MWF"],
"time_start": 900,
"time_end": 1500}
as input, your code should return:
(['dept', 'course_num', 'title', 'section_num', 'day', 'time_start', 'time_end', 'enrollment'],
[('CMSC', '12200', 'Computer Science with Applications II.', '01', 'MWF', 930, 1020, 76),
('CMSC', '12200', 'Computer Science with Applications II.', '02', 'MWF', 1330, 1420, 64)])
Notice that the result of the second example has many more attributes than the first. Also, notice that they are in the specified order.
Distance
We have provided a function:
compute_time_between(lon0, lat0, lon1, lat1)
that computes an approximation of the time it takes to walk between two locations, specified with longitudes and latitudes.
You can arrange to have sqlite3
call a Python function while it is
processing a query. The function must be added to the connection
using the create_function
method. This method takes three
arguments: the name you will use for the function within a query,
the number of arguments taken by the function, and the function
itself. For example, here is a statement that will add the function
compute_time_between
to the database:
conn.create_function("time_between", 4, compute_time_between)
and here is a use of that function in a query that computes the time it takes to walk between all pairs of buildings:
SELECT a.building_code,
b.building_code,
time_between(a.lon, a.lat, b.lon, b.lat) AS walking_time
FROM gps AS a JOIN gps AS b
Your implementation of find_courses
should use this mechanism to
compute the walking time between buildings when needed. We suggest
that you get the other input values working before working on queries
that involve building_code
and walking_time
.
Debugging¶
We recommend debugging your code using IPython and hand-constructed input dictionaries before you try using it with the Django interface.
We have provided the usual pytest
code in
pa3/ui/test_courses.py
. The failure output for each test includes
the dictionary passed to find_courses
along with information about
the failure.
We will only provide help with debugging your code, if there is clear evidence in your repository that you made a good faith effort to do the warm up exercises.
Grading¶
Programming assignments will be graded according to a general rubric. Specifically, we will assign points for completeness, correctness, design, and style. (For more details on the categories, see our PA Rubric page.)
The exact weights for each category will vary from one assignment to another. For this assignment, the weights will be:
Completeness: 70%
Correctness: 10%
Design: 10%
Style: 10%
Cleaning up¶
Before you submit your final solution, you should, remove
any
print
statements that you added for debugging purposes andall in-line comments of the form: “YOUR CODE HERE” and “REPLACE …”
Also, check your code against the style guide. Did you use good variable names? Do you have any lines that are too long, etc.
Make sure you have included header comments, that is, the triple-quote strings that describe the purpose, inputs, and return values of each function, for every function you have written.
As you clean up, you should periodically save your file and run your code through the tests to make sure that you have not broken it in the process.
Submission¶
You must submit your work through Gradescope (linked from our Canvas site). In the “Programming Assignment #3” assignment, simply upload file courses.py
(do not upload any other file!). Please note:
You are allowed to make as many submissions as you want before the deadline.
For students working in a pair, one student should upload the pair’s solution and use GradeScope’s mechanism for adding group members to add the second person in the pair.
Please make sure you have read and understood our Late Submission Policy
Your completeness score is determined solely based on the automated tests, but we may adjust your score if you attempt to pass tests by rote (e.g., by writing code that hard-codes the expected output for each possible test input).
Gradescope will report the test score it obtains when running your code. If there is a discrepancy between the score you get when running our grader script, and the score reported by Gradescope, please let us know so we can take a look at it.
Acknowledgments: Trevor Coyle recommended the topic of this assignment and helped with its design. Gustav Larsson wrote the Django interface and Kartik Singhal ported it to Django 2.