Course Search Engine: backend¶
Due: Friday, Feb 12th at 4pm
You may work alone or in a pair for this assignment.
In this assignment, you will be building the backend for a course search tool. You combine information gathered from the University’s course catalog and the time schedules web page to respond to searches like: “Find all courses that have Plato in their description, meet on Tuesdays between 10:30am and 3pm, and are within a 10 minute walk of Ryerson.”
Getting started¶
Follow these start-up instructions if you plan to work alone.
Follow these start-up instructions if you are going to work with your PA #2 partner.
Follow these start-up instructions if you are going to work in a new pair.
Once you follow these instructions, your repository will contain a
directory named pa3
. That directory will contain:
get-db.sh:
a script to pick-up the relational database that you will use for this assignment, andui:
a directory containing the Django user interface code. You will modify a file namedcourses.py
in this directory. Do not modify any of the other files.
Data¶
For this assignment, you will use a sqlite3
database we
constructed using the index gathered by our catalog scraper from PA #2
and data scraped from the timeschedules web page in a past quarter. Run the following command:
$ ./get-db.sh
using the Linux command line from within your pa3
directory to
pick up a database named course-info.db
and store it in the ui
sub-directory. Please do not add the database to your repository.
You can use the sqlite3
command-line interface to explore the
database and run sample queries. Recall that the command .tables
will list the names of the tables stored in the database, the
.schema
command will list the schema for a particular table, and the
.quit
command will exit sqlite3
. For example:
$ sqlite3 ui/course-info.db
SQLite version 3.8.5 2014-08-15 22:37:57
Enter ".help" for usage hints.
sqlite> .tables
catalog_index gps sections
courses meeting_patterns
sqlite> .schema courses
CREATE TABLE courses
(
course_id integer, -- unique ID assigned to each course
dept varchar(4), -- 4 character department code
course_num varchar(5), -- 5 digit course number, represented as a 5 character string
title varchar(100) -- course title, represented as a string with up to 100 characters
);
sqlite> .quit
Note that the string sqlite>
is the sqlite command-line prompt.
Warm up exercise #1¶
Before you start writing code for this assignment, we strongly
recommend getting comfortable with querying the database
using the sqlite3
command-line interface.
Here are some queries that it would be helpful to understand:
Find the titles of all courses with department code “CMSC” in the course table.
Find the department names, course numbers, and section numbers for courses being offered on MWF at 10:30am (represented as 1030)
Find the department names and course numbers for courses being offered in Ryerson on MWF between 10:30am and 3pm (represented as 1500).
Find the department names, course numbers, and course titles for courses being offered on MWF at 9:30am (represented as 930) that have the words “programming” and “abstraction” in their title/course description.
Using Python and sqlite3¶
The library sqlite3
allows a Python program to connect to
a sqlite3 database and make queries. You will use the following
functions from this library:
sqlite3.connect(db_filename):
takes the name of a database as a
string and opens or creates a sqlite3 database of that name and
returns a connection object.
db.cursor():
takes a database connection (db
) and returns a
cursor object that can be used to execute queries and get information about the
schema for a query result.
c.execute(s, args):
takes s
, a string containing a SQL query
in which parameters are specified with question marks (?), and a tuple
of values for the parameters (one per ? in the query string) and
returns an object that can be used to access the resulting table.
c
is a cursor object.
r.fetchall():
returns a list of lists with one entry per tuple in
the result. r
is the result of a call to execute
.
db.close():
closes the connection to the database. db
is a
connection object.
Warm up Exercise #2¶
Write parameterized versions of the queries from warm up exercise #1
from within ipython3
using the sqlite3
library. If you are unable to import this library, then go to the Linux command line and:
$ pip3 install --user --upgrade sqlite3-api
Django Web Interface¶
We have provided a Django web interface for your search tool. Django is a Python library that allows you to run a small web server on a computer, create web forms, have the inputs from those forms reported to a piece of Python code, and have the results of a Python computation based on those inputs be displayed on a result web page.
For this programming assignment, you will write a Python function that receives input from Django (a set of criteria for a course search), queries the database to find matching courses, and returns information about matching courses in a format suitable for display by Django. To actually be able to run Django and interact with your site in a web browser, you will need to use vDesk.
If you are unable to use vDesk, you can complete the entire assignment without ever actually using Django. Because the code is modularized, you can create sample inputs to the function you are writing in the same format as Django would have provided, run them through your function using ipython3
, and see the results. In fact, even for those who will be using Django, the use of ipython3
to get code working initially before moving on to using Django is still highly recommended. So, to be clear, although you may find it rewarding to see your code working as part of a web site, you are not at a disadvantage if you are unable to use vDesk reliably.
The next part of this section discusses how to set up and use Django for those who are able to use it and interested in doing so. It should be performed while logged in to vDesk. If you do not intend to use Django, skip down to the paragraph that begins “When the user hits the submit button….”
To install Django, run the command:
$ pip3 install --user --upgrade django
Then, go to the directory pa3/ui
in your
repository and run the command:
$ python3 manage.py migrate
You need only perform this step a single time.
Each time you want to start the interface, go to the same directory and:
$ python3 manage.py go
It will be helpful to run this command in its own terminal window.
Once the interface is started, you can use it by pointing the browser within vDesk (i.e. the installation of Firefox that can be launched using the icon near the top left of the vDesk window, not the real web browser on your real computer) to the address shown in the terminal window. Tip: clicking on this address will launch Firefox and bring up the correct URL automatically; you may need to hold down a modifier key, such as the Control key, or click more than once, depending on settings. Warning: the address will be different each time you start Django.
If, when you run the above command, you receive an error that The port is already in use
, simply run it again.
When the user hits the submit button, the interface will call the
function find_courses
in the file ui/courses.py
with a
dictionary containing values corresponding to the completed fields. When the form is originally loaded, the function will be called with a blank dictionary, even though no form has yet been submitted.
Here is a list of the possible keys for the dictionary and the types of their associated values:
dept
department code (string, such as
"CMSC"
)day
meeting pattern (list of strings, such as
"MWF"
or"TR"
)time_start
earliest starting time (integer in range 0 to 2359)
time_end
latest ending time (integer in range 0 to 2359)
building
building code (string, such as
"RY"
or"JCL"
)walking_time
maximum walking time in minutes from specified building (integer). You will always get building and walking_time together.
enroll_lower
lower limit for enrollment (integer)
enroll_upper
upper limit for enrollment (integer)
terms
catalog search terms (string of words (aka terms) separated by white space)
A few notes:
If the user’s input includes a value for
day
, a class should match it if it meets on any one of the specified meeting patterns. For example, the input{"day":["MWF", "TR"]}
should match classes that meet on a MWF schedule OR a TR schedule.time_start
/time_end
andenroll_upper
/enroll_lower
should be considered as inclusive bounds. Furthermore, you should not assume that you receive both fields for time or both fields for enroll. For example, if the user interface passes along a dictionary with a field fortime_start
but does not include a field fortime_end
, your program should take the specified starting time as a lower bound and return all courses that meet on or after that time.A specific course matches a string of terms only if all of the terms occur in the catalog index with the course_id for that course. For example, the terms “economics sciences mathematics” match “CMSC 12200” (course id: 611), because the tuples
(611,"sciences")
,(611, "economics")
, AND(611, mathematics)
occur in thecatalog_index
table.
If you check the “show args_to_ui” box, the interface will show the dictionary that will be passed to find_courses
.
Your task¶
Your task is to implement the function find_courses(args_to_ui),
which takes a dictionary as described above and performs a search to
find courses/sections that match the user’s input.
The result of this search process will be a relation (table). Your function must return a tuple with the attribute (column) names for the relation and a list of the tuples (rows) in the relation. For example, if the input from the user contains:
{
"terms": "science mathematics economics",
"day": ["MWF"],
"time_start": 900,
"time_end": 1500
}
the output of the function would be something like:
(["dept", "course_num", "section_num", "day", "time_start", "time_end", "title"],
[("CMSC", "12200", "01", "MWF", 930, 1020, "Computer Science with Applications-2"),
("CMSC", "12200", "02", "MWF", 1330, 1420, "Computer Science with Applications-2")])
If the result of a query is empty (no courses match the criteria),
then find_courses should return ([], [])
.
We have provided you with a function to ease the task of retrieving
the column names for the table generated by your query. After you have
run your query, pass your cursor object into the function
get_header(cursor)
to retrieve the column names for your generated
table (in list format).
Querying the Database
Your function will need to gather information from course-info.db
to process the user’s input. There are many possible combinations
of inputs and you should not hard-code them all into your
implementation. (By “hard-coding,” we specifically mean generating the SQL queries for all possible combinations of inputs up-front and hard-coding these strings into your program.) Instead, your code should construct a query from the
parameters specified by the user, by combining different building blocks in response to the specific request from the user. (Note that there will be some degree of hard-coding for these building blocks.)
For example, given the dictionary:
{
"time_start": 1030,
"time_end": 1500,
"day": ["MWF"]
}
Your implementation should construct a query similar to:
SELECT courses.dept,
courses.course_num,
sections.section_num,
meeting_patterns.day,
meeting_patterns.time_start,
meeting_patterns.time_end
FROM courses JOIN sections JOIN meeting_patterns
ON courses.course_id = sections.course_id AND
sections.meeting_pattern_id = meeting_patterns.meeting_pattern_id
WHERE meeting_patterns.day = ? AND
meeting_patterns.time_start >= ? AND
meeting_patterns.time_end <= ?
along with an argument array containing ["MWF", 1030, 1500]
. (Note: the formatting of the query is merely for clarity.)
The join method for strings will be very useful for generating a query string. For example:
l = ["a", "b", "c"]
", ".join(l) yields the string "a, b, c"
What Attributes Should Be Included in the Output?
The attributes included in the output will depend on the parameters specified by the user. The table below shows the attributes (columns) that should be included for each possible type of input (rows). The output should include the union of the attributes for each of the specified inputs.
dept |
course_num |
section_num |
day |
time_start |
time_end |
building |
walking_time |
enrollment |
title |
|
terms |
X |
X |
X |
|||||||
dept |
X |
X |
X |
|||||||
day |
X |
X |
X |
X |
X |
X |
||||
time_start |
X |
X |
X |
X |
X |
X |
||||
time_end |
X |
X |
X |
X |
X |
X |
||||
walking_time |
X |
X |
X |
X |
X |
X |
X |
X |
||
building |
X |
X |
X |
X |
X |
X |
X |
X |
||
enroll_lower |
X |
X |
X |
X |
X |
X |
X |
|||
enroll_upper |
X |
X |
X |
X |
X |
X |
X |
The attributes should be ordered as follows : dept, course_num, section_num, day, time_start, time_end, building, walking_time, enrollment, and title.
For the example input dictionary shown above, the output should include the department code, course number, section number, day, time_start, and time_end in that order.
Distance
We have provided a function:
compute_time_between(lon0, lat0, lon1, lat1)
that computes an approximation of the time it takes to walk between two locations, specified with longitudes and latitudes.
It is possible to have sqlite3
call a Python function while it is
processing a query. The function must be added to the connection
using the create_function
method. This method takes three
arguments: the name you will use for the function within a query, the
number of arguments taken by the function, and the function itself.
For example, here is a statement that will add the function
compute_time_between
to the database:
db.create_function("time_between", 4, compute_time_between)
and here is a use of that function in a query that computes the time it takes to walk between all pairs of buildings:
SELECT a.building_code, b.building_code, time_between(a.lon, a.lat, b.lon, b.lat) AS walking_time
FROM gps AS a JOIN gps AS b
WHERE a.building_code < b.building_code
Your implementation of find_courses
should use this mechanism to
handle user inputs that contain building
and walking_time
. We
suggest that you get the rest of the possible inputs working before you
handle this one.
Debugging¶
We recommend debugging your code using IPython and hand-constructed
input dictionaries before you try using it with the Django
interface. (In fact, if you are unable to access vDesk reliably, you need not necessarily test your code beyond the use of ipython3
at all) We have included a few sample inputs at the bottom of the
file to help you get started with debugging.
We will only provide help with debugging your code if there is clear evidence in your repository that you made a good faith effort to do the warm up exercises.
Submission¶
Follow these submission instructions if you are working alone.
Follow these submission instructions if you are working in a pair.
Acknowledgments: Anne Rogers wrote this assignment. Trevor Coyle recommended the topic of this assignment and helped with its design. Gustav Larsson wrote the Django interface and Kartik Singhal ported it to Django 2.