============================= Course Search Engine: backend ============================= **Due: Tuesday, Feb 2nd at 5pm** You may work alone or in a pair for this assignment. If you work in a pair, we recommend working with the same partner as PA #2. In this assignment, you will be building the backend for a course search tool. You combine information gathered from the University's course catalog and the time schedules web page to respond to searches like: "Find all courses that have Plato in their description, meet on Tuesdays between 10:30am and 3pm, and are within a 10 minute walk of Ryerson." Getting started --------------- Follow `these start-up instructions `_ if you plan to work alone. Follow `these start-up instructions `_ if you are going to work with your PA #2 partner. Follow `these start-up instructions `_ if you are going to work in a new pair. Once you follow these instructions, your repository will contain a directory named ``pa3``. That directory will contain: - ``get-db.sh:`` a script to pick-up the relational database that you will use for this assignment, and - ``ui:`` a directory containing the Django user interface code. You will modify a file named ``courses.py`` in this directory. Do not modify any of the other files. Data ---- For this assignment, you will use a ``sqlite3`` database we constructed using the index gathered by our catalog scraper from PA #2 and data scraped from the timeschedules webpage. Run the following command:: ./get-db.sh using the Linux command line from within your ``pa3`` directory to pick up a database named ``course-info.db`` and store it in the ``ui`` sub-directory. Please do not add the database to your repository. You can use the ``sqlite3`` command-line interface to explore the database and run sample queries. Recall that the command ``.tables`` will list the names of the tables stored in the database, the ``.schema`` command will list the schema for particular table, and the ``.quit`` command will exit ``sqlite3``. For example:: $ sqlite3 ui/course-info.db SQLite version 3.8.5 2014-08-15 22:37:57 Enter ".help" for usage hints. sqlite> .tables catalog_index gps sections courses meeting_patterns sqlite> .schema courses CREATE TABLE courses ( course_id integer, -- unique ID assigned to each course dept varchar(4), -- 4 character department code course_num varchar(5), -- 5 digit course number, represented as a 5 character string title varchar(100) -- course title, represented as a string with up to 100 characters ); sqlite> .quit Note that ``$`` represents the Linux command-line prompt. Warm up exercise #1 ------------------- Before you start writing code for this assignment, we strongly recommend getting comfortable with querying the database using the ``sqlite3`` command-line interface. Here are some queries that it would be helpful to understand: - Find the titles of all courses with department code "CMSC" in the course table. - Find the department names, course numbers, and sections numbers for courses being offered on MWF at 10:30am (represented as 1030) - Find the department names and course numbers for courses being offered in Ryerson on MWF between 10:30am and 3pm (represented as 1500). - Find the department names, course numbers, and course titles for courses being offered on MWF at 9:30am (represented as 930) that have the words "programming" and "abstraction" in their title/course description. Using Python and sqlite3 ------------------------ The library ``sqlite3`` allows a Python program to connect to a sqlite3 database and make queries. You will use the following functions from this library: ``sqlite3.connect(db_filename):`` takes the name of a database as a string and opens or creates a sqlite3 database of that name and returns a connection object. ``db.cursor(connection):`` takes a database connection (``db``) and returns a cursor object that can be used to execute queries and get information about the schema for a query result. ``c.execute(s, args):`` takes ``s``, a string containing a SQL query in which parameters are specified with question marks (?), and a list of values for the parameters (one per ? in the query string) and returns an object that can be used to access the resulting table. ``c`` is a cursor object. ``r.fetchall():`` returns a list of lists with one entry per tuple in the result. ``r`` is the result of a call to ``execute``. ``db.close():`` closes the connection to the database. ``db`` is a connection object. Warmup Exercise #2 ------------------ Write parameterized versions of the queries from warmup exercise #1 from within ``ipython3`` using the ``sqlite3`` library. Django Web Interface -------------------- We have provided a Django web interface for your search tool. Django is installed on the machines in CSIL, but not on your VM. To install Django on your VM, run the command:: sudo pip3 install django To start the interface go to the directory ``pa3/ui`` in your repository and run the command:: python3 manage.py runserver It will be helpful to run this command in its own terminal window. Once the interface is started, you can use it by pointing a browser to ``http://127.0.0.1:8000/``. When the user hits the submit button, the interface will call the function ``find_courses`` in the file ``ui/courses.py`` with a dictionary containing values corresponding to the completed fields. Here is a list of the possible keys for the dictionary and the types of their associated values: :``dept``: department code (string) :``day``: meeting pattern (list of strings) :``time_start``: earliest starting time (integer in range 0 to 2359) :``time_end``: latest ending time (integer in range 0 to 2359) :``building``: building code (string) :``walking_time``: maximum walking time from specified building (integer). You will always get building and walking_time together. :``enroll_lower``: lower limit for enrollment (integer) :``enroll_upper``: upper limit for enrollment (integer) :``terms``: catalog search terms (string of words (aka terms) separated by white space) A few notes: - If the user's input includes a value for ``day``, a class will match it if it meets on any one of the specified meeting patterns. For example, the input ``{"day":["MWF", "TR"]}`` matches classes that meet on a MWF schedule OR a TR schedule. - ``time_start`` / ``time_end`` and ``enroll_upper`` /``enroll_lower`` should be considered as *inclusive* bounds. Furthermore, you should not assume that you receive both fields for time or both fields for enroll. For example, if the user interface passes along a dictionary with a field for ``time_start`` but does not include a field for ``time_end``, your program should take the specified starting time as a lower bound and return all courses that meet on or after that time. - A specific course matches a string of terms only if `all` of the terms occur in the catalog index with the course_id for that course. For example, the terms "economics sciences mathematics" match "CMSC 12200" (course id: 611), because the tuples ``(611,"sciences")``, ``(611, "economics")``, AND ``(611, mathematics)`` occur in the ``catalog_index`` table. If you check the "show args_to_ui" box, the interface will show the dictionary that will be passed to ``find_courses``. Your task --------- Your task is to implement the function ``find_courses(args_to_ui),`` which takes a dictionary as described above and performs a search to find courses/sections that match the user's input. The result of this search process will be a relation. Your function must return a tuple with the attribute (column) names for the relation and a list of the tuples in the relation. For example, if the input from the user contains:: { "terms": "science mathematics economics", "day": ["MWF"], "time_start": 900, "time_end": 1500 } the output of the function would be something like:: ([dept, course_num, section_num, day, time_start, time_end, title], [[CMSC, 12200, 01, MWF, 930, 1020, Computer Science with Applications-2], [CMSC, 12200, 02, MWF, 1330, 1420, Computer Science with Applications-2]]) If the result of a query is empty (no courses match the criteria), then find_courses should return ``([], [])``. We have provided you with a function to ease the task of retrieving the column names for the table generated by your query. After you have run your query, pass your cursor object into the function ``get_header(cursor)`` to retrieve the column names for your generated table (in list format). **Querying the Database** Your function will need to gather information from ``course-info.db`` to process the user's input. There are many possible combinations of inputs and you should not hard-code them all into your implementation. Instead, your code should construct a query from the parameters specified by the user. For example, given the dictionary:: { "time_start": 1030, "time_end": 1500, "day": ["MWF"] } Your implementation should construct a query similar to: .. code:: sql SELECT courses.dept, courses.course_num, sections.section, meeting_patterns.day, meeting_patterns.time_start, meeting_patterns.time_end FROM courses JOIN sections JOIN meeting_patterns ON courses.course_id = sections.course_id AND sections.meeting_patterns_id = meeting_patterns.meeting_patterns_id WHERE meeting_patterns.day = ? AND meeting_patterns.time_start >= ? AND meeting_patterns.time_end <= ? along with an argument array containing ``["MWF", 1030, 1500]``. (Note: the formatting of the query is merely for clarity.) The join method for strings will be very useful for generating a query string. For example:: l = ["a", "b", "c"] ", ".join(l) yields the string "a, b, c" **What Attributes Should Be Included in the Output?** The attributes included in the output will depend on the parameters specified by the user. The table below shows the attributes (columns) that should be included for each possible type of input (rows). The output should include the union of the attributes for each of the specified inputs. +--------------+------+------------+-------------+-----+------------+----------+----------+--------------+------------+-------+ | | dept | course_num | section_num | day | time_start | time_end | building | walking_time | enrollment | title | +--------------+------+------------+-------------+-----+------------+----------+----------+--------------+------------+-------+ | terms | X | X | | | | | | | | X | +--------------+------+------------+-------------+-----+------------+----------+----------+--------------+------------+-------+ | dept | X | X | | | | | | | | X | +--------------+------+------------+-------------+-----+------------+----------+----------+--------------+------------+-------+ | day | X | X | X | X | X | X | | | | | +--------------+------+------------+-------------+-----+------------+----------+----------+--------------+------------+-------+ | time_start | X | X | X | X | X | X | | | | | +--------------+------+------------+-------------+-----+------------+----------+----------+--------------+------------+-------+ | time_end | X | X | X | X | X | X | | | | | +--------------+------+------------+-------------+-----+------------+----------+----------+--------------+------------+-------+ | walking_time | X | X | X | X | X | X | X | X | | | +--------------+------+------------+-------------+-----+------------+----------+----------+--------------+------------+-------+ | building | X | X | X | X | X | X | X | X | | | +--------------+------+------------+-------------+-----+------------+----------+----------+--------------+------------+-------+ | enroll_lower | X | X | X | X | X | X | | | X | | +--------------+------+------------+-------------+-----+------------+----------+----------+--------------+------------+-------+ | enroll_upper | X | X | X | X | X | X | | | X | | +--------------+------+------------+-------------+-----+------------+----------+----------+--------------+------------+-------+ The attributes should be ordered as follows : dept, course_num, section_num, day, time_start, time_end, building, walking_time, enrollment, and title. For the example input dictionary shown above, the output should include the department code, course number, section number, day, time_start, and time_end in that order. **Distance** We have provided a function:: compute_time_between(lon0, lat0, lon1, lat1) that computes an approximation of the time it takes to walk between two locations, specified with longitudes and latitudes. It is possible to have ``sqlite3`` call a Python function while it is processing a query. The function must be added to the connection using the ``create_function`` method. This method takes three arguments: the name you will use for the function within a query, the number of arguments taken by the function, and the function itself. For example, here is a statement that will add the function ``compute_time_between`` to the database:: db.create_function("time_between", 4, compute_time_between) and here is a use of that function in a query that computes the time it takes to walk between all pairs of buildings: .. code:: sql SELECT a.building, b.building, time_between(a.lon, a.lat, b.lon, b.lat) AS walking_time FROM gps AS a JOIN gps AS b WHERE a.building < b.building Your implementation of ``find_courses`` should use this mechanism to handle user inputs that contain ``building`` and ``walking_time``. We suggest that you get the rest of the possible inputs working before you handle this one. Debugging --------- We recommend debugging your code using IPython and hand-constructed input dictionaries *before* you try using it with the Django interface. We have included a few sample inputs at the bottom of the file to help you get started with debugging. We will only provide help with debugging your code, if there is clear evidence in your repository that you made a good faith effort to do the warmup exercises. Submission ---------- Follow these `submission instructions `_ if you are working alone. Follow these `submission instructions `_ if you are working in a pair. *Acknowledgment: Trevor Coyle recommended the topic of this assignment and helped with its design. Gustav Larsson wrote the Django interface.*