Project

The final project for CAPP 30122 is to build software that achieves a clearly stated goal that is of genuine interest to you and your partners. Final projects must be done in teams of 3 or 4 people. Teams can be formed with students from any section. For instance, 3 students from the same section can form a team, as can 3 students who are each from a different section.

Projects must have a clearly-defined goal and use at least two sources of data. At least one source of data must be acquired using web-scraping or through the use of an API.
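As a rough illustration of the two-source requirement, here is a minimal sketch that pulls records from a JSON API and combines them with rows from a CSV file. The function names (`fetch_json`, `load_csv`, `join_on`) are hypothetical, and the sketch assumes that the API returns a list of JSON objects and that both sources share a key column; real projects will differ.

```python
import csv
import io
import json
from urllib.request import urlopen


def fetch_json(url):
    """Download and decode a JSON document from an API endpoint.

    Assumes the endpoint returns UTF-8 encoded JSON.
    """
    with urlopen(url, timeout=10) as response:
        return json.loads(response.read().decode("utf-8"))


def load_csv(text):
    """Parse CSV text (e.g. the contents of a downloaded file) into dicts."""
    return list(csv.DictReader(io.StringIO(text)))


def join_on(key, api_records, csv_records):
    """Merge records from both sources that share the same value for key.

    CSV values are always strings, so the API value is converted to a
    string before the lookup. Fields from the API record take precedence.
    """
    by_key = {row[key]: row for row in csv_records}
    merged = []
    for record in api_records:
        match = by_key.get(str(record[key]))
        if match is not None:
            merged.append({**match, **record})
    return merged
```

Keeping the download step separate from the merge step makes the merge easy to try on small hand-written inputs before any network access is involved.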

What you do with the data you collect is up to you. Possible options include, but are not limited to, generating visualizations, using the data as a basis for a simulation or a prediction algorithm, or using the data as the basis for a small database with a simple text-based front end for querying it.

We encourage teams to seek out and exploit relevant open source libraries. You are welcome to use code you find on the web, but please keep in mind that all such code must be clearly labeled with the source. Also, all projects must include some original code. (That is, you cannot just glue together pre-existing code.) See project requirements at the end of this page.

Each team must create a group repository using chisubmit. We are using chisubmit/gitlab to make it easier to handle the repositories. Unlike with regular assignments, you are welcome to post your projects on GitHub when you are finished.

Register project teams (Week #4): Each team must complete this form to register by no later than 6pm on Feb 1st. The form requires your repository name, so you must create the repository before you complete the form.

This form requires you to provide basic information about your team along with a description of the goals of your project, etc.

Project Check-in (Week #6): Please come back here in week 6. We will post a link where you can provide additional information about your project's progress. We will also require additional information about your data sources and team member responsibilities.

Final Project Poster Session (Monday, March 16th, 7-9pm in JCL 390): We will be holding a poster session on Monday, March 16th from 7-9pm. Each team will be required to put together a poster for the session. Posters should include a description of the project’s goal, the results obtained, and a brief description of the software, including a pointer to any algorithms or tools that your team found to be particularly useful. If your software is interactive, you are encouraged to set up a demo at the poster session.

Completed Software (March 16th at 5pm): Each submission should include a README file that contains a list of required libraries (with version numbers) and a description of how to run the software. We must be able to run your software on a VM and we must be able to understand the structure of your code without undue effort.
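One possible way to collect the version numbers for the README is sketched below. It assumes Python 3.8 or later, where the standard library provides `importlib.metadata`; the helper name `requirement_lines` is hypothetical, and the package names you pass in should be the ones your project actually uses.

```python
from importlib.metadata import PackageNotFoundError, version


def requirement_lines(package_names):
    """Return one "name==version" line per installed package.

    Packages that are not installed in the current environment are
    flagged rather than skipped, so a gap in the list is easy to spot.
    """
    lines = []
    for name in package_names:
        try:
            lines.append(f"{name}=={version(name)}")
        except PackageNotFoundError:
            lines.append(f"{name} (not installed)")
    return lines
```

Run it in the same environment you use to run the project, then paste the resulting lines into the README.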

All students in the class must participate in the poster session.

Project Requirements: As a recap of the information provided above and with a few additional requirements, the project must include at least the following:

  1. At least two sources of data. One source of data must be acquired using web-scraping or through the use of an API. Other sources of data can be acquired as the group sees fit.
  2. A data analysis component must be part of the project. For example, as stated above, you could use the data as a basis for a simulation or prediction algorithm. Data visualization with interpretation of the results is also acceptable.
  3. Specific components built by each team member. Every team member is responsible for implementing a component of the project’s software. We expect projects to include roughly 400-600 lines of code. This range should be interpreted as a guideline, rather than as a firm requirement. A project that uses a complex algorithm might have fewer lines of code. A project that mainly integrates data from many different sources or software components might have more.
  4. A visual or textual output. What the program generates (i.e., outputs) is up to you. For example, the program could generate visualizations or a JSON string with the results of performing a query on the database. You can check with the instructors if you are unsure about this requirement.
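To make the textual-output option in item 4 concrete, here is a minimal sketch of returning the results of a database query as a JSON string, using the standard-library `sqlite3` and `json` modules. The function names, the `parks` table, and its rows are made up for illustration only; they are not part of the project requirements.

```python
import json
import sqlite3


def query_to_json(connection, sql, params=()):
    """Run a parameterized query and return the rows as a JSON string."""
    cursor = connection.execute(sql, params)
    # cursor.description holds one tuple per result column; the first
    # element of each tuple is the column name.
    columns = [description[0] for description in cursor.description]
    rows = [dict(zip(columns, row)) for row in cursor.fetchall()]
    return json.dumps(rows, indent=2)


def build_demo_db():
    """Create a small in-memory database with placeholder rows."""
    connection = sqlite3.connect(":memory:")
    connection.execute("CREATE TABLE parks (name TEXT, acres REAL)")
    connection.executemany(
        "INSERT INTO parks VALUES (?, ?)",
        [("A", 10.5), ("B", 3.0)],
    )
    return connection
```

A text-based front end could call `query_to_json(build_demo_db(), "SELECT name FROM parks WHERE acres > ?", (5,))` and print the result. Passing values through `params` instead of formatting them into the SQL string avoids quoting problems with user input.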