CMSC 12300 and CAPP 30123: Computer Science with Applications III

The University of Chicago, Spring 2016

Syllabus

This syllabus, last updated March 31, shows my plans for the course. I have posted it on the web to facilitate updates over the course of the quarter. I reserve the right to make changes; for instance, I may rearrange or change lecture topics in response to student interests, or vary the number of assignments based on pacing.

Course Staff

Instructor: Matthew Wachs
Email: mwachs
Office: RY 175-A
Office Hours: Friday, 3-5pm. (Please don't hesitate to email if you'd like to meet at another time.)

TA: Nick Seltzer
Email: nseltzer
Office: RY 176
Office Hours: TBA.

Course Components

The course consists of:

Lectures
Labs: give you the opportunity to practice with real Big Data environments and get immediate feedback and assistance from course staff (not graded)
Programming assignments: give you more in-depth practice on selected material than would be possible in the labs; approximately five programming assignments are planned, each contributing 10% towards your final grade
Project: an extended, open-ended team project, similar to the project last quarter; you will propose a project, check in with me on your progress, and give a final presentation when you submit your code. The theme of the project is testing hypotheses on large data sets. Projects will count for the balance of your final grade.

Topics

This course is about Big Data: the challenges of working with it, and the solutions that have been developed to overcome them. Topics include:

Algorithms:
- considerations and changes needed when moving from smaller data sets to large ones
- analysis of computational time and memory requirements and how they scale with data size
- methods and conceptual frameworks for dividing up the work of an algorithm into separate tasks that can be run in parallel on multiple computing resources
C: an expansion of your programming skills to include arguably the most widely used language in the world, a lingua franca for computer scientists and a low-level language that offers higher performance than interpreted languages such as Python
Big Data and cloud computing environments and programming paradigms:
- Amazon Web Services
- MapReduce
- Hadoop
- Multi-process and multi-threaded programming
- Concurrency and synchronization primitives (mutexes, condition variables)
- MPI
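
As a rough illustration of what "dividing up the work of an algorithm into separate tasks" can look like, here is a minimal sketch of the MapReduce idea as a word count in plain Python. It is not course material: the function names (map_words, shuffle, reduce_counts, word_count) are made up for this example, and everything runs in one process. In a framework such as Hadoop, the map and reduce calls would be spread across many machines.

```python
from collections import defaultdict


def map_words(line):
    """Map step: emit a (word, 1) pair for every word in one line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle step: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_counts(word, counts):
    """Reduce step: combine all values for one key into a single result."""
    return word, sum(counts)


def word_count(lines):
    # Each line can be mapped independently, and each word can be reduced
    # independently; that independence is what makes the work parallelizable.
    pairs = [pair for line in lines for pair in map_words(line)]
    return dict(reduce_counts(word, counts)
                for word, counts in shuffle(pairs).items())
```

Calling word_count on a list of text lines returns a dictionary that maps each lowercased word to the number of times it appears.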

Tentative Schedule

Week 1
Lecture 1 3/28: Introduction, challenges of scale
Lecture 2 3/30: Amazon Web Services, space and time analysis I
Lecture 3 4/1: Space and time analysis II, algorithms I
Week 2
Lecture 4 4/4: Algorithms II
Lecture 5 4/6: Algorithms III
Lab 4/7: AWS
Lecture 6 4/8: MapReduce I
Week 3
Lecture 7 4/11: MapReduce II
Lecture 8 4/13: MapReduce III
Lab 4/14: Algorithms
Lecture 9 4/15: MapReduce IV
Week 4
Lecture 10 4/18: C I; project proposals due
Lecture 11 4/20: C II
Lab 4/21: mrjob / S3
Lecture 12 4/22: C III
4/22 PA1 (MapReduce) due
Week 5
Lecture 13 4/25: C IV
Lecture 14 4/27: C V
Lab 4/28: C
Lecture 15 4/29: C & concurrency & parallelism I
Week 6
Lecture 16 5/2: Concurrency & parallelism II
Lecture 17 5/4: Concurrency & parallelism III
Lab 5/5: Concurrency
Lecture 18 5/6: Concurrency & parallelism IV
5/6 PA2 (C) due
The remainder of the schedule will be developed as we get closer to the end of the quarter.

Academic Honesty

The University's rules on academic honesty apply to this course just as they did in the prior courses in the sequence, and they will be rigorously and rigidly enforced. If you have any doubts, questions, or concerns, please ask, preferably in advance.

Textbook

This course does not have a textbook. However, this book is highly relevant to the course and is available online at no cost; you may find it of value.