CMSC 12300 and CAPP 30123: Computer Science with Applications III
The University of Chicago, Spring 2016
Syllabus
This syllabus, last updated March 31, shows my plans for the course. I have
posted it on the web to facilitate updates over the course of the quarter.
I reserve the right to make changes; for instance, I may rearrange or change
lecture topics in response to student interests, or vary the number of
assignments based on pacing.
Course Staff
Instructor: Matthew Wachs
Email: mwachs
Office: RY 175-A
Office Hours: Friday, 3-5pm. (Please don't hesitate to email if you'd like to meet at another time.)
TA: Nick Seltzer
Email: nseltzer
Office: RY 176
Office Hours: TBA.
Course Components
The course consists of:
- Lectures
- Labs: give you the opportunity to practice with real Big Data environments and get immediate feedback and assistance from course staff (not graded)
- Programming assignments: give you more in-depth practice on selected material than would be possible in the labs; approximately five programming assignments are planned, each contributing 10% towards your final grade
- Project: an extended, open-ended team project, similar to the project last quarter; you will propose a project, check in with me along the way, and give a final presentation alongside submitting your code. The theme of the project is testing hypotheses on large data sets. Projects will count for the balance of your final grade
Topics
This course is about Big Data: the challenges of working with it, and the solutions that have been developed to overcome them. Topics include:
- Algorithms:
- considerations and changes needed when moving from smaller data sets to large ones
- analysis of computational time and memory requirements and how they scale with data size
- methods and conceptual frameworks for dividing up the work of an algorithm into separate tasks that can be run in parallel on multiple computing resources
- C: expands your programming repertoire to include arguably the most widely used language in the world, a lingua franca for computer scientists, and a low-level language that offers higher performance than interpreted languages such as Python
- Big Data and cloud computing environments and programming paradigms:
- Amazon Web Services
- MapReduce
- Hadoop
- Multi-process and multi-threaded programming
- Concurrency and synchronization primitives (mutexes, condition variables)
- MPI
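To give a flavor of the MapReduce paradigm listed above, here is a minimal single-machine sketch in plain Python. It is not course material: the function names (map_phase, shuffle, reduce_phase, word_count) are made up for illustration, and a real framework such as Hadoop would run the map and reduce steps in parallel across many machines rather than in one process.

```python
from collections import defaultdict


def map_phase(line):
    """Map step: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle step: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce step: combine all values for one key into a single count."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle, and reduce in sequence on an iterable of lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    groups = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in groups.items())


if __name__ == "__main__":
    print(word_count(["big data is big", "data at scale"]))
```

Each call to map_phase depends only on its own line, and each call to reduce_phase depends only on its own key, so the calls within either step can be handed to separate workers without coordination. That independence is what lets the approach scale to large data sets.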
Tentative Schedule
- Week 1
- Lecture 1 3/28: Introduction, challenges of scale
- Lecture 2 3/30: Amazon Web Services, space and time analysis I
- Lecture 3 4/1: Space and time analysis II, algorithms I
- Week 2
- Lecture 4 4/4: Algorithms II
- Lecture 5 4/6: Algorithms III
- Lab 4/7: AWS
- Lecture 6 4/8: MapReduce I
- Week 3
- Lecture 7 4/11: MapReduce II
- Lecture 8 4/13: MapReduce III
- Lab 4/14: Algorithms
- Lecture 9 4/15: MapReduce IV
- Week 4
- Lecture 10 4/18: C I; project proposals due
- Lecture 11 4/20: C II
- Lab 4/21: mrjob / S3
- Lecture 12 4/22: C III
- 4/22 PA1 (MapReduce) due
- Week 5
- Lecture 13 4/25: C IV
- Lecture 14 4/27: C V
- Lab 4/28: C
- Lecture 15 4/29: C & concurrency & parallelism I
- Week 6
- Lecture 16 5/2: Concurrency & parallelism II
- Lecture 17 5/4: Concurrency & parallelism III
- Lab 5/5: Concurrency
- Lecture 18 5/6: Concurrency & parallelism IV
- 5/6 PA2 (C) due
- Remainder of the schedule will be developed as we get closer to the end of the quarter
Academic Honesty
The University's rules on academic honesty apply to this course just as they did in the prior courses in the sequence, and they will be rigorously enforced. If you have any doubts, questions, or concerns, please ask, preferably in advance.
Textbook
This course does not have a textbook. However, this book is highly relevant to the course and is available online at no cost; you may find it of value.