CMSC 12300 Computer Science with Applications-3
Spring 2013

Lecturer: Borja Sotomayor
E-mail: borja AT cs DOT uchicago DOT edu
Office: Ryerson 151
Office hours: By appointment.

TA: Gustav Larsson
E-mail: larsson AT uchicago DOT edu
Office: Ryerson 177
Office hours: TBD

Lectures: TuTh 4:00-5:20 in Ryerson 276

Course Description

This course is the third in a three-quarter sequence that teaches computational thinking and skills to students in the sciences, mathematics, economics, and related fields. The course revolves around the core ideas behind managing and computing on large volumes of data ("Big Data"). Topics include (1) statistical methods for analyzing large datasets, (2) parallelism and concurrency, including models of parallelism and synchronization primitives, and (3) distributed computing, including distributed architectures and the algorithms and techniques that make these architectures fault-tolerant, reliable, and scalable.

Students will continue to use R and will also learn C++, as well as distributed computing tools and platforms such as Amazon AWS and Hadoop. The course includes a project in which students formulate hypotheses about a large dataset, develop statistical models to test those hypotheses, implement a prototype that performs an initial exploration of the data, and then build a final system that processes the entire dataset.

CMSC 12200 (or the instructor's consent) is a prerequisite for this course.

Books

This course has no required textbooks.