CMSC 23310/33310 Advanced Distributed Systems
Spring 2012

Lecturer: Borja Sotomayor
E-mail: borja AT cs DOT uchicago DOT edu
Office: Searle 209-A
Office hours: Open door policy (see Course Syllabus)

Discussion session: Tuesdays 10:30-11:50 in Cobb 304
Lecture: Thursdays 10:30-11:50 in Ryerson 276

Course Description

In recent years, large distributed systems have taken a prominent role not just in scientific inquiry, but also in our daily lives. When we perform a search on Google, stream content from Netflix, place an order on Amazon, or catch up on the latest comings and goings on Facebook, our seemingly minute requests are processed by complex systems that sometimes include hundreds of thousands of computers, connected by both local and wide area networks.

Recent papers in the field of Distributed Systems have described several solutions (such as MapReduce, BigTable, Dynamo, and Cassandra) for managing large-scale data and computation. However, building and using these systems poses a number of more fundamental challenges: How do we keep the system operating correctly even when individual machines fail? How do we ensure that all the machines have a consistent view of the system's state, even in the presence of failures? How can we determine the order of events in a system where we can't assume a single global clock?

Many of these fundamental problems were identified and solved over the course of several decades, starting in the 1970s. To better appreciate the challenges of recent developments in the field of Distributed Systems, this course will guide students through seminal work in Distributed Systems from the 1970s, 1980s, and 1990s, leading up to a discussion of recent work in the field.

Course Organization

This course is divided into two components:

The final grade will be divided as follows: 20% response papers, 20% final paper, 20% participation in discussions, 40% project (the grade for the project is divided into several milestones; see the syllabus). There will be no midterms or final exam.

CMSC 23300 (Networks and Distributed Systems) is a prerequisite for this course. Students can petition to have this requirement waived, as long as they have taken at least one other 200-level CS systems course.

Papers

This is the tentative list of papers we will be covering in this course. We will be discussing some of these in detail, while others will be assigned as optional reading. See the Course Syllabus for the week-by-week reading schedule.

Getting Started

Fault Tolerance

Distributed Consensus

Distributed Time

Other Topics and Surveys

Recent Work