CMSC 23310/33310 Advanced Distributed Systems
Spring 2014

Lecturer: Borja Sotomayor
E-mail: borja AT cs DOT uchicago DOT edu
Office: Ryerson 151
Office hours: By appointment

Time and Location: Mondays and Wednesdays 1:30-2:50 in Cobb 301

Quick links

Course Description

In recent years, large distributed systems have taken a prominent role not just in scientific inquiry, but also in our daily lives. When we perform a search on Google, stream content from Netflix, place an order on Amazon, or catch up on the latest comings-and-goings on Facebook, our seemingly minute requests are processed by complex systems that sometimes include hundreds of thousands of computers, connected by both local and wide area networks.

Recent papers in the field of Distributed Systems have described several solutions (such as MapReduce, BigTable, Dynamo, Cassandra, etc.) for managing large-scale data and computation. However, building and using these systems poses a number of more fundamental challenges: How do we keep the system operating correctly even when individual machines fail? How do we ensure that all the machines have a consistent view of the system's state? (and how do we ensure this in the presence of failures?) How can we determine the order of events in a system where we can't assume a single global clock?

Many of these fundamental problems were identified and solved over the course of several decades, starting in the 1970's. To better appreciate the challenges of recent developments in the field of Distributed Systems, this course will guide students through seminal work in Distributed Systems from the 70's, 80's, and 90's, leading up to a discussion of recent work in the field.

Course Organization

This course is divided into two components:

Reading and Discussion of Primary Sources: We will cover seminal work in the field, as well as recent work, by going directly to the original papers (listed below). Several papers will be assigned each week, to be discussed on both Monday and Wednesday.
Homeworks: Two short homework assignments will be given in the first half of the quarter.
Project and Paper: In the second half of the quarter, students will have to implement a distributed system drawing upon the seminal work covered in the first half of the quarter. Based on their projects, students will have to write a final paper evaluating the features and performance of their project.

The final grade will be divided as follows: 20% homeworks (each weighed equally), 30% participation in discussions, 20% project, 30% final paper. There will be no midterms or final exam.

A B+ or higher in CMSC 23300 (Networks and Distributed Systems) is a prerequisite for this course. Students can petition to have this requirement waived, as long as they have taken at least one other 200-level CS systems course.

Course Schedule

Week 1

No papers to read this week

Week 2 - Distributed Time

Required reading for Monday, April 7

Leslie Lamport. Time, Clocks, and the Ordering of Events in a Distributed System. Commun.ACM, 21(7):558–565, July 1978

Required reading for Wednesday, April 9

C. J. Fidge.Timestamps in Message-Passing Systems that Preserve the Partial Ordering. Proceedings of the 11th Australian Computer Science Conference, 10(1):5666, 1988
Friedemann Mattern. Virtual Time and Global States of Distributed Systems. In Parallel and Distributed Algorithms, pages 215–226. North-Holland, 1989

Week 3 - Distributed Consensus I

Required reading for Monday, April 14

Leslie Lamport, Robert Shostak, and Marshall Pease. The Byzantine Generals Problem. ACM Trans. Program. Lang. Syst., 4(3):382–401, July 1982

Required reading for Wednesday, April 16

Butler W. Lampson and Howard E. Sturgis. Crash Recovery in a Distributed Data Storage System, 1979
D. Skeen and M. Stonebraker. A Formal Model of Crash Recovery in a Distributed System. IEEE Trans. Softw. Eng., 9(3):219–228, May 1983

Week 4 - Limits of Distributed Systems

Required reading for Monday, April 21

Michael J. Fischer, Nancy A. Lynch, and Michael S. Paterson. Impossibility of distributed consensus with one faulty process. J. ACM, 32(2):374–382, April 1985
Danny Dolev, Cynthia Dwork, and Larry Stockmeyer. On the Minimal Synchronism Needed for Distributed Consensus. J. ACM, 34(1):77–97, January 1987

Required reading for Wednesday, April 23

Seth Gilbert and Nancy Lynch. Brewer's Conjecture and the Feasibility of Consistent, Available, Partition-Tolerant Web Services. SIGACT News, 33(2):51–59, June 2002

Week 5 - Paxos

Required reading for Monday, April 28 and Wednesday April 30

Leslie Lamport. The Part-Time Parliament. ACM Trans. Comput. Syst., 16(2):133–169, May 1998
Leslie Lamport. Paxos Made Simple. ACM SIGACT News, 32(4):18–25, December 2001

Week 6 - Distributed Consensus II

Required reading for Monday, May 5

Mike Burrows. The Chubby Lock Service for Loosely-Coupled Distributed Systems. In Proceedings of the 7th symposium on Operating systems design and implementation, OSDI ’06, pages 335–350, Berkeley, CA, USA, 2006. USENIX Association
Tushar D. Chandra, Robert Griesemer, and Joshua Redstone. Paxos Made Live: An Engineering Perspective. In Proceedings of the twenty-sixth annual ACM symposium on Principles of distributed computing, PODC ’07, pages 398–407, New York, NY, USA, 2007. ACM

Required reading for Monday, May 7

Diego Ongaro and John Ousterhout. In search of an understandable consensus algorithm, 2014

Week 7 - Distributed Hash Tables

Required reading for Monday, May 12

Ion Stoica, Robert Morris, David Karger, M. Frans Kaashoek, and Hari Balakrishnan. Chord: A scalable peer-to-peer lookup service for internet applications. SIGCOMM Comput. Commun. Rev., 31(4):149–160, August 2001
Antony I. T. Rowstron and Peter Druschel. Pastry: Scalable, decentralized object location, and routing for large-scale peer-to-peer systems. In Proceedings of the IFIP/ACM International Conference on Distributed Systems Platforms Heidelberg, Middleware ’01, pages 329–350, London, UK, UK, 2001. Springer-Verlag

Required reading for Monday, May 14

Giuseppe DeCandia, Deniz Hastorun, Madan Jampani, Gunavardhan Kakulapati, Avinash Lakshman, Alex Pilchin, Swaminathan Sivasubramanian, Peter Vosshall, and Werner Vogels. Dynamo: Amazon’s Highly Available Key-Value Store. In Proceedings of twenty-first ACM SIGOPS symposium on Operating systems principles, SOSP ’07, pages 205–220, New York, NY, USA, 2007. ACM

Week 8 - Distributed Data

Required reading for Monday, May 19

Sanjay Ghemawat, Howard Gobioff, and Shun-Tak Leung. The Google File System. SIGOPS Oper. Syst. Rev., 37(5):29–43, October 2003
Jeffrey Dean and Sanjay Ghemawat. MapReduce: Simplified Data Processing on Large Clusters. Commun. ACM, 51(1):107–113, January 2008

Required reading for Monday, May 21

James C. Corbett, Jeffrey Dean, Michael Epstein, Andrew Fikes, Christopher Frost, J. J. Furman, Sanjay Ghemawat, Andrey Gubarev, Christopher Heiser, Peter Hochschild, Wilson Hsieh, Sebastian Kanthak, Eugene Kogan, Hongyi Li, Alexander Lloyd, Sergey Melnik, David Mwaura, David Nagle, Sean Quinlan, Rajesh Rao, Lindsay Rolig, Yasushi Saito, Michal Szymaniak, Christopher Taylor, Ruth Wang, and Dale Woodford. Spanner: Google’s globally-distributed database. In Proceedings of the 10th USENIX Conference on Operating Systems Design and Implementation, OSDI’12, pages 251–264, Berkeley, CA, USA, 2012. USENIX Association

Week 9 - Distributed Currency

NOTE: No class on Monday, May 26 (Memorial Day)

Required reading for Monday, May 28

Satoshi Nakamoto. Bitcoin: A peer-to-peer electronic cash system, 2008

Week 10 - Review

Required reading for Monday, June 2

Edsger W. Dijkstra. Self-stabilizing systems in spite of distributed control. Commun. ACM, 17(11):643–644, November 1974
Leslie Lamport. Solved Problems, Unsolved Problems and Non-problems in Concurrency. SIGOPS Oper. Syst. Rev., 19(4):34–44, October 1985

NOTE: No required reading for Wednesday, June 4

CMSC 23310/33310 Advanced Distributed Systems
Spring 2014

Quick links

Course Description

Course Organization

Course Schedule

Week 1

Week 2 - Distributed Time

Required reading for Monday, April 7

Required reading for Wednesday, April 9

Suggested reading

Week 3 - Distributed Consensus I

Required reading for Monday, April 14

Required reading for Wednesday, April 16

Suggested reading

Week 4 - Limits of Distributed Systems

Required reading for Monday, April 21

Required reading for Wednesday, April 23

Suggested reading

Week 5 - Paxos

Required reading for Monday, April 28 and Wednesday April 30

Week 6 - Distributed Consensus II

Required reading for Monday, May 5

Required reading for Monday, May 7

Week 7 - Distributed Hash Tables

Required reading for Monday, May 12

Required reading for Monday, May 14

Week 8 - Distributed Data

Required reading for Monday, May 19

Required reading for Monday, May 21

Suggested Reading

Week 9 - Distributed Currency

Required reading for Monday, May 28

Week 10 - Review

Required reading for Monday, June 2

CMSC 23310/33310 Advanced Distributed SystemsSpring 2014

Quick links

Course Description

Course Organization

Course Schedule

Week 1

Week 2 - Distributed Time

Required reading for Monday, April 7

Required reading for Wednesday, April 9

Suggested reading

Week 3 - Distributed Consensus I

Required reading for Monday, April 14

Required reading for Wednesday, April 16

Suggested reading

Week 4 - Limits of Distributed Systems

Required reading for Monday, April 21

Required reading for Wednesday, April 23

Suggested reading

Week 5 - Paxos

Required reading for Monday, April 28 and Wednesday April 30

Week 6 - Distributed Consensus II

Required reading for Monday, May 5

Required reading for Monday, May 7

Week 7 - Distributed Hash Tables

Required reading for Monday, May 12

Required reading for Monday, May 14

Week 8 - Distributed Data

Required reading for Monday, May 19

Required reading for Monday, May 21

Suggested Reading

Week 9 - Distributed Currency

Required reading for Monday, May 28

Week 10 - Review

Required reading for Monday, June 2

CMSC 23310/33310 Advanced Distributed Systems
Spring 2014