| 1 |
Tue 03/24 |
Introduction & Motivation Slides & Discussion |
Suggested Reading: DDIA Ch. 1 DS4 Ch. 1 Optional Readings: Dean & Barroso — The Tail at Scale |
HW1 Out |
| 1 |
Thu 03/26 |
Processes, Threads, and RPC Slides & Discussion |
Required Reading: DS4 §3.1, DDIA Ch. 4 Graduate Reading: Waldo et al. — A Note on Distributed Computing Optional Readings: DS4 §3.2-3.6 Birrell & Nelson — Implementing Remote Procedure Calls |
|
| 2 |
Tue 03/31 |
Logical Time and Coordination Slides & Discussion |
Required Reading: DDIA Ch. 8 §"Unreliable Clocks" DS4 §5.2 Graduate Reading: Lamport — Time, Clocks, and the Ordering of Events Optional Readings: DDIA Ch. 9 §"Ordering Guarantees" Mattern — Virtual Time and Global States |
|
| 2 |
Thu 04/02 |
Failures and Fault Models Slides & Discussion |
Required Reading: DDIA Ch. 8 Graduate Reading: Chandra & Toueg — Unreliable Failure Detectors Optional Readings: Fischer, Lynch, Paterson — FLP Impossibility Lamport et al. — The Byzantine Generals Problem Hayashibara et al. — The ϕ Accrual Failure Detector |
|
| 2 |
Fri 04/03 |
|
|
HW1 Due |
| 3 |
Tue 04/07 |
Replication Slides & Discussion |
Required Reading: DDIA Ch. 5 Graduate Reading: van Renesse & Schneider — Chain Replication Optional Readings: Oki & Liskov — Viewstamped Replication Terrace & Freedman — Object Storage on CRAQ |
|
| 3 |
Thu 04/09 |
Partitioning Slides & Discussion |
Required Reading: DDIA Ch. 6 Graduate Reading: DeCandia et al. — Dynamo: Amazon's Highly Available Key-Value Store Optional Readings: Chang et al. — Bigtable |
HW2 Out |
| 4 |
Tue 04/14 |
Consistency Models Slides & Discussion |
Required Reading: DDIA Ch. 9 (p321-352) Graduate Reading: Herlihy & Wing — Linearizability (§1-3 only, proofs optional) Optional Readings: Gilbert & Lynch — Brewer's Conjecture and the CAP Theorem Terry et al. — Session Guarantees for Weakly Consistent Replicated Data Vogels — Eventually Consistent |
|
| 4 |
Thu 04/16 |
Consensus I (Paxos) Slides & Discussion |
Required Reading: DS4 §8.2.4 Graduate Reading: Lamport — Paxos Made Simple Optional Readings: Lamport — The Part-Time Parliament Chandra et al. — Paxos Made Live |
|
| 4 |
Sun 04/19 |
|
|
HW2 Due |
| 5 |
Tue 04/21 |
Consensus II (Raft) Slides & Discussion |
Required Reading: DDIA Ch. 9 §"Fault-Tolerant Consensus" (p364-369) Graduate Reading: Ongaro & Ousterhout — In Search of an Understandable Consensus Algorithm (Raft) Optional Readings: Howard et al. — Flexible Paxos: Quorum Intersection Revisited |
|
| 5 |
Thu 04/23 |
Distributed Transactions Slides & Discussion |
Required Reading: DDIA Ch. 7 §"The Slippery Concept of a Transaction" (p221-228) DDIA Ch. 9 §"Distributed Transactions and Consensus" (up to "Fault-Tolerant Consensus") (p352-360) Graduate Reading: Gray & Lamport — Consensus on Transaction Commit (§1-5, proofs optional) Optional Readings: Helland — Life Beyond Distributed Transactions: An Apostate's Opinion |
HW3 Out |
| 6 |
Tue 04/28 |
Distributed File Systems |
Required Reading: DDIA Ch. 10 §"MapReduce and Distributed Filesystems" Graduate Reading: Ghemawat et al. — The Google File System Optional Readings: Shvachko et al. — The Hadoop Distributed File System Weil et al. — Ceph: A Scalable, High-Performance Distributed File System |
|
| 6 |
Thu 04/30 |
Coordination Services |
Required Reading: DS4 §5.3.6 DDIA Ch. 6 §"Request Routing" Graduate Reading: Hunt et al. — ZooKeeper: Wait-free Coordination for Internet-scale Systems Optional Readings: Burrows — The Chubby Lock Service |
|
| 7 |
Tue 05/05 |
Global Distributed Databases |
Required Reading: DDIA Ch. 7 §"Snapshot Isolation and Repeatable Read" (p237-239) DDIA Ch. 8 §"Synchronized Clocks for Global Snapshots" (p294) Graduate Reading: Corbett et al. — Spanner: Google's Globally Distributed Database Optional Readings: Bacon et al. — Spanner: Becoming a SQL System Kulkarni et al. — Logical Physical Clocks and Consistent Snapshots in Globally Distributed Databases |
|
| 7 |
Thu 05/07 |
Cluster Management and Orchestration |
Required Reading: DS4 §3.2.2-3.2.3 Graduate Reading: Verma et al. — Large-scale Cluster Management at Google with Borg Optional Readings: Burns et al. — Borg, Omega, and Kubernetes Hindman et al. — Mesos: A Platform for Fine-Grained Resource Sharing in the Data Center |
|
| 7 |
Fri 05/08 |
|
|
HW3 Due |
| 8 |
Tue 05/12 |
Distributed Computation Frameworks |
Required Reading: DDIA Ch. 10 Graduate Reading: Dean & Ghemawat — MapReduce: Simplified Data Processing on Large Clusters Optional Readings: Zaharia et al. — Resilient Distributed Datasets: A Fault-Tolerant Abstraction for In-Memory Cluster Computing Abadi et al. — TensorFlow: A System for Large-Scale Machine Learning |
HW4 Out
| 8 |
Thu 05/14 |
Stream Processing |
Required Reading: DDIA Ch. 11 Graduate Reading: Chandy & Lamport — Distributed Snapshots: Determining Global States of Distributed Systems Optional Readings: Akidau et al. — The Dataflow Model Kreps et al. — Kafka: A Distributed Messaging System for Log Processing Carbone et al. — Apache Flink: Stream and Batch Processing in a Single Engine |
|
| 9 |
Tue 05/19 |
Distributed AI Infrastructure |
Required and Graduate Reading: Kwon et al. — Efficient Memory Management for Large Language Model Serving with PagedAttention Optional Readings: Dean et al. — Large Scale Distributed Deep Networks Shoeybi et al. — Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism |
|
| 9 |
Thu 05/21 |
Wrap-up & Review |
|
|
| 9 |
Fri 05/22 |
|
|
HW4 Due |
| Finals Week |
Fri 05/29 |
Final Exam |
|
|