Reading Group

Welcome to the DistSys Reading Group! Every week we present and discuss one distributed systems paper. We try to focus on relatively new papers, although we occasionally break this rule for some important older publications. The main objective of this group is to share knowledge through the discussion. Our participants come from academia and industry and often carry a unique perspective and expertise on the subject matter.


We start each meeting with a short presentation of the paper by one of the group members. We record the presentation and later upload it to YouTube for the general audience. After the presentation, we move into a group discussion of the paper. This part is not on the record to make sure we can speak freely about the topic and the paper. However, I write a moderated discussion summary for each meeting and post it here. All the summaries are available via the “Summary” link next to the paper title. For the archive of the summaries, navigate to the “Past Meetings” section below.

Meeting Info

Current Schedule (Papers ##51-60)

  1. Distributed Snapshots: Determining Global States of a Distributed System – April 7th – :classical_building: Classical/Foundation Paper
  2. Facebook’s Tectonic Filesystem: Efficiency from Exascale – April 14th
  3. New Directions in Cloud Programming – April 21st
  4. Paxos vs Raft: Have we reached consensus on distributed consensus? – April 28th
  5. Protocol-Aware Recovery for Consensus-Based Storage – May 5th
  6. chainifyDB: How to get rid of your Blockchain and use your DBMS instead – May 12th
  7. XFT: practical fault tolerance beyond crashes – May 19
  8. Cerebro: A Layered Data Platform for Scalable Deep Learning – May 26th
  9. Multitenancy for Fast and Programmable Networks in the Cloud – June 2nd
  10. Exploiting Symbolic Execution to Accelerate Deterministic Databases – June 9th

Past Meetings

Special Sessions

  1. Building Distributed Systems With StaterightMarch 30th @ 1pm EST – Jon Nadal. [YouTube]

Recent Reading Group Meetings

Reading Group. Protean: VM Allocation Service at Scale

The last paper in our reading group was “Protean: VM Allocation Service at Scale.” This paper from Microsoft is full of technical insights into how they operate their datacenters/regions at scale. In particular, the paper discusses one of the fundamental components of any cloud provider — the VM service. The system, called Protean, is an […]

Reading Group. Sundial: Fault-tolerant Clock Synchronization for Datacenters

In our 48th reading group meeting, we talked about time synchronization in distributed systems. More specifically, we discussed the poor state of time sync, the reasons for it, and most importantly, the solutions, as outline in the “Sundial: Fault-tolerant Clock Synchronization for Datacenters” OSDI’20 paper. We had a comprehensive presentation by Murat Demirbas. Murat’s talk […]

Reading Group Special Session: Building Distributed Systems With Stateright

Stateright is a software framework for analyzing and systematically verifying distributed systems. Its name refers to its goal of verifying that a system’s collective state always satisfies a correctness specification, such as “operations invoked against the system are always linearizable.” Cloud service providers like AWS and Azure leverage verification software such as the TLA+ model […]

Reading Group. Heterogeneity-Aware Cluster Scheduling Policies for Deep Learning Workloads

On Wednesday we were discussing scheduling in large distributed ML/AI systems. Our main paper was the “Heterogeneity-Aware Cluster Scheduling Policies for Deep Learning Workloads.” one from OSDI’20. However, it was a bit outside of our group’s comfort zone (outside of my comfort zone for sure). Luckily we had an extensive presentation with a complete background […]

Reading Group. FlightTracker: Consistency across Read-Optimized Online Stores at Facebook

Last DistSys Reading Group we have discussed “FlightTracker: Consistency across Read-Optimized Online Stores at Facebook.” This paper is about consistency in Facebook’s TAO caching stack. TAO is a large social graph storage system composed of many caches, indexes, and persistent storage backends. The sheer size of Facebook and TAO makes it difficult to enforce meaningful […]

Reading Group Paper List. Papers ##51-60.

With just four more papers to go in the DistSys Reading Group’s current batch, it is time to get the next set going. This round, we will have 10 papers that should last till the end of the spring semester. Our last batch was all about OSDI’20 papers, and this time around we will mix […]

Reading Group. Pegasus: Tolerating Skewed Workloads in Distributed Storage with In-Network Coherence Directories

Hard to imagine, but the reading group just completed the 45th session. We discussed “Pegasus: Tolerating Skewed Workloads in Distributed Storage with In-Network Coherence Directories,” again from OSDI’20. Pegasus is one of these systems that are very obvious in the hindsight. However, this “obviousness” is deceptive — Dan Ports, one of the authors behind the […]

Reading Group. Performance-Optimal Read-Only Transactions

Last meeting we looked at “Performance-Optimal Read-Only Transactions” from OSDI’20. This paper covers important topics of transactional reads in database/data-management systems. In particular, the paper discusses “one-shot” read-only transactions that complete in 1 network round-trip-time (RTT) without blocking and bloated and expensive messages. If this sounds too good to be true, it is. Before presenting […]