Reading Group

Every week we present and discuss one relatively recent distributed systems paper. Our participants come from academia and industry, and they often bring unique perspectives and knowledge of the subject matter. The goal of the reading group is to discuss the papers and to share this knowledge and expertise.


We start each meeting with a short presentation of the paper by one of the participants. We record the presentation and later upload it to YouTube. After the presentation, we move into a group discussion of the paper. This part is not recorded, so that we can speak freely about the paper. However, I try to write a discussion summary for each meeting and post it here.

Meeting Info

Current Paper Schedule

  1. Fault-tolerant and transactional stateful serverless workflows – December 23rd [YouTube]
  2. Toward a Generic Fault Tolerance Technique for Partial Network Partitioning – January 6th [Summary] [YouTube]
  3. hXDP: Efficient Software Packet Processing on FPGA NICs – January 13th [Summary] [YouTube]
  4. Virtual Consensus in Delos – January 20th [Summary] [YouTube]
  5. A large scale analysis of hundreds of in-memory cache clusters at Twitter – January 27th [Summary] [YouTube]
  6. Cobra: Making Transactional Key-Value Stores Verifiably Serializable – February 3rd [Summary] [YouTube]
  7. Microsecond Consensus for Microsecond Applications – February 10th [Summary] [YouTube]
  8. Performance-Optimal Read-Only Transactions – February 17th [Summary] [YouTube]
  9. Pegasus: Tolerating Skewed Workloads in Distributed Storage with In-Network Coherence Directories – February 24th [Summary] [YouTube]
  10. FlightTracker: Consistency across Read-Optimized Online Stores at Facebook – March 3rd [Summary] [YouTube]
  11. Heterogeneity-Aware Cluster Scheduling Policies for Deep Learning Workloads – March 10th
  12. Sundial: Fault-tolerant Clock Synchronization for Datacenters – March 17th
  13. Protean: VM Allocation Service at Scale – March 24th
  14. Aragog: Scalable Runtime Verification of Shardable Networked Systems – March 31st

Future Papers (##51-60)

  1. Distributed Snapshots: Determining Global States of a Distributed System – April 7th – Classical/Foundation Paper
  2. Facebook’s Tectonic Filesystem: Efficiency from Exascale – April 14th
  3. New Directions in Cloud Programming – April 21st
  4. Paxos vs Raft: Have we reached consensus on distributed consensus? – April 28th
  5. Protocol-Aware Recovery for Consensus-Based Storage – May 5th
  6. chainifyDB: How to get rid of your Blockchain and use your DBMS instead – May 12th
  7. XFT: practical fault tolerance beyond crashes – May 19th
  8. Cerebro: A Layered Data Platform for Scalable Deep Learning – May 26th
  9. Multitenancy for Fast and Programmable Networks in the Cloud – June 2nd
  10. Exploiting Symbolic Execution to Accelerate Deterministic Databases – June 9th

Recent Reading Group Meetings

Reading Group Paper List. Papers ##51-60.

With just four more papers to go in the DistSys Reading Group’s current batch, it is time to get the next set going. This round, we will have 10 papers that should last till the end of the spring semester. Our last batch was all about OSDI’20 papers, and this time around we will mix […]

Reading Group. Pegasus: Tolerating Skewed Workloads in Distributed Storage with In-Network Coherence Directories

Hard to imagine, but the reading group just completed its 45th session. We discussed “Pegasus: Tolerating Skewed Workloads in Distributed Storage with In-Network Coherence Directories,” again from OSDI’20. Pegasus is one of those systems that seem very obvious in hindsight. However, this “obviousness” is deceptive: Dan Ports, one of the authors behind the […]

Reading Group. Performance-Optimal Read-Only Transactions

Last meeting we looked at “Performance-Optimal Read-Only Transactions” from OSDI’20. This paper covers the important topic of transactional reads in database/data-management systems. In particular, the paper discusses “one-shot” read-only transactions that complete in one network round-trip time (RTT) without blocking and without bloated, expensive messages. If this sounds too good to be true, it is. Before presenting […]

Reading Group. Cobra: Making Transactional Key-Value Stores Verifiably Serializable

This Wednesday, we talked about serializability checking of production databases. In particular, we looked at the recent OSDI’20 paper: “Cobra: Making Transactional Key-Value Stores Verifiably Serializable.” The paper explores the problem of verifying serializability in a black-box production system from the client's point of view. This makes sense as serializability is an operational, client-observable […]

Reading Group. Virtual Consensus in Delos

We are continuing through the OSDI 2020 paper list in our reading group. This time we discussed “Virtual Consensus in Delos,” a consensus paper (Delos is yet another Greek island, continuing the consensus naming tradition). Delos relies on the log abstraction to keep track of all commands/operations and their order. Traditionally, some consensus […]

Reading Group. Toward a Generic Fault Tolerance Technique for Partial Network Partitioning

Short Summary. We have resumed the distributed systems reading group after a short holiday break. Yesterday we discussed the “Toward a Generic Fault Tolerance Technique for Partial Network Partitioning” paper from OSDI 2020. The paper studies a particular type of network partitioning – partial network partitioning. Normally, we expect that every node can reach every […]

Reading Group. Near-Optimal Latency Versus Cost Tradeoffs in Geo-Distributed Storage

Short Summary. Yesterday we discussed Pando, a geo-replication system that achieves a near-optimal latency-cost tradeoff in storage systems. Pando uses large Flexible Paxos deployments and erasure coding to do its magic. Pando relies on having many storage sites so that some sites are located closer to users. It then uses Flexible Paxos to optimize read and write quorums to have […]