Reading Group

Welcome to the DistSys Reading Group! Every week we present and discuss one distributed systems paper. We try to focus on relatively new papers, although we occasionally break this rule for important older publications. The main objective of this group is to share knowledge through discussion. Our participants come from academia and industry and often bring unique perspectives and expertise to the subject matter.

Format

We start each meeting with a short presentation of the paper by one of the group members. We record the presentation and later upload it to YouTube for the general audience. After the presentation, we move into a group discussion of the paper. This part is off the record so that we can speak freely about the topic and the paper. However, I write a moderated discussion summary for each meeting and post it here. All the summaries are available via the “Summary” link next to the paper title. To see the archive of past meetings, scroll down to the “Past Meetings” section below.

Meeting Info

Current Schedule (Papers ##131-140)

Below is a list of papers for the spring term of the distributed systems reading group.

  • Transactions Make Debugging Easy [CIDR’23]
    • Authors: Qian Li, Peter Kraft, Michael Cafarella, Çağatay Demiralp, Goetz Graefe, Christos Kozyrakis, Michael Stonebraker, Lalith Suresh, and Matei Zaharia
    • What: Everything is a database transaction, including debugging
    • When: April 12th
  • Perseus: A Fail-Slow Detection Framework for Cloud Storage Systems [FAST’23]
    • Authors: Ruiming Lu, Erci Xu, Yiming Zhang, Fengyi Zhu, Zhaosheng Zhu, Mengtian Wang, Zongpeng Zhu, Guangtao Xue, Jiwu Shu, Minglu Li, Jiesheng Wu
    • What: Detecting fail-slow failures (i.e., machine/node slowdowns) in storage systems
    • When: April 19th
  • SelfTune: Tuning Cluster Managers [NSDI’23]
    • Authors: Ajaykrishna Karthikeyan, Nagarajan Natarajan, Gagan Somashekar, Lei Zhao, Ranjita Bhagwan, Rodrigo Fonseca, Tatiana Racheva, Yogesh Bansal
    • What: Automatically tuning cluster manager parameters/settings for an ever-changing cluster state
    • When: May 3rd

Past Meetings

Past Special Sessions

  1. Building Distributed Systems With Stateright – March 30th @ 1pm EST – Jon Nadal
  2. Distributed Transactions in YugabyteDB – May 11th @ 12pm EST – Karthik Ranganathan
  3. Fast General Purpose Transactions in Apache Cassandra – February 9th @ 2pm EST – Benedict Elliott Smith
  4. Scalability and Fault Tolerance in YDB – August 10th @ 2pm EST – Andrey Fomichev

Recent Reading Group Meetings

Reading Group Paper. Take Out the TraChe: Maximizing (Tra)nsactional Ca(che) Hit Rate

In this week’s reading group, we discussed the “Take Out the TraChe: Maximizing (Tra)nsactional Ca(che) Hit Rate” OSDI’23 paper by Audrey Cheng, David Chu, Terrance Li, Jason Chan, Natacha Crooks, Joseph M. Hellerstein, Ion Stoica, Xiangyao Yu. This paper argues against optimizing for object hit rate in caches for transactional databases. The main logic behind […]

Reading Group Paper. Aggregate VM: Why Reduce or Evict VM’s Resources When You Can Borrow Them From Other Nodes?

In our recent reading group meeting, we discussed “Aggregate VM: Why Reduce or Evict VM’s Resources When You Can Borrow Them From Other Nodes?” by Ho-Ren Chuang, Karim Manaouil, Tong Xing, Antonio Barbalace, Pierre Olivier, Balvansh Heerekar, Binoy Ravindran. This EuroSys’23 paper introduces the concept of Aggregate VM to allow the pooling of small unused […]

Reading Group Paper: Hyrax: Fail-in-Place Server Operation in Cloud Platforms

In the 142nd reading group meeting, we discussed “Hyrax: Fail-in-Place Server Operation in Cloud Platforms” OSDI’23 paper. Hyrax allows servers with certain types of hardware failures to return to service after some software-only automated mitigation steps. Traditionally, when a server malfunctions, the VMs are migrated off of it, then the server gets shut down and […]

Fall 2023 Reading Group Papers (##141-150)

The schedule is also in our Google Calendar.

  • Omni-Paxos: Breaking the Barriers of Partial Connectivity (EuroSys’23)
    • Authors: Harald Ng, Seif Haridi, Paris Carbone
    • What: Multi-Paxos flavor with better tolerance for partial network partitions
    • When: August 30th
  • Hyrax: Fail-in-Place Server Operation in Cloud Platforms (OSDI’23)
    • Authors: Jialun Lyu, Marisa You, Celine Irvene, Mark Jung, Tyler Narmore, […]

Spring Term Reading Group Papers: ##131-140

A new set of papers for spring and early summer!

  • Transactions Make Debugging Easy [CIDR’23]
    • Authors: Qian Li, Peter Kraft, Michael Cafarella, Çağatay Demiralp, Goetz Graefe, Christos Kozyrakis, Michael Stonebraker, Lalith Suresh, and Matei Zaharia
    • What: Everything is a database transaction, including debugging
    • When: April 12th
  • Perseus: A Fail-Slow Detection Framework for Cloud Storage Systems […]

Reading Group. DeepScaling: microservices autoscaling for stable CPU utilization in large scale cloud systems

In the 127th meeting, we discussed the “DeepScaling: microservices autoscaling for stable CPU utilization in large scale cloud systems” SoCC’22 paper by Ziliang Wang, Shiyi Zhu, Jianguo Li, Wei Jiang, K. K. Ramakrishnan, Yangfei Zheng, Meng Yan, Xiaohong Zhang, Alex X. Liu. This paper argues that current autoscaling solutions for microservice applications are lacking in […]

Reading Group. How to fight production incidents?: an empirical study on a large-scale cloud service

In the 125th reading group meeting, we looked at the reliability of cloud services. In particular, we read the “How to fight production incidents?: an empirical study on a large-scale cloud service” SoCC’22 paper by Supriyo Ghosh, Manish Shetty, Chetan Bansal, and Suman Nath. This paper looks at 152 severe production incidents in the Microsoft […]

Reading Group. Amazon DynamoDB: A Scalable, Predictably Performant, and Fully Managed NoSQL Database Service

In the 120th DistSys meeting, we talked about “Amazon DynamoDB: A Scalable, Predictably Performant, and Fully Managed NoSQL Database Service” ATC’22 paper by Mostafa Elhemali, Niall Gallagher, Nicholas Gordon, Joseph Idziorek, Richard Krog, Colin Lazier, Erben Mo, Akhilesh Mritunjai, Somu Perianayagam, Tim Rath, Swami Sivasubramanian, James Christopher Sorenson III, Sroaj Sosothikul, Doug Terry, Akshat Vig. […]

Reading Group. The Case for Distributed Shared-Memory Databases with RDMA-Enabled Memory Disaggregation

In the 122nd reading group meeting, we read “The Case for Distributed Shared-Memory Databases with RDMA-Enabled Memory Disaggregation” paper by Ruihong Wang, Jianguo Wang, Stratos Idreos, M. Tamer Özsu, Walid G. Aref. This paper looks at the trend of resource disaggregation in the cloud and asks whether distributed shared memory databases (DSM-DBs) can benefit from […]