Reading Group Papers #111-120

Metastable Failures in the Wild [OSDI’22]

  • Authors: Lexiang Huang, Matthew Magnusson, Abishek Bangalore Muralikrishna, Salman Estyak, Rebecca Isaacs, Abutalib Aghayev, Timothy Zhu, Aleksey Charapko
  • What: An exploration of many significant outages that illustrates a new class of “self-feeding” failures.
  • Blog: Metastable Failures in the Wild

Enabling the Next Generation of Multi-Region Applications with CockroachDB [SIGMOD’22]

  • Authors: Nathan VanBenschoten, Arul Ajmani, Marcus Gartner, Andrei Matei, Aayush Shah, Irfan Sharif, Alexander Shraer, Adam Storm, Rebecca Taft, Oliver Tan, Andy Woods, Peyton Walters
  • What: Global distribution in a multi-region SQL database.

High Throughput Replication with Integrated Membership Management [ATC’22]

  • Authors: Pedro Fouto, Nuno Preguiça, João Leitão
  • What: Paxos over chain replication.

Debugging the OmniTable Way [OSDI’22]

  • Authors: Andrew Quinn, Jason Flinn, Michael Cafarella, Baris Kasikci
  • What: Debugging applications through a queryable history of application states.

Understanding and Detecting Software Upgrade Failures in Distributed Systems [SOSP’21]

  • Authors: Yongle Zhang, Junwen Yang, Zhuqi Jin, Utsav Sethi, Kirk Rodrigues, Shan Lu, Ding Yuan
  • What: Study of real-world upgrade failures and the lessons learned from them.

Skeena: Efficient and Consistent Cross-Engine Transactions [SIGMOD’22]

  • Authors: Jianqiu Zhang, Kaisong Huang, Tianzheng Wang, King Lv
  • What: Transactions across database engines with various isolation levels.

RRC: Responsive Replicated Containers [ATC’22]

  • Authors: Diyu Zhou, Yuval Tamir
  • What: Container replication using periodic checkpoints and replay.

TAOBench: An End-to-End Benchmark for Social Network Workloads [VLDB]

  • Authors: Audrey Cheng, Xiao Shi, Aaron Kabcenell, Shilpa Lawande, Hamza Qadeer, Jason Chan, Harrison Tin, Ryan Zhao, Peter Bailis, Mahesh Balakrishnan, Nathan Bronson, Natacha Crooks, Ion Stoica
  • What: See the paper’s title!

Cancellation in Systems: An Empirical Study of Task Cancellation Patterns and Failures [OSDI’22]

  • Authors: Utsav Sethi, Haochen Pan, Shan Lu, Madanlal Musuvathi, Suman Nath
  • What: Study of task cancellation features and bugs across various applications and languages to find common cancellation issues.

Amazon DynamoDB: A Scalable, Predictably Performant, and Fully Managed NoSQL Database Service [ATC’22]

  • Authors: Mostafa Elhemali, Niall Gallagher, Nicholas Gordon, Joseph Idziorek, Richard Krog, Colin Lazier, Erben Mo, Akhilesh Mritunjai, Somu Perianayagam, Tim Rath, Swami Sivasubramanian, James Christopher Sorenson III, Sroaj Sosothikul, Doug Terry, Akshat Vig
  • What: History and overview of Amazon’s scalable cloud database.