• Reading Group. DeepScaling: microservices autoscaling for stable CPU utilization in large scale cloud systems

    In the 127th meeting, we discussed the “DeepScaling: microservices autoscaling for stable CPU utilization in large scale cloud systems” SoCC’22 paper by Ziliang Wang, Shiyi Zhu, Jianguo Li, Wei Jiang, K. K. Ramakrishnan, Yangfei Zheng, Meng Yan, Xiaohong Zhang, Alex X. Liu. This paper argues that current autoscaling solutions for microservice applications are lacking in…

  • Reading Group. Method Overloading the Circuit

    In the 126th reading group meeting, we continued talking about the reliability of large distributed systems. This time, we read the “Method Overloading the Circuit” SoCC’22 paper by Christopher Meiklejohn, Lydia Stark, Cesare Celozzi, Matt Ranney, and Heather Miller. This paper does an excellent job summarizing the concept of a circuit breaker in microservice applications.…

  • Reading Group. How to fight production incidents?: an empirical study on a large-scale cloud service

    In the 125th reading group meeting, we looked at the reliability of cloud services. In particular, we read the “How to fight production incidents?: an empirical study on a large-scale cloud service” SoCC’22 paper by Supriyo Ghosh, Manish Shetty, Chetan Bansal, and Suman Nath. This paper looks at 152 severe production incidents in the Microsoft…

  • Reading Group. Amazon DynamoDB: A Scalable, Predictably Performant, and Fully Managed NoSQL Database Service

    In the 120th DistSys meeting, we talked about the “Amazon DynamoDB: A Scalable, Predictably Performant, and Fully Managed NoSQL Database Service” ATC’22 paper by Mostafa Elhemali, Niall Gallagher, Nicholas Gordon, Joseph Idziorek, Richard Krog, Colin Lazier, Erben Mo, Akhilesh Mritunjai, Somu Perianayagam, Tim Rath, Swami Sivasubramanian, James Christopher Sorenson III, Sroaj Sosothikul, Doug Terry, Akshat Vig.…

  • Reading Group. The Case for Distributed Shared-Memory Databases with RDMA-Enabled Memory Disaggregation

    In the 122nd reading group meeting, we read the “The Case for Distributed Shared-Memory Databases with RDMA-Enabled Memory Disaggregation” paper by Ruihong Wang, Jianguo Wang, Stratos Idreos, M. Tamer Özsu, Walid G. Aref. This paper looks at the trend of resource disaggregation in the cloud and asks whether distributed shared-memory databases (DSM-DBs) can benefit from…

  • Reading Group. Not that Simple: Email Delivery in the 21st Century

    I haven’t been posting new reading group paper summaries lately, but I intend to fix that gap and resume writing these. Our 123rd paper was about email: “Not that Simple: Email Delivery in the 21st Century” by Florian Holzbauer, Johanna Ullrich, Martina Lindorfer, and Tobias Fiebig. This paper studies whether different emerging standards and technologies impact email delivery…

  • Winter Term Reading Group Papers: ##121-130

    Our winter set of papers! The schedule is also in our Google Calendar. C5: Cloned Concurrency Control that Always Keeps Up [VLDB’23] Authors: Jeffrey Helt, Abhinav Sharma, Daniel J. Abadi, Wyatt Lloyd, Jose M. Faleiro What: Enabling a more concurrent execution of copied/replicated operations at the followers. When: December 14th The Case for Distributed Shared-Memory…

  • Fall Term Reading Group Papers: ##111-120

    Below is a list of papers for the fall term of the distributed systems reading group. The list is also on the reading group’s Google Calendar. Metastable Failures in the Wild [OSDI’22] Authors: Lexiang Huang, Matthew Magnusson, Abishek Bangalore Muralikrishna, Salman Estyak, Rebecca Isaacs, Abutalib Aghayev, Timothy Zhu, Aleksey Charapko What: An exploration of many…

  • Reading Group Special Session: Scalability and Fault Tolerance in YDB

    YDB is an open-source distributed SQL database. It is used as an OLTP database for mission-critical user-facing applications, and it provides strong consistency and serializable transaction isolation for the end user. One of the main characteristics of YDB is scalability to very large clusters together with multitenancy, i.e., the ability to provide an isolated user environment for…

  • Metastable Failures in the Wild

    Metastable failures in distributed systems are failures that “feed” and strengthen their own “failed” condition. The main characteristic of a metastable failure is a positive feedback loop that keeps the system in a degraded/failed state. These failures are hard to spot, as they always start with some other distraction — some trigger event that nudges…

Aleksey Charapko
I am an assistant professor of computer science at the University of New Hampshire. My research interests lie in distributed systems, distributed consensus, fault tolerance, reliability, and scalability.
X (Twitter): @AlekseyCharapko
Email: aleksey.charapko@unh.edu
