debugging
- Reading Group. How to fight production incidents?: an empirical study on a large-scale cloud service
In the 125th reading group meeting, we looked at the reliability of cloud services. In particular, we read the “How to fight production incidents?: an empirical study on a large-scale cloud service” SoCC’22 paper by Supriyo Ghosh, Manish Shetty, Chetan Bansal, and Suman Nath. This paper looks at 152 severe production incidents in the Microsoft…
- Reading Group. Distributed Snapshots: Determining Global States of Distributed Systems
On Wednesday we kicked off a new set of papers in the reading group. We have started with one of the classical foundational papers in distributed systems and looked at the Chandy-Lamport token-based distributed snapshot algorithm. The basic idea here is to capture the state of distributed processes and channels by “flushing” the messages out…