monitoring
-
One Page Summary: “milliScope: a Fine-Grained Monitoring Framework for Performance Debugging of n-Tier Web Services”
The authors of the ICDCS 2017 milliScope paper tackle an interesting monitoring problem for distributed systems: detecting short-lived events in the system and determining their cause. In particular, they address the issue of identifying very short bottlenecks (VSBs) in distributed web services. VSBs manifest themselves as performance degradation of a small number of requests; however, they…
-
Retroscoping Zookeeper Staleness
ZooKeeper is a popular coordination service used as part of many large-scale distributed systems. ZooKeeper provides a file-system-inspired abstraction to users on top of its replicated key-value store. Like other Paxos-inspired protocols, ZooKeeper is typically deployed on at least 3 nodes and can tolerate F node failures for a cluster of size…
-
Gorilla – Facebook’s Cache for Time Series Data
Facebook operates a huge infrastructure that must be constantly monitored for performance and stability. Such monitoring collects huge amounts of data that must be easily accessible to various diagnosis and anomaly detection tools in order to quickly identify and react to possible issues. Many such parameters can be represented as real-valued time series.…
-
Pivot Tracing Part 2
After looking more closely at the Pivot Tracing tool described in my earlier post, I asked myself about the limitations of such a monitoring approach. Pivot Tracing is not a universal tool, so it appears that there are a few problems it does not address well enough. The basic idea of Pivot Tracing is to collect the information…
-
Review – Pivot Tracing: Dynamic Causal Monitoring for Distributed Systems
Debugging can be a nightmare for software engineers, and even more so in distributed systems that span many machines in potentially more than one datacenter. Unfortunately, many of the debugging and monitoring techniques for such large systems do not differ much from the methods used to debug and monitor simple single-machine software. Logs…