monitoring

Trace Synchronization with HLC

Playing Around

Aleksey Charapko

·

Oct 18, 2017

Event logging or tracing is one of the most common techniques for collecting data about software execution. For a simple application running on a single machine, a trace of events timestamped with the machine’s hardware clock is typically sufficient. When the system grows and becomes distributed over multiple nodes, each node is going to produce…

One Page Summary: “milliScope: a Fine-Grained Monitoring Framework for Performance Debugging of n-Tier Web Services”

One Page Summary

Aleksey Charapko

·

Jun 19, 2017

The authors of the ICDCS 2017 milliScope paper attack an interesting monitoring problem for distributed systems: detecting short-lived events in the system and determining their cause. In particular, they address the issue of identifying very short bottlenecks (VSBs) in distributed web services. VSBs manifest themselves as performance degradation of a small number of requests; however, they…

Sonification of Distributed Systems with RQL

Playing Around

Aleksey Charapko

·

Jun 12, 2017

In the past, I have discussed sonification as a means of representing monitoring data. Aside from some silly toy examples, sonification can be used for serious applications. In many monitoring cases, the presence of some phenomenon is more important than the details about it. In such situations, simple sonification is a perfect way to…

Retroscoping Zookeeper Staleness

Other Thoughts

Aleksey Charapko

·

Apr 24, 2017

ZooKeeper is a popular coordination service used as part of many large-scale distributed systems. ZooKeeper provides a file-system-inspired abstraction to its users on top of its replicated key-value store. Like other Paxos-inspired protocols, ZooKeeper is typically deployed on at least 3 nodes, and can tolerate F node failures for a cluster of size…

Monitoring with Retroscope: Detecting Invariant Violations

Playing Around

Aleksey Charapko

·

Feb 24, 2017

Earlier I briefly mentioned Retroscope, our distributed snapshot library that makes it possible to take non-blocking, unplanned, consistent global distributed snapshots. However, these snapshots are only good if we know how to use them well. Of course, the most obvious use case is just a data backup, and despite it being an important application for snapshots, I…

Globally Consistent Distributed Snapshots with Retroscope

Playing Around

Aleksey Charapko

·

Feb 8, 2017

Taking a consistent snapshot of a distributed system is no trivial task because of the asynchrony between the nodes in the system. As the state of each machine changes in response to incoming external messages or internal events, each node may produce a log of such state changes. With the log abstraction, the problem…

Gorilla – Facebook’s Cache for Time Series Data

Paper Review and Summary

Aleksey Charapko

·

Jan 11, 2017

Facebook operates a huge infrastructure that needs to be constantly monitored for performance and stability. Such monitoring collects huge amounts of data that must be easily accessible to various diagnosis and anomaly detection tools in order to quickly identify and react to possible issues. Many such parameters can be represented as real-valued time series.…

The Light of Voldemort

Playing Around

Aleksey Charapko

·

Dec 19, 2016

A few months ago I showcased how a single server of the Voldemort key-value store sounds. Sonification is a valid way to monitor systems, and has been used a lot in real applications. The Geiger counter would be one of the most well-known examples of a sonified application. In some cases sonification may be the preferred form of…

Pivot Tracing Part 2

Paper Review and Summary

Aleksey Charapko

·

Oct 16, 2016

After looking more at the Pivot Tracing tool described in my earlier post, I asked myself about the limitations of such a monitoring approach. Pivot Tracing is not a universal tool, so it appears that there are a few problems it does not address well enough. The basic idea of Pivot Tracing is to collect the information…

Review – Pivot Tracing: Dynamic Causal Monitoring for Distributed Systems

Paper Review and Summary

Aleksey Charapko

·

Jul 10, 2016

Debugging can be a nightmare for software engineers, and it is even more so in distributed systems that span many machines in potentially more than one datacenter. Unfortunately, many of the debugging and monitoring techniques for such large systems do not differ much from the methods used to debug and monitor simple single-machine software. Logs…
