Aleksey Charapko

One Page Summary: “Musketeer: all for one, one for all in data processing systems”.

One Page Summary

Aleksey Charapko · Mar 12, 2017

Many distributed computation platforms and programming frameworks exist today, and new ones are constantly emerging from industry and academia. Some platforms are domain-specific, such as TensorFlow for machine learning. Others, like Hadoop and Naiad, are more general, and this generality allows sophisticated and specialized programming abstractions to be built on top. So…

One Page Summary: “Slicer: Auto-Sharding for Datacenter Applications”

One Page Summary

Aleksey Charapko · Mar 8, 2017

One of the questions engineers of large distributed systems must answer is “where to compute”. This is a big and important question, as we do not want to send a request originating in the US to some server in Australia. It simply makes no sense to incur the communication overhead if there are resources available…

Monitoring with Retroscope: Detecting Invariant Violations

Playing Around

Aleksey Charapko · Feb 24, 2017

Earlier I briefly mentioned Retroscope, our distributed snapshot library that makes it possible to take non-blocking, unplanned, consistent global snapshots of a distributed system. However, these snapshots are only good if we know how to use them well. Of course, the most obvious use case is just a data backup, and despite it being an important application for snapshots, I…

One Page Summary: Incremental, Iterative Processing with Timely Dataflow

One Page Summary

Aleksey Charapko · Feb 13, 2017

This paper describes the Naiad distributed computation system. Naiad uses a dataflow model to represent computations, but it aims to be a general dataflow framework, in contrast to more specialized approaches such as TensorFlow. As in other dataflow systems, computations are represented as graphs, where vertices represent data and operations and edges carry the data…

Is Java Fast Enough for Distributed Applications?

Paper Review and Summary

Aleksey Charapko · Feb 9, 2017

Many modern distributed systems are built with the Java programming language and consequently use the Java Virtual Machine (JVM) as their execution environment. The list of such systems is rather large: Hadoop, Spark, HBase, Cassandra, Voldemort, ZooKeeper, BookKeeper, Kafka, and the list goes on and on. But is the JVM fast enough for these systems? Anyone who…

Globally Consistent Distributed Snapshots with Retroscope

Playing Around

Aleksey Charapko · Feb 8, 2017

Taking a consistent snapshot of a distributed system is no trivial task because of the asynchrony between the nodes in the system. As the state of each machine changes in response to incoming external messages or internal events, each node may produce a log of such state changes. With the log abstraction, the problem…

Gorilla – Facebook’s Cache for Time Series Data

Paper Review and Summary

Aleksey Charapko · Jan 11, 2017

Facebook operates a huge infrastructure that needs to be constantly monitored for performance and stability. Such monitoring collects huge amounts of data that must be easily accessible to various diagnosis and anomaly detection tools in order to quickly identify and react to possible issues. Many such parameters can be represented as real-valued time series.…

The Light of Voldemort

Playing Around

Aleksey Charapko · Dec 19, 2016

A few months ago I showcased how a single server of the Voldemort key-value store sounds. Sonification is a valid way to monitor systems, and has been used a lot in real applications. The Geiger counter would be one of the most well-known examples of a sonified application. In some cases sonification may be the preferred form of…

Pivot Tracing Part 2

Paper Review and Summary

Aleksey Charapko · Oct 16, 2016

After looking more at the Pivot Tracing tool described in my earlier post, I asked myself about the limitations of such a monitoring approach. Pivot Tracing is not a universal tool, so it appears that there are a few problems it does not address well enough. The basic idea of Pivot Tracing is to collect the information…

The Sound of Voldemort

Playing Around

Aleksey Charapko · Sep 4, 2016

Recently at our lab we discussed a fun little project of making distributed systems “play” music. The idea of sonifying a distributed application can be of some benefit for debugging and maintenance, since people have a natural ability to recognize patterns. Of course, developers or systems administrators can analyze the logs of their systems and study the…