• Keep The Data Where You Use It



    As trivial as it sounds, keeping data close to where it is consumed can drastically improve the performance of large globe-spanning cloud applications, such as social networks, e-commerce, and IoT. These applications rely on database systems to make sure that all the data can be accessed quickly. The de facto method…


  • Looking at State and Operational Consistency




    Recently I rediscovered “The many faces of consistency” paper by Marcos Aguilera and Doug Terry. When I first read the paper two years ago, I largely dismissed it as trivial, and, oh boy, now I realize how wrong I was at that time. It is easy to read for sure, and may appear as…


  • Python, Numpy and a Programmer Error: Story of a Bizarre Bug



    While recently working on my performance analysis for Paxos-style protocols, I uncovered some weird quirks of Python and NumPy. Ultimately, the problem was with my code; however, the symptoms of the issue looked extremely bizarre at first. Modeling WPaxos required doing a series of computations with NumPy. In each step, I used NumPy to do…


  • One Page Summary: “PaxosStore: High-availability Storage Made Practical in WeChat”



    The PaxosStore paper, published in VLDB 2017, describes the large-scale, multi-datacenter storage system used in WeChat. As the name may suggest, it uses Paxos to provide storage consistency. The system claims to provide storage for many components of the WeChat application, with 1.5TB of traffic per day and tens of thousands of queries per second…


  • Modeling Paxos Performance in Wide Area – Part 3



    Earlier, I looked at modeling Paxos performance in local networks; however, nowadays people (companies) use Paxos and its flavors in the wide area as well. Take Google Spanner and CockroachDB as examples. I was naturally curious to expand my performance model into wide area networks. Since our lab worked on WAN coordination…


  • Modeling Paxos Performance – Part 2



    In the previous posts, I started to explore the node-scalability of Paxos-style protocols. In this post, I will look at processing overheads that I estimate with the help of a queue or a processing pipeline. I show how these overheads cap the performance and affect the latency at different cluster loads. I look at the scalability for…


  • Paxos Performance Modeling – Part 1.5



    This post is a quick update/conclusion to part 1. So, do network variations make any impact at all? In the earlier simulation, I showed some small performance degradation going from 3 to 5 nodes. The reality is that for Paxos, network behavior makes very little difference to scalability, and in some cases no difference at…


  • Do not Blame (only) Network for Your Paxos Scalability Issues. (PPM Part 1)



    In the past few months, our lab has been doing a lot of work with different flavors of the Paxos consensus algorithm. Paxos and its numerous flavors are widely used in today’s cloud infrastructure. Distributed systems rely on it for many different tasks to ensure safe operation. For instance, coordination services use some consensus protocol flavor…


  • Trace Synchronization with HLC



    Event logging or tracing is one of the most common techniques for collecting data about software execution. For a simple application running on a single machine, a trace of events timestamped with the machine’s hardware clock is typically sufficient. When the system grows and becomes distributed over multiple nodes, each node is going to produce…


  • One Page Summary: Flease – Lease Coordination without a Lock Server



    This paper talks about a decentralized lease management solution. In the past, many lock/lease services have been centralized, placing a single authority to manage all locks in the system. Google’s Chubby, Apache ZooKeeper, etcd, and others rely on a centralized approach and are backed by some flavor of a consensus algorithm for fault tolerance. According to Flease authors,…


Aleksey Charapko
I am an assistant professor of computer science at the University of New Hampshire. My research interests lie in distributed systems, distributed consensus, fault tolerance, reliability, and scalability.
X (Twitter): @AlekseyCharapko
