Retroscope

  • Reading Group. Aragog: Scalable Runtime Verification of Shardable Networked Systems

    We have covered 50 papers in the reading group so far! This week we looked at “Aragog: Scalable Runtime Verification of Shardable Networked Systems” from OSDI’20. The paper discusses the problem of verifying network functions (NFs), such as NAT gateways or firewalls, at runtime. The problem is quite challenging due to its…

  • Sonification of Distributed Systems with RQL

    In the past, I have discussed sonification as a means of representing monitoring data. Aside from some silly toy examples, sonification can be used for serious applications. In many monitoring cases, the presence of some phenomenon is more important than the details about it. In such situations, simple sonification is a perfect way to…

  • Retroscoping Zookeeper Staleness

    ZooKeeper is a popular coordination service used as part of many large-scale distributed systems. ZooKeeper provides a file-system-inspired abstraction to users on top of its replicated key-value store. Like other Paxos-inspired protocols, ZooKeeper is typically deployed on at least 3 nodes, and can tolerate F node failures for a cluster of size…

  • Monitoring with Retroscope: Detecting Invariant Violations

    Earlier I briefly mentioned Retroscope, our distributed snapshot library that makes it possible to take non-blocking, unplanned, consistent global distributed snapshots. However, these snapshots are only useful if we know how to use them well. Of course, the most obvious use case is just a data backup, and despite it being an important application for snapshots, I…

  • Globally Consistent Distributed Snapshots with Retroscope

    Taking a consistent snapshot of a distributed system is no trivial task because of the asynchrony between the nodes in the system. As the state of each machine changes in response to incoming external messages or internal events, each node may produce a log of such state changes. With the log abstraction, the problem…
