In the past, I have discussed sonification as a means of representing monitoring data. Aside from some silly toy examples, sonification can be used for serious applications. In many monitoring cases, the presence of some phenomenon is more important than the details about it. In such situations, simple sonification is a perfect way to alert users to the occurrence of that phenomenon. For example, a Geiger counter alerts users to radioactive decay without providing any further details.
Our recent work on Retroscope and RQL allows us to bring sonification to the same “phenomena-awareness” plane for distributed systems. With RQL we can search for interesting conditions happening globally in the system, and we can use sounds to alert the engineers when certain predicates occur in the global state. Of course, for such a system to be useful, it needs to be real-time, as the recency of events is the most important attribute of this type of sonification. For instance, users of a Geiger counter are not interested in hearing ticks for decay events that happened a minute ago, as that rather defeats the purpose of the counter. Similar requirements apply to the “phenomena-awareness” tools in the distributed systems domain, as engineers need to know what is happening in the system right now.
RQL makes it easy to inspect past states, but it falls short when it comes to inspecting the current state. What we need is a streaming query that continuously receives individual logs from Retroscope log-servers and performs the search as soon as sufficient data becomes available to form a consistent cut. Of course, streaming queries will still lag behind the current time, but we can make this lag small enough to be virtually unobservable by a person.
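To make the "as soon as sufficient data becomes available" idea a bit more concrete, here is a minimal Python sketch of one common way to do it: a watermark buffer that holds log entries until every server has reported a timestamp at least as recent. This is only an illustration, not how Retroscope actually forms its cuts. The class name `WatermarkBuffer`, the `(timestamp, server, entry)` tuple layout, and the assumption that each server delivers its logs with strictly increasing timestamps are all mine.

```python
import heapq


class WatermarkBuffer:
    """Hold log entries until every server has reported past them.

    Hypothetical helper: assumes each server sends entries with
    strictly increasing timestamps.
    """

    def __init__(self, servers):
        # Latest timestamp seen from each server (None until it reports).
        self.latest = {s: None for s in servers}
        # Min-heap of (timestamp, server, entry) waiting to be released.
        self.pending = []

    def add(self, server, ts, entry):
        """Buffer one entry and return the entries that are now safe to query."""
        heapq.heappush(self.pending, (ts, server, entry))
        prev = self.latest[server]
        self.latest[server] = ts if prev is None else max(prev, ts)
        return self._release()

    def _release(self):
        # Nothing can be released until every server has reported once.
        if any(ts is None for ts in self.latest.values()):
            return []
        # The watermark is the oldest "latest" timestamp across servers.
        watermark = min(self.latest.values())
        ready = []
        while self.pending and self.pending[0][0] <= watermark:
            ready.append(heapq.heappop(self.pending))
        return ready
```

The lag of such a query is bounded by the slowest server: the watermark only advances when the server that is furthest behind reports something new, which is why keeping log delivery prompt matters for the real-time feel.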
With streaming queries we can not only receive the cuts meeting some search criteria in near real time, but we can also sonify the mere fact that such cuts have been found. If we look at the previous example of ZooKeeper staleness, we can run sonified streaming queries to have the system alert us when the data staleness reaches some threshold, such as two or more versions stale. In the audio clip below, we have used MIDI to sonify the stream of staleness events occurring in a ZooKeeper cluster. Multiple queries were sonified to produce different sounds for different levels of staleness of ZooKeeper data.
We can hear periods of silence when the staleness is below the threshold of two versions, but we can also observe some variations in cluster performance. For instance, it is very easy to identify when the cluster was experiencing more problems than normal. It is also easy to hear it recovering from the spike in staleness soon after.
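As a rough sketch of what "different sounds for different staleness" could look like, the Python snippet below turns a stream of staleness values into raw MIDI note-on messages, staying silent below the threshold. The threshold of two versions comes from the example above; the base note, the two-semitone step per extra stale version, and the function names are assumptions for illustration and are not the mapping used in the clip.

```python
from typing import Iterable, Iterator, Optional

THRESHOLD = 2   # versions stale before we make any sound (from the text)
BASE_NOTE = 60  # assumed: MIDI note 60 (middle C) for the lowest alert


def staleness_to_note(staleness: int, threshold: int = THRESHOLD) -> Optional[int]:
    """Return a MIDI note number, or None when staleness is below the threshold."""
    if staleness < threshold:
        return None
    # Assumed mapping: each extra stale version raises the pitch two semitones.
    return min(127, BASE_NOTE + 2 * (staleness - threshold))


def note_on(note: int, velocity: int = 100, channel: int = 0) -> bytes:
    """Build a 3-byte MIDI note-on message (status 0x90 plus channel)."""
    return bytes([0x90 | channel, note, velocity])


def sonify(staleness_stream: Iterable[int]) -> Iterator[bytes]:
    """Yield one note-on message per event that meets the threshold."""
    for staleness in staleness_stream:
        note = staleness_to_note(staleness)
        if note is not None:
            yield note_on(note)
```

The bytes produced here would still need to be sent to a MIDI output device or synthesizer, which is left out of the sketch.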
A few months ago I showcased how a single server of the Voldemort key-value store sounds. Sonification is a valid way to monitor systems and has been used a lot in real applications. The Geiger counter is one of the best-known examples of a sonified application. In some cases sonification may be the preferred form of representing information, as other forms, surprisingly, may not work as well for human perception. Even visualization of information is often not as good as sonification. Take the same Geiger counter example: research has shown that visual radiation level monitors do not perform quite as well as the audio ones, as people tend to look away from the display to perform other tasks. Even visual and audio hybrids do not alert users to high radiation levels as well as a simple audio counter.
As an example of a hybrid audio-visual system for Voldemort, I have built a small “traffic-light monitor” that changes colors and beeps differently depending on what action is performed by the server. The rig is built with an Arduino and simply plugs into a USB port of the machine running the Voldemort server. Below is a short video of how it operates:
The green light comes on when the server handles a “get” operation. The yellow light is for “put” requests and the red light is for “get version” commands. As can be seen, writing to Voldemort requires two operations, “get version” and “put”, and they happen so quickly that the Arduino is barely able to light up the LEDs.
P.S. The “traffic-light monitor” is not to be taken seriously; it is rather a silly example to show that there are plenty of ways to monitor a system or represent a system’s logs.
Recently at our lab we discussed a fun little project of making distributed systems “play” music. The idea of sonifying a distributed application can be of some benefit for debugging and maintenance, since people have a natural ability to recognize patterns. Of course, developers or systems administrators can analyze the logs of their systems and study the patterns that way, but listening to patterns and hearing the changes in such patterns is something we can do in the background, probably without taxing our entire attention span.
So how does a distributed system sound? And what can we learn by listening to it? Here is a 4.5 minute clip of a single Voldemort server playing its song.
Each message request type coming to the server was assigned a different pitch, with the note duration roughly corresponding to the time it took to fulfill the request. Of course, the recording was slowed down compared to the original execution of the node, with the entire 4 minutes and 37 seconds of audio representing just a couple of seconds of real-time operation.
The audio was recorded under a static workload of read and write operations, but there are a few things that we can definitely hear about Voldemort’s operation even without workload variations. The most obvious one is that the first half a minute of the audio is mostly silence. This is something I observed from the logs earlier as well, as Voldemort takes some time to get up to speed. As the execution progresses, we can definitely hear different operations happening at a somewhat constant rate. In the second half of the audio, we can hear a few “hiccups” as well as some louder and more forceful sounds for the requests that took longer to process. This, however, was normal operation of a Voldemort node, and introducing some problems into the system, such as network congestion or a machine failure, will most definitely impact the sounds of this node.
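The mapping just described can be sketched in a few lines of Python. This is a simplified illustration under my own assumptions, not the code that produced the clip: the pitch table, the slowdown factor of 100, and the `(start_ms, latency_ms, op)` layout of a request record are all hypothetical.

```python
from dataclasses import dataclass

# Assumed pitch per request type (MIDI note numbers); the real assignment may differ.
PITCHES = {"get": 60, "put": 64, "get_version": 67}
SLOWDOWN = 100.0  # assumed: one real second becomes 100 seconds of audio


@dataclass
class Note:
    start_s: float     # when the note starts in the audio, in seconds
    duration_s: float  # how long the note is held, in seconds
    pitch: int         # MIDI note number


def to_notes(requests, slowdown=SLOWDOWN):
    """Convert (start_ms, latency_ms, op) request records into slowed-down notes."""
    if not requests:
        return []
    first_start = min(start_ms for start_ms, _, _ in requests)
    notes = []
    for start_ms, latency_ms, op in sorted(requests):
        notes.append(Note(
            start_s=(start_ms - first_start) * slowdown / 1000.0,
            duration_s=latency_ms * slowdown / 1000.0,  # longer request, longer note
            pitch=PITCHES[op],
        ))
    return notes
```

For example, a 2 ms “get” followed 5 ms later by a 4 ms “put” becomes a 0.2-second note at time 0 and a 0.4-second note at 0.5 seconds, which is the kind of stretching that makes individual requests audible.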
What about making the sound of the entire distributed system? This becomes a trickier problem, as now we need to play multiple streams for all the distributed components in our system at once. Such components can be located on different physical servers, different racks, and even different datacenters. To play the “true” sound of the system while preserving the causality of events, we need to be able to precisely synchronize and align the streams from the various servers, accounting for any time skew and clock imprecision.
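As a very rough sketch of the alignment step, the Python snippet below merges per-server event streams into a single timeline after subtracting an estimated clock offset for each server. It assumes the offsets are already known and that each server's stream is sorted by its local timestamp; the function name `align` and the data layout are hypothetical. Note that correcting wall-clock offsets alone does not guarantee that causality is preserved, which is exactly what makes the problem tricky.

```python
import heapq


def align(streams, offsets):
    """Merge per-server event streams into one timeline.

    streams: dict of server -> list of (local_ts, event), sorted by local_ts.
    offsets: dict of server -> estimated clock offset in seconds (assumed known);
             servers missing from the dict are treated as having no offset.
    """
    def corrected(server):
        offset = offsets.get(server, 0.0)
        for local_ts, event in streams[server]:
            # Shift each timestamp onto the common reference clock.
            yield (local_ts - offset, server, event)

    # heapq.merge lazily combines already-sorted inputs into one sorted stream.
    return list(heapq.merge(*(corrected(s) for s in streams)))
```

With two servers where server "b" runs one second ahead, an event that "b" stamped at 2.5 lands at 1.5 on the common timeline and plays between the events of server "a" at 1.0 and 3.0.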
Additionally, with multiple servers “playing” at once it may be more difficult for people to comprehend the patterns and the changes to such patterns.