dataflow

  • Reading Group. Move Fast and Meet Deadlines: Fine-grained Real-time Stream Processing with Cameo

    In the 68th reading group session, we discussed Cameo and scheduling in dataflow-like systems. The paper, titled “Move Fast and Meet Deadlines: Fine-grained Real-time Stream Processing with Cameo,” appeared at NSDI’21. It addresses scheduling problems in data processing pipelines. When a system answers a query, it breaks the query into several steps or…

  • One Page Summary: “Musketeer: all for one, one for all in data processing systems”

    Many distributed computation platforms and programming frameworks exist today, and new ones are constantly popping up in industry and academia. Some platforms are domain-specific, such as TensorFlow for machine learning. Others, like Hadoop and Naiad, are more general, and this generality allows sophisticated and specialized programming abstractions to be built on top. So…

  • One Page Summary: Incremental, Iterative Processing with Timely Dataflow

    This paper describes the Naiad distributed computation system. Naiad uses the dataflow model to represent computations, but unlike specialized approaches such as TensorFlow, it aims to be a general-purpose dataflow framework. As in other dataflow systems, computations are represented as graphs, where vertices represent (possibly stateful) operations and edges carry the data…
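    To make the graph idea concrete, here is a minimal Python sketch of a push-based dataflow graph, assuming a simplified model: each vertex is an operator that maps one input record to zero or more output records, and each edge forwards those records to a downstream vertex. The `DataflowGraph` class and its methods are hypothetical illustrations, not Naiad's API, and the sketch omits the timestamps and progress tracking that Naiad uses for incremental and iterative (cyclic) computation.

```python
from collections import defaultdict, deque
from typing import Callable, Dict, Iterable, List


class DataflowGraph:
    """Hypothetical toy dataflow graph: vertices are operators, edges carry records."""

    def __init__(self) -> None:
        self.ops: Dict[str, Callable[[object], Iterable[object]]] = {}
        self.edges: Dict[str, List[str]] = defaultdict(list)
        self.outputs: Dict[str, List[object]] = defaultdict(list)

    def add_vertex(self, name: str, op: Callable[[object], Iterable[object]]) -> None:
        self.ops[name] = op

    def add_edge(self, src: str, dst: str) -> None:
        self.edges[src].append(dst)

    def push(self, vertex: str, record: object) -> None:
        # Work queue of (vertex, record) pairs still waiting to be processed.
        pending = deque([(vertex, record)])
        while pending:
            name, rec = pending.popleft()
            for out in self.ops[name](rec):
                if self.edges[name]:
                    # Send each output record along every outgoing edge.
                    for dst in self.edges[name]:
                        pending.append((dst, out))
                else:
                    # Sink vertex (no outgoing edges): collect its output.
                    self.outputs[name].append(out)


# Usage: a two-vertex pipeline that splits lines into words, then upper-cases them.
g = DataflowGraph()
g.add_vertex("split", lambda line: line.split())
g.add_vertex("upper", lambda word: [word.upper()])
g.add_edge("split", "upper")
for line in ["move fast", "meet deadlines"]:
    g.push("split", line)
print(g.outputs["upper"])  # ['MOVE', 'FAST', 'MEET', 'DEADLINES']
```

    Keeping operators as plain functions and edges as a separate adjacency map mirrors the separation the excerpt describes: the graph structure says where data flows, while each vertex only knows how to transform a single record.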

  • About Google’s Dataflow Model

    In this post I try to understand Google’s Dataflow Model, a data management and manipulation framework for dealing with unbounded and unordered datasets. Much of today’s data is produced continuously and has no “maximum size”; in other words, the amount of such data keeps growing, and therefore modern…
