One Page Summary. Photons: Lambdas on a diet

Recently, to prepare for a class I teach this semester, I went through the “Photons: Lambdas on a diet” SoCC’20 paper by Vojislav Dukic, Rodrigo Bruno, Ankit Singla, and Gustavo Alonso. This is a very well-written paper with a ton of educational value for people like me who are only vaguely familiar with the serverless space!

The authors try to solve some deficiencies in serverless runtimes. See, each function invocation is isolated from all other invocations by running in a separate container. Such an isolated approach has lots of benefits, ranging from security to resource isolation. But there are downsides as well: each invocation of the same function needs its own container to run, so if no idle containers are available from previous runs, a new one needs to be created, leading to a cold start. A “cold” function takes more time to run, as it needs to be deployed in a new container and compiled/loaded into the execution environment. In the case of many popular JIT-compiled languages (e.g., Java), this also means that it initially runs from bytecode in interpreted mode, without the optimizations the JIT compiler applies once the code has warmed up. The paper identifies another problem with requiring a separate container for every function invocation: resource wastage. In particular, if we run many invocations of the same function concurrently, we are likely to waste memory on loading the same dependencies/libraries separately in each container. The authors also mention that some function invocations, for instance, functions for some ML or data processing workloads, may operate on the same initial dataset. With one container per invocation, this dataset must be loaded into each container separately.

The paper proposes a solution, called Photons, to address the cold-start issues and resource wastage in workloads with many concurrent invocations of the same function. A Photon is a function invocation executed in a container shared with other invocations of the same function. The premise here is to forgo some isolation and allow multiple invocations of the same function to share the container. As a result, the number of cold starts can be reduced, and resources, like RAM, can be better utilized by having libraries and common data loaded only once for many concurrent function calls.

The lack of isolation between the function invocations, however, creates a few problems. The paper solves some of them, but it also passes quite a few of these problems off to the function developers. One big problem passed to the developers is ensuring that different invocations of the same function consume a predictable amount of resources. This is needed for a couple of reasons. First, there are scheduling needs, as the system needs to predict machine and container resource usage to determine which containers can be expanded to take on more function calls or which machines can handle new containers. Second, the lack of container-level isolation means that a misbehaving function invocation can grab too many resources and starve other concurrent invocations in the same container.

Another big problem is memory isolation and memory sharing between concurrent function invocations. A significant part of the Photons platform deals with these problems. On memory isolation, things are easy when we only work with instance (non-static) variables, since each object created from a class has its own separate copy. However, some variables may be static, and what is even worse, they can be mutable, allowing concurrent executions of the code to modify the same static fields. Photons address this problem by transforming each static variable into a map, where the key is a photonID (i.e., a unique invocation id of a function). Naturally, all reads and writes to a static variable become map gets and puts. There are a few more nuances with statics, and the paper covers them in greater detail. For sharing state, the Photons runtime maintains a KV-store exposed to all Photons/function invocations in the container. One major downside of having a shared KV-store is managing concurrent access to it, and the system leaves it up to the developers to coordinate access to this shared store. Aside from the shared KV-store, functions also share the file system for temporary storage.
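
To make the static-variable rewrite more concrete, here is a small sketch of how I picture it. This is my own illustration and not code from the paper: the class name, the `ThreadLocal` used to track the current photonID, the `sharedStore` map standing in for the runtime's KV-store, and the explicit cleanup step are all assumptions. The original function is imagined to have a `static int counter` field that it increments; after the rewrite, every read and write goes through a map keyed by photonID, so concurrent invocations no longer see each other's values.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch; none of these names come from the Photons paper.
class PhotonStaticsSketch {

    // The original code is assumed to have had: static int counter = 0;
    // After the rewrite, the single static slot becomes a map keyed by photonID.
    private static final Map<Long, Integer> counter = new ConcurrentHashMap<>();

    // Assumed stand-in for the runtime's shared KV-store, visible to all invocations.
    static final Map<String, Integer> sharedStore = new ConcurrentHashMap<>();

    // Assumption: each invocation runs on its own thread, which records its photonID here.
    private static final ThreadLocal<Long> currentPhotonId = new ThreadLocal<>();

    // Replaces every read of the static `counter`.
    static int getCounter() {
        return counter.getOrDefault(currentPhotonId.get(), 0);
    }

    // Replaces every write to the static `counter`.
    static void setCounter(int value) {
        counter.put(currentPhotonId.get(), value);
    }

    // The function body; the original would simply have executed `counter++` in a loop.
    public static int handler(long photonId, int increments) {
        currentPhotonId.set(photonId);
        try {
            for (int i = 0; i < increments; i++) {
                setCounter(getCounter() + 1);
            }
            // Shared state is the developer's job to coordinate: merge() on a
            // ConcurrentHashMap performs the read-modify-write atomically.
            sharedStore.merge("invocations", 1, Integer::sum);
            return getCounter();
        } finally {
            counter.remove(photonId);   // drop this invocation's private copy
            currentPhotonId.remove();
        }
    }

    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        Future<Integer> a = pool.submit(() -> handler(1L, 3));
        Future<Integer> b = pool.submit(() -> handler(2L, 5));
        System.out.println("photon 1 saw counter = " + a.get()); // 3
        System.out.println("photon 2 saw counter = " + b.get()); // 5
        System.out.println("shared invocations = " + sharedStore.get("invocations")); // 2
        pool.shutdown();
    }
}
```

Each invocation ends up with its own view of `counter` even though both run in the same JVM, while the `sharedStore` update shows the kind of coordination that is left to the developer. A plain `get` followed by `put` on the shared map would be racy; an atomic operation like `merge` is one simple way to avoid that.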