How to Read Computer Science (Systems) Papers Using the Shampoo Algorithm

I think most academics have had to answer the question of how to approach papers. It is the beginning of the semester and a new academic year, and I have heard this question quite a lot in the past two weeks. Interestingly enough, I believe that almost every academic active on the Internet has written about paper reading. So one may think there should be plenty of guidance out there, all within Google’s reach. Yet I see students reading these tips and suggestions and still coming back for more advice.

One problem with many paper reading tips is that they are too algorithmic and do not help where people really struggle. “Read the title, then abstract, then go to the conclusion, look at the figures…” is a common pattern of advice. Since when do these steps actually help understand the paper? Who can look at protocol drawings or results figures after reading an abstract and get it? No doubt, this approach works for some, but it seems to require decent paper-reading skill and a huge baggage of consumed literature on related topics. Other suggestions I see online may be a lot more helpful, but they too come with some bias towards more experienced readers.

My approach to reading papers, and the advice I give my students, is based on the infamous shampoo algorithm: lather, rinse, and repeat. Scientific (computer science) papers are not like most other types of reading we do. Unlike most fiction, for example, scientific papers are non-linear in their comprehension. Let me explain what I mean here. When we read fictional stories, we follow a linear path through the plot. On the first pages of a detective story, we will never see who the murderer was and how the victim was killed, only to see the motive in the next chapter and a detailed description of the weapon somewhere in the middle of the book. Yet this is exactly how computer science papers unfold their story.

This non-linearity means that we cannot apply linear reading strategies to papers. We cannot just go with the text to build a complete understanding of the story. Most academic papers have circular “plot” structures with multiple rings of comprehension built around each other. Each of these “rings” acts as a foundation for what is to come next. We can think of them as the same concepts explained in increasing levels of difficulty. This ring abstraction is where the shampoo algorithm comes in: we must “rinse and repeat” until we reach a good understanding of something at one level of difficulty before moving to the next one.

Naturally, we can start with a title and an abstract. These build our initial comprehension ring of the paper, as we establish the broad topic and very high-level information about the solution and the outcome. Hopefully, things are clear at this point in the reading process, but if they are not, it may be a good idea to pick up a textbook and resolve the confusion.

The introduction usually brings us to the next comprehension ring. We learn the problem and its importance in greater detail, and get motivated. The intro will likely provide some insight into the solution as well. At this time, we may also look at the conclusion to see if anything is interesting there. Some conclusions are very good at providing a high-level overview, some not so much. We are at a crucial point in understanding the paper now, and things tend to get way harder very quickly after this. If there is some fundamental gap in comprehension by this time, it would be a good idea to go back and reread. Maybe follow the references on aspects that are still hard to grasp, since they will only get more difficult later on.

I should take a pause here for a bit. See, it is ok to not know something at each comprehension ring before moving forward. It is ok to ask questions about how things are done and seek answers in the following outer rings. Moreover, you must ask how and why questions and seek answers later. But you probably should not have fundamental questions about what the authors are doing or questions about the background discussed. I think there is a rather fine line between the questions you need to be asking going forward and the questions that require the “rinse and repeat” treatment. Seeing this line may come with a bit of experience. The good thing about the shampoo algorithm is that when you are confused, you can always go back and repeat.

Now we are ready to move forward. If the paper has a background section and you had questions about background earlier, then read it. In my areas of expertise, I may often skim through the background instead of reading carefully.

We are now diving into the “meat of the paper” or “the solution” sections. This is the most challenging and time-consuming part that may actually have multiple comprehension rings packed together. It is important to never be afraid to practice the “shampoo algorithm” when things start to become gibberish. Go back even if this requires rewinding to the previous comprehension ring.

At first, we may approach the solution sections by reading the English description accompanied by any graphical explanations in the figures. At this stage, it may be a good idea to skip any algorithms and proofs. Instead, we must focus on absorbing the details of the solution and try to answer the how/why questions we had previously. The detailed English and graphical descriptions form an entire comprehension ring after the introduction/background one. Working on this third comprehension ring often requires going back and rereading paragraphs and sections many times over to build proper understanding. Now is also a good time to start looking at the evaluation and see if the results match any expectations we may have had based on our understanding so far.

Finally, we can move into the algorithms and proofs and try to work them out. Working through these things requires a good understanding of what is going on in the solution. We have worked hard to have this understanding by reading and rereading, asking and answering questions. Building the required understanding/intuition of the problem and solution is the reason for approaching these so late in the process. However, going through algorithms on paper line by line is super useful and helps transform intuition into a more codified working model. This codified model forms yet another comprehension ring. An additional ring will form after going through the proofs.
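
For readers who like to see things as code, the walk-through above can be loosely sketched as a loop. This is only a toy illustration, not a precise procedure: the ring list is a simplification of the steps described so far, and the names `shampoo_read`, `understood`, `max_rereads`, and `max_steps` are made up for this sketch. The `understood` callback stands in for the reader's own judgment about whether a ring has sunk in.

```python
# Comprehension rings, from the broadest overview to the deepest detail.
# A simplified list based on the walk-through above (assumption: one ring per step).
RINGS = [
    "title and abstract",
    "introduction and conclusion",
    "English description and figures of the solution",
    "algorithms",
    "proofs",
]


def shampoo_read(rings, understood, max_rereads=3, max_steps=100):
    """Read rings in order, rereading or rewinding when confused.

    `understood(ring, log)` is a hypothetical callback that returns True
    once the reader is comfortable with `ring`, given everything read so far.
    Returns the log of rings in the order they were actually read.
    """
    log = []
    level = 0      # index of the ring currently being read
    rereads = 0    # how many times the current ring has been repeated
    while level < len(rings) and len(log) < max_steps:
        log.append(rings[level])             # lather: read the ring
        if understood(rings[level], log):    # rinse: check comprehension
            level += 1                       # move on to the next ring
            rereads = 0
        elif rereads < max_rereads:
            rereads += 1                     # repeat: reread the same ring
        else:
            level = max(level - 1, 0)        # still lost: rewind one ring
            rereads = 0
    return log


def needs_two_passes(ring, log):
    """Toy reader who needs two passes over the solution description."""
    needed = 2 if ring == RINGS[2] else 1
    return log.count(ring) >= needed


reading_log = shampoo_read(RINGS, needs_two_passes)
for step, ring in enumerate(reading_log, start=1):
    print(step, ring)
```

With this toy reader, the solution description shows up twice in the log before the algorithms and proofs are attempted. The `max_steps` cap is there only so the sketch always terminates; in real life, that is the point where you ask a colleague or pick up a textbook.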

Writing as a Reading Tool

While I talked a lot about rinse and repeat, not being afraid to go back into the paper, and gradually building the comprehension rings, I did not mention one super useful tool for reading papers. This tool is writing. See, our brains are kind of too fast when we read, and this speed is not a good one. The brain races between questions, answers, remembering past information, and acquiring new information. The “go back and reread” approach allows the brain to slow down a bit and keeps it from getting carried away too far and too fast. However, the ultimate brain speed bump is writing: we type much slower and have to focus on what we write, forcing the brain to slow down even more. And this is where the magic happens. It turns out that when the brain is working slower, the thought process is also more structured. For this reason, pilots are trained to slow their brains down in critical conditions to think better. We are not in any life-or-death critical situation, so simple writing may be a good enough tool to slow your brain down, so you can have time to think and explain what you have just learned.

I often use writing as my ultimate reading tool. For example, all of my reading group summaries are a result of the “ring-3” comprehension discovery process. I do not write for all papers I read, but when I do, it significantly improves my understanding of the paper.