In our last reading group meeting, we discussed “hXDP: Efficient Software Packet Processing on FPGA NICs.” This paper talks about using FPGA NICs to offload routine packet processing tasks and free up the CPU cycles they would otherwise consume. In particular, the paper implements XDP purely in FPGA and achieves performance similar to that of a single x86 CPU core. One of the bigger points in the paper is the need to keep a small footprint on the FPGA in order to save its resources for other potential uses, and the authors deliver on this front as well, using under 10% of the logic resources. The paper surely was a lot of work. It has a significant compiler component, a hardware component, and a comprehensive evaluation. The compiler is very important for the performance of the system, as it includes lots of optimizations, and many of these optimizations have a corresponding implementation in the FPGA.
Below you can find a video of the presentation. This paper was a bit out of my comfort/expertise zone, so I may have oversimplified things at times.
Discussion
1) Benefit for distributed systems. One of the discussion points was the potential benefit of this technology to distributed applications. One immediate candidate we saw is the partial partitions paper we discussed before. Nifty, the system from that paper, relies on a forwarding layer to mask such partitions, and having the packet inspection/forwarding done away from the CPU can be very useful in preserving the performance of the distributed application. To go a bit more general, maybe any application that heavily relies on forwarding can take advantage of similar technology.
2) Performance evaluation. According to the paper, hXDP has throughput similar to that of a ~2.1 GHz x86 core. However, the CPU used in the eval was a 6.5-year-old Xeon, not exactly state-of-the-art silicon. That said, it is worth noting that the development board used is from the same time period as the CPU.
3) FPGA & CPU? Will we see FPGAs integrated into PCs or server CPUs? Unfortunately, none of us at the meeting had any real expertise with FPGAs, so this is a tough one. But given that CPU manufacturers have been buying FPGA manufacturers (Intel + Altera, AMD + Xilinx), there is a possibility of these two technologies coming closer, especially in servers, many of which already have FPGAs.
4) Cost. The development board is $7000, which is not cheap. hXDP performance compares to just one core of a server-grade CPU, and paying $7k for one core seems a bit steep. But it is important to understand a few things. First, this was a development board, so it has more features than the hardware actually used in production. Second, because this is a dev board, the economy of scale is not there. Third, hXDP used very little of the FPGA's resources, so a lot of the chip can be used for other tasks, making even the $7k cost viable.
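To put a rough number on that third point, here is a back-of-the-envelope sketch in Python. It assumes, purely for illustration, that the board cost can be split in proportion to the logic resources a design occupies. That is our simplification, not something the paper claims, and the function name `attributed_cost` is hypothetical. The $7000 price and the 10% figure come from the discussion above; since hXDP uses under 10% of the logic, the result is an upper bound.

```python
def attributed_cost(board_cost_usd, percent_used):
    """Share of the board cost attributed to a design, assuming (our
    simplification) that cost scales with the logic resources it occupies."""
    if not 0 <= percent_used <= 100:
        raise ValueError("percent_used must be between 0 and 100")
    return board_cost_usd * percent_used / 100


BOARD_COST_USD = 7000      # development board price from the discussion
HXDP_LOGIC_PERCENT = 10    # hXDP uses under 10% of logic, so an upper bound

hxdp_share = attributed_cost(BOARD_COST_USD, HXDP_LOGIC_PERCENT)
left_for_other_tasks = BOARD_COST_USD - hxdp_share

print(f"hXDP share of the board cost: at most ${hxdp_share:.0f}")
print(f"Board cost left for other tasks: at least ${left_for_other_tasks:.0f}")
```

Under this assumption, hXDP accounts for at most about $700 of the board, which looks a lot less steep next to one server-grade core than the full $7k does.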
Our reading group takes place over Zoom every Wednesday at 3:30pm EST. We have a Slack group where we post papers, hold discussions, and most importantly manage Zoom invites for the paper discussions. Please join the Slack group to get involved!