Part 1: SmartNICs Could Be Managed Like Switches
For years, I’ve suggested in small meetings with enterprise switch OEMs that they should foster the development of a SmartNIC product that could run their switch OS. This product could be managed from the cloud, like any other enterprise switch in a customer's network. This would deepen the customer's reliance on the OEM's cloud management platform while also increasing per-install revenue: the OEM could upsell a $3K SmartNIC into every server at the end of every new 100 GbE switch port and, more importantly, deliver value through features others aren't, or weren't, capable of offering. As far as I know, this concept has fallen on deaf ears, as no such product has ever entered the market. Now that 400 GbE into servers is becoming more common and I’m free from corporate employment, I’ll restart this rallying cry.
Some Background
Before we dive too deep into this concept, let's take a moment to frame what separates a SmartNIC from a generic Network Interface Card (NIC). The sole job of a NIC is to take signals coming into a computer, in the form of voltage changes on a collection of wires, and convert them into structured data in memory buffers that can then be passed to the operating system running on the primary CPU. While this may sound simple, it isn't: these voltage changes occur billions of times per second on each wire, and to reduce the amount of signal lost to noise, several layers of framing and extremely clever encoding are applied before the data is ever transmitted, all of which must be unwound before the data can be placed into buffers for the OS.
Early SmartNICs broke these decoding and framing tasks down into discrete steps and created a pipeline so each transform could be applied efficiently and in the proper order. For now, let's consider the receive side of the NIC's function: once the signals were reconstituted into the final data that should be sent to the host, that buffer could be passed to a router within the SmartNIC. The router analyzed the source and destination addresses, ports, and protocol, a combination often defined as a flow identifier (FID). If the FID matched one in the router's table, the router would steer the packet to the virtualized NIC (vNIC) instance paired with that FID within the SmartNIC. Some SmartNICs, like those by Solarflare, Myricom, and Mellanox, had 2,000 or more vNIC instances, each containing a host memory address where specific data was destined. Once passed to the appropriate vNIC, the data would be moved from the SmartNIC into the designated host memory, where an application process running on a specific core was expecting it. For applications with accelerated networking, this happened outside the host operating system, saving the application time by not having to wait for the operating system to handle the task. This concept became known as kernel bypass, and it worked great when we had a pair of 10 GbE ports and two CPUs, each with sixteen cores. It enabled applications to shave a few millionths of a second off each packet's receive time, but that was a dozen years ago.
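To make the FID-to-vNIC steering concrete, here is a minimal sketch in C. The field names, table size, and linear scan are my own illustrative assumptions, not the layout of any particular SmartNIC; real hardware performs this match in parallel circuits rather than a loop.

```c
#include <stdint.h>
#include <stddef.h>

/* Flow identifier: the classic 5-tuple extracted from each packet. */
struct fid {
    uint32_t src_ip;
    uint32_t dst_ip;
    uint16_t src_port;
    uint16_t dst_port;
    uint8_t  protocol;
};

/* A virtualized NIC instance: where matched packets should land. */
struct vnic {
    uint64_t host_buf_addr;   /* host memory address targeted by DMA */
    uint32_t ring_size;
};

#define MAX_FLOWS 2048        /* ~2,000 vNICs, as in the early SmartNICs above */

struct flow_entry {
    struct fid   key;
    struct vnic *dest;
    int          in_use;
};

static struct flow_entry flow_table[MAX_FLOWS];

static int fid_equal(const struct fid *a, const struct fid *b)
{
    return a->src_ip == b->src_ip && a->dst_ip == b->dst_ip &&
           a->src_port == b->src_port && a->dst_port == b->dst_port &&
           a->protocol == b->protocol;
}

/* Linear scan for clarity only; hardware matches all entries at once. */
struct vnic *steer(const struct fid *key)
{
    for (size_t i = 0; i < MAX_FLOWS; i++)
        if (flow_table[i].in_use && fid_equal(&flow_table[i].key, key))
            return flow_table[i].dest;
    return NULL;   /* no match: fall back to the default/kernel path */
}
```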
Shortly after Xilinx, now AMD, acquired Solarflare in 2019, we brought the SN1000 SmartNIC to market. This SmartNIC stood out because it featured a control plane running on an ARM core complex and a separate data plane running on soft logic within a Field-Programmable Gate Array (FPGA) on the same PCIe card. Much of this is explained in "SmartNIC Architectures: The Future is a Portable Architecture," which I published in May 2021. The enduring key concept in this article is the separation of data and control. You can think of the data plane as the pipeline through which your network data flows and the control plane as the manager who adjusts and tunes the pipeline configuration and can reprogram it on the fly.
The SN1000 has two distinctly different compute elements, much like the simple diagram above. The FPGA through which the data plane ran (the big chip with the switch logo on it) is composed of a gargantuan array of programmable look-up tables, enabling it to emulate complex logic at speeds similar to those of traditional Application-Specific Integrated Circuits (ASICs). These look-up tables are programmed using a circuit design abstraction called Register Transfer Level (RTL). Essentially, each stage of the data plane pipeline is a single large circuit definition. These stages have their own associated memories and are sewn into a pipeline using RTL. The central and most sophisticated stage in the data plane pipeline is one that is actually wired up as a P4 processor that can execute P4 code passed to it from the control plane.
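In software terms, sewing stages into a pipeline looks like a chain of transforms over a packet buffer. Here is a toy sketch assuming three representative stages (parse, match, deliver); in the FPGA these are concurrent circuits fed in sequence, not sequential function calls.

```c
#include <stddef.h>
#include <stdint.h>

struct pkt {
    uint8_t *data;
    size_t   len;
    uint64_t meta;   /* per-packet metadata carried between stages */
};

/* Each stage transforms the packet in place; returning 0 drops it. */
typedef int (*stage_fn)(struct pkt *);

static int stage_parse(struct pkt *p)   { (void)p; /* extract headers */        return 1; }
static int stage_match(struct pkt *p)   { (void)p; /* FID lookup (P4 stage) */  return 1; }
static int stage_deliver(struct pkt *p) { (void)p; /* DMA to the matched vNIC */ return 1; }

static stage_fn pipeline[] = { stage_parse, stage_match, stage_deliver };

static void run_pipeline(struct pkt *p)
{
    for (size_t i = 0; i < sizeof(pipeline) / sizeof(pipeline[0]); i++)
        if (!pipeline[i](p))
            return;   /* a stage dropped the packet */
}
```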
P4 is a domain-specific language for managing networking data. This P4 processor was created as a matching engine to examine FIDs and steer or transform packets as specified. Initially, in May 2021, when this P4 architecture implementation was released, it could handle roughly 80 million packets per second (Mpps). This particular SmartNIC had two 100 GbE ports, and by definition, a single network port at this speed can see up to 148 Mpps. In most normal networking use cases, 80 Mpps for both ports would be adequate. Conversely, in a worst-case packet storm, something that might happen during a Distributed Denial of Service (DDoS) attack, the SN1000 could be subjected to nearly 300 Mpps, far beyond the roughly 80 Mpps it was capable of handling, so nearly three-quarters of those DDoS packets would be back-pressured into the switch, where they would eventually be dropped.
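The 148 Mpps figure is just the arithmetic of minimum-size frames: a 64-byte Ethernet frame is accompanied on the wire by 20 bytes of preamble and inter-frame gap. A quick back-of-the-envelope check:

```c
#include <stdio.h>

int main(void)
{
    double link_bps   = 100e9;             /* one 100 GbE port */
    double frame_bits = (64 + 20) * 8.0;   /* 672 bits per minimum-size frame */

    printf("max pps: %.1f million\n", link_bps / frame_bits / 1e6);
    /* prints: max pps: 148.8 million */
    return 0;
}
```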
The SN1000's eight-core ARM CPU complex executed the control plane program. This program communicated with both the host, to receive new P4 pipeline instructions and pass up pipeline usage and telemetry data, and the data plane's P4 matching engine, to pass down new instructions to execute and to fetch data plane usage and telemetry data. This was a host-side programmable SmartNIC because all of the control plane management was handled by code running on the server. Other SmartNICs and DPUs utilize a third network interface, often running at 1 GbE or 10 GbE speeds, to program the control plane. This will be addressed in part two of this series.
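To give a feel for what "passing down instructions and fetching telemetry" might look like, here is a hypothetical message format. The operation set and field names are my assumptions for illustration only, not the SN1000's actual interface.

```c
#include <stdint.h>

enum cp_op {
    CP_LOAD_P4_PROGRAM,   /* push new P4 instructions to the matching engine */
    CP_INSERT_FLOW,       /* add a FID-to-action mapping */
    CP_DELETE_FLOW,       /* remove a mapping */
    CP_FETCH_TELEMETRY    /* pull usage counters from the data plane */
};

struct cp_msg {
    enum cp_op op;
    uint64_t   fid_hash;      /* which flow, for flow operations */
    uint32_t   action_id;     /* what to do with matching packets */
    uint32_t   payload_len;
    uint8_t    payload[512];  /* P4 fragment or returned telemetry */
};
```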
Rise of the Generic Flow Table
Today, we have dual-port 100 GbE SmartNICs and two CPUs with up to 200 cores each, so more horsepower is needed to route packets at these rates and keep those cores fed. To do this, we need a more sophisticated way to map a FID to a memory address and to perform actions on packets belonging to these flows. Actions could be as simple as forwarding a packet to a specific queue, dropping it, updating a counter, or rewriting a header field.
Furthermore, we need a much larger table for this router than a few thousand FID entries, and perhaps even a new approach. One way to do this is to create a huge hash table, often called a Generic Flow Table (GFT).
GFTs typically support tens of millions of FID entries, and the associated logic has a method for effectively managing hash collisions. As we move to 400 GbE, these GFTs must support up to 128 or even 256 million entries. Generally, due to their architecture, the performance of these tables begins to degrade as utilization eclipses 25%, so a GFT built to support 128 million FIDs can efficiently manage about 32 million before performance suffers. So, as you now know, the GFT represents a single but critical data plane pipeline stage. In recent tests, we demonstrated a GFT supporting 128 million FIDs at wire rate on 400 GbE.
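Here is a minimal sketch of a GFT-style lookup, assuming open addressing with linear probing; a real implementation would use a stronger hash and hardware-friendly bucketing, so treat this purely as a model of the collision and load-factor behavior.

```c
#include <stdint.h>
#include <stdlib.h>

#define GFT_CAPACITY (1u << 27)   /* 128M slots, matching the discussion above */

struct gft_entry {
    uint64_t fid_hash;    /* compressed flow identifier */
    uint32_t action_id;   /* index into a separate action table */
    uint8_t  in_use;
};

static struct gft_entry *gft;

/* In hardware this table would live in card-attached DRAM
 * (128M slots x 16 bytes = 2 GiB); here we allocate it at init. */
static int gft_init(void)
{
    gft = calloc(GFT_CAPACITY, sizeof(*gft));
    return gft != NULL;
}

static int gft_lookup(uint64_t fid_hash, uint32_t *action_out)
{
    uint32_t slot = (uint32_t)(fid_hash & (GFT_CAPACITY - 1));

    /* Linear probing resolves collisions. Probe chains lengthen as
     * occupancy rises, which is one reason throughput falls off past
     * roughly 25% utilization (~32M live flows in a 128M-slot table). */
    while (gft[slot].in_use) {
        if (gft[slot].fid_hash == fid_hash) {
            *action_out = gft[slot].action_id;
            return 1;   /* hit: the data plane applies action_id */
        }
        slot = (slot + 1) & (GFT_CAPACITY - 1);
    }
    return 0;           /* miss: punt to the control plane */
}
```

Keeping occupancy low is what keeps probe chains short, which is exactly why a table sized for 128 million slots is provisioned to carry only about 32 million live flows.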
More to Come...
The next few parts of this series will expand on the control plane and explain why and how an enterprise switch company could make money with its own SmartNIC operating as a switch and managed via its cloud tools.
One Final Note
At the end of March, Achronix shifted its focus and laid off some fine FPGA engineers, programmers, and managers like myself. If you are looking for people with an intimate knowledge of FPGAs, please contact me, and I'll put you in touch with some of these amazing people.