Smart-NICs and IPUs: Engines Behind the HPC and Cloud Virtualization Liftoff
An overview of hardware acceleration of network infrastructure to enable the high-performance virtualized cloud for the enterprise
Ajay Dubey, Engineering Director, Intel Corporation
Introduction
The rise of High-Performance Computing (HPC) in the cloud has transformed how scientific computing, AI training, and data-intensive workloads are deployed at scale. Central to this transformation is a new breed of networking hardware: SmartNICs (Smart Network Interface Cards) and IPUs (Infrastructure Processing Units). These technologies emerged during the virtualization revolution (2005–2015), addressing performance bottlenecks, reducing CPU overhead, and enabling seamless multi-tenant resource management. This article explores how SmartNICs and IPUs evolved and why they are now indispensable to cloud-based HPC.
The x86 Multicore, Compute Virtualization, and Its Impact on Networking
The feasibility and success of server virtualization were significantly boosted by the emergence of multicore x86 architectures. With multiple CPU cores per socket, servers could efficiently run multiple VMs in parallel. Many of us might remember, at some companies, running multiple operating systems on our individual desktops through VMs managed by a hypervisor. The availability of multiple CPU cores in the same socket enabled:
Better resource sharing and isolation across VMs
Lower overhead for context switching, and
Greater potential for horizontal scalability in data center designs
Multicore CPUs laid the computational groundwork needed to support hypervisors, virtual switching, and per-VM network policies without overwhelming the system setup. The advent of virtualized cloud computing revolutionized the landscape of data center architecture and networking infrastructure.
In the mid-2000s, the widespread adoption of hypervisors such as VMware ESXi, KVM, and Microsoft Hyper-V marked a paradigm shift—from rigid, physical servers to agile, software-defined infrastructures. This transformation made it possible to spin up, migrate, and scale virtual machines (VMs) on demand, unlocking unprecedented flexibility and operational efficiency. As compute became more dynamic, traditional networking architectures began to strain under the pressure. Challenges quickly emerged. Some of them are summarized below.
Explosion in East-West Traffic
In traditional data centers, most network traffic followed a North-South pattern, flowing between external clients and internal servers. However, with the rise of virtualization, there was a dramatic shift toward East-West traffic, where virtual machines (VMs) within the same physical host or across hosts began communicating heavily. This shift resulted in a tenfold increase in intra-host traffic, placing significant strain on traditional Network Interface Cards (NICs) and switch architectures, which were not designed to handle such internal communication intensity.
Software vSwitch Bottlenecks
In early virtualization deployments, software-based virtual switches like Linux Bridge and Open vSwitch (OVS) were used to forward traffic between VMs. These switches processed every packet entirely in software, requiring multiple context switches between kernel and user space for each operation. Additionally, packet forwarding involved CPU-intensive lookups for MAC addresses and access control rules. As a result, throughput was severely constrained, and performance bottlenecks became prominent, particularly as hosts approached traffic loads of 10–20 Gbps.
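For readers who want to see this software data plane directly, here is a minimal sketch, assuming a Linux host with Open vSwitch installed and the standard ovs-appctl and ovs-dpctl utilities available; it dumps the kernel datapath flow cache whose misses trigger the kernel-to-userspace round trips described above.

```python
import subprocess

def dump_software_datapath():
    """Show the Open vSwitch software data plane state on this host.

    Every flow printed here is forwarding state maintained in software;
    a cache miss punts the packet to the userspace ovs-vswitchd daemon,
    incurring the context switches described above.
    """
    for cmd in (["ovs-appctl", "dpif/show"],   # datapaths and their ports
                ["ovs-dpctl", "dump-flows"]):  # cached flows and hit counters
        print("$", " ".join(cmd))
        print(subprocess.run(cmd, capture_output=True, text=True).stdout)

if __name__ == "__main__":
    dump_software_datapath()
```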
CPU Overhead from Virtualization
Another critical issue was the high CPU overhead introduced by virtualization. Hypervisors could consume up to 50% of total CPU cycles just to handle network-related tasks. The constant flow of packets triggered frequent interrupts, leading to interrupt storms that further degraded performance. These conditions often caused latency spikes exceeding 100 microseconds, significantly impacting application responsiveness and workload consistency.
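This interrupt pressure is easy to observe on any Linux host. The sketch below is a hedged example that simply parses /proc/interrupts with an illustrative driver-name pattern and totals the interrupt counts attributed to NIC queues, so the effect described above can be measured rather than guessed at.

```python
import re
from collections import defaultdict

def nic_interrupt_counts(pattern=r"eth|mlx|i40e|ice|virtio"):
    """Sum per-CPU interrupt counts for IRQ lines that look NIC-related.

    The driver-name pattern is illustrative; adjust it to match the NICs
    actually present on your host.
    """
    totals = defaultdict(int)
    with open("/proc/interrupts") as f:
        cpu_count = len(f.readline().split())        # header row: CPU0 CPU1 ...
        for line in f:
            fields = line.split()
            if len(fields) <= cpu_count:
                continue
            name = " ".join(fields[cpu_count + 1:])   # device/queue name at line end
            if re.search(pattern, name):
                totals[name] += sum(int(x) for x in fields[1:cpu_count + 1] if x.isdigit())
    return dict(totals)

if __name__ == "__main__":
    for queue, count in sorted(nic_interrupt_counts().items()):
        print(f"{count:>12}  {queue}")
```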
Networking Offloads: Unburdening the Host CPU
As the scale of virtualization increased, so did the demand on the host CPU to manage networking workloads. To counter this, NICs evolved to offload several key networking functions. These offloads include L2 switching, VLAN tagging, L3 routing, TCP/UDP checksum computation, TCP segmentation, overlay encapsulation (e.g., VXLAN, NVGRE), and flow classification. Each of these tasks, when processed in software, consumed significant CPU cycles. By moving them to the NIC hardware, systems achieved lower latencies, higher throughput, and freed up valuable CPU resources for core application logic—laying the groundwork for scalable, efficient cloud infrastructure.
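Most of these offloads can be inspected from the host. The following minimal sketch assumes a Linux host with ethtool installed; the interface name eth0 is only a placeholder, and the feature names follow the strings ethtool itself reports.

```python
import subprocess

# Offload features reported by `ethtool -k` that correspond to the
# hardware offloads discussed above.
OFFLOADS_OF_INTEREST = (
    "rx-checksumming",
    "tx-checksumming",
    "scatter-gather",
    "tcp-segmentation-offload",
    "generic-receive-offload",
)

def show_offloads(interface="eth0"):
    """Print the state of selected NIC offload features for an interface.

    `eth0` is a placeholder; substitute an interface present on your host.
    """
    out = subprocess.run(["ethtool", "-k", interface],
                         capture_output=True, text=True, check=True).stdout
    for line in out.splitlines():
        if any(line.strip().startswith(name) for name in OFFLOADS_OF_INTEREST):
            print(line.strip())

if __name__ == "__main__":
    show_offloads("eth0")
```

Toggling an individual feature with ethtool -K is also how administrators fall back to software processing when debugging a suspect offload.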
Efficient IO-Virtualization: SR-IOV
The introduction of the PCI-SIG SR-IOV (Single Root I/O Virtualization) standard in 2007 marked a pivotal moment in NIC evolution. Traditional virtualized networking suffered from high latency and CPU overhead because the hypervisor's software switch managed all VM I/O. SR-IOV redefined this paradigm by exposing multiple Virtual Functions (VFs) behind each Physical Function (PF). The PF remains responsible for all physical link-level functions and the full configuration space, while each VF carries only the essential PCIe resources required for I/O, such as its own BAR spaces. In short (a minimal configuration sketch follows the list below):
Physical Functions (PFs) provided hypervisor-level control.
Virtual Functions (VFs) enabled direct assignment of NIC hardware queues to VMs.
NICs came to support hundreds, and in some cases thousands, of VFs per device, allowing efficient I/O scaling.
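As a concrete illustration, the minimal sketch below enables a few VFs through the standard sriov_totalvfs and sriov_numvfs sysfs attributes; it assumes a Linux host with an SR-IOV-capable NIC and driver, root privileges, and uses eth0 and a VF count of 4 purely as placeholders.

```python
from pathlib import Path

def enable_vfs(interface="eth0", requested_vfs=4):
    """Enable SR-IOV Virtual Functions on a NIC via the standard sysfs attributes.

    `eth0` and `requested_vfs` are placeholders; writing sriov_numvfs requires
    root and an SR-IOV-capable NIC/driver. If VFs are already enabled, write 0
    to sriov_numvfs first before requesting a different count.
    """
    device = Path(f"/sys/class/net/{interface}/device")
    total = int((device / "sriov_totalvfs").read_text())    # VFs the PF can expose
    vfs = min(requested_vfs, total)
    (device / "sriov_numvfs").write_text(str(vfs))           # instantiate the VFs
    # Each VF now appears as its own PCIe function that can be passed
    # through to a VM (e.g. via VFIO) for direct, hypervisor-bypass I/O.
    print(f"{interface}: enabled {vfs} of {total} supported VFs")

if __name__ == "__main__":
    enable_vfs("eth0", 4)
```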
This architecture drastically reduced context switching, improved isolation, and unlocked near-native performance for VMs. Virtualization forced networking to evolve from software-centric to hardware-accelerated data planes. By offloading L2 switching, tunneling, and flow classification, NICs evolved into programmable network processors. Today, SmartNICs and IPUs enable:
Zero-CPU networking for VMs, pods, and GPUs
Deterministic low latency (<5 µs)
Cloud-scale agility (instant VM migration, elastic scaling)
The future, I think, still lies in programmable NICs that can dynamically adapt to new protocols, custom overlays, or AI-specific traffic without compromising performance. Of course, these are challenging and competing criteria to satisfy.
Freeing up Processors altogether: Offloading Networking Infrastructure to the NIC
As hyperscalers expanded, so did the demands on networking hardware. Early SR-IOV systems laid the foundation, but cloud-native workloads required further innovation:
Multi-Queue Support: NICs evolved to support hundreds to over a thousand TX/RX queues.
Flow Director: On-NIC steering of packets to specific queues based on flow identifiers improved CPU cache locality (see the configuration sketch after this list).
Hardware QoS: NIC-level rate limiting and traffic shaping ensured multi-tenant fairness.
VMDq (Virtual Machine Device Queues): Allowed NICs to sort traffic into per-VM queues in hardware, serving as a precursor to the queue-based I/O isolation later delivered by SR-IOV.
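To make the queue and flow-steering features above concrete, here is a hedged sketch that assumes a Linux host, root privileges, and a NIC/driver supporting multiple channels and ntuple filters; the interface name, queue count, and TCP port are placeholders.

```python
import subprocess

def configure_queues_and_steering(interface="eth0", queues=16,
                                  dst_port=5201, target_queue=4):
    """Resize NIC hardware queues and install an on-NIC flow-steering rule.

    All parameters are illustrative placeholders; the commands require root
    and a NIC/driver that supports multiple channels and ntuple filters.
    """
    commands = [
        # Use `queues` combined TX/RX channels (multi-queue support).
        ["ethtool", "-L", interface, "combined", str(queues)],
        # Enable receive flow steering (ntuple) filters on the NIC.
        ["ethtool", "-K", interface, "ntuple", "on"],
        # Steer TCP traffic destined to dst_port to a specific hardware queue,
        # keeping that flow's processing local to one CPU's cache.
        ["ethtool", "-N", interface, "flow-type", "tcp4",
         "dst-port", str(dst_port), "action", str(target_queue)],
    ]
    for cmd in commands:
        print("$", " ".join(cmd))
        subprocess.run(cmd, check=True)

if __name__ == "__main__":
    configure_queues_and_steering()
```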
These enhancements created NIC architectures optimized not just for performance but also for scalability and multi-tenancy, the cornerstones of modern cloud environments. To further decouple networking from the CPU, additional networking functions were offloaded to NIC hardware; the most common offloads are summarized in the table below.
Offload | Problem in Software | Benefit of NIC Offload
L2 Switching | Software MAC lookups were slow | TCAM-based lookups: 10× lower latency, line-rate forwarding
VLAN Tagging/Untagging (802.1Q) | Per-packet VLAN operations in software added overhead | Zero CPU cost for VLANs
L3 Routing | Software-based routing added microseconds per packet | Routing tables cached in NIC hardware
TCP/UDP Checksum | CPU-bound computation for every packet | Frees 5–10% of CPU cycles
TSO (TCP Segmentation Offload) | CPUs split large packets into MTU-sized segments | Cuts CPU load by 20%+
VXLAN, NVGRE Encapsulation | Overlay networks overwhelmed the software vSwitch | 100 Gbps tunneling with no CPU hit
Flow Classification | ACLs and QoS were inefficient in software | NIC-based flow steering and load balancing
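To make the overlay-encapsulation row concrete, the sketch below creates a VXLAN device with iproute2 and then checks whether the underlying NIC advertises tunnel segmentation offload, which is what keeps line-rate tunneling off the CPU. It assumes a Linux host with iproute2 and ethtool; the underlay interface, VNI, and UDP port are placeholders.

```python
import subprocess

def create_vxlan_and_check_offload(underlay="eth0", vni=100, vxlan_port=4789):
    """Create a VXLAN overlay device and report the underlay NIC's tunnel offloads.

    `eth0`, the VNI, and the UDP port are placeholders; creating the device
    requires root, and the offload only reports `on` if the NIC supports it.
    """
    # Standard iproute2 invocation for a VXLAN tunnel endpoint on the underlay NIC.
    subprocess.run(["ip", "link", "add", f"vxlan{vni}", "type", "vxlan",
                    "id", str(vni), "dev", underlay,
                    "dstport", str(vxlan_port)], check=True)
    # Tunnel segmentation/checksum offloads show up as tx-udp_tnl-* features.
    features = subprocess.run(["ethtool", "-k", underlay],
                              capture_output=True, text=True).stdout
    for line in features.splitlines():
        if "tnl" in line:
            print(line.strip())

if __name__ == "__main__":
    create_vxlan_and_check_offload()
```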
Conclusion
SmartNICs and IPUs represent the maturation of network interface technology from passive I/O endpoints to intelligent infrastructure processors. As the demands of cloud-based HPC grow—driven by AI, simulations, and real-time analytics—these engines have become foundational. Born from the needs of the virtualization revolution, SmartNICs and IPUs are now the unsung heroes enabling cloud to meet the rigorous demands of high-performance computing.
Intel has developed a comprehensive portfolio of SmartNICs and IPUs, ranging from FPGA-based programmable platforms to fixed-function ASIC solutions. Notable offerings include the Intel N6000‑PL and F2000X‑PL platforms built on Agilex FPGAs, as well as the ASIC-based Mount Evans and Mount Morgan IPUs designed for high-performance cloud-scale deployments.
I have had the opportunity to lead some of Intel's key Smart-NIC programs, driving both proof-of-concept and product development. I hope Intel networking continues to play a central role in enabling cloud networking for both AI and regular networking workloads.