Smart-NICs and IPUs: Engines Behind the HPC and Cloud Virtualization Liftoff
An overview of hardware acceleration of network infrastructure to enable the high-performance virtualized cloud for the enterprise
Ajay Dubey, Engineering Director, Intel Corporation
Introduction
The rise of High-Performance Computing (HPC) in the cloud has transformed how scientific computing, AI training, and data-intensive workloads are deployed at scale. Central to this transformation is a new breed of networking hardware: SmartNICs (Smart Network Interface Cards) and IPUs (Infrastructure Processing Units). These technologies emerged during the virtualization revolution (2005–2015), addressing performance bottlenecks, reducing CPU overhead, and enabling seamless multi-tenant resource management. This article explores how SmartNICs and IPUs evolved and why they are now indispensable to cloud-based HPC.
The x86 Multicore, Compute Virtualization, and Its Impact on Networking
The feasibility and success of server virtualization were significantly boosted by the emergence of multicore x86 architectures. With multiple CPU cores per socket, servers could efficiently run multiple VMs in parallel. Many of us might remember, at some companies, running multiple operating systems on our individual desktops through VMs managed by a hypervisor. The availability of multiple CPU cores in the same socket enabled:
Better resource sharing and isolation across VMs
Lower overhead for context switching, and
Greater potential for horizontal scalability in data center designs
Multicore CPUs laid the computational groundwork needed to support hypervisors, virtual switching, and per-VM network policies without overwhelming the system setup. The advent of virtualized cloud computing revolutionized the landscape of data center architecture and networking infrastructure.
In the mid-2000s, the widespread adoption of hypervisors such as VMware ESXi, KVM, and Microsoft Hyper-V marked a paradigm shift—from rigid, physical servers to agile, software-defined infrastructures. This transformation made it possible to spin up, migrate, and scale virtual machines (VMs) on demand, unlocking unprecedented flexibility and operational efficiency. As compute became more dynamic, traditional networking architectures began to strain under the pressure. Challenges quickly emerged. Some of them are summarized below.
Explosion in East-West Traffic
In traditional data centers, most network traffic followed a North-South pattern, flowing between external clients and internal servers. However, with the rise of virtualization, there was a dramatic shift toward East-West traffic, where virtual machines (VMs) within the same physical host or across hosts began communicating heavily. This shift resulted in a tenfold increase in intra-host traffic, placing significant strain on traditional Network Interface Cards (NICs) and switch architectures, which were not designed to handle such internal communication intensity.
Software vSwitch Bottlenecks
In early virtualization deployments, software-based virtual switches like Linux Bridge and Open vSwitch (OVS) were used to forward traffic between VMs. These switches processed every packet entirely in software, requiring multiple context switches between kernel and user space for each operation. Additionally, packet forwarding involved CPU-intensive lookups for MAC addresses and access control rules. As a result, throughput was severely constrained, and performance bottlenecks became prominent, particularly as hosts approached traffic loads of 10–20 Gbps.
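For readers who want to see this software data plane directly, here is a minimal sketch, assuming a Linux host with Open vSwitch installed and the standard ovs-appctl and ovs-dpctl utilities available; it dumps the kernel datapath flow cache whose misses trigger the kernel-to-userspace round trips described above.

```python
import subprocess

def dump_software_datapath():
    """Show the Open vSwitch software data plane state on this host.

    Every flow printed here is forwarding state maintained in software;
    a cache miss punts the packet to the userspace ovs-vswitchd daemon,
    incurring the context switches described above.
    """
    for cmd in (["ovs-appctl", "dpif/show"],   # datapaths and their ports
                ["ovs-dpctl", "dump-flows"]):  # cached flows and hit counters
        print("$", " ".join(cmd))
        print(subprocess.run(cmd, capture_output=True, text=True).stdout)

if __name__ == "__main__":
    dump_software_datapath()
```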
CPU Overhead from Virtualization
Another critical issue was the high CPU overhead introduced by virtualization. Hypervisors could consume up to 50% of total CPU cycles just to handle network-related tasks. The constant flow of packets triggered frequent interrupts, leading to interrupt storms that further degraded performance. These conditions often caused latency spikes exceeding 100 microseconds, significantly impacting application responsiveness and workload consistency.
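This interrupt pressure is easy to observe on any Linux host. The sketch below is a hedged example that simply parses /proc/interrupts with an illustrative driver-name pattern and totals the interrupt counts attributed to NIC queues, so the effect described above can be measured rather than guessed at.

```python
import re
from collections import defaultdict

def nic_interrupt_counts(pattern=r"eth|mlx|i40e|ice|virtio"):
    """Sum per-CPU interrupt counts for IRQ lines that look NIC-related.

    The driver-name pattern is illustrative; adjust it to match the NICs
    actually present on your host.
    """
    totals = defaultdict(int)
    with open("/proc/interrupts") as f:
        cpu_count = len(f.readline().split())        # header row: CPU0 CPU1 ...
        for line in f:
            fields = line.split()
            if len(fields) <= cpu_count:
                continue
            name = " ".join(fields[cpu_count + 1:])   # device/queue name at line end
            if re.search(pattern, name):
                totals[name] += sum(int(x) for x in fields[1:cpu_count + 1] if x.isdigit())
    return dict(totals)

if __name__ == "__main__":
    for queue, count in sorted(nic_interrupt_counts().items()):
        print(f"{count:>12}  {queue}")
```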
Networking Offloads: Unburdening the Host CPU
As the scale of virtualization increased, so did the demand on the host CPU to manage networking workloads. To counter this, NICs evolved to offload several key networking functions. These offloads include L2 switching, VLAN tagging, L3 routing, TCP/UDP checksum computation, TCP segmentation, overlay encapsulation (e.g., VXLAN, NVGRE), and flow classification. Each of these tasks, when processed in software, consumed significant CPU cycles. By moving them to the NIC hardware, systems achieved lower latencies, higher throughput, and freed up valuable CPU resources for core application logic—laying the groundwork for scalable, efficient cloud infrastructure.
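Most of these offloads can be inspected from the host. The following minimal sketch assumes a Linux host with ethtool installed; the interface name eth0 is only a placeholder, and the feature names follow the strings ethtool itself reports.

```python
import subprocess

# Offload features reported by `ethtool -k` that correspond to the
# hardware offloads discussed above.
OFFLOADS_OF_INTEREST = (
    "rx-checksumming",
    "tx-checksumming",
    "scatter-gather",
    "tcp-segmentation-offload",
    "generic-receive-offload",
)

def show_offloads(interface="eth0"):
    """Print the state of selected NIC offload features for an interface.

    `eth0` is a placeholder; substitute an interface present on your host.
    """
    out = subprocess.run(["ethtool", "-k", interface],
                         capture_output=True, text=True, check=True).stdout
    for line in out.splitlines():
        if any(line.strip().startswith(name) for name in OFFLOADS_OF_INTEREST):
            print(line.strip())

if __name__ == "__main__":
    show_offloads("eth0")
```

Toggling an individual feature with ethtool -K is also how administrators fall back to software processing when debugging a suspect offload.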
Efficient IO-Virtualization: SR-IOV
The introduction of the PCI-SIG SR-IOV (Single Root I/O Virtualization) standard in 2007 marked a pivotal moment in NIC evolution. Traditional virtualized networking suffered from high latency and CPU overhead because the hypervisor's software switch managed all VM I/O. SR-IOV redefined this paradigm by exposing multiple Virtual Functions (VFs) behind each Physical Function (PF). The PF remains responsible for all physical link-level functions and the full configuration space, while each VF carries only the essential PCIe resources required for I/O, such as its own BAR spaces. In short (a minimal configuration sketch follows the list below):
Physical Functions (PFs) provided hypervisor-level control.
Virtual Functions (VFs) enabled direct assignment of NIC hardware queues to VMs.
NICs came to support hundreds, and in some cases thousands, of VFs per device, allowing efficient I/O scaling.
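As a concrete illustration, the minimal sketch below enables a few VFs through the standard sriov_totalvfs and sriov_numvfs sysfs attributes; it assumes a Linux host with an SR-IOV-capable NIC and driver, root privileges, and uses eth0 and a VF count of 4 purely as placeholders.

```python
from pathlib import Path

def enable_vfs(interface="eth0", requested_vfs=4):
    """Enable SR-IOV Virtual Functions on a NIC via the standard sysfs attributes.

    `eth0` and `requested_vfs` are placeholders; writing sriov_numvfs requires
    root and an SR-IOV-capable NIC/driver. If VFs are already enabled, write 0
    to sriov_numvfs first before requesting a different count.
    """
    device = Path(f"/sys/class/net/{interface}/device")
    total = int((device / "sriov_totalvfs").read_text())    # VFs the PF can expose
    vfs = min(requested_vfs, total)
    (device / "sriov_numvfs").write_text(str(vfs))           # instantiate the VFs
    # Each VF now appears as its own PCIe function that can be passed
    # through to a VM (e.g. via VFIO) for direct, hypervisor-bypass I/O.
    print(f"{interface}: enabled {vfs} of {total} supported VFs")

if __name__ == "__main__":
    enable_vfs("eth0", 4)
```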
This architecture drastically reduced context switching, improved isolation, and unlocked near-native performance for VMs. Virtualization forced networking to evolve from software-centric to hardware-accelerated data planes. By offloading L2 switching, tunneling, and flow classification, NICs evolved into programmable network processors. Today, SmartNICs and IPUs enable:
Zero-CPU networking for VMs, pods, and GPUs
Deterministic low latency (<5 µs)
Cloud-scale agility (instant VM migration, elastic scaling)
The future, I think, still lies in programmable NICs that can dynamically adapt to new protocols, custom overlays, or AI-specific traffic without compromising performance. Of course, these are challenging and competing criteria to satisfy.
Freeing up Processors altogether: Offloading Networking Infrastructure to the NIC
As hyperscalers expanded, so did the demands on networking hardware. Early SR-IOV systems laid the foundation, but cloud-native workloads required further innovation:
Multi-Queue Support: NICs evolved to support hundreds to over a thousand TX/RX queues.
Flow Director: On-NIC steering of packets to specific queues based on flow identifiers improved CPU cache locality (see the configuration sketch after this list).
Hardware QoS: NIC-level rate limiting and traffic shaping ensured multi-tenant fairness.
VMDq (Virtual Machine Device Queues): Allowed NICs to sort traffic into per-VM queues in hardware, serving as a precursor to the queue-based I/O isolation later delivered by SR-IOV.
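To make the queue and flow-steering features above concrete, here is a hedged sketch that assumes a Linux host, root privileges, and a NIC/driver supporting multiple channels and ntuple filters; the interface name, queue count, and TCP port are placeholders.

```python
import subprocess

def configure_queues_and_steering(interface="eth0", queues=16,
                                  dst_port=5201, target_queue=4):
    """Resize NIC hardware queues and install an on-NIC flow-steering rule.

    All parameters are illustrative placeholders; the commands require root
    and a NIC/driver that supports multiple channels and ntuple filters.
    """
    commands = [
        # Use `queues` combined TX/RX channels (multi-queue support).
        ["ethtool", "-L", interface, "combined", str(queues)],
        # Enable receive flow steering (ntuple) filters on the NIC.
        ["ethtool", "-K", interface, "ntuple", "on"],
        # Steer TCP traffic destined to dst_port to a specific hardware queue,
        # keeping that flow's processing local to one CPU's cache.
        ["ethtool", "-N", interface, "flow-type", "tcp4",
         "dst-port", str(dst_port), "action", str(target_queue)],
    ]
    for cmd in commands:
        print("$", " ".join(cmd))
        subprocess.run(cmd, check=True)

if __name__ == "__main__":
    configure_queues_and_steering()
```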
These enhancements created NIC architectures optimized not just for performance but also for scalability and multi-tenancy, the cornerstones of modern cloud environments. To further decouple networking from the CPU, additional networking functions were offloaded to NIC hardware; the most common offloads are summarized in the table below.
Offload | Problem in Software | Benefit of NIC Offload
L2 Switching | Software MAC lookups were slow | TCAM-based lookups: 10× lower latency, line-rate forwarding
VLAN Tagging/Untagging (802.1Q) | Per-packet VLAN operations in software added overhead | Zero CPU cost for VLANs
L3 Routing | Software-based routing added microseconds per packet | Routing tables cached in NIC hardware
TCP/UDP Checksum | CPU-bound computation for every packet | Frees 5–10% of CPU cycles
TSO (TCP Segmentation Offload) | CPUs split large packets into MTU-sized segments | Cuts CPU load by 20%+
VXLAN, NVGRE Encapsulation | Overlay networks overwhelmed the software vSwitch | 100 Gbps tunneling with no CPU hit
Flow Classification | ACLs and QoS were inefficient in software | NIC-based flow steering and load balancing
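To make the overlay-encapsulation row concrete, the sketch below creates a VXLAN device with iproute2 and then checks whether the underlying NIC advertises tunnel segmentation offload, which is what keeps line-rate tunneling off the CPU. It assumes a Linux host with iproute2 and ethtool; the underlay interface, VNI, and UDP port are placeholders.

```python
import subprocess

def create_vxlan_and_check_offload(underlay="eth0", vni=100, vxlan_port=4789):
    """Create a VXLAN overlay device and report the underlay NIC's tunnel offloads.

    `eth0`, the VNI, and the UDP port are placeholders; creating the device
    requires root, and the offload only reports `on` if the NIC supports it.
    """
    # Standard iproute2 invocation for a VXLAN tunnel endpoint on the underlay NIC.
    subprocess.run(["ip", "link", "add", f"vxlan{vni}", "type", "vxlan",
                    "id", str(vni), "dev", underlay,
                    "dstport", str(vxlan_port)], check=True)
    # Tunnel segmentation/checksum offloads show up as tx-udp_tnl-* features.
    features = subprocess.run(["ethtool", "-k", underlay],
                              capture_output=True, text=True).stdout
    for line in features.splitlines():
        if "tnl" in line:
            print(line.strip())

if __name__ == "__main__":
    create_vxlan_and_check_offload()
```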
Conclusion
SmartNICs and IPUs represent the maturation of network interface technology from passive I/O endpoints to intelligent infrastructure processors. As the demands of cloud-based HPC grow—driven by AI, simulations, and real-time analytics—these engines have become foundational. Born from the needs of the virtualization revolution, SmartNICs and IPUs are now the unsung heroes enabling cloud to meet the rigorous demands of high-performance computing.
Intel has developed a comprehensive portfolio of SmartNICs and IPUs, ranging from FPGA-based programmable platforms to fixed-function ASIC solutions. Notable offerings include the Intel N6000‑PL and F2000X‑PL platforms built on Agilex FPGAs, as well as the ASIC-based Mount Evans and Mount Morgan IPUs designed for high-performance cloud-scale deployments.
I have had the opportunity to lead some of Intel's key Smart-NIC programs, driving both proof-of-concept and product development. I hope Intel networking continues to play a central role in enabling cloud networking for both AI and regular networking workloads.