Leveraging DPDK to Scale-Out Network Functions Without Sacrificing Networking Performance
© 2013-2015 RIFT.io, Inc. Confidential and Proprietary
August 2015
VNFs Will Run in Diverse Infrastructures
How can we unify these environments?
[Diagram: bare metal, private cloud, and public cloud environments, each built from hosts, hypervisors, switches, and NIBs]
Virtual Network Function (VNF) Considerations
A VNF Breaks Down Into VNF Components (VNFc)
– Control Plane / Data Plane / Security as example
– VNF vendors likely to have diverse network attachment models
– Any NFV architecture needs to accommodate a variety of guest attachment options
– Each VNFC will need to scale independently and be able to run virtually anywhere
Example VNF (GW)

VNFc-1: Bearer
• Packet forwarding performance is the most critical metric
• Limit complex interface bonding where possible for flow-state consistency
• Guest OSs need access to data-plane-accelerated NICs (DPDK)
• Target hosts with high-bandwidth NICs (40G/100G)

VNFc-2: Control
• Transactions per second is the most critical metric
• VPN / NAT association and management
• Many other in-line services likely to be offered
• Responsible for coordinated scale-out of all VNFCs
• Likely the most utilized function – target low-cost hosts

VNFc-3: Security
• Session encryption / decryption rate is the most critical metric
• Scale-out of crypto tunnels across the security subsystem is key for deterministic scalability
• Guest OSs need access to crypto offload
• Target hosts with PCI-accelerated crypto processing (NPU/NSP)
Variety of Network Attachment Options
Virtio
– Para-virtualized network that’s the simplest to deploy
– OpenStack native support
– Tenant encapsulation supported by OpenStack
– Lower performance due to many context switches (host / guest /
QEMU)
– Complex networking limited to host environment
Direct Pass-Through
– Direct guest access to NIC hardware
– OpenStack does not natively support this
– Tenant encapsulation outside of OpenStack - significant work to
integrate
– Very high performance due to direct guest access to the hardware
– Complex networking left to the guest environment and underlay
SR-IOV
– High speed multi-guest NIC access
– OpenStack native support
– Tenant encapsulation outside of OpenStack - significant work to
integrate
– High performance due to direct hardware support
– Complex networking left to the guest environment and underlay
DPDK Accelerated vSwitch with Sockets / KNI
– KNI provides the ability to use guest kernel NIC interfaces
– Supported by OpenStack
– Low performance due to kernel interface
– Complex networking limited to host environment
DPDK Accelerated vSwitch with Ivshmem
– Facilitates fast zero-copy data sharing among virtual machines
– Supported by OpenStack
– Good performance, but the hugepage is shared by all guests (insecure)
– Complex networking limited to host environment
DPDK Accelerated vSwitch with Virtio
– Virtio to DPDK in QEMU
– Supported by OpenStack
– Limited performance due to packet copy
– Complex networking limited to host environment
DPDK Accelerated vSwitch with vhost-user
– Facilitates fast zero-copy data sharing among virtual machines
– Supported by OpenStack
– Limited performance due to single-queue limitation (multi-queue coming)
– Complex networking limited to host environment
Network Attachment Option - Details
[Diagram: three attachment models compared. Standard vSwitch: guest VirtIO driver through QEMU/KVM to OVS, a tap device, and virtIO in the host kernel, then out the Ethernet port. Hypervisor bypass: direct pass-through (guest PF driver reaching the pNIC via Intel VT-d) and SR-IOV (guest DPDK VF drivers on virtual functions, with the virtual Ethernet bridge/classifier and PCI manager on the pNIC's physical function). DPDK-accelerated vSwitch: a user-space DPDK-accelerated OVS forwarding application (OVS daemon, DPDK control interface, DPDK PMD drivers, physical and virtual Ethernet ports) attached to guests via KNI/Linux sockets, ivshmem (BAR2 shared memory), virtIO, or vhost-user.]
Other Considerations Beyond Network Attachment Options
CPU Pinning
– Configures process or thread affinity to one or more cores (a minimal pinning sketch follows this list)
– In a 1:1 pinning configuration between virtual CPUs and physical CPUs, predictability is introduced into the system by preventing the host and
guest schedulers from moving workloads around, which also enables other efficiencies such as improved cache hit rates
Huge Pages
– Provides up to 1-GB page table entry sizes to reduce I/O translation look-aside buffer (IOTLB) misses, which improves networking performance,
particularly for small packets
I/O-Aware NUMA Scheduling
– Memory allocation process that prioritizes the highest-performing memory local to a processor core
– Able to configure VMs to use CPU cores from the same processor socket and choose the optimal socket based on the locality of the relevant NIC
device that is providing the data connectivity for the VM
Cache Monitoring Technology / Cache Allocation Technology (CMT/CAT)
– CMT allows an operating system (OS) or hypervisor or virtual machine monitor (VMM) to determine the usage of cache by applications running
on the platform
– CMT allows an OS or VMM to assign an ID (RMID) for each of the applications or VMs that are scheduled to run on a core, monitor cache
occupancy on a per-RMID basis, and read last level cache occupancy for a given RMID at any time
– CAT allows an OS, hypervisor, or VMM to control allocation of a CPU’s shared last-level cache which lets the processor provide access to portions
of the cache according to the established class of service (COS)
• Configuring COS defines the amount of resources (cache space) available at the logical processor level and associates each logical processor
with an available COS
– CAT provides an application the ability to run on a logical processor that uses the desired COS
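
The CPU pinning and huge page options above are applied through host and guest configuration rather than application code, but the pinning mechanism is easy to illustrate. The sketch below is a minimal illustration (not RIFT.io code): each worker thread binds itself 1:1 to a core with the standard Linux pthread_setaffinity_np() call. DPDK applications typically get the same effect by handing a core mask to the EAL (the -c option), which pins one lcore thread per core for the life of the process.

/*
 * Minimal 1:1 pinning sketch (illustrative only, not RIFT.io code).
 * Each worker thread binds itself to a single core, so the scheduler
 * cannot migrate it and cache locality is preserved.
 */
#define _GNU_SOURCE
#include <pthread.h>
#include <sched.h>
#include <stdio.h>

static void *worker(void *arg)
{
    long core = (long)arg;
    cpu_set_t set;

    CPU_ZERO(&set);
    CPU_SET(core, &set);                       /* affinity mask with one core */
    if (pthread_setaffinity_np(pthread_self(), sizeof(set), &set) != 0)
        fprintf(stderr, "failed to pin to core %ld\n", core);

    /* ...packet-processing loop for this core would run here... */
    return NULL;
}

int main(void)
{
    pthread_t t[2];
    long cores[2] = {2, 3};                    /* illustrative core IDs */

    for (int i = 0; i < 2; i++)
        pthread_create(&t[i], NULL, worker, (void *)cores[i]);
    for (int i = 0; i < 2; i++)
        pthread_join(t[i], NULL);
    return 0;
}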
Workload Placement Implications
[Diagram: intelligent workload placement distributes VNF-1, three VNF-2c instances, and VNF-3 across Host 1-3 and Router 1-2 through a virtualization layer, matching each workload to hosts with the right capabilities: PCI hardware adapters, 40G/100G NICs, TXT/TPM, and NUMA]
Matching Workload Needs with Infrastructure Capabilities
[Diagram: Host A runs Guest A and Guest B on Open vSwitch. Control-plane vNICs e1/e2 attach through physnet1 VLANs (control-net-a/b), while data-plane traffic uses SR-IOV VFs (VF-A, VF-B) on pNICs eth0/eth1 through physnet2-sriov VLANs (data-sriov-net-a/b). Host and compute attributes drive placement.]
VDU Descriptor:
– Image: <path>
– Flavor: { vcpus: <count>, memory: <mb>, storage: <GB> }
– Guest-EPA: { mempage-size: <large|small|prefer-large>,
trusted-execution: <true|false>,
cpu-pinning-policy: <dedicated|shared|any>,
thread-pin-policy: <avoid|separate|isolated|any> ,
numa: { node-cnt: <count>,
mem-policy: <strict|preferred> ,
nodes: { id: <id>, memory: <mb>,
vcpu-list: <comma separated list> } } }
– Host-EPA: { processor: { model: <model>,
features: <64b, iommu, cat, cmt, ddio, etc>
} }
– vSwitch-EPA: { ovs-acceleration: <true|false>,
ovs-offload: <true|false> }
– Hypervisor: { type:<kvm|xen> , version: <> }
– Interface:
• Name: <string>
• Type: direct-sr-iov | normal | direct-pci-passthrough
• NIC-EPA: { vlan-tag-stripping: <boolean>,
ovs-offload: <boolean>, vendor-id: <vendor-id>,
datapath-library: <name-of-library>,
bandwidth: <bw> }
• Network: <name-of-provider-network>
Detailed CPU and network controls
described in an open descriptor model!
The Need for Abstracted I/O
[Diagram: an I/O abstraction layer exposes the same Packet I/O Toolkit API to every guest application regardless of attachment: GW VNFCs using SR-IOV VF drivers, Gi-LAN VNFCs using VirtIO/vhost-user, a NAT VNFC using ivshmem (BAR2), and a FW VNFC using KNI/Linux sockets, all feeding a DPDK-accelerated OVS forwarding application with PMD drivers and virtual Ethernet ports on the host.]
Packet I/O Toolkit (PIOT)
• PIOT is based on DPDK EAL (Environment Abstraction Layer)
• PIOT provides a Layer-2 Packet API, which allows applications to perform fastpath I/O through the physical and logical devices that it manages. The
following types of devices are initially supported by PIOT:
- User Mode I/O Ethernet (DPDK based)
- Raw Socket Mode (attached to a Linux kernel-managed Ethernet port)
- Ring Mode (user-space loopback device)
- PCAP-based player/recorder
- PIOT supports KNI (Kernel NIC Interface) for all of the devices listed above
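
For context, the kind of DPDK fastpath that PIOT wraps looks roughly like the loop below. This is a generic sketch using standard DPDK EAL/ethdev calls with error handling trimmed; it is not PIOT source, and the port and queue sizes are illustrative.

/* Generic DPDK burst forwarding loop (illustrative, not PIOT source). */
#include <rte_eal.h>
#include <rte_ethdev.h>
#include <rte_mbuf.h>

#define BURST 32

int main(int argc, char **argv)
{
    struct rte_mbuf *pkts[BURST];
    struct rte_eth_conf conf = {0};
    uint16_t port = 0, nb_rx, nb_tx, i;

    rte_eal_init(argc, argv);                       /* hugepages, PMD probe */

    struct rte_mempool *pool =
        rte_pktmbuf_pool_create("mbufs", 8192, 256, 0,
                                RTE_MBUF_DEFAULT_BUF_SIZE, rte_socket_id());

    rte_eth_dev_configure(port, 1, 1, &conf);       /* 1 RX + 1 TX queue */
    rte_eth_rx_queue_setup(port, 0, 512, rte_eth_dev_socket_id(port), NULL, pool);
    rte_eth_tx_queue_setup(port, 0, 512, rte_eth_dev_socket_id(port), NULL);
    rte_eth_dev_start(port);

    for (;;) {
        nb_rx = rte_eth_rx_burst(port, 0, pkts, BURST);  /* poll RX queue 0 */
        nb_tx = rte_eth_tx_burst(port, 0, pkts, nb_rx);  /* loop packets back out */
        for (i = nb_tx; i < nb_rx; i++)
            rte_pktmbuf_free(pkts[i]);                   /* drop unsent packets */
    }
    return 0;
}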
[Diagram: PIOT layered over per-device DPDK instances bound to SR-IOV, VirtIO, ivshmem, and KNI drivers]
Creating a Packet I/O Toolkit Leveraging DPDK
Device Open - This API is used for opening a
PIOT-managed device for I/O
• Input parameters:
- Device name
- Number of device Tx queues requested
- Number of device Rx queues requested
- Device event callback (link up, link
down, etc.)
- Initial device configuration requested:
- Promiscuous mode
- Multicast
• Output:
- Handle for the opened device, with the
following information:
- Number of Tx queues allocated
- Number of Rx queues allocated
- NUMA socket affinity
- Interrupt event poll info:
- Event poll function pointer
- Event poll file descriptor (of /dev/uioN
device)
Device Close - This API is used to close the
PIOT connection for the device specified by
the input handle.
• Input parameters:
- Device handle
• Output:
- Success/failure status
Burst Receive - This API polls the specified receive
queue of the device for packets and, if they are
available, reads and returns the packets in bulk.
The caller can specify the maximum number of
packets that can be read.
• Input parameters:
- Device handle
- Receive queue ID
- Maximum number of packets to be read
• Output:
- Number of packets received
- Packets received
Burst Transmit - This API writes a burst of
packets to the specified transmit queue of the
device. The caller supplies the packets to be
transmitted and is told how many were
actually queued.
• Input parameters:
- Device handle
- Transmit queue ID
- Number of packets to be transmitted
- Packets to be transmitted
• Output:
- Number of packets transmitted
Device Start - This API call is used for device-
specific start operation.
• Input parameters:
- Device handle
• Output:
- Success/failure status
Device Tx Queue Setup - This API call is used
to set up the specified transmit queue of the
device.
• Input parameters:
- Device handle
- Queue ID
- Number of Tx descriptors
- Memory pool for Tx buffer allocation
• Output:
- Success/failure status
Device Stop - This API call is used for device-
specific stop operation.
• Input parameters:
- Device handle
• Output:
- Success/failure status
Device Unpairing - This API call is used to
unpair paired devices. It is important to note
that paired devices must be unpaired before
either one can be closed.
• Input parameters:
- Device 1 handle
- Device 2 handle
• Output:
- Status
Device Pairing - This operation is applicable
only for certain types of devices. Initially, this
will be implemented only for Ring Mode
devices. Its purpose is to pair two specified
logical devices. It works by connecting the
receive of one device to the transmit of the
other device, and vice-versa, creating a loop
back between them.
• Input parameters:
- Device 1 handle
- Device 2 handle
• Output:
- Status
Device Statistics Fetch - This API call is used
to fetch input and output statistics for the
device.
• Input parameters:
- Device handle
• Output:
- Device statistics
- Number of received packets
- Number of received bytes
- Number of transmitted packets
- Number of transmitted bytes
- Number of receive errors
- Number of transmit errors
- Number of multicast packets received
Device Information Fetch - This API function
is used to fetch device status and device-
specific information.
• Input parameters:
- Device handle
• Output:
- Device information:
- Device driver name
- Max Rx queues
- Max Tx queues
- Max MAC addresses
- PCI ID and address
- NUMA node
Device Rx Queue Setup - This API call is used
to set up the specified receive queue of the
device.
• Input parameters:
- Device handle
- Queue ID
- Number of Rx descriptors
- Memory pool for Rx buffer allocation
• Output:
- Success/failure status
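
Collecting the calls described above into a header gives a feel for how an application sees PIOT. The names and signatures below are hypothetical, reconstructed from the descriptions on this slide rather than taken from RIFT.io source, but they show how every attachment type collapses to the same open/configure/burst interface.

/*
 * Hypothetical PIOT device API, sketched from the call descriptions above.
 * Names, types, and signatures are illustrative, not the actual RIFT.io API.
 */
#include <stdint.h>

struct piot_dev;                        /* opaque device handle */
struct piot_pkt;                        /* opaque packet buffer */
struct piot_dev_stats {
    uint64_t rx_pkts, rx_bytes, tx_pkts, tx_bytes,
             rx_errs, tx_errs, rx_mcast;
};

typedef void (*piot_event_cb)(struct piot_dev *dev, int event /* link up/down, ... */);

/* Open/close: the name can refer to a DPDK port, raw socket, ring, or PCAP device. */
struct piot_dev *piot_dev_open(const char *name, uint16_t n_txq, uint16_t n_rxq,
                               piot_event_cb cb, unsigned flags /* promisc, mcast */);
int piot_dev_close(struct piot_dev *dev);

/* Per-queue setup, then start/stop the device. */
int piot_dev_rx_queue_setup(struct piot_dev *dev, uint16_t q, uint16_t n_desc, void *mempool);
int piot_dev_tx_queue_setup(struct piot_dev *dev, uint16_t q, uint16_t n_desc, void *mempool);
int piot_dev_start(struct piot_dev *dev);
int piot_dev_stop(struct piot_dev *dev);

/* Fastpath: burst receive/transmit, mirroring rte_eth_rx_burst/tx_burst. */
uint16_t piot_rx_burst(struct piot_dev *dev, uint16_t q, struct piot_pkt **pkts, uint16_t max);
uint16_t piot_tx_burst(struct piot_dev *dev, uint16_t q, struct piot_pkt **pkts, uint16_t n);

/* Ring-mode only: cross-connect two logical devices as a loopback pair. */
int piot_dev_pair(struct piot_dev *a, struct piot_dev *b);
int piot_dev_unpair(struct piot_dev *a, struct piot_dev *b);   /* required before close */

/* Status and counters. */
int piot_dev_stats_get(struct piot_dev *dev, struct piot_dev_stats *out);
int piot_dev_info_get(struct piot_dev *dev, char *drv_name, uint16_t *max_rxq,
                      uint16_t *max_txq, int *numa_node);

A worker thread would then drive piot_rx_burst() and piot_tx_burst() in a tight loop, exactly as with the raw ethdev calls in the earlier sketch, while the handle hides whether the device underneath is an SR-IOV VF, virtio, ivshmem, a ring, or a PCAP player.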
An Example Use Case: DPDK Packet I/O Toolkit Performance
Non-DPDK Enabled Network Service (Fleet)
– Traffic generator VNF → Virtual Load Balancer VNF → Traffic Sink/Reflector VNF
– Intel Niantic NICs in Wildcat Pass servers running over Cisco Nexus 3K switches
– Virtio connection to OVS without DPDK on all hosts
DPDK Enabled Network Service (Fleet)
– Traffic generator VNF → Virtual Load Balancer VNF → Traffic Sink/Reflector VNF
– Intel Niantic NICs in Wildcat Pass servers running over Cisco Nexus 3K switches
– DPDK Enabled OVS using virtio driver on all hosts
Result: 5x performance gain (DPDK-enabled vs. non-DPDK network service)
Summary
• Virtual Network Functions (VNFs) are diverse and will have many
different network connectivity requirements
– PCI Pass-Through / SR-IOV / KNI / ivshmem / virtio / vhost-user
• Network connectivity options, in combination with other
enhancement choices, have a dramatic effect on performance
– NUMA, CPU Pinning, Huge Pages, CAT/CMT, QAT
• Leveraging DPDK to build a network abstraction layer (Packet I/O
Toolkit) provides simplified VNF networking
– Write once, deploy anywhere
The information and descriptions contained herein embody confidential and proprietary information that is the property of RIFT.io, Inc. Such information and descriptions may not be copied, reproduced, disclosed to others, published or used, in whole or in part, for any
purpose other than that for which it is being made available without the express prior written permission of RIFT.io, Inc.
Nothing contained herein shall be considered a commitment by RIFT.io to develop or deliver such functionality at any time. RIFT.io reserves the right to change, modify, or delete any item(s) at any time for any reason.
Thank You
© 2013-2015 RIFT.io, Inc. Confidential and Proprietary