Copyright © NTT Communications Corporation.
Transform your business, transcend expectations with our technologically advanced solutions.
100Gbps OpenStack
For Providing High-Performance NFV
Takeaki Matsumoto
Agenda
● Background
● Goal / Actions
● Kamuee (Software router)
● DPDK application on OpenStack
● Benchmark
● Conclusion
Self-Introduction
Takeaki Matsumoto
takeaki.matsumoto@ntt.com
NTT Communications
Technology Development
R&D for OpenStack
Ops for Private Cloud
Background
● NTT Communications
○ A Global Tier-1 ISP in 196 countries/regions
○ Over 150 datacenters in the world
● Problems
○ Cost
■ Over 1M USD spent on each core router
○ Flexibility
■ Adding a router, orchestration, rollback, ... all take a long time
Goal / Actions
● Goal
○ A cheaper, more flexible router with 100Gbps performance
● Actions
○ Research and verify software router requirements
○ Verify OpenStack's NFV features
○ Benchmark software router performance on OpenStack
Kamuee
● Software router with 100Gbps+ performance (on baremetal)
○ Developed by NTT Communications
○ 146Gbps with 610K+ IPv4 routes and 128-byte packets
○ Key technologies
■ DPDK
■ Poptrie
■ RCU
○ Talk: "Achieving 100Gbps Performance at Core with Poptrie and Kamuee Zero"
https://www.youtube.com/watch?v=OhHv3O1H8-w
Requirements
● High-performance NFV requirements
○ High-bandwidth network port
○ Low-latency NIC-to-CPU communication
○ Dedicated CPU cores
○ Hugepages
○ CPU features
Agenda
● Background
● Goal / Actions
● Kamuee (Software router)
● DPDK application on OpenStack
○ SR-IOV
○ NUMA
○ vCPU pinning
○ Hugepages
○ CPU feature
● Benchmark
● Conclusion
DPDK application on OpenStack
[Figure: compute host with two NUMA nodes, each with CPU cores, a NIC exposing VFs, and hugepage-backed memory; a VM on one NUMA node uses vCPUs, a VF, and hugepage memory from that node]
SR-IOV
[Figure: compute-host diagram (NUMA nodes, NICs with VFs, hugepage memory, VM) highlighting the VF assigned to the VM]
SR-IOV
● What is SR-IOV?
○ Hardware-level virtualization on a supporting NIC
○ An SR-IOV device has
■ Physical Function (PF)
● A normal NIC device (one per physical port)
■ Virtual Function (VF)
● A virtual NIC device derived from a PF
● Can be created up to the NIC's limit
[Figure: one NIC exposing a PF and several VFs]
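On Linux, VFs are usually created by writing to the PF's `sriov_numvfs` sysfs attribute, bounded by `sriov_totalvfs` (the "NIC's limit" above). A minimal sketch, assuming that standard sysfs interface; `set_num_vfs` is a hypothetical helper, and on ConnectX-4 the firmware must also allow VFs (the SRIOV_EN and NUM_OF_VFS settings in the benchmark configurations).

```python
from pathlib import Path


def set_num_vfs(pf_device_dir: str, num_vfs: int) -> int:
    """Create num_vfs VFs on a PF through sysfs and return the resulting count.

    pf_device_dir is the PF's device directory, e.g. /sys/class/net/<pf>/device
    (hypothetical helper; needs root on a real host).
    """
    dev = Path(pf_device_dir)
    limit = int((dev / "sriov_totalvfs").read_text())
    if not 0 <= num_vfs <= limit:
        raise ValueError(f"requested {num_vfs} VFs, but the NIC limit is {limit}")

    numvfs = dev / "sriov_numvfs"
    current = int(numvfs.read_text())
    if current != num_vfs:
        # The kernel refuses to change a non-zero VF count directly, so reset to 0 first.
        if current != 0:
            numvfs.write_text("0")
        numvfs.write_text(str(num_vfs))
    return int(numvfs.read_text())
```

For example, `set_num_vfs("/sys/class/net/ens1f0/device", 1)` would create one VF on that port (the interface name is hypothetical).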
SR-IOV
● Why SR-IOV?
○ A software vSwitch can become a bottleneck in high-performance networks
○ SR-IOV switches traffic in NIC hardware, so it does not load the host CPU
[Figure: typical implementation (VM vNIC → software vSwitch → NIC PF) vs. SR-IOV (VM attached directly to a VF via PCI passthrough)]
SR-IOV
● OpenStack supports SR-IOV
○ A VF can be used as a Neutron port
○ The instance gets the VF directly via PCI passthrough
$ neutron port-create $net_id --name sriov_port --binding:vnic_type direct
$ openstack server create --flavor m1.large --image ubuntu_14.04 --nic port-id=$port_id sriov-server
ubuntu@sriov-server $ lspci | grep Ethernet
00:05.0 Ethernet controller: Mellanox Technologies MT27700 Family [ConnectX-4 Virtual Function]
NUMA
[Figure: compute-host diagram highlighting the NUMA node that holds the VM's vCPUs, VF, and hugepage memory]
NUMA
● What is NUMA?
○ Non-Uniform Memory Access
○ A multi-socket server usually has one NUMA node per CPU socket
○ CPU cores, memory, and PCI devices each belong to a NUMA node
○ For low latency, placement must take the NUMA topology into account
[Figure: two sockets, each a NUMA node with its own CPU cores, memory, and NIC; access across the inter-socket interconnect has overhead]
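On Linux the topology is visible in sysfs: a PCI device's `numa_node` attribute names its node, and `/sys/devices/system/node/node<N>/cpulist` lists that node's cores. A minimal sketch for finding the cores local to a NIC, assuming these standard sysfs paths; `parse_cpulist` and `nic_local_cpus` are hypothetical helpers.

```python
from pathlib import Path


def parse_cpulist(text: str) -> set[int]:
    """Parse a Linux CPU list such as '0-13,28-41' into a set of CPU ids."""
    cpus: set[int] = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            start, end = part.split("-")
            cpus.update(range(int(start), int(end) + 1))
        else:
            cpus.add(int(part))
    return cpus


def nic_local_cpus(pci_addr: str, sysfs: str = "/sys") -> set[int]:
    """Return the CPUs on the same NUMA node as the given PCI device."""
    node = int(Path(sysfs, "bus/pci/devices", pci_addr, "numa_node").read_text())
    if node < 0:  # -1 means the platform reported no NUMA affinity
        return parse_cpulist(Path(sysfs, "devices/system/cpu/online").read_text())
    return parse_cpulist(Path(sysfs, f"devices/system/node/node{node}/cpulist").read_text())
```

Pinning a DPDK application's forwarding cores to the set returned for its NIC keeps packet processing off the inter-socket interconnect.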
NUMA
● OpenStack has the NUMATopologyFilter
○ Schedules VMs with awareness of the host NUMA topology
○ When hugepages or CPU pinning are used, the VM is automatically placed within a single NUMA node
○ A VM can also span 2 NUMA nodes
$ openstack flavor set m1.large --property hw:numa_nodes=1
$ openstack flavor set m1.large --property hw:numa_nodes=2
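When a flavor sets only `hw:numa_nodes`, Nova divides the vCPUs and memory evenly across the guest NUMA nodes and, as far as we understand its behaviour, rejects flavors that cannot be divided evenly, which then need explicit `hw:numa_cpus.N` and `hw:numa_mem.N`. The sketch below is a simplified model of that rule under this assumption, not Nova's code; `split_evenly` is a hypothetical name.

```python
def split_evenly(vcpus: int, memory_mb: int, numa_nodes: int) -> list[tuple[range, int]]:
    """Model of an even vCPU/memory split across guest NUMA nodes.

    Returns one (vcpu_ids, memory_mb) tuple per guest NUMA node.
    """
    if vcpus % numa_nodes or memory_mb % numa_nodes:
        raise ValueError("cannot split evenly; set hw:numa_cpus.N and hw:numa_mem.N explicitly")
    per_node_cpus = vcpus // numa_nodes
    per_node_mem = memory_mb // numa_nodes
    return [
        (range(n * per_node_cpus, (n + 1) * per_node_cpus), per_node_mem)
        for n in range(numa_nodes)
    ]


print(split_evenly(8, 16384, 2))  # [(range(0, 4), 8192), (range(4, 8), 8192)]
```

A 27-vCPU flavor cannot be split evenly over 2 nodes, which is consistent with the benchmark flavors listing hw:numa_cpus.0/1 and hw:numa_mem.0/1 explicitly.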
vCPU pinning
[Figure: compute-host diagram highlighting the host CPU cores dedicated to the VM's vCPUs]
vCPU pinning
● What is vCPU pinning?
○ Dedicated 1:1 vCPU:pCPU allocation
■ Reduces context switching
[Figure: host pCPUs split between those running nova-compute and other Linux processes and those dedicated to vCPUs, one vCPU per pCPU]
vCPU pinning
● The OpenStack flavor extra spec "hw:cpu_policy"
○ enables vCPU pinning
$ openstack flavor set m1.large --property hw:cpu_policy=dedicated
With vCPU pinning (hw:cpu_policy=dedicated):
$ virsh vcpupin instance-00000002
VCPU: CPU Affinity
----------------------------------
0: 1
1: 2
2: 3
3: 4
4: 5
5: 6
6: 7
7: 8
8: 9
Default allocation (no pinning):
$ virsh vcpupin instance-00000001
VCPU: CPU Affinity
----------------------------------
0: 0-31
1: 0-31
2: 0-31
3: 0-31
4: 0-31
5: 0-31
6: 0-31
7: 0-31
8: 0-31
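Checking the pinning can be scripted by parsing `virsh vcpupin` output. A minimal sketch; it accepts the colon-separated layout shown above and, on the assumption that newer libvirt releases print the same two columns without colons, that layout too. `parse_vcpupin` and `is_dedicated` are hypothetical helpers.

```python
def parse_vcpupin(output: str) -> dict[int, str]:
    """Map each vCPU number to its CPU affinity string from `virsh vcpupin` output."""
    pins: dict[int, str] = {}
    for line in output.splitlines():
        fields = line.replace(":", " ").split()
        # Data rows have exactly two fields and start with the vCPU number.
        if len(fields) == 2 and fields[0].isdigit():
            pins[int(fields[0])] = fields[1]
    return pins


def is_dedicated(pins: dict[int, str]) -> bool:
    """True if every vCPU is pinned to a single pCPU and no pCPU is shared."""
    pcpus = list(pins.values())
    return bool(pcpus) and all(p.isdigit() for p in pcpus) and len(set(pcpus)) == len(pcpus)


PINNED = """VCPU: CPU Affinity
----------------------------------
0: 1
1: 2
2: 3"""
DEFAULT = """VCPU: CPU Affinity
----------------------------------
0: 0-31
1: 0-31"""

print(is_dedicated(parse_vcpupin(PINNED)))   # True
print(is_dedicated(parse_vcpupin(DEFAULT)))  # False
```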
Hugepages
[Figure: compute-host diagram highlighting the hugepage-backed memory on the VM's NUMA node]
Hugepages
● What are hugepages?
○ Memory pages larger than the default 4KB (e.g. 2MB or 1GB)
■ Reduce TLB misses
■ DPDK applications usually use hugepages
[Figure: virtual-to-physical address translation; the TLB caches page-table entries, and a TLB miss requires a page-table walk]
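The TLB benefit comes from needing far fewer pages, and hence page-table and TLB entries, to map the same memory. A worked example for the 16GB of hugepages Kamuee uses (see the Kamuee environment slide), comparing 4KB, 2MB, and 1GB pages; the helper name is hypothetical:

```python
KIB, MIB, GIB = 1024, 1024**2, 1024**3


def pages_needed(region_bytes: int, page_bytes: int) -> int:
    """Pages (one mapping each) needed to cover a memory region; ceiling division."""
    return -(-region_bytes // page_bytes)


for label, size in (("4KB", 4 * KIB), ("2MB", 2 * MIB), ("1GB", 1 * GIB)):
    print(label, pages_needed(16 * GIB, size))
# 4KB 4194304
# 2MB 8192
# 1GB 16
```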
Hugepages
● The OpenStack flavor extra spec "hw:mem_page_size"
○ enables hugepages and assigns them to the guest
$ openstack flavor set m1.large --property hw:mem_page_size=1048576
$ cat /etc/libvirt/qemu/instance-00000002.xml | grep hugepages -1
<memoryBacking>
<hugepages>
<page size='1048576' unit='KiB' />
</hugepages>
</memoryBacking>
$ cat /proc/meminfo | grep Hugepagesize
Hugepagesize: 1048576 kB
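`/proc/meminfo` also reports `HugePages_Total` and `HugePages_Free`, which show whether hugepages are reserved and how many have been claimed. A minimal parser sketch; `hugepage_summary` is a hypothetical helper and the sample counter values are illustrative, not measurements from this setup.

```python
def hugepage_summary(meminfo: str) -> dict[str, int]:
    """Pull hugepage fields out of /proc/meminfo text (Hugepagesize is in kB)."""
    wanted = ("HugePages_Total", "HugePages_Free", "Hugepagesize")
    info: dict[str, int] = {}
    for line in meminfo.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() in wanted:
            info[key.strip()] = int(value.split()[0])
    return info


SAMPLE = """HugePages_Total:      32
HugePages_Free:       16
Hugepagesize:    1048576 kB"""

info = hugepage_summary(SAMPLE)
used_gib = (info["HugePages_Total"] - info["HugePages_Free"]) * info["Hugepagesize"] // (1024 * 1024)
print(used_gib)  # 16
```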
Other CPU features
● DPDK uses CPU instruction-set extensions for optimization
○ SSSE3, SSE4, ...
● The "[libvirt] cpu_mode" option in nova.conf
○ Some distributions set none by default
○ host-model, host-passthrough, or custom is required to expose these features to the guest
Guest with cpu_mode=host-model:
$ cat /proc/cpuinfo | grep -e "model name" -e flags
model name : Intel Core Processor (Broadwell)
flags : fpu vme de pse tsc msr pae mce cx8 apic
sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse
sse2 ss syscall nx pdpe1gb rdtscp lm constant_tsc
rep_good nopl xtopology eagerfpu pni pclmulqdq vmx
ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt
tsc_deadline_timer aes xsave avx f16c rdrand hypervisor
lahf_lm abm 3dnowprefetch tpr_shadow vnmi flexpriority
ept vpid fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2
erms invpcid rtm rdseed adx smap xsaveopt arat
Guest with cpu_mode=none:
$ cat /proc/cpuinfo | grep -e "model name" -e flags
model name : QEMU Virtual CPU version 2.0.0
flags : fpu de pse tsc msr pae mce cx8 apic sep
mtrr pge mca cmov pse36 clflush mmx fxsr sse sse2
syscall nx lm rep_good nopl pni vmx cx16 x2apic popcnt
hypervisor lahf_lm vnmi ept
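Whether a guest exposes the extensions DPDK needs can be checked from the `flags` line. A minimal sketch; the required set below is an assumption based on the extensions named on this slide, so adjust it to your DPDK build. `cpu_flags` and `missing_flags` are hypothetical helpers, and the sample is the `cpu_mode=none` output above.

```python
REQUIRED = {"ssse3", "sse4_1", "sse4_2"}  # assumption: extensions named on the slide


def cpu_flags(cpuinfo: str) -> set[str]:
    """Return the flag set from the first 'flags' line of /proc/cpuinfo text."""
    for line in cpuinfo.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def missing_flags(cpuinfo: str, required: set[str] = REQUIRED) -> set[str]:
    """Required extensions that the CPU does not advertise."""
    return required - cpu_flags(cpuinfo)


QEMU_NONE = (
    "model name : QEMU Virtual CPU version 2.0.0\n"
    "flags : fpu de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pse36 "
    "clflush mmx fxsr sse sse2 syscall nx lm rep_good nopl pni vmx cx16 x2apic "
    "popcnt hypervisor lahf_lm vnmi ept\n"
)
print(sorted(missing_flags(QEMU_NONE)))  # ['sse4_1', 'sse4_2', 'ssse3']
```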
Agenda
● Background
● Goal / Actions
● Kamuee (Software router)
● DPDK application on OpenStack
● Benchmark
○ Environment
○ Baremetal performance
○ VM + VF performance
○ VM + PF performance
○ Baremetal (VF exists) performance
Environment: Hardware
● Server
○ Dell PowerEdge R630
■ Intel® Xeon® CPU E5-2690 v4 @ 2.60GHz (14 cores) * 2
■ DDR-4 256GB (32GB * 8)
■ Ubuntu 16.04
● NIC
○ Mellanox ConnectX-4 100Gb/s Dual-Port Adapter
■ 1 PCIe Card, 100G Ports * 2
● Switch
○ Dell Networking Z9100
■ Cumulus Linux 3.2.0
■ 100Gbps Port * 32
Environment: Architecture
[Figure: test topology. pktgen-dpdk 0 and pktgen-dpdk 1 (ConnectX-4 100G 2-port, both ports used), Kamuee (two ConnectX-4 100G 2-port NICs, NIC 0 and NIC 1, only port0 of each used), and nexthop 0 and nexthop 1 all connect to the switch, which carries VLAN100 and VLAN200. Each line is a 100G link; the links are labeled ① to ④.]
Environment: pktgen-dpdk
● Open-source packet generator
○ Output: about 100Mpps ≈ 67.2Gbps per server (64-byte packets)
■ 50Mpps per port
○ Destination MAC
■ Kamuee NIC0 port0 (from port0 and port1 of pktgen-dpdk 0)
■ Kamuee NIC1 port0 (from port0 and port1 of pktgen-dpdk 1)
○ Destination IP (range)
■ 1.0.0.1-254 (port0 on each server)
■ 1.0.4.1-254 (port1 on each server)
○ Destination TCP port (range)
■ 1-254 (port0 on each server)
■ 256-510 (port1 on each server)
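The 100Mpps ≈ 67.2Gbps figure counts Ethernet wire overhead: each 64-byte frame also occupies 8 bytes of preamble/SFD and a 12-byte minimum inter-frame gap, i.e. 84 bytes (672 bits) on the wire. A worked example of that arithmetic; the helper names are hypothetical:

```python
PREAMBLE_SFD = 8  # bytes sent before each Ethernet frame
MIN_IFG = 12      # minimum inter-frame gap in bytes


def wire_gbps(mpps: float, frame_bytes: int) -> float:
    """Line-rate bandwidth for a packet rate, counting preamble and inter-frame gap."""
    return mpps * 1e6 * (frame_bytes + PREAMBLE_SFD + MIN_IFG) * 8 / 1e9


def max_mpps(link_gbps: float, frame_bytes: int) -> float:
    """Theoretical maximum packet rate for a link speed and frame size."""
    return link_gbps * 1e9 / ((frame_bytes + PREAMBLE_SFD + MIN_IFG) * 8) / 1e6


print(round(wire_gbps(50, 64), 1))   # 33.6: one pktgen port
print(round(wire_gbps(100, 64), 1))  # 67.2: one pktgen server (2 ports)
print(round(wire_gbps(200, 64), 1))  # 134.4: both pktgen servers
print(round(max_mpps(100, 64), 2))   # 148.81: 64-byte maximum on one 100GbE link
```

The 134.4Gbps line is the two pktgen servers combined (4 ports × 50Mpps), the aggregate load in the ideal-flow figures.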
Environment: Kamuee
● DPDK software router
● Resource configuration
○ 2 NUMA nodes
○ 26 cores in use
■ Forwarding: 12 cores per port * 2 (one port per NUMA node)
■ Other functions: 2 cores
○ 16GB of memory in use
■ 1GB hugepages * 8 * 2 (per NUMA node)
○ 2 NICs
■ only port 0 of each NIC is used (one NIC per NUMA node)
Environment: Kamuee
● Routing configuration
○ 518K routes loaded (similar to a full routing table)
■ Forwarding to the nexthop servers
● DPDK EAL options
○ ./kamuee -n 4 --socket-mem 8192,8192 -w 0000:00:05.0,txq_inline=128 -w 0000:00:06.0,txq_inline=128
kamuee-console> show ipv4 route
1.0.0.0/24 nexthop: 172.21.4.105
1.0.4.0/24 nexthop: 172.21.3.104
...
kamuee-console> show ipv4 route 172.21.4.105
172.21.4.105/32 ether: 24:8a:07:4c:2f:64 port 1
kamuee-console> show ipv4 route 172.21.3.104
172.21.3.104/32 ether: 24:8a:07:4c:2f:6c port 0
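The `--socket-mem` value follows from the hugepage layout: 8 × 1GB pages per NUMA node is 8192MB per socket. A sketch that derives the EAL arguments from that layout; `eal_args` is a hypothetical helper, and since DPDK 20.11 and later renamed the `-w` PCI whitelist option to `-a` (`--allow`), the flag is a parameter.

```python
def eal_args(hugepages_per_node: list[int], page_mb: int, pci_devices: list[str],
             mem_channels: int = 4, devargs: str = "txq_inline=128",
             allow_flag: str = "-w") -> list[str]:
    """Build DPDK EAL arguments: memory channels, per-socket memory, allowed PCI devices."""
    socket_mem = ",".join(str(pages * page_mb) for pages in hugepages_per_node)
    args = ["-n", str(mem_channels), "--socket-mem", socket_mem]
    for dev in pci_devices:
        # txq_inline is an mlx5 PMD device argument, appended after the PCI address.
        args += [allow_flag, f"{dev},{devargs}"]
    return args


print(" ".join(["./kamuee"] + eal_args([8, 8], 1024, ["0000:00:05.0", "0000:00:06.0"])))
# ./kamuee -n 4 --socket-mem 8192,8192 -w 0000:00:05.0,txq_inline=128 -w 0000:00:06.0,txq_inline=128
```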
Environment: nexthop
● Measuring RX packets
○ Using eth_stat.sh
■ https://community.mellanox.com/docs/DOC-2506#jive_content_id_How_to_Measure_Ingress_Rate
■ Using "rx_packets_phy" from ethtool
● A hardware-level packet counter
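The rate measurement amounts to reading `rx_packets_phy` from `ethtool -S <iface>` twice and dividing the difference by the interval. This is a minimal sketch of that idea, not the eth_stat.sh script itself; the helper names and sample counter values are hypothetical.

```python
def ethtool_counter(stats: str, name: str = "rx_packets_phy") -> int:
    """Read one counter from `ethtool -S <iface>` output."""
    for line in stats.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == name:
            return int(value)
    raise KeyError(name)


def rx_rate(before: int, after: int, seconds: float, frame_bytes: int = 64) -> tuple[float, float]:
    """Return (Mpps, wire Gbps) from two counter samples taken `seconds` apart."""
    pps = (after - before) / seconds
    gbps = pps * (frame_bytes + 20) * 8 / 1e9  # +20B: preamble/SFD and inter-frame gap
    return pps / 1e6, gbps


t0 = ethtool_counter("NIC statistics:\n     rx_packets_phy: 1000000000\n")
t1 = ethtool_counter("NIC statistics:\n     rx_packets_phy: 1050000000\n")
mpps, gbps = rx_rate(t0, t1, 1.0)
print(f"{mpps:.1f} Mpps, {gbps:.1f} Gbps")  # 50.0 Mpps, 33.6 Gbps
```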
Environment: Ideal flow on each pktgen server (64Byte)
[Figure: ideal flow for one pktgen server. ① 33.6Gbps on each of its two ports into the switch; ② 67.2Gbps from the switch into one Kamuee port; ③ 33.6Gbps on each link from Kamuee toward the nexthops]
Environment: Ideal flow on each pktgen server (64Byte)
[Figure: ideal flow for one pktgen server, same rates: ① 33.6Gbps per pktgen port; ② 67.2Gbps into one Kamuee port; ③ 33.6Gbps on each link toward the nexthops]
Environment: Ideal flow (64Byte)
[Figure: combined ideal flow for both pktgen servers. ① 33.6Gbps on each of the four pktgen ports; ② 67.2Gbps into each Kamuee port; ③ 67.2Gbps on each link toward the nexthops]
Baremetal performance: Configuration
● BIOS
○ Hyper-Threading: OFF
● Boot parameters
○ intel_idle.max_cstate=0 processor.max_cstate=0 intel_pstate=disable nohz_full=1-27 rcu_nocbs=1-27 rcu_nocb_poll audit=0 nosoftlockup default_hugepagesz=1G hugepagesz=1G hugepages=32 isolcpus=1-27
● Mellanox
○ CQE_COMPRESSION: AGGRESSIVE(1)
○ SRIOV_EN: False(0)
● Ports
○ 2 PFs (only port0 on each NIC)
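The CPU-isolation parameters are meant to cover the same cores (1-27, leaving core 0 for the host), and the hugepage reservation is hugepages × hugepagesz. A minimal consistency check of such a command line; `summarize_cmdline` is a hypothetical helper, and it assumes plain CPU lists (newer kernels also accept flags such as `isolcpus=domain,1-27`, which it does not handle).

```python
def parse_cpulist(text: str) -> set[int]:
    """Parse a CPU list such as '1-27' or '0,2-5' into a set of CPU ids."""
    cpus: set[int] = set()
    for part in text.split(","):
        start, _, end = part.partition("-")
        cpus.update(range(int(start), int(end or start) + 1))
    return cpus


def summarize_cmdline(cmdline: str) -> dict[str, object]:
    """Check that isolcpus, nohz_full and rcu_nocbs agree; total the 1G/2M hugepages."""
    params = dict(token.partition("=")[::2] for token in cmdline.split())
    isolated = parse_cpulist(params["isolcpus"])
    consistent = all(parse_cpulist(params[k]) == isolated for k in ("nohz_full", "rcu_nocbs"))
    page_gib = {"1G": 1.0, "2M": 2 / 1024}[params["hugepagesz"]]
    return {
        "isolated_cpus": len(isolated),
        "consistent": consistent,
        "hugepage_gib": int(params["hugepages"]) * page_gib,
    }


HOST = ("intel_idle.max_cstate=0 processor.max_cstate=0 intel_pstate=disable "
        "nohz_full=1-27 rcu_nocbs=1-27 rcu_nocb_poll audit=0 nosoftlockup "
        "default_hugepagesz=1G hugepagesz=1G hugepages=32 isolcpus=1-27")
print(summarize_cmdline(HOST))
# {'isolated_cpus': 27, 'consistent': True, 'hugepage_gib': 32.0}
```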
Baremetal performance: Result
VM + VF performance: Host Configuration
● BIOS
○ Hyper-Threading: OFF
○ VT-d: ON
● Host boot parameters
○ intel_idle.max_cstate=0 processor.max_cstate=0 intel_pstate=disable nohz_full=1-27 rcu_nocbs=1-27 rcu_nocb_poll audit=0 nosoftlockup default_hugepagesz=1G hugepagesz=1G hugepages=32 isolcpus=1-27 intel_iommu=on
● Mellanox
○ CQE_COMPRESSION: AGGRESSIVE(1)
○ SRIOV_EN: True(1)
○ NUM_OF_VFS: 1
VM + VF performance: Guest Configuration
● Flavor
○ vCPUs: 27
○ Memory: 32GB
○ extra_specs:
■ hw:cpu_policy: dedicated
■ hw:mem_page_size: 1048576
■ hw:numa_mem.0: 16384
■ hw:numa_mem.1: 16384
■ hw:numa_cpus.0: 0-13
■ hw:numa_cpus.1: 14-26
■ hw:numa_nodes: 2
● Ports
○ 2 VFs (VF 0 on port0 of each NIC)
● Guest boot parameters
○ intel_idle.max_cstate=0 processor.max_cstate=0 intel_pstate=disable nohz_full=1-26 rcu_nocbs=1-26 rcu_nocb_poll audit=0 nosoftlockup default_hugepagesz=1G hugepagesz=1G hugepages=16 isolcpus=1-26
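Explicit per-node specs must be consistent with the flavor: the `hw:numa_cpus.N` lists should cover every vCPU exactly once and the `hw:numa_mem.N` values should add up to the flavor's RAM (our reading of Nova's documented rules). A minimal pre-flight check; `check_numa_specs` is a hypothetical helper, it does not handle exclusion syntax such as `^2`, and the example uses the 27-vCPU / 32GB flavor above.

```python
def cpu_ids(spec: str) -> list[int]:
    """Expand a CPU list such as '0-13' or '0,2-3' into vCPU ids."""
    ids: list[int] = []
    for part in spec.split(","):
        start, _, end = part.partition("-")
        ids.extend(range(int(start), int(end or start) + 1))
    return ids


def check_numa_specs(vcpus: int, ram_mb: int, specs: dict[str, str]) -> list[str]:
    """Return a list of problems with hw:numa_* extra specs (empty if consistent)."""
    nodes = int(specs["hw:numa_nodes"])
    assigned = [c for n in range(nodes) for c in cpu_ids(specs[f"hw:numa_cpus.{n}"])]
    memory = sum(int(specs[f"hw:numa_mem.{n}"]) for n in range(nodes))
    problems = []
    if sorted(assigned) != list(range(vcpus)):
        problems.append("hw:numa_cpus.N must cover every vCPU exactly once")
    if memory != ram_mb:
        problems.append(f"hw:numa_mem.N totals {memory}MB, flavor RAM is {ram_mb}MB")
    return problems


FLAVOR_SPECS = {
    "hw:numa_nodes": "2",
    "hw:numa_cpus.0": "0-13",
    "hw:numa_cpus.1": "14-26",
    "hw:numa_mem.0": "16384",
    "hw:numa_mem.1": "16384",
}
print(check_numa_specs(27, 32768, FLAVOR_SPECS))  # []
```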
VM + VF performance: Result
VM + PF performance: Host Configuration
● BIOS
○ Hyper-Threading: OFF
○ VT-d: ON
● Host boot parameters
○ intel_idle.max_cstate=0 processor.max_cstate=0 intel_pstate=disable nohz_full=1-27 rcu_nocbs=1-27 rcu_nocb_poll audit=0 nosoftlockup default_hugepagesz=1G hugepagesz=1G hugepages=32 isolcpus=1-27 intel_iommu=on
● Mellanox
○ CQE_COMPRESSION: AGGRESSIVE(1)
○ SRIOV_EN: False(0)
VM + PF performance: Guest Configuration
● Flavor
○ vCPUs: 27
○ Memory: 32GB
○ extra_specs:
■ hw:cpu_policy: dedicated
■ hw:mem_page_size: 1048576
■ hw:numa_mem.0: 16384
■ hw:numa_mem.1: 16384
■ hw:numa_cpus.0: 0-13
■ hw:numa_cpus.1: 14-26
■ hw:numa_nodes: 2
● Ports
○ 2 PFs (port0 of each NIC, via PCI passthrough)
● Guest boot parameters
○ intel_idle.max_cstate=0 processor.max_cstate=0 intel_pstate=disable nohz_full=1-26 rcu_nocbs=1-26 rcu_nocb_poll audit=0 nosoftlockup default_hugepagesz=1G hugepagesz=1G hugepages=16 isolcpus=1-26
VM + PF performance: Result
Baremetal (VF exists) performance: Configuration
● BIOS
○ Hyper-Threading: OFF
● Boot parameters
○ intel_idle.max_cstate=0 processor.max_cstate=0 intel_pstate=disable nohz_full=1-27 rcu_nocbs=1-27 rcu_nocb_poll audit=0 nosoftlockup default_hugepagesz=1G hugepagesz=1G hugepages=32 isolcpus=1-27
● Mellanox
○ CQE_COMPRESSION: AGGRESSIVE(1)
○ SRIOV_EN: True(1)
○ NUM_OF_VFS: 1
● Ports
○ 2 PFs (only port0 on each NIC)
○ 2 VFs (VF 0 on port0 of each NIC) exist but are not used
Baremetal (VF exists) performance: Result
All Results
Conclusion
● OpenStack's NFV features work as expected
○ SR-IOV port assignment
○ NUMA awareness
○ vCPU pinning
○ Hugepages
○ CPU features
● KVM + Intel VT achieve close to baremetal performance
● SR-IOV performance must be evaluated for each device
○ SR-IOV implementations depend on the vendor
Conclusion
● Our decision
○ VM + PF is a powerful option
■ SR-IOV's advantage
● Multiple VFs can be created, one per network function
○ Router
○ Firewall
○ Load balancer
○ ...
■ But a 100G router consumes almost all of the host's resources
● "1 host : 1 VM" is the realistic option
○ so many ports are not needed
Thank you!
References
● SR-IOV
○ https://docs.openstack.org/ocata/networking-guide/config-sriov.html
● How to enable SR-IOV with a Mellanox NIC
○ https://community.mellanox.com/docs/DOC-2386
● Hugepages
○ https://www.mirantis.com/blog/mirantis-openstack-7-0-nfvi-deployment-guide-huge-pages/
● isolcpus & CPU pinning
○ https://docs.mirantis.com/mcp/1.0/mcp-deployment-guide/enable-numa-and-cpu-pinning/enable-numa-and-cpu-pinning-procedure.html
● NUMA
○ https://docs.openstack.org/nova/pike/admin/cpu-topologies.html

More Related Content

PDF
Linux Networking Explained
PDF
Fun with Network Interfaces
PDF
DPDK & Layer 4 Packet Processing
PDF
Deploying IPv6 on OpenStack
PPTX
Introduction to DPDK
PDF
Intel dpdk Tutorial
PDF
SRv6 Network Programming: deployment use-cases
PPTX
Understanding DPDK
Linux Networking Explained
Fun with Network Interfaces
DPDK & Layer 4 Packet Processing
Deploying IPv6 on OpenStack
Introduction to DPDK
Intel dpdk Tutorial
SRv6 Network Programming: deployment use-cases
Understanding DPDK

What's hot (20)

PDF
BPF & Cilium - Turning Linux into a Microservices-aware Operating System
PDF
LinuxCon 2015 Linux Kernel Networking Walkthrough
PPTX
Linux Network Stack
PPTX
Tutorial: Using GoBGP as an IXP connecting router
PDF
BGP Unnumbered で遊んでみた
PDF
Deploying IPv6 in OpenStack Environments
PDF
eBPF - Rethinking the Linux Kernel
PPTX
VXLAN Practice Guide
PPTX
Vxlan deep dive session rev0.5 final
PDF
Large scale overlay networks with ovn: problems and solutions
PPTX
フロー技術によるネットワーク管理
PDF
Security Monitoring with eBPF
PPTX
Basic of IPv6
PDF
DPDK in Containers Hands-on Lab
PDF
HPCユーザが知っておきたいTCP/IPの話 ~クラスタ・グリッド環境の落とし穴~
PDF
Open vSwitch - Stateful Connection Tracking & Stateful NAT
ODP
Dpdk performance
PDF
netfilter and iptables
PDF
Introduction to IPv6
PDF
DevConf 2014 Kernel Networking Walkthrough
BPF & Cilium - Turning Linux into a Microservices-aware Operating System
LinuxCon 2015 Linux Kernel Networking Walkthrough
Linux Network Stack
Tutorial: Using GoBGP as an IXP connecting router
BGP Unnumbered で遊んでみた
Deploying IPv6 in OpenStack Environments
eBPF - Rethinking the Linux Kernel
VXLAN Practice Guide
Vxlan deep dive session rev0.5 final
Large scale overlay networks with ovn: problems and solutions
フロー技術によるネットワーク管理
Security Monitoring with eBPF
Basic of IPv6
DPDK in Containers Hands-on Lab
HPCユーザが知っておきたいTCP/IPの話 ~クラスタ・グリッド環境の落とし穴~
Open vSwitch - Stateful Connection Tracking & Stateful NAT
Dpdk performance
netfilter and iptables
Introduction to IPv6
DevConf 2014 Kernel Networking Walkthrough
Ad

Similar to 100Gbps OpenStack For Providing High-Performance NFV (20)

PDF
Intel's Out of the Box Network Developers Ireland Meetup on March 29 2017 - ...
PDF
Ensuring performance for real time packet processing in open stack white paper
PDF
Known basic of NFV Features
PDF
Measuring a 25 and 40Gb/s Data Plane
PDF
Approaching hyperconvergedopenstack
PDF
DPDK Summit 2015 - RIFT.io - Tim Mortsolf
PDF
Unlocking the SDN and NFV Transformation
PDF
DPDK Summit - 08 Sept 2014 - NTT - High Performance vSwitch
PDF
DPDK Summit - 08 Sept 2014 - Futurewei - Jun Xu - Revisit the IP Stack in Lin...
PPTX
VMworld 2016: vSphere 6.x Host Resource Deep Dive
PDF
Introduction to container networking in K8s - SDN/NFV London meetup
PDF
High performance and flexible networking
PDF
Hyperconverged Cloud, Not just a toy anymore - Andrew Hatfield, Red Hat
PDF
Service Assurance for Virtual Network Functions in Cloud-Native Environments
PPTX
Unleashing the Power of Fabric Orchestrating New Performance Features for SR-...
PDF
NFV Infrastructure Manager with High Performance Software Switch Lagopus
PDF
Libvirt/KVM Driver Update (Kilo)
PDF
Accelerate the SDN with Intel ONP
PPTX
Easing the Path to Network Transformation - Network Transformation Experience...
PPTX
Tối ưu hiệu năng đáp ứng các yêu cầu của hệ thống 4G core
Intel's Out of the Box Network Developers Ireland Meetup on March 29 2017 - ...
Ensuring performance for real time packet processing in open stack white paper
Known basic of NFV Features
Measuring a 25 and 40Gb/s Data Plane
Approaching hyperconvergedopenstack
DPDK Summit 2015 - RIFT.io - Tim Mortsolf
Unlocking the SDN and NFV Transformation
DPDK Summit - 08 Sept 2014 - NTT - High Performance vSwitch
DPDK Summit - 08 Sept 2014 - Futurewei - Jun Xu - Revisit the IP Stack in Lin...
VMworld 2016: vSphere 6.x Host Resource Deep Dive
Introduction to container networking in K8s - SDN/NFV London meetup
High performance and flexible networking
Hyperconverged Cloud, Not just a toy anymore - Andrew Hatfield, Red Hat
Service Assurance for Virtual Network Functions in Cloud-Native Environments
Unleashing the Power of Fabric Orchestrating New Performance Features for SR-...
NFV Infrastructure Manager with High Performance Software Switch Lagopus
Libvirt/KVM Driver Update (Kilo)
Accelerate the SDN with Intel ONP
Easing the Path to Network Transformation - Network Transformation Experience...
Tối ưu hiệu năng đáp ứng các yêu cầu của hệ thống 4G core
Ad

More from NTT Communications Technology Development (20)

PDF
クラウドを最大限活用するinfrastructure as codeを考えよう
PPTX
【たぶん日本初導入!】Azure Stack Hub with GPUの性能と機能紹介
PDF
macOSの仮想化技術について ~Virtualization-rs Rust bindings for virtualization.framework ~
PPTX
マルチクラウドでContinuous Deliveryを実現するSpinnakerについて
PDF
SpinnakerとKayentaで 高速・安全なデプロイ!
PDF
Can we boost more HPC performance? Integrate IBM POWER servers with GPUs to O...
PDF
AWS re:Invent2017で見た AWSの強さとは
PDF
分散トレーシング技術について(Open tracingやjaeger)
PDF
Mexico ops meetup発表資料 20170905
PDF
NTT Tech Conference #2 - closing -
PPTX
イケてない開発チームがイケてる開発を始めようとする軌跡
PDF
GPU Container as a Service を実現するための最新OSS徹底比較
PDF
SpinnakerとOpenStackの構築
PDF
Troveコミュニティ動向
PPTX
Web rtc for iot, edge computing use cases
PDF
OpenStack Ops Mid-Cycle Meetup & Project Team Gathering出張報告
PDF
NTT Tech Conference #1 Opening Keynote
PDF
NTT Tech Conference #1 Closing Keynote
PDF
OpsからみたOpenStack Summit
クラウドを最大限活用するinfrastructure as codeを考えよう
【たぶん日本初導入!】Azure Stack Hub with GPUの性能と機能紹介
macOSの仮想化技術について ~Virtualization-rs Rust bindings for virtualization.framework ~
マルチクラウドでContinuous Deliveryを実現するSpinnakerについて
SpinnakerとKayentaで 高速・安全なデプロイ!
Can we boost more HPC performance? Integrate IBM POWER servers with GPUs to O...
AWS re:Invent2017で見た AWSの強さとは
分散トレーシング技術について(Open tracingやjaeger)
Mexico ops meetup発表資料 20170905
NTT Tech Conference #2 - closing -
イケてない開発チームがイケてる開発を始めようとする軌跡
GPU Container as a Service を実現するための最新OSS徹底比較
SpinnakerとOpenStackの構築
Troveコミュニティ動向
Web rtc for iot, edge computing use cases
OpenStack Ops Mid-Cycle Meetup & Project Team Gathering出張報告
NTT Tech Conference #1 Opening Keynote
NTT Tech Conference #1 Closing Keynote
OpsからみたOpenStack Summit

Recently uploaded (20)

PPTX
VMware vSphere Foundation How to Sell Presentation-Ver1.4-2-14-2024.pptx
PDF
Empathic Computing: Creating Shared Understanding
PDF
The Rise and Fall of 3GPP – Time for a Sabbatical?
PDF
Mobile App Security Testing_ A Comprehensive Guide.pdf
PDF
CIFDAQ's Market Insight: SEC Turns Pro Crypto
PPTX
MYSQL Presentation for SQL database connectivity
PPT
Teaching material agriculture food technology
PDF
Per capita expenditure prediction using model stacking based on satellite ima...
PDF
Approach and Philosophy of On baking technology
PDF
Network Security Unit 5.pdf for BCA BBA.
PDF
Agricultural_Statistics_at_a_Glance_2022_0.pdf
PDF
NewMind AI Monthly Chronicles - July 2025
PPTX
20250228 LYD VKU AI Blended-Learning.pptx
PDF
Peak of Data & AI Encore- AI for Metadata and Smarter Workflows
PDF
Diabetes mellitus diagnosis method based random forest with bat algorithm
PDF
KodekX | Application Modernization Development
PPTX
Detection-First SIEM: Rule Types, Dashboards, and Threat-Informed Strategy
PPTX
Digital-Transformation-Roadmap-for-Companies.pptx
PPTX
Effective Security Operations Center (SOC) A Modern, Strategic, and Threat-In...
PDF
Dropbox Q2 2025 Financial Results & Investor Presentation
VMware vSphere Foundation How to Sell Presentation-Ver1.4-2-14-2024.pptx
Empathic Computing: Creating Shared Understanding
The Rise and Fall of 3GPP – Time for a Sabbatical?
Mobile App Security Testing_ A Comprehensive Guide.pdf
CIFDAQ's Market Insight: SEC Turns Pro Crypto
MYSQL Presentation for SQL database connectivity
Teaching material agriculture food technology
Per capita expenditure prediction using model stacking based on satellite ima...
Approach and Philosophy of On baking technology
Network Security Unit 5.pdf for BCA BBA.
Agricultural_Statistics_at_a_Glance_2022_0.pdf
NewMind AI Monthly Chronicles - July 2025
20250228 LYD VKU AI Blended-Learning.pptx
Peak of Data & AI Encore- AI for Metadata and Smarter Workflows
Diabetes mellitus diagnosis method based random forest with bat algorithm
KodekX | Application Modernization Development
Detection-First SIEM: Rule Types, Dashboards, and Threat-Informed Strategy
Digital-Transformation-Roadmap-for-Companies.pptx
Effective Security Operations Center (SOC) A Modern, Strategic, and Threat-In...
Dropbox Q2 2025 Financial Results & Investor Presentation

100Gbps OpenStack For Providing High-Performance NFV

  • 1. Copyright © NTT Communications Corporation. Transform your business, transcend expectations with our technologically advanced solutions. 100Gbps OpenStack For Providing High-Performance NFV Takeaki Matsumoto
  • 2. Copyright © NTT Communications Corporation. 1 Agenda ● Background ● Goal / Actions ● Kamuee (Software router) ● DPDK application on OpenStack ● Benchmark ● Conclusion
  • 3. Copyright © NTT Communications Corporation. 2 Self-Introduction Takeaki Matsumoto takeaki.matsumoto@ntt.com NTT Communications Technology Development R&D for OpenStack Ops for Private Cloud
  • 4. Copyright © NTT Communications Corporation. 3 Background ● NTT Communications ○ A Global Tier-1 ISP in 196 countries/regions ○ Over 150 datacenters in the world ● Problems ○ Costs ■ spending 1M+ USD for each core router ○ Flexibility ■ long time to add router, orchestration, rollback...
  • 5. Copyright © NTT Communications Corporation. 4 Goal / Actions ● Goal ○ Cheaper and more flexible router with 100Gbps performamce ● Actions ○ Research & verify software router requirements ○ Check the OpenStack functions for NFV ○ Benchmark the software router performance with OpenStack
  • 6. Copyright © NTT Communications Corporation. 5 Kamuee ● Software router with 100Gbps+ (on Baremetal) ○ Developed by NTT Communications ○ 146Gbps with 610K+ IPv4 Routes and 128Byte packets ○ Using technologies ■ DPDK ■ Poptrie ■ RCU ○ Achieving 100Gbps Performance at Core with Poptrie and Kamuee Zero https://www.youtube.com/watch?v=OhHv3O1H8-w
  • 7. Copyright © NTT Communications Corporation. 6 Requirements ● High-performance NFV requirements ○ High-bandwidth network port ○ Low-latency communication NIC-to-CPU ○ Dedicated CPU cores ○ Hugepages ○ CPU features
  • 8. Copyright © NTT Communications Corporation. 7 Agenda ● Background ● Goal / Actions ● Kamuee (Software router) ● DPDK application on OpenStack ○ SR-IOV ○ NUMA ○ vCPU pinning ○ Hugepages ○ CPU feature ● Benchmark ● Conslusion
  • 9. Copyright © NTT Communications Corporation. Compute Host NUMA VM 8 DPDK application on OpenStack NUMA vCPU vCPU vCPU vCPU VF Memory hugepage CPU CPU CPU CPU CPU NICVF VF Memory hugepage NUMA CPU CPU CPU CPU CPU NICVF VF Memory hugepage
  • 10. Copyright © NTT Communications Corporation. Compute Host NUMA VM 9 SR-IOV NUMA vCPU vCPU vCPU vCPU VF Memory hugepage CPU CPU CPU CPU CPU NICVF VF Memory hugepage NUMA CPU CPU CPU CPU CPU NICVF VF Memory hugepage
  • 11. Copyright © NTT Communications Corporation. 10 SR-IOV ● What is SR-IOV? ○ Hardware-level virtualization on supported NIC ○ SR-IOV device has ■ Physical Function (PF) ● Normal NIC device (1 device/physical port) ■ Virtual Funtion (VF) ● Virtual NIC device from PF ● can be created up to NIC's limit NIC VF VF VF VF VF PF
  • 12. Copyright © NTT Communications Corporation. NIC 11 SR-IOV ● Why need SR-IOV? ○ vSwitch can be bottleneck on high-performance network ○ SR-IOV has no effect on Host CPU VF VF VM VF VF VF PF vNIC Software vSwitch VM NIC VF VF VF PF VF VF VF PCI Passthrough Typical Implementation SR-IOV
  • 13. Copyright © NTT Communications Corporation. 12 SR-IOV ● OpenStack supports SR-IOV ○ VF can be used as Neutron port ○ Instance get VF directly with PCI-Passthrough $ neutron port-create $net_id --name sriov_port --binding:vnic_type direct $ openstack server create --flavor m1.large --image ubuntu_14.04 --nic port-id=$port_id sriov-server ubuntu@sriov-server $ lspci | grep Ethernet 00:05.0 Ethernet controller: Mellanox Technologies MT27700 Family [ConnectX-4 Virtual Function]
  • 14. Copyright © NTT Communications Corporation. Compute Host NUMA VM 13 NUMA NUMA vCPU vCPU vCPU vCPU VF Memory hugepage CPU CPU CPU CPU CPU NICVF VF Memory hugepage NUMA CPU CPU CPU CPU CPU NICVF VF Memory hugepage
  • 15. Copyright © NTT Communications Corporation. 14 NUMA ● What is NUMA? ○ Non-Uniform Memory Access ○ Server usually has multi NUMA nodes on each CPU socket ○ CPU cores, Memory, PCI devices belong to its NUMA nodes ○ For low-latency, we have to think about NUMA Topology NUMA Socket NIC Memory CPU CPU CPU CPU CPU NUMA Socket NIC Memory CPU CPU CPU CPU CPU Interconnect has overhead
  • 16. Copyright © NTT Communications Corporation. 15 NUMA ● OpenStack has NUMATopologyFilter ○ can schedule VM with thinking about NUMA topology ○ When using hugepages or CPU-pinning, automatically launch on same NUMA node ○ 2 NUMA nodes also can be used $ openstack flavor set m1.large --property hw:numa_nodes=1 $ openstack flavor set m1.large --property hw:numa_nodes=2
  • 17. Copyright © NTT Communications Corporation. Compute Host NUMA VM 16 vCPU pinning NUMA vCPU vCPU vCPU vCPU VF Memory hugepage CPU CPU CPU CPU CPU NICVF VF Memory hugepage NUMA CPU CPU CPU CPU CPU NICVF VF Memory hugepage
  • 18. Copyright © NTT Communications Corporation. ● What is vCPU pinning? ○ vCPU:pCPU=1:1 dedicated allocation ■ Reduces context-switching 17 vCPU pinning pCPU vCPU pCPU vCPU pCPU vCPU pCPU vCPU pCPU nova-compute Linux process Dedicated for vCPUs
  • 19. Copyright © NTT Communications Corporation. ● OpenStack flavor has extra spec "hw:cpu_policy" ○ enables vCPU pinning 18 vCPU pinning $ openstack flavor set m1.large --property hw:cpu_policy=dedicated $ virsh vcpupin instance-00000002 VCPU: CPU Affinity ---------------------------------- 0: 1 1: 2 2: 3 3: 4 4: 5 5: 6 6: 7 7: 8 8: 9 $ virsh vcpupin instance-00000001 VCPU: CPU Affinity ---------------------------------- 0: 0-31 1: 0-31 2: 0-31 3: 0-31 4: 0-31 5: 0-31 6: 0-31 7: 0-31 8: 0-31 Default allocation vCPU pinning
  • 20. Copyright © NTT Communications Corporation. Compute Host NUMA VM 19 Hugepages NUMA vCPU vCPU vCPU vCPU VF Memory hugepage CPU CPU CPU CPU CPU NICVF VF Memory hugepage NUMA CPU CPU CPU CPU CPU NICVF VF Memory hugepage
  • 21. Copyright © NTT Communications Corporation. ● What is Hugepages? ○ segmented pages in memory from 4KB to larger size ■ Reduces TLB misses ■ DPDK applications usually use Hugepages 20 Hugepages page page page page virtual virtual virtual virtual physical physical physical physical page page page page Virtual address TLB Physical address Page table TLB miss
  • 22. Copyright © NTT Communications Corporation. ● OpenStack flavor has extra spec "hw:mem_page_size" ○ Enables Hugepages and assign to guest 21 Hugepages $ openstack flavor set m1.large --property hw:mem_page_size=1048576 $ cat /etc/libvirt/qemu/instance-00000002.xml | grep hugepages -1 <memoryBacking> <hugepages> <page size='1048576' unit='KiB' /> </hugepages> </memoryBacking> $ cat /proc/meminfo | grep Hugepagesize Hugepagesize: 1048576 kB
  • 23. Copyright © NTT Communications Corporation. ● Optimization feature for DPDK ○ SSSE3, SSE4,... ● "[libvirt] cpu_mode" option in nova.conf ○ By default, none is set in some distribution ○ host-model, host-passthrough, or custom is required 22 Other CPU features $ cat /proc/cpuinfo | grep -e model name -e flags model name : Intel Core Processor (Broadwell) flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology eagerfpu pni pclmulqdq vmx ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm rdseed adx smap xsaveopt arat $ cat /proc/cpuinfo | grep -e model name -e flags model name : QEMU Virtual CPU version 2.0.0 flags : fpu de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pse36 clflush mmx fxsr sse sse2 syscall nx lm rep_good nopl pni vmx cx16 x2apic popcnt hypervisor lahf_lm vnmi ept cpu_mode=none cpu_mode=host-model
  • 24. Agenda
    ● Background
    ● Goal / Actions
    ● Kamuee (Software router)
    ● DPDK application on OpenStack
    ● Benchmark
    ○ Environment
    ○ Baremetal performance
    ○ VM + VF performance
    ○ VM + PF performance
    ○ Baremetal (VF exists) performance
  • 25. Environment: Hardware
    ● Server
    ○ Dell PowerEdge R630
    ■ Intel® Xeon® CPU E5-2690 v4 @ 2.60GHz (14 cores) * 2
    ■ DDR4 256GB (32GB * 8)
    ■ Ubuntu 16.04
    ● NIC
    ○ Mellanox ConnectX-4 100Gb/s Dual-Port Adapter
    ■ 1 PCIe card, 100G ports * 2
    ● Switch
    ○ Dell Networking Z9100
    ■ Cumulus Linux 3.2.0
    ■ 100Gbps ports * 32
  • 26. Environment: Architecture
    [Diagram: pktgen-dpdk 0, pktgen-dpdk 1, Kamuee, nexthop 0 and nexthop 1 connected through the switch (VLAN100 and VLAN200), links labeled ① to ④; each line is a 100G link. Each pktgen-dpdk server uses both ports of one ConnectX-4 100G 2-port NIC; Kamuee has two ConnectX-4 100G 2-port NICs and uses only port0 on each]
  • 27. Environment: pktgen-dpdk
    ● Open source packet generator
    ○ Output: about 100Mpps ≒ 67.2Gbps per server (64Byte packets)
    ■ 50Mpps per port
    ○ dst MAC
    ■ Kamuee NIC0 port0 (port0-1 on pktgen-dpdk 0)
    ■ Kamuee NIC1 port0 (port0-1 on pktgen-dpdk 1)
    ○ dst IP (range)
    ■ 1.0.0.1-254 (port0 on each server)
    ■ 1.0.4.1-254 (port1 on each server)
    ○ dst TCP port (range)
    ■ 1-254 (port0 on each server)
    ■ 256-510 (port1 on each server)
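The 67.2Gbps figure follows from the on-wire size of a 64-byte frame: besides the frame itself, each packet occupies 8 bytes of preamble/SFD and a 12-byte inter-frame gap, i.e. 84 bytes = 672 bits on the wire. A small sketch of that arithmetic (function names are ours):

```shell
#!/usr/bin/env bash
# On-wire overhead per Ethernet frame: 7 B preamble + 1 B SFD + 12 B inter-frame gap
WIRE_OVERHEAD_BYTES=20

# Wire bandwidth in Mbps for a packet rate (Mpps) and frame size (bytes, incl. FCS)
wire_mbps() {
  local mpps=$1 frame_bytes=$2
  echo $(( mpps * (frame_bytes + WIRE_OVERHEAD_BYTES) * 8 ))
}

# Same value formatted as Gbps with one decimal place (truncated)
wire_gbps() {
  local mbps
  mbps=$(wire_mbps "$1" "$2")
  printf '%d.%d\n' $(( mbps / 1000 )) $(( (mbps % 1000) / 100 ))
}

# 50Mpps per port and 100Mpps per server, 64-byte frames
wire_gbps 50 64
wire_gbps 100 64
```

The same formula gives the theoretical 64-byte maximum of one 100GbE port: 100Gbps / 672 bits ≈ 148.8Mpps.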
  • 28. Environment: Kamuee
    ● DPDK software router
    ● Spec configuration
    ○ 2 NUMA nodes
    ○ Using 26 cores
    ■ Forwarding: 12 cores/port * 2 (one port per NUMA node)
    ■ Other functions: 2 cores
    ○ Using 16GB memory
    ■ 1GB Hugepages * 8 * 2 (8 per NUMA node)
    ○ 2 NICs
    ■ only port 0 is used on each (one NIC per NUMA node)
  • 29. Environment: Kamuee
    ● Routing configuration
    ○ 518K routes (similar to a full route table) loaded
    ■ Forwarding to the nexthop servers
    ● DPDK EAL options
    ○ ./kamuee -n 4 --socket-mem 8192,8192 -w 0000:00:05.0,txq_inline=128 -w 0000:00:06.0,txq_inline=128
    kamuee-console> show ipv4 route
    1.0.0.0/24 nexthop: 172.21.4.105
    1.0.4.0/24 nexthop: 172.21.3.104
    ...
    kamuee-console> show ipv4 route 172.21.4.105
    172.21.4.105/32 ether: 24:8a:07:4c:2f:64 port 1
    kamuee-console> show ipv4 route 172.21.3.104
    172.21.3.104/32 ether: 24:8a:07:4c:2f:6c port 0
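In these EAL options, -n 4 sets the number of memory channels, -w whitelists the two PCI devices (newer DPDK releases renamed -w to -a), and --socket-mem takes megabytes per NUMA socket, so 8192,8192 corresponds to the 8 * 1GB hugepages per NUMA node on the previous slide. A hypothetical helper that derives the value from the per-node page counts:

```shell
#!/usr/bin/env bash
# Build a DPDK --socket-mem value (MB per NUMA socket, comma-separated).
#   $1: hugepage size in MB; remaining args: hugepage count per NUMA node (node 0 first)
socket_mem_arg() {
  local page_mb=$1; shift
  local out="" n
  for n in "$@"; do
    out="${out:+$out,}$(( n * page_mb ))"
  done
  echo "$out"
}

# Usage, reproducing the Kamuee command line above:
#   ./kamuee -n 4 --socket-mem "$(socket_mem_arg 1024 8 8)" \
#     -w 0000:00:05.0,txq_inline=128 -w 0000:00:06.0,txq_inline=128
```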
  • 30. Environment: nexthop
    ● Measuring RX packets
    ○ Using eth_stat.sh
    ■ https://community.mellanox.com/docs/DOC-2506#jive_content_id_How_to_Measure_Ingress_Rate
    ■ using the "rx_packets_phy" counter from ethtool (a hardware-level packet counter)
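eth_stat.sh itself is not reproduced here. A minimal sketch of the same idea, assuming an mlx5 netdev whose `ethtool -S` output includes rx_packets_phy: sample the counter twice and divide the difference by the interval. The function names and the interface name in the usage line are placeholders:

```shell
#!/usr/bin/env bash
# Packets per second from two counter samples taken interval_s seconds apart
pps_from_counters() {
  local before=$1 after=$2 interval_s=$3
  echo $(( (after - before) / interval_s ))
}

# Read the hardware-level RX packet counter of an mlx5 interface
rx_packets_phy() {
  ethtool -S "$1" | awk '$1 == "rx_packets_phy:" { print $2 }'
}

# Sample the counter twice and print the RX rate in packets per second
measure_rx_pps() {
  local iface=$1 interval_s=${2:-1} before after
  before=$(rx_packets_phy "$iface")
  sleep "$interval_s"
  after=$(rx_packets_phy "$iface")
  pps_from_counters "$before" "$after" "$interval_s"
}

# Usage on a nexthop server:
#   measure_rx_pps ens1f0 1
```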
  • 31. Environment: Ideal flow on each pktgen server (64Byte)
    [Diagram: same topology as the architecture slide, with ideal per-link loads for one pktgen server: ①: 33.6Gbps, ②: 67.2Gbps, ③: 33.6Gbps]
  • 33. Environment: Ideal flow (64Byte)
    [Diagram: same topology with both pktgen servers sending, ideal per-link loads: ①: 33.6Gbps, ②: 67.2Gbps, ③: 67.2Gbps]
  • 34. Baremetal performance: Configuration
    ● BIOS
    ○ Hyper-Threading: OFF
    ● Boot parameters
    ○ intel_idle.max_cstate=0 processor.max_cstate=0 intel_pstate=disable nohz_full=1-27 rcu_nocbs=1-27 rcu_nocb_poll audit=0 nosoftlockup default_hugepagesz=1G hugepagesz=1G hugepages=32 isolcpus=1-27
    ● Mellanox
    ○ CQE_COMPRESSION: AGGRESSIVE(1)
    ○ SRIOV_EN: False(0)
    ● Ports
    ○ 2 PFs (only port0 on each NIC)
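The host and guest boot parameters on these slides differ only in the isolated CPU range, the number of 1GB hugepages and, on the VM hosts, intel_iommu=on. A sketch of a hypothetical helper that builds the command line from those three inputs:

```shell
#!/usr/bin/env bash
# Kernel command line used in these benchmarks.
#   $1: CPU list to isolate (e.g. 1-27 on the host, 1-26 in the guest)
#   $2: number of 1 GiB hugepages to reserve at boot
#   $3: optional "iommu" to append intel_iommu=on (host with VT-d)
dpdk_cmdline() {
  local cpus=$1 pages=$2 extra=${3:-}
  local args="intel_idle.max_cstate=0 processor.max_cstate=0 intel_pstate=disable"
  args+=" nohz_full=$cpus rcu_nocbs=$cpus rcu_nocb_poll audit=0 nosoftlockup"
  args+=" default_hugepagesz=1G hugepagesz=1G hugepages=$pages isolcpus=$cpus"
  if [ "$extra" = "iommu" ]; then
    args+=" intel_iommu=on"
  fi
  echo "$args"
}

# Usage (Ubuntu 16.04): put the output into GRUB_CMDLINE_LINUX in
# /etc/default/grub, then run `sudo update-grub` and reboot.
#   dpdk_cmdline 1-27 32          # baremetal
#   dpdk_cmdline 1-27 32 iommu    # VM host
#   dpdk_cmdline 1-26 16          # guest
```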
  • 35. Baremetal performance: Result
  • 36. VM + VF performance: Host Configuration
    ● BIOS
    ○ Hyper-Threading: OFF
    ○ VT-d: ON
    ● Host boot parameters
    ○ intel_idle.max_cstate=0 processor.max_cstate=0 intel_pstate=disable nohz_full=1-27 rcu_nocbs=1-27 rcu_nocb_poll audit=0 nosoftlockup default_hugepagesz=1G hugepagesz=1G hugepages=32 isolcpus=1-27 intel_iommu=on
    ● Mellanox
    ○ CQE_COMPRESSION: AGGRESSIVE(1)
    ○ SRIOV_EN: True(1)
    ○ NUM_OF_VFS: 1
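The Mellanox settings on these slides are firmware parameters. A minimal sketch of applying them with mlxconfig from Mellanox Firmware Tools, assuming MFT is installed; the PCI address and interface name are placeholders, and the functions print the commands instead of running them when DRY_RUN=echo is set. The firmware change takes effect after a reboot, after which the VFs are instantiated through the kernel's sriov_numvfs sysfs file:

```shell
#!/usr/bin/env bash
# Firmware settings for the "VM + VF" host (CQE compression aggressive, SR-IOV on).
#   $1: PCI address of the ConnectX-4 (placeholder), $2: number of VFs
configure_mlx_sriov() {
  local pci_dev=$1 num_vfs=$2
  ${DRY_RUN:-} mlxconfig -y -d "$pci_dev" set CQE_COMPRESSION=1 SRIOV_EN=1 NUM_OF_VFS="$num_vfs"
}

# After the reboot, create the VFs at runtime on the PF's netdev
create_vfs() {
  local iface=$1 num_vfs=$2
  if [ -n "${DRY_RUN:-}" ]; then
    echo "echo $num_vfs > /sys/class/net/$iface/device/sriov_numvfs"
  else
    echo "$num_vfs" > "/sys/class/net/$iface/device/sriov_numvfs"
  fi
}

# Usage (as root):
#   configure_mlx_sriov 0000:04:00.0 1 && reboot
#   create_vfs ens1f0 1
```

For the "VM + PF" and plain baremetal runs, the same mlxconfig call with SRIOV_EN=0 and without NUM_OF_VFS matches the settings listed on those slides.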
  • 37. VM + VF performance: Guest Configuration
    ● Flavor
    ○ vCPUs: 27
    ○ Memory: 32GB
    ○ extra_specs:
    ■ hw:cpu_policy: dedicated
    ■ hw:mem_page_size: 1048576
    ■ hw:numa_mem.0: 16384
    ■ hw:numa_mem.1: 16384
    ■ hw:numa_cpus.0: 0-13
    ■ hw:numa_cpus.1: 14-26
    ■ hw:numa_nodes: 2
    ● Ports
    ○ 2 VFs (vf 0 on each NIC port0)
    ● Guest boot parameters
    ○ intel_idle.max_cstate=0 processor.max_cstate=0 intel_pstate=disable nohz_full=1-26 rcu_nocbs=1-26 rcu_nocb_poll audit=0 nosoftlockup default_hugepagesz=1G hugepagesz=1G hugepages=16 isolcpus=1-26
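The flavor above can be created with the standard OpenStack client. A sketch, assuming a hypothetical flavor name and a 20GB root disk (the disk size is not given on the slide); --ram is in MB, so 32GB becomes 32768, and the two hw:numa_mem values add up to it. Set DRY_RUN=echo to print the commands instead of running them:

```shell
#!/usr/bin/env bash
# Create the benchmark guest flavor: 27 dedicated vCPUs, 32GB of 1GB-page-backed RAM
# split across two guest NUMA nodes.
#   $1: flavor name (placeholder), $2: root disk in GB (assumption, default 20)
create_nfv_flavor() {
  local name=$1 disk_gb=${2:-20}
  ${DRY_RUN:-} openstack flavor create --vcpus 27 --ram 32768 --disk "$disk_gb" "$name"
  ${DRY_RUN:-} openstack flavor set \
    --property hw:cpu_policy=dedicated \
    --property hw:mem_page_size=1048576 \
    --property hw:numa_nodes=2 \
    --property hw:numa_cpus.0=0-13 \
    --property hw:numa_cpus.1=14-26 \
    --property hw:numa_mem.0=16384 \
    --property hw:numa_mem.1=16384 \
    "$name"
}

# Usage:
#   create_nfv_flavor nfv.kamuee
```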
  • 38. VM + VF performance: Result
  • 39. VM + PF performance: Host Configuration
    ● BIOS
    ○ Hyper-Threading: OFF
    ○ VT-d: ON
    ● Host boot parameters
    ○ intel_idle.max_cstate=0 processor.max_cstate=0 intel_pstate=disable nohz_full=1-27 rcu_nocbs=1-27 rcu_nocb_poll audit=0 nosoftlockup default_hugepagesz=1G hugepagesz=1G hugepages=32 isolcpus=1-27 intel_iommu=on
    ● Mellanox
    ○ CQE_COMPRESSION: AGGRESSIVE(1)
    ○ SRIOV_EN: False(0)
  • 40. VM + PF performance: Guest Configuration
    ● Flavor
    ○ vCPUs: 27
    ○ Memory: 32GB
    ○ extra_specs:
    ■ hw:cpu_policy: dedicated
    ■ hw:mem_page_size: 1048576
    ■ hw:numa_mem.0: 16384
    ■ hw:numa_mem.1: 16384
    ■ hw:numa_cpus.0: 0-13
    ■ hw:numa_cpus.1: 14-26
    ■ hw:numa_nodes: 2
    ● Ports
    ○ 2 PFs (only port0 on each NIC, with PCI passthrough)
    ● Guest boot parameters
    ○ intel_idle.max_cstate=0 processor.max_cstate=0 intel_pstate=disable nohz_full=1-26 rcu_nocbs=1-26 rcu_nocb_poll audit=0 nosoftlockup default_hugepagesz=1G hugepagesz=1G hugepages=16 isolcpus=1-26
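The slides do not say how the VFs and PFs were attached to the guest. One way in OpenStack, assuming the Neutron SR-IOV mechanism driver and a Nova PCI passthrough whitelist that covers the ConnectX-4 ports, is to request the port's vnic type: "direct" asks for a VF, "direct-physical" for the whole PF. A sketch with placeholder network, port and image names; set DRY_RUN=echo to print the commands:

```shell
#!/usr/bin/env bash
# Create a data-plane Neutron port.
#   direct          -> an SR-IOV VF ("VM + VF")
#   direct-physical -> the whole PF passed through ("VM + PF")
create_dataplane_port() {
  local network=$1 port_name=$2 vnic_type=$3
  case "$vnic_type" in
    direct|direct-physical) ;;
    *) echo "unsupported vnic type: $vnic_type" >&2; return 1 ;;
  esac
  ${DRY_RUN:-} openstack port create --network "$network" --vnic-type "$vnic_type" "$port_name"
}

# Usage (all names are placeholders):
#   create_dataplane_port net-vlan100 kamuee-p0 direct-physical
#   create_dataplane_port net-vlan200 kamuee-p1 direct-physical
#   openstack server create --flavor nfv.kamuee --image <image> \
#     --nic port-id=<kamuee-p0 id> --nic port-id=<kamuee-p1 id> kamuee
```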
  • 41. VM + PF performance: Result
  • 42. Baremetal (VF exists) performance: Configuration
    ● BIOS
    ○ Hyper-Threading: OFF
    ● Boot parameters
    ○ intel_idle.max_cstate=0 processor.max_cstate=0 intel_pstate=disable nohz_full=1-27 rcu_nocbs=1-27 rcu_nocb_poll audit=0 nosoftlockup default_hugepagesz=1G hugepagesz=1G hugepages=32 isolcpus=1-27
    ● Mellanox
    ○ CQE_COMPRESSION: AGGRESSIVE(1)
    ○ SRIOV_EN: True(1)
    ○ NUM_OF_VFS: 1
    ● Ports
    ○ 2 PFs (only port0 on each NIC)
    ○ 2 VFs (vf 0 on each NIC port0) exist [not used]
  • 43. Baremetal (VF exists) performance: Result
  • 44. All Results
  • 45. Conclusion
    ● OpenStack functions for NFV work fine
    ○ SR-IOV port assignment
    ○ NUMA awareness
    ○ vCPU pinning
    ○ Hugepages
    ○ CPU features
    ● KVM + Intel VT achieves close to baremetal performance
    ● SR-IOV performance evaluation is still required
    ○ SR-IOV device implementation depends on the vendor
  • 46. Conclusion
    ● Our decision
    ○ VM + PF is a powerful option
    ■ SR-IOV advantage
    ● Multiple VFs can be created for
    ○ Router
    ○ Firewall
    ○ Load balancer
    ○ ...
    ■ But a 100G router consumes almost all host resources
    ● "1 Host: 1 VM" is a realistic option
    ○ No need for so many ports
  • 47. Thank you!