Achieving the Ultimate Performance
with KVM
Venko Moyankov
DevOps.com Webinar
2020-10-06
about me
● Solutions Architect @ StorPool
● Network and System administrator
● 20+ years in telecoms building and operating
infrastructures
linkedin.com/in/venkomoyankov/
venko@storpool.com
about StorPool
● NVMe software-defined storage for VMs and containers
● Scale-out, HA, API-controlled
● Since 2011, in commercial production use since 2013
● Based in Sofia, Bulgaria
● Mostly virtual disks for KVM
● … and bare metal Linux hosts
● Also used with VMware, Hyper-V, XenServer
● Integrations into OpenStack/Cinder, Kubernetes Persistent
Volumes, CloudStack, OpenNebula, OnApp
Why performance
● Better application performance -- e.g. time to load a page, time to
rebuild, time to execute a specific query
● Happier customers (in cloud / multi-tenant environments)
● ROI, TCO - Lower cost per delivered resource (per VM) through
higher density
Agenda
● Hardware
● Compute - CPU & Memory
● Networking
● Storage
Usual optimization goal
- lowest cost per delivered resource
- fixed performance target
- calculate all costs - power, cooling, space, server, network,
support/maintenance
Example: cost per VM with 4x dedicated 3 GHz cores and 16 GB
RAM
Unusual
- Best single-thread performance I can get at any cost
- 5 GHz cores, yummy :)
Compute node hardware
Intel
lowest cost per core:
- Xeon Gold 5220R - 24 cores @ 2.6 GHz ($244/core)
lowest cost per 3GHz+ core:
- Xeon Gold 6240R - 24 cores @ 3.2 GHz ($276/core)
- Xeon Gold 6248R - 24 cores @ 3.6 GHz ($308/core)
lowest cost per GHz:
- Xeon Gold 6230R - 26 cores @ 2.1 GHz ($81/GHz)
Compute node hardware
AMD
- EPYC 7702P - 64 cores @ 2.0/3.35 GHz - lowest cost per core
- EPYC 7402P - 24 cores / 1S - low density
- EPYC 7742 - 64 cores @ 2.2/3.4 GHz x 2S - max density
- EPYC 7262 - 8 cores @ 3.4 GHz - max IO/cache per core, per $
Compute node hardware
Form factor
[images: form factor range, from … to …]
Compute node hardware
● firmware versions and BIOS settings
● Understand power management -- esp. C-states, P-states,
HWP and “bias” (see the commands after this list)
○ Different on AMD EPYC: "power-deterministic",
"performance-deterministic"
● Think of rack level optimization - how do we get the lowest
total cost per delivered resource?
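A minimal sketch of inspecting power management from Linux, assuming the cpupower and turbostat utilities are installed (some settings, e.g. HWP bias, may only be exposed in the BIOS):

# current frequency driver, governor and P-state range
cpupower frequency-info
# enabled C-states and their exit latencies
cpupower idle-info
# switch to the performance governor on latency-sensitive hosts
cpupower frequency-set -g performance
# observe actual frequencies, C-state residency and package power
turbostat --interval 5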
Agenda
● Hardware
● Compute - CPU & Memory
● Networking
● Storage
Tuning KVM
RHEL 7 Virtualization Tuning and Optimization Guide (Red Hat documentation)
https://pve.proxmox.com/wiki/Performance_Tweaks
https://events.static.linuxfound.org/sites/events/files/slides/CloudOpen2013_Khoa_Huynh_v3.pdf
http://www.linux-kvm.org/images/f/f9/2012-forum-virtio-blk-performance-improvement.pdf
http://www.slideshare.net/janghoonsim/kvm-performance-optimization-for-ubuntu
… but don’t trust everything you read. Perform your own benchmarking!
CPU and Memory
Recent Linux kernel, KVM and QEMU
… but beware of the bleeding edge
E.g. qemu-kvm-ev from RHEV (repackaged by CentOS)
tuned-adm profile virtual-host (on the hypervisor)
tuned-adm profile virtual-guest (inside the guest)
CPU
Typical
● (heavy) oversubscription, because VMs are mostly idling
● HT (Hyper-Threading)
● NUMA
● route IRQs of network and storage adapters to a core on the
NUMA node they are on
Unusual
● CPU pinning (example below)
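A sketch of CPU pinning with libvirt, assuming a hypothetical domain vm1 whose four vCPUs get dedicated host cores 4-7:

# pin each vCPU to its own physical core
virsh vcpupin vm1 0 4
virsh vcpupin vm1 1 5
virsh vcpupin vm1 2 6
virsh vcpupin vm1 3 7
# keep the QEMU emulator threads off the dedicated cores
virsh emulatorpin vm1 0-3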
Understanding oversubscription and congestion
Linux scheduler statistics: /proc/schedstat
(linux-stable/Documentation/scheduler/sched-stats.txt)
Next three are statistics describing scheduling latency:
7) sum of all time spent running by tasks on this processor (in ms)
8) sum of all time spent waiting to run by tasks on this processor (in ms)
9) # of tasks (not necessarily unique) given to the processor
* In nanoseconds, not ms.
20% CPU load with large wait time (bursty congestion) is possible
100% CPU load with no wait time, also possible
Measure CPU congestion!
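For example, the cumulative run and wait times (fields 7 and 8 of each cpu line, in nanoseconds) can be read with a one-liner; sample it twice a few seconds apart and diff the values: if the wait-time delta grows toward the length of the sampling interval, tasks are queuing, i.e. the CPU is congested:

awk '/^cpu/ { printf "%-6s run=%.0fs wait=%.0fs\n", $1, $8/1e9, $9/1e9 }' /proc/schedstat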
Memory
Typical
● Dedicated RAM
● huge pages, THP
● NUMA
● use local-node memory if you can
Unusual
● Oversubscribed RAM
● balloon
● KSM (RAM dedup)
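A few of these knobs as they appear on the host (a sketch; the hugepage count is illustrative and should match the RAM you dedicate to guests):

# reserve 2 MiB hugepages for guest memory
echo 8192 > /proc/sys/vm/nr_hugepages
# check the transparent hugepage policy
cat /sys/kernel/mm/transparent_hugepage/enabled
# inspect NUMA topology and per-node free memory
numactl --hardware
# enable KSM (RAM dedup) only if you accept the extra CPU cost
echo 1 > /sys/kernel/mm/ksm/run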
Agenda
● Hardware
● Compute - CPU & Memory
● Networking
● Storage
Networking
Virtualized networking
● hardware emulation (rtl8139, e1000)
● paravirtualized drivers - virtio-net
regular virtio vs vhost-net vs vhost-user
Linux Bridge vs OVS in-kernel vs OVS-DPDK
Pass-through networking
SR-IOV (PCIe pass-through)
virtio-net QEMU
● Multiple context switches:
1. virtio-net driver → KVM
2. KVM → qemu/virtio-net device
3. qemu → TAP device
4. qemu → KVM (notification)
5. KVM → virtio-net driver (interrupt)
● Much more efficient than emulated hardware
● shared memory with the qemu process
● a qemu thread processes the packets
virtio vhost-net
● Two context switches (optional):
1. virtio-net driver → KVM
2. KVM → virtio-net driver (interrupt)
● shared memory with the host kernel (vhost protocol)
● allows Linux Bridge zero copy
● qemu / the virtio-net device is on the control path only
● a kernel [vhost] thread processes the packets
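vhost-net is enabled per tap device; with plain QEMU it looks roughly like this (a sketch; libvirt generates the equivalent when the interface uses the vhost driver):

# kernel module providing the [vhost-<pid>] worker threads
modprobe vhost_net
# attach the guest NIC through a tap device with vhost enabled
qemu-system-x86_64 ... \
  -netdev tap,id=net0,vhost=on \
  -device virtio-net-pci,netdev=net0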
virtio vhost-usr / OVS-DPDK
● No context switches
● shared memory between the guest and Open vSwitch (requires huge pages)
● zero copy
● qemu / the virtio-net device is on the control path only
● KVM is not in the path
● ovs-vswitchd processes the packets
● the poll-mode driver (PMD) takes one CPU core at 100%
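Wiring a guest to OVS-DPDK through vhost-user looks roughly like this (a sketch based on the Open vSwitch DPDK documentation; bridge, port and socket names are illustrative). The guest memory must additionally be hugepage-backed and shared so the PMD can do zero-copy access:

# enable DPDK in Open vSwitch and dedicate a core to the PMD
ovs-vsctl set Open_vSwitch . other_config:dpdk-init=true
ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=0x2
# userspace bridge plus a vhost-user port for the VM
ovs-vsctl add-br br0 -- set bridge br0 datapath_type=netdev
ovs-vsctl add-port br0 vhu0 -- set Interface vhu0 type=dpdkvhostuserclient \
    options:vhost-server-path=/var/run/vhu0.sock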
PCI Passthrough
● No paravirtualized devices
● Direct access from the guest
kernel to the PCI device
● Host, KVM and qemu are on neither the data nor the control path
● NIC driver in the guest
● No virtual networking
● No live migrations
● No filtering
● No control
● Shared devices via SR-IOV
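Creating SR-IOV virtual functions on the host looks roughly like this (a sketch; the interface name is illustrative, and the VF is then handed to the guest via a libvirt <hostdev> entry or an SR-IOV network pool):

# create four virtual functions on the physical NIC
echo 4 > /sys/class/net/enp65s0f0/device/sriov_numvfs
# find the VFs' PCI addresses to pass one through
lspci | grep -i "virtual function"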
Virtual Network Performance
All measurements are between two VMs on the same host
# ping -f -c 100000 vm2
[screenshots: host CPU usage during the flood ping, per setup: virtio-net QEMU (the qemu process); virtio vhost-net (qemu plus the vhost kernel thread); virtio vhost-usr / OVS-DPDK (qemu plus ovs-vswitchd)]
Discussion
● Deep dive into Virtio-networking and vhost-net
https://www.redhat.com/en/blog/deep-dive-virtio-networking-and-vhost-net
● Open vSwitch DPDK support
https://docs.openvswitch.org/en/latest/topics/dpdk/
Agenda
● Hardware
● Compute - CPU & Memory
● Networking
● Storage
Storage - virtualization
Virtualized
live migration
thin provisioning, snapshots, etc.
vs. Full bypass
only speed
Storage - virtualization
Virtualized
cache=none -- direct IO, bypass host buffer cache
io=native -- use Linux Native AIO, not POSIX AIO (threads)
virtio-blk vs virtio-scsi
virtio-scsi multiqueue
iothread
vs. Full bypass
SR-IOV for NVMe devices
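The cache, AIO and iothread options above map to QEMU roughly like this (a sketch; the disk path is illustrative, and libvirt exposes the same settings via <driver cache='none' io='native'/> plus <iothreads>):

qemu-system-x86_64 ... \
  -object iothread,id=iot0 \
  -drive file=/dev/vg0/vm1-disk,if=none,id=drive0,format=raw,cache=none,aio=native \
  -device virtio-blk-pci,drive=drive0,iothread=iot0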
Storage - vhost
Virtualized with qemu bypass
vhost
before:
guest kernel -> host kernel -> qemu -> host kernel -> storage system
after:
guest kernel -> storage system
• Highly scalable and efficient architecture
• Scales up in each storage node & out with multiple nodes
[diagram: KVM Virtual Machines → storpool_block instance (1 CPU thread) → 25GbE network → storpool_server instances (1 CPU thread, 2-4 GB RAM each) → NVMe SSDs]
Storage benchmarks
Beware: lots of snake oil out there!
● performance numbers from hardware configurations totally
unlike what you’d use in production
● synthetic tests with high iodepth - 10 nodes, 10 workloads *
iodepth 256 each. (because why not)
● testing with ramdisk backend
● synthetic workloads that don't approximate the real world
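When you do benchmark, use parameters that resemble your production workload; a hedged fio sketch at a realistic queue depth rather than iodepth=256 (the target device is illustrative):

# WARNING: issues writes to the target -- use a scratch volume
fio --name=vm-like --filename=/dev/vdb --direct=1 --ioengine=libaio \
    --rw=randrw --rwmixread=70 --bs=4k --iodepth=8 --numjobs=4 \
    --runtime=300 --time_based --group_reporting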
[charts: latency vs. ops per second, with regions labeled "best service", "lowest cost per delivered resource" and "only pain"; follow-up slides mark where "benchmarks" and "real load" fall on the same chart]
Follow Us Online: StorPool Storage, @storpool
Q&A
Venko Moyankov
venko@storpool.com
StorPool Storage
www.storpool.com
@storpool
Thank you!