Anu H Rao
Storage Software Product line Manager
Datacenter Group, Intel® Corp
Notices & Disclaimers
Intel technologies’ features and benefits depend on system configuration and may require enabled hardware, software or service activation. Performance
varies depending on system configuration. Check with your system manufacturer or retailer or learn more at intel.com.
No computer system can be absolutely secure.
Tests document performance of components on a particular test, in specific systems. Differences in hardware, software, or configuration will affect actual
performance. Consult other sources of information to evaluate performance as you consider your purchase. For more complete information about
performance and benchmark results, visit http://www.intel.com/benchmarks .
Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as
SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors
may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases,
including the performance of that product when combined with other products. For more complete information visit http://www.intel.com/benchmarks .
Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors.
These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or
effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use
with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable
product User and Reference Guides for more information regarding the specific instruction sets covered by this notice.
Cost reduction scenarios described are intended as examples of how a given Intel-based product, in the specified circumstances and configurations, may affect
future costs and provide cost savings. Circumstances will vary. Intel does not guarantee any costs or cost reduction.
Intel does not control or audit third-party benchmark data or the web sites referenced in this document. You should visit the referenced web site and confirm
whether referenced data are accurate.
© 2017 Intel Corporation.
Accelerating Virtual Machine Access with the Storage Performance Development Kit (SPDK) – Vhost Deep Dive
The Challenge: Media Latency
[Chart: drive, controller, and driver latency in µs per medium and interface. HDD + SAS/SATA sits near 10,000 µs; NAND SSD + SAS/SATA and NAND SSD + NVMe™ fall into the 0-200 µs range; Optane™ SSD + NVMe™ is lower still. Kernel driver overhead grows from <0.01% of total latency on HDD, to roughly 1-8% on NAND SSDs, to 30%-50% on Optane™ SSD + NVMe™. Legend: Drive Latency, Controller Latency, Driver Latency.]
Technology claims are based on comparisons of latency, density and write cycling metrics among memory technologies recorded on published specifications of in-market memory products against internal Intel specifications.
Storage Performance Development Kit
Scalable and Efficient Software Ingredients
• User space, lockless, polled-mode components
• Up to millions of IOPS per core
• Designed to extract maximum performance from
non-volatile media
Storage Reference Architecture
• Optimized for latest generation CPUs and SSDs
• Open source composable building blocks (BSD
licensed)
• Available via SPDK.io
• Follow @SPDKProject on twitter for latest events
and activities
Benefits of using SPDK
SPDK: more performance from CPUs, non-volatile media, and networking
• Up to 10X more IOPS/core for NVMe-oF* vs. the Linux kernel
• Up to 8X more IOPS/core for NVMe vs. the Linux kernel
• Up to 50% better tail latency for a RocksDB workload
• Up to 3X better IOPS/core & latency for virtualized storage
• Up to 1.5X better latency for NVMe-oF vs. the kernel on Optane SSDs
• Up to 10.8 million IOPS with the Intel® Xeon® Scalable family and 24 Intel® Optane™ SSD DC P4800X
Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products. For more information go to http://www.intel.com/performance
SPDK Community
• Main web presence: http://SPDK.IO
• Real-time chat with the development community
• Email discussions
• Weekly calls
• Multiple annual meetups
• Backlog and ideas for things to do
• Code reviews & repo
• Continuous integration
Accelerating Virtual Machine Access with the Storage Performance Development Kit (SPDK) – Vhost Deep Dive
Architecture
• Storage Protocols: iSCSI Target, NVMe-oF* Target, vhost-scsi Target, vhost-blk Target, SCSI, NVMe, Linux nbd
• Storage Services: Block Device Abstraction (bdev), Blobstore, BlobFS, Logical Volumes, GPT, QoS, and bdev modules for Ceph RBD, Linux AIO, PMDK blk, virtio scsi/blk, NVMe, and 3rd-party plug-ins
• Drivers: NVMe* PCIe Driver, NVMe-oF* Initiator, Intel® QuickData Technology Driver, virtio, RDMA (targeting NVMe devices)
• Core: Application Framework
• Integrations: RocksDB, Ceph, QEMU
• Status legend: released, new in release 18.01, planned for 1H'18
Accelerating Virtual Machine Access with the Storage Performance Development Kit (SPDK) – Vhost Deep Dive
Virtio
• Paravirtualized driver specification
• Common mechanisms and layouts
for device discovery, I/O queues,
etc.
• virtio device types include:
• virtio-net
• virtio-blk
• virtio-scsi
• virtio-gpu
• virtio-rng
• virtio-crypto
[Diagram: the guest VM (Linux*, Windows*, FreeBSD*, etc.) runs the virtio front-end drivers; the hypervisor (i.e. QEMU/KVM) provides device emulation and the virtio back-end drivers; the two sides communicate through shared virtqueues.]
QEMU VirtIO SCSI
[Diagram: the application in the guest VM issues I/O through the guest kernel's virtio driver into a virtqueue; QEMU's I/O processing thread picks it up and submits it to the host kernel via AIO.]
I/O flow:
1. Add IO to virtqueue
2. IO processed by QEMU
3. IO issued to kernel
4. Kernel pins memory
5. Device executes IO
6. Guest completion interrupt
Vhost
vhost target (kernel or userspace)
• Separate process for I/O processing
• vhost protocol for communicating
guest VM parameters
• memory
• number of virtqueues
• virtqueue locations
[Diagram: the guest VM (Linux*, Windows*, FreeBSD*, etc.) keeps its virtio front-end drivers and virtqueues; device emulation stays in the hypervisor (i.e. QEMU/KVM), while the virtio back-end drivers move into the vhost target.]
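As a rough illustration of what that handshake conveys, the sketch below models the state a vhost target holds after QEMU has shared the guest's memory layout and virtqueue locations. This is a conceptual Python model, not the vhost-user wire format; all field names are illustrative.

```python
# Conceptual model only (not the actual vhost-user wire format): the state a
# vhost target needs from QEMU before it can process guest I/O on its own.
from dataclasses import dataclass, field
from typing import List

@dataclass
class MemoryRegion:
    guest_phys_addr: int   # where the region starts in guest physical memory
    size: int              # region length in bytes
    mmap_offset: int       # offset into the shared file descriptor the target maps

@dataclass
class VirtqueueInfo:
    index: int             # queue number
    num_descriptors: int   # ring size
    desc_addr: int         # guest addresses of the descriptor, available,
    avail_addr: int        #   and used rings that the target must translate
    used_addr: int

@dataclass
class VhostDeviceState:
    memory_table: List[MemoryRegion] = field(default_factory=list)  # guest memory map
    virtqueues: List[VirtqueueInfo] = field(default_factory=list)   # where to find I/O
```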
Kernel vhost
[Diagram: the guest application's I/O enters a virtqueue; the vhost-kernel worker in the host kernel picks it up and issues it via AIO, so QEMU is not in the data path.]
I/O flow:
1. Add IO to virtqueue
2. Write virtio doorbell
3. Wake vhost kernel
4. Kernel pins memory
5. Device executes IO
6. Guest completion interrupt
Vhost (userspace)
SPDK vhost architecture
[Diagram: guest VM memory is allocated from host memory shared with the SPDK vhost process, so the virtio-scsi virtqueues are directly visible to SPDK. QEMU and the SPDK vhost target (built on DPDK's vhost library) negotiate over a UNIX domain socket and signal each other with eventfd.]
SPDK vhost
[Diagram: the guest application's I/O enters a virtqueue in shared memory; the SPDK vhost process polls the virtqueue from userspace and performs the I/O directly, with KVM involved only for the guest completion interrupt.]
I/O flow:
1. Add IO to virtqueue
2. Poll virtqueue
3. Device executes IO
4. Guest completion interrupt
Architecture (recap)
The same layered component diagram as above, shown again with the vhost-scsi Target highlighted among the Storage Protocols.
Sharing SSDs in userspace
Typically not 1:1 VM to local attached NVMe SSD
 otherwise just use PCI direct assignment
What about SR-IOV?
 SR-IOV SSDs not prevalent yet
 precludes features such as snapshots
What about LVM?
 LVM depends on Linux kernel block layer and storage drivers (i.e. nvme)
 SPDK wants to use userspace polled mode drivers
SPDK Blobstore and Logical Volumes!
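As a concrete (hedged) illustration, the sketch below provisions per-VM logical volumes on one NVMe namespace using SPDK's rpc.py helper. The RPC names follow the 18.x-era script (newer releases renamed them, e.g. bdev_lvol_create_lvstore / bdev_lvol_create), and the bdev name, volume count, and sizes are assumptions.

```python
# Hedged sketch: carve one NVMe bdev into per-VM logical volumes with rpc.py.
# Assumes it is run from an SPDK source tree with the target already started;
# RPC method names and flag spellings may differ between SPDK releases.
import subprocess

def rpc(*args: str) -> str:
    result = subprocess.run(["scripts/rpc.py", *args],
                            check=True, capture_output=True, text=True)
    return result.stdout.strip()

# Build a logical volume store (backed by a blobstore) on the NVMe namespace.
lvs_uuid = rpc("construct_lvol_store", "Nvme0n1", "lvs0")

# Carve out one 100 GiB logical volume per VM; each becomes its own bdev that
# can back a vhost-scsi LUN or vhost-blk device.
for i in range(4):
    rpc("construct_lvol_bdev", "-l", "lvs0", f"lvol{i}", str(100 * 1024))  # size in MiB
```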
Accelerating Virtual Machine Access with the Storage Performance Development Kit (SPDK) – Vhost Deep Dive
SPDK vhost Performance
[Chart: QD=1 latency in µs (0-50 scale) for Linux kernel vhost-scsi, QEMU virtio-blk dataplane, and SPDK userspace vhost-scsi; SPDK shows the lowest latency.]
System Configuration: 2S Intel® Xeon® Platinum 8180: 28C, E5-2699v3: 18C, 2.5GHz (HT off), Intel® Turbo Boost Technology enabled, 12x16GB DDR4 2133 MT/s, 1 DIMM per channel, Ubuntu* Server 16.04.2 LTS, 4.11 kernel,
23x Intel® P4800x Optane SSD – 375GB, 1 SPDK lvolstore or LVM lvgroup per SSD, SPDK commit ID c5d8b108f22ab, 46 VMs (CentOS 3.10, 1vCPU, 2GB DRAM, 100GB logical volume), vhost dedicated to 10 cores
As measured by: fio 2.10.1 – Direct=Yes, 4KB random read I/O, Ramp Time=30s, Run Time=180s, Norandommap=1, I/O Engine = libaio, Numjobs=1
Legend: Linux: Kernel vhost-scsi QEMU: virtio-blk dataplane SPDK: Userspace vhost-scsi
SPDK up to 3x better efficiency and latency
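For reference, the guest-side workload described in the configuration above can be approximated with an fio invocation like the sketch below. The device path and job name are assumptions; the I/O parameters (4 KiB random reads, QD=1, libaio, 30 s ramp, 180 s run) come from the configuration text.

```python
# Hedged sketch of the measurement workload run inside each guest VM.
import subprocess

fio_cmd = [
    "fio", "--name=qd1_randread",
    "--filename=/dev/sdb",        # the vhost-provided volume as seen in the guest (assumption)
    "--ioengine=libaio", "--direct=1",
    "--rw=randread", "--bs=4k", "--iodepth=1", "--numjobs=1",
    "--norandommap", "--ramp_time=30", "--runtime=180", "--time_based",
]
subprocess.run(fio_cmd, check=True)
```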
48 VMs: vhost-scsi performance (SPDK vs. Kernel)
Intel Xeon Platinum 8180 Processor, 24x Intel P4800x 375GB
2 partitions per VM, 10 vhost I/O processing cores
[Chart: aggregate IOPS in millions, vhost-kernel vs. vhost-spdk]
4K 100% read: vhost-kernel 2.86M IOPS, vhost-spdk 9.23M IOPS (3.2x)
4K 100% write: vhost-kernel 2.77M IOPS, vhost-spdk 8.98M IOPS (3.2x)
4K 70% read / 30% write: vhost-kernel 3.4M IOPS, vhost-spdk 9.49M IOPS (2.7x)
• Aggregate IOPS across all 48 VMs reported. All VMs run on cores separate from the vhost-scsi cores.
• 10 vhost-scsi cores for I/O
processing
• SPDK vhost-scsi up to 3.2x better
with 4K 100% Random read I/Os
• Used cgroups to restrict kernel
vhost-scsi processes to 10 cores
System Configuration: Intel Xeon Platinum 8180 @ 2.5GHz, 56 physical cores, 6x 16GB 2667 DDR4, 6 memory channels, SSD: Intel P4800x 375GB x24 drives, BIOS: HT disabled, p-states enabled, turbo enabled, Ubuntu 16.04.1
LTS, 4.11.0 x86_64 kernel, 48 VMs, number of partition: 2, VM config : 1core 1GB memory, VM OS: fedora 25, blk-mq enabled, Software packages: Qemu-2.9, libvirt-3.0.0, spdk (3bfecec994), IO distribution: 10 vhost-cores for SPDK /
Kernel. Rest 46 cores for QEMU using cgroups, FIO-2.1.10 with SPDK plugin, io depth=1, 8, 32 numjobs=1, direct=1, block size 4k
VM Density: Rate Limiting 20K IOPS per VM
Intel Xeon Platinum 8180 Processor, 24x Intel P4800x 375GB
10 vhost-scsi cores
[Chart: IOPS (higher is better) and % CPU utilization (lower is better) at 24, 48, and 96 VMs, comparing kernel vhost and SPDK vhost. Legend: Kernel IOPS, SPDK IOPS, Kernel CPU Util., SPDK CPU Util.]
• % CPU utilized shown from
VM side
• Each VM was running queue
depth=1, 4KB random read
workload
• Hyper-threading enabled, providing 112 logical cores.
• Each VM rate limited to 20K
IOPS using cgroups
• SPDK scales to 96 VMs while sustaining 20K IOPS per VM; the kernel scales only to 48 VMs, beyond which the 10 vhost cores appear to be the bottleneck
System Configuration: Intel Xeon Platinum 8180 @ 2.5GHz, 56 physical cores, 6x 16GB 2667 DDR4, 6 memory channels, SSD: Intel P4800x 375GB x24 drives, BIOS: HT disabled, p-states enabled, turbo enabled,
Ubuntu 16.04.1 LTS, 4.11.0 x86_64 kernel, 48 VMs, number of partition: 2, VM config : 1core 1GB memory, VM OS: fedora 25, blk-mq enabled, Software packages: Qemu-2.9, libvirt-3.0.0, spdk (3bfecec994), IO
distribution: 10 vhost-cores for SPDK / Kernel. Rest 46 cores for QEMU using cgroups, FIO-2.1.10 with SPDK plugin, io depth=1, 8, 32 numjobs=1, direct=1, block size 4k
Accelerating Virtual Machine Access with the Storage Performance Development Kit (SPDK) – Vhost Deep Dive
VM Ephemeral Storage
[Diagram: each VM's virtio device is served by SPDK vhost; a BDAL logical volume sits on a BDAL NVMe bdev, which the SPDK NVMe driver maps onto an Intel® SSD for Datacenter.]
• Increased efficiency yields greater VM density
VM Remote Storage
[Diagram: the VM's virtio device is served by SPDK vhost; a BDAL NVMe-oF bdev uses the SPDK NVMe-oF initiator to reach a remote NVMe-oF target across the fabric.]
• Enables disaggregation and migration of VMs using remote storage
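A hedged sketch of that wiring: attach a remote NVMe-oF namespace as a local SPDK bdev and expose it to the guest through vhost-scsi. RPC names follow the 18.x-era rpc.py, and the transport address, service ID, and NQN are illustrative assumptions.

```python
# Hedged sketch: back a vhost-scsi LUN with a remote NVMe-oF namespace.
import subprocess

def rpc(*args: str) -> None:
    subprocess.run(["scripts/rpc.py", *args], check=True)

# Attach a namespace exported by a remote NVMe-oF target as local bdev "Nvme1".
rpc("construct_nvme_bdev", "-b", "Nvme1", "-t", "rdma", "-f", "ipv4",
    "-a", "192.168.0.10", "-s", "4420", "-n", "nqn.2018-01.io.spdk:cnode1")

# Expose the remote namespace to the guest through an SPDK vhost-scsi controller.
rpc("construct_vhost_scsi_controller", "vhost.1")
rpc("add_vhost_scsi_lun", "vhost.1", "0", "Nvme1n1")
```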
VM Ceph Storage
[Diagram: the VM's virtio device is served by SPDK vhost; a BDAL Ceph bdev uses the Ceph RBD driver to reach a Ceph cluster backed by Intel® SSDs for Datacenter.]
• Potential for innovation in data services
• Cache
• Deduplication
For More information on SPDK
• Visit SPDK.io for tutorials and links to GitHub, the mailing list, the IRC channel, and other resources
• Follow @SPDKProject on Twitter for the latest events, blogs, and other SPDK community information and activities
Accelerating Virtual Machine Access with the Storage Performance Development Kit (SPDK) – Vhost Deep Dive
Basic Architecture
Configure vhost-scsi
controller
 JSON RPC
 creates SPDK constructs for
vhost device and backing
storage
 creates controller-specific
vhost domain socket
[Diagram: logical cores 0 and 1; the vhost-scsi controller exposes a SCSI device with a SCSI LUN backed by an nvme bdev on an NVMe SSD; the controller's domain socket is /spdk/vhost.0.]
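A hedged sketch of that configuration step: SPDK's target exposes a JSON-RPC 2.0 server on a UNIX domain socket (by default /var/tmp/spdk.sock), and the controller can be built with three requests. Method and parameter names follow the 18.x-era RPC set (newer releases use bdev_nvme_attach_controller, vhost_create_scsi_controller, vhost_scsi_controller_add_target); the PCI address is an assumption.

```python
# Hedged sketch: configure an SPDK vhost-scsi controller over the JSON-RPC socket.
import json
import socket

SPDK_RPC_SOCK = "/var/tmp/spdk.sock"   # SPDK's default RPC socket (assumption)

def rpc(sock: socket.socket, method: str, params: dict, req_id: int) -> dict:
    """Send one JSON-RPC 2.0 request and return the decoded response (simplified)."""
    request = {"jsonrpc": "2.0", "id": req_id, "method": method, "params": params}
    sock.sendall(json.dumps(request).encode())
    return json.loads(sock.recv(65536).decode())

with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as s:
    s.connect(SPDK_RPC_SOCK)
    # 1. Attach a local NVMe SSD as bdev "Nvme0" (namespace 1 appears as Nvme0n1).
    rpc(s, "construct_nvme_bdev",
        {"name": "Nvme0", "trtype": "pcie", "traddr": "0000:5e:00.0"}, 1)
    # 2. Create the vhost-scsi controller; SPDK creates its domain socket (/spdk/vhost.0).
    rpc(s, "construct_vhost_scsi_controller", {"ctrlr": "vhost.0"}, 2)
    # 3. Attach the NVMe namespace bdev as SCSI target 0 behind the controller.
    rpc(s, "add_vhost_scsi_lun",
        {"ctrlr": "vhost.0", "scsi_target_num": 0, "bdev_name": "Nvme0n1"}, 3)
```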
Basic Architecture
Launch VM
 QEMU connects to domain
socket
SPDK
 Assigns logical core
 Starts vhost dev poller
 Allocates NVMe queue pair
 Starts NVMe poller
[Diagram: the VM's virtqueues (VQ) are polled by a vhost-scsi poller on one logical core, which submits I/O through the nvme bdev to an NVMe queue pair (QP) polled by a bdev-nvme poller; the controller socket remains /spdk/vhost.0.]
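A hedged sketch of the QEMU side of that launch: guest RAM must come from a shared, file-backed region so the SPDK vhost process can map it, and the virtio-scsi device is a vhost-user device pointed at the controller's domain socket. The image path, memory size, and hugepage mount are assumptions; option spellings may vary with QEMU version.

```python
# Hedged sketch: launch a guest whose virtio-scsi device is served by SPDK vhost.
import subprocess

qemu_cmd = [
    "qemu-system-x86_64",
    "-enable-kvm", "-cpu", "host", "-smp", "1", "-m", "2G",
    # Back guest RAM with a shared hugetlbfs file so SPDK can map guest memory.
    "-object", "memory-backend-file,id=mem0,size=2G,mem-path=/dev/hugepages,share=on",
    "-numa", "node,memdev=mem0",
    # Connect to the controller-specific vhost domain socket created by SPDK.
    "-chardev", "socket,id=spdk_vhost0,path=/spdk/vhost.0",
    "-device", "vhost-user-scsi-pci,id=scsi0,chardev=spdk_vhost0",
    "-drive", "file=guest.qcow2,format=qcow2,if=virtio",   # boot disk (assumption)
]
subprocess.run(qemu_cmd, check=True)
```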
Basic Architecture
Repeat for additional
VMs
 pollers spread across
available cores
[Diagram: each additional VM gets its own vhost-scsi poller and bdev-nvme poller pair, spread across logical cores 0 and 1, all backed by the same vhost-scsi controller, nvme bdev, and NVMe SSD.]
Accelerating Virtual Machine Access with the Storage Performance Development Kit (SPDK) – Vhost Deep Dive
Blobstore Design – Design Goals
• Minimalistic for targeted storage
use cases like Logical Volumes
and RocksDB
• Deliver only the basics to enable
another class of application
• Design for fast storage media
Blobstore Design – High Level
Application interacts with chunks of data called blobs
 Mutable array of pages of data, accessible via ID
Asynchronous
 No blocking, queuing or waiting
Fully parallel
 No locks in IO path
Atomic metadata operations
 Depends on SSD atomicity (i.e. NVMe)
 1+ 4KB metadata pages per blob
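To make the access model concrete, here is a conceptual Python model of those ideas: blobs as mutable arrays of pages addressed by ID, with every operation completing through a callback instead of blocking. The real Blobstore API is C and persists metadata to the SSD; names and the page size here are illustrative.

```python
# Conceptual model only: blobs are mutable arrays of pages addressed by ID,
# and all operations complete via callbacks (no blocking, queuing, or waiting).
from typing import Callable, Dict, List

PAGE_SIZE = 4096   # illustrative; matches the 4KB metadata page granularity above

class Blob:
    def __init__(self, blob_id: int) -> None:
        self.blob_id = blob_id
        self.pages: List[bytearray] = []   # mutable array of pages

class Blobstore:
    def __init__(self) -> None:
        self._blobs: Dict[int, Blob] = {}
        self._next_id = 1

    def create_blob(self, cb: Callable[[int], None]) -> None:
        """Create a blob and hand its ID to the completion callback."""
        blob_id, self._next_id = self._next_id, self._next_id + 1
        self._blobs[blob_id] = Blob(blob_id)
        cb(blob_id)

    def write_page(self, blob_id: int, page_idx: int, data: bytes,
                   cb: Callable[[], None]) -> None:
        """Write one page of a blob, then signal completion through the callback."""
        blob = self._blobs[blob_id]
        while len(blob.pages) <= page_idx:
            blob.pages.append(bytearray(PAGE_SIZE))
        blob.pages[page_idx][:len(data)] = data
        cb()

# Usage: callbacks fire when each operation completes.
bs = Blobstore()
bs.create_blob(lambda bid: bs.write_page(bid, 0, b"hello", lambda: None))
```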
Logical Volumes
Blobstore plus:
 UUID xattr for lvolstore, lvols
 Friendly names
– lvol name unique within lvolstore
– lvolstore name unique within application
 Future
– snapshots (requires blobstore support)
[Diagram: an nvme bdev on an NVMe SSD hosts a blobstore; the lvolstore built on that blobstore exposes multiple lvol bdevs.]
Asynchronous Polling
Poller execution
 Reactor on each core
 Iterates through pollers
round-robin
 vhost-scsi poller
– poll for new I/O requests
– submit to NVMe SSD
 bdev-nvme poller
– poll for I/O completions
– complete to guest VM
[Diagram: the same two-core layout as before: each core's vhost-scsi pollers watch the virtqueues while its bdev-nvme pollers watch the NVMe queue pairs, serving multiple VMs against the same vhost-scsi controller and NVMe SSD.]
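A minimal sketch of that execution model, assuming nothing beyond what the slide describes: one reactor per logical core walks its registered pollers round-robin, so request pickup and completion handling never block or take locks. The poller bodies are placeholders for the real vhost-scsi and bdev-nvme pollers.

```python
# Minimal sketch of SPDK-style asynchronous polling: a per-core reactor that
# iterates its pollers round-robin on every pass.
from typing import Callable, List

class Reactor:
    """One reactor pinned to one logical core (pinning itself not shown)."""
    def __init__(self, core: int) -> None:
        self.core = core
        self.pollers: List[Callable[[], int]] = []

    def register_poller(self, poller: Callable[[], int]) -> None:
        self.pollers.append(poller)

    def run(self, iterations: int) -> None:
        # Round-robin over pollers; each returns how much work it found.
        for _ in range(iterations):
            for poller in self.pollers:
                poller()

def vhost_scsi_poller() -> int:
    # Poll the guest's virtqueues for new I/O requests and submit them to the bdev.
    return 0   # placeholder: number of requests picked up

def bdev_nvme_poller() -> int:
    # Poll the NVMe completion queue and complete finished I/O back to the guest.
    return 0   # placeholder: number of completions handled

reactor = Reactor(core=0)
reactor.register_poller(vhost_scsi_poller)
reactor.register_poller(bdev_nvme_poller)
reactor.run(iterations=1000)
```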