Sharing High-Performance Devices Across Multiple Virtual Machines
Preamble
• What does “sharing devices across multiple virtual machines” in our title mean?
• How is it different from virtual networking / NSX, which allow multiple virtual networks to share underlying networking hardware?
• Virtual networking works well for many standard workloads, but in the realm of extreme performance we need to deliver much closer to bare-metal performance to meet application requirements
• Application areas: Science & Research (HPC), Finance, Machine Learning & Big Data, etc.
• This talk is about achieving both extremely high performance and device sharing
Sharing High-Performance PCI Devices
1. Technical Background
2. Big Data Analytics with SPARK
3. High Performance (Technical) Computing
Direct Device Access Technologies
Accessing PCI devices with maximum performance
VM DirectPath I/O
• Allows PCI devices to be accessed directly by the guest OS
– Examples: GPUs for computation (GPGPU), ultra-low latency interconnects like InfiniBand and RDMA over Converged Ethernet (RoCE)
• Downsides: No vMotion, No Snapshots, etc.
• Full device is made available to a single VM – no sharing
• No ESXi driver required – just the standard vendor device driver
[Diagram: an application in the guest OS kernel of a virtual machine accessing the PCI device directly through DirectPath I/O on VMware ESXi]
Device Partitioning (SR-IOV)
• The PCI standard includes a specification for SR-IOV, Single Root I/O Virtualization
• A single PCI device can present as multiple logical devices (Virtual Functions, or VFs) to ESXi and to VMs
• Downsides: No vMotion, No Snapshots (but note: pvRDMA feature in ESXi 6.5)
• An ESXi driver and a guest driver are required for SR-IOV
• Mellanox Technologies supports ESXi SR-IOV for both InfiniBand and RoCE interconnects
[Diagram: a virtual machine (application, guest OS kernel) attached to an SR-IOV Virtual Function (VF); the figure also labels the Physical Function (PF), the vSwitch, and VMXNET3. A minimal guest-side sketch follows below.]
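A minimal guest-side sketch (not part of the original deck, and assuming the Mellanox OFED / libibverbs stack is installed in the guest): once an SR-IOV VF is passed into the VM it enumerates as an ordinary RDMA device, and can be listed like this. Build with "gcc list_devices.c -libverbs" (the file name is illustrative).

/* List the RDMA devices visible to this guest OS. With SR-IOV, the
 * VF shows up here just as a physical HCA would on bare metal. */
#include <stdio.h>
#include <infiniband/verbs.h>

int main(void)
{
    int num = 0;
    struct ibv_device **devs = ibv_get_device_list(&num);
    if (!devs) {
        perror("ibv_get_device_list");
        return 1;
    }
    for (int i = 0; i < num; i++)
        printf("RDMA device: %s\n", ibv_get_device_name(devs[i]));
    ibv_free_device_list(devs);
    return 0;
}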
Enabling Technology: Full-featured Virtual Functions
• Device capabilities
– Multi-queue NIC
– RDMA (user / kernel)
– User-space networking
• All capabilities may be used concurrently
– For example, kernel NIC together with RDMA applications
• Scales to any number of applications or vCPUs
– Millions of hardware queues
– Multiple interrupt vectors
– Thousands of application address spaces
[Diagram: the hardware exposes a PF and multiple VFs; the guest runs a kernel VF driver (serving TCP/IP, the RDMA core, and RDMA storage) alongside a user-space VF driver used directly by an RDMA application, while the hypervisor retains the PF]
Remote Direct Memory Access (RDMA)
• A hardware transport protocol
– Optimized for moving data to/from memory
• Extreme performance
– 600ns application-to-application latencies
– 100Gbps throughput
– Negligible CPU overheads
• RDMA applications
– Storage (iSER, NFS-RDMA, NVMe-oF, Lustre)
– HPC (MPI, SHMEM)
– Big data and analytics (Hadoop, Spark)
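For orientation, here is a hedged sketch (not from the deck; it assumes libibverbs and at least one RDMA device are present) of the resources every RDMA application allocates before moving any data: a device context, a protection domain, a completion queue, and a reliably connected queue pair.

/* Allocate the basic verbs resources an RDMA application builds on.
 * Error handling is abbreviated for brevity. */
#include <stdio.h>
#include <infiniband/verbs.h>

int main(void)
{
    int num = 0;
    struct ibv_device **devs = ibv_get_device_list(&num);
    if (!devs || num == 0) {
        fprintf(stderr, "no RDMA devices found\n");
        return 1;
    }

    struct ibv_context *ctx = ibv_open_device(devs[0]);        /* first device / VF */
    if (!ctx) {
        perror("ibv_open_device");
        return 1;
    }
    struct ibv_pd *pd = ibv_alloc_pd(ctx);                     /* protection domain */
    struct ibv_cq *cq = ibv_create_cq(ctx, 16, NULL, NULL, 0); /* completion queue */

    struct ibv_qp_init_attr attr = {
        .send_cq = cq,
        .recv_cq = cq,
        .cap     = { .max_send_wr = 16, .max_recv_wr = 16,
                     .max_send_sge = 1, .max_recv_sge = 1 },
        .qp_type = IBV_QPT_RC,                                 /* reliable connection */
    };
    struct ibv_qp *qp = ibv_create_qp(pd, &attr);
    if (!qp) {
        perror("ibv_create_qp");
        return 1;
    }
    printf("created QP %u on %s\n", qp->qp_num, ibv_get_device_name(devs[0]));

    ibv_destroy_qp(qp);
    ibv_destroy_cq(cq);
    ibv_dealloc_pd(pd);
    ibv_close_device(ctx);
    ibv_free_device_list(devs);
    return 0;
}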
How does RDMA achieve high performance?
• Traditional network stack challenges
– Per message / packet / byte overheads
– User-kernel crossings
– Memory copies
• RDMA provides in hardware:
– Isolation between applications
– Transport
• Packetizing messages
• Reliable delivery
– Address translation
• User-level networking
– Direct hardware access for data path
[Diagram: user-space applications (AppA, AppB) and kernel consumers (NVMe-oF, iSER) each own buffers that the RDMA-capable hardware reads and writes directly, bypassing the kernel on the data path. A data-path sketch follows below.]
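A hedged sketch of that data path (not from the deck): the application registers a buffer once, then posts a one-sided RDMA WRITE and polls for its completion entirely from user space, with no system call or data copy. Connection establishment (for example via librdmacm) and the out-of-band exchange of the peer's remote_addr and rkey are assumed to have happened already; the function name rdma_write_once is illustrative.

/* Register a user buffer, post a one-sided RDMA WRITE, and poll the
 * completion queue -- all from user space. */
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <infiniband/verbs.h>

int rdma_write_once(struct ibv_pd *pd, struct ibv_qp *qp,
                    void *buf, size_t len,
                    uint64_t remote_addr, uint32_t rkey)
{
    /* Pin and register the buffer so the NIC can DMA it and perform
     * address translation in hardware. */
    struct ibv_mr *mr = ibv_reg_mr(pd, buf, len, IBV_ACCESS_LOCAL_WRITE);
    if (!mr)
        return -1;

    struct ibv_sge sge = {
        .addr   = (uintptr_t)buf,
        .length = (uint32_t)len,
        .lkey   = mr->lkey,
    };
    struct ibv_send_wr wr, *bad_wr = NULL;
    memset(&wr, 0, sizeof(wr));
    wr.opcode     = IBV_WR_RDMA_WRITE;   /* one-sided: remote CPU not involved */
    wr.sg_list    = &sge;
    wr.num_sge    = 1;
    wr.send_flags = IBV_SEND_SIGNALED;   /* ask for a completion entry */
    wr.wr.rdma.remote_addr = remote_addr;
    wr.wr.rdma.rkey        = rkey;

    /* Hand the work request directly to the hardware send queue. */
    if (ibv_post_send(qp, &wr, &bad_wr)) {
        ibv_dereg_mr(mr);
        return -1;
    }

    /* Poll the completion queue in user space for the result. */
    struct ibv_wc wc;
    while (ibv_poll_cq(qp->send_cq, 1, &wc) == 0)
        ;  /* busy-wait for brevity */

    ibv_dereg_mr(mr);  /* real applications cache registrations instead */
    return (wc.status == IBV_WC_SUCCESS) ? 0 : -1;
}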
Test Setup – SR-IOV
16 ESXi 6.5 hosts, one Spark VM per host; 1 server used as the Name Node
• Mellanox SN2700 Spectrum Switch
• Each host:
– OS: ESXi 6.5; Memory: 256GB RAM; CPU: dual-socket Intel(R) Xeon(R) CPU E5-2697 v3 @ 2.60GHz (14 cores per socket)
– Network: Mellanox ConnectX-4 100Gb (SR-IOV RoCE capable); Hypervisor driver: MLNX-NATIVE-ESX-ConnectX-4-5_4.16.10.2-10EM-650.0.0.459867; Firmware: 12.20.10.10
– 1 VM with 20 vCPUs and 220GB memory (NOTE: running multiple NUMA-aligned VMs will generally yield higher performance – sometimes better than bare metal)
– Guest OS: Linux CentOS 7.3 x64
– VM network: SR-IOV RoCE VF; VM driver: MLNX OFED 4.1-1.0.2.0
– Benchmark: Spark HiBench/Terasort
– Spark version: https://github.com/Mellanox/SparkRDMA/; Java version 1.8.0-131
• Name Node server:
– OS: Linux CentOS 7.3 x64; Memory: 256GB RAM; CPU: dual-socket Intel(R) Xeon(R) CPU E5-2697 v3 @ 2.60GHz (14 cores per socket)
– Network: Mellanox ConnectX-4 100Gb; Driver: Mellanox OFED 4.1-1.0.2.0; Firmware: 12.20.10.10
• For detailed configuration of Spark on vSphere, see:
– Fast Virtualized Hadoop and Spark on All-Flash Disks – Best Practices for Optimizing Virtualized Big Data Applications on VMware vSphere 6.5: https://blogs.vmware.com/performance/2017/08/big-data-vsphere65-perf.html
SPARK Test Results – vSphere with SR-IOV
Runtime samples    SR-IOV TCP (sec)    SR-IOV RDMA (sec)    Improvement
Average            222 (1.05x)         171 (1.01x)          23%
Min                213 (1.07x)         165 (1.05x)          23%
Max                233 (1.05x)         174 (1.0x)           25%
[Chart: TCP vs. RDMA runtime in seconds (lower is better) for the Average, Min, and Max samples above, comparing SR-IOV TCP against SR-IOV RDMA. An arithmetic check of the Improvement column follows below.]
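The Improvement column follows from the two runtime columns as (TCP - RDMA) / TCP; the small check below (not part of the deck) reproduces the 23% / 23% / 25% figures from the table.

/* Quick arithmetic check of the "Improvement" column above. */
#include <stdio.h>

int main(void)
{
    const char *label[] = { "Average", "Min", "Max" };
    double tcp[]  = { 222.0, 213.0, 233.0 };   /* SR-IOV TCP, seconds */
    double rdma[] = { 171.0, 165.0, 174.0 };   /* SR-IOV RDMA, seconds */

    for (int i = 0; i < 3; i++)
        printf("%-7s %.0f%% faster with RDMA\n",
               label[i], 100.0 * (tcp[i] - rdma[i]) / tcp[i]);
    return 0;
}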
High Performance Computing
Research, Science, and Engineering applications on vSphere
MPI Workloads
MPI (Message Passing Interface) – a minimal example appears after the list below
Examples:
• Weather forecasting
• Molecular modelling
• Jet engine design
• Spaceship, airplane & automobile design
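All of these workloads are built on MPI. As a point of reference (this sketch is not from the deck), a minimal MPI program looks like the following; it is typically built with mpicc, launched with mpirun, and runs unchanged on bare metal or inside SR-IOV-backed VMs.

/* Minimal MPI sketch: every rank sends its rank number to rank 0,
 * which prints the greetings. Run with e.g. "mpirun -np 4 ./a.out". */
#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    if (rank == 0) {
        printf("rank 0 of %d collecting greetings\n", size);
        for (int src = 1; src < size; src++) {
            int val;
            MPI_Recv(&val, 1, MPI_INT, src, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
            printf("received %d from rank %d\n", val, src);
        }
    } else {
        MPI_Send(&rank, 1, MPI_INT, 0, 0, MPI_COMM_WORLD);
    }

    MPI_Finalize();
    return 0;
}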
InfiniBand SR-IOV MPI Example
• SR-IOV InfiniBand
• All VMs: #vCPU = #cores
• 100% CPU overcommit
• No memory overcommit
[Diagram: three ESXi hosts, each hosting two VMs with SR-IOV InfiniBand (IB) Virtual Functions; the six VMs form two virtual clusters, Cluster 1 and Cluster 2, with one VM from each cluster on every host]
InfiniBand SR-IOV MPI Performance Test
Application: NAMD
Benchmark: STMV
20-vCPU VMs for all tests
60 MPI processes per job
[Diagram: the same two SR-IOV InfiniBand virtual clusters on three ESXi hosts as above, shown alongside three bare-metal Linux hosts]
[Chart: run time in seconds – Bare metal: 93.4; One vCluster: 98.5; Two vClusters running concurrently: 169.3 each; a roughly 10% difference is called out on the chart]
Summary
• Virtualization can support high-performance device sharing for cases in which extreme performance is a critical requirement
• Virtualization supports SR-IOV device sharing and delivers near bare-metal performance
– High Performance Computing
– Big Data SPARK Analytics
• The VMware platform and partner ecosystem address the extreme performance needs of the most demanding emerging workloads
Editor's Notes
  • #10: RDMA -> Remote Direct Memory Access
  • #11: RDMA Communication Model