DOAG 2020 │ ©2020 VMware, Inc.
ESXi Performance
Principles
DOAG Edition
Valentin Bondzio
Sr. Staff TSE / GSS Premier Services
2020-01-23
DOAG 2020 NOON2NOON │ ©2020 VMware, Inc.
Brief Intro
@VMware since 2009
Global Support Services / Premier Services
Focus on Resource Management, Performance and Windows Internals
Originally from Berlin, living in Ireland since 2007
And most importantly …
Not an Oracle expert!
Agenda
• CPU Scheduling and Usage Accounting: The “basics”
• “Power Management”: The Good, the Better and the Ugly
• ESXi Memory Management: More “basics”
• Local resource distribution: What else is running on ESXi
• CPU Topology Abstraction: CPU Socket != NUMA node
• +I/O stuff
• +vMotion
• +Backup
CPU Scheduler Overview
Resource guarantees and weighting (shares) on a per-VM or “Resource Pool” level
CPU Scheduler Overview
What does the scheduler do?
Dispatch VMs (their “worlds”) to honor CPU settings (Local)
• For fairness: select the VM with the least (consumed CPU time / fair share)
• For priority: run VMs with latency sensitivity set to High before anyone else
[Diagram: vCPUs queued for an HT / core, plus an IO event]
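The two dispatch rules above can be illustrated with a toy proportional-share loop. This is a minimal sketch under stated assumptions, not ESXi's scheduler: the `VM` class, `pick_next()` and `simulate()` are hypothetical names, shares are plain integers, and reservations, limits, co-scheduling and the real latency-sensitivity implementation are all ignored.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class VM:
    name: str
    shares: int                      # relative weight (fair share), hypothetical units
    consumed_ms: float = 0.0         # CPU time consumed so far
    latency_sensitive: bool = False  # stands in for "latency sensitivity = High"


def pick_next(ready: List[VM]) -> Optional[VM]:
    """Pick the next VM to dispatch from the ready VMs (toy model)."""
    if not ready:
        return None
    # Priority: latency-sensitive VMs run before anyone else.
    urgent = [vm for vm in ready if vm.latency_sensitive]
    candidates = urgent or ready
    # Fairness: the lowest consumed-time / shares ratio is furthest behind its entitlement.
    return min(candidates, key=lambda vm: vm.consumed_ms / vm.shares)


def simulate(vms: List[VM], slices: int, slice_ms: float = 10.0) -> Dict[str, float]:
    """Dispatch `slices` fixed time slices and return the CPU time per VM."""
    for _ in range(slices):
        vm = pick_next(vms)
        vm.consumed_ms += slice_ms
    return {vm.name: vm.consumed_ms for vm in vms}
```

With shares of 2000 and 1000 and no other constraints, `simulate([VM("a", 2000), VM("b", 1000)], 30)` hands out 200 ms and 100 ms, i.e. CPU time in proportion to the shares.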
CPU Scheduler Overview
What does the scheduler do?
Place the worlds / threads on physical CPUs (Global)
• To balance load across physical execution contexts (PCPUs)
• To preserve cache state and minimize migration cost
• To avoid contention from hardware (HT, LLC, etc.) and from sibling vCPUs (of the same VM)
• To keep VMs or threads that communicate frequently close to each other
[Diagram: VMs placed on eight cores with two hyper-threads each (HT 0 / HT 1), grouped under last-level caches (LLC)]
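Two of the placement goals above, avoiding HT contention and keeping sibling vCPUs of the same VM apart, can be sketched as a simple preference order. This is an illustrative assumption, not the ESXi placement algorithm: the topology dictionary, `_best()` and `place()` are hypothetical, and load balancing, LLC affinity, migration cost and communication affinity are left out.

```python
from typing import Dict, FrozenSet, List, Optional, Tuple

# Toy topology: core id -> busy flags for [HT 0, HT 1]
Topology = Dict[int, List[bool]]


def _best(topology: Topology, skip: FrozenSet[int]) -> Optional[Tuple[int, int]]:
    """Prefer a fully idle core; otherwise take a free HT next to a busy sibling."""
    idle_core = None
    idle_ht = None
    for core, hts in sorted(topology.items()):
        if core in skip:
            continue
        if not any(hts) and idle_core is None:
            idle_core = (core, 0)          # both HTs idle: no HT contention
        elif idle_ht is None:
            for ht, busy in enumerate(hts):
                if not busy:
                    idle_ht = (core, ht)   # shares the core with a busy HT
                    break
    return idle_core or idle_ht


def place(topology: Topology,
          sibling_cores: FrozenSet[int] = frozenset()) -> Optional[Tuple[int, int]]:
    """Pick (core, ht) for a world, first avoiding cores that already run
    sibling vCPUs of the same VM, then falling back to any core.
    Returns None when everything is busy (the world stays ready)."""
    return _best(topology, sibling_cores) or _best(topology, frozenset())
```

On a host where core 0 has a busy HT 0 and core 1 is idle, the sketch picks core 1; if core 1 already runs a sibling vCPU, it falls back to the free HT on core 0.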
CPU Scheduler Overview
How does that look?
10:10:29am up 2 days 48 min, 674 worlds, 1 VMs, 2 vCPUs; CPU load average: 0.02, 0.01, 0.01
PCPU USED(%): 0.3 0.1 0.0 0.3 0.2 0.1 0.0 0.0 0.0 0.2 50 50 4.1 0.1 0.1 0.0 0.0 0.0 0.1 0.0 0.0 0.1 0.0 0.0 AVG: 4.4
PCPU UTIL(%): 0.5 0.1 0.1 0.6 0.2 0.2 0.0 0.2 0.0 0.3 100 100 4.2 0.2 0.1 0.1 0.0 0.0 0.1 0.0 0.0 0.2 0.1 0.1 AVG: 8.6
CORE UTIL(%): 0.6 0.7 0.4 0.9 0.3 100 4.3 0.2 0.0 0.1 0.4 0.7 AVG: 9.1
ID GID NAME NWLD %USED %RUN %SYS %WAIT %VMWAIT %RDY %IDLE %OVRLP
96337 148153 vmx 1 0.02 0.01 0.02 61.82 - 37.86 0.00 0.00
96339 148153 NetWorld-VM-96338 1 0.00 0.00 0.00 99.68 - 0.00 0.00 0.00
96340 148153 NUMASchedRemapEpochInitial 1 0.00 0.00 0.00 99.68 - 0.00 0.00 0.00
96341 148153 vmast.96338 1 0.03 0.05 0.00 99.63 - 0.00 0.00 0.00
96343 148153 vmx-vthread-6 1 0.00 0.00 0.00 99.68 - 0.00 0.00 0.00
96344 148153 vmx-mks:Debian86 1 0.00 0.00 0.00 61.55 - 38.13 0.00 0.00
96345 148153 vmx-svga:Debian86 1 0.00 0.00 0.00 99.68 - 0.00 0.00 0.00
96346 148153 vmx-vcpu-0:Debian86 1 62.35 99.68 0.00 0.00 0.00 0.00 0.00 0.05
96348 148153 vmx-vcpu-1:Debian86 1 62.36 99.67 0.00 0.00 0.00 0.01 0.00 0.05
96347 148153 PVSCSI-96338:0 1 0.00 0.00 0.00 99.68 - 0.00 0.00 0.00
96350 148153 vmx-vthread-7:Debian86 1 0.00 0.00 0.00 99.68 - 0.00 0.00 0.00
CPU Usage Accounting
What states are there?
• Running
• Not running: Idle (descheduled) or Ready
CPU Usage Accounting
In an ideal world
States: Idle (descheduled) | Running | Ready
[Diagram: all Running time is charged as Usage]
CPU Usage Accounting
What is charged against the VM
[Diagram: the Idle (descheduled), Running and Ready states, annotated with Usage, Overlap, HT busy, Frequency, … (“stolen time”), and with the esxtop counters %SYS, %VMWAIT, %WAIT, %CSTP, %RDY and %MLMT]
CPU Usage Accounting
Stolen time, aka “%LAT_C”
%LAT_C captures the gap between “ideal” execution (demand) and “current” execution.
• “Ideal”: unlimited dedicated cores running at nominal processor frequency
[Diagram: Demand under Ideal vs. Current execution; the shortfall is %LAT_C]
Sources of compute latency:
• VM resource contention: check %RDY and %CSTP
• Power management (P-States): frequency throttling
• Hardware contention: the sibling HT is in use
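A back-of-the-envelope way to see how the three sources above add up is to convert what actually ran into “dedicated core at nominal frequency” equivalents and compare that with the demand. This is a toy model under stated assumptions, not how esxtop computes %LAT_C: `lat_c_pct()` and its parameters are hypothetical, and ready time, frequency and HT sharing are treated as simple multiplicative factors.

```python
def lat_c_pct(demand_ms: float, run_ms: float,
              freq_ratio: float = 1.0, ht_factor: float = 1.0) -> float:
    """Toy estimate of 'stolen time' as a percentage of demand.

    demand_ms:  time the vCPU wanted to run ("ideal" execution)
    run_ms:     time it actually ran (demand minus ready / co-stop time)
    freq_ratio: actual / nominal frequency while running (< 1.0 when throttled)
    ht_factor:  share of a core's throughput received (< 1.0 when the HT sibling is busy)
    """
    if demand_ms <= 0:
        return 0.0
    # Progress expressed as "dedicated core at nominal frequency" milliseconds.
    progress_ms = run_ms * freq_ratio * ht_factor
    # Turbo (freq_ratio > 1.0) can exceed demand; clamp at zero stolen time.
    return max(0.0, demand_ms - progress_ms) * 100.0 / demand_ms
```

For a vCPU that wanted 1000 ms, was ready for 100 ms and ran at 80 % of nominal frequency, the sketch reports 28 %: 100 ms of waiting plus 180 ms lost to the lower clock.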
Intel® Hyper-Threading Technology
Cores and Threads
Does enabling HT “spawn” a less capable “logical core”?
Maybe two slightly less capable “logical” cores?
[Diagram: with HT off, one “physical” core presents one “logical” core; with HT on, one “physical” core presents “logical” core0 and core1]
Intel® Hyper-Threading Technology
Individual throughput reduction, aggregated throughput increase at high load
[Diagram: two threads that each achieve 100 on their own reach ~125 combined when they share one core]
Intel® Hyper-Threading Technology on ESXi
Throughput reduction is accounted for in USED
[Diagram: two vCPUs on the two HTs of one core, each busy 100 % of the time, achieve a combined throughput of 125]
Each vCPU is charged 125 / 2 = 50 + 12.5 = 62.5
HTEfficiencyShift – Default: 2
HT is this much more efficient than no-HT:
1: 50 %
2: 25 %
3: 12.5 %
4: 6.25 %
5: 3.125 %
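The arithmetic on this slide can be written as a small function. This is a simplified model under stated assumptions: it only covers the both-HTs-busy case at nominal frequency, assumes the charge scales linearly with run time, and `ht_charged_pct()` is a hypothetical helper, not an ESXi interface; `HTEfficiencyShift` is used only as the slide describes it (HT being 100 / 2^shift percent more efficient).

```python
def ht_charged_pct(run_pct: float, sibling_busy: bool,
                   ht_efficiency_shift: int = 2) -> float:
    """Simplified USED charge for a vCPU running on one hyper-thread.

    With HTEfficiencyShift = s, a core with both HTs busy is modelled as
    (1 + 2**-s) times as productive as a single thread, and each of the
    two threads is charged half of that.
    """
    if not sibling_busy:
        return run_pct  # the vCPU has the whole core to itself
    core_throughput = 1.0 + 2.0 ** -ht_efficiency_shift  # 1.25 for the default s = 2
    return run_pct * core_throughput / 2.0
```

With the default shift of 2, a vCPU that runs 100 % of the time next to a busy sibling is charged 62.5 %. The esxtop sample shown earlier (vmx-vcpu-0 at 99.68 %RUN and 62.35 %USED) lands close to the model's 62.3.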
CPU Usage Accounting
Usage vs. Utilization
Power Management
Umbrella term for:
• P-States (BIOS options aka: Power Regulator, CPU Power Management, EIST)
• Deep C-States
Power Management refresher …
P-State = voltage / frequency operating point
C-State = idle state: C0 is running, C1-Cn power down progressively more of the core
[Diagram: frequency ladder from P0 / TB (Turbo Boost) over P1 / NF (nominal frequency) and P2 down to P13; C0 vs. C1-Cn]
C-State Transition
[Diagram: four cores in C1]
Transition (wake-up) time: ~1µs
Deep C-State Transition
[Diagram: four cores in C6]
Transition (wake-up) time: ~30µs
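To see why the difference between ~1µs and ~30µs matters, multiply the transition time by how often a core has to wake up. This is simple arithmetic under assumed inputs: `wakeup_overhead_pct()` is a hypothetical helper, the 10,000 wakeups per second are an invented, interrupt-heavy example, and real transition times vary by CPU generation and C-State.

```python
def wakeup_overhead_pct(wakeups_per_sec: float, transition_us: float) -> float:
    """Percentage of each second a core spends just transitioning back to C0.

    wakeups_per_sec * transition_us gives microseconds per second; dividing
    by 1,000,000 and multiplying by 100 turns that into a percentage, which
    simplifies to a single division by 10,000.
    """
    return wakeups_per_sec * transition_us / 10_000.0


if __name__ == "__main__":
    # Hypothetical interrupt-heavy workload: 10,000 wakeups per second.
    for state, transition_us in (("C1", 1.0), ("C6", 30.0)):
        print(f"{state}: {wakeup_overhead_pct(10_000, transition_us):.1f} % of each second")
```

At that rate, C1 costs about 1 % of each second while C6 costs about 30 %, and every single wakeup from C6 is delayed by roughly 30µs.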
Power Management Profiles (Dell)
ESXi Power Management Policy
Only affects what’s presented from the BIOS
Power Management refresher …
Who controls what? (→ allow control / ← use)
[Diagram: CPU, BIOS, ESXi and VM / guest layers. The guest only uses HLT / C1; ESXi can use HLT / C1-Cn and P-States when the BIOS allows control of deep C-States and P-States]
ESXi Power Management Policy
Only affects what’s presented from the BIOS (Dell terminology)
System Profile → "Performance Per Watt (DAPC)", "Performance Per Watt (OS)", "Performance", "Dense Configuration", "Custom"
CPU Power Management → "System DBPM (DAPC)", "OS DBPM", "Maximum Performance"
C States → "Enabled", "Disabled"
[Diagram callouts mark which of these settings govern P-States and which govern C-States]
Which BIOS policy am I running on?
[Screenshots of the ESXi power management settings: most likely “Dynamic” / very likely “Performance”]
Which BIOS policy am I running on?
Most likely “Dynamic”
4:30:58pm up 2 min, 1276 worlds, 0 VMs, 0 vCPUs; CPU load average: 0.02, 0.00, 0.00
Power Usage: 94W, Power Cap: N/A
PSTATE MHZ:
CPU %USED %UTIL %C0 %C1 %C2 %A/MPERF
0 0.3 0.7 1 23 76 50.0
1 0.0 0.0 0 0 100 50.1
2 0.1 0.2 0 6 94 50.0
3 0.0 0.0 0 0 100 50.1
4 5.2 10.4 10 5 85 50.0
5 0.0 0.0 0 5 95 51.0
6 0.0 0.1 0 3 97 50.0
7 0.0 0.0 0 0 100 50.0
8 0.1 0.4 0 16 84 50.0
9 0.0 0.0 0 0 100 50.0
10 0.0 0.0 0 0 100 50.0
(…)
Which BIOS policy am I running on?
Most likely “Performance”
4:38:51pm up 1 min, 1276 worlds, 0 VMs, 0 vCPUs; CPU load average: 0.02, 0.00, 0.00
Power Usage: 142W, Power Cap: N/A
PSTATE MHZ:
CPU %USED %UTIL %C0 %C1 %A/MPERF
0 0.0 0.1 0 100 108.3
1 0.1 0.1 0 100 108.4
2 0.1 0.1 0 100 108.3
3 0.0 0.1 0 100 108.4
4 0.0 0.0 0 100 108.3
5 18.0 16.7 17 83 108.3
6 0.0 0.1 0 100 108.4
7 0.2 0.2 0 100 108.3
8 0.0 0.0 0 100 108.3
9 0.1 0.2 0 100 108.3
10 0.0 0.1 0 100 108.3
(…)
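Comparing the two outputs above hints at how Usage relates to Utilization on an otherwise idle host: PCPU %USED roughly equals %UTIL scaled by %A/MPERF (CPU 4 under “Dynamic”: 10.4 × 50 % = 5.2; CPU 5 under “Performance”: 16.7 × 108.3 % ≈ 18.1 vs. 18.0 shown). The sketch below encodes that observation. It is an approximation drawn from these samples that ignores the HT charge; both helper names are hypothetical, and the 2400 MHz nominal frequency in the usage note is an assumption for illustration.

```python
def effective_mhz(nominal_mhz: float, aperf_mperf_pct: float) -> float:
    """Average clock while the PCPU was in C0, derived from esxtop's %A/MPERF.

    APERF counts at the actual frequency and MPERF at the nominal one,
    so their ratio scales the nominal frequency.
    """
    return nominal_mhz * aperf_mperf_pct / 100.0


def approx_used_pct(util_pct: float, aperf_mperf_pct: float) -> float:
    """Approximate PCPU %USED: busy (unhalted) time weighted by relative frequency."""
    return util_pct * aperf_mperf_pct / 100.0
```

Assuming a 2400 MHz part, 108.3 % corresponds to roughly 2600 MHz of turbo, while 50 % means the core averaged about 1200 MHz.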
Which BIOS policy am I running on?
Most likely “Custom”
5:09:53pm up 6 min, 827 worlds, 0 VMs, 0 vCPUs; CPU load average: 0.01, 0.01, 0.00
Power Usage: 107W, Power Cap: N/A
PSTATE MHZ: 2401 2400 2300 2200 2100 2000 1900 1800 1700 1600 1500 1400 1300 1200
CPU %USED %UTIL %C0 %C1 %C2 %P0 %P1 %P2 %P3 %P4 %P5 %P6 %P7 %P8 %P9 %P10 %P11 %P12 %P13 %A/MPERF
0 0.2 0.4 0 16 83 62 0 0 0 0 0 0 0 0 0 0 0 0 38 75.2
1 0.0 0.0 0 0 100 0 0 0 0 0 0 0 0 0 0 0 0 0 100 59.3
2 0.0 0.1 0 5 95 15 0 0 0 0 0 0 0 0 0 0 0 0 85 57.9
3 0.0 0.0 0 1 98 38 0 0 0 0 0 0 0 0 0 0 0 0 62 61.5
4 0.0 0.0 0 4 96 5 0 0 0 0 0 0 0 0 0 0 0 0 95 52.0
5 0.0 0.0 0 0 100 0 0 0 0 0 0 0 0 0 0 0 0 0 100 50.3
6 0.1 0.1 0 1 99 7 0 0 0 0 0 0 0 0 0 0 0 0 93 67.7
7 0.1 0.1 0 0 100 99 0 0 0 0 0 0 0 0 0 0 0 0 1 77.7
8 0.0 0.0 0 0 100 10 0 0 0 0 0 0 0 0 0 0 0 0 90 50.8
9 0.0 0.1 0 0 100 0 0 0 0 0 0 0 0 0 0 0 0 0 100 51.6
10 0.0 0.0 0 3 97 8 0 0 0 0 0 0 0 0 0 0 0 0 92 54.0
(…)
The magic of Turbo Boost
Dynamic, supported overclocking
[Diagram: frequency vs. C-State depth for four cores. One core in C0 with three in C1 reaches TB1 above P1; two cores in C0 with two in C6 reach up to TB4; one core in C0 with three in C6 reaches up to TB7. The deeper the idle cores sleep, the higher the active cores can boost]
Power Policy “playfield”
Rated on a scale from Bad over Good to Optimal*:
• BIOS “Dynamic” pre Haswell
• BIOS “Dynamic” on Haswell+
• BIOS “Maximum / High Performance”: same* as Custom BIOS + High Performance ESXi policy (with the exception of C1E)
• Custom BIOS + Custom or Balanced ESXi policy
* a few workloads fare better with more deterministic performance
Power Policy “playfield”
Custom done right!
[Charts: Custom BIOS + ESXi Balanced, compared against BIOS “Dynamic” and against BIOS “Performance”]
Power Management Trivia
Frequently Asked Questions
“Why doesn’t the frequency I see in Task Manager change?”
• Possibility 1: You are looking at the brand string
• Possibility 2: You are looking in the right place (but the guest OS has no way of knowing)
• The base frequency should come from:
  CPUID.(EAX=16h):EAX[15-00]
  – But it seems Windows gets it from SMBIOS
# grep cpuid ./WinTest.vmx
cpuid.16.eax = "----------------0100011100011000"
cpuid.coresPerSocket = "6"
cpuid.brandstring = "VMware (R) SuperSecretCPU (R) @ 18.2 GHz"
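The `cpuid.16.eax` override above can be decoded by hand: CPUID leaf 16h reports the processor base frequency in MHz in EAX[15:0], and 0100011100011000b is 0x4718 = 18200, i.e. the 18.2 GHz of the made-up brand string. The sketch below does the same in code. It assumes the .vmx mask convention of one character per bit with bit 31 leftmost and `-` meaning “not overridden”; `decode_cpuid_field()` and `encode_base_mhz()` are hypothetical helpers, not VMware tooling.

```python
def decode_cpuid_field(mask: str, lsb: int = 0, width: int = 16) -> int:
    """Decode bits [lsb + width - 1 .. lsb] from a 32-character cpuid mask.

    The mask is written MSB first (bit 31 leftmost); '-' marks bits that are
    not overridden, so they cannot be decoded.
    """
    if len(mask) != 32:
        raise ValueError("expected 32 characters, one per bit")
    lsb_first = mask[::-1]                          # index 0 is now bit 0
    field = lsb_first[lsb:lsb + width][::-1]        # back to MSB-first order
    if any(bit not in "01" for bit in field):
        raise ValueError("field contains bits that are not overridden ('-')")
    return int(field, 2)


def encode_base_mhz(mhz: int) -> str:
    """Build a mask that overrides only EAX[15:0] with a base frequency in MHz."""
    if not 0 <= mhz < 1 << 16:
        raise ValueError("base frequency must fit in 16 bits")
    return "-" * 16 + format(mhz, "016b")
```

Decoding the value from WinTest.vmx yields 18200 (MHz), and `encode_base_mhz(18200)` reproduces the same mask.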
Power Management Trivia
Frequently Asked Questions
“I turned off all C-States, why is it still showing C1 in esxtop?”
• You can’t turn off C1; you can only disable the different levels of deep C-States (C2+)
4:38:51pm up 1 min, 1276 worlds, 0 VMs, 0 vCPUs; CPU load average: 0.02, 0.00, 0.00
Power Usage: 142W, Power Cap: N/A
PSTATE MHZ:
CPU %USED %UTIL %C0 %C1 %A/MPERF
0 0.0 0.1 0 100 108.3
1 0.1 0.1 0 100 108.4
2 0.1 0.1 0 100 108.3
3 0.0 0.1 0 100 108.4
4 0.0 0.0 0 100 108.3
5 18.0 16.7 17 83 108.3
6 0.0 0.1 0 100 108.4
7 0.2 0.2 0 100 108.3
8 0.0 0.0 0 100 108.3
(…)
Power Management Trivia
Frequently Asked Questions
“I won’t have any issues if I have everything set to High Performance in the BIOS, right?”
• No issues, except possibly:
  – PSU redundancy issues
  – Power capping
  – Temperature
  – Firmware bugs
• And definitely …
  – No ability to control P-/deep C-States
  – No maximum Turbo Boost frequencies …
http://www.intel.com/content/dam/www/public/us/en/documents/specification-updates/xeon-e5-v3-spec-update.pdf
Power Management Trivia
Frequently Asked Questions
“I can clearly see C2 in perfmon on Windows, why are you lying to me?”
• This is either a perfmon bug or a choice to represent an “enlightened” idle feature
  – “Intelligent Timer Tick Distribution (ITTD)”
  – needs Windows Server 2012 R2 / virtual hardware version 11
  – disable via “monitor.disable_guest_idle_msr = true”
• You really shouldn’t ever have to …
What runs where and when
The high-level picture
[Diagram: layers from the CPU up through the VMkernel (VMK) and VMM to the guest OS / APPs]
What runs where and when
Mostly direct execution: the guest OS / APPs run directly on the CPU
What runs where and when
Mostly direct execution: the PCPU executes the vCPU's guest code
(…)
0xffffffff810a99d0 <+416>: test %eax,%eax
0xffffffff810a99d2 <+418>: je 0xffffffff810a9932 <cpu_startup_entry+258>
0xffffffff810a99d8 <+424>: callq 0xffffffff810c6ed0 <rcu_irq_enter>
0xffffffff810a99dd <+429>: mov 0x82740c(%rip),%r13
0xffffffff810a99e4 <+436>: test %r13,%r13
0xffffffff810a99e7 <+439>: je 0xffffffff810a9a07 <cpu_startup_entry+471>
0xffffffff810a99e9 <+441>: mov 0x0(%r13),%rax
0xffffffff810a99ed <+445>: nopl (%rax)
0xffffffff810a99f0 <+448>: mov 0x8(%r13),%rdi
0xffffffff810a99f4 <+452>: add $0x10,%r13
0xffffffff810a99f8 <+456>: xor %esi,%esi
0xffffffff810a99fa <+458>: mov %ebp,%edx
0xffffffff810a99fc <+460>: callq *%rax
0xffffffff810a99fe <+462>: mov 0x0(%r13),%rax
0xffffffff810a9a02 <+466>: test %rax,%rax
0xffffffff810a9a05 <+469>: jne 0xffffffff810a99f0 <cpu_startup_entry+448>
0xffffffff810a9a07 <+471>: callq 0xffffffff810c6e40 <rcu_irq_exit>
0xffffffff810a9a0c <+476>: jmpq 0xffffffff810a9932 <cpu_startup_entry+258>
0xffffffff810a9a11 <+481>: nopl 0x0(%rax)
0xffffffff810a9a18 <+488>: mov %gs:0xa0e4,%eax
0xffffffff810a9a20 <+496>: mov %eax,%eax
0xffffffff810a9a22 <+498>: bt %rax,(%rbx)
(…)
What runs where and when
What about idle?
The vCPU reaches the guest idle routine:
(…)
0xffffffff81052c20 <+0>: sti
0xffffffff81052c21 <+1>: hlt
*loud screeching sound*
What runs where and when
The VMM traps the privileged instruction and, together with the VMK, puts the vCPU to “sleep”
(…)
0xffffffff81052c20 <+0>: sti
0xffffffff81052c21 <+1>: hlt
*tells VMK to deschedule*
What runs where and when
The scheduler (VMK) decides what to run next
What runs where and when
E.g. another vCPU / world that is ready to run
What runs where and when
ESXi’s _own_ idle thread
CPU
C1-Cn
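The idle sequence on these slides (the guest executes HLT, the VMM traps it, the VMK deschedules the vCPU, and the scheduler runs another ready world or ESXi’s own idle thread) can be sketched as a toy model. This is a minimal sketch under simplifying assumptions: `ToyPCPU`, `IDLE` and the instruction strings are hypothetical names, not ESXi internals, and waking a halted vCPU again (e.g. on an interrupt) is not modeled.

```python
from collections import deque

IDLE = "ESXi idle thread (C1-Cn)"  # hypothetical label for ESXi's own idle world


class ToyPCPU:
    """Toy model of one physical CPU; not ESXi's real scheduler."""

    def __init__(self, ready_worlds):
        self.ready = deque(ready_worlds)  # worlds (e.g. vCPUs) that are ready to run
        self.halted = []                  # vCPUs "sleeping" after HLT
        self.current = None

    def schedule(self):
        """Pick the next ready world, else fall back to ESXi's idle thread."""
        self.current = self.ready.popleft() if self.ready else IDLE
        return self.current

    def guest_executes(self, instruction):
        """Most guest instructions run directly; HLT traps to the VMM."""
        if instruction == "hlt":
            # The VMM traps the privileged HLT and asks the VMK to deschedule the vCPU.
            # Waking it up again (e.g. on an interrupt) is not modeled here.
            self.halted.append(self.current)
            return self.schedule()
        return self.current  # direct execution: the same world keeps the pCPU


pcpu = ToyPCPU(["vm1.vcpu0", "vm2.vcpu0"])
pcpu.schedule()
print(pcpu.guest_executes("test"))  # vm1.vcpu0 keeps running
print(pcpu.guest_executes("hlt"))   # vm2.vcpu0 gets the pCPU
print(pcpu.guest_executes("hlt"))   # nothing ready: ESXi's own idle thread
```

The point of the model is that a halted vCPU does not burn a pCPU: the slot goes to another ready world first, and only when nothing is ready does ESXi’s idle thread run.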
Memory Management Overview
Goals and Objectives
• Manage host physical memory to abstract physical memory away from the guest.
• Allow memory over-commitment to provide an illusion of virtual DRAM to the guest.
• Hide transient host memory pressure from applications.
[Figure: ESXi sits between guest memory and host physical memory]
Virtual Memory
From the process’s point of view, it provides:
• Contiguous address space
• Isolation / Security
Virtual memory is an abstraction:
• It provides the possibility to overcommit …
The process is unaware of what is backing the virtual address:
• Physical Memory
• Swap File
[Figure: processes 0 … n, each with its own 256 TB virtual address space, mapped (“Magic”) onto backing stores labeled 64 TB and 256 TB]
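The virtual memory abstraction can be sketched as a toy page table. This is a minimal sketch, assuming 4 KB pages and 48-bit virtual addresses (x86-64 with 4-level paging, which would account for the 256 TB per process: 2^48 bytes); `ToyAddressSpace` and its eviction policy are hypothetical simplifications, not how any real OS manages memory. The caller only ever uses virtual addresses and never learns whether a page was in RAM or in swap.

```python
PAGE_SIZE = 4096
VA_BITS = 48  # assumption: x86-64 with 4-level paging


class ToyAddressSpace:
    """Toy per-process virtual memory: 'ram' frames backed by a 'swap' dict."""

    def __init__(self, ram_frames):
        self.free_frames = list(range(ram_frames))
        self.page_table = {}  # virtual page number -> ("ram", frame) or ("swap", slot)
        self.ram = {}         # frame -> page contents
        self.swap = {}        # slot -> page contents
        self.next_slot = 0

    def _evict_one(self):
        # Simplistic policy: push the first RAM-resident page (in table order) to swap.
        for vpn, (kind, loc) in self.page_table.items():
            if kind == "ram":
                self.swap[self.next_slot] = self.ram.pop(loc, None)
                self.page_table[vpn] = ("swap", self.next_slot)
                self.next_slot += 1
                self.free_frames.append(loc)
                return

    def _frame_for(self, vaddr):
        vpn = vaddr // PAGE_SIZE
        entry = self.page_table.get(vpn)
        if entry and entry[0] == "ram":
            return entry[1]
        if not self.free_frames:
            self._evict_one()
        frame = self.free_frames.pop()
        # A swapped-out page is read back in; a brand-new page starts empty.
        self.ram[frame] = self.swap.pop(entry[1]) if entry else None
        self.page_table[vpn] = ("ram", frame)
        return frame

    def write(self, vaddr, value):
        self.ram[self._frame_for(vaddr)] = value

    def read(self, vaddr):
        return self.ram[self._frame_for(vaddr)]


print(2**VA_BITS // 2**40, "TB of virtual address space per process")
aspace = ToyAddressSpace(ram_frames=2)  # only two physical frames
for i in range(3):                      # three pages force one out to swap
    aspace.write(i * PAGE_SIZE, f"page {i}")
print([aspace.read(i * PAGE_SIZE) for i in range(3)])
```

Three pages fit into two frames only because one of them is always in “swap”; every read still returns the right contents, which is the overcommit the slide describes.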
Virtual Physical Memory
Abstraction …
From the VM’s point of view, it provides:
• Contiguous address space
• Isolation / Security
Virtual physical memory is an abstraction:
• It provides the possibility to overcommit …
The VM is unaware of what is backing the physical address:
• Physical Memory
• Swap File
• Or COW, ZIP, BLN (shared copy-on-write pages, compressed pages, ballooned pages)
[Figure: VMs 0 … n, each with up to 6 TB of “physical” memory, mapped (“Magic”) onto backing stores labeled 16 TB and *** TB]
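The same idea one level down: the guest OS translates its virtual pages to what it believes are physical pages, and ESXi decides what actually backs each of those. A sketch with one dictionary per level; the page numbers, `guest_page_table`, `esxi_backing`, `translate` and `guest_write` are all hypothetical, the backing kinds mirror the list on the slide, and ballooning is left out.

```python
# Level 1 (owned by the guest OS): guest virtual page -> guest "physical" page
guest_page_table = {0x10: 0x2, 0x11: 0x3, 0x12: 0x4, 0x13: 0x5}

# Level 2 (owned by ESXi): guest physical page -> what really backs it
esxi_backing = {
    0x2: ("host RAM", 0x9A),
    0x3: ("COW", 0x51),   # shared page, copy-on-write
    0x4: ("ZIP", 0x3),    # compressed page, slot in a compression cache
    0x5: ("swap", 0x7),   # slot in the VM's host swap file
}


def translate(guest_vpn):
    """Return (guest PPN, backing kind, location) for a guest virtual page."""
    gppn = guest_page_table[guest_vpn]   # all the guest ever sees
    kind, location = esxi_backing[gppn]  # invisible to the guest
    return gppn, kind, location


def guest_write(guest_vpn, free_host_frame):
    """Writing a shared, compressed or swapped page gives it a private host frame."""
    gppn = guest_page_table[guest_vpn]
    if esxi_backing[gppn][0] != "host RAM":
        esxi_backing[gppn] = ("host RAM", free_host_frame)
    return translate(guest_vpn)


for vpn in sorted(guest_page_table):
    gppn, kind, loc = translate(vpn)
    print(f"guest VPN {vpn:#x} -> guest PPN {gppn:#x} -> {kind} {loc:#x}")
```

Note that `guest_write` changes only the second level: the guest physical page number stays the same, which is why the VM cannot tell what is backing it.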
Understanding VM memory usage on ESXi
Memory Management Overview
How to Hide Memory Pressure?
Reclaim memory from a VM if it is using more than it is entitled to.
• Entitlement depends on configuration (reservation / shares / limit).
• Techniques to reclaim memory from VMs include:
– Page sharing > Ballooning > Compression > Host swapping
– Reclamation breaks host large pages
[Figure: total memory size, split into allocated and free memory; allocated memory split into active and idle memory]
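The preference order of the reclamation techniques can be illustrated with a toy planner that takes a VM’s excess over its entitlement from the cheapest technique first. This only illustrates the order on the slide; real ESXi decisions also depend on the host’s free-memory state. `plan_reclaim` and the per-technique `reclaimable_mb` figures are hypothetical.

```python
# Slide's preference order: least intrusive technique first.
RECLAIM_ORDER = ["page sharing", "ballooning", "compression", "host swapping"]


def plan_reclaim(consumed_mb, entitlement_mb, reclaimable_mb):
    """Return ({technique: MB}, MB still missing) for a VM above its entitlement.

    reclaimable_mb is a hypothetical estimate of what each technique could
    free right now.
    """
    excess = max(0, consumed_mb - entitlement_mb)
    plan = {}
    for technique in RECLAIM_ORDER:
        if excess == 0:
            break
        take = min(excess, reclaimable_mb.get(technique, 0))
        if take > 0:
            plan[technique] = take
            excess -= take
    return plan, excess


plan, missing = plan_reclaim(
    consumed_mb=16384, entitlement_mb=12288,
    reclaimable_mb={"page sharing": 512, "ballooning": 2048,
                    "compression": 1024, "host swapping": 8192})
print(plan, missing)
```

Host swapping only ends up with whatever the cheaper techniques could not cover, which is the intent of the ordering.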
Active Memory
Not the same as guest stats! (ESXi active memory != guest-reported memory usage)
Active Memory
aka Touched
ESXi VM-level heuristic
• Weighted, moving average
• OS / VMTools independent
• “Memory Sampling”
– Un-maps 100 random pages across the VM’s entire mapped address space
– Monitors reads/writes for a minute (accesses trap to the VMM)
– After one minute, re-maps all remaining pages and starts again
[Figure: VM mapped memory as 4 KB pages; 100 × 4 KB pages sampled per minute; sampled pages that are read or written count as touched]
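The sampling scheme can be sketched as follows. Assumptions: the guest’s accesses during one period are given as a set of page numbers, and the “weighted, moving average” is modeled as an exponentially weighted moving average with a made-up weight `alpha`, since ESXi’s actual weighting is not shown on the slide. `touched_fraction` and `update_active` are hypothetical names.

```python
import random

PAGE_KB = 4
SAMPLE_SIZE = 100  # random pages un-mapped per one-minute period (from the slide)


def touched_fraction(num_pages, touched, rng, sample_size=SAMPLE_SIZE):
    """Pick random pages; return the share of them the guest read or wrote."""
    sample = rng.sample(range(num_pages), sample_size)
    return sum(page in touched for page in sample) / sample_size


def update_active(prev_kb, fraction, mapped_kb, alpha=0.25):
    """Fold one period's estimate into an exponentially weighted moving average."""
    return alpha * (fraction * mapped_kb) + (1 - alpha) * prev_kb


rng = random.Random(42)
num_pages = 1 << 18                   # 1 GB of mapped memory in 4 KB pages
mapped_kb = num_pages * PAGE_KB
touched = set(range(num_pages // 4))  # the workload touches 25% of its pages
active_kb = 0.0
for minute in range(30):
    fraction = touched_fraction(num_pages, touched, rng)
    active_kb = update_active(active_kb, fraction, mapped_kb)
print(f"estimated active: {active_kb / 1024:.0f} MB of {mapped_kb / 1024:.0f} MB")
```

With only 100 samples per period a single estimate is noisy; the moving average smooths it, at the cost of reacting to workload changes with a delay.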
Active Memory
vs. Consumed
Active Memory
What to trust?
[Chart: consumed vs. active memory]
Guest Memory Metrics
In a nutshell
Active Memory
The guest’s working set (WS) tends to lie between active and consumed
[Chart: consumed, active and guest WS]
Active Memory
The guest WS might over-report (greedy app)
[Chart: active and guest WS]
Active Memory
But the guest WS will not under-report
[Chart: consumed, active and guest WS]
Active Memory
Not the end-all of guest workload estimation
Hierarchical Resource Groups
From an ESXi perspective
The host owns all resources.
Those are distributed by hierarchical resource groups.
Consumers can demand (request) resources.
[Tree: host → system, vim, iofilters, user; next level: minfree, kernel, helper, ft, drivers, vmotion, …; next level: vmkboot, CpuSched, Init, …]
Hierarchical Resource Groups
From an ESXi perspective
vCenter shows the sum of all user resources as Total Reservation Capacity.
Global Resource Pools are then distributed back to hosts into Local RPs
• Based on VMs’ demand
[Tree: host → system, vim, iofilters, user; user → pool1, pool2, pool3, pool4, …; pools → vm.vmid, vm.vmid, vm.vmid, …]
Hierarchical Resource Groups
From an ESXi perspective
Local Resource Groups are created and incrementally numbered when clients are instantiated:
• VM starts / vMotions etc.
• Based on VMs’ demand
The local hierarchy mirrors the global one
• Check for VM / LRG siblings
VM groups have multiple leaf consumers
• vmid is local, not global
[Tree: user → pool1, pool15, pool231, pool430, vm.vmid, …; nested pools such as pool321 with their own vm.vmid children; each vm.vmid group → vmm, uw, …]
Hierarchical Resource Groups
Both Memory and CPU resources
CPU
cpu.resv        Reservation
cpu.limit       Limit
cpu.shares      Shares
cpu.resvLimit   Expandable*
Memory
mem.resv        Reservation
mem.limit       Limit
mem.shares      Shares
mem.resvLimit   Expandable*
[Tree: host → system, vim, iofilters, user; user → pool1 … pool4 → vm.vmid, …]
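How reservation, limit and shares could combine when a parent group divides its capacity among siblings can be sketched with a “water level” search: every child gets `level × shares`, clamped to at least its reservation and at most its limit, and the level is raised until the parent’s capacity is used up. This is a common textbook approximation, not ESXi’s actual entitlement algorithm (which also weighs demand and expandable reservations); `divide` and the pool numbers are hypothetical.

```python
def divide(capacity, children, rounds=100):
    """Split capacity among sibling groups.

    children maps name -> (reservation, limit or None, shares). Every child
    gets level * shares clamped to [reservation, limit]; the common level is
    found by bisection so that the allocations add up to the capacity.
    """
    if sum(resv for resv, _, _ in children.values()) > capacity:
        raise ValueError("reservations exceed the parent's capacity")

    def alloc_at(level):
        return {name: min(max(level * shares, resv),
                          float("inf") if limit is None else limit)
                for name, (resv, limit, shares) in children.items()}

    # At hi every child's level * shares is at least the whole capacity.
    lo, hi = 0.0, capacity / min(shares for _, _, shares in children.values())
    for _ in range(rounds):
        mid = (lo + hi) / 2
        if sum(alloc_at(mid).values()) > capacity:
            hi = mid
        else:
            lo = mid
    return alloc_at(lo)


pools = {"pool1": (0, None, 2000), "pool2": (0, None, 1000), "pool3": (4000, None, 1000)}
print({name: round(mhz) for name, mhz in divide(10000, pools).items()})
# {'pool1': 4000, 'pool2': 2000, 'pool3': 4000}
```

By shares alone pool3 would get 2500 MHz, but its reservation lifts it to 4000 and the rest is split 2:1 between pool1 and pool2. Applied level by level (host → user → pools → VMs), the same function would give each group a slice of its parent’s slice.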
Tools
ESXi CLI (via SSH)
sched-stats … for CPU
memstats … for Memory
esxtop … for comparison
Tools
cmdline for local groups (no VMs)
# sched-stats -t groups | awk 'NR == 1 ||
    $2 ~ /^(vm\.|pool)[0-9]+/ ||
    /^ +[0-4] / {printf ("%-10s%-12s%-9s%-6s%-6s%-6s%-9s%-6s%-9s%-9s%-10s\n",
    $1, $2, $3, $6, $8, $9, $10, $11, $12, $13, $14)}'
vmgid     name        pgid     vsmps amin  amax  minLimit units ashares  resvMHz  availMHz
0         host        0        933   1600  1600  1600     pct   4096000  5232     33168
1         system      0        659   10    -1    -1       pct   500      288      33168
2         vim         0        271   4944  -1    -1       mhz   500      4344     33768
3         iofilters   0        3     0     -1    -1       pct   1000     0        33168
4         user        0        0     0     -1    -1       pct   9000     0        33168
sched-stats
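The awk program keeps three kinds of rows: the header (`NR == 1`), groups whose name looks like `vm.<n>` or `pool<n>`, and the top-level groups with vmgid 0 to 4; it then prints selected fields with fixed widths. The same selection in Python, as a sketch for post-processing saved output: `select_groups` and `format_row` are hypothetical helpers, and nothing about the raw `sched-stats` column layout is assumed beyond the awk field numbers.

```python
import re

# Row selection of the awk program; field numbers and widths from its printf.
GROUP_NAME = re.compile(r"^(vm\.|pool)[0-9]+")  # $2 ~ /^(vm\.|pool)[0-9]+/
TOP_LEVEL = re.compile(r"^ +[0-4] ")            # /^ +[0-4] /  (vmgid 0-4)
FIELDS = [1, 2, 3, 6, 8, 9, 10, 11, 12, 13, 14]  # $1, $2, $3, $6, ... (1-based)
WIDTHS = [10, 12, 9, 6, 6, 6, 9, 6, 9, 9, 10]    # %-10s%-12s%-9s...


def select_groups(lines):
    """Return the selected fields of the header and every matching group row."""
    rows = []
    for nr, line in enumerate(lines, start=1):
        fields = line.split()  # awk's default splitting on runs of blanks
        if (nr == 1
                or (len(fields) > 1 and GROUP_NAME.match(fields[1]))
                or TOP_LEVEL.match(line)):
            # awk yields "" for fields past the end of the line
            rows.append([fields[i - 1] if i <= len(fields) else "" for i in FIELDS])
    return rows


def format_row(row):
    """Left-justify each value like printf's %-Ns."""
    return "".join(f"{value:<{width}}" for value, width in zip(row, WIDTHS))
```

With the output of `sched-stats -t groups` saved to a file, printing `format_row(row)` for each row of `select_groups(open(path).read().splitlines())` should mirror the awk output.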
Tools
cmdline for local groups (with VMs)
# memstats -r group-stats \
    -g0 -l2 \
    -s gid:name:min:max::conResv:availResv \
    -u mb \
    | sed -n '/^-\+/,/.*\n/p'
---------------------------------------------------------------------------------
 gid   name           min       max   conResv  availResv
---------------------------------------------------------------------------------
   0   host         97823     97823     28917      68907
   1   system       20024        -1     20008      68923
   2   vim              0        -1      3378      68907
   3   iofilters        0        -1        25      68907
   4   user             0        -1      5490      68907
---------------------------------------------------------------------------------
memstats
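The `sed` range prints everything from the first dashed line onwards. When that output is post-processed elsewhere, the table can be turned into records; a sketch, assuming the layout of the memstats output on this slide (header between the first two dashed lines, rows until the closing one). `parse_group_stats` is a hypothetical helper and `SAMPLE` is the slide’s output.

```python
SAMPLE = """\
---------------------------------------------------------------------------------
 gid   name           min       max   conResv  availResv
---------------------------------------------------------------------------------
   0   host         97823     97823     28917      68907
   1   system       20024        -1     20008      68923
   2   vim              0        -1      3378      68907
   3   iofilters        0        -1        25      68907
   4   user             0        -1      5490      68907
---------------------------------------------------------------------------------
"""


def parse_group_stats(text):
    """Header row sits between the first two dashed lines; data rows follow
    until the closing dashed line. Numeric cells become ints."""
    lines = text.splitlines()
    start = next(i for i, line in enumerate(lines) if line.startswith("-"))
    header = lines[start + 1].split()
    rows = []
    for line in lines[start + 3:]:
        if line.startswith("-"):
            break
        values = [int(v) if v.lstrip("-").isdigit() else v for v in line.split()]
        rows.append(dict(zip(header, values)))
    return rows


for row in parse_group_stats(SAMPLE):
    print(row)
```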
(N)UMA
+ terminology
DIMMs
Socket / Package
NUMA node
Socket != NUMA node
LLC / DIE (e.g. Intel Cluster-on-Die (CoD), Sub-NUMA Clustering (SNC), AMD Zen 1/2)
[Figure: DIMMs attached to sockets / NUMA nodes numbered 0, 1, 2]
Importance of Memory Access Latency
Jim Gray’s Storage Latency Analogy (slightly adapted)
You want to calculate a + b and the operands are in:
your head = register / 1 cycle
this room = L1-L2 / 10 cycles
this building = DRAM / 100 cycles
Finland + Algeria = Disk / 10^6 cycles
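To make the ratios of the analogy tangible, scale them so that one cycle takes one second. That scale is an assumption made only for illustration, and the cycle counts are the slide’s round numbers; `human_time` is a hypothetical helper.

```python
# Slide's round numbers: (level, place in the analogy, cycles)
LEVELS = [("register", "your head", 1),
          ("L1-L2", "this room", 10),
          ("DRAM", "this building", 100),
          ("Disk", "Finland + Algeria", 10**6)]


def human_time(seconds):
    """Render a duration in the largest fitting unit."""
    for unit, size in (("days", 86400), ("hours", 3600), ("minutes", 60)):
        if seconds >= size:
            return f"{seconds / size:.1f} {unit}"
    return f"{seconds:.0f} second{'s' if seconds != 1 else ''}"


# Illustration only: pretend one CPU cycle takes one second.
for level, place, cycles in LEVELS:
    print(f"{level:<9}({place}): {human_time(cycles)}")
```

On that scale a DRAM access takes under two minutes, while a disk access takes well over a week.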
Importance of Memory Access Latency
Numbers based on Intel i7-3770 @ 3.4 GHz
access  size     cycles  ns
L1      32 KB    4-5     1.5
L2      256 KB   12      4
L3      8 MB     30      10
DRAM    GBs      30+     66*
[Figure: cores 0–3, each with private L1 and L2 caches, sharing the L3 / Last Level Cache; IMC and QPI link the package to DRAM]
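Converting between the cycles and ns columns at the slide’s 3.4 GHz is a one-liner in each direction. This sketch only does the arithmetic and assumes a fixed clock (no turbo); the results roughly match the slide’s rounded ns column, and the 66 ns DRAM figure works out to about 224 cycles, several times an L3 hit.

```python
CLOCK_GHZ = 3.4  # i7-3770 clock from the slide: 1 cycle = 1 / 3.4 ns


def cycles_to_ns(cycles, ghz=CLOCK_GHZ):
    return cycles / ghz


def ns_to_cycles(ns, ghz=CLOCK_GHZ):
    return ns * ghz


for level, cycles in (("L1", 5), ("L2", 12), ("L3", 30)):
    print(f"{level}: {cycles:>3} cycles = {cycles_to_ns(cycles):4.1f} ns")
print(f"DRAM: 66 ns = {ns_to_cycles(66):.0f} cycles")
```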
N(UMA)
Pre-Opteron/Nehalem
All sockets share the FSB to the Northbridge, and hence its bandwidth
• The NB is also known as the “Memory Controller Hub” or MCH
Uniform memory access latency between every CPU and every DIMM
Von Neumann bottleneck, getting worse with faster CPUs / more RAM
[Figure: sockets 0–3 all connected to one Northbridge (NB)]
NUMA
Post-Opteron/Nehalem
Every NUMA node has its own Integrated Memory Controller (IMC)
• Some AMD CPUs (Bulldozer and newer) have two nodes per socket / package
Remote access has to go over the interconnect and the remote CPU’s IMC
• This adds latency, making local and remote access non-uniform
[Figure: NUMA nodes 0–3]
NUMA
2 QPI / IC
CPU /ns    0    1    2    3
0         72  291  323  294
1        296   72  293  315
2        319  296   71  296
3        290  325  300   71
Legend: local, adjacent, “routed”
[Figure: NUMA nodes 0–3]
NUMA
3 QPI / IC
CPU /ns    0    1    2    3
0        136  194  198  201
1        194  135  194  196
2        201  194  135  200
3        202  197  198  135
Legend: local, adjacent
[Figure: NUMA nodes 0–3]
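A quick way to compare the two latency tables is the remote-to-local ratio. A sketch, with the matrices copied from the two NUMA latency slides; `local_remote` is a hypothetical helper, and averaging all off-diagonal entries is just one simple way to summarize them.

```python
TWO_QPI = [[72, 291, 323, 294],
           [296, 72, 293, 315],
           [319, 296, 71, 296],
           [290, 325, 300, 71]]

THREE_QPI = [[136, 194, 198, 201],
             [194, 135, 194, 196],
             [201, 194, 135, 200],
             [202, 197, 198, 135]]


def local_remote(matrix):
    """Average local (diagonal) and remote (off-diagonal) latency and their ratio."""
    n = len(matrix)
    local = [matrix[i][i] for i in range(n)]
    remote = [matrix[i][j] for i in range(n) for j in range(n) if i != j]
    avg_local = sum(local) / n
    avg_remote = sum(remote) / len(remote)
    return avg_local, avg_remote, avg_remote / avg_local


for name, matrix in (("2 QPI", TWO_QPI), ("3 QPI", THREE_QPI)):
    loc, rem, ratio = local_remote(matrix)
    print(f"{name}: local {loc:.1f} ns, remote {rem:.1f} ns, ratio {ratio:.2f}x")
```

For these tables the ratio is about 4.2× on the 2-link system, where some nodes are only reachable via a “routed” hop, and about 1.46× on the 3-link system, where every remote node is adjacent.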
NUMA
Basic Migration Types
NUMA clients (vCPUs + memory) are kept local to a home node
Balance migrations re-assign the home node; memory follows the vCPUs!
Locality migrations set the home node to where most of the memory resides
[Figure: NUMA nodes 0–3]
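The two migration types can be sketched on a single NUMA client. Assumptions: memory is tracked in MB per node, `locality_home` and `balance_migration` are hypothetical helpers, and the fixed per-step copy amount stands in for ESXi’s actual page-migration rate, which this sketch does not model. The balance case shows why %local drops after the vCPUs move and only slowly recovers.

```python
def locality_home(mem_mb_per_node):
    """Locality migration: the home node becomes the node holding the most memory."""
    return max(range(len(mem_mb_per_node)), key=lambda node: mem_mb_per_node[node])


def balance_migration(mem_mb_per_node, new_home, copy_mb_per_step, steps):
    """Balance migration: vCPUs move to new_home at once, memory follows in steps.

    Returns %local (share of memory on the home node) after each step.
    """
    mem = list(mem_mb_per_node)
    total = sum(mem)
    history = [100.0 * mem[new_home] / total]
    for _ in range(steps):
        budget = copy_mb_per_step
        for node in range(len(mem)):
            if node != new_home and budget > 0:
                moved = min(mem[node], budget)
                mem[node] -= moved
                mem[new_home] += moved
                budget -= moved
        history.append(100.0 * mem[new_home] / total)
    return history


print(locality_home([512, 3072, 512, 0]))             # node 1 holds most memory
print(balance_migration([4096, 0, 0, 0], 1, 1024, 4))  # %local climbs back from 0
```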
NUMA Scheduler Consideration
Local Contention vs Remote Access
NUMA migration incurs significant cost.
• All pages need to be remapped, i.e. %localMemory initially drops to 0% and slowly recovers.
• Copying memory pages across NUMA boundaries costs memory bandwidth.
[Chart: “Memory Locality & NUMA-migrations (with NUMA Migration)”: %Local-Mem and #Migrations over time (30 sec units)]
[Chart: “Memory Locality & NUMA-migrations (No NUMA Migration)”: %Local and #Migrations over time (30 sec units)]
numa.vcpu.min = 9
Max vSMP 32
Max vSMP 8
CPS in GUI & supported
We had good(ish) reasonsos
vNUMA auto-sizing history
(โ€ฆ) 2007 2008 2009 2010 2011 2012 2013 2014 (โ€ฆ)
cpuid.coresPerSocket vNUMA
My starting data @ VMware
ESX 4.0 ESX 4.1 ESXi 5.0 ESXi 5.1 ESXi 5.5 ESXi 6.0
ESX 3.5
cpuid.coresPerSocket โ†’ numa.vcpu.maxPerVirtualNode
Two levels of abstraction
Virtual and Physical Proximity Domains
CPU Topology (CPS)
• Doesn't influence ESXi sched.
• Might influence Guest / App sched.
vNUMA Topology
• VPD doesn't affect ESXi sched.
• PPD does define ESXi NUMA sched.
– AKA NUMA client
[Diagram: CPS, VPD and PPD layers of a VM]
Running Compute Intensive Benchmark
Case Study: Project Pacific
• 43.5% local memory access on native Linux
• 99.2% local memory access on Pacific Cluster
https://blogs.vmware.com/performance/2019/10/how-does-project-pacific-deliver-8-better-performance-than-bare-metal.html
IO stuff
Herculean Network IO
vSphere 6.0 achieves line-rate throughput on a 40GigE NIC
• Throughput ↑ from 20.5 to 35.5 Gbps
• CPU used ↓ from 36% to 13% (per Gbps)
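The two numbers on this slide can be related with a little arithmetic. This sketch only restates the figures above; reading "CPU used (per Gbps)" as a CPU percentage normalized to each Gbps of throughput is an assumption.

```python
# Figures from the "Herculean Network IO" slide (vSphere 6.0, 40GigE NIC).
before_gbps, after_gbps = 20.5, 35.5
# Assumption: CPU percentage normalized per Gbps of throughput.
before_cpu_per_gbps, after_cpu_per_gbps = 36.0, 13.0

throughput_gain = after_gbps / before_gbps                     # about 1.73x
cpu_reduction = 1 - after_cpu_per_gbps / before_cpu_per_gbps   # about 64% less

print(f"throughput x{throughput_gain:.2f}, CPU per Gbps -{cpu_reduction:.0%}")
```

In other words, the 40GigE result is not only faster but also cheaper per unit of traffic.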
Virtual NIC coalescing - recap (Network)
Trading CPU Cycles for Lower Latency
By default, vSphere tunes for lower CPU usage by batching I/O operations
• By default, that is also the case for the RX and TX paths on vNICs (here vmxnet3)
• When disabled:
– Every received packet raises an interrupt immediately
– Every packet to be sent is issued immediately
[Diagram: packet sequence 1, 2, 3, … on the RX and TX paths]
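The CPU-versus-latency trade-off of coalescing can be illustrated with a toy model. This is a sketch under stated assumptions and not the actual vmxnet3/ESXi algorithm: an interrupt fires once `batch_size` packets are pending or once the oldest pending packet has waited `timeout_us`, whichever comes first. Both parameters and the `coalesce` helper are hypothetical; `batch_size=1` stands in for "coalescing disabled".

```python
def coalesce(arrivals_us, batch_size, timeout_us):
    """Toy model of interrupt coalescing (not the real vmxnet3 logic).

    Returns (number_of_interrupts, mean_added_latency_us).
    """
    interrupts = 0
    delays = []    # extra wait per packet between arrival and its interrupt
    pending = []   # arrival times of packets not yet signalled

    def flush(fire_time):
        nonlocal interrupts, pending
        delays.extend(fire_time - p for p in pending)
        interrupts += 1
        pending = []

    for t in sorted(arrivals_us):
        # The timer for the oldest pending packet expired before t arrived.
        if pending and t - pending[0] >= timeout_us:
            flush(pending[0] + timeout_us)
        pending.append(t)
        if len(pending) >= batch_size:
            flush(t)          # batch full: interrupt right away
    if pending:               # trailing packets wait for the timer
        flush(pending[0] + timeout_us)
    return interrupts, (sum(delays) / len(delays) if delays else 0.0)


arrivals = list(range(0, 80, 10))   # 8 packets, one every 10 us
print(coalesce(arrivals, batch_size=1, timeout_us=1000))  # (8, 0.0)
print(coalesce(arrivals, batch_size=4, timeout_us=1000))  # (2, 15.0)
```

With batching, the same eight packets cost two interrupts instead of eight, but each packet waits 15 us longer on average: fewer CPU cycles, more latency.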
Possible Latency Optimizations (Network)
Network latency optimization on the VM level
Disable LRO (Large Receive Offload)
• Host wide: "Net.Vmxnet3SwLRO = false"
• Small packets are no longer concatenated into larger ones
Disable (vNIC) coalescing
• VMX option: "ethernetX.coalescingScheme = disabled"
• Issue TX immediately and interrupt immediately on RX
Disable Dynamic queueing
• NetQueue feature that load balances and combines less used queues
• Disabling guarantees a single queue for the VM
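When several vNICs need the coalescing change, the per-vNIC VMX option from this slide can be generated rather than typed by hand. A minimal sketch: `vnic_low_latency_vmx` is a hypothetical helper, the value is quoted the way .vmx files usually store values, and `Net.Vmxnet3SwLRO` is deliberately left out because it is a host-wide advanced setting, not a VMX option.

```python
def vnic_low_latency_vmx(nic_indices):
    """Build .vmx lines that disable interrupt coalescing per vNIC.

    Uses ethernetX.coalescingScheme = "disabled", with X replaced by each
    vNIC index (e.g. 0 -> ethernet0). Hypothetical helper for illustration.
    """
    lines = []
    for i in nic_indices:
        if not isinstance(i, int) or i < 0:
            raise ValueError(f"invalid vNIC index: {i!r}")
        lines.append(f'ethernet{i}.coalescingScheme = "disabled"')
    return lines


print("\n".join(vnic_low_latency_vmx([0, 1])))
```

Validating the index up front keeps a typo from silently producing an option for a vNIC that does not exist.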
Network – Recommendations
Use vmxnet3 Guest Network Driver
Very efficient and required for maximum performance
Evaluate Disabling Interrupt Coalescing
The default mechanism may add small amounts of latency in favor of throughput
It's a 10Gb+ World
1Gb saturation is real; more bandwidth is required today, especially in light of vSAN and Monster VM vMotion
Use Latency Sensitivity High 'Cautiously'
While it can reduce latency and jitter in the 10 µs use case, it comes at a cost (core reservations, etc.)
Requires a FULL CPU and memory reservation, or it won't work and won't tell you
Herculean Storage IO
• More than 1 million IOPS from 1 VM
Hypervisor: vSphere 5.1
Server: HP DL380 Gen8
CPU: 2 x Intel Xeon E5-2690, HT disabled
Memory: 256GB
HBAs: 5 x QLE2562
Storage: 2 x Violin Memory 6616 Flash Arrays
VM: Windows Server 2008 R2, 8 vCPUs and 48GB
Iometer Config: 4K IO size w/ 16 workers
Reference: http://blogs.vmware.com/performance/2012/08/1millioniops-on-1vm.html
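To put "1 million IOPS" into bandwidth terms, multiply by the I/O size. A quick sketch, assuming Iometer's "4K" means 4 KiB; since the slide says "more than" 1 million, the result is a lower bound.

```python
IOPS = 1_000_000          # slide: "more than 1 million", so a lower bound
IO_SIZE_BYTES = 4 * 1024  # Iometer "4K IO size", assumed to mean 4 KiB

bytes_per_s = IOPS * IO_SIZE_BYTES
print(f"~{bytes_per_s / 1e9:.2f} GB/s (~{bytes_per_s * 8 / 1e9:.1f} Gbit/s)")
```

That is roughly 4 GB/s of small random I/O from a single VM, which helps explain why the test setup spread the load over several HBAs and two arrays.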
Bare-metal to virtual TPC-C* gap then and now(ish)
[Chart: virtual vs. bare-metal gap of -30% (then) and -10% (now-ish)]
* Non-compliant, fair-use implementation of the workload on Oracle 12c. Not comparable to official results.
Scaling out vs. up on the same host to amortize overhead
[Chart: TPC-E on native HP ProLiant DL 385 G8, bare-metal tpsE throughput score: 1416.37]
[Chart: TPC-VMS on virtualized HP ProLiant DL 385 G8, virtual tpsE of 3 VMs (VM1, VM2, VM3) running TPC-VMS: 470.31, 468.11 and 457.55]
http://blogs.vmware.com/vsphere/2013/09/worlds-first-tpc-vms-benchmark-result.html
http://www.tpc.org/4064 / http://www.tpc.org/5201
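The virtual chart plots the three VM scores on the same 0 to 1600 axis as the bare-metal score, which invites reading them as a stacked aggregate. The sketch below assumes that reading. The aggregate is an informal comparison and not necessarily the official TPC-VMS metric, and the VM1 to VM3 order of the scores is not recoverable from the chart.

```python
bare_metal_tpse = 1416.37
vm_tpse = [470.31, 468.11, 457.55]   # VM1..VM3 order not recoverable from the chart

aggregate = sum(vm_tpse)                  # assumption: stacked bars read as a total
ratio = aggregate / bare_metal_tpse
print(f"aggregate {aggregate:.2f} tpsE = {ratio:.1%} of bare metal")
```

Read this way, three VMs on the same host get within roughly 1.5% of the single bare-metal instance, which is the "amortize overhead" point of the slide.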
The Problem - with Database Logs
Storage I/O latencies are higher in virtual
Usually not a noticeable problem for Data IO
• Long (5+ ms) latency on HDDs
• Random I/O, many threads banging on the same spindle(s)
• Even some SSDs are ~1 ms
Not OK for Redo Log access
• Short (<<1 ms) latency
• Sequential I/O, single-threaded, write-only
• Typically a write-back cache in the HBA or the array
• Check the Top 5 wait events in Oracle AWR or equivalent database health reports
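Why latency hurts redo logging more than data I/O follows from Little's law: with N I/Os in flight and latency L, throughput cannot exceed N / L. A single-threaded writer that waits for each write has N = 1, so every extra fraction of a millisecond directly caps its write rate. This is a simplification (a real log writer may group commits), and `max_writes_per_s` is a hypothetical helper.

```python
def max_writes_per_s(latency_ms, outstanding=1):
    """Upper bound on completed writes per second (Little's law).

    With `outstanding` I/Os in flight and an average latency of
    `latency_ms`, throughput cannot exceed outstanding / latency.
    A strictly serial log writer corresponds to outstanding=1.
    """
    if latency_ms <= 0:
        raise ValueError("latency must be positive")
    return outstanding * 1000.0 / latency_ms


for latency_ms in (5.0, 1.0, 0.2):
    print(f"{latency_ms:>4} ms -> at most {max_writes_per_s(latency_ms):,.0f} writes/s")
```

Many concurrent data I/Os (large N) can hide a 5 ms latency; a single serial log stream cannot.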
The Solution - Trade CPU Cycles for Lower Latency
By default, vSphere tunes for lower CPU usage by batching I/O operations
But when it senses low IOPS, vSphere stops batching and switches to a low-latency mode
• For lowest latency, put the log device on a vSCSI adapter by itself
• Batching and coalescing happen per vSCSI bus (adapter), not per device(!)
• Explicit tuning can prove more effective, though
The Solution - Trade CPU Cycles for Lower Latency
Explicit workaround on the issuing path:
• Default is asynchronous request passing from the vSCSI adapter to the VMkernel
– but it is dynamically adjusted for the low-IOPS case
• To explicitly force immediate initiation of each I/O operation (sync):
– scsiNNN.reqCallThreshold = "1"
Explicit workaround on the completion path:
• Default is coalescing of virtual interrupts
– vSphere automatically suspends interrupt coalescing for low-IOPS workloads
• Or explicitly disable virtual interrupt coalescing:
– For PVSCSI: scsiNNN.intrCoalescing = "False"
– For other vHBAs: scsiNNN.ic = "False"
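The options on this slide apply per vSCSI controller, so a sketch can build the lines for the controller that holds the log device. This assumes that NNN in `scsiNNN` is the controller number (e.g. 1 gives `scsi1`), and `vscsi_low_latency_vmx` is a hypothetical helper. The completion-path option name depends on the virtual HBA type, exactly as listed above.

```python
def vscsi_low_latency_vmx(controller, pvscsi=True):
    """Return .vmx lines for the log device's dedicated vSCSI controller.

    Assumption: `controller` replaces NNN in scsiNNN (1 -> "scsi1").
    """
    if not isinstance(controller, int) or controller < 0:
        raise ValueError("controller must be a non-negative integer")
    prefix = f"scsi{controller}"
    # Issuing path: initiate each I/O immediately instead of batching.
    lines = [f'{prefix}.reqCallThreshold = "1"']
    # Completion path: disable virtual interrupt coalescing.
    if pvscsi:
        lines.append(f'{prefix}.intrCoalescing = "False"')
    else:
        lines.append(f'{prefix}.ic = "False"')
    return lines


print("\n".join(vscsi_low_latency_vmx(1)))
```

Since batching is per bus, these lines only help if the log device really sits alone on that controller; otherwise every disk on it pays the extra CPU cost.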
Myth Revisited: RDM versus VMFS
VMFS on par with or faster than RDM (approx. 1%)
Reference: http://www.vmware.com/techpapers/2017/sql-server-vsphere65-perf.html
Storage – Recommendations
Use Multiple vSCSI Adapters
Allows for more queues and I/Os in flight
Use the PVSCSI vSCSI Adapter
More efficient: more I/Os per CPU cycle
Don't Use RDMs
Unless needed for shared-disk clustering, they no longer offer a performance advantage
VMware Snapshots Should Be 'Temporary'
Despite constant performance improvements, snapshots should not live forever (Co-Stop, synchronous)
Leverage Your Storage OEM's Integration Guide
They provide necessary guidance on items like multipathing
vMotion
vMotion Workflow
1. Create VM on Destination
2. Copy Memory
3. Quiesce VM on Source
4. Transfer Device State
5. Resume VM on Destination
6. Power Off VM on Source
Execution switchover time of 1 sec
[Diagram: Source ESXi Host and Destination ESXi Host connected via the vMotion Network, with a shared Datastore]
Iterative Memory Pre-Copy
[Diagram: Memory Copy from Source VM Memory to Destination VM Memory]
Phase 0: Copy the VM's 40GB of memory and trace its pages. As we send that memory, the VM dirties 10GB
Phase 1: Retransmit the dirtied 10GB. In the process, the VM dirties another 3GB
Phase 2: Send the 3GB. While that transfer is happening, the VM dirties 1GB
Phase 3: Send the remaining 1GB
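The shrinking phases above can be modelled with a simplified convergence loop. Assumptions, all hypothetical: the guest dirties memory at a constant `dirty_gb_s` while each phase transmits at `bandwidth_gb_s`, so the next phase must resend `size * dirty_gb_s / bandwidth_gb_s` GB. Pre-copy stops once the remainder is at or below `switchover_gb`. Real vMotion tracks dirty pages and uses other mechanisms, and this model does not include anything that slows the guest down.

```python
def precopy_phases(memory_gb, bandwidth_gb_s, dirty_gb_s,
                   switchover_gb=1.0, max_phases=10):
    """Return the GB sent in each pre-copy phase (simplified model)."""
    phases = []
    size = memory_gb
    for _ in range(max_phases):
        phases.append(size)
        if size <= switchover_gb:
            return phases                       # small enough to switch over
        transfer_s = size / bandwidth_gb_s      # time needed for this phase
        # Memory dirtied meanwhile, never more than the whole VM.
        size = min(memory_gb, dirty_gb_s * transfer_s)
    raise RuntimeError("pre-copy did not converge: dirty rate too close to bandwidth")


print(precopy_phases(40, bandwidth_gb_s=4, dirty_gb_s=1))  # [40, 10.0, 2.5, 0.625]
```

The ratio of dirty rate to bandwidth decides everything: at 1:4 a 40GB VM converges in four phases, while a dirty rate at or above the bandwidth never converges, which is why more vMotion bandwidth matters for busy Monster VMs.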
vMotion of Oracle RAC
It's been working for a while …
Confidential │ ©2018 VMware, Inc.
Common Issues for Monster VMs
pre 6.5*
• Trace cost
• LP remap
• Preallocated memory
• RDTSC cost
• (SDPS)
- use ESXi 6.5
- use multi-NIC vMotion (10Gb+!)
Performance Troubleshooting
How to troubleshoot any issue
No matter how complicated
1. Identify a related system or component that your team is not responsible for
2. Hypothesize that the issue is with that component
3. Assign the issue to the responsible team
4. When proven wrong, go to 1.
Perfectly valid methods to "troubleshoot" or "tune" /s
• Tuning guide for a completely different system
• Some advanced option found on a blog
• Vaguely fitting KB
• etc.
The biggest enemy
"XY Problem"
1. I have problem X
2. I think it is because of Y
3. I have problem Y
4. Help me solve problem Y
5. Hey! I still have a problem
tl;dr: don't jump to conclusions
Where to use caution
Believing anybody
"Trust, but verify."*
* From the Russian proverb: "Доверяй, но проверяй" {Doveryai, no proveryai}
Where to use caution
Comparing hosts, past and present, etc.
• Don't assume newer == better
• Identify all differences
Where to use caution
Relying on Traffic Light Dashboards alone
All metrics green?
→ All good then! (false negative)
Some metrics red?
→ Something must be broken! (false positive)
Where to use caution
Working through a list of known issues
Very good to start with!
• Don't spend more than half an hour
Can be from different perspectives
• Application
• Resources, e.g.:
– CPU contention
– Memory pressure
– Disk latency
– Etc.
Apply different methodologies as needed
e.g. directionally
Top → Down: drill down from the application / its metrics
• app specific / difficult to "profile" the whole path
Bottom → Up: investigate from the resource point of view
• easy to run into false positives / not all resources evenly covered
Recommendation: Bottom → Up checklist first
Ask questions
Good ones, preferably
• What makes you think there is a performance issue?
• Has it ever performed well?
• What has changed since?
• Can it be quantified?
• What else is affected?
• What is the timing?
• Is it reproducible?
• etc.
Take notes along the way
seriously
"Remember kids, the only difference between science and screwing around is writing it down."
Provide an exact timeline
Part of notetaking but often forgotten
• 2017-11-28 23:00 UTC: Upgrade
• 2017-11-29 07:00 UTC: Issue first noticed
• 2017-11-29 > 23:59 UTC: Tried everything under the sun and wrote down nothing
• 2017-11-30 08:00: Called GSS
Be accurate and universal
https://xkcd.com/1179/
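The same "accurate and universal" rule can be enforced when notes are kept programmatically. In this sketch, `log_event` and `format_timeline` are hypothetical helpers: they refuse timestamps without a time zone (like the "08:00" entry in the timeline above), normalize everything to UTC, and print ISO 8601 lines in chronological order.

```python
from datetime import datetime, timezone


def log_event(events, when, what):
    """Record an event normalized to UTC; naive timestamps are rejected."""
    if when.tzinfo is None:
        raise ValueError(f"{what!r}: timestamp has no time zone")
    events.append((when.astimezone(timezone.utc), what))


def format_timeline(events):
    """Return ISO 8601 lines sorted chronologically."""
    return [f"{when.isoformat()} {what}" for when, what in sorted(events)]


events = []
log_event(events, datetime(2017, 11, 29, 7, 0, tzinfo=timezone.utc), "Issue first noticed")
log_event(events, datetime(2017, 11, 28, 23, 0, tzinfo=timezone.utc), "Upgrade")
print("\n".join(format_timeline(events)))

try:
    log_event(events, datetime(2017, 11, 30, 8, 0), "Called GSS")  # no zone given
except ValueError as err:
    print("rejected:", err)
```

Rejecting naive timestamps at entry time is cheaper than reconstructing the correct zone days later in a support call.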
SR examples
"The case of the unexplained …"
Example 1 – Oracle DB performance
Tales from GSS
Initial SR description:
• Oracle DB on virtual 64-bit W2K8 three times slower than physical
• on 32-bit W2K8 and 32/64-bit RHEL5, only 5% slower than physical
• benchmarked with a production-equivalent test script
Troubleshooting in support:
• checked logs for errors
• basics like power management, limits, etc.
• researched whether similar issues had been reported
Reproducing in-house:
• the customer provided two pre-configured VMs
• during the initial run, the 64-bit VM performed worse by a factor of 3
• with automated benchmark start and result collection, the factor dropped to 1.6 on avg.
Murphy's law strikes:
• Minor configuration issues (DB not starting, tnsnames changes)
• Initial booking for the lab server ran out and it was re-imaged
• Redeploy to a local box was delayed due to a network issue
• Automation scripts had to be recreated
• Flashback store ran full
Our Oracle DBA configured both VMs with a default config …
"The more updates or inserts in a workload, the more expensive it is to turn on block checking"
The benchmark was an insert loop…
Example 1 – Oracle DB performance
In a Nutshell …
• Configuration issue
• No virtualization fault
• ~70 hours

2020-ntn-vsphere_performance_principles_bondzio.pdf

  • 1. DOAG 2020 │ ©2020 VMware, Inc. ESXi Performance Principles, DOAG Edition. Valentin Bondzio, Sr. Staff TSE / GSS Premier Services, 2020-01-23
  • 3. DOAG 2020 NOON2NOON │ ©2020 VMware, Inc. Brief Intro: @VMware since 2009; Global Support Services / Premier Services; focus on Resource Management, Performance and Windows Internals; originally from Berlin, living in Ireland since 2007. And most importantly …
  • 5. Brief Intro: Not an Oracle expert!
  • 9. Agenda: CPU Scheduling and Usage Accounting (the “basics”); “Power Management” (the Good, the Better and the Ugly); ESXi Memory Management (more “basics”); Local resource distribution (what else is running on ESXi); CPU Topology Abstraction (CPU socket != NUMA node); + I/O stuff; + vMotion; + Backup
  • 10. CPU Scheduler Overview: resource guarantees and weighting (shares) on a per-VM or “Resource Pool” level.
  • 13. CPU Scheduler Overview: What does the scheduler do? Dispatch VMs (their “worlds”) to honor CPU settings (local):
    • For fairness: select the VM with the lowest ratio of consumed CPU time to fair share.
    • For priority: run a latency-sensitive VM (Latency Sensitivity “High”) before anyone else.
    [Diagram: several vCPUs and an I/O world queued for one HT / core]
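The fairness and priority rules on slide 13 can be sketched as a toy dispatcher. This is a minimal illustration, not ESXi's scheduler: the `VM` class, its fields and `pick_next` are hypothetical, fair share is assumed to be proportional to shares among the runnable VMs, and a boolean flag stands in for Latency Sensitivity “High”.

```python
from dataclasses import dataclass


@dataclass
class VM:
    name: str
    shares: int                      # relative weight configured for the VM
    consumed_ms: float               # CPU time used in the current window
    latency_sensitive: bool = False  # stands in for Latency Sensitivity "High"


def pick_next(runnable: list[VM]) -> VM:
    """Toy dispatch decision for one PCPU."""
    if not runnable:
        raise ValueError("no runnable VM")
    total_shares = sum(vm.shares for vm in runnable)

    def progress(vm: VM) -> float:
        # Consumed CPU time divided by the VM's fair share of the CPU.
        fair_share = vm.shares / total_shares
        return vm.consumed_ms / fair_share

    # Priority first: latency-sensitive VMs run before anyone else.
    candidates = [vm for vm in runnable if vm.latency_sensitive] or runnable
    # Fairness: among the candidates, the one furthest behind its share wins.
    return min(candidates, key=progress)


a = VM("a", shares=2000, consumed_ms=40.0)
b = VM("b", shares=1000, consumed_ms=30.0)
print(pick_next([a, b]).name)  # a: 40 ms against a 2/3 share trails b's 30 ms against 1/3
```

Using a ratio rather than raw consumed time is what lets a VM with twice the shares legitimately consume twice the CPU before it stops being picked first.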
  • 18. CPU Scheduler Overview: What does the scheduler do? Place the worlds / threads on physical CPUs (global):
    • to balance load across physical execution contexts (PCPUs);
    • to preserve cache state and minimize migration cost;
    • to avoid contention from hardware (HT, LLC, etc.) and from sibling vCPUs (of the same VM);
    • to keep VMs or threads that communicate frequently close to each other.
    [Diagram: VMs placed across eight cores (HT 0 / HT 1 each) and their LLCs]
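Two of the placement goals on slide 18 (balancing load across PCPUs and avoiding HT contention) can be illustrated with a toy placement function. It is an assumption-laden sketch, not the ESXi algorithm: it ignores cache state, migration cost, LLC sharing and communication affinity, and `place`, `pcpu_load` and `sibling` are hypothetical names.

```python
def place(pcpu_load: list[float], sibling: list[int]) -> int:
    """Return the index of the PCPU a newly runnable world should go to.

    pcpu_load[p] is the current load of PCPU p (0.0 = idle, 1.0 = busy);
    sibling[p] is the index of p's hyper-threading sibling on the same core.
    """
    if not pcpu_load or len(pcpu_load) != len(sibling):
        raise ValueError("need one sibling entry per PCPU")
    # Rank by the PCPU's own load first (load balancing), then by the
    # sibling's load, so a PCPU on an otherwise idle core wins ties
    # (avoiding hardware contention on a shared core).
    return min(range(len(pcpu_load)),
               key=lambda p: (pcpu_load[p], pcpu_load[sibling[p]]))


# Two cores: PCPUs 0/1 share core 0, PCPUs 2/3 share core 1.
print(place([0.0, 0.8, 0.0, 0.0], [1, 0, 3, 2]))
```

PCPU 0 is idle too, but its sibling is busy, so the idle core holding PCPUs 2 and 3 is preferred.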
  • 19. CPU Scheduler Overview: How does that look? (esxtop CPU view)
    10:10:29am up 2 days 48 min, 674 worlds, 1 VMs, 2 vCPUs; CPU load average: 0.02, 0.01, 0.01
    PCPU USED(%): 0.3 0.1 0.0 0.3 0.2 0.1 0.0 0.0 0.0 0.2 50 50 4.1 0.1 0.1 0.0 0.0 0.0 0.1 0.0 0.0 0.1 0.0 0.0 AVG: 4.4
    PCPU UTIL(%): 0.5 0.1 0.1 0.6 0.2 0.2 0.0 0.2 0.0 0.3 100 100 4.2 0.2 0.1 0.1 0.0 0.0 0.1 0.0 0.0 0.2 0.1 0.1 AVG: 8.6
    CORE UTIL(%): 0.6 0.7 0.4 0.9 0.3 100 4.3 0.2 0.0 0.1 0.4 0.7 AVG: 9.1
    ID GID NAME NWLD %USED %RUN %SYS %WAIT %VMWAIT %RDY %IDLE %OVRLP
    96337 148153 vmx 1 0.02 0.01 0.02 61.82 - 37.86 0.00 0.00
    96339 148153 NetWorld-VM-96338 1 0.00 0.00 0.00 99.68 - 0.00 0.00 0.00
    96340 148153 NUMASchedRemapEpochInitial 1 0.00 0.00 0.00 99.68 - 0.00 0.00 0.00
    96341 148153 vmast.96338 1 0.03 0.05 0.00 99.63 - 0.00 0.00 0.00
    96343 148153 vmx-vthread-6 1 0.00 0.00 0.00 99.68 - 0.00 0.00 0.00
    96344 148153 vmx-mks:Debian86 1 0.00 0.00 0.00 61.55 - 38.13 0.00 0.00
    96345 148153 vmx-svga:Debian86 1 0.00 0.00 0.00 99.68 - 0.00 0.00 0.00
    96346 148153 vmx-vcpu-0:Debian86 1 62.35 99.68 0.00 0.00 0.00 0.00 0.00 0.05
    96348 148153 vmx-vcpu-1:Debian86 1 62.36 99.67 0.00 0.00 0.00 0.01 0.00 0.05
    96347 148153 PVSCSI-96338:0 1 0.00 0.00 0.00 99.68 - 0.00 0.00 0.00
    96350 148153 vmx-vthread-7:Debian86 1 0.00 0.00 0.00 99.68 - 0.00 0.00 0.00
  • 22. CPU Usage Accounting: What states are there? Running and not running.
  • 23. CPU Usage Accounting: What states are there? Idle (descheduled), Running, Ready.
  • 24. CPU Usage Accounting in an ideal world: Idle (descheduled), Running, Ready, and Usage.
  • 28. CPU Usage Accounting: What is charged against the VM? [Diagram: Idle (descheduled), Running and Ready; Usage vs. overlap, HT busy, frequency, … (“stolen time”); further labels: SYS, VMWAIT, WAIT, CSTP, RDY, MLMT]
  • 31. CPU Usage Accounting: stolen time aka “%LAT_C”. %LAT_C captures the gap between “ideal” execution (demand) and “current” execution.
    • “Ideal”: unlimited dedicated cores running at nominal processor frequency.
    Sources of compute latency:
    • VM resource contention: check %RDY and %CSTP
    • Power management (P-State): frequency throttling
    • Hardware contention: the HTs are in use
    [Diagram: Ideal vs. Current execution against Demand; the gap is %LAT_C]
  • 36. Cores and Threads, Intel® Hyper-Threading Technology: Does enabling HT “spawn” a less capable “logical core”? Maybe two slightly less capable “logical” cores? [Diagram: a “physical” core without HT vs. a “physical” core exposing “logical” core0 and “logical” core1]
  • 39. Intel® Hyper-Threading Technology: individual throughput reduction, aggregated throughput increase at high load. [Diagram: 100 and 100 for two separate cores vs. ~125 for one core running two threads]
  • 43. Intel® Hyper-Threading Technology on ESXi: the throughput reduction is accounted for in USED. Two threads sharing one core deliver 125 instead of 2 × 100, so each thread is charged 50 + 12.5 = 62.5 (2 × 62.5 = 125). HTEfficiencyShift (default: 2) sets how much more efficient HT is assumed to be than no-HT: 1: 50 %, 2: 25 %, 3: 12.5 %, 4: 6.25 %, 5: 3.125 %.
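The HT arithmetic on slide 43 can be reproduced in a few lines. This sketch assumes both HT siblings are busy for the entire interval, that HTEfficiencyShift translates into an aggregate gain of 1 / 2^shift as in the slide's table, and that the aggregate is split evenly between the two threads; `ht_gain` and `charged_per_thread` are hypothetical helpers, not ESXi code.

```python
def ht_gain(shift: int = 2) -> float:
    """Assumed extra aggregate throughput of a core whose two HT threads
    are both busy: 1 / 2**shift (shift 2 -> 0.25, i.e. 25 %)."""
    return 1 / 2 ** shift


def charged_per_thread(shift: int = 2, interval: float = 100.0) -> float:
    """USED charged to each of two HT siblings that were both busy for the
    whole interval: the aggregate throughput split evenly."""
    aggregate = interval * (1 + ht_gain(shift))  # e.g. 100 * 1.25 = 125
    return aggregate / 2                         # e.g. 50 + 12.5 = 62.5


for shift in range(1, 6):
    print(f"shift {shift}: +{ht_gain(shift):.3%} -> {charged_per_thread(shift)} per thread")
```

With the default shift of 2 each thread is charged 62.5, in line with the roughly 62.35 %USED each vCPU shows on slide 19 while both HT threads of that core report 100 % PCPU UTIL.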
  • 44. CPU Usage Accounting: Usage vs. Utilization
  • 48. Power Management, an umbrella term: P-States and deep C-States. The BIOS option goes by names such as Power Regulator, CPU Power Management or EIST.
  • 49. Power Management refresher: a P-State is a voltage / frequency operating point; a C-State is an idle state, from running (C0) to varying degrees of stuff turned off (C1-Cn). [Diagram: frequency ladder from P0 / TB (Turbo Boost) and P1 / NF (nominal frequency) down to P13; C0 vs. C1-Cn]
  • 52. C-State Transition: [Diagram: four cores in C1] ~1 µs
  • 56. Deep C-State Transition: [Diagram: four cores in C6] ~30 µs
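To get a feel for why the transition times on slides 52 and 56 matter, a back-of-the-envelope upper bound multiplies wake-ups per second by the exit latency. The 10,000 wake-ups per second below is a hypothetical figure, and the model pessimistically assumes every wake-up pays the full latency with nothing overlapping; `wake_overhead` is a hypothetical helper.

```python
def wake_overhead(wakeups_per_s: int, exit_latency_us: float) -> float:
    """Fraction of one second a single core spends leaving an idle state,
    assuming each wake-up pays the full exit latency."""
    return wakeups_per_s * exit_latency_us / 1_000_000


# Hypothetical workload: 10,000 wake-ups per second on one core.
for state, latency_us in (("C1", 1), ("C6", 30)):
    print(state, f"{wake_overhead(10_000, latency_us):.0%}")
```

The absolute numbers depend entirely on the assumed wake-up rate; the point is the factor of about 30 between the two idle states for a workload that idles and wakes frequently.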
  • 57. Dell Power Management Profiles
  • 58. ESXi Power Management Policy: only affects what’s presented from the BIOS.
  • 62. Power Management refresher: Who controls what? (→ allow control / ← use) [Diagram: CPU, BIOS, ESXi, VM / guest; labels: deep C-States and P-States; HLT / C1-Cn and P-States; HLT / C1]
  • 69. ESXi Power Management Policy: only affects what’s presented from the BIOS (Dell terminology):
    System Profile → "Performance Per Watt (DAPC)", "Performance Per Watt (OS)", "Performance", "Dense Configuration", "Custom"
    CPU Power Management → "System DBPM (DAPC)", "OS DBPM", "Maximum Performance"
    C States → "Enabled", "Disabled"
    [Annotations marking which settings govern P-States and C-States]
  • 73. Which BIOS policy am I running on? Most likely “Dynamic”; very likely “Performance”.
  • 74. Which BIOS policy am I running on? Most likely “Dynamic”:
    4:30:58pm up 2 min, 1276 worlds, 0 VMs, 0 vCPUs; CPU load average: 0.02, 0.00, 0.00
    Power Usage: 94W, Power Cap: N/A
    PSTATE MHZ:
    CPU %USED %UTIL %C0 %C1 %C2 %A/MPERF
    0 0.3 0.7 1 23 76 50.0
    1 0.0 0.0 0 0 100 50.1
    2 0.1 0.2 0 6 94 50.0
    3 0.0 0.0 0 0 100 50.1
    4 5.2 10.4 10 5 85 50.0
    5 0.0 0.0 0 5 95 51.0
    6 0.0 0.1 0 3 97 50.0
    7 0.0 0.0 0 0 100 50.0
    8 0.1 0.4 0 16 84 50.0
    9 0.0 0.0 0 0 100 50.0
    10 0.0 0.0 0 0 100 50.0
    (…)
  • 76. Which BIOS policy am I running on? Most likely “Performance”:
    4:38:51pm up 1 min, 1276 worlds, 0 VMs, 0 vCPUs; CPU load average: 0.02, 0.00, 0.00
    Power Usage: 142W, Power Cap: N/A
    PSTATE MHZ:
    CPU %USED %UTIL %C0 %C1 %A/MPERF
    0 0.0 0.1 0 100 108.3
    1 0.1 0.1 0 100 108.4
    2 0.1 0.1 0 100 108.3
    3 0.0 0.1 0 100 108.4
    4 0.0 0.0 0 100 108.3
    5 18.0 16.7 17 83 108.3
    6 0.0 0.1 0 100 108.4
    7 0.2 0.2 0 100 108.3
    8 0.0 0.0 0 100 108.3
    9 0.1 0.2 0 100 108.3
    10 0.0 0.1 0 100 108.3
    (…)
  • 78. Which BIOS policy am I running on? Most likely “Custom”:
    5:09:53pm up 6 min, 827 worlds, 0 VMs, 0 vCPUs; CPU load average: 0.01, 0.01, 0.00
    Power Usage: 107W, Power Cap: N/A
    PSTATE MHZ: 2401 2400 2300 2200 2100 2000 1900 1800 1700 1600 1500 1400 1300 1200
    CPU %USED %UTIL %C0 %C1 %C2 %P0 %P1 %P2 %P3 %P4 %P5 %P6 %P7 %P8 %P9 %P10 %P11 %P12 %P13 %A/MPERF
    0 0.2 0.4 0 16 83 62 0 0 0 0 0 0 0 0 0 0 0 0 38 75.2
    1 0.0 0.0 0 0 100 0 0 0 0 0 0 0 0 0 0 0 0 0 100 59.3
    2 0.0 0.1 0 5 95 15 0 0 0 0 0 0 0 0 0 0 0 0 85 57.9
    3 0.0 0.0 0 1 98 38 0 0 0 0 0 0 0 0 0 0 0 0 62 61.5
    4 0.0 0.0 0 4 96 5 0 0 0 0 0 0 0 0 0 0 0 0 95 52.0
    5 0.0 0.0 0 0 100 0 0 0 0 0 0 0 0 0 0 0 0 0 100 50.3
    6 0.1 0.1 0 1 99 7 0 0 0 0 0 0 0 0 0 0 0 0 93 67.7
    7 0.1 0.1 0 0 100 99 0 0 0 0 0 0 0 0 0 0 0 0 1 77.7
    8 0.0 0.0 0 0 100 10 0 0 0 0 0 0 0 0 0 0 0 0 90 50.8
    9 0.0 0.1 0 0 100 0 0 0 0 0 0 0 0 0 0 0 0 0 100 51.6
    10 0.0 0.0 0 3 97 8 0 0 0 0 0 0 0 0 0 0 0 0 92 54.0
    (…)
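%A/MPERF in the esxtop power screens on slides 74 to 79 is the ratio of the IA32_APERF and IA32_MPERF counter deltas: MPERF ticks at a fixed reference (nominal) rate and APERF at the actual clock, both only while the core is in C0. Multiplying the ratio by the nominal frequency gives the average frequency while running. The helper names are hypothetical, and the 2400 MHz nominal frequency used below is an assumption: it is the highest non-turbo entry in the “Custom” screen's PSTATE MHZ list, but the three screens need not come from the same host.

```python
def a_mperf_pct(aperf_delta: int, mperf_delta: int) -> float:
    """%A/MPERF from two samples of the IA32_APERF / IA32_MPERF MSRs."""
    if mperf_delta <= 0:
        raise ValueError("MPERF did not advance: the core never ran in C0")
    return 100 * aperf_delta / mperf_delta


def effective_mhz(nominal_mhz: float, a_mperf: float) -> float:
    """Average clock while in C0: nominal frequency scaled by %A/MPERF."""
    return nominal_mhz * a_mperf / 100


NOMINAL_MHZ = 2400  # assumption, see the lead-in
for label, ratio in (("Dynamic", 50.0), ("Performance", 108.3), ("Custom CPU 7", 77.7)):
    print(f"{label:>12}: ~{effective_mhz(NOMINAL_MHZ, ratio):.0f} MHz")
```

Read this way, “Dynamic” leaves lightly loaded cores clocked at roughly half the nominal frequency during their C0 time, while “Performance” shows them above nominal, i.e. in Turbo Boost.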
  • 82. The magic of Turbo Boost: dynamic, supported overclocking. [Diagram: frequency vs. C-State depth per core; with one core in C0 and three in C1 the active core runs at TB1; with two cores in C6 the two active cores reach TB4; with three cores in C6 the single active core reaches TB7]
  • 86. Power Policy “playfield” (scale: Bad, Good, Optimal*): BIOS “Dynamic” pre-Haswell; BIOS “Dynamic” on Haswell+; BIOS “Maximum / High Performance”, the same* as Custom BIOS + High Performance ESXi policy (with the exception of C1E); Custom BIOS + Custom or Balanced ESXi policy. * A few workloads fare better with more deterministic performance.
  • 90. Power Policy “playfield”: Custom done right! [Charts: Custom BIOS + ESXi Balanced compared with “Dynamic” and with “Performance”]
  • 95. Power Management Trivia, Frequently Asked Questions: “Why doesn’t the frequency I see in Task Manager change?”
    • Possibility 1: you are looking at the brand string.
    • Possibility 2: you are looking in the right place, but the guest OS has no way of knowing.
    • The base frequency should come from CPUID.(EAX=16h):EAX[15:00], but Windows seems to take it from SMBIOS.
    # grep cpuid ./WinTest.vmx
    cpuid.16.eax = "----------------0100011100011000"
    cpuid.coresPerSocket = "6"
    cpuid.brandstring = "VMware (R) SuperSecretCPU (R) @ 18.2 GHz"
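The `cpuid.16.eax` line on slide 95 can be decoded by hand: CPUID leaf 16h reports the processor base frequency in MHz in EAX[15:0], and the mask's rightmost 16 characters are those bits (bit 31 is the leftmost character). 0b0100011100011000 is 0x4718, i.e. 18200 MHz, matching the 18.2 GHz brand string. The sketch assumes the .vmx mask convention where '-' keeps a bit at its default value; `base_mhz` and `mask_for` are hypothetical helpers.

```python
def base_mhz(mask: str) -> int:
    """Decode EAX[15:0] (base frequency in MHz) from a 32-character mask."""
    if len(mask) != 32:
        raise ValueError("a CPUID register mask has 32 characters")
    low16 = mask[16:]  # characters 16..31 are bits 15..0, most significant first
    if set(low16) - {"0", "1"}:
        raise ValueError("EAX[15:0] is not fully specified")
    return int(low16, 2)


def mask_for(mhz: int) -> str:
    """Mask that sets only EAX[15:0]; '-' leaves the upper bits at default."""
    if not 0 < mhz < 1 << 16:
        raise ValueError("base frequency must fit in 16 bits")
    return "-" * 16 + format(mhz, "016b")


print(base_mhz("-" * 16 + "0100011100011000"))
```

Round-tripping through `mask_for` is a quick way to double-check a hand-written override before putting it into a .vmx file.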
  • 98. Power Management Trivia: “I turned off all C-States, why is it still showing C1 in esxtop?”
    • You can’t turn off C1; you can only disable the different levels of deep C-States (C2+).
    [Screenshot: the esxtop power view from slide 76 (“Performance” BIOS policy), which has no %C2 column and shows idle PCPUs at 100 %C1]
  • 103. Power Management Trivia: “I won’t have any issues if I have everything set to High Performance in the BIOS, right?”
    • No issues, apart from possibly: PSU redundancy issues; power capping; temperature; firmware bugs.
    • And definitely …: no ability to control P-/deep C-States; no maximum Turbo Boost frequencies …
    http://www.intel.com/content/dam/www/public/us/en/documents/specification-updates/xeon-e5-v3-spec-update.pdf
  • 104. Power Management Trivia: Frequently Asked Questions
  • 106. Power Management Trivia: “I can clearly see C2 in perfmon on Windows, why are you lying to me?”
    • This is either a perfmon bug or a deliberate choice to represent an “enlightened” idle feature: “Intelligent Timer Tick Distribution (ITTD)”, which needs Windows Server 2012 R2 / vHW 11. It can be disabled via “monitor.disable_guest_idle_msr = true”.
    • You really shouldn’t ever have to …
  • 107. What runs where and when: The high level picture (diagram layers: CPU, VMK, VMM, OS / APPs)
  • 108. What runs where and when: Mostly direct exec (diagram: CPU, OS / APPs)
  • 109. What runs where and when: Mostly direct exec (diagram: PCPU, vCPU)
      (…)
      0xffffffff810a99d0 <+416>: test %eax,%eax
      0xffffffff810a99d2 <+418>: je 0xffffffff810a9932 <cpu_startup_entry+258>
      0xffffffff810a99d8 <+424>: callq 0xffffffff810c6ed0 <rcu_irq_enter>
      0xffffffff810a99dd <+429>: mov 0x82740c(%rip),%r13
      0xffffffff810a99e4 <+436>: test %r13,%r13
      0xffffffff810a99e7 <+439>: je 0xffffffff810a9a07 <cpu_startup_entry+471>
      0xffffffff810a99e9 <+441>: mov 0x0(%r13),%rax
      0xffffffff810a99ed <+445>: no
      0xffffffff810a99f0 <+448>: mov 0x8(%r13),%rdi
      0xffffffff810a99f4 <+452>: add $0x10,%r13
      0xffffffff810a99f8 <+456>: xor %esi,%esi
      0xffffffff810a99fa <+458>: mov %ebp,%edx
      0xffffffff810a99fc <+460>: callq *%rax
      0xffffffff810a99fe <+462>: mov 0x0(%r13),%rax
      0xffffffff810a9a02 <+466>: test %rax,%rax
      0xffffffff810a9a05 <+469>: jne 0xffffffff810a99f0 <cpu_startup_entry+448>
      0xffffffff810a9a07 <+471>: callq 0xffffffff810c6e40 <rcu_irq_exit>
      0xffffffff810a9a0c <+476>: jmpq 0xffffffff810a9932 <cpu_startup_entry+258>
      0xffffffff810a9a11 <+481>: nopl 0x0(%rax)
      0xffffffff810a9a18 <+488>: mov %gs:0xa0e4,%eax
      0xffffffff810a9a20 <+496>: mov %eax,%eax
      0xffffffff810a9a22 <+498>: bt %rax,(%rbx)
      (…)
  • 110. What runs where and when: What about Idle? (diagram: CPU, vCPU)
      (…)
      0xffffffff81052c20 <+0>: sti
      0xffffffff81052c21 <+1>: hlt
      *loud screeching sound*
  • 111. What runs where and when: The VMM traps on the privileged instruction and (together with the VMK) puts the vCPU to “sleep” (diagram: CPU, VMM)
      (…)
      0xffffffff81052c20 <+0>: sti
      0xffffffff81052c21 <+1>: hlt
      *tells VMK to deschedule*
  • 112. What runs where and when: The scheduler decides what to run next (diagram: CPU, VMK)
  • 113. What runs where and when: E.g. a vCPU / world that is ready to run (diagram: CPU, other vCPU)
  • 114. What runs where and when: ESXi’s _own_ idle thread (diagram: CPU, C1-Cn)
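The sequence on slides 107 to 114 (direct execution, HLT trap, deschedule, pick the next ready world or fall back to idle) can be modelled as a toy state machine. This is a minimal sketch, not ESXi's scheduler: the `World` type, the FIFO ready queue and treating only `hlt` as trapping are simplifying assumptions made for illustration.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class World:
    """Hypothetical stand-in for a vCPU world: a name and the guest instructions it will run."""
    name: str
    program: deque = field(default_factory=deque)


def run_pcpu(ready_worlds):
    """Toy model of one PCPU following the slides' sequence of events.

    Guest instructions execute directly until the guest executes HLT. Then the
    VMM traps and has the VMkernel deschedule that vCPU, and the scheduler picks
    the next ready world. With nothing left to run, ESXi's own idle thread runs.
    """
    ready = deque(ready_worlds)
    trace = []
    while ready:
        world = ready.popleft()
        while world.program:
            insn = world.program.popleft()
            if insn == "hlt":
                trace.append(f"{world.name}: hlt -> VMM trap, VMK deschedules")
                break  # the halted vCPU is not re-queued until something wakes it
            trace.append(f"{world.name}: {insn} (direct exec)")
    trace.append("idle: ESXi idle thread runs, PCPU may enter C1-Cn")
    return trace


vcpu0 = World("vCPU0", deque(["test %eax,%eax", "sti", "hlt"]))
vcpu1 = World("vCPU1", deque(["mov 0x8(%r13),%rdi"]))
trace = run_pcpu([vcpu0, vcpu1])
for event in trace:
    print(event)
```

The point of the model is the ordering: a halting vCPU gives the PCPU back immediately, and only when no world is ready does the host's own idle thread (and with it any C-state) come into play.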
  • 115. Memory Management Overview: Goals and Objectives. Manage host physical memory to abstract physical memory away from the guest. Allow memory over-commitment while providing the illusion of virtual DRAM to the guest. Hide transient host memory pressure from applications. (diagram: Host Physical Memory, Guest Memory, ESXi)
  • 121. Virtual Memory: From the process’ point of view, it provides: • a contiguous address space • isolation / security. Virtual memory abstracts (“Magic”): • it makes overcommitment possible … The process is unaware of what is backing a virtual address: • physical memory • a swap file. (diagram labels: Process 0, 1, 2, 3 … n, each 256 TB; “Magic”; 64 TB; 256 TB)
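The 256 TB per process (strictly, 256 TiB) corresponds to a 48-bit virtual address space, which is what x86-64 four-level paging yields: four levels of 9 index bits each, plus a 12-bit offset into a 4 KB page. A quick check of that arithmetic:

```python
PAGE_OFFSET_BITS = 12       # byte offset within a 4 KB page
INDEX_BITS_PER_LEVEL = 9    # 512 eight-byte entries per 4 KB page-table page
LEVELS = 4                  # x86-64 four-level paging (PML4, PDPT, PD, PT)

va_bits = LEVELS * INDEX_BITS_PER_LEVEL + PAGE_OFFSET_BITS
va_bytes = 2 ** va_bits
print(f"{va_bits}-bit virtual address space = {va_bytes // 2**40} TiB")
```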
  • 129. Virtual Physical Memory: Abstraction … From the VM’s point of view, it provides: • a contiguous address space • isolation / security. Virtual physical memory abstracts (“Magic”): • it makes overcommitment possible … The VM is unaware of what is backing a physical address: • physical memory • a swap file • or COW (shared pages), ZIP (compressed pages), BLN (ballooned pages). (diagram labels: VM 0, 1, 2, 3 … n, each 6 TB; “Magic”; 16 TB; several *** TB)
  • 134. Memory Management Overview: How to Hide Memory Pressure? Understanding VM memory usage on ESXi. Reclaim memory from a VM if it is using more than it is entitled to. • Entitlement depends on configuration (reservation / shares / limit). • Techniques to reclaim memory from VMs include: – Page sharing > Ballooning > Compression > Host swapping – Reclamation breaks host large pages. (diagram labels: Total Memory Size; Allocated Memory, Free Memory; Active Memory, Idle Memory)
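To make “entitlement” concrete, here is a deliberately simplified, hypothetical model: host memory is divided in proportion to shares, but each VM gets at least its reservation and at most the smaller of its limit and its demand. Anything a VM holds beyond that entitlement is what reclamation would target. ESXi's real computation (idle memory tax, hierarchical pools, memory states) is more involved; the `VM` fields, the bisection approach and all numbers are illustrative assumptions.

```python
import math
from dataclasses import dataclass


@dataclass
class VM:
    """Hypothetical per-VM memory settings, all values in MB."""
    name: str
    shares: int
    reservation: float = 0.0
    limit: float = math.inf    # math.inf means "unlimited"
    demand: float = math.inf   # how much the VM would actually use


def upper_bound(vm):
    """A VM is never entitled to more than min(limit, demand), nor less than its reservation."""
    return max(vm.reservation, min(vm.limit, vm.demand))


def entitlements(vms, capacity, iterations=100):
    """Split `capacity` in proportion to shares, clamping each VM to [reservation, upper bound].

    Finds the share level by bisection. Assumes admission control already
    guarantees sum(reservations) <= capacity.
    """
    def alloc_at(level):
        return {vm.name: min(max(level * vm.shares, vm.reservation), upper_bound(vm)) for vm in vms}

    if sum(upper_bound(vm) for vm in vms) <= capacity:
        return {vm.name: upper_bound(vm) for vm in vms}  # no contention
    lo, hi = 0.0, 1.0
    while sum(alloc_at(hi).values()) < capacity:
        hi *= 2  # grow until the share level over-allocates the host
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if sum(alloc_at(mid).values()) < capacity:
            lo = mid
        else:
            hi = mid
    return alloc_at(hi)


def reclaim_targets(consumed, entitled):
    """Memory held beyond the entitlement is what reclamation would go after."""
    return {name: max(0.0, consumed[name] - entitled[name]) for name in consumed}


vms = [
    VM("a", shares=2000, reservation=1024),
    VM("b", shares=1000),
    VM("c", shares=1000, limit=1024),
]
ent = entitlements(vms, capacity=8192)
print({name: round(mb) for name, mb in ent.items()})
print(reclaim_targets({"a": 4000, "b": 3000, "c": 1000}, ent))
```

In this example VM "c" is pinned at its 1024 MB limit, so the remaining memory is split 2:1 between "a" and "b", and only "b" ends up above its entitlement.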
  • 138. Active Memory: Not the same as guest stats! (diagram: active != guest stats)
  • 143. Active Memory, aka “Touched”: an ESXi VM level heuristic • weighted, moving average • OS / VMTools independent • “Memory Sampling”: – un-maps 100 random pages over the VM’s entire mapped address space – monitors reads/writes for a minute (accesses trap to the VMM) – after one minute, re-maps all remaining pages and starts again. (diagram: VM mapped memory in 4 KB pages, 100 × 4 KB sampled per minute, accesses marked Read, Read, Write)
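The sampling scheme can be simulated to see why it works: the fraction of the 100 sampled pages that get touched in a one-minute window estimates the fraction of the VM's mapped memory that is active, and successive estimates are smoothed. This is a hedged sketch: the exponential smoothing weight `alpha`, the seeded random generator and the simulated access pattern are assumptions, not ESXi's actual constants or code.

```python
import random

PAGE_KB = 4
SAMPLE_PAGES = 100  # pages un-mapped per sampling period, as on the slide


def sample_touched_fraction(mapped_pages, touched_pages, rng):
    """One one-minute period: pick 100 random pages and count how many the guest touched.

    In ESXi the touch is detected because the access traps to the VMM; here it is
    simulated by checking membership in `touched_pages`.
    """
    sample = rng.sample(range(mapped_pages), SAMPLE_PAGES)
    hits = sum(1 for page in sample if page in touched_pages)
    return hits / SAMPLE_PAGES


def active_estimate_kb(mapped_pages, touched_pages, minutes, alpha=0.25, seed=1):
    """Exponentially weighted moving average of the sampled fraction, scaled to mapped memory.

    `alpha` is a hypothetical smoothing weight, not an ESXi constant.
    """
    rng = random.Random(seed)
    avg = None
    for _ in range(minutes):
        frac = sample_touched_fraction(mapped_pages, touched_pages, rng)
        avg = frac if avg is None else alpha * frac + (1 - alpha) * avg
    return avg * mapped_pages * PAGE_KB


# 1 GB VM (262144 x 4 KB pages) whose guest keeps touching the first quarter of its memory.
mapped = 262_144
touched = set(range(mapped // 4))
estimate = active_estimate_kb(mapped, touched, minutes=30)
print(f"estimated active: {estimate / 1024:.0f} MB "
      f"(simulated working set: {mapped // 4 * PAGE_KB / 1024:.0f} MB)")
```

Note that the estimate says nothing about *which* pages are active, and it only reflects what was touched during the sampling windows, which is one reason it differs from guest-reported statistics.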
  • 144. Active Memory vs. Consumed
  • 152. Active Memory: What to trust? (chart series: consumed, active)
  • 157. Guest Memory Metrics: In a nutshell
  • 158. Active Memory: A guest’s working set tends to lie between active and consumed (chart series: consumed, active, guest WS)
  • 159. Active Memory: The guest WS might over-report (greedy app) (chart series: active, guest WS)
  • 160. Active Memory: But the guest WS will not under-report (chart series: consumed, active, guest WS)
  • 161. Active Memory: Not the be-all and end-all of guest workload estimation
  • 166. Hierarchical Resource Groups: From an ESXi perspective. The host owns all resources. Those are distributed by hierarchical resource groups. Consumers can demand (request) resources. (diagram: host; system, vim, iofilters, user; minfree, kernel, helper, ft, drivers, vmotion …; vmkboot, CpuSched Init …)
  • 169. Hierarchical Resource Groups: From an ESXi perspective. vCenter shows the sum of all user resources as: Total Reservation Capacity. Global Resource Pools are then distributed back to the hosts into local RPs • based on the VMs’ demand. (diagram: host; system, vim, iofilters, user; pool1 … pool4; vm.vmid …)
  • 172. Hierarchical Resource Groups: From an ESXi perspective (user). Local Resource Groups are created and incrementally numbered when clients are instantiated: • VM starts / vMotions etc. • based on the VMs’ demand. The local hierarchy mirrors the global one • check for VM / LRG siblings. VM groups have multiple leaf consumers • the vmid is local, not global. (diagram: user; pool1, pool15, pool231, pool321, pool430; vm.vmid …; leaf consumers vmm, uw, …)
  • 173. Hierarchical Resource Groups: Both Memory and CPU resources. CPU: cpu.resv (Reservation), cpu.limit (Limit), cpu.shares (Shares), cpu.resvLimit (Expandable*). Memory: mem.resv (Reservation), mem.limit (Limit), mem.shares (Shares), mem.resvLimit (Expandable*). (diagram: host; system, vim, iofilters, user; pool1 … pool4; vm.vmid …)
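Because groups nest, a leaf's share-based slice under full contention is the product of its share fractions at every level of the hierarchy. A minimal sketch, assuming every sibling at every level is busy and ignoring reservations and limits. The top-level share values are taken at face value from the `ashares` column of the sched-stats output (system 500, vim 500, iofilters 1000, user 9000); the pool and VM entries and their shares are hypothetical.

```python
from fractions import Fraction

# name -> (shares, children); pool1/pool2/vm.A/vm.B and their shares are hypothetical
TREE = {
    "host": (None, ["system", "vim", "iofilters", "user"]),
    "system": (500, []),
    "vim": (500, []),
    "iofilters": (1000, []),
    "user": (9000, ["pool1", "pool2"]),
    "pool1": (4000, ["vm.A", "vm.B"]),
    "pool2": (2000, []),
    "vm.A": (1000, []),
    "vm.B": (1000, []),
}


def parent_of(tree, node):
    """Return the group that lists `node` as a child, or None for the root."""
    for name, (_, children) in tree.items():
        if node in children:
            return name
    return None


def contended_fraction(tree, node):
    """Fraction of the host a node gets if every sibling at every level is busy."""
    frac = Fraction(1)
    parent = parent_of(tree, node)
    while parent is not None:
        siblings = tree[parent][1]
        total = sum(tree[s][0] for s in siblings)
        frac *= Fraction(tree[node][0], total)  # this level's share fraction
        node, parent = parent, parent_of(tree, parent)
    return frac


print(contended_fraction(TREE, "user"))  # 9/11
print(contended_fraction(TREE, "vm.A"))  # 3/11
```

Exact fractions are used so the nesting is easy to follow: vm.A gets 1/2 of pool1, which gets 2/3 of user, which gets 9/11 of the host.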
  • 174. Tools: ESXi CLI (via SSH): sched-stats … for CPU; memstats … for Memory; esxtop … for comparison
  • 183. Tools: sched-stats, cmdline for local groups (no VMs):
      # sched-stats -t groups | awk 'NR == 1 || $2 ~ /^(vm.|pool)[0-9]+/ || /^ +[0-4] / {printf ("%-10s%-12s%-9s%-6s%-6s%-6s%-9s%-6s%-9s%-9s%-10s\n" ,$1, $2, $3, $6, $8, $9, $10, $11, $12, $13, $14)}'
      vmgid     name        pgid     vsmps amin  amax  minLimit units ashares  resvMHz  availMHz
      0         host        0        933   1600  1600  1600     pct   4096000  5232     33168
      1         system      0        659   10    -1    -1       pct   500      288      33168
      2         vim         0        271   4944  -1    -1       mhz   500      4344     33768
      3         iofilters   0        3     0     -1    -1       pct   1000     0        33168
      4         user        0        0     0     -1    -1       pct   9000     0        33168
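If awk is inconvenient, the same filter can be written in Python: keep the header line, any group whose name matches `vm.<n>` or `pool<n>`, and the top-level groups 0 to 4, then print the selected columns with the same widths as the `printf` (awk's 1-based `$1, $2, $3, $6, $8 …` become 0-based indices). This is a sketch that assumes the sched-stats output is whitespace-separated exactly as the awk program assumes; it was not checked against every ESXi build.

```python
import re
import sys

# awk fields $1 $2 $3 $6 $8 $9 $10 $11 $12 $13 $14, converted to 0-based indices
FIELDS = [0, 1, 2, 5, 7, 8, 9, 10, 11, 12, 13]
WIDTHS = [10, 12, 9, 6, 6, 6, 9, 6, 9, 9, 10]   # from %-10s%-12s%-9s...
NAME_RE = re.compile(r"^(vm.|pool)[0-9]+")       # same regex as the awk version
TOP_LEVEL_RE = re.compile(r"^ +[0-4] ")          # vmgid 0-4, right-aligned


def select(lines):
    """Yield formatted rows for the header, vm./pool groups and groups 0-4."""
    for lineno, line in enumerate(lines, start=1):
        cols = line.split()
        if (lineno == 1
                or (len(cols) > 1 and NAME_RE.match(cols[1]))
                or TOP_LEVEL_RE.match(line)):
            # awk yields an empty string for missing fields; mimic that.
            picked = [cols[i] if i < len(cols) else "" for i in FIELDS]
            yield "".join(f"{value:<{width}}" for value, width in zip(picked, WIDTHS))


def main():
    """Read sched-stats -t groups output from stdin and print the filtered table."""
    for row in select(sys.stdin):
        print(row.rstrip())
```

To use it, save it as a script that calls `main()` and pipe `sched-stats -t groups` into it (or into a saved copy of the output on another machine).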
  • 186. Tools: memstats, cmdline for local groups (with VMs):
      # memstats -r group-stats -g0 -l2 -s gid:name:min:max::conResv:availResv -u mb | sed -n '/^-\+/,/.*\n/p'
      ---------------------------------------------------------------------------------
      gid  name         min    max  conResv  availResv
      ---------------------------------------------------------------------------------
        0  host       97823  97823    28917      68907
        1  system     20024     -1    20008      68923
        2  vim            0     -1     3378      68907
        3  iofilters      0     -1       25      68907
        4  user           0     -1     5490      68907
      ---------------------------------------------------------------------------------
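The memstats numbers can be cross-checked with a little arithmetic. For the host row, min - conResv = 97823 - 28917 = 68906 MB, within 1 MB of the reported availResv of 68907 (plausibly MB rounding). The system row's availResv is higher than the others by exactly its own unused reservation, 20024 - 20008 = 16 MB. The rule checked below (a child's availResv equals the host's availResv plus the child's own unused reservation) is an inference from these five rows only, not documented memstats behaviour.

```python
# Rows copied from the memstats output above (values in MB).
ROWS = """\
0 host 97823 97823 28917 68907
1 system 20024 -1 20008 68923
2 vim 0 -1 3378 68907
3 iofilters 0 -1 25 68907
4 user 0 -1 5490 68907
"""


def parse(text):
    """Map group name -> its min / max / conResv / availResv columns."""
    groups = {}
    for line in text.splitlines():
        _gid, name, mn, mx, con_resv, avail_resv = line.split()
        groups[name] = {"min": int(mn), "max": int(mx),
                        "conResv": int(con_resv), "availResv": int(avail_resv)}
    return groups


groups = parse(ROWS)
host = groups["host"]
host_unused = host["min"] - host["conResv"]
print("host min - conResv:", host_unused, "reported availResv:", host["availResv"])

for name in ("system", "vim", "iofilters", "user"):
    g = groups[name]
    own_unused = max(0, g["min"] - g["conResv"])  # unused part of the group's own reservation
    predicted = host["availResv"] + own_unused
    print(f"{name:<10} predicted {predicted}  reported {g['availResv']}")
```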
  • 195. (N)UMA + terminology (diagram labels: DIMMs; Socket / Package; NUMA node; Socket != NUMA node; LLC / DIE (CoD = Cluster-on-Die, SNC = Sub-NUMA Clustering / AMD Zen1/2); nodes 0, 1, 2)
  • 205. Importance of Memory Access Latency: Jim Gray’s Storage Latency Analogy (slightly adapted). You want to calculate a + b and the operands are in: • your head = register / 1 cycle • this room = L1-L2 / 10 cycles • this building = DRAM / 100 cycles • Finland + Algeria = Disk / 10^6 cycles
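Stretching the analogy to human time makes the gaps tangible: if one cycle took one second, how long would each fetch take? This uses only the cycle counts from the slide; the one-second scale is an arbitrary choice for illustration.

```python
CYCLES = {
    "register (your head)": 1,
    "L1-L2 (this room)": 10,
    "DRAM (this building)": 100,
    "disk (Finland + Algeria)": 10**6,
}


def humanize(seconds):
    """Render a duration in the largest convenient unit."""
    for unit, size in (("days", 86400), ("h", 3600), ("min", 60)):
        if seconds >= size:
            return f"{seconds / size:.1f} {unit}"
    return f"{seconds:g} s"


for level, cycles in CYCLES.items():
    print(f"{level:<26} {humanize(cycles * 1.0)}")  # 1 cycle stretched to 1 second
```

At that scale a register access is a second, a DRAM access is well over a minute, and a disk access is about eleven and a half days.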
  • 214. Importance of Memory Access Latency: Numbers based on Intel i7-3770 @ 3.4 GHz
      access  size     cycles  ns
      L1      32 KB    4-5     1.5
      L2      256 KB   12      4
      L3      8 MB     30      10
      DRAM    GBs      30+     66*
      (diagram: cores 0-3, each with its own L1 and L2; shared L3 / Last Level Cache; IMC; QPI; DRAM)
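Cycle counts and nanoseconds are related through the clock rate: at 3.4 GHz one cycle takes 1/3.4, roughly 0.29 ns. A quick conversion of the table's cycle counts follows. The table's ns column does not line up exactly with a straight conversion (30 cycles is about 8.8 ns, the table shows 10), so treat both as approximate. Reading the DRAM row “30+ / 66*” as about 30 cycles plus about 66 ns is an assumption.

```python
CLOCK_GHZ = 3.4  # Intel i7-3770 clock from the slide


def cycles_to_ns(cycles, ghz=CLOCK_GHZ):
    """At `ghz` GHz the CPU completes `ghz` cycles per nanosecond."""
    return cycles / ghz


for level, cycles in (("L1", 5), ("L2", 12), ("L3", 30)):
    print(f"{level}: {cycles} cycles = {cycles_to_ns(cycles):.1f} ns")

# Assumption: "30+ 66*" means ~30 cycles of cache lookup plus ~66 ns spent in DRAM.
print(f"DRAM: ~{cycles_to_ns(30) + 66:.0f} ns")
```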
  • 217. N(UMA), Pre-Opteron/Nehalem: All sockets share the FSB to the Northbridge and hence its bandwidth • the NB is also known as the “Memory Controller Hub” or MCH. Uniform memory access latency between every CPU and every DIMM. The von Neumann bottleneck gets worse with faster CPUs / more RAM. (diagram: sockets 0-3 connected to the NB)
  • 221. NUMA, Post-Opteron/Nehalem: Every NUMA node has its own Integrated Memory Controller (IMC) • some AMD CPUs (Bulldozer and newer) have two nodes per socket / package. Remote access has to go over the interconnect and the remote CPU’s IMC • this adds latency, making local and remote access non-uniform. (diagram: nodes 0-3)
  • 222. NUMA, 2 QPI / IC: memory access latency in ns
      CPU/ns     0     1     2     3
      0         72   291   323   294
      1        296    72   293   315
      2        319   296    71   296
      3        290   325   300    71
      (legend: local, adjacent, “routed”; diagram: nodes 0-3)
  • 223. NUMA, 3 QPI / IC: memory access latency in ns
      CPU/ns     0     1     2     3
      0        136   194   198   201
      1        194   135   194   196
      2        201   194   135   200
      3        202   197   198   135
      (legend: local, adjacent; diagram: nodes 0-3)
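The two latency matrices can be summarised as a remote-to-local ratio. This sketch just averages the diagonal (local) and off-diagonal (remote) entries from the two slides; it says nothing about bandwidth. On these numbers the 2-QPI system has the lower local latency (about 72 ns vs 135 ns) but a much larger remote penalty.

```python
TWO_QPI = [
    [72, 291, 323, 294],
    [296, 72, 293, 315],
    [319, 296, 71, 296],
    [290, 325, 300, 71],
]
THREE_QPI = [
    [136, 194, 198, 201],
    [194, 135, 194, 196],
    [201, 194, 135, 200],
    [202, 197, 198, 135],
]


def remote_to_local(matrix):
    """Mean off-diagonal latency divided by mean diagonal latency."""
    n = len(matrix)
    local = [matrix[i][i] for i in range(n)]
    remote = [matrix[i][j] for i in range(n) for j in range(n) if i != j]
    return (sum(remote) / len(remote)) / (sum(local) / len(local))


for label, m in (("2 QPI", TWO_QPI), ("3 QPI", THREE_QPI)):
    print(f"{label}: remote is {remote_to_local(m):.2f}x local")
```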
  • 230. NUMA: Basic Migration Types. NUMA clients (vCPUs + memory) are kept local to a home node. Balance migrations re-assign the home node; memory follows the vCPUs! Locality migrations set the home node to where most of the memory resides. (diagram: nodes 0-3)
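The two migration types can be contrasted in a toy model. This is a hypothetical sketch, not the ESXi NUMA scheduler: a client has a home node and a per-node memory distribution; a locality migration moves the home to where most of the memory is, while a balance migration moves it to the least-loaded node and leaves the memory behind until it follows. The node loads and the selection rules are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass
class NumaClient:
    """Hypothetical NUMA client: vCPUs + memory, with a home node."""
    home: int
    mem_mb_per_node: dict  # node -> MB of this client's memory on that node

    def local_fraction(self):
        """Share of the client's memory that sits on its home node."""
        total = sum(self.mem_mb_per_node.values())
        return self.mem_mb_per_node.get(self.home, 0) / total if total else 1.0


def locality_migration(client):
    """Set the home node to where most of the client's memory resides."""
    client.home = max(client.mem_mb_per_node, key=client.mem_mb_per_node.get)


def balance_migration(client, node_load):
    """Re-assign the home node to the least-loaded node; memory has to follow later."""
    client.home = min(node_load, key=node_load.get)


client = NumaClient(home=0, mem_mb_per_node={0: 1024, 1: 3072})
print(client.local_fraction())                 # 0.25: most memory is remote
locality_migration(client)
print(client.home, client.local_fraction())    # 1 0.75
balance_migration(client, node_load={0: 0.9, 1: 0.95, 2: 0.2, 3: 0.6})
print(client.home, client.local_fraction())    # 2 0.0 until memory follows
```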
  • 232. NUMA Scheduler Consideration: Local Contention vs Remote Access. NUMA migration incurs a significant cost. • All pages need to be remapped, i.e. %localMemory initially drops to 0% and slowly recovers. • Copying memory pages across NUMA boundaries costs memory bandwidth. (charts: “Memory Locality & NUMA-migrations (with NUMA Migration)” and “Memory Locality & NUMA-migrations (No NUMA Migration)”, each plotting %local and #migrations over time in 30-second units)
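Why %localMemory “slowly recovers”: after the home node changes, pages only become local as they are copied over. A hedged sketch, assuming a constant, hypothetical number of pages copied per 30-second interval; the real migration rate is adaptive and is not given on the slide.

```python
def locality_after_migration(total_pages, pages_per_interval, intervals):
    """%local per 30 s interval after a home-node change, assuming a constant copy rate."""
    local = 0  # right after the migration none of the memory is on the new home node
    history = []
    for _ in range(intervals + 1):
        history.append(100 * local / total_pages)
        local = min(total_pages, local + pages_per_interval)
    return history


print(locality_after_migration(total_pages=1_000_000, pages_per_interval=125_000, intervals=8))
```

Every copied page also consumes memory bandwidth on both nodes, which is the second cost named on the slide.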
  • 241. We had good(ish) reasons: vNUMA auto-sizing history. [Timeline 2007–2014 covering ESX 3.5, ESX 4.0, ESX 4.1, ESXi 5.0, ESXi 5.1, ESXi 5.5 and ESXi 6.0, annotated with: "My starting date @ VMware"; cpuid.coresPerSocket; CPS in GUI & supported; Max vSMP 8; vNUMA; Max vSMP 32; numa.vcpu.min = 9; cpuid.coresPerSocket → numa.vcpu.maxPerVirtualNode]
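The two advanced options named on the slide can be read as a simple rule. The sketch below is a deliberately simplified model, assuming a VM gets a vNUMA topology only at or above numa.vcpu.min vCPUs and that each virtual node is capped at numa.vcpu.maxPerVirtualNode vCPUs (the default of 8 used here is an assumption). The real sizing also depends on the physical NUMA node size and, in older releases, on cpuid.coresPerSocket.

```python
import math


def virtual_numa_nodes(vcpus: int, vcpu_min: int = 9, max_per_virtual_node: int = 8) -> int:
    """Simplified estimate of how many virtual NUMA nodes a VM is shown.

    Hypothetical model: below numa.vcpu.min there is no vNUMA topology
    (a single node); otherwise the vCPUs are split into nodes of at most
    numa.vcpu.maxPerVirtualNode vCPUs each.
    """
    if vcpus < 1:
        raise ValueError("a VM needs at least one vCPU")
    if vcpus < vcpu_min:
        return 1
    return math.ceil(vcpus / max_per_virtual_node)
```

Under these assumptions an 8-vCPU VM sees a single node, while a 12-vCPU VM is split into two virtual nodes.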
  • 245. Two levels of abstraction: Virtual and Physical Proximity Domains (VPD, PPD). CPU topology (cores per socket, CPS): doesn't influence ESXi scheduling; might influence guest / application scheduling. vNUMA topology: the VPD doesn't affect ESXi scheduling; the PPD (AKA NUMA client) does define ESXi NUMA scheduling.
  • 249. Case Study: Project Pacific, Running a Compute-Intensive Benchmark. 43.5% local memory access on native Linux vs. 99.2% local memory access on the Pacific cluster. https://blogs.vmware.com/performance/2019/10/how-does-project-pacific-deliver-8-better-performance-than-bare-metal.html
  • 250. IO stuff
  • 251. Herculean Network IO: vSphere 6.0 achieves line-rate throughput on a 40GigE NIC. Throughput ↑ from 20.5 to 35.5 Gbps; CPU used (per Gbps) ↓ from 36 to 13%.
  • 257. Network: virtual NIC coalescing recap, trading CPU cycles for lower latency. By default, vSphere tunes for lower CPU usage by batching I/O operations. • By default, that is also the case for the RX and TX paths on vNICs (here vmxnet3). • When disabled: – every received packet triggers an interrupt immediately; – every packet is issued immediately.
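The trade-off can be made concrete with a toy receive-side model: an interrupt fires once max_batch packets are pending, or timeout_us after the first pending packet arrived, whichever comes first. Both parameters and the algorithm are hypothetical illustrations, not vmxnet3's actual coalescing logic; max_batch=1 corresponds to coalescing being disabled.

```python
from typing import List, Tuple


def simulate_rx(arrivals_us: List[float], max_batch: int, timeout_us: float) -> Tuple[int, float]:
    """Return (interrupts, mean added latency in µs) for a toy RX coalescing model.

    Hypothetical model: an interrupt fires when max_batch packets are pending
    or timeout_us after the oldest pending packet, whichever comes first.
    """
    interrupts = 0
    total_delay = 0.0
    pending: List[float] = []
    for t in sorted(arrivals_us):
        # Flush a batch whose timer expired before this packet arrived.
        if pending and t > pending[0] + timeout_us:
            fire = pending[0] + timeout_us
            total_delay += sum(fire - a for a in pending)
            interrupts += 1
            pending = []
        pending.append(t)
        # Flush immediately once the batch is full.
        if len(pending) == max_batch:
            total_delay += sum(t - a for a in pending)
            interrupts += 1
            pending = []
    if pending:  # the timer flushes whatever is left at the end
        fire = pending[0] + timeout_us
        total_delay += sum(fire - a for a in pending)
        interrupts += 1
    n = len(arrivals_us)
    return interrupts, (total_delay / n if n else 0.0)
```

For four packets arriving 10 µs apart, `max_batch=1` yields 4 interrupts and no added delay, while `max_batch=4` (timeout 100 µs) yields a single interrupt at a mean added delay of 15 µs: fewer interrupts, i.e. less CPU, bought with latency.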
  • 261. Network: possible latency optimizations on the VM level. • Disable LRO (Large Receive Offload): host-wide "Net.Vmxnet3SwLRO = false"; small packets are no longer concatenated into larger ones. • Disable (vNIC) coalescing: VMX option "ethernetX.coalescingScheme = disabled"; issue TX immediately and interrupt immediately on RX. • Disable dynamic queueing: this NetQueue feature load-balances and combines less-used queues; disabling it guarantees a single queue for the VM.
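What LRO does can be pictured with a small model: consecutive, in-order TCP segments of the same flow are merged into one larger segment before the guest sees them. This is a hypothetical simplification (no flush timers, no header handling), not the ESXi software LRO implementation; the flow IDs and the 65,535-byte cap are assumptions.

```python
from typing import List, Tuple

Segment = Tuple[str, int, int]  # (flow id, sequence number, payload length)


def lro_merge(segments: List[Segment], max_bytes: int = 65535) -> List[Segment]:
    """Toy software LRO: coalesce in-order segments of the same flow.

    Hypothetical model: a segment is appended to the previous one if it
    belongs to the same flow, starts exactly where the previous one ended,
    and the merged payload stays within max_bytes.
    """
    merged: List[Segment] = []
    for flow, seq, length in segments:
        if merged:
            last_flow, last_seq, last_len = merged[-1]
            if (flow == last_flow and seq == last_seq + last_len
                    and last_len + length <= max_bytes):
                merged[-1] = (last_flow, last_seq, last_len + length)
                continue
        merged.append((flow, seq, length))
    return merged
```

With LRO disabled, the guest simply receives the input list unmerged: more, smaller packets and more per-packet work, in exchange for not waiting on aggregation, which is why the slide lists it as a latency optimization.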
  • 262. Network – Recommendations: Use the vmxnet3 guest network driver: very efficient and required for maximum performance. Evaluate disabling interrupt coalescing: the default mechanism may induce small amounts of latency in favor of throughput. It's a 10Gb+ world: 1Gb saturation is real, and more bandwidth is required today, especially in light of vSAN and Monster VM vMotion. Use Latency Sensitivity High 'cautiously': while it can reduce latency and jitter in the 10 µs use case, it comes at a cost (core reservations, etc.). It requires a FULL CPU and memory reservation, or it won't work and won't tell you.
  • 263. Herculean Storage IO: more than 1 million IOPS from 1 VM. Hypervisor: vSphere 5.1; Server: HP DL380 Gen8; CPU: 2 × Intel Xeon E5-2690, HT disabled; Memory: 256 GB; HBAs: 5 × QLE2562; Storage: 2 × Violin Memory 6616 flash arrays; VM: Windows Server 2008 R2, 8 vCPUs and 48 GB; Iometer config: 4K I/O size with 16 workers. Reference: http://blogs.vmware.com/performance/2012/08/1millioniops-on-1vm.html
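As a sanity check on what "1 million IOPS" means as a data rate, multiply IOPS by the I/O size; reading "4K" as 4 KiB is an assumption.

```python
def iops_to_bytes_per_sec(iops: float, io_size_bytes: int) -> float:
    """Sustained data rate implied by an IOPS figure at a fixed I/O size."""
    return iops * io_size_bytes


rate = iops_to_bytes_per_sec(1_000_000, 4 * 1024)  # "4K" read as 4 KiB (assumption)
print(f"{rate / 1e9:.2f} GB/s, {rate / 2**30:.2f} GiB/s")
```

This prints `4.10 GB/s, 3.81 GiB/s`, i.e. roughly 4 GB/s of sustained transfer from a single VM.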
  • 266. Bare-metal to virtual TPC-C* gap, then and now(ish): -30% → -10%. * Non-compliant, fair-use implementation of the workload on Oracle 12c. Not comparable to official results.
  • 267. Scaling out vs. up on the same host to amortize overhead. TPC-E on native HP ProLiant DL385 G8: 1416.37 tpsE (bare metal). TPC-VMS on the virtualized HP ProLiant DL385 G8: 3 VMs scoring 470.31, 468.11 and 457.55 tpsE. References: http://blogs.vmware.com/vsphere/2013/09/worlds-first-tpc-vms-benchmark-result.html, http://www.tpc.org/4064 / http://www.tpc.org/5201
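Reading the two bar charts together, the three VMs' scores can be summed and compared with the bare-metal score. This is simple arithmetic on the numbers shown, summing the stacked bars as the chart presents them; official TPC-VMS reporting rules may differ, so treat the percentage as illustrative.

```python
bare_metal_tpse = 1416.37
vm_tpse = [470.31, 468.11, 457.55]  # the three TPC-VMS VMs, as read off the slide

# Round the sum to the precision of the inputs to avoid float noise.
virtual_total = round(sum(vm_tpse), 2)
ratio = virtual_total / bare_metal_tpse
print(f"3 VMs: {virtual_total} tpsE = {ratio:.1%} of bare metal")
```

This prints `3 VMs: 1395.97 tpsE = 98.6% of bare metal`.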
  • 271. The Problem with Database Logs: storage I/O latencies are higher in virtual. Usually not a noticeable problem for data I/O: • long (5+ ms) latency on HDDs • random I/O, many threads banging on the same spindle(s) • even some SSDs are ~1 ms. Not OK for redo log access: • short (<<1 ms) latency • sequential, single-threaded, write-only I/O • typically a write-back cache in the HBA or the array • check the top 5 wait events in Oracle AWR or equivalent database health reports.
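Why the same virtualization overhead matters for redo logs but not for data files becomes clear when it is expressed relative to the native latency. The 50 µs overhead and the 0.2 ms redo write latency below are hypothetical values chosen for illustration; the 5 ms and ~1 ms figures come from the slide.

```python
def relative_slowdown(native_latency_ms: float, added_latency_ms: float) -> float:
    """Per-I/O slowdown caused by a fixed added latency, as a fraction."""
    return added_latency_ms / native_latency_ms


overhead_ms = 0.05  # hypothetical 50 µs of extra per-I/O latency
for name, native_ms in [("HDD data I/O", 5.0), ("SSD data I/O", 1.0), ("redo log write", 0.2)]:
    print(f"{name}: {relative_slowdown(native_ms, overhead_ms):.0%} slower")
```

This prints 1%, 5% and 25% respectively: the identical absolute overhead is noise for HDD-backed data I/O but a large slice of a sub-millisecond log write.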
  • 275. The Solution: Trade CPU Cycles for Lower Latency. By default, vSphere tunes for lower CPU usage by batching I/O operations, but when it senses low IOPS, vSphere stops batching and switches to low-latency mode. • For lowest latency, put the log device on a vSCSI adapter by itself. • Batching and coalescing are done on a per-vSCSI-bus, not per-device(!), basis. • Explicit tuning can prove more effective, though.
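The per-bus point can be illustrated with a toy model in which the batching decision is made from a vSCSI bus's total IOPS. The threshold value and the function are hypothetical; vSphere's actual heuristic is not described on the slide.

```python
from typing import Dict

LOW_IOPS_THRESHOLD = 2000  # hypothetical cut-over point, not a documented ESXi value


def bus_batches(device_iops: Dict[str, int], threshold: int = LOW_IOPS_THRESHOLD) -> bool:
    """Toy model: batching is decided per vSCSI bus from the bus's total IOPS."""
    return sum(device_iops.values()) >= threshold


shared_bus = {"redo": 300, "data1": 5000, "data2": 4000}
dedicated_bus = {"redo": 300}
print("log shares a bus -> batched:", bus_batches(shared_bus))
print("log on its own bus -> batched:", bus_batches(dedicated_bus))
```

It prints `True` for the shared bus and `False` for the dedicated one: in this model, the quiet redo device only gets low-latency treatment when busy data disks do not share its bus.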
  • 279. The Solution: Trade CPU Cycles for Lower Latency. Explicit workaround on the issuing path: • The default is asynchronous request passing from the vSCSI adapter to the VMkernel – but it dynamically adjusts for the low-IOPS case. • To explicitly force immediate initiation of I/O operations (sync): scsiNNN.reqCallThreshold = "1". Explicit workaround on the completion path: • The default is coalescing of virtual interrupts – vSphere automatically suspends interrupt coalescing for low-IOPS workloads. • Or explicitly disable virtual interrupt coalescing – for PVSCSI: scsiNNN.intrCoalescing = "False"; for other vHBAs: scsiNNN.ic = "False".
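If you opt for explicit tuning, the options above are plain key/value entries in the VM's .vmx file. The helper below is a sketch that renders them for one controller, assuming that "scsiNNN" stands for the controller's key (e.g. scsi1 for a second controller dedicated to the log device). The function names are hypothetical, and as with any latency tuning, the change should be measured against your own workload.

```python
from typing import Dict


def low_latency_vmx_options(controller: int, pvscsi: bool = True) -> Dict[str, str]:
    """Build the .vmx entries from the slide for one vSCSI controller.

    Assumes 'scsiNNN' on the slide stands for the controller key, e.g. scsi1.
    """
    prefix = f"scsi{controller}"
    options = {f"{prefix}.reqCallThreshold": "1"}  # issuing path: start I/O immediately
    if pvscsi:
        options[f"{prefix}.intrCoalescing"] = "False"  # completion path, PVSCSI
    else:
        options[f"{prefix}.ic"] = "False"  # completion path, other vHBAs
    return options


def render_vmx(options: Dict[str, str]) -> str:
    """Render options as key = "value" lines, the .vmx file syntax."""
    return "\n".join(f'{key} = "{value}"' for key, value in options.items())


print(render_vmx(low_latency_vmx_options(1)))
```

`low_latency_vmx_options(1)` renders `scsi1.reqCallThreshold = "1"` and `scsi1.intrCoalescing = "False"`.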
  • 280. Myth Revisited: RDM versus VMFS. VMFS is on par with or faster than RDM (approx. 1%). Reference: http://www.vmware.com/techpapers/2017/sql-server-vsphere65-perf.html
  • 281. Storage – Recommendations: Use multiple vSCSI adapters: allows for more queues and I/Os in flight. Use the PVSCSI vSCSI adapter: more efficient, more I/Os per cycle. Don't use RDMs unless needed for shared-disk clustering; they no longer offer a performance advantage. VMware snapshots should be 'temporary': despite constant performance improvements, snapshots should not live forever (co-stop, synchronous). Leverage your storage OEM's integration guide: they provide necessary guidance around items like multipathing.
  • 282. vMotion
  • 290. vMotion Workflow: 1. Create VM on destination. 2. Copy memory. 3. Quiesce VM on source. 4. Transfer device state. 5. Resume VM on destination. 6. Power off VM on source. Execution switchover time of 1 sec. [Diagram: source and destination ESXi hosts, vMotion network, datastore]
  • 295. Iterative Memory Pre-Copy (source VM memory → destination VM memory). Phase 0: copy the VM's 40 GB of memory and trace pages; as we send that memory, the VM dirties 10 GB. Phase 1: retransmit the dirtied 10 GB; in the process, the VM dirties another 3 GB. Phase 2: send the 3 GB; while that transfer is happening, the VM dirties 1 GB. Phase 3: send the remaining 1 GB.
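The pre-copy loop converges because each phase has less to send than the previous one. A minimal model, assuming the VM dirties a constant fraction of whatever is being sent (the slide's numbers are not exactly a constant ratio, so this is a hypothetical simplification) and that pre-copy stops once the remainder fits within a switchover budget:

```python
from typing import List


def precopy_phases(memory_gb: float, dirty_ratio: float, switchover_gb: float,
                   max_phases: int = 20) -> List[float]:
    """GB sent in each pre-copy phase under a constant dirty ratio.

    Hypothetical model: while a phase sends X GB, the VM dirties
    dirty_ratio * X GB, which the next phase must resend. Pre-copy stops
    once the remainder fits the switchover budget.
    """
    if not 0 <= dirty_ratio < 1:
        raise ValueError("pre-copy only converges if memory is dirtied slower than it is sent")
    phases = [memory_gb]
    while phases[-1] > switchover_gb and len(phases) < max_phases:
        phases.append(phases[-1] * dirty_ratio)
    return phases


slide_phases_gb = [40, 10, 3, 1]  # phases 0-3 as listed on the slide
print(f"Sent {sum(slide_phases_gb)} GB to move {slide_phases_gb[0]} GB of memory")
print(precopy_phases(40, 0.25, 1.0))
```

The slide's phases add up to 54 GB sent for 40 GB of memory. With a dirty ratio of 0.25 and a 1 GB budget, the model produces phases of 40, 10, 2.5 and 0.625 GB; a ratio of 1 or more (memory dirtied as fast as it is sent) never converges, which is why the function rejects it.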
  • 296. vMotion of Oracle RAC: it's been working for a while …
  • 297. Confidential │ ©2018 VMware, Inc. Common Issues for Monster VMs (pre 6.5*): trace cost; LP remap; preallocated memory; RDTSC cost (SDPS).
  • 298. Use ESXi 6.5; use multi-NIC (10Gb+!).
  • 299. Performance Troubleshooting
  • 304. How to troubleshoot any issue, no matter how complicated: 1. Identify a related system or component that your team is not responsible for. 2. Hypothesize that the issue is with that component. 3. Assign the issue to the responsible team. 4. When proven wrong, go to 1.
  • 305. Perfectly valid methods to "troubleshoot" or "tune" /s: a tuning guide for a completely different system; some advanced option found on a blog; a vaguely fitting KB; etc.
  • 311. The biggest enemy: the "XY Problem". 1. I have problem X. 2. I think it is because of Y. 3. I have problem Y. 4. Help me solve problem Y. 5. Hey! I still have a problem (X)! tl;dr: don't jump to conclusions.
  • 314. Where to use caution: believing anybody. "Trust, but verify."* * From the Russian proverb "Доверяй, но проверяй" (Doveryai, no proveryai).
  • 317. Where to use caution: comparing hosts, past and present, etc. (!=). Don't assume newer == better. Identify all differences.
  • 322. Where to use caution: relying on traffic-light dashboards alone. All metrics green? → All good then! (false negative). Some metrics red? → Something must be broken! (false positive).
  • 324. Where to use caution: working through a list of known issues. Very good to start with! • Don't spend more than half an hour. Can be from different perspectives: • Application • Resources, e.g.: – CPU contention – Memory pressure – Disk latency – Etc.
  • 328. Apply different methodologies as needed, e.g. directionally. Top → Down: drill down from the application / its metrics • app-specific / difficult to "profile" the whole path. Bottom → Up: investigate from the resource point of view • easy to run into false positives / not all resources evenly covered. Recommendation: bottom-up checklist first.
  • 329. Ask questions, good ones preferably: What makes you think there is a performance issue? Has it ever performed well? What has changed since? Can it be quantified? What else is affected? What is the timing? Is it reproducible? Etc.
  • 331. Take notes along the way, seriously. "Remember kids, the only difference between science and screwing around is writing it down."
  • 332. Provide an exact timeline: part of note-taking, but often forgotten. 2017-11-28 23:00 UTC: Upgrade. 2017-11-29 07:00 UTC: Issue first noticed. 2017-11-29 > 23:59 UTC: Tried everything under the sun and wrote down nothing. 2017-11-30 08:00: Called GSS.
  • 333. Be accurate and universal: https://xkcd.com/1179/
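xkcd 1179 argues for ISO 8601 dates. A small check along those lines, assuming the timeline is kept as ISO 8601 strings (the tuple layout and function name are hypothetical), flags entries whose time carries no UTC offset, such as the "Called GSS" entry in the timeline above:

```python
from datetime import datetime
from typing import List, Tuple

Entry = Tuple[str, str]  # (ISO 8601 timestamp, event)


def entries_without_timezone(timeline: List[Entry]) -> List[str]:
    """Return events whose timestamp carries no UTC offset (ambiguous times)."""
    return [event for stamp, event in timeline
            if datetime.fromisoformat(stamp).tzinfo is None]


timeline = [
    ("2017-11-28T23:00+00:00", "Upgrade"),
    ("2017-11-29T07:00+00:00", "Issue first noticed"),
    ("2017-11-30T08:00", "Called GSS"),
]
print(entries_without_timezone(timeline))
```

It prints `['Called GSS']`.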
  • 334. SR examples: "The case of the unexplained …"
  • 336. Example 1 – Oracle DB performance (Tales from GSS). Initial SR description: • Oracle DB on virtual 64-bit W2K8 three times slower than physical • on 32-bit W2K8 and 32/64-bit RHEL5, only 5% slower than physical • benchmarked with a production-equivalent test script. Troubleshooting in support: • checked logs for errors • basics like power management, limits, etc. • researched whether similar issues had been reported.
  • 341. Example 1 – Oracle DB performance (Tales from GSS). Reproducing in-house: • the customer provided two pre-configured VMs • during the initial run, the 64-bit VM performed worse by a factor of 3 • after automating benchmark start and result collection, the factor dropped to 1.6 on average.
  • 346. Example 1 – Oracle DB performance (Tales from GSS). Murphy's law strikes: • minor configuration issues (DB not starting, tnsnames changes) • the initial booking for the lab server ran out and it was re-imaged • redeploying to a local box was delayed due to a network issue • automation scripts had to be recreated • the flashback store ran full. Our Oracle DBA configured both VMs with a default config …
  • 354. Example 1 – Oracle DB performance (Tales from GSS). "The more updates or inserts in a workload, the more expensive it is to turn on block checking." The benchmark was an insert loop…
  • 358. Example 1 – Oracle DB performance, in a nutshell … Configuration issue. No virtualization fault. ~70 hours.