Brought to you by
Analyze Virtual Machine
Overhead Compared to Bare
Metal with Tracing
Steven Rostedt
Software Engineer at Google Inc
Steven Rostedt
Software Engineer at Google
■ One of the original authors of the PREEMPT_RT patch set
■ Creator and maintainer of ftrace (the official Linux tracer)
■ Creator of “make localmodconfig”
■ Creator and maintainer of “ktest.pl” Linux testing framework
Using Virtual Machines
Pros:
■ Gives more flexibility
● Can migrate from one machine to another
● Duplicate VMs
● Can save them (archives)
■ Can easily shutdown and restart
● No lengthy BIOS
■ More “secure”
● Better isolation of tasks
Using Virtual Machines
Cons:
■ Takes up more memory
● Requires two operating systems on the machine
■ Adds overhead
● There’s indirection between the VM and the devices
Virtual Machines Overhead
How bad is it really?
■ The pros usually outweigh the cons
● We are always trying to improve
■ Where to look for that improvement
● Usually anytime the hypervisor needs to do work for the VM
● Utilize virtio more
■ Virtual devices that take advantage of the virtual environment
■ Easier to implement, and less overhead than simulating real devices
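As an illustration only (not the exact configuration used in these tests), a QEMU invocation that uses virtio for the disk and network could look something like this; the image name guest.qcow2 and the CPU/memory sizes are placeholders:
# qemu-system-x86_64 -enable-kvm -smp 4 -m 2G \
    -drive file=guest.qcow2,if=virtio \
    -netdev user,id=net0 -device virtio-net-pci,netdev=net0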
Tools to determine Virtual Machine Overhead
As I’m the ftrace maintainer, I’ll focus on ftrace tracing solutions.
■ trace-cmd
● A front end CLI tool to interact with the Linux kernel tracing facility
● Can start tracing and examine the live events
● Can record to a file for post processing
● This talk will focus on the post processing
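For a quick feel of the record/report workflow (independent of the benchmark below), tracing the scheduler events around a short command looks like:
# trace-cmd record -e sched sleep 1
# trace-cmd report | head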
What to look at for Virtual Machine Overhead
Databases are a common critical path of virtual machine services
■ sysbench
● https://github.com/akopytov/sysbench
● Available on most distributions
● Has a mysql benchmark
● Good to compare different machines (in this case, VM vs Bare-metal)
■ mysql / MariaDB
● In Debian the “mysql” commands are run by tasks called mariadbd
● In Fedora the “mysql” commands are run by tasks called “mysqld”
■ But it is still MariaDB and not mysql!
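The runs below assume the sbtest database has already been populated. A hedged sketch of that prepare step, reusing the same options as the run commands shown later (the sbtest database and sbtest_user account must already exist):
# sysbench --db-driver=mysql --mysql-user=sbtest_user --mysql_password=password \
    --mysql-db=sbtest --tables=16 --table-size=10000 \
    /usr/share/sysbench/oltp_read_write.lua prepare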
The setup
Using two bare-metal machines: one with Fedora 33, the other with Debian “testing”
■ Fedora 33
● Kernel: 5.14.18-100.fc33.x86_64
● File system: ext4
● 8 CPUs (4 cores / 2 hyperthreaded)
● 32 GB RAM
● VM - same setup but only 4 CPUs / 2 GB RAM
■ Debian testing (from a month ago)
● Kernel: 5.18.0-3-amd64
● File system: ext4
● 6 CPUs (6 cores)
● 16 GB RAM
● VM - same setup but only 4 CPUs / 2 GB RAM
The setup
[Diagram: both the Fedora and the Debian machine run mysql twice - once directly on the host and once inside a VM on that host]
The setup
To make it even, limit everything to a single CPU
■ taskset
● Sets the affinity of the tasks
● Set everything to CPU 1
■ CPU 0 usually has housekeeping tasks
● Set the database server (mysqld / mariadbd) to CPU 1
● Set the sysbench application to CPU 1
Set the database server to CPU 1
# ps aux | grep mysqld
mysql 20539 0.1 3.4 2289088 131892 ? Ssl 19:36 0:02 /usr/libexec/mysqld --basedir=/usr
root 20925 0.0 0.0 221440 860 pts/3 S+ 20:18 0:00 grep --color=auto mysqld
# taskset -a -pc 1 20539
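To confirm the affinity took effect, taskset can query it back (the PID is the one from the example above; expect output along these lines):
# taskset -pc 20539
pid 20539's current affinity list: 1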
Fedora Guest run
# taskset -a -c 1 sysbench --db-driver=mysql --mysql-user=sbtest_user
--mysql_password=password --mysql-db=sbtest --tables=16 --table-size=10000 --threads=4
--time=10 --events=0 --report-interval=1 /usr/share/sysbench/oltp_read_write.lua run
[ 1s ] thds: 4 tps: 41.84 qps: 912.52 (r/w/o: 641.55/160.39/110.58) lat (ms,95%): 235.74 err/s: 0.00 reconn/s: 0.00
[ 2s ] thds: 4 tps: 55.04 qps: 1070.71 (r/w/o: 748.50/185.12/137.09) lat (ms,95%): 75.82 err/s: 0.00 reconn/s: 0.00
[ 3s ] thds: 4 tps: 56.00 qps: 1149.94 (r/w/o: 805.95/202.99/140.99) lat (ms,95%): 82.96 err/s: 0.00 reconn/s: 0.00
[ 4s ] thds: 4 tps: 66.00 qps: 1282.04 (r/w/o: 898.03/220.01/164.01) lat (ms,95%): 82.96 err/s: 0.00 reconn/s: 0.00
[ 5s ] thds: 4 tps: 66.99 qps: 1377.90 (r/w/o: 963.93/243.98/169.99) lat (ms,95%): 82.96 err/s: 0.00 reconn/s: 0.00
[ 6s ] thds: 4 tps: 49.00 qps: 979.94 (r/w/o: 685.96/171.99/121.99) lat (ms,95%): 161.51 err/s: 0.00 reconn/s: 0.00
[ 7s ] thds: 4 tps: 59.00 qps: 1180.09 (r/w/o: 826.06/206.02/148.01) lat (ms,95%): 75.82 err/s: 0.00 reconn/s: 0.00
[ 8s ] thds: 4 tps: 58.00 qps: 1143.09 (r/w/o: 807.06/193.01/143.01) lat (ms,95%): 82.96 err/s: 0.00 reconn/s: 0.00
[ 9s ] thds: 4 tps: 55.99 qps: 1136.89 (r/w/o: 788.92/206.98/140.99) lat (ms,95%): 84.47 err/s: 0.00 reconn/s: 0.00
[ 10s ] thds: 4 tps: 66.00 qps: 1320.01 (r/w/o: 924.01/230.00/166.00) lat (ms,95%): 74.46 err/s: 0.00 reconn/s: 0.00
Fedora Host run
# taskset -a -c 1 sysbench --db-driver=mysql --mysql-user=sbtest_user
--mysql_password=password --mysql-db=sbtest --tables=16 --table-size=10000 --threads=4
--time=10 --events=0 --report-interval=1 /usr/share/sysbench/oltp_read_write.lua run
[ 1s ] thds: 4 tps: 51.81 qps: 1112.01 (r/w/o: 781.19/195.30/135.51) lat (ms,95%): 84.47 err/s: 0.00 reconn/s: 0.00
[ 2s ] thds: 4 tps: 25.01 qps: 500.22 (r/w/o: 350.15/88.04/62.03) lat (ms,95%): 383.33 err/s: 0.00 reconn/s: 0.00
[ 3s ] thds: 4 tps: 78.00 qps: 1560.01 (r/w/o: 1092.00/272.00/196.00) lat (ms,95%): 66.84 err/s: 0.00 reconn/s: 0.00
[ 4s ] thds: 4 tps: 68.00 qps: 1359.99 (r/w/o: 952.00/238.00/170.00) lat (ms,95%): 116.80 err/s: 0.00 reconn/s: 0.00
[ 5s ] thds: 4 tps: 71.00 qps: 1420.00 (r/w/o: 994.00/249.00/177.00) lat (ms,95%): 66.84 err/s: 0.00 reconn/s: 0.00
[ 6s ] thds: 4 tps: 59.00 qps: 1180.04 (r/w/o: 826.03/206.01/148.00) lat (ms,95%): 75.82 err/s: 0.00 reconn/s: 0.00
[ 7s ] thds: 4 tps: 56.00 qps: 1119.97 (r/w/o: 783.98/195.99/140.00) lat (ms,95%): 82.96 err/s: 0.00 reconn/s: 0.00
[ 8s ] thds: 4 tps: 52.00 qps: 1040.01 (r/w/o: 728.01/182.00/130.00) lat (ms,95%): 92.42 err/s: 0.00 reconn/s: 0.00
[ 9s ] thds: 4 tps: 70.00 qps: 1399.98 (r/w/o: 979.99/245.00/175.00) lat (ms,95%): 92.42 err/s: 0.00 reconn/s: 0.00
[ 10s ] thds: 4 tps: 80.00 qps: 1599.99 (r/w/o: 1119.99/280.00/200.00) lat (ms,95%): 51.02 err/s: 0.00 reconn/s: 0.00
Fedora Host vs Guest
SQL statistics:                         Host                        Guest
  queries performed:
    read:                               8610                        8092   (-6.0%)
    write:                              2152                        2021   (-6.5%)
    other:                              1538                        1447   (-5.9%)
    total:                              12300                       11560  (-6.0%)
  transactions:                         615 (61.23 per sec.)        578 (57.29 per sec.)      (-6.0%)
  queries:                              12300 (1224.63 per sec.)    11560 (1145.78 per sec.)  (-6.0%)
  ignored errors:                       0 (0.00 per sec.)           0 (0.00 per sec.)
  reconnects:                           0 (0.00 per sec.)           0 (0.00 per sec.)
General statistics:
  total time:                           10.0411s                    10.0863s  (+0.5%)
  total number of events:               615                         578       (-6.0%)
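The percentage column is simply the guest value relative to the host value. For example, for total transactions and total time:
(578 - 615) / 615 ≈ -6.0%
(10.0863 - 10.0411) / 10.0411 ≈ +0.5%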
Fedora Host vs Guest
Latency (ms):                           Host              Guest
  min:                                  32.99             33.33
  avg:                                  65.21             69.64    (+6.8%)
  max:                                  380.68            283.24
  95th percentile:                      84.47             82.96
  sum:                                  40104.21          40251.24
Threads fairness:
  events (avg/stddev):                  153.7500/0.43     144.5000/2.06   (-6.0%)
  execution time (avg/stddev):          10.0261/0.01      10.0628/0.02
Debian Guest run
# taskset -a -c 1 sysbench --db-driver=mysql --mysql-user=sbtest_user
--mysql_password=password --mysql-db=sbtest --tables=16 --table-size=10000 --threads=4
--time=10 --events=0 --report-interval=1 /usr/share/sysbench/oltp_read_write.lua run
[ 1s ] thds: 4 tps: 523.07 qps: 10498.35 (r/w/o: 7351.94/2096.28/1050.13) lat (ms,95%): 8.58 err/s: 0.00 reconn/s: 0.00
[ 2s ] thds: 4 tps: 725.42 qps: 14522.41 (r/w/o: 10173.89/2897.68/1450.84) lat (ms,95%): 6.91 err/s: 0.00 reconn/s: 0.00
[ 3s ] thds: 4 tps: 567.00 qps: 11341.94 (r/w/o: 7939.96/2267.99/1133.99) lat (ms,95%): 9.73 err/s: 0.00 reconn/s: 0.00
[ 4s ] thds: 4 tps: 732.95 qps: 14663.00 (r/w/o: 10257.30/2939.80/1465.90) lat (ms,95%): 6.79 err/s: 0.00 reconn/s: 0.00
[ 5s ] thds: 4 tps: 702.00 qps: 14032.01 (r/w/o: 9828.00/2800.00/1404.00) lat (ms,95%): 7.17 err/s: 0.00 reconn/s: 0.00
[ 6s ] thds: 4 tps: 725.04 qps: 14505.72 (r/w/o: 10155.51/2900.14/1450.07) lat (ms,95%): 6.91 err/s: 0.00 reconn/s: 0.00
[ 7s ] thds: 4 tps: 739.97 qps: 14821.47 (r/w/o: 10365.63/2975.89/1479.95) lat (ms,95%): 6.55 err/s: 0.00 reconn/s: 0.00
[ 8s ] thds: 4 tps: 587.04 qps: 11740.87 (r/w/o: 8218.61/2348.17/1174.09) lat (ms,95%): 7.17 err/s: 0.00 reconn/s: 0.00
[ 9s ] thds: 4 tps: 726.96 qps: 14520.29 (r/w/o: 10174.50/2891.86/1453.93) lat (ms,95%): 6.79 err/s: 0.00 reconn/s: 0.00
[ 10s ] thds: 4 tps: 711.03 qps: 14239.67 (r/w/o: 9957.47/2860.14/1422.07) lat (ms,95%): 7.04 err/s: 0.00 reconn/s: 0.00
Debian Host run
# taskset -a -c 1 sysbench --db-driver=mysql --mysql-user=sbtest_user
--mysql_password=password --mysql-db=sbtest --tables=16 --table-size=10000 --threads=4
--time=10 --events=0 --report-interval=1 /usr/share/sysbench/oltp_read_write.lua run
[ 1s ] thds: 4 tps: 651.26 qps: 13100.71 (r/w/o: 9173.28/2620.94/1306.49) lat (ms,95%): 5.18 err/s: 0.00 reconn/s: 0.00
[ 2s ] thds: 4 tps: 899.12 qps: 17974.32 (r/w/o: 12583.63/3592.45/1798.24) lat (ms,95%): 5.18 err/s: 0.00 reconn/s: 0.00
[ 3s ] thds: 4 tps: 961.00 qps: 19211.04 (r/w/o: 13457.03/3832.01/1922.00) lat (ms,95%): 5.00 err/s: 0.00 reconn/s: 0.00
[ 4s ] thds: 4 tps: 944.95 qps: 18897.06 (r/w/o: 13227.34/3779.81/1889.91) lat (ms,95%): 5.00 err/s: 0.00 reconn/s: 0.00
[ 5s ] thds: 4 tps: 806.04 qps: 16126.79 (r/w/o: 11284.55/3230.16/1612.08) lat (ms,95%): 5.18 err/s: 0.00 reconn/s: 0.00
[ 6s ] thds: 4 tps: 965.00 qps: 19288.99 (r/w/o: 13504.99/3854.00/1930.00) lat (ms,95%): 5.00 err/s: 0.00 reconn/s: 0.00
[ 7s ] thds: 4 tps: 965.00 qps: 19277.07 (r/w/o: 13487.05/3860.01/1930.01) lat (ms,95%): 4.82 err/s: 0.00 reconn/s: 0.00
[ 8s ] thds: 4 tps: 954.99 qps: 19122.84 (r/w/o: 13392.89/3819.97/1909.98) lat (ms,95%): 5.00 err/s: 0.00 reconn/s: 0.00
[ 9s ] thds: 4 tps: 972.01 qps: 19436.10 (r/w/o: 13604.07/3888.02/1944.01) lat (ms,95%): 4.82 err/s: 0.00 reconn/s: 0.00
[ 10s ] thds: 4 tps: 964.98 qps: 19320.60 (r/w/o: 13521.72/3868.92/1929.96) lat (ms,95%): 4.82 err/s: 0.00 reconn/s: 0.00
Debian Host vs Guest
SQL statistics:                         Host                         Guest
  queries performed:
    read:                               127232                       94430   (-25.8%!)
    write:                              36352                        26980   (-25.8%!)
    other:                              18176                        13490   (-25.8%!)
    total:                              181760                       134900  (-25.8%!)
  transactions:                         9088 (908.37 per sec.)       6745 (673.91 per sec.)      (-25.8%!)
  queries:                              181760 (18167.33 per sec.)   134900 (13478.17 per sec.)  (-25.8%!)
  ignored errors:                       0 (0.00 per sec.)            0 (0.00 per sec.)
  reconnects:                           0 (0.00 per sec.)            0 (0.00 per sec.)
General statistics:
  total time:                           10.0037s                     10.0075s  (+0.03%)
  total number of events:               9088                         6745      (-25.8%!)
Debian Host vs Guest
Latency (ms):                           Host              Guest
  min:                                  1.44              3.31
  avg:                                  4.40              5.93     (+34.8%!)
  max:                                  162.14            167.81
  95th percentile:                      5.00              7.04
  sum:                                  39998.32          40007.12
Threads fairness:
  events (avg/stddev):                  2272.0000/1.87    1686.2500/1.09  (-25.8%)
  execution time (avg/stddev):          9.9996/0.00       10.0018/0.00
Using trace-cmd
■ There are over 100 KVM events (Host events to handle guests)
■ Traces when guests enter and leave the virtual environment
■ Shows why guests exit and what the host is doing for the guest
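The available KVM events can be listed (and counted) with trace-cmd:
# trace-cmd list -e kvm
# trace-cmd list -e kvm | wc -l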
On Fedora trace Guest from Host
# trace-cmd record -e sched -e kvm ssh root@Fedora33 'taskset -a -c 1 sysbench
--mysql-user=sbtest_user --mysql_password=password --mysql-db=sbtest --tables=16
--table-size=10000 --threads=4 --time=10 --events=0 --report-interval=1
/usr/share/sysbench/oltp_read_write.lua run'
On Fedora show Guest report
# trace-cmd report
version = 7
cpus=8
ssh-3108 [001] 8832.004785: sched_process_exec: filename=/usr/bin/ssh pid=3108 old_pid=3108
ssh-3108 [001] 8832.004850: sched_stat_runtime: comm=ssh pid=3108 runtime=514120 [ns] vruntime=165181151 [ns]
ssh-3108 [001] 8832.006849: sched_stat_runtime: comm=ssh pid=3108 runtime=992297 [ns] vruntime=167169721 [ns]
ssh-3108 [001] 8832.007849: sched_stat_runtime: comm=ssh pid=3108 runtime=997254 [ns] vruntime=168166975 [ns]
ssh-3108 [001] 8832.007851: sched_wake_idle_without_ipi: cpu=0
ssh-3108 [001] 8832.008092: sched_waking: comm=vhost-2426 pid=2440 prio=120 target_cpu=001
ssh-3108 [001] 8832.008094: sched_migrate_task: comm=vhost-2426 pid=2440 prio=120 orig_cpu=1 dest_cpu=2
ssh-3108 [001] 8832.008096: sched_wake_idle_without_ipi: cpu=2
ssh-3108 [001] 8832.008096: sched_wakeup: vhost-2426:2440 [120] CPU:002
ssh-3108 [001] 8832.008098: sched_stat_runtime: comm=ssh pid=3108 runtime=243443 [ns] vruntime=168410418 [ns]
ssh-3108 [001] 8832.008101: sched_switch: ssh:3108 [120] S ==> swapper/1:0 [120]
<idle>-0 [002] 8832.008128: sched_switch: swapper/2:0 [120] R ==> vhost-2426:2440 [120]
vhost-2426-2440 [002] 8832.008142: kvm_msi_set_irq: dst 1 vec 44 (Fixed|physical|edge)
vhost-2426-2440 [002] 8832.008144: kvm_apic_accept_irq: apicid 1 vec 44 (Fixed|edge)
vhost-2426-2440 [002] 8832.008144: sched_waking: comm=CPU 1/KVM pid=2443 prio=120 target_cpu=007
vhost-2426-2440 [002] 8832.008151: sched_wake_idle_without_ipi: cpu=7
vhost-2426-2440 [002] 8832.008152: sched_wakeup: CPU 1/KVM:2443 [120] CPU:007
Using libtracecmd
■ Is a library that comes with trace-cmd
■ Allows you to write tools that can analyze trace.dat files
● These are the files that trace-cmd creates.
■ This talk is not about how to use this library
● Only that it exists
■ But we will use a tool I created with this library
● https://rostedt.org/code/kvm-exit.c
■ This examines the kvm_exit and kvm_entry events
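kvm-exit.c is a single C file; a hedged sketch of building it, assuming the libtracecmd, libtracefs, and libtraceevent development packages (with their pkg-config files) are installed - the task-time.c tool used later builds the same way:
# gcc -O2 -o kvm-exit kvm-exit.c \
    $(pkg-config --cflags --libs libtracecmd libtracefs libtraceevent)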
On Fedora Host run
# kvm-exit trace.dat
vCPU 0: host_pid: 2442
Number of exits: 32505
Total time (us): 8290386
Avg time (us): 255
Max time (us): 66519
Min time (nano): 272
reason: EXCEPTION_NMI isa:1 exit:0
Number of exits: 1
Total time (us): 5
Avg time (us): 5
Max time (us): 5
Min time (nano): 5452
reason: EXTERNAL_INTERRUPT isa:1 exit:1
Number of exits: 5840
Total time (us): 43813
Avg time (us): 7
Max time (us): 97
Min time (nano): 766
Migrated: 3
Preempted:
Number of exits: 12
On Fedora Host run
reason: INTERRUPT_WINDOW isa:1 exit:7
Number of exits: 1427
Total time (us): 2263
Avg time (us): 1
Max time (us): 16
Min time (nano): 382
reason: CPUID isa:1 exit:10
Number of exits: 65
Total time (us): 90
Avg time (us): 1
Max time (us): 7
Min time (nano): 596
reason: HLT isa:1 exit:12
Number of exits: 3132
Total time (us): 8149964
Avg time (us): 2602
Max time (us): 66519
Min time (nano): 468
Migrated: 1
Preempted:
Number of exits: 52
Total time (us): 544
Avg time (us): 10
Using trace-cmd agent
■ Run trace-cmd agent on the guests
● Listens on the vsocket for connections from the host
● Can start tracing for the guest
● Synchronizes timestamps with the host
■ Use the -A option on the host running trace-cmd record
● Will connect to the guest agent
● Can start tracing on both the host and the guest
● Negotiates timestamp synchronization to keep host and guest events in sync
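This requires the guest to have a virtio vsock device. A hedged example of adding one on the QEMU command line, with the guest CID set to 4 to match the agent output below:
-device vhost-vsock-pci,guest-cid=4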
On the Guest
# trace-cmd agent
listening on @4:823
(4 is the vsocket CID, 823 is the vsocket port)
On Host run
# trace-cmd record -e sched -e kvm -A @4:823 --name guest -e sched
ssh root@Fedora33 'taskset -a -c 1 sysbench --mysql-user=sbtest_user
--mysql_password=password --mysql-db=sbtest --tables=16 --table-size=10000 --threads=4
--time=10 --events=0 --report-interval=1 /usr/share/sysbench/oltp_read_write.lua run'
(-A @4:823 connects to the guest agent at vsocket CID 4, port 823)
Analyze Fedora sysbench
# kvm-exit -c sysbench trace.dat trace-guest.dat
vCPU 1: host_pid: 16185
Number of exits: 2548
Total time (us): 16112
Avg time (us): 6
Max time (us): 71
Min time (nano): 655
task sysbench: Total run time: 575048(us)
reason: EXTERNAL_INTERRUPT isa:1 exit:1
Number of exits: 692
Total time (us): 8430
Avg time (us): 12
Max time (us): 71
Min time (nano): 2644
reason: PENDING_INTERRUPT isa:1 exit:7
Number of exits: 118
Total time (us): 280
Avg time (us): 2
Max time (us): 6
Min time (nano): 655
Analyze Fedora sysbench
reason: MSR_WRITE isa:1 exit:32
Number of exits: 656
Total time (us): 2111
Avg time (us): 3
Max time (us): 18
Min time (nano): 832
reason: EPT_VIOLATION isa:1 exit:48
Number of exits: 500
Total time (us): 2012
Avg time (us): 4
Max time (us): 31
Min time (nano): 948
reason: PREEMPTION_TIMER isa:1 exit:52
Number of exits: 582
Total time (us): 3277
Avg time (us): 5
Max time (us): 16
Min time (nano): 1693
Analyze Fedora mysqld
# kvm-exit -c mysqld trace.dat trace-guest.dat
vCPU 1: host_pid: 16185
Number of exits: 10479
Total time (us): 66016
Avg time (us): 6
Max time (us): 72
Min time (nano): 457
task mysqld: Total run time: 2363606(us)
reason: EXTERNAL_INTERRUPT isa:1 exit:1
Number of exits: 2583
Total time (us): 30788
Avg time (us): 11
Max time (us): 72
Min time (nano): 814
reason: PENDING_INTERRUPT isa:1 exit:7
Number of exits: 663
Total time (us): 1174
Avg time (us): 1
Max time (us): 9
Min time (nano): 457
Analyze Fedora mysqld
reason: MSR_WRITE isa:1 exit:32
Number of exits: 2956
Total time (us): 10012
Avg time (us): 3
Max time (us): 54
Min time (nano): 534
reason: EPT_VIOLATION isa:1 exit:48
Number of exits: 1571
Total time (us): 7942
Avg time (us): 5
Max time (us): 25
Min time (nano): 733
reason: EPT_MISCONFIG isa:1 exit:49
Number of exits: 291
Total time (us): 3448
Avg time (us): 11
Max time (us): 29
Min time (nano): 4022
reason: PREEMPTION_TIMER isa:1 exit:52
Number of exits: 2415
Total time (us): 12651
Avg time (us): 5
Max time (us): 16
Min time (nano): 731
Analyze Debian sysbench
# kvm-exit -c sysbench trace.dat trace-guest.dat
vCPU 1: host_pid: 613977
Number of exits: 5287
Total time (us): 26484
Avg time (us): 5
Max time (us): 978
Min time (nano): 283
task sysbench: Total run time: 1361148(us)
reason: EXTERNAL_INTERRUPT isa:1 exit:1
Number of exits: 1236
Total time (us): 10163
Avg time (us): 8
Max time (us): 978
Min time (nano): 590
reason: PENDING_INTERRUPT isa:1 exit:7
Number of exits: 228
Total time (us): 195
Avg time (us): 0
Max time (us): 3
Min time (nano): 408
Analyze Debian sysbench
reason: MSR_WRITE isa:1 exit:32
Number of exits: 568
Total time (us): 1744
Avg time (us): 3
Max time (us): 34
Min time (nano): 354
reason: PAUSE_INSTRUCTION isa:1 exit:40
Number of exits: 1
Total time (us): 2
Avg time (us): 2
Max time (us): 2
Min time (nano): 2829
reason: EPT_VIOLATION isa:1 exit:48
Number of exits: 2923
Total time (us): 13477
Avg time (us): 4
Max time (us): 411
Min time (nano): 513
reason: PREEMPTION_TIMER isa:1 exit:52
Number of exits: 313
Total time (us): 886
Avg time (us): 2
Max time (us): 21
Min time (nano): 831
Analyze Debian mariadbd
# kvm-exit -c mariadbd trace.dat trace-guest.dat
vCPU 1: host_pid: 613977
Number of exits: 17846
Total time (us): 132184
Avg time (us): 7
Max time (us): 31001
Min time (nano): 309
task mariadbd: Total run time: 7045710(us)
reason: EXCEPTION_NMI isa:1 exit:0
Number of exits: 2
Total time (us): 12
Avg time (us): 6
Max time (us): 6
Min time (nano): 6034
reason: EXTERNAL_INTERRUPT isa:1 exit:1
Number of exits: 6320
Total time (us): 43236
Avg time (us): 6
Max time (us): 598
Min time (nano): 572
Analyze Debian mariadbd
reason: PENDING_INTERRUPT isa:1 exit:7
Number of exits: 535
Total time (us): 464
Avg time (us): 0
Max time (us): 6
Min time (nano): 417
reason: MSR_WRITE isa:1 exit:32
Number of exits: 3300
Total time (us): 16468
Avg time (us): 4
Max time (us): 6177
Min time (nano): 309
reason: EPT_VIOLATION isa:1 exit:48
Number of exits: 2667
Total time (us): 48480
Avg time (us): 18
Max time (us): 31001
Min time (nano): 594
reason: EPT_MISCONFIG isa:1 exit:49
Number of exits: 3237
Total time (us): 18510
Avg time (us): 5
Max time (us): 264
Min time (nano): 1308
Analyze Debian mariadbd Guest functions
# kvm-exit -c mariadbd -f trace.dat trace-guest.dat
vCPU 1: host_pid: 613977
Number of exits: 17846
Total time (us): 132184
Avg time (us): 7
Max time (us): 31001
Min time (nano): 309
task mariadbd: Total run time: 7045710(us)
reason: EXCEPTION_NMI isa:1 exit:0
Number of exits: 2
Total time (us): 12
Avg time (us): 6
Max time (us): 6
Min time (nano): 6034
reason: EXTERNAL_INTERRUPT isa:1 exit:1
Number of exits: 6320
Total time (us): 43236
Avg time (us): 6
Max time (us): 598
Min time (nano): 572
Analyze Debian mariadbd Guest functions
reason: MSR_WRITE native_write_msr+0x4 isa:1 exit:32
Number of exits: 3300
Total time (us): 16468
Avg time (us): 4
Max time (us): 6177
Min time (nano): 309
reason: EPT_VIOLATION isa:1 exit:48
Number of exits: 475
Total time (us): 1066
Avg time (us): 2
Max time (us): 26
Min time (nano): 594
reason: EPT_VIOLATION memcg_slab_post_alloc_hook+0x127 isa:1 exit:48
Number of exits: 2192
Total time (us): 47413
Avg time (us): 21
Max time (us): 31001
Min time (nano): 1191
reason: EPT_MISCONFIG iowrite16+0x9 isa:1 exit:49
Number of exits: 3237
Total time (us): 18510
Avg time (us): 5
Max time (us): 264
Min time (nano): 1308
Using libtracecmd for task analysis
■ Another tool I created for task analysis
● https://rostedt.org/code/task-time.c
■ This examines the times the task:
● Runs on each CPU
● Is preempted by other tasks (and shows when threads preempt each other)
● Is blocked on I/O
● Is sleeping (but could be blocked on a pthread_mutex)
Analyze Debian mariadbd task
# task-time -c mariadbd trace.dat
Task: mariadbd
Total run time (us): 7374469
Thread preempt time (us): 23500051
Total preempt time (us): 1375971 (-22124080)
Total blocked time (us): 2432569
Total sleep time (us): 49682796
thread id: 493170
Total run time (us): 215
Total preempt time (us): 0
Total blocked time (us): 0
Total sleep time (us): 9450
thread id: 493171
Total run time (us): 428
Total preempt time (us): 0
Total blocked time (us): 0
Total sleep time (us): 9605371
[..]
# task-time -c mariadbd trace-guest.dat
Task: mariadbd
Total run time (us): 7045710
Thread preempt time (us): 20535866
Total preempt time (us): 1327352 (-19208514)
Total blocked time (us): 3352464
Total sleep time (us): 37335578
thread id: 1206
Total run time (us): 233
Total preempt time (us): 0
Total blocked time (us): 0
Total sleep time (us): 4113
thread id: 1207
Total run time (us): 375
Total preempt time (us): 0
Total blocked time (us): 0
Total sleep time (us): 10001415
[..]
Blocked for 33.0% of its run time on the host vs 47.6% on the guest
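Those blocked percentages come from dividing the blocked time by the run time in each trace:
Host:  2432569 / 7374469 ≈ 33.0%
Guest: 3352464 / 7045710 ≈ 47.6%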
Using KernelShark
■ KernelShark can show host/guest interactions
● Follow this tutorial
■ https://rostedt.org/host-guest-tutorial/
● kernelshark trace.dat -a trace-guest.dat
Analyze Virtual Machine Overhead Compared to Bare Metal with Tracing
Brought to you by
Steven Rostedt
rostedt@goodmis.org
@rostedt
More Related Content

PDF
Optimizing Servers for High-Throughput and Low-Latency at Dropbox
PPTX
DPDK KNI interface
PDF
Intel dpdk Tutorial
PDF
Fun with Network Interfaces
PDF
Achieving the ultimate performance with KVM
PDF
MySQL 5.5 Guide to InnoDB Status
PDF
Blazing Performance with Flame Graphs
PPTX
Cinder
Optimizing Servers for High-Throughput and Low-Latency at Dropbox
DPDK KNI interface
Intel dpdk Tutorial
Fun with Network Interfaces
Achieving the ultimate performance with KVM
MySQL 5.5 Guide to InnoDB Status
Blazing Performance with Flame Graphs
Cinder

What's hot (20)

PDF
Kernel Recipes 2019 - Faster IO through io_uring
PDF
Linux Profiling at Netflix
PDF
How to Design Indexes, Really
PPTX
Tuning linux for mongo db
PDF
QEMU Disk IO Which performs Better: Native or threads?
PPTX
Tuning Apache Kafka Connectors for Flink.pptx
PPTX
Linux networking
PDF
Anatomy of a Container: Namespaces, cgroups & Some Filesystem Magic - LinuxCon
PDF
Optimizing Kubernetes Resource Requests/Limits for Cost-Efficiency and Latenc...
PDF
MySQL Advanced Administrator 2021 - 네오클로바
PDF
The Linux Kernel Implementation of Pipes and FIFOs
PDF
Optimizing MariaDB for maximum performance
PDF
Percona XtraDB Cluster vs Galera Cluster vs MySQL Group Replication
PPTX
InfluxDb
PDF
How Linux Processes Your Network Packet - Elazar Leibovich
PDF
Ceph issue 해결 사례
PDF
Apache kafka 모니터링을 위한 Metrics 이해 및 최적화 방안
PPTX
Migrating from InnoDB and HBase to MyRocks at Facebook
PPTX
Introduction to DPDK
ODP
Ceph Day Melbourne - Troubleshooting Ceph
Kernel Recipes 2019 - Faster IO through io_uring
Linux Profiling at Netflix
How to Design Indexes, Really
Tuning linux for mongo db
QEMU Disk IO Which performs Better: Native or threads?
Tuning Apache Kafka Connectors for Flink.pptx
Linux networking
Anatomy of a Container: Namespaces, cgroups & Some Filesystem Magic - LinuxCon
Optimizing Kubernetes Resource Requests/Limits for Cost-Efficiency and Latenc...
MySQL Advanced Administrator 2021 - 네오클로바
The Linux Kernel Implementation of Pipes and FIFOs
Optimizing MariaDB for maximum performance
Percona XtraDB Cluster vs Galera Cluster vs MySQL Group Replication
InfluxDb
How Linux Processes Your Network Packet - Elazar Leibovich
Ceph issue 해결 사례
Apache kafka 모니터링을 위한 Metrics 이해 및 최적화 방안
Migrating from InnoDB and HBase to MyRocks at Facebook
Introduction to DPDK
Ceph Day Melbourne - Troubleshooting Ceph
Ad

Similar to Analyze Virtual Machine Overhead Compared to Bare Metal with Tracing (20)

PDF
Docker and friends at Linux Days 2014 in Prague
PDF
Lightweight Virtualization with Linux Containers and Docker | YaC 2013
PDF
Lightweight Virtualization with Linux Containers and Docker I YaC 2013
PDF
Reverse engineering Swisscom's Centro Grande Modem
PDF
PerfUG 3 - perfs système
PDF
Loadays managing my sql with percona toolkit
PDF
Practice and challenges from building IaaS
PDF
Haproxy - zastosowania
PDF
Linux Systems Performance 2016
PDF
Containers with systemd-nspawn
PPTX
Debugging linux issues with eBPF
PDF
Summit demystifying systemd1
PDF
Docker Introduction + what is new in 0.9
PDF
Docker Introduction, and what's new in 0.9 — Docker Palo Alto at RelateIQ
PDF
"Lightweight Virtualization with Linux Containers and Docker". Jerome Petazzo...
PDF
The New Systems Performance
PPT
PDF
The Art of Grey-Box Attack
PDF
Dynamic tracing of MariaDB on Linux - problems and solutions (MariaDB Server ...
PDF
Running Applications on the NetBSD Rump Kernel by Justin Cormack
Docker and friends at Linux Days 2014 in Prague
Lightweight Virtualization with Linux Containers and Docker | YaC 2013
Lightweight Virtualization with Linux Containers and Docker I YaC 2013
Reverse engineering Swisscom's Centro Grande Modem
PerfUG 3 - perfs système
Loadays managing my sql with percona toolkit
Practice and challenges from building IaaS
Haproxy - zastosowania
Linux Systems Performance 2016
Containers with systemd-nspawn
Debugging linux issues with eBPF
Summit demystifying systemd1
Docker Introduction + what is new in 0.9
Docker Introduction, and what's new in 0.9 — Docker Palo Alto at RelateIQ
"Lightweight Virtualization with Linux Containers and Docker". Jerome Petazzo...
The New Systems Performance
The Art of Grey-Box Attack
Dynamic tracing of MariaDB on Linux - problems and solutions (MariaDB Server ...
Running Applications on the NetBSD Rump Kernel by Justin Cormack
Ad

More from ScyllaDB (20)

PDF
Understanding The True Cost of DynamoDB Webinar
PDF
Database Benchmarking for Performance Masterclass: Session 2 - Data Modeling ...
PDF
Database Benchmarking for Performance Masterclass: Session 1 - Benchmarking F...
PDF
New Ways to Reduce Database Costs with ScyllaDB
PDF
Designing Low-Latency Systems with Rust and ScyllaDB: An Architectural Deep Dive
PDF
Powering a Billion Dreams: Scaling Meesho’s E-commerce Revolution with Scylla...
PDF
Leading a High-Stakes Database Migration
PDF
Achieving Extreme Scale with ScyllaDB: Tips & Tradeoffs
PDF
Securely Serving Millions of Boot Artifacts a Day by João Pedro Lima & Matt ...
PDF
How Agoda Scaled 50x Throughput with ScyllaDB by Worakarn Isaratham
PDF
How Yieldmo Cut Database Costs and Cloud Dependencies Fast by Todd Coleman
PDF
ScyllaDB: 10 Years and Beyond by Dor Laor
PDF
Reduce Your Cloud Spend with ScyllaDB by Tzach Livyatan
PDF
Migrating 50TB Data From a Home-Grown Database to ScyllaDB, Fast by Terence Liu
PDF
Vector Search with ScyllaDB by Szymon Wasik
PDF
Workload Prioritization: How to Balance Multiple Workloads in a Cluster by Fe...
PDF
Two Leading Approaches to Data Virtualization, and Which Scales Better? by Da...
PDF
Scaling a Beast: Lessons from 400x Growth in a High-Stakes Financial System b...
PDF
Object Storage in ScyllaDB by Ran Regev, ScyllaDB
PDF
Lessons Learned from Building a Serverless Notifications System by Srushith R...
Understanding The True Cost of DynamoDB Webinar
Database Benchmarking for Performance Masterclass: Session 2 - Data Modeling ...
Database Benchmarking for Performance Masterclass: Session 1 - Benchmarking F...
New Ways to Reduce Database Costs with ScyllaDB
Designing Low-Latency Systems with Rust and ScyllaDB: An Architectural Deep Dive
Powering a Billion Dreams: Scaling Meesho’s E-commerce Revolution with Scylla...
Leading a High-Stakes Database Migration
Achieving Extreme Scale with ScyllaDB: Tips & Tradeoffs
Securely Serving Millions of Boot Artifacts a Day by João Pedro Lima & Matt ...
How Agoda Scaled 50x Throughput with ScyllaDB by Worakarn Isaratham
How Yieldmo Cut Database Costs and Cloud Dependencies Fast by Todd Coleman
ScyllaDB: 10 Years and Beyond by Dor Laor
Reduce Your Cloud Spend with ScyllaDB by Tzach Livyatan
Migrating 50TB Data From a Home-Grown Database to ScyllaDB, Fast by Terence Liu
Vector Search with ScyllaDB by Szymon Wasik
Workload Prioritization: How to Balance Multiple Workloads in a Cluster by Fe...
Two Leading Approaches to Data Virtualization, and Which Scales Better? by Da...
Scaling a Beast: Lessons from 400x Growth in a High-Stakes Financial System b...
Object Storage in ScyllaDB by Ran Regev, ScyllaDB
Lessons Learned from Building a Serverless Notifications System by Srushith R...

Recently uploaded (20)

PDF
Transform Your ITIL® 4 & ITSM Strategy with AI in 2025.pdf
PPT
Module 1.ppt Iot fundamentals and Architecture
PDF
Univ-Connecticut-ChatGPT-Presentaion.pdf
PDF
DASA ADMISSION 2024_FirstRound_FirstRank_LastRank.pdf
PDF
Microsoft Solutions Partner Drive Digital Transformation with D365.pdf
PDF
A comparative study of natural language inference in Swahili using monolingua...
PDF
STKI Israel Market Study 2025 version august
PPTX
1. Introduction to Computer Programming.pptx
PDF
2021 HotChips TSMC Packaging Technologies for Chiplets and 3D_0819 publish_pu...
PDF
Profit Center Accounting in SAP S/4HANA, S4F28 Col11
PDF
Developing a website for English-speaking practice to English as a foreign la...
PPTX
O2C Customer Invoices to Receipt V15A.pptx
PPTX
Modernising the Digital Integration Hub
PDF
A contest of sentiment analysis: k-nearest neighbor versus neural network
PDF
project resource management chapter-09.pdf
PPTX
Group 1 Presentation -Planning and Decision Making .pptx
PDF
Zenith AI: Advanced Artificial Intelligence
PDF
Enhancing emotion recognition model for a student engagement use case through...
PPTX
Tartificialntelligence_presentation.pptx
PPTX
TechTalks-8-2019-Service-Management-ITIL-Refresh-ITIL-4-Framework-Supports-Ou...
Transform Your ITIL® 4 & ITSM Strategy with AI in 2025.pdf
Module 1.ppt Iot fundamentals and Architecture
Univ-Connecticut-ChatGPT-Presentaion.pdf
DASA ADMISSION 2024_FirstRound_FirstRank_LastRank.pdf
Microsoft Solutions Partner Drive Digital Transformation with D365.pdf
A comparative study of natural language inference in Swahili using monolingua...
STKI Israel Market Study 2025 version august
1. Introduction to Computer Programming.pptx
2021 HotChips TSMC Packaging Technologies for Chiplets and 3D_0819 publish_pu...
Profit Center Accounting in SAP S/4HANA, S4F28 Col11
Developing a website for English-speaking practice to English as a foreign la...
O2C Customer Invoices to Receipt V15A.pptx
Modernising the Digital Integration Hub
A contest of sentiment analysis: k-nearest neighbor versus neural network
project resource management chapter-09.pdf
Group 1 Presentation -Planning and Decision Making .pptx
Zenith AI: Advanced Artificial Intelligence
Enhancing emotion recognition model for a student engagement use case through...
Tartificialntelligence_presentation.pptx
TechTalks-8-2019-Service-Management-ITIL-Refresh-ITIL-4-Framework-Supports-Ou...

Analyze Virtual Machine Overhead Compared to Bare Metal with Tracing

  • 1. Brought to you by Analyze Virtual Machine Overhead Compared to Bare Metal with Tracing Steven Rostedt Software Engineer at Google Inc
  • 2. Steven Rostedt Software Engineer at Google ■ One of the original authors of the PREEMPT_RT patch set ■ Creator and maintainer of ftrace (the official Linux tracer) ■ Creator of “make localmodconfig” ■ Creator and maintainer of “ktest.pl” Linux testing framework
  • 3. Using Virtual Machines Pros: ■ Gives more flexibility ● Can migrate from one machine to another ● Duplicate VMs ● Can save them (archives)
  • 4. Using Virtual Machines Pros: ■ Gives more flexibility ● Can migrate from one machine to another ● Duplicate VMs ● Can save them (archives) ■ Can easily shutdown and restart ● No lengthy BIOS
  • 5. Using Virtual Machines Pros: ■ Gives more flexibility ● Can migrate from one machine to another ● Duplicate VMs ● Can save them (archives) ■ Can easily shutdown and restart ● No lengthy BIOS ■ More “secure” ● Better isolation of tasks
  • 6. Using Virtual Machines Cons: ■ Takes up more memory ● Requires two operating systems on the machine
  • 7. Using Virtual Machines Cons: ■ Takes up more memory ● Requires two operating systems on the machine ■ Adds overhead ● There’s indirection between the VM and the devices
  • 8. Virtual Machines Overhead How bad is it really?
  • 9. Virtual Machines Overhead How bad is it really? ■ The pros usually outweigh the cons ● We are always trying to improve
  • 10. Virtual Machines Overhead How bad is it really? ■ The pros usually outweigh the cons ● We are always trying to improve ■ Where to look for that improvement ● Usually anytime the hypervisor needs to do work for the VM ● Utilize virtio more ■ Virtual devices that take advantage of the virtual environment ■ Easier to implement, and less overhead than simulating real devices
  • 11. Tools to determine Virtual Machine Overhead As I’m the ftrace maintainer, I’ll focus on ftrace tracing solutions.
  • 12. Tools to determine Virtual Machine Overhead As I’m the ftrace maintainer, I’ll focus on ftrace tracing solutions. ■ trace-cmd ● A front end CLI tool to interact with the Linux kernel tracing facility
  • 13. Tools to determine Virtual Machine Overhead As I’m the ftrace maintainer, I’ll focus on ftrace tracing solutions. ■ trace-cmd ● A front end CLI tool to interact with the Linux kernel tracing facility ● Can start tracing and examine the live events
  • 14. Tools to determine Virtual Machine Overhead As I’m the ftrace maintainer, I’ll focus on ftrace tracing solutions. ■ trace-cmd ● A front end CLI tool to interact with the Linux kernel tracing facility ● Can start tracing and examine the live events ● Can record to a file for post processing
  • 15. Tools to determine Virtual Machine Overhead As I’m the ftrace maintainer, I’ll focus on ftrace tracing solutions. ■ trace-cmd ● A front end CLI tool to interact with the Linux kernel tracing facility ● Can start tracing and examine the live events ● Can record to a file for post processing ● This talk will focus on the post processing
  • 16. What to look at for Virtual Machine Overhead Databases are a common critical path of virtual machine services ■ sysbench ● https://github.com/akopytov/sysbench ● Available on most distributions ● Has a mysql benchmark ● Good to compare different machines (in this case, VM vs Bare-metal)
  • 17. What to look at for Virtual Machine Overhead Databases are a common critical path of virtual machine services ■ sysbench ● https://github.com/akopytov/sysbench ● Available on most distributions ● Has a mysql benchmark ● Good to compare different machines (in this case, VM vs Bare-metal) ■ mysql / MariaDB ● In Debian the “mysql” commands are run by tasks called mariadbd ● In Fedora the “mysql” commands are run by tasks called “mysqld” ■ But it still is MariaDB and not mysql!
  • 18. The setup Using two bare-metal machines. One with Fedora 33 the other is Debian “testing” ■ Fedora 33 ● Kernel: 5.14.18-100.fc33.x86_64 ● File system: ext4 ● 8 CPUs (4 cores / 2 hyperthreaded) ● 32Gs RAM ● VM - same setup but only 4 CPUs / 2G RAM ■ Debian testing (from a month ago) ● Kernel: 5.18.0-3-amd64 ● File system: ext4 ● 6 CPUs (6 cores) ● 16Gs RAM ● VM - same setup but only 4 CPUs / 2G RAM
  • 20. The setup To make it even, limit everything to a single CPU ■ taskset ● Sets the affinity of the tasks ● Set everything to CPU 1 ■ CPU 0 usually has house keeping tasks ● Set the database server (mysqld / mariadbd) to CPU 1 ● Set the sysbench application to CPU 1
  • 21. Set the database server to CPU 1 # ps aux | grep mysqld mysql 20539 0.1 3.4 2289088 131892 ? Ssl 19:36 0:02 /usr/libexec/mysqld --basedir=/usr root 20925 0.0 0.0 221440 860 pts/3 S+ 20:18 0:00 grep --color=auto mysqld # taskset -a -pc 1 20539
  • 22. Fedora Guest run # taskset -a -c 1 sysbench --db-driver=mysql --mysql-user=sbtest_user --mysql_password=password --mysql-db=sbtest --tables=16 --table-size=10000 --threads=4 --time=10 --events=0 --report-interval=1 /usr/share/sysbench/oltp_read_write.lua run [ 1s ] thds: 4 tps: 41.84 qps: 912.52 (r/w/o: 641.55/160.39/110.58) lat (ms,95%): 235.74 err/s: 0.00 reconn/s: 0.00 [ 2s ] thds: 4 tps: 55.04 qps: 1070.71 (r/w/o: 748.50/185.12/137.09) lat (ms,95%): 75.82 err/s: 0.00 reconn/s: 0.00 [ 3s ] thds: 4 tps: 56.00 qps: 1149.94 (r/w/o: 805.95/202.99/140.99) lat (ms,95%): 82.96 err/s: 0.00 reconn/s: 0.00 [ 4s ] thds: 4 tps: 66.00 qps: 1282.04 (r/w/o: 898.03/220.01/164.01) lat (ms,95%): 82.96 err/s: 0.00 reconn/s: 0.00 [ 5s ] thds: 4 tps: 66.99 qps: 1377.90 (r/w/o: 963.93/243.98/169.99) lat (ms,95%): 82.96 err/s: 0.00 reconn/s: 0.00 [ 6s ] thds: 4 tps: 49.00 qps: 979.94 (r/w/o: 685.96/171.99/121.99) lat (ms,95%): 161.51 err/s: 0.00 reconn/s: 0.00 [ 7s ] thds: 4 tps: 59.00 qps: 1180.09 (r/w/o: 826.06/206.02/148.01) lat (ms,95%): 75.82 err/s: 0.00 reconn/s: 0.00 [ 8s ] thds: 4 tps: 58.00 qps: 1143.09 (r/w/o: 807.06/193.01/143.01) lat (ms,95%): 82.96 err/s: 0.00 reconn/s: 0.00 [ 9s ] thds: 4 tps: 55.99 qps: 1136.89 (r/w/o: 788.92/206.98/140.99) lat (ms,95%): 84.47 err/s: 0.00 reconn/s: 0.00 [ 10s ] thds: 4 tps: 66.00 qps: 1320.01 (r/w/o: 924.01/230.00/166.00) lat (ms,95%): 74.46 err/s: 0.00 reconn/s: 0.00
  • 23. Fedora Host run # taskset -a -c 1 sysbench --db-driver=mysql --mysql-user=sbtest_user --mysql_password=password --mysql-db=sbtest --tables=16 --table-size=10000 --threads=4 --time=10 --events=0 --report-interval=1 /usr/share/sysbench/oltp_read_write.lua run [ 1s ] thds: 4 tps: 51.81 qps: 1112.01 (r/w/o: 781.19/195.30/135.51) lat (ms,95%): 84.47 err/s: 0.00 reconn/s: 0.00 [ 2s ] thds: 4 tps: 25.01 qps: 500.22 (r/w/o: 350.15/88.04/62.03) lat (ms,95%): 383.33 err/s: 0.00 reconn/s: 0.00 [ 3s ] thds: 4 tps: 78.00 qps: 1560.01 (r/w/o: 1092.00/272.00/196.00) lat (ms,95%): 66.84 err/s: 0.00 reconn/s: 0.00 [ 4s ] thds: 4 tps: 68.00 qps: 1359.99 (r/w/o: 952.00/238.00/170.00) lat (ms,95%): 116.80 err/s: 0.00 reconn/s: 0.00 [ 5s ] thds: 4 tps: 71.00 qps: 1420.00 (r/w/o: 994.00/249.00/177.00) lat (ms,95%): 66.84 err/s: 0.00 reconn/s: 0.00 [ 6s ] thds: 4 tps: 59.00 qps: 1180.04 (r/w/o: 826.03/206.01/148.00) lat (ms,95%): 75.82 err/s: 0.00 reconn/s: 0.00 [ 7s ] thds: 4 tps: 56.00 qps: 1119.97 (r/w/o: 783.98/195.99/140.00) lat (ms,95%): 82.96 err/s: 0.00 reconn/s: 0.00 [ 8s ] thds: 4 tps: 52.00 qps: 1040.01 (r/w/o: 728.01/182.00/130.00) lat (ms,95%): 92.42 err/s: 0.00 reconn/s: 0.00 [ 9s ] thds: 4 tps: 70.00 qps: 1399.98 (r/w/o: 979.99/245.00/175.00) lat (ms,95%): 92.42 err/s: 0.00 reconn/s: 0.00 [ 10s ] thds: 4 tps: 80.00 qps: 1599.99 (r/w/o: 1119.99/280.00/200.00) lat (ms,95%): 51.02 err/s: 0.00 reconn/s: 0.00
  • 24. Fedora Host vs Guest SQL statistics: queries performed: read: 8610 write: 2152 other: 1538 total: 12300 transactions: 615 (61.23 per sec.) queries: 12300 (1224.63 per sec.) ignored errors: 0 (0.00 per sec.) reconnects: 0 (0.00 per sec.) General statistics: total time: 10.0411s total number of events: 615 Host Guest 8092 2021 1447 11560 578 (57.29 per sec.) 11560 (1145.78 per sec.) 0 (0.00 per sec.) 0 (0.00 per sec.) 10.0863s 578
  • 25. Fedora Host vs Guest SQL statistics: queries performed: read: 8610 write: 2152 other: 1538 total: 12300 transactions: 615 (61.23 per sec.) queries: 12300 (1224.63 per sec.) ignored errors: 0 (0.00 per sec.) reconnects: 0 (0.00 per sec.) General statistics: total time: 10.0411s total number of events: 615 Host Guest 8092 2021 1447 11560 578 (57.29 per sec.) 11560 (1145.78 per sec.) 0 (0.00 per sec.) 0 (0.00 per sec.) 10.0863s 578 -6.0% -6.5% -5.9% -6.0% -6.0% -6.0% +0.5% -6.0%
  • 26. Fedora Host vs Guest Latency (ms): min: 32.99 avg: 65.21 max: 380.68 95th percentile: 84.47 sum: 40104.21 Threads fairness: events (avg/stddev): 153.7500/0.43 execution time (avg/stddev): 10.0261/0.01 Host Guest 33.33 69.64 283.24 82.96 40251.24 144.5000/2.06 10.0628/0.02
  • 27. Fedora Host vs Guest Latency (ms): min: 32.99 avg: 65.21 max: 380.68 95th percentile: 84.47 sum: 40104.21 Threads fairness: events (avg/stddev): 153.7500/0.43 execution time (avg/stddev): 10.0261/0.01 Host Guest 33.33 69.64 283.24 82.96 40251.24 144.5000/2.06 10.0628/0.02 +6.8% -6.0%
  • 28. Debian Guest run # taskset -a -c 1 sysbench --db-driver=mysql --mysql-user=sbtest_user --mysql_password=password --mysql-db=sbtest --tables=16 --table-size=10000 --threads=4 --time=10 --events=0 --report-interval=1 /usr/share/sysbench/oltp_read_write.lua run [ 1s ] thds: 4 tps: 523.07 qps: 10498.35 (r/w/o: 7351.94/2096.28/1050.13) lat (ms,95%): 8.58 err/s: 0.00 reconn/s: 0.00 [ 2s ] thds: 4 tps: 725.42 qps: 14522.41 (r/w/o: 10173.89/2897.68/1450.84) lat (ms,95%): 6.91 err/s: 0.00 reconn/s: 0.00 [ 3s ] thds: 4 tps: 567.00 qps: 11341.94 (r/w/o: 7939.96/2267.99/1133.99) lat (ms,95%): 9.73 err/s: 0.00 reconn/s: 0.00 [ 4s ] thds: 4 tps: 732.95 qps: 14663.00 (r/w/o: 10257.30/2939.80/1465.90) lat (ms,95%): 6.79 err/s: 0.00 reconn/s: 0.00 [ 5s ] thds: 4 tps: 702.00 qps: 14032.01 (r/w/o: 9828.00/2800.00/1404.00) lat (ms,95%): 7.17 err/s: 0.00 reconn/s: 0.00 [ 6s ] thds: 4 tps: 725.04 qps: 14505.72 (r/w/o: 10155.51/2900.14/1450.07) lat (ms,95%): 6.91 err/s: 0.00 reconn/s: 0.00 [ 7s ] thds: 4 tps: 739.97 qps: 14821.47 (r/w/o: 10365.63/2975.89/1479.95) lat (ms,95%): 6.55 err/s: 0.00 reconn/s: 0.00 [ 8s ] thds: 4 tps: 587.04 qps: 11740.87 (r/w/o: 8218.61/2348.17/1174.09) lat (ms,95%): 7.17 err/s: 0.00 reconn/s: 0.00 [ 9s ] thds: 4 tps: 726.96 qps: 14520.29 (r/w/o: 10174.50/2891.86/1453.93) lat (ms,95%): 6.79 err/s: 0.00 reconn/s: 0.00 [ 10s ] thds: 4 tps: 711.03 qps: 14239.67 (r/w/o: 9957.47/2860.14/1422.07) lat (ms,95%): 7.04 err/s: 0.00 reconn/s: 0.00
  • 29. Debian Host run # taskset -a -c 1 sysbench --db-driver=mysql --mysql-user=sbtest_user --mysql_password=password --mysql-db=sbtest --tables=16 --table-size=10000 --threads=4 --time=10 --events=0 --report-interval=1 /usr/share/sysbench/oltp_read_write.lua run [ 1s ] thds: 4 tps: 651.26 qps: 13100.71 (r/w/o: 9173.28/2620.94/1306.49) lat (ms,95%): 5.18 err/s: 0.00 reconn/s: 0.00 [ 2s ] thds: 4 tps: 899.12 qps: 17974.32 (r/w/o: 12583.63/3592.45/1798.24) lat (ms,95%): 5.18 err/s: 0.00 reconn/s: 0.00 [ 3s ] thds: 4 tps: 961.00 qps: 19211.04 (r/w/o: 13457.03/3832.01/1922.00) lat (ms,95%): 5.00 err/s: 0.00 reconn/s: 0.00 [ 4s ] thds: 4 tps: 944.95 qps: 18897.06 (r/w/o: 13227.34/3779.81/1889.91) lat (ms,95%): 5.00 err/s: 0.00 reconn/s: 0.00 [ 5s ] thds: 4 tps: 806.04 qps: 16126.79 (r/w/o: 11284.55/3230.16/1612.08) lat (ms,95%): 5.18 err/s: 0.00 reconn/s: 0.00 [ 6s ] thds: 4 tps: 965.00 qps: 19288.99 (r/w/o: 13504.99/3854.00/1930.00) lat (ms,95%): 5.00 err/s: 0.00 reconn/s: 0.00 [ 7s ] thds: 4 tps: 965.00 qps: 19277.07 (r/w/o: 13487.05/3860.01/1930.01) lat (ms,95%): 4.82 err/s: 0.00 reconn/s: 0.00 [ 8s ] thds: 4 tps: 954.99 qps: 19122.84 (r/w/o: 13392.89/3819.97/1909.98) lat (ms,95%): 5.00 err/s: 0.00 reconn/s: 0.00 [ 9s ] thds: 4 tps: 972.01 qps: 19436.10 (r/w/o: 13604.07/3888.02/1944.01) lat (ms,95%): 4.82 err/s: 0.00 reconn/s: 0.00 [ 10s ] thds: 4 tps: 964.98 qps: 19320.60 (r/w/o: 13521.72/3868.92/1929.96) lat (ms,95%): 4.82 err/s: 0.00 reconn/s: 0.00
  • 30. Debian Host vs Guest SQL statistics: queries performed: read: 127232 write: 36352 other: 18176 total: 181760 transactions: 9088 (908.37 per sec.) queries: 181760 (18167.33 per sec.) ignored errors: 0 (0.00 per sec.) reconnects: 0 (0.00 per sec.) General statistics: total time: 10.0037s total number of events: 9088 Host Guest 94430 26980 13490 134900 6745 (673.91 per sec.) 134900 (13478.17 per sec.) 0 (0.00 per sec.) 0 (0.00 per sec.) 10.0075s 6745
  • 31. Debian Host vs Guest SQL statistics: queries performed: read: 127232 write: 36352 other: 18176 total: 181760 transactions: 9088 (908.37 per sec.) queries: 181760 (18167.33 per sec.) ignored errors: 0 (0.00 per sec.) reconnects: 0 (0.00 per sec.) General statistics: total time: 10.0037s total number of events: 9088 Host Guest 94430 26980 13490 134900 6745 (673.91 per sec.) 134900 (13478.17 per sec.) 0 (0.00 per sec.) 0 (0.00 per sec.) 10.0075s 6745 -25.8%! -25.8%! -25.8%! -25.8%! -25.8%! -25.8%! +0.03% -25.8%!
  • 32. Debian Host vs Guest Latency (ms): min: 1.44 avg: 4.40 max: 162.14 95th percentile: 5.00 sum: 39998.32 Threads fairness: events (avg/stddev): 2272.0000/1.87 execution time (avg/stddev): 9.9996/0.00 Host Guest 3.31 5.93 167.81 7.04 40007.12 1686.2500/1.09 10.0018/0.00 +34.8%! -25.8%
  • 33. Using trace-cmd ■ There are over a 100 KVM events (Host events to handle guests)
  • 34. Using trace-cmd ■ There are over a 100 KVM events (Host events to handle guests) ■ Traces when guests enter and leave the virtual environment
  • 35. Using trace-cmd ■ There are over a 100 KVM events (Host events to handle guests) ■ Traces when guests enter and leave the virtual environment ■ Shows why guests exit and what the host is doing for the guest
  • 36. On Fedora trace Guest from Host # trace-cmd record -e sched -e kvm ssh root@Fedora33 ‘taskset -a -c 1 sysbench mysql-user=sbtest_user --mysql_password=password --mysql-db=sbtest --tables=16 --table-size=10000 --threads=4 --time=10 --events=0 --report-interval=1 /usr/share/sysbench/oltp_read_write.lua run’
  • 37. On Fedora show Guest report # trace-cmd report version = 7 cpus=8 ssh-3108 [001] 8832.004785: sched_process_exec: filename=/usr/bin/ssh pid=3108 old_pid=3108 ssh-3108 [001] 8832.004850: sched_stat_runtime: comm=ssh pid=3108 runtime=514120 [ns] vruntime=165181151 [ns] ssh-3108 [001] 8832.006849: sched_stat_runtime: comm=ssh pid=3108 runtime=992297 [ns] vruntime=167169721 [ns] ssh-3108 [001] 8832.007849: sched_stat_runtime: comm=ssh pid=3108 runtime=997254 [ns] vruntime=168166975 [ns] ssh-3108 [001] 8832.007851: sched_wake_idle_without_ipi: cpu=0 ssh-3108 [001] 8832.008092: sched_waking: comm=vhost-2426 pid=2440 prio=120 target_cpu=001 ssh-3108 [001] 8832.008094: sched_migrate_task: comm=vhost-2426 pid=2440 prio=120 orig_cpu=1 dest_cpu=2 ssh-3108 [001] 8832.008096: sched_wake_idle_without_ipi: cpu=2 ssh-3108 [001] 8832.008096: sched_wakeup: vhost-2426:2440 [120] CPU:002 ssh-3108 [001] 8832.008098: sched_stat_runtime: comm=ssh pid=3108 runtime=243443 [ns] vruntime=168410418 [ns] ssh-3108 [001] 8832.008101: sched_switch: ssh:3108 [120] S ==> swapper/1:0 [120] <idle>-0 [002] 8832.008128: sched_switch: swapper/2:0 [120] R ==> vhost-2426:2440 [120] vhost-2426-2440 [002] 8832.008142: kvm_msi_set_irq: dst 1 vec 44 (Fixed|physical|edge) vhost-2426-2440 [002] 8832.008144: kvm_apic_accept_irq: apicid 1 vec 44 (Fixed|edge) vhost-2426-2440 [002] 8832.008144: sched_waking: comm=CPU 1/KVM pid=2443 prio=120 target_cpu=007 vhost-2426-2440 [002] 8832.008151: sched_wake_idle_without_ipi: cpu=7 vhost-2426-2440 [002] 8832.008152: sched_wakeup: CPU 1/KVM:2443 [120] CPU:007
  • 38. Using libtracecmd ■ Is a library that comes with trace-cmd
  • 39. Using libtracecmd ■ Is a library that comes with trace-cmd ■ Allows you to write tools that can analyze trace.dat files ● These are the files that trace-cmd creates.
  • 40. Using libtracecmd ■ Is a library that comes with trace-cmd ■ Allows you to write tools that can analyze trace.dat files ● These are the files that trace-cmd creates. ■ This talk is not about how to use this library ● Only that it exists
  • 41. Using libtracecmd ■ Is a library that comes with trace-cmd ■ Allows you to write tools that can analyze trace.dat files ● These are the files that trace-cmd creates. ■ This talk is not about how to use this library ● Only that it exists ■ But we will use a tool I created with this library ● https://rostedt.org/code/kvm-exit.c
  • 42. Using libtracecmd ■ Is a library that comes with trace-cmd ■ Allows you to write tools that can analyze trace.dat files ● These are the files that trace-cmd creates. ■ This talk is not about how to use this library ● Only that it exists ■ But we will use a tool I created with this library ● https://rostedt.org/code/kvm-exit.c ■ This examines the kvm_exit and kvm_entry events
  • 43. On Fedora Host run # kvm-exit trace.dat vCPU 0: host_pid: 2442 Number of exits: 32505 Total time (us): 8290386 Avg time (us): 255 Max time (us): 66519 Min time (nano): 272 reason: EXCEPTION_NMI isa:1 exit:0 Number of exits: 1 Total time (us): 5 Avg time (us): 5 Max time (us): 5 Min time (nano): 5452 reason: EXTERNAL_INTERRUPT isa:1 exit:1 Number of exits: 5840 Total time (us): 43813 Avg time (us): 7 Max time (us): 97 Min time (nano): 766 Migrated: 3 Preempted: Number of exits: 12
  • 44. On Fedora Host run reason: INTERRUPT_WINDOW isa:1 exit:7 Number of exits: 1427 Total time (us): 2263 Avg time (us): 1 Max time (us): 16 Min time (nano): 382 reason: CPUID isa:1 exit:10 Number of exits: 65 Total time (us): 90 Avg time (us): 1 Max time (us): 7 Min time (nano): 596 reason: HLT isa:1 exit:12 Number of exits: 3132 Total time (us): 8149964 Avg time (us): 2602 Max time (us): 66519 Min time (nano): 468 Migrated: 1 Preempted: Number of exits: 52 Total time (us): 544 Avg time (us): 10
  • 45. Using trace-cmd agent ■ Run trace-cmd agent on the guests ● Listens on the vsocket for connections from the host ● Can start tracing for the guest ● Synchronizes timestamps with the host
  • 46. Using trace-cmd agent ■ Run trace-cmd agent on the guests ● Listens on the vsocket for connections from the host ● Can start tracing for the guest ● Synchronizes timestamps with the host ■ Use the -A option on the host running trace-cmd record ● Will connect to the guest agent ● Can start tracing on both the host and the guest ● Negotiates timestamp synchronization to keep host and guest events in sync
  • 47. On the Guest # trace-cmd agent listening on @4:823
• 48. On the Guest # trace-cmd agent listening on @4:823 (the "4" is the vsocket CID)
• 49. On the Guest # trace-cmd agent listening on @4:823 (the "4" is the vsocket CID, the "823" is the vsocket port)
• 50. On Host run # trace-cmd record -e sched -e kvm -A @4:823 --name guest -e sched ssh root@Fedora33 'taskset -a -c 1 sysbench --mysql-user=sbtest_user --mysql-password=password --mysql-db=sbtest --tables=16 --table-size=10000 --threads=4 --time=10 --events=0 --report-interval=1 /usr/share/sysbench/oltp_read_write.lua run'
• 51. On Host run # trace-cmd record -e sched -e kvm -A @4:823 --name guest -e sched ssh root@Fedora33 'taskset -a -c 1 sysbench --mysql-user=sbtest_user --mysql-password=password --mysql-db=sbtest --tables=16 --table-size=10000 --threads=4 --time=10 --events=0 --report-interval=1 /usr/share/sysbench/oltp_read_write.lua run' (in -A @4:823, "4" is the vsocket CID and "823" is the vsocket port)
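Roughly how that command line breaks down (based on how trace-cmd record describes the agent option): -e sched -e kvm enable the sched and kvm event subsystems on the host; -A @4:823 connects to the agent at vsocket CID 4, port 823, and the options that follow it (here --name guest -e sched) apply to that guest rather than the host; everything from ssh onward is the command whose lifetime bounds the recording, in this case launching the sysbench run inside the VM.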
• 52. Analyze Fedora sysbench
# kvm-exit -c sysbench trace.dat trace-guest.dat
vCPU 1: host_pid: 16185
  Number of exits: 2548   Total time (us): 16112   Avg time (us): 6   Max time (us): 71   Min time (nano): 655
  task sysbench: Total run time: 575048(us)
  reason: EXTERNAL_INTERRUPT isa:1 exit:1
    Number of exits: 692   Total time (us): 8430   Avg time (us): 12   Max time (us): 71   Min time (nano): 2644
  reason: PENDING_INTERRUPT isa:1 exit:7
    Number of exits: 118   Total time (us): 280   Avg time (us): 2   Max time (us): 6   Min time (nano): 655
• 53. Analyze Fedora sysbench
  reason: MSR_WRITE isa:1 exit:32
    Number of exits: 656   Total time (us): 2111   Avg time (us): 3   Max time (us): 18   Min time (nano): 832
  reason: EPT_VIOLATION isa:1 exit:48
    Number of exits: 500   Total time (us): 2012   Avg time (us): 4   Max time (us): 31   Min time (nano): 948
  reason: PREEMPTION_TIMER isa:1 exit:52
    Number of exits: 582   Total time (us): 3277   Avg time (us): 5   Max time (us): 16   Min time (nano): 1693
• 54. Analyze Fedora mysqld
# kvm-exit -c mysqld trace.dat trace-guest.dat
vCPU 1: host_pid: 16185
  Number of exits: 10479   Total time (us): 66016   Avg time (us): 6   Max time (us): 72   Min time (nano): 457
  task mysqld: Total run time: 2363606(us)
  reason: EXTERNAL_INTERRUPT isa:1 exit:1
    Number of exits: 2583   Total time (us): 30788   Avg time (us): 11   Max time (us): 72   Min time (nano): 814
  reason: PENDING_INTERRUPT isa:1 exit:7
    Number of exits: 663   Total time (us): 1174   Avg time (us): 1   Max time (us): 9   Min time (nano): 457
• 55. Analyze Fedora mysqld
  reason: MSR_WRITE isa:1 exit:32
    Number of exits: 2956   Total time (us): 10012   Avg time (us): 3   Max time (us): 54   Min time (nano): 534
  reason: EPT_VIOLATION isa:1 exit:48
    Number of exits: 1571   Total time (us): 7942   Avg time (us): 5   Max time (us): 25   Min time (nano): 733
  reason: EPT_MISCONFIG isa:1 exit:49
    Number of exits: 291   Total time (us): 3448   Avg time (us): 11   Max time (us): 29   Min time (nano): 4022
  reason: PREEMPTION_TIMER isa:1 exit:52
    Number of exits: 2415   Total time (us): 12651   Avg time (us): 5   Max time (us): 16   Min time (nano): 731
• 56. Analyze Debian sysbench
# kvm-exit -c sysbench trace.dat trace-guest.dat
vCPU 1: host_pid: 613977
  Number of exits: 5287   Total time (us): 26484   Avg time (us): 5   Max time (us): 978   Min time (nano): 283
  task sysbench: Total run time: 1361148(us)
  reason: EXTERNAL_INTERRUPT isa:1 exit:1
    Number of exits: 1236   Total time (us): 10163   Avg time (us): 8   Max time (us): 978   Min time (nano): 590
  reason: PENDING_INTERRUPT isa:1 exit:7
    Number of exits: 228   Total time (us): 195   Avg time (us): 0   Max time (us): 3   Min time (nano): 408
• 57. Analyze Debian sysbench
  reason: MSR_WRITE isa:1 exit:32
    Number of exits: 568   Total time (us): 1744   Avg time (us): 3   Max time (us): 34   Min time (nano): 354
  reason: PAUSE_INSTRUCTION isa:1 exit:40
    Number of exits: 1   Total time (us): 2   Avg time (us): 2   Max time (us): 2   Min time (nano): 2829
  reason: EPT_VIOLATION isa:1 exit:48
    Number of exits: 2923   Total time (us): 13477   Avg time (us): 4   Max time (us): 411   Min time (nano): 513
  reason: PREEMPTION_TIMER isa:1 exit:52
    Number of exits: 313   Total time (us): 886   Avg time (us): 2   Max time (us): 21   Min time (nano): 831
• 58. Analyze Debian mariadbd
# kvm-exit -c mariadbd trace.dat trace-guest.dat
vCPU 1: host_pid: 613977
  Number of exits: 17846   Total time (us): 132184   Avg time (us): 7   Max time (us): 31001   Min time (nano): 309
  task mariadbd: Total run time: 7045710(us)
  reason: EXCEPTION_NMI isa:1 exit:0
    Number of exits: 2   Total time (us): 12   Avg time (us): 6   Max time (us): 6   Min time (nano): 6034
  reason: EXTERNAL_INTERRUPT isa:1 exit:1
    Number of exits: 6320   Total time (us): 43236   Avg time (us): 6   Max time (us): 598   Min time (nano): 572
• 59. Analyze Debian mariadbd
  reason: PENDING_INTERRUPT isa:1 exit:7
    Number of exits: 535   Total time (us): 464   Avg time (us): 0   Max time (us): 6   Min time (nano): 417
  reason: MSR_WRITE isa:1 exit:32
    Number of exits: 3300   Total time (us): 16468   Avg time (us): 4   Max time (us): 6177   Min time (nano): 309
  reason: EPT_VIOLATION isa:1 exit:48
    Number of exits: 2667   Total time (us): 48480   Avg time (us): 18   Max time (us): 31001   Min time (nano): 594
  reason: EPT_MISCONFIG isa:1 exit:49
    Number of exits: 3237   Total time (us): 18510   Avg time (us): 5   Max time (us): 264   Min time (nano): 1308
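The bucket that stands out here is EPT_VIOLATION: 2667 exits costing 48480 us with a 31001 us worst case, far heavier than in the Fedora guest, and that is what the -f option on the next two slides is used to dig into.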
• 60. Analyze Debian mariadbd Guest functions
# kvm-exit -c mariadbd -f trace.dat trace-guest.dat
vCPU 1: host_pid: 613977
  Number of exits: 17846   Total time (us): 132184   Avg time (us): 7   Max time (us): 31001   Min time (nano): 309
  task mariadbd: Total run time: 7045710(us)
  reason: EXCEPTION_NMI isa:1 exit:0
    Number of exits: 2   Total time (us): 12   Avg time (us): 6   Max time (us): 6   Min time (nano): 6034
  reason: EXTERNAL_INTERRUPT isa:1 exit:1
    Number of exits: 6320   Total time (us): 43236   Avg time (us): 6   Max time (us): 598   Min time (nano): 572
• 61. Analyze Debian mariadbd Guest functions
  reason: MSR_WRITE native_write_msr+0x4 isa:1 exit:32
    Number of exits: 3300   Total time (us): 16468   Avg time (us): 4   Max time (us): 6177   Min time (nano): 309
  reason: EPT_VIOLATION isa:1 exit:48
    Number of exits: 475   Total time (us): 1066   Avg time (us): 2   Max time (us): 26   Min time (nano): 594
  reason: EPT_VIOLATION memcg_slab_post_alloc_hook+0x127 isa:1 exit:48
    Number of exits: 2192   Total time (us): 47413   Avg time (us): 21   Max time (us): 31001   Min time (nano): 1191
  reason: EPT_MISCONFIG iowrite16+0x9 isa:1 exit:49
    Number of exits: 3237   Total time (us): 18510   Avg time (us): 5   Max time (us): 264   Min time (nano): 1308
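The guest-function column makes the two expensive buckets easier to interpret: the EPT_MISCONFIG exits all come from iowrite16(), MMIO writes consistent with virtio queue notifications that must trap out to the host, while the bulk of the EPT_VIOLATION time sits behind memcg_slab_post_alloc_hook(), guest memory allocations touching pages the host has not yet backed. (That reading is inferred from the function names; the slide itself only shows the numbers.)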
  • 62. Using libtracecmd for task analysis ■ Another tool I created for task analysis ● https://rostedt.org/code/task-time.c
  • 63. Using libtracecmd for task analysis ■ Another tool I created for task analysis ● https://rostedt.org/code/task-time.c ■ This examines the times the task: ● Runs on each CPU
  • 64. Using libtracecmd for task analysis ■ Another tool I created for task analysis ● https://rostedt.org/code/task-time.c ■ This examines the times the task: ● Runs on each CPU ● Is preempted by other tasks (and shows when threads preempt each other)
  • 65. Using libtracecmd for task analysis ■ Another tool I created for task analysis ● https://rostedt.org/code/task-time.c ■ This examines the times the task: ● Runs on each CPU ● Is preempted by other tasks (and shows when threads preempt each other) ● Is blocked on I/O
  • 66. Using libtracecmd for task analysis ■ Another tool I created for task analysis ● https://rostedt.org/code/task-time.c ■ This examines the times the task: ● Runs on each CPU ● Is preempted by other tasks (and shows when threads preempt each other) ● Is blocked on I/O ● Is sleeping (but could be blocked on a pthread_mutex)
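How a tool can tell those cases apart is not magic: the sched_switch event records the state the task was in when it was switched out (prev_state), so "still runnable" means it was preempted, uninterruptible sleep usually means blocked on I/O, and interruptible sleep covers ordinary sleeping as well as futex/pthread_mutex waits. A rough sketch of that classification follows (an illustration, not the actual task-time.c logic; the state bit values match the kernel's TASK_* definitions):

/* prev-state-sketch.c — classify sched_switch prev_state values */
#include <stdio.h>

#define TASK_INTERRUPTIBLE   0x0001   /* kernel: sleeping, wakeable by a signal */
#define TASK_UNINTERRUPTIBLE 0x0002   /* kernel: blocked, typically on I/O */

static const char *classify_prev_state(unsigned long long prev_state)
{
	if (prev_state == 0)
		return "preempted (still runnable when scheduled out)";
	if (prev_state & TASK_UNINTERRUPTIBLE)
		return "blocked (likely I/O)";
	if (prev_state & TASK_INTERRUPTIBLE)
		return "sleeping (or waiting on a futex/pthread_mutex)";
	return "other (exiting, etc.)";
}

int main(void)
{
	/* A few representative prev_state values as seen in sched_switch */
	unsigned long long samples[] = { 0x0, 0x1, 0x2, 0x80 };
	for (int i = 0; i < 4; i++)
		printf("prev_state=0x%llx -> %s\n", samples[i],
		       classify_prev_state(samples[i]));
	return 0;
}

The per-category times then come from the timestamps: run time accumulates between a switch-in and the following switch-out, and the gap until the task runs again is charged to preempted, blocked, or sleeping according to the state it was switched out in.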
• 67. Analyze Debian mariadbd task
# task-time -c mariadbd trace.dat
Task: mariadbd
  Total run time (us): 7374469
  Thread preempt time (us): 23500051
  Total preempt time (us): 1375971 (-22124080)
  Total blocked time (us): 2432569
  Total sleep time (us): 49682796
  thread id: 493170
    Total run time (us): 215
    Total preempt time (us): 0
    Total blocked time (us): 0
    Total sleep time (us): 9450
  thread id: 493171
    Total run time (us): 428
    Total preempt time (us): 0
    Total blocked time (us): 0
    Total sleep time (us): 9605371
  [..]
• 68. Analyze Debian mariadbd task
# task-time -c mariadbd trace.dat
Task: mariadbd
  Total run time (us): 7374469
  Thread preempt time (us): 23500051
  Total preempt time (us): 1375971 (-22124080)
  Total blocked time (us): 2432569
  Total sleep time (us): 49682796
  thread id: 493170
    Total run time (us): 215
    Total preempt time (us): 0
    Total blocked time (us): 0
    Total sleep time (us): 9450
  thread id: 493171
    Total run time (us): 428
    Total preempt time (us): 0
    Total blocked time (us): 0
    Total sleep time (us): 9605371
  [..]

# task-time -c mariadbd trace-guest.dat
Task: mariadbd
  Total run time (us): 7045710
  Thread preempt time (us): 20535866
  Total preempt time (us): 1327352 (-19208514)
  Total blocked time (us): 3352464
  Total sleep time (us): 37335578
  thread id: 1206
    Total run time (us): 233
    Total preempt time (us): 0
    Total blocked time (us): 0
    Total sleep time (us): 4113
  thread id: 1207
    Total run time (us): 375
    Total preempt time (us): 0
    Total blocked time (us): 0
    Total sleep time (us): 10001415
  [..]
• 69. Analyze Debian mariadbd task
# task-time -c mariadbd trace.dat
Task: mariadbd
  Total run time (us): 7374469
  Thread preempt time (us): 23500051
  Total preempt time (us): 1375971 (-22124080)
  Total blocked time (us): 2432569      <-- Blocked for 33.0%
  Total sleep time (us): 49682796
  thread id: 493170
    Total run time (us): 215
    Total preempt time (us): 0
    Total blocked time (us): 0
    Total sleep time (us): 9450
  thread id: 493171
    Total run time (us): 428
    Total preempt time (us): 0
    Total blocked time (us): 0
    Total sleep time (us): 9605371
  [..]

# task-time -c mariadbd trace-guest.dat
Task: mariadbd
  Total run time (us): 7045710
  Thread preempt time (us): 20535866
  Total preempt time (us): 1327352 (-19208514)
  Total blocked time (us): 3352464      <-- Blocked for 47.6%
  Total sleep time (us): 37335578
  thread id: 1206
    Total run time (us): 233
    Total preempt time (us): 0
    Total blocked time (us): 0
    Total sleep time (us): 4113
  thread id: 1207
    Total run time (us): 375
    Total preempt time (us): 0
    Total blocked time (us): 0
    Total sleep time (us): 10001415
  [..]
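In case the two callouts look arbitrary, they fall straight out of the numbers above (an inference from the figures, not spelled out on the slide): blocked time as a fraction of run time is 2432569 / 7374469 ≈ 33.0% for the first trace versus 3352464 / 7045710 ≈ 47.6% for the guest trace, so the same mariadbd workload spends a noticeably larger share of its time waiting on I/O relative to the time it actually runs when inside the VM.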
• 70. Using KernelShark ■ KernelShark can show host and guest interactions ● Follow this tutorial: https://rostedt.org/host-guest-tutorial/ ● kernelshark trace.dat -a trace-guest.dat
  • 72. Brought to you by Steven Rostedt rostedt@goodmis.org @rostedt