A ScyllaDB
Community
Using eBPF Off-CPU Sampling to
See What Your DBs are Really
Waiting For
Tanel Põder
Computer Performance Nerd
Tanel Põder
A long-time computer performance nerd
■ I've been promoting a DB session- or OS thread-level performance diagnosis approach for decades now
■ P99CONF is the best performance conference these days! :-)
■ I'm originally from Estonia (a small country), and my company logo uses our flag's colors!
■ I still end up researching & testing out modern tech even in my free time (CXL.mem is my latest interest)
Method > Data Sources > Tools
Every system is a bunch of threads. Measure where they spend most of their time and do it less!
/proc/PID/task/TID, perf, ftrace, ...
"top" for wallclock time ... and much more!
eBPF
0x.tools
● /proc sampling
  ● works without eBPF
  ● works even on very old Linux versions
● eBPF!
  ● see anything you want!
  ● PoC prototype with bcc
  ● work-in-progress
Extended Linux Thread State Sampling method
/proc sampling example (psn)
The fact of sampling: a thread seen in "active state"
Sample attributes: (many) dimensions in a "fact table"
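A minimal sketch of the /proc sampling idea, not psn's actual implementation: take one sample of every thread's state from /proc/PID/task/TID/stat and count the threads seen in an "active" state. Treating R and D as the active states, and the names parse_stat and sample_once, are assumptions made for this illustration.

```python
from collections import Counter
from pathlib import Path

# Assumption for this sketch: "active" means running/runnable (R)
# or uninterruptible sleep (D).
ACTIVE_STATES = {"R", "D"}


def parse_stat(line):
    """Return (comm, state) from one /proc/PID/task/TID/stat line."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so split on the LAST ')' instead of splitting on whitespace.
    left, right = line.rsplit(")", 1)
    comm = left.split("(", 1)[1]
    state = right.split()[0]
    return comm, state


def sample_once(proc="/proc"):
    """Take one sample: count active threads grouped by (state, comm)."""
    counts = Counter()
    for stat in Path(proc).glob("[0-9]*/task/[0-9]*/stat"):
        try:
            comm, state = parse_stat(stat.read_text())
        except (OSError, ValueError, IndexError):
            continue  # the thread exited between listing and reading
        if state in ACTIVE_STATES:
            counts[(state, comm)] += 1
    return counts


if __name__ == "__main__":
    for (state, comm), n in sample_once().most_common(10):
        print(f"{n:4d}  {state}  {comm}")
```

Each (state, comm) key is one row of the "fact table"; more dimensions (syscall, wchan, username) would be extra fields in the key. Repeating sample_once in a loop and summing the counters gives the sampled activity profile.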
For systematic performance & troubleshooting work, I want to:
● See the full system activity (“active threads”)
● Not only system-wide utilization averages
● Not only on-CPU thread stacks, but all thread states (and offcpu stacks)
● With ability to drill down into each thread’s activity
● See what each thread of interest is doing, for whom and why (context)
● I/O & function call latencies tied to each thread & its context at the time
● All this without tracing & postprocessing every event for every thread!
Detailed full system activity without tracing every event?
eBPF example (xtop with bcc)
Each dimension attribute is linked to the same point in time! (*except oncpu)
"stacktiles" show the value of a stack_id
Extended Task State Array (very basic) example
How does it work?!
Two decoupled layers
● eBPF populating & maintaining the array
  ● Keep only the latest state change for each thread
  ● “Tracking, not tracing!”
● Sampling program independent from population
  ● Python/BCC, C, Rust/libbpf, eBPF iterators, etc...
  ● Multiple concurrent samplers allowed
  ● Different sampling frequencies allowed
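The "tracking, not tracing" distinction can be sketched in plain Python, with no eBPF involved. The simulation below feeds the same stream of syscall enter/exit events to a tracer (append everything) and a tracker (overwrite one slot per thread). The function name simulate, the event layout and the syscall numbers are all hypothetical and chosen only for illustration.

```python
import random


def simulate(n_events=10_000, tids=(10, 11, 42), seed=1):
    """Feed the same syscall enter/exit events to a tracer and a tracker."""
    rng = random.Random(seed)
    trace_log = []  # tracing: every event is appended, so this keeps growing
    tsa = {}        # tracking: one slot per thread, overwritten in place
    for ts in range(n_events):
        tid = rng.choice(tids)
        in_syscall = tsa.get(tid, {"syscall_id": -1})["syscall_id"] != -1
        # Alternate exit/enter per thread; the syscall numbers are arbitrary.
        syscall_id = -1 if in_syscall else rng.choice((0, 1, 74))
        trace_log.append((ts, tid, syscall_id))
        tsa[tid] = {"syscall_id": syscall_id, "changed_at": ts}
    return trace_log, tsa


if __name__ == "__main__":
    log, tsa = simulate()
    print(f"tracing kept {len(log)} events, tracking kept {len(tsa)} slots")
    for tid, state in sorted(tsa.items()):
        print(tid, state)
```

The tracker's memory use depends on the number of threads, not on the event rate, which is what makes an always-on population layer plausible.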
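[Figure: timeline with rows for tid 10, tid 11 and tid 42 above the array slots 10, 11, 42 ... N; events from tid 10 are marked.]

BPF_HASH(tsa, ...);   // the extended task state array, updated by tid

TRACEPOINT_PROBE(raw_syscalls, sys_enter)
{
    ...
    t->syscall_id = args->id;   // the thread is now inside this syscall
    tsa.update(&tid, t);        // overwrite the thread's entry
    ...
}

TRACEPOINT_PROBE(raw_syscalls, sys_exit)
{
    ...
    t->syscall_id = -1;         // the thread is no longer in a syscall
    tsa.update(&tid, t);
    ...
}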
Populating the extended task state array
[Figure: the same timeline (tid 10, tid 11, tid 42; array slots 10, 11, 42 ... N); events from tid 11 are marked.]
[Figure: the same timeline (tid 10, tid 11, tid 42; array slots 10, 11, 42 ... N); events from tid 42 are marked.]
We are not tracing: no logging or appending all events ...
We track: overwrite the task's current action in the extended task state array ...
[Figure: the same timeline (tid 10, tid 11, tid 42; array slots 10, 11, 42 ... N), with a sampler reading all slots at several points in time.]
A separate, independent program samples the state arrays to userspace, using its desired frequency and filter rules:

tsa = b.get_table("tsa")  # b is the loaded bcc BPF object
for tid, task in tsa.items():
    ...
Sampling the extended task state array
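To make the decoupling concrete, here is a small pure-Python simulation, not the xcapture code: one layer replays timestamped state changes into the array, and an independent sampler reads whatever is in the array at its own frequency. The function name run, the event tuples and the timings are assumptions made up for this sketch.

```python
from collections import Counter


def run(events, sample_hz, duration_s):
    """Replay state changes and sample the array sample_hz times per second.

    events: iterable of (time_s, tid, syscall_name or None for "not in a syscall").
    Returns estimated seconds spent per (tid, syscall).
    """
    tsa = {}            # tid -> current syscall name, or None
    samples = Counter() # (tid, syscall) -> number of samples observed
    pending = sorted(events, key=lambda e: e[0])
    i = 0
    for tick in range(int(duration_s * sample_hz)):
        now = tick / sample_hz
        # Population layer: apply every state change up to "now".
        while i < len(pending) and pending[i][0] <= now:
            _, tid, syscall = pending[i]
            tsa[tid] = syscall
            i += 1
        # Sampling layer: read only the current state, never the history.
        for tid, syscall in tsa.items():
            if syscall is not None:
                samples[(tid, syscall)] += 1
    # Each sample stands for 1/sample_hz seconds of wallclock time.
    return {key: n / sample_hz for key, n in samples.items()}


# Hypothetical workload: tid 42 waits in fsync for 5 s, tid 10 reads for 0.5 s.
EVENTS = [
    (1.0, 10, "read"), (1.5, 10, None),
    (2.5, 42, "fsync"), (7.5, 42, None),
]

if __name__ == "__main__":
    for hz in (20, 1):
        print(hz, "Hz:", run(EVENTS, sample_hz=hz, duration_s=10))
```

Both samplers see the same array without coordinating with each other. The long fsync wait is estimated well at either frequency, while a short wait such as tid 10's read is estimated less precisely at 1 Hz.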
[Figure: the same timeline, with the sampler reading slots 10, 11, 42 ... N at each sampling point.]
The sampler(s) can be eBPF client programs (bcc, libbpf) using the bpf() syscall, or a BPF task iterator with a perf_event queue.
Always-on output logging (for time travel and advanced analytics)
$ ./xcapture-bpf -h
usage: xcapture-bpf [-h] [-x] [-d report_seconds] [-f SAMPLE_HZ] [-g csv-columns]
[-G append-csv-columns] [-n] [-N] [-c] [-V] [-o OUTPUT_DIR] [-l]
Always-on profiling of Linux thread activity using eBPF.
options:
-h, --help show this help message and exit
-x, --xtop Run in aggregated top-thread-activity (xtop) mode
-d report_seconds xtop report printing interval (default: 5s)
-f SAMPLE_HZ, --sample-hz SAMPLE_HZ
xtop sampling frequency in Hz (default: 20)
-g csv-columns, --group-by csv-columns
Full column list what to group by
-G append-csv-columns, --append-group-by append-csv-columns
List of additional columns to default cols what to group by
-n, --nerd-mode Print out relevant stack traces as wide output lines
-N, --giant-nerd-mode
Print out relevant stack traces as stacktiles
-c, --clear-screen Clear screen before printing next output
-V, --version Show the program version and exit
-o OUTPUT_DIR, --output-dir OUTPUT_DIR
Directory path where to write the output CSV files
-l, --list list all available columns for display and grouping
Always-on output logging (for time travel and advanced analytics)
$ ls -l
total 236
-rw-r--r-- 1 root root 19080 Jul 12 17:30 stacks_2024-07-12.16.csv
-rw-r--r-- 1 root root 41061 Jul 12 17:00 threads_2024-07-12.16.csv
-rw-r--r-- 1 root root 162132 Jul 12 17:33 threads_2024-07-12.17.csv
$ grep -E "TIMESTAMP|mysql" threads_2024-07-12.17.csv | head
TIMESTAMP,ST,TID,PID,USERNAME,COMM,SYSCALL,CMDLINE,OFFCPU_U,OFFCPU_K,ONCPU_U,ONCPU_K,WAKER_TID,SCH
2024-07-12 17:14:16.798,R,1894,1836,mysql,ib_log_fl_notif,-,,-,-,14409,12280,0,___-
2024-07-12 17:22:44.575,D,1895,1836,mysql,ib_log_flush,fsync,/usr/sbin/mysqld,9692,24360,-,-,0,____
2024-07-12 17:22:45.619,D,1895,1836,mysql,ib_log_flush,fsync,/usr/sbin/mysqld,9692,24360,-,-,30,____
2024-07-12 17:22:46.694,D,1895,1836,mysql,ib_log_flush,fsync,/usr/sbin/mysqld,9692,24360,-,-,0,____
2024-07-12 17:22:47.734,D,1895,1836,mysql,ib_log_flush,fsync,/usr/sbin/mysqld,9692,24360,-,-,0,____
2024-07-12 17:22:48.778,D,1895,1836,mysql,ib_log_flush,fsync,/usr/sbin/mysqld,9692,24360,-,-,353,_-__
2024-07-12 17:22:49.821,D,1895,1836,mysql,ib_log_flush,fsync,/usr/sbin/mysqld,9692,24360,-,-,353,____
2024-07-12 17:22:50.864,D,1895,1836,mysql,ib_log_flush,fsync,/usr/sbin/mysqld,9692,24360,-,-,353,____
2024-07-12 17:22:51.913,D,1895,1836,mysql,ib_log_flush,fsync,/usr/sbin/mysqld,9692,24360,-,-,57771,____
$ grep 9692 stacks_2024-07-12.16.csv
ustack 9692 ->71051cceabb4->std::thread::_State_impl->log_flusher->log_flush_low->Log_file_handle::fsync->
os_file_flush_func->os_file_fsync_posix
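As a rough sketch of the "advanced analytics" side, the logged samples can be grouped by any set of columns and counted. The rows below are copied from the output above (the last header column is as shown on the slide); the function name top_activity and the default grouping columns are assumptions for this illustration, not part of xcapture-bpf.

```python
import csv
import io
from collections import Counter

# A few rows copied from the threads_*.csv sample above, header included.
SAMPLE = """\
TIMESTAMP,ST,TID,PID,USERNAME,COMM,SYSCALL,CMDLINE,OFFCPU_U,OFFCPU_K,ONCPU_U,ONCPU_K,WAKER_TID,SCH
2024-07-12 17:14:16.798,R,1894,1836,mysql,ib_log_fl_notif,-,,-,-,14409,12280,0,___-
2024-07-12 17:22:44.575,D,1895,1836,mysql,ib_log_flush,fsync,/usr/sbin/mysqld,9692,24360,-,-,0,____
2024-07-12 17:22:45.619,D,1895,1836,mysql,ib_log_flush,fsync,/usr/sbin/mysqld,9692,24360,-,-,30,____
2024-07-12 17:22:46.694,D,1895,1836,mysql,ib_log_flush,fsync,/usr/sbin/mysqld,9692,24360,-,-,0,____
"""


def top_activity(lines, group_by=("ST", "COMM", "SYSCALL", "OFFCPU_U")):
    """Count samples per group; each row is one thread seen in an active state."""
    counts = Counter()
    for row in csv.DictReader(lines):
        counts[tuple(row[col] for col in group_by)] += 1
    return counts.most_common()


if __name__ == "__main__":
    for key, n in top_activity(io.StringIO(SAMPLE)):
        print(n, *key)
```

Because every row carries all dimensions for the same point in time, changing group_by is enough to slice the same samples by user, syscall, stack id or waker thread. Dividing a sample count by the capture frequency turns it into an estimate of wallclock seconds.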
Path to "IPC wait chains"?
$ sudo ./xcapture-bpf
Client – Server interaction
RDBMS commit "log file sync"
Things not yet implemented, but possible (it's eBPF, after all!)
Many components are already successfully implemented in other (eBPF) tools
● IPC wait chains (more research needed)
● RPC / trace_id / distributed tracing context propagation
● Sample & estimate I/O latencies for each captured thread that's off CPU
● Use these samples for analyzing various latencies across any "dimension"
● Read common SQL DB context (SQL text/hash, exec phase DB wait events)
● Read interpreted language/VM state (via perf.map or direct)
● Still just a method, datasource and a couple of tools, not a product or platform
● Production-grade, always on, focus on compiled binaries & perf.map capable runtimes
● Use BTF, CO-RE and libbpf instead of bcc
● Use BPF task iterators for sampling kernel-maintained task fields (no field duplication)
● Use BPF_MAP_TASK_STORAGE for all the additional (extended context) structures
● Use get_stack (not get_stackid) – flexible, no need for large stack-maps in kernel mem
● Use BlazeSym as the build-id aware symbolizer (OSS by Meta, written in Rust)
● Feed output to common metrics/monitoring/visualization tools (which metric type?!)
● Contribute/integrate with OpenTelemetry agent (if/when the time is right)?
0x.tools future plans and hopes: xcapture-bpf v3.0
Modern libbpf dev help is appreciated!
● 0x.tools
● tanelpoder.com
● tanel@tanelpoder.com
● @tanelpoder
Thank You!
