Contextual Continuous Profiling
Getting deeper insights into your application
Jaroslav Bachorik
Staff Software Engineer
Datadog
Agenda
1. Introduction
2. Continuous profiling
3. Contextual continuous profiling
4. Takeaways
5. Q&A
Introduction
- Active in JVM performance area since 2006
  - NetBeans Profiler
  - VisualVM [1]
  - BTrace [2]
  - JMX, Serviceability
- OpenJDK member and Reviewer
- Currently at Datadog in charge of JVM profiling
  - In-house profiling agent
    - Heavily based on async-profiler [3]
  - Participating in OpenJDK
    - JFR fixes, backports
    - Proposing/implementing new features
Disclaimer
- Examples will be shown in Datadog UI
- Yet, not a Datadog pitch!
[1] https://visualvm.github.io
[2] https://github.com/btraceio/btrace
[3] https://github.com/async-profiler/async-profiler
Continuous Profiling
- Single execution profiling
  - Traditional profilers: JProfiler, VisualVM, YourKit, etc.
  - Used in the development phase
  - Overhead is not a big concern
  - Results restricted by the development environment
- Continuous profiling
  - Cloud deployments, continuous delivery, etc.
  - Profiling in the development environment is not sufficient
  - Profiler is ‘always on’
  - Past performance can be inspected and analyzed
  - Very overhead sensitive
JDK Flight Recorder
- Available in JDK 9+ and JDK 8 after update 272
- Capture profiling data on demand
  - jcmd
  - JMX
- Streaming available since JDK 14 [1]
Start JFR at JVM startup
> java -XX:StartFlightRecording=name=rec1,filename=my_recording.jfr -jar myapp.jar
More JFR options are described in Java Tools Reference [2]
Capture profiling data via jcmd
> jcmd myapp.jar JFR.dump name=rec1 filename=dump1.jfr
…
> jcmd myapp.jar JFR.dump name=rec1 filename=dump2.jfr
More jcmd-related options are described in JCMD Tool Reference [3]
[1] https://openjdk.org/jeps/349
[2] https://docs.oracle.com/en/java/javase/11/tools/java.html
[3] https://docs.oracle.com/javacomponents/jmc-5-5/jfr-command-reference/diagnostic-command-reference.htm
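A recording can also be started and dumped from inside the application through the standard `jdk.jfr` API. The following is only a minimal sketch, assuming a JVM with JFR available; the class name `JfrDumpSketch`, the method `recordAndDump`, and the temporary file location are illustrative choices and not part of the talk.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import jdk.jfr.Recording;

final class JfrDumpSketch {

    /** Records while the workload runs, then dumps the captured data to a new .jfr file. */
    static Path recordAndDump(Runnable workload) {
        try (Recording recording = new Recording()) {
            // Hypothetical file location; any writable path would do.
            Path target = Files.createTempDirectory("jfr-sketch").resolve("dump1.jfr");
            recording.setName("rec1");   // same recording name as in the jcmd example
            recording.start();
            workload.run();
            recording.stop();
            recording.dump(target);      // programmatic counterpart of JFR.dump
            return target;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Path dump = recordAndDump(() -> {
            long sum = 0;
            for (int i = 0; i < 1_000_000; i++) {
                sum += i;
            }
            System.out.println("workload result: " + sum);
        });
        System.out.println("recording written to " + dump);
    }
}
```

The resulting file can be opened with the same tools as a dump produced by jcmd.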
JVMTI Agent
- AsyncGetCallTrace
  - ‘Unofficial’ API to get stack traces without safepoint bias
  - Not really maintained
  - Lurking bugs can crash your JVM
- async-profiler [1]
  - Widely used, fully functional profiler
  - Exports to multiple formats
    - CSV
    - Flamegraph [2]
    - JFR binary format
[1] https://github.com/async-profiler/async-profiler
[2] https://www.brendangregg.com/FlameGraphs/cpuflamegraphs.html
Datadog Continuous Profiler
- Agent, Backend, UI
  - Backend and UI are proprietary, closed source
  - Agent is open source
    - https://github.com/DataDog/dd-trace-java
- Opportunistic Agent
  - Uses any available data source
  - Combines JFR and the in-house profiling agent
    - Subject to availability
      - AsyncGetCallTrace can be crashy on older JVMs
      - JFR is not available from all Java vendors
        - J9, Zing
- Integrated Agent
  - Distributed as a part of the tracer agent
  - Integrates with the tracer
    - ! Important for context !
Continuous Profiler - Demo
Introducing Context
- ‘Context’ is:
  - A set of simple values describing the current workload
  - Can be thought of as tags
  - User-specific meaning
- ‘Context’ allows:
  - Mapping performance data back to
    - HTTP requests
    - REST API calls
    - gRPC calls
    - etc.
  - Slice’n’Dice analysis of the performance data
- ‘Context’ is difficult because it must be:
  - Of acceptable cardinality
  - Fully propagated between threads
    - Executors
    - Fork-Join
    - Reactive frameworks
    - Loom!
Implementing Context
- Labels in pprof
  - Ready to use
  - Profile size implications
  - Go runtime has native support
- Nothing in JVM
  - ‘Thread-coloring’ approach was considered in JRockit
    - Never implemented
  - JFR is still not aware of context
  - Custom implementation is needed
Datadog Profiling Context
- Context propagation
  - Implemented in the Java tracer
  - Context associated with a unit of work
  - Independent of the executing thread
- Context persistence
  - Implemented in the profiler agent
  - Store context in JFR events
  - Easy and fast Java<->Native interop is mandatory
    - No JNI calls, please!
    - Shared memory buffer
    - Relying on Java and native side being tightly coupled
- Semi-custom context
  - Capped at ten custom tags
  - Custom tag types/names
    - Must be defined before the profiler is started
    - Stored in the JFR recording
Shared Memory Context
- One context per thread
  - Sparse thread-page map
- Static size
  - Efficient memory layout
  - 64 bytes to match the common x64 cache line size
- Checksum
  - Used to detect tearing (partial writes)
  - 64 bits / 8 bytes
- Context content
  - Provides 10 slots (currently)
  - Each slot is 4 bytes
  - Possibly up to 14 slots (56 bytes)
Shared Memory Context
[Diagram: a thread page map points Thread 1, Thread 2, …, Thread N each to its own 64-byte record (e.g. one cache line), made of a 64-bit checksum and the context data (10 slots, 40 bytes).]
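The per-thread record can be sketched in Java over a direct `ByteBuffer`. This is only an illustration under stated assumptions: the field order (checksum first, then slots), the checksum function, and all names (`ContextPageSketch`, `write`, `read`) are hypothetical, and the memory-ordering guarantees a real Java/native implementation needs are left out.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

final class ContextPageSketch {
    static final int RECORD_SIZE = 64;   // one record per thread, sized like a cache line
    static final int SLOT_COUNT = 10;    // 10 slots x 4 bytes = 40 bytes of context data
    static final int SLOTS_OFFSET = 8;   // assumed order: 8-byte checksum, then the slots

    private final ByteBuffer memory;

    ContextPageSketch(int maxThreads) {
        // Direct buffers are zero-filled, so every record starts out as "no context".
        memory = ByteBuffer.allocateDirect(maxThreads * RECORD_SIZE).order(ByteOrder.nativeOrder());
    }

    /** Hypothetical checksum; 0 is reserved to mean "record is empty or being written". */
    static long checksum(int[] slots) {
        long sum = 17;
        for (int slot : slots) {
            sum = sum * 31 + slot;
        }
        return sum == 0 ? 1 : sum;
    }

    void write(int threadIndex, int[] slots) {
        if (slots.length != SLOT_COUNT) {
            throw new IllegalArgumentException("expected " + SLOT_COUNT + " slots");
        }
        int base = threadIndex * RECORD_SIZE;
        memory.putLong(base, 0L);                                   // 1. invalidate the record
        for (int i = 0; i < SLOT_COUNT; i++) {
            memory.putInt(base + SLOTS_OFFSET + i * 4, slots[i]);   // 2. write the slots
        }
        memory.putLong(base, checksum(slots));                      // 3. publish the checksum
    }

    /** Returns the slots, or null when the record is empty or torn. */
    int[] read(int threadIndex) {
        int base = threadIndex * RECORD_SIZE;
        long stored = memory.getLong(base);
        int[] slots = new int[SLOT_COUNT];
        for (int i = 0; i < SLOT_COUNT; i++) {
            slots[i] = memory.getInt(base + SLOTS_OFFSET + i * 4);
        }
        return (stored != 0 && stored == checksum(slots)) ? slots : null;
    }
}
```

A reader that observes a checksum of zero, or one that does not match the slots it just read, discards the sample's context instead of reporting a half-written value.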
JFR Event Context
- Contextual JFR events
  - Context slots in use become event attributes
  - Event schema generated at startup
- Store context slot names/IDs
  - Use Settings event
- Dictionary-encoded context contents
  - Strings mapped to unique IDs
  - Context content is the ID
  - Strings stored as JFR constant pool
    - Standard JFR binary format feature
- Custom context is fully restorable
  - Context slot name
  - Context slot value
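The dictionary idea, with strings mapped to unique IDs and only the ID stored in a 4-byte slot, can be sketched as follows. The class `ContextDictionarySketch` and its methods are hypothetical; in the real agent the reverse mapping lives in the JFR constant pool rather than in a Java map.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

final class ContextDictionarySketch {
    private final Map<String, Integer> idsByValue = new ConcurrentHashMap<>();
    private final Map<Integer, String> valuesById = new ConcurrentHashMap<>();
    // ID 0 is reserved for "no value" so that an empty slot stays distinguishable.
    private final AtomicInteger nextId = new AtomicInteger(1);

    /** Returns the stable ID for a string, assigning a new one on first use. */
    int encode(String value) {
        return idsByValue.computeIfAbsent(value, v -> {
            int id = nextId.getAndIncrement();
            valuesById.put(id, v);   // stand-in for the constant pool entry
            return id;
        });
    }

    /** Restores the original string, or null for an unknown ID. */
    String decode(int id) {
        return valuesById.get(id);
    }
}
```

Because a slot only ever holds an `int`, the cost of a context value in each event is fixed regardless of how long the original string is.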
Java API
- ContextSetter
  - Register context before profiling is started
    - Count
    - Names
  - Set context values
  - Register dictionarized strings
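The shape of such an API might look like the sketch below. It is not the actual `ContextSetter` from the Datadog agent: the class `ContextSetterSketch`, its constructor, and its method signatures are assumptions made for illustration, and a thread-local `int[]` stands in for the shared memory record.

```java
import java.util.List;

final class ContextSetterSketch {
    static final int MAX_SLOTS = 10;   // the talk states a cap of ten custom tags

    private final List<String> names;
    private final ThreadLocal<int[]> slots;

    /** Slot names are fixed up front, mirroring "register before profiling is started". */
    ContextSetterSketch(List<String> names) {
        if (names.size() > MAX_SLOTS) {
            throw new IllegalArgumentException("at most " + MAX_SLOTS + " context slots");
        }
        this.names = List.copyOf(names);
        int count = names.size();
        this.slots = ThreadLocal.withInitial(() -> new int[count]);
    }

    /** Stores an already dictionary-encoded value; returns false for an unregistered name. */
    boolean setContextValue(String name, int encodedValue) {
        int offset = names.indexOf(name);
        if (offset < 0) {
            return false;
        }
        slots.get()[offset] = encodedValue;
        return true;
    }

    /** Copy of the current thread's slots, in registration order. */
    int[] snapshot() {
        return slots.get().clone();
    }
}
```

Fixing the names at construction time is what lets the event schema be generated once at startup.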
Context Propagation
- Context is bound to a work item
  - Work item can be processed by multiple threads
    - Manual threads
    - Thread pools
    - Reactive frameworks
  - Context must be carried from thread to thread
  - Concept of activate/deactivate
- Piggy-back on distributed tracing
  - Datadog Tracer already does propagation
  - Context propagation ‘for free’
  - Profiling context needs more detailed propagation
    - Tracer needs to be aware of profiler needs
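The activate/deactivate concept can be illustrated with a task wrapper that carries the context to whichever thread executes the work item. This is a simplified sketch and not the tracer's mechanism: `ContextPropagationSketch`, `wrap`, `active`, and `runOnWorker` are hypothetical names, and a plain `String` stands in for the context.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicReference;

final class ContextPropagationSketch {
    private static final ThreadLocal<String> ACTIVE = new ThreadLocal<>();

    /** Context currently active on the calling thread, or null. */
    static String active() {
        return ACTIVE.get();
    }

    /** Binds the context to the work item rather than to a thread. */
    static Runnable wrap(String context, Runnable task) {
        return () -> {
            String previous = ACTIVE.get();
            ACTIVE.set(context);               // activate on the executing thread
            try {
                task.run();
            } finally {
                if (previous == null) {
                    ACTIVE.remove();           // deactivate
                } else {
                    ACTIVE.set(previous);      // restore an enclosing context
                }
            }
        };
    }

    /** Submits a wrapped task to a pool thread and reports the context it observed. */
    static String runOnWorker(String context) {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            AtomicReference<String> seen = new AtomicReference<>();
            pool.submit(wrap(context, () -> seen.set(active()))).get();
            return seen.get();
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }
}
```

Wrapping every task by hand does not scale, which is why the slide proposes piggy-backing on the propagation that the tracer already performs.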
Custom Context - Demo
Wrap-up
- Benefits of using profiling context
  - Captures additional information about the environment during profiling
  - Enables slicing and dicing of profiling data to identify performance issues
  - Associates profiling data with specific users or inputs for informed decisions
  - Connects code executing in parallel so that interactions can be identified
  - Overall, helps developers optimize code more effectively
- Next steps
  - Bring the benefits to JDK/JFR
    - Many built-in events would benefit from this
      - Locks, I/O, etc.
  - Standardized implementation
Thank you!
