A Practical Guide To End-to-End Tracing
In Event Driven Architectures
Who we are, what we’ll talk about…
Roman - UK developer at PIE Labs
PIE Labs, Confluent
• What is Distributed Tracing?
• What is OpenTelemetry?
• Kafka Client instrumentation
• Kafka Streams instrumentation
• Kafka Connect instrumentation
• Summary
What is Distributed Tracing (DT)?
https://www.altexsoft.com/blog/shipment-tracking-integration-apis-edis-carriers-aggregators/
Systems get complex…
Components of a DT system
• Instrumentation
• Collection
• Visualisation
https://blog.gurock.com/distributed-tracing/
What makes up a trace?
https://docs.logz.io/user-guide/distributed-tracing/what-is-tracing
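As a rough illustration of what makes up a trace: a trace is a set of spans that share one trace ID, and each span records its own span ID plus the ID of its parent (the root span has none). The sketch below is plain Java with no tracing library; the class name, the span names and the short IDs are all made up for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative model only; not the OpenTelemetry API.
final class TraceTreeSketch {
    static final class Span {
        final String traceId;      // shared by every span in the trace
        final String spanId;       // unique per span
        final String parentSpanId; // null for the root span
        final String name;

        Span(String traceId, String spanId, String parentSpanId, String name) {
            this.traceId = traceId;
            this.spanId = spanId;
            this.parentSpanId = parentSpanId;
            this.name = name;
        }
    }

    // Renders the spans as an indented tree, each child under its parent.
    static List<String> render(List<Span> spans) {
        List<String> lines = new ArrayList<>();
        renderChildren(spans, null, 0, lines);
        return lines;
    }

    private static void renderChildren(List<Span> spans, String parentId, int depth, List<String> out) {
        for (Span s : spans) {
            boolean isChild = parentId == null ? s.parentSpanId == null : parentId.equals(s.parentSpanId);
            if (isChild) {
                StringBuilder line = new StringBuilder();
                for (int i = 0; i < depth; i++) {
                    line.append("  ");
                }
                out.add(line.append(s.name).toString());
                renderChildren(spans, s.spanId, depth + 1, out);
            }
        }
    }

    // Hypothetical three-span trace: produce, consume and process, produce again.
    static List<Span> sampleTrace() {
        return Arrays.asList(
                new Span("t1", "a1", null, "orders send"),
                new Span("t1", "b2", "a1", "orders process"),
                new Span("t1", "c3", "b2", "invoices send"));
    }

    public static void main(String[] args) {
        for (String line : render(sampleTrace())) {
            System.out.println(line);
        }
    }
}
```

The parent links are what let a visualisation tool draw the familiar waterfall view from a flat list of spans.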
Context and Context?
• Local Trace Context
Context and Context?
• Remote Trace Context
Trace Context and its Propagation?
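A minimal sketch of propagation, assuming the W3C Trace Context `traceparent` header format (version, trace ID, parent span ID, flags). The class below is a hand-rolled stand-in for what a propagator does, not OpenTelemetry code. Headers are modelled as a string map for brevity, whereas real Kafka record headers carry byte values.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative only: local context is written to headers, remote context is read back.
final class TraceContextSketch {
    final String traceId; // 32 lowercase hex characters
    final String spanId;  // 16 lowercase hex characters
    final boolean sampled;

    TraceContextSketch(String traceId, String spanId, boolean sampled) {
        this.traceId = traceId;
        this.spanId = spanId;
        this.sampled = sampled;
    }

    // Producer side: inject the local trace context into the outgoing message headers.
    void inject(Map<String, String> headers) {
        headers.put("traceparent", "00-" + traceId + "-" + spanId + "-" + (sampled ? "01" : "00"));
    }

    // Consumer side: extract the remote trace context, or null if absent or malformed.
    // Simplification: only the flag values "00" and "01" are understood here.
    static TraceContextSketch extract(Map<String, String> headers) {
        String value = headers.get("traceparent");
        if (value == null) {
            return null;
        }
        String[] parts = value.split("-");
        if (parts.length != 4 || parts[1].length() != 32 || parts[2].length() != 16) {
            return null;
        }
        return new TraceContextSketch(parts[1], parts[2], "01".equals(parts[3]));
    }

    public static void main(String[] args) {
        Map<String, String> headers = new HashMap<>();
        new TraceContextSketch("4bf92f3577b34da6a3ce929d0e0e4736", "00f067aa0ba902b7", true).inject(headers);
        TraceContextSketch remote = extract(headers);
        System.out.println(headers.get("traceparent"));
        System.out.println("remote parent span: " + remote.spanId);
    }
}
```

A consumer that extracts this context can start its own span with the same trace ID and the extracted span ID as parent, which is what stitches the producer and consumer sides into one trace.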
Why Distributed Tracing?
Adding context to the message and process flow.
• Dependency graph
• Record of Event flow
• Log correlation
• Contextual metrics
• Answer questions like:
“This result looks weird. Show me all the intermediate
states, so I can debug where the weirdness started…”
Section: What is OpenTelemetry?
Overview of OpenTelemetry
1. Standardised, vendor-agnostic
2. High-quality, ubiquitous, and portable
3. Collection of tools, APIs, and SDKs
4. Instrument and generate telemetry
5. Collect and export it
6. To an observability back-end (not part of OpenTelemetry, e.g. Jaeger)
7. To help you analyze your software’s performance and behavior
Support for Kafka in OpenTelemetry
Kafka Clients:
• Javaagent
• Tracing Wrappers
• Tracing Interceptors
Kafka Streams:
• Javaagent
• Supply Kafka Clients with Tracing
OpenTelemetry instrumentation
Javaagent - Auto Instrumentation Agent
• Aspect Oriented approach
• Byte Buddy
• Muzzle
• Extension support
• Service Provider Interface (SPI) for tracer customization
• Installed at runtime through `-javaagent` Java option
OpenTelemetry instrumentation agent - how does it work?
Tracing Interceptors
• Standard Consumer / Producer Interceptor implementations
• Installed through Interceptor configuration
Tracing Interceptors - how do they work?
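Installing interceptors through configuration could look roughly like this. `interceptor.classes` is the standard Kafka client setting that takes a comma-separated list of interceptor class names. The two class names below are hypothetical placeholders, to be replaced with the interceptor classes your tracing library actually ships.

```java
import java.util.Properties;

// Sketch of client configuration only; no Kafka classes are needed to build the Properties.
final class InterceptorConfigSketch {
    // Hypothetical class names, not real library classes.
    static final String PRODUCER_INTERCEPTOR = "com.example.tracing.TracingProducerInterceptor";
    static final String CONSUMER_INTERCEPTOR = "com.example.tracing.TracingConsumerInterceptor";

    static Properties producerProps(String bootstrapServers) {
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", bootstrapServers);
        // The producer loads and calls each listed interceptor before a record is sent.
        props.setProperty("interceptor.classes", PRODUCER_INTERCEPTOR);
        return props;
    }

    static Properties consumerProps(String bootstrapServers, String groupId) {
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", bootstrapServers);
        props.setProperty("group.id", groupId);
        // The consumer calls each listed interceptor on the records returned by poll.
        props.setProperty("interceptor.classes", CONSUMER_INTERCEPTOR);
        return props;
    }

    public static void main(String[] args) {
        System.out.println(producerProps("localhost:9092").getProperty("interceptor.classes"));
        System.out.println(consumerProps("localhost:9092", "demo-group").getProperty("interceptor.classes"));
    }
}
```

These properties would then be passed to the producer or consumer constructor, so no application code has to change beyond configuration.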
Tracing Wrappers
• Standard Consumer / Producer Interface implementations
• Installed through code
• Uses java.lang.reflect.Proxy to intercept relevant method calls and add tracing behaviour
Tracing Wrappers - how do they work?
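The proxy technique can be sketched with the JDK alone. `MessageSender` below is a hypothetical stand-in for the Kafka Producer interface, and the wrapper only records a span name in a list where a real wrapper would start and end a span. This is an illustration of the mechanism, not the library's implementation.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Proxy;
import java.util.ArrayList;
import java.util.List;

final class TracingWrapperSketch {
    // Hypothetical interface standing in for the Kafka Producer interface.
    interface MessageSender {
        String send(String topic, String payload);
        void flush();
    }

    // Span names recorded by the wrapper (illustrative naming).
    static final List<String> RECORDED_SPANS = new ArrayList<>();

    // Returns a proxy that adds tracing around send() and passes everything else through.
    static MessageSender wrap(MessageSender delegate) {
        InvocationHandler handler = (proxy, method, args) -> {
            if (method.getName().equals("send")) {
                RECORDED_SPANS.add(args[0] + " send");
            }
            try {
                return method.invoke(delegate, args);
            } catch (InvocationTargetException e) {
                throw e.getCause(); // rethrow the delegate's own exception
            }
        };
        return (MessageSender) Proxy.newProxyInstance(
                MessageSender.class.getClassLoader(),
                new Class<?>[] {MessageSender.class},
                handler);
    }

    public static void main(String[] args) {
        MessageSender plain = new MessageSender() {
            public String send(String topic, String payload) { return topic + ":" + payload; }
            public void flush() { }
        };
        MessageSender traced = wrap(plain);
        System.out.println(traced.send("orders", "hello"));
        traced.flush();
        System.out.println(RECORDED_SPANS);
    }
}
```

Because the proxy implements the same interface as the delegate, the application keeps calling the usual methods and only the construction site changes, which matches the "installed through code" bullet above.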
Section: Kafka Client instrumentation
Tracing Consumer Producer Applications
Application Flow
Trace
Tracing Consumer Producer Applications
Kafka Receive Telemetry Enabled
Tracing Consumer Producer Applications
Kafka Receive Telemetry Disabled
Comparison of instrumentation methods
Interceptor / Wrapper
Javaagent
Comparison of instrumentation methods - Producer
Interceptor Javaagent
Wrapper
Comparison of instrumentation methods - Consumer
Interceptor Javaagent
Wrapper
Comparison of instrumentation methods
                        Interceptor        Wrapper            Javaagent
Installation            Config or Code     Code               Runtime
Producer Metadata       Limited            Full               Full
Consumer Metadata       Full               Full               Full
Consumer Local Context  Not propagated     Not propagated     Propagated
Library                 Kafka-clients-2.6  Kafka-clients-2.6  Kafka-clients-0.11
Section: Kafka Streams instrumentation
Kafka Streams support
• Kafka Streams specific Process span implementation in Javaagent
• Stateless Kafka Streams processing pipelines
• Limitations:
  • Non-javaagent implementation limitations
  • Stateful operation support
  • Caching in stateful operations
Kafka Streams support - Stateless
Javaagent
Interceptor / Wrapper
Kafka Streams - Stateful - No Caching
Kafka Streams - Stateful - No Caching - Aggregate Span
Kafka Streams - Stateful - No Caching - Wrapped Store
Kafka Streams - Stateful - Caching
Kafka Streams - Stateful - Caching - Aggregate
Kafka Streams support - summary
• Supported as is with Javaagent:
  • Stateless
  • Stateful, with limitations: single thread context, no caching
• Wrapping State Store approach
• Inlining Span creation into Stateful operations
• Transformer hack - possible repartitioning
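One way to picture why caching complicates tracing, assuming the cache coalesces several updates to the same key into a single forwarded record: the forwarded record has many contributing traces but only one place to carry context. The sketch below is not taken from the talk or its repository. It is a hypothetical, library-free model in which the cache keeps every contributing trace ID so the flushed record could link back to all of them.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of a caching count aggregate; not Kafka Streams code.
final class CachedAggregateSketch {
    static final class Entry {
        long count;
        final List<String> contributingTraceIds = new ArrayList<>();
    }

    private final Map<String, Entry> cache = new LinkedHashMap<>();

    // Each incoming record updates the cached aggregate and remembers its trace ID.
    void update(String key, String traceId) {
        Entry entry = cache.computeIfAbsent(key, k -> new Entry());
        entry.count++;
        entry.contributingTraceIds.add(traceId);
    }

    // On flush, one record per key is forwarded, however many updates it absorbed.
    List<String> flush() {
        List<String> forwarded = new ArrayList<>();
        for (Map.Entry<String, Entry> e : cache.entrySet()) {
            forwarded.add(e.getKey() + "=" + e.getValue().count
                    + " links=" + e.getValue().contributingTraceIds);
        }
        cache.clear();
        return forwarded;
    }

    public static void main(String[] args) {
        CachedAggregateSketch aggregate = new CachedAggregateSketch();
        aggregate.update("a", "t1");
        aggregate.update("a", "t2");
        aggregate.update("b", "t3");
        for (String record : aggregate.flush()) {
            System.out.println(record);
        }
    }
}
```

Without the list of contributing IDs, the record for key "a" could continue at most one of the two traces that produced it, which is the kind of gap the "no caching" limitation points at.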
Section: Kafka Connect instrumentation
Kafka Connect support - Source Task
Kafka Connect support - Sink Task
Future plans & More Information
confluent.io/community/ask-the-community
rkolesnev@confluent.io
https://github.com/rkolesnev/kafka-opentelemetry

A Practical Guide To End-to-End Tracing In Event Driven Architectures with Roman Kolesnev
