Distributed systems
Observability
Elastic Stack
Jaeger Tracing
Distributed system
Monolithic systems
Distributed tracing
Netflix – microservices system
Nowadays all systems are distributed
Distributed system – logical view
Observability
Microservices observability
Observability
Monitoring
• Dashboards
• Thresholds
• Interactive
Alerting
• Event-based
• Trigger actions
Logging
• Centralize logs
• Aggregate
• Interactive
Tracing
• Request-based
• Debugging
• Cross-platform
Monitoring
Interactive Tools:
• Graphite & Grafana
• Elastic stack with Kibana UI
• Icinga Dashboards
• Oracle Enterprise Manager
• Kafka Manager
• …
Alerting with Icinga
Alerting
The main tool for alerting is Icinga.
Log aggregation/analytics
Log aggregation
Elastic stack
Source: https://www.elastic.co/guide/en/logstash/current/deploying-and-scaling.html
Classical simple view
Several applications log into one big index.
Many applications
The number of application-specific payload types increases with every application that logs into the shared index.
A mapping explosion can cause out-of-memory errors and situations that are difficult to recover from.
index.mapping.total_fields.limit
The maximum number of fields in an index. Field and object mappings, as well as field aliases, count towards this limit. The default value is 1000.
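To get a feeling for how a shared index approaches this limit, the following sketch counts the fields that a set of log documents would contribute to a mapping. It is a simplification under stated assumptions: documents are modelled as nested maps, every distinct object path and leaf path counts as one field, and field aliases and multi-fields are ignored. The class and method names are hypothetical and not part of Elasticsearch.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical helper: estimates how many mapped fields a set of log documents produces. */
final class MappingFieldCounter {

    /** Default value of index.mapping.total_fields.limit as quoted on the slide. */
    static final int DEFAULT_TOTAL_FIELDS_LIMIT = 1000;

    private MappingFieldCounter() {
    }

    /** Collects every distinct object path and leaf path found in the given documents. */
    static Set<String> fieldPaths(final List<Map<String, Object>> documents) {
        final Set<String> paths = new TreeSet<>();
        for (final Map<String, Object> document : documents) {
            collect("", document, paths);
        }
        return paths;
    }

    private static void collect(final String prefix, final Map<?, ?> node, final Set<String> paths) {
        for (final Map.Entry<?, ?> entry : node.entrySet()) {
            final String key = String.valueOf(entry.getKey());
            final String path = prefix.isEmpty() ? key : prefix + "." + key;
            paths.add(path); // object mappings count as well as leaf fields
            if (entry.getValue() instanceof Map) {
                collect(path, (Map<?, ?>) entry.getValue(), paths);
            }
        }
    }

    /** True if the documents would need more mapped fields than the given limit. */
    static boolean exceedsLimit(final List<Map<String, Object>> documents, final int limit) {
        return fieldPaths(documents).size() > limit;
    }

    public static void main(final String[] args) {
        final Map<String, Object> orderPayload = new LinkedHashMap<>();
        orderPayload.put("orderId", 42);
        final Map<String, Object> orderLog = new LinkedHashMap<>();
        orderLog.put("message", "order created");
        orderLog.put("payload", orderPayload);

        final Map<String, Object> couponPayload = new LinkedHashMap<>();
        couponPayload.put("couponCode", "ABC");
        final Map<String, Object> couponLog = new LinkedHashMap<>();
        couponLog.put("message", "coupon redeemed");
        couponLog.put("payload", couponPayload);

        // Each application adds its own payload fields to the same mapping.
        System.out.println(fieldPaths(Arrays.asList(orderLog, couponLog)));
    }
}
```

Every additional application with its own payload structure adds further paths to the same set, which is why the field count of one big index grows with the number of applications.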
Separate Logstash index per component
• granular configuration of disk space and history per component
• dashboards are faster
• no problem with mapping explosion
• no problem with fields that share a name but differ in type
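One way to picture the per-component layout is a naming rule that derives the index from the component name and the day. The sketch below assumes a pattern of the form logstash-<component>-yyyy.MM.dd; that pattern, the class name and the method names are illustrative assumptions, since the slides do not show the naming actually used.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

/** Hypothetical naming helper for one index per component and day. */
final class ComponentIndexNames {

    private static final DateTimeFormatter DAY = DateTimeFormatter.ofPattern("yyyy.MM.dd", Locale.ROOT);

    private ComponentIndexNames() {
    }

    /** Normalizes the component name; Elasticsearch index names must be lowercase. */
    private static String normalize(final String component) {
        final String normalized = component.trim()
                .toLowerCase(Locale.ROOT)
                .replaceAll("[^a-z0-9_-]+", "-");
        if (normalized.isEmpty()) {
            throw new IllegalArgumentException("component name must not be blank");
        }
        return normalized;
    }

    /** Index that receives the logs of the given component for the given day. */
    static String indexFor(final String component, final LocalDate day) {
        return "logstash-" + normalize(component) + "-" + DAY.format(day);
    }

    /** Wildcard pattern matching all indices of one component, e.g. for dashboards or housekeeping. */
    static String patternFor(final String component) {
        return "logstash-" + normalize(component) + "-*";
    }

    public static void main(final String[] args) {
        System.out.println(indexFor("Order Service", LocalDate.of(2019, 7, 1)));
        System.out.println(patternFor("Order Service"));
    }
}
```

Because each component gets its own mapping, two components can use the same field name with different types without conflict, and retention can be configured per pattern.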
Housekeeping with Curator
One Curator action per component:
• delete by total index size for a specific component
• delete by number of indices for some components
• delete by date
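The date-based and count-based rules can be sketched as plain selection logic. This is not Curator's API or configuration format; it is a minimal Java model, with hypothetical class and method names, of which indices of one component each rule would pick for deletion.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical model of per-component housekeeping rules; not Curator's API. */
final class IndexHousekeeping {

    /** Minimal description of one index of a component. */
    static final class IndexInfo {
        final String name;
        final LocalDate created;

        IndexInfo(final String name, final LocalDate created) {
            this.name = name;
            this.created = created;
        }
    }

    private IndexHousekeeping() {
    }

    /** Delete by date: names of indices created more than maxAgeDays before today. */
    static List<String> olderThan(final List<IndexInfo> indices, final LocalDate today, final int maxAgeDays) {
        final LocalDate cutoff = today.minusDays(maxAgeDays);
        final List<String> toDelete = new ArrayList<>();
        for (final IndexInfo index : indices) {
            if (index.created.isBefore(cutoff)) {
                toDelete.add(index.name);
            }
        }
        return toDelete;
    }

    /** Delete by number of indices: names of all but the newest `keep` indices, newest first. */
    static List<String> beyondCount(final List<IndexInfo> indices, final int keep) {
        final List<IndexInfo> newestFirst = new ArrayList<>(indices);
        newestFirst.sort(Comparator.comparing((IndexInfo index) -> index.created).reversed());
        final List<String> toDelete = new ArrayList<>();
        for (int i = Math.max(keep, 0); i < newestFirst.size(); i++) {
            toDelete.add(newestFirst.get(i).name);
        }
        return toDelete;
    }
}
```

Running one such rule per component pattern is what makes the history of a chatty component independent of the history of a quiet one.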
Index Lifecycle Management (ILM)
ILM replaces most of the basic Curator functionality.
But! ILM does not support deleting the oldest index of a group of indices, selected by a pattern, based on the overall size of the group.
See: https://github.com/elastic/elasticsearch/issues/44001
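The size-based rule that is missing from ILM can be described as: sort the indices of one group by age and drop the oldest ones until the group fits into its size budget. The sketch below models only that selection step in plain Java; the class, its fields and the byte budget are illustrative assumptions, and neither Curator nor Elasticsearch is called.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical sketch of "delete oldest indices of a group until the group fits a size budget". */
final class SizeBasedRetention {

    /** One index of the group matched by a pattern. */
    static final class Index {
        final String name;
        final long creationMillis;
        final long sizeBytes;

        Index(final String name, final long creationMillis, final long sizeBytes) {
            this.name = name;
            this.creationMillis = creationMillis;
            this.sizeBytes = sizeBytes;
        }
    }

    private SizeBasedRetention() {
    }

    /** Returns the names of the oldest indices to delete so that the rest needs at most maxTotalBytes. */
    static List<String> selectForDeletion(final List<Index> group, final long maxTotalBytes) {
        final List<Index> oldestFirst = new ArrayList<>(group);
        oldestFirst.sort(Comparator.comparingLong((Index index) -> index.creationMillis));

        long total = 0;
        for (final Index index : oldestFirst) {
            total += index.sizeBytes;
        }

        final List<String> toDelete = new ArrayList<>();
        for (final Index index : oldestFirst) {
            if (total <= maxTotalBytes) {
                break; // the remaining indices fit into the budget
            }
            toDelete.add(index.name);
            total -= index.sizeBytes;
        }
        return toDelete;
    }
}
```

Note that this sketch would also select the newest index if that index alone exceeds the budget; a real housekeeping job would probably want to protect the index that is currently being written to.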
Demo
Distributed tracing
Distributed tracing
Distributed tracing takes a request-centric view: "What happened to my request?"
It captures the detailed execution of important activities performed by the components of a distributed system as it processes a given request.
The tracing infrastructure attaches contextual metadata to each request and ensures that this metadata is passed along during the request's execution.
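The idea of attaching metadata to a request and passing it along can be sketched without any tracing library. In the following example the caller writes its trace context into the outgoing request headers and the callee reads it back and continues the same trace. The header names and all class and method names are made up for this illustration; real tracers such as Jaeger define their own wire formats.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

/** Hypothetical, library-free sketch of trace context propagation via request headers. */
final class TraceContextPropagation {

    // Header names are assumptions for this sketch, not a standard.
    static final String TRACE_ID_HEADER = "x-example-trace-id";
    static final String SPAN_ID_HEADER = "x-example-span-id";

    /** The contextual metadata that travels with a request. */
    static final class TraceContext {
        final String traceId;
        final String spanId;

        TraceContext(final String traceId, final String spanId) {
            this.traceId = traceId;
            this.spanId = spanId;
        }
    }

    private TraceContextPropagation() {
    }

    /** Caller side: copy the context into the outgoing request headers. */
    static void inject(final TraceContext context, final Map<String, String> headers) {
        headers.put(TRACE_ID_HEADER, context.traceId);
        headers.put(SPAN_ID_HEADER, context.spanId);
    }

    /** Callee side: read the context back, if the caller sent one. */
    static Optional<TraceContext> extract(final Map<String, String> headers) {
        final String traceId = headers.get(TRACE_ID_HEADER);
        final String spanId = headers.get(SPAN_ID_HEADER);
        if (traceId == null || spanId == null) {
            return Optional.empty();
        }
        return Optional.of(new TraceContext(traceId, spanId));
    }

    /** A unit of work in the callee stays in the same trace but gets its own span id. */
    static TraceContext childOf(final TraceContext parent, final String newSpanId) {
        return new TraceContext(parent.traceId, newSpanId);
    }

    public static void main(final String[] args) {
        final Map<String, String> headers = new HashMap<>();
        inject(new TraceContext("trace-1", "span-a"), headers);

        final TraceContext child = extract(headers)
                .map(parent -> childOf(parent, "span-b"))
                .orElseGet(() -> new TraceContext("new-trace", "span-b"));
        System.out.println(child.traceId + " / " + child.spanId);
    }
}
```

Because every hop repeats the inject and extract steps, all work done for one request ends up under the same trace id, which is what allows a tracing backend to answer "What happened to my request?".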
opentracing.io
Vendor-neutral APIs and instrumentation for distributed tracing
Source: https://medium.com/opentracing/towards-turnkey-distributed-tracing-5f4297d1736
jaegertracing.io
• OpenTracing-compatible data model and instrumentation libraries in Go, Java, Python, …
• Multiple storage backends: Cassandra, Elasticsearch, memory
• Modern web UI
• Cloud-native deployments
• Not a full replacement for an automatic profiler
• Not dynamic instrumentation
jaegertracing.io
Terminology
Span
A span represents a logical unit of work in Jaeger. It has an operation name, a start time, and a duration.
Trace
A trace is a data/execution path through the system.
Source: https://www.jaegertracing.io/docs/1.13/architecture/
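The two terms can be made concrete with a small data model. The sketch below is not Jaeger's data model; it is a minimal, hypothetical Java representation in which a span carries the three attributes named above and a trace is simply the set of spans that share a trace id.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical minimal model of spans and traces; not Jaeger's actual data model. */
final class TraceModel {

    /** A logical unit of work: operation name, start time, duration. */
    static final class Span {
        final String traceId;
        final String operationName;
        final long startMicros;
        final long durationMicros;

        Span(final String traceId, final String operationName, final long startMicros, final long durationMicros) {
            this.traceId = traceId;
            this.operationName = operationName;
            this.startMicros = startMicros;
            this.durationMicros = durationMicros;
        }

        long endMicros() {
            return startMicros + durationMicros;
        }
    }

    private TraceModel() {
    }

    /** A trace is the collection of spans that belong to the same request. */
    static List<Span> traceOf(final List<Span> allSpans, final String traceId) {
        final List<Span> trace = new ArrayList<>();
        for (final Span span : allSpans) {
            if (span.traceId.equals(traceId)) {
                trace.add(span);
            }
        }
        return trace;
    }

    /** Wall-clock duration of a trace: from the earliest span start to the latest span end. */
    static long traceDurationMicros(final List<Span> trace) {
        if (trace.isEmpty()) {
            return 0;
        }
        long earliestStart = Long.MAX_VALUE;
        long latestEnd = Long.MIN_VALUE;
        for (final Span span : trace) {
            earliestStart = Math.min(earliestStart, span.startMicros);
            latestEnd = Math.max(latestEnd, span.endMicros());
        }
        return latestEnd - earliestStart;
    }
}
```

The trace timeline view of a tracing UI is essentially a rendering of this structure: one bar per span, positioned by start time and sized by duration.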
Trace Timeline
Trace Graph
Jaeger architecture
Implementation details
3rd party libraries
• OpenTracing Cassandra Driver Instrumentation (https://github.com/opentracing-contrib/java-cassandra-driver)
• OpenTracing Spring Web Instrumentation (https://github.com/opentracing-contrib/java-spring-web)
• OpenTracing Feign Instrumentation (https://github.com/OpenFeign/feign-opentracing)
• OpenTracing JAX-RS Instrumentation (https://github.com/opentracing-contrib/java-jaxrs)
Custom libraries
• Integration library for applications in Tomcat (extint in DE and INT) and in WebLogic (DE and INT)
https://pb-git.intra.loyaltypartner.com/projects/LIBRARIES/repos/opentracing-jee/browse
• Custom Spring Boot integration library with support for Kafka producers and consumers (based on https://github.com/opentracing-contrib/java-kafka-client)
Code snippets
Tracer initialization
// Builds the Jaeger settings from the application configuration.
// Tracing stays disabled unless "jaeger.enabled" is set to true.
public static JaegerConfig fromConfiguration(final String service, final Configuration configuration) {
    final boolean enabled = configuration.getBoolean("jaeger.enabled", false);
    if (enabled) {
        return JaegerConfig.enabled(
                service,
                configuration.getString("jaeger.endpoint"),
                configuration.getInteger("jaeger.maxPacketSize", null),
                configuration.getInteger("jaeger.flushInterval", null),
                configuration.getInteger("jaeger.maxQueueSize", null),
                configuration.getInteger("jaeger.probabilityPercent", null)
        );
    } else {
        return JaegerConfig.disabled();
    }
}

// At application startup: create the tracer once and register it globally.
final Tracer tracer = JaegerBootstrapUtil.createTracer(jaegerConfig);
GlobalTracer.register(tracer);

// Returns a Jaeger tracer, or a no-op tracer when Jaeger is disabled.
public static Tracer createTracer(final JaegerConfig cfg) {
    if (cfg.isEnabled()) {
        final Sender sender = createSender(cfg);
        final RemoteReporter reporter = createRemoteReporter(cfg, sender);
        final Sampler sampler = createSampler(cfg);
        return new JaegerTracer.Builder(cfg.getService())
                .withReporter(reporter)
                .withSampler(sampler)
                .build();
    } else {
        LOGGER.info("Jaeger is disabled");
        return NoopTracerFactory.create();
    }
}
Sampling
The sampling decision is propagated with the requests.
// Chooses the sampler from the configured percentage: all, none, or a probabilistic share.
public static Sampler createSampler(final JaegerConfig cfg) {
    if (cfg.getProbabilityPercent() == 100) {
        LOGGER.info("Sending all spans to jaeger");
        return new ConstSampler(true);
    } else if (cfg.getProbabilityPercent() == 0) {
        LOGGER.info("Sending no spans to jaeger");
        return new ConstSampler(false);
    } else {
        LOGGER.info("Sending {}% of spans to jaeger", cfg.getProbabilityPercent());
        // ProbabilisticSampler expects a rate between 0.0 and 1.0.
        return new ProbabilisticSampler(((double) cfg.getProbabilityPercent()) / 100);
    }
}
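What it means for the sampling decision to travel with the request can be sketched as follows: only the service that starts a trace decides, and every downstream service reuses that decision instead of rolling the dice again, so a trace is either recorded completely or not at all. The code below is a simplified illustration with hypothetical names, not Jaeger's sampler implementation; it assumes trace ids are uniformly distributed so that mapping an id onto 100 buckets approximates the configured percentage.

```java
/** Hypothetical sketch of head-based sampling; not Jaeger's implementation. */
final class SamplingSketch {

    private SamplingSketch() {
    }

    /**
     * Decision at the root of a trace. The result depends only on the trace id,
     * so the same trace id always yields the same decision.
     */
    static boolean sampleRoot(final long traceId, final int probabilityPercent) {
        if (probabilityPercent >= 100) {
            return true;
        }
        if (probabilityPercent <= 0) {
            return false;
        }
        // Map the trace id onto buckets 0..99 and keep the lowest `probabilityPercent` buckets.
        return Long.remainderUnsigned(traceId, 100) < probabilityPercent;
    }

    /** Downstream services do not decide again; they reuse the flag propagated with the request. */
    static boolean sampleChild(final boolean parentSampled) {
        return parentSampled;
    }

    public static void main(final String[] args) {
        final boolean rootSampled = sampleRoot(1234567L, 25);
        // Every span created further down the call chain inherits the root decision.
        System.out.println("root=" + rootSampled + ", child=" + sampleChild(rootSampled));
    }
}
```

The consequence for configuration is that the percentage only matters in services that can start a trace; services that are always called with an existing context simply follow the incoming flag.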
Code examples
…
@Interceptors({CompositeOpenTracingInterceptor.class, MethodValidationInterceptor.class, LoggingInterceptor.class})
public class IcmLoyaltyOrderServiceBean {
    …

/**
 * Open-Tracing for EJBs that belong to the composite layer.
 */
public class CompositeOpenTracingInterceptor extends OpenTracingInterceptor {

    @Override
    protected void addSpanTags(final InvocationContext ctx, final RequestContext requestContext,
            final SpanContext parent, final Span span) {
        super.addSpanTags(ctx, requestContext, parent, span);
        // Tag every span created by this interceptor with its architectural layer.
        COMPONENT.set(span, "composite");
    }
}
Interceptor example
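import javax.interceptor.AroundInvoke;
import javax.interceptor.InvocationContext;

import io.opentracing.Scope;
import io.opentracing.Span;
import io.opentracing.SpanContext;
import io.opentracing.Tracer;
import io.opentracing.tag.Tags;
import io.opentracing.util.GlobalTracer;

public class OpenTracingInterceptor {

    @AroundInvoke
    public Object trace(final InvocationContext ctx) throws Exception {
        final Tracer tracer = GlobalTracer.get();
        final RequestContext requestContext = getRequestContext(ctx);

        // Prefer the span that is already active on this thread; otherwise fall
        // back to a span context transported inside the request object.
        SpanContext parent = null;
        final Span activeSpan = tracer.activeSpan();
        if (activeSpan != null) {
            parent = activeSpan.context();
        } else if (requestContext instanceof SpanContextTransporter) {
            parent = ((SpanContextTransporter) requestContext).getSpanContext();
        }

        final Tracer.SpanBuilder spanBuilder = tracer.buildSpan(ctx.getMethod().getName());
        if (parent != null) {
            spanBuilder.asChildOf(parent);
        }

        // startActive(true) finishes the span when the scope is closed.
        try (final Scope scope = spanBuilder.startActive(true)) {
            final Span span = scope.span();
            CLASS_NAME.set(span, determineClassName(ctx.getTarget()));
            METHOD_NAME.set(span, ctx.getMethod().getName());
            if (parent == null && requestContext != null) {
                REQUEST_CONTEXT_ID.set(span, requestContext.getId());
            }
            try {
                return dispatchTracedCall(ctx);
            } catch (final Exception e) {
                // Mark the span as failed and keep the message for later analysis.
                Tags.ERROR.set(span, true);
                span.log(e.getMessage());
                throw e;
            }
        }
    }
    .....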
Demo
Analyzing errors with Jaeger
Error information stored in Jaeger
Compare traces
Jaeger UI view of two traces, A and B, compared structurally in graph form
Elastic APM
An OpenTracing-compatible APM Java agent exists:
https://github.com/elastic/apm-agent-java
But!
Documentation for Elastic APM OpenTracing bridge:
Stay curious
Keep exploring
