Distributed Tracing
Latency analysis for microservices
Reshmi Krishna
@reshmi9k
About Me
 Software Engineer
 Senior Platform Architect, Pivotal
 Conference Speaker
MeetUp : Cloud-Native-New-York
Agenda
 Distributed Tracing
 Tracers and Tracing Systems
 Zipkin
 Demo – Spring Cloud Sleuth, Zipkin, PCF Metrics
Everything is going to be okay!
Until
Let’s Debug
It doesn’t look like this
QueryHandlerService
IndexerService
BackendService
PageRankingService
Web Frontend
More like this
Troubleshooting Latency issues
 When was the event? How long did it take?
 How do I know it was slow?
 Why did it take so long?
 Which microservice was responsible?
Distributed Tracing
 Distributed Tracing is the process of collecting end-to-end transaction graphs in near real time
 A trace represents the entire journey of a request
 A span represents a single operation call
 Distributed Tracing Systems are often used for this purpose. Zipkin is an example
 As a request flows from one microservice to another, tracers add logic to create a unique trace ID and span IDs
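The trace and span definitions above can be sketched as a small data model. This is only an illustration: the field names (`trace_id`, `span_id`, `parent_id`, `start_us`, `end_us`) and the `group_by_trace` helper are assumptions made for this example, not the schema of Zipkin or Spring Cloud Sleuth.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Span:
    """One operation call within a trace (field names are illustrative)."""
    trace_id: int              # shared by every span of the same request
    span_id: int               # unique to this operation
    parent_id: Optional[int]   # None for the root span
    service: str
    start_us: int              # start timestamp in microseconds
    end_us: int                # end timestamp in microseconds

    @property
    def duration_us(self) -> int:
        return self.end_us - self.start_us


def group_by_trace(spans):
    """Collect spans into traces, keyed by trace ID."""
    traces = {}
    for span in spans:
        traces.setdefault(span.trace_id, []).append(span)
    return traces


spans = [
    Span(1, 1, None, "service1", 0, 900),
    Span(1, 2, 1, "service2", 100, 800),
    Span(2, 5, None, "service1", 50, 70),
]
traces = group_by_trace(spans)
```

Here trace 1 is the journey of one request through two services, and each span's duration says how long that single operation took.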
Tracers
 Tracers add logic to create a unique trace ID
 The trace ID is generated when the first request is made
 A span ID is generated as the request arrives at each microservice
 An example tracer is Spring Cloud Sleuth
 Tracers execute in your production apps! They are written to not log too much
 Tracers have an instrumentation or sampling policy to manage the volume of traces and spans
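What a tracer does at each hop can be sketched roughly as follows. This is a simplified illustration, not how Spring Cloud Sleuth is implemented: the header names (`trace-id`, `span-id`, `parent-id`, `sampled`) and the `start_span` function are hypothetical, and the 64-bit ID width follows the speaker notes.

```python
import random


def new_id() -> int:
    """Random 64-bit identifier."""
    return random.getrandbits(64)


def start_span(incoming_headers: dict, sample_rate: float = 0.1) -> dict:
    """Return the trace context for the span that handles this request.

    The returned dict doubles as the headers to send on outgoing calls.
    All header names are hypothetical.
    """
    if "trace-id" in incoming_headers:
        # Request came from an instrumented service: join its trace
        # and keep its sampling decision.
        trace_id = incoming_headers["trace-id"]
        parent_id = incoming_headers["span-id"]
        sampled = incoming_headers["sampled"]
    else:
        # First request: create the trace ID and decide once whether
        # this trace is recorded at all.
        trace_id = new_id()
        parent_id = None
        sampled = random.random() < sample_rate
    return {
        "trace-id": trace_id,
        "span-id": new_id(),   # new span ID at every microservice
        "parent-id": parent_id,
        "sampled": sampled,
    }


root = start_span({}, sample_rate=1.0)   # edge service, nothing incoming
child = start_span(root)                 # downstream service
```

Making the sampling decision once, at the root, keeps a trace either fully recorded or fully skipped, which is one way to limit overhead in production.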
Visualization - Traces & Spans
service1 : Trace Id : 1, Span Id : 1
service2 : Trace Id : 1, Parent Id : 1, Span Id : 2
service3 : Trace Id : 1, Parent Id : 2, Span Id : 3
service4 : Trace Id : 1, Parent Id : 2, Span Id : 4
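The parent IDs on this slide are enough to rebuild the call tree. A minimal sketch, assuming spans arrive as plain dictionaries (the key names and the `render_tree` helper are made up for this example):

```python
from collections import defaultdict

# The four spans from the slide; service1 is the root (no parent).
spans = [
    {"service": "service1", "trace_id": 1, "parent_id": None, "span_id": 1},
    {"service": "service2", "trace_id": 1, "parent_id": 1, "span_id": 2},
    {"service": "service3", "trace_id": 1, "parent_id": 2, "span_id": 3},
    {"service": "service4", "trace_id": 1, "parent_id": 2, "span_id": 4},
]


def render_tree(spans):
    """Return the trace as indented lines, children under their parent."""
    children = defaultdict(list)
    for span in spans:
        children[span["parent_id"]].append(span)

    lines = []

    def walk(parent_id, depth):
        for span in sorted(children[parent_id], key=lambda s: s["span_id"]):
            lines.append("  " * depth + f'{span["service"]} (span {span["span_id"]})')
            walk(span["span_id"], depth + 1)

    walk(None, 0)
    return lines


tree = render_tree(spans)
print("\n".join(tree))
```

The printed tree shows service1 calling service2, which in turn calls service3 and service4.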
Dapper Paper By Google
This paper describes Dapper, Google’s production distributed systems tracing infrastructure
Design Goals :
Low overhead
Application-level transparency
Scalability
Zipkin
Zipkin is a distributed tracing system
Implementation based on Dapper paper, Google
Aggregate spans into trace trees
Manages both collection and lookup of the data
In 2015, OpenZipkin became the primary fork
Initial Zipkin Architecture
Demo : Architecture Diagram
[Diagram: four apps, each instrumented with Spring Cloud Sleuth, send spans over a transport (MQ / HTTP / log) to Zipkin, which consists of a Collector, a Span Store, a Query Server and the Zipkin UI.]
Let’s look at some code & Demo
Links
 Dapper, Google : http://research.google.com/pubs/pub36356.html
 Code for this presentation : https://github.com/reshmik/DistributedTracingDemo_Velocity2016.git
 Sleuth’s documentation : http://cloud.spring.io/spring-cloud-sleuth/spring-cloud-sleuth.html
 Repo with Spring Boot Zipkin server : https://github.com/openzipkin/zipkin-java
 Zipkin deployed on PCF : https://github.com/spring-cloud-samples/sleuth-documentation-apps/tree/master/zipkin-server
 Pivotal Web Services trial : https://run.pivotal.io/
 PivotalCloudFoundry on your laptop : https://docs.pivotal.io/pcf-dev/


Editor's Notes

  • #11: A monolith usually looks like a big ball of mud, with entangled dependencies, a lack of cohesion, and direct DB queries instead of interfaces and APIs. It does NOT do one thing very well. It usually does a lot of things, which become brittle and difficult to reason about. All functionality must be deployed together. There is no language or framework heterogeneity. A failure is more likely to cascade, reducing resilience: brittle, high-risk deployments. It scales vertically, or with limited horizontal scaling of everything at once. Large teams are anti-agile. It is harder to reuse. It is harder to modify: thousands of lines of hard-to-understand code. It is harder to replace: mean time to recovery is limited. It is harder to get up to speed. Wikipedia: A big ball of mud is a software system that lacks a perceivable architecture. Although undesirable from a software engineering point of view, such systems are common in practice due to business pressures, developer turnover and code entropy. They are a type of design anti-pattern.
  • #12: Death Star architecture, by Adrian Cockcroft, as visualized by AppDynamics, Boundary.com and Twitter internal tools.
  • #15: A trace represents the entire journey of a request. A span is a basic unit of work. A span is identified by a unique 64-bit ID, and the trace it is part of is identified by another 64-bit ID. A span contains timestamped records, any RPC timing data, and zero or more application-specific annotations. The trace gives you the structure through which you can identify your calls. You can think of a trace as a tree and of the tree nodes as spans. The edges indicate a causal relationship between a span and its parent span. Independent of its place in a larger trace tree, though, a span is also a simple log of timestamped records which encode the span’s start and end time, any RPC timing data, and zero or more application-specific annotations.
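To make the "log of timestamped records" idea concrete, here is a small sketch that derives RPC timings from one span's records. It assumes Zipkin-style core annotation labels (`cs` client send, `sr` server receive, `ss` server send, `cr` client receive); the tuple layout, the `rpc_timing` helper and the `cache.miss` annotation are made up for this example.

```python
def rpc_timing(annotations):
    """Derive timings from one RPC span's timestamped records.

    annotations: list of (timestamp_us, label) tuples.
    """
    at = {label: ts for ts, label in annotations}
    total = at["cr"] - at["cs"]     # as seen by the caller
    server = at["ss"] - at["sr"]    # time spent inside the callee
    return {
        "total_us": total,
        "server_us": server,
        "network_us": total - server,  # time in transit, both directions
    }


timing = rpc_timing([
    (1000, "cs"),
    (1200, "sr"),
    (1500, "cache.miss"),   # application-specific annotation
    (1900, "ss"),
    (2150, "cr"),
])
```

For these records the call took 1150 µs in total, of which 700 µs were spent in the server and 450 µs on the network.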
  • #19: Dapper was published in 2010: http://static.googleusercontent.com/media/research.google.com/en//pubs/archive/36356.pdf
  • #20: Zipkin is a distributed tracing system. It helps gather the timing data needed to troubleshoot latency problems in microservice architectures. It manages both the collection and lookup of this data. Zipkin’s design is based on the Google Dapper paper. It started as a project in the first hack week. The initial implementation of the Dapper paper was for Thrift. Today it has grown to include support for tracing HTTP, Thrift, Memcache, SQL and Redis requests. The Apache Thrift software framework, for scalable cross-language services development, combines a software stack with a code generation engine to build services that work efficiently and seamlessly between C++, Java, Python, PHP, Ruby, Erlang, Perl, Haskell, C#, Cocoa, JavaScript, Node.js, Smalltalk, OCaml, Delphi and other languages.
  • #21: Tracers collect timing data and transport it over HTTP or Kafka. We use Scribe to transport all the traces from the different services to Zipkin and Hadoop. Scribe was developed by Facebook and is made up of a daemon that can run on each server in your system. It listens for log messages and routes them to the correct receiver depending on the category. Once the trace data arrives at the Zipkin collector daemon, we check that it is valid, store it, and then index it for lookups. Zipkin was originally built with Cassandra for storage. Cassandra is scalable, has a flexible schema, and is heavily used within Twitter. However, this component is now pluggable, and there is now support for Redis, HBase, MySQL, PostgreSQL, SQLite, and H2. Users query for traces via Zipkin’s web UI or API.
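The collector's validate, store and index steps, plus the query side, can be sketched with an in-memory stand-in. This is a toy model under stated assumptions, not Zipkin's code: the `InMemoryCollector` class, its method names and the required span fields are all hypothetical, and a real deployment would use one of the pluggable stores mentioned above.

```python
from collections import defaultdict


class InMemoryCollector:
    """Toy stand-in for a collector, span store and query service."""

    def __init__(self):
        self.spans_by_trace = defaultdict(list)    # the "store"
        self.traces_by_service = defaultdict(set)  # the lookup index

    def accept(self, span: dict) -> bool:
        """Validate a span, then store and index it. Returns False if rejected."""
        required = ("trace_id", "span_id", "service")
        if any(span.get(key) is None for key in required):
            return False
        self.spans_by_trace[span["trace_id"]].append(span)
        self.traces_by_service[span["service"]].add(span["trace_id"])
        return True

    def find_traces(self, service: str):
        """Query side: every trace that passed through the given service."""
        trace_ids = sorted(self.traces_by_service.get(service, ()))
        return [self.spans_by_trace[trace_id] for trace_id in trace_ids]


collector = InMemoryCollector()
ok = collector.accept({"trace_id": 1, "span_id": 1, "service": "frontend"})
collector.accept({"trace_id": 1, "span_id": 2, "service": "backend"})
rejected = collector.accept({"span_id": 3, "service": "backend"})  # no trace ID
backend_traces = collector.find_traces("backend")
```

Querying for `backend` returns the whole trace, including the `frontend` span, which is what lets a UI show the full request path rather than one service's slice of it.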
  • #22: Tracers add logic to create a unique trace ID. The trace ID is generated when the first request is made. A span ID is generated as the request arrives at each microservice. An example tracer is Spring Cloud Sleuth. Tracers execute in your production apps! They are written to not log too much. Tracers have an instrumentation or sampling policy to manage the volume of traces and spans.