Distributed Tracing
How to do latency analysis for
microservice-based applications
Reshmi Krishna
@reshmi9k
About Me
• Software Engineer
• Platform Architect, Pivotal
• Women in Tech community member
Twitter: @reshmi9k
Meetup: Cloud-Native-New-York
Agenda
• Distributed Tracing
• Tracers and Tracing Systems
• Zipkin
• Incorporating distributed tracing into an existing microservice
• Demo
From Monolith …
[Diagram components: Customer, Loyalty, Notifications, Payment, Web Frontend]
… To Microservices
Troubleshooting Latency Issues
• When was the event? How long did it take?
• How do I know it was slow?
• Why did it take so long?
• Which microservice was responsible?
Distributed Tracing
• Distributed tracing is the process of collecting end-to-end transaction graphs in near real time
• A trace represents the entire journey of a request
• A span represents a single operation call
• Distributed tracing systems are often used for this purpose; Zipkin is one example
• As a request flows from one microservice to another, tracers add logic to create a unique trace ID and span IDs
Visualization - Traces & Spans
UI: Trace Id 1, Span Id 1 (root span, no parent)
Back-Office-Microservice: Trace Id 1, Parent Id 1, Span Id 2
Customer-Microservice: Trace Id 1, Parent Id 2, Span Id 4
Account-Microservice: Trace Id 1, Parent Id 2, Span Id 5
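The parent/child IDs on this slide are enough to rebuild the call tree. The following is a minimal Python sketch, not Zipkin's or Sleuth's code: the `Span` record and the `build_tree` and `render` helpers are hypothetical names chosen for illustration, and the IDs are the ones shown on the slide.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Span:
    """Hypothetical span record holding only the IDs shown on the slide."""
    service: str
    trace_id: int
    span_id: int
    parent_id: Optional[int] = None  # None marks the root span


# The four spans from the slide, all belonging to trace 1.
SPANS = [
    Span("UI", trace_id=1, span_id=1),
    Span("Back-Office-Microservice", trace_id=1, span_id=2, parent_id=1),
    Span("Customer-Microservice", trace_id=1, span_id=4, parent_id=2),
    Span("Account-Microservice", trace_id=1, span_id=5, parent_id=2),
]


def build_tree(spans):
    """Group spans by parent ID; return (root span, children lookup)."""
    children = defaultdict(list)
    root = None
    for span in spans:
        if span.parent_id is None:
            root = span
        else:
            children[span.parent_id].append(span)
    return root, children


def render(span, children, depth=0):
    """Return the tree as indented lines, one per span."""
    lines = ["  " * depth + f"{span.service} (span {span.span_id})"]
    for child in sorted(children[span.span_id], key=lambda s: s.span_id):
        lines.extend(render(child, children, depth + 1))
    return lines


if __name__ == "__main__":
    root, children = build_tree(SPANS)
    print("\n".join(render(root, children)))
```

Read top to bottom, the rendered tree shows the UI calling the back-office service, which in turn calls the customer and account services.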
Dapper Paper By Google
The paper describes Dapper, Google’s production distributed systems tracing infrastructure
Design goals:
Low overhead
Application-level transparency
Scalability
Zipkin
Zipkin is a distributed tracing system
Implementation based on Google’s Dapper paper
Aggregates spans into trace trees
Manages both collection and lookup of the data
In 2015, OpenZipkin became the primary fork
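To make "collection and lookup" concrete, here is a toy in-memory sketch in Python. It is not Zipkin's implementation or API: `InMemorySpanStore`, `collect`, and `get_trace` are hypothetical names, spans are plain dictionaries, and a real deployment would use a persistent store.

```python
from collections import defaultdict


class InMemorySpanStore:
    """Toy stand-in for a collector plus span store (hypothetical, not Zipkin's API)."""

    REQUIRED = ("trace_id", "span_id", "service")

    def __init__(self):
        self._by_trace = defaultdict(list)  # index: trace ID -> list of spans

    def collect(self, span: dict) -> bool:
        """Validate a reported span, then store and index it by trace ID."""
        if any(key not in span for key in self.REQUIRED):
            return False  # reject malformed spans instead of storing them
        self._by_trace[span["trace_id"]].append(span)
        return True

    def get_trace(self, trace_id):
        """Lookup: return every span collected for one trace, ordered by span ID."""
        return sorted(self._by_trace.get(trace_id, []), key=lambda s: s["span_id"])


if __name__ == "__main__":
    store = InMemorySpanStore()
    store.collect({"trace_id": 1, "span_id": 2, "parent_id": 1, "service": "back-office"})
    store.collect({"trace_id": 1, "span_id": 1, "service": "ui"})
    store.collect({"trace_id": 7, "span_id": 1, "service": "ui"})
    print([s["service"] for s in store.get_trace(1)])
```

Spans may arrive out of order and from different services; grouping them by trace ID at collection time is what makes the later lookup cheap.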
Initial Zipkin Architecture
Tracers
• Tracers add logic to create a unique trace ID
• The trace ID is generated when the first request is made
• A span ID is generated as the request arrives at each microservice
• Spring Cloud Sleuth is an example tracer
• Tracers execute in your production apps! They are written to not log too much
• Tracers have an instrumentation or sampling policy
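The bullets above can be sketched in a few lines of Python. This is a simplified illustration rather than how Spring Cloud Sleuth is implemented: `handle_request` and `new_id` are hypothetical helpers, the sampling rate is an arbitrary example value, and every hop simply creates a fresh span whose parent is the caller's span. The header names are the B3 propagation headers used by Zipkin-compatible tracers.

```python
import random

B3_TRACE, B3_SPAN, B3_PARENT, B3_SAMPLED = (
    "X-B3-TraceId", "X-B3-SpanId", "X-B3-ParentSpanId", "X-B3-Sampled")


def new_id(rng) -> str:
    """Random 64-bit ID rendered as 16 lowercase hex characters."""
    return f"{rng.getrandbits(64):016x}"


def handle_request(incoming_headers: dict, sample_rate: float = 0.1, rng=random) -> dict:
    """Return the tracing headers this service would attach to its outgoing calls.

    Hypothetical helper: a real tracer hooks into the HTTP client and server instead.
    """
    if B3_TRACE not in incoming_headers:
        # First request in the journey: start a new trace and decide on sampling once.
        trace_id = new_id(rng)
        parent_id = None
        sampled = "1" if rng.random() < sample_rate else "0"
    else:
        # Downstream service: keep the trace ID and the upstream sampling decision.
        trace_id = incoming_headers[B3_TRACE]
        parent_id = incoming_headers[B3_SPAN]
        sampled = incoming_headers.get(B3_SAMPLED, "0")

    # Every service gets its own span ID; the caller's span becomes the parent.
    headers = {B3_TRACE: trace_id, B3_SPAN: new_id(rng), B3_SAMPLED: sampled}
    if parent_id is not None:
        headers[B3_PARENT] = parent_id
    return headers


if __name__ == "__main__":
    first = handle_request({}, sample_rate=1.0)  # edge service, no incoming trace
    second = handle_request(first)               # service called by the edge service
    print(first)
    print(second)
```

Making the sampling decision once, at the start of the trace, keeps a trace either fully recorded or fully skipped, which is one way a tracer limits its overhead in production.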
Demo: Architecture Diagram
[Diagram: four apps, each instrumented with Spring Cloud Sleuth, send spans over a transport (MQ/HTTP/log) to Zipkin, which is made up of a Collector, a Span Store, a Query Server, and the Zipkin UI]
Let’s look at some code & Demo
Distributed Tracing Velocity2016
Summary
• Distributed tracing allows you to quickly see latency issues in your system
• Zipkin is a great tool for visualizing the latency graph and system dependencies
• Spring Cloud Sleuth integrates with Zipkin and gives you log correlation
• Log correlation allows you to match logs for a given trace
• Pivotal Cloud Foundry makes it easier to integrate your apps with Spring Cloud Sleuth and Zipkin
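Log correlation can be illustrated without any tracing library. In the Python sketch below, every log line carries the trace and span IDs, so the lines for one request can be pulled out across services. The names `TraceContextFilter`, `make_logger`, and `logs_for_trace` are hypothetical, and the bracketed prefix is an illustrative format, not Sleuth's exact output.

```python
import io
import logging


class TraceContextFilter(logging.Filter):
    """Attach the current trace and span IDs to every log record (hypothetical helper)."""

    def __init__(self, trace_id: str, span_id: str):
        super().__init__()
        self.trace_id = trace_id
        self.span_id = span_id

    def filter(self, record: logging.LogRecord) -> bool:
        record.trace_id = self.trace_id
        record.span_id = self.span_id
        return True  # never drop the record, only enrich it


def make_logger(service: str, trace_id: str, span_id: str, stream) -> logging.Logger:
    """Build a logger whose lines start with [service,trace_id,span_id]."""
    logger = logging.getLogger(f"demo.{service}")
    logger.setLevel(logging.INFO)
    logger.propagate = False
    logger.handlers.clear()  # reset so repeated calls do not stack handlers
    logger.filters.clear()
    handler = logging.StreamHandler(stream)
    handler.setFormatter(
        logging.Formatter(f"[{service},%(trace_id)s,%(span_id)s] %(message)s"))
    logger.addHandler(handler)
    logger.addFilter(TraceContextFilter(trace_id, span_id))
    return logger


def logs_for_trace(lines, trace_id: str):
    """Log correlation: keep only the lines that carry the given trace ID."""
    return [line for line in lines if f",{trace_id}," in line]


if __name__ == "__main__":
    out = io.StringIO()  # stands in for the combined logs of all services
    make_logger("ui", "abc123", "s1", out).info("checkout started")
    make_logger("payment", "abc123", "s2", out).info("card charged")
    make_logger("payment", "zzz999", "s9", out).info("unrelated request")
    print(logs_for_trace(out.getvalue().splitlines(), "abc123"))
```

Filtering on the trace ID returns the `ui` and `payment` lines for the same request and leaves out the unrelated one.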
Links
• Dapper, Google: http://research.google.com/pubs/pub36356.html
• Code for this presentation: https://github.com/reshmik/DistributedTracingDemo_Velocity2016.git
• Sleuth’s documentation: http://cloud.spring.io/spring-cloud-sleuth/spring-cloud-sleuth.html
• Repo with Spring Boot Zipkin server: https://github.com/openzipkin/zipkin-reporter-java.git
• Zipkin deployed on PCF: https://github.com/reshmik/Zipkin/tree/master/spring-cloud-sleuth-samples/spring-cloud-sleuth-sample-zipkin-stream
• Pivotal Web Services trial: https://run.pivotal.io/
• Pivotal Cloud Foundry on your laptop: https://docs.pivotal.io/pcf-dev/

Editor's Notes

  • #5: A monolith usually looks like a big ball of mud, with entangled dependencies, a lack of cohesion, and direct DB queries instead of interfaces and APIs. It does NOT do one thing very well; it usually does a lot of things, which become brittle and difficult to reason about. All functionality must be deployed together. There is no language or framework heterogeneity. A failure is more likely to cascade, reducing resilience: brittle, high-risk deployments. It scales vertically, or with limited horizontal scaling of everything at once. Large team, anti-agile. Harder to reuse. Harder to modify: thousands of lines of hard-to-understand code. Harder to replace: mean time to recovery is limited. Getting up to speed is hard. Wikipedia: A big ball of mud is a software system that lacks a perceivable architecture. Although undesirable from a software engineering point of view, such systems are common in practice due to business pressures, developer turnover and code entropy. They are a type of design anti-pattern.
  • #6: Death Star architecture (Adrian Cockcroft), as visualized by AppDynamics, Boundary.com and Twitter internal tools.
  • #8: A trace represents the entire journey of a request. A span is a basic unit of work. Each span is identified by a unique 64-bit ID, and the trace it belongs to is identified by another 64-bit ID. The trace gives you the structure through which you can identify your calls: think of a trace as a tree whose nodes are spans. The edges indicate a causal relationship between a span and its parent span. Independent of its place in a larger trace tree, though, a span is also a simple log of timestamped records that encode the span’s start and end time, any RPC timing data, and zero or more application-specific annotations.
  • #10: Dapper was published in 2010: http://static.googleusercontent.com/media/research.google.com/en//pubs/archive/36356.pdf
  • #11: Zipkin is a distributed tracing system. It helps gather the timing data needed to troubleshoot latency problems in microservice architectures, and it manages both the collection and lookup of this data. Zipkin’s design is based on the Google Dapper paper. It started as a project during Twitter’s first Hack Week, when a basic version of the Dapper paper was implemented for Thrift. Today it has grown to include support for tracing HTTP, Thrift, Memcache, SQL and Redis requests. The Apache Thrift software framework, for scalable cross-language services development, combines a software stack with a code generation engine to build services that work efficiently and seamlessly between C++, Java, Python, PHP, Ruby, Erlang, Perl, Haskell, C#, Cocoa, JavaScript, Node.js, Smalltalk, OCaml, Delphi and other languages.
  • #12: Tracers collect timing data and transport it over HTTP or Kafka. In the original Twitter setup, Scribe transports all the traces from the different services to Zipkin and Hadoop. Scribe was developed by Facebook and is made up of a daemon that can run on each server in your system. It listens for log messages and routes them to the correct receiver depending on the category. Once the trace data arrives at the Zipkin collector daemon, it is validated, stored, and then indexed for lookups. Zipkin was originally built with Cassandra for storage: it is scalable, has a flexible schema, and is heavily used within Twitter. However, this component is now pluggable, with support for Redis, HBase, MySQL, PostgreSQL, SQLite, and H2. Users query for traces via Zipkin’s Web UI or API.
  • #14: Tracers add logic to create a unique trace ID. The trace ID is generated when the first request is made. A span ID is generated as the request arrives at each microservice. Spring Cloud Sleuth is an example tracer. Tracers execute in your production apps, so they are written to not log too much. Tracers have an instrumentation or sampling policy to manage the volume of traces and spans.
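
Note #8 describes a span as a log of timestamped records with a start and end time. Given those durations, the "which microservice was responsible?" question can be approximated by subtracting each span's children from its own duration. The Python sketch below is an illustration under stated assumptions, not part of Zipkin: `TimedSpan`, `self_times`, and `slowest_service` are hypothetical names, the timings are made up, and child calls are assumed to run one after another.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TimedSpan:
    service: str
    span_id: int
    parent_id: Optional[int]
    duration_ms: int  # end timestamp minus start timestamp


def self_times(spans):
    """Time spent in each span itself, excluding its direct children.

    Assumes children run one after another; overlapping (parallel) children
    would need interval arithmetic instead of a plain sum. The result is keyed
    by service name, so this sketch expects one span per service.
    """
    child_total = {}
    for span in spans:
        if span.parent_id is not None:
            child_total[span.parent_id] = (
                child_total.get(span.parent_id, 0) + span.duration_ms)
    return {s.service: s.duration_ms - child_total.get(s.span_id, 0) for s in spans}


def slowest_service(spans):
    """Name of the service with the largest self time."""
    times = self_times(spans)
    return max(times, key=times.get)


# Made-up timings, for illustration only.
EXAMPLE = [
    TimedSpan("ui", 1, None, 500),
    TimedSpan("back-office", 2, 1, 450),
    TimedSpan("customer", 4, 2, 50),
    TimedSpan("account", 5, 2, 350),
]

if __name__ == "__main__":
    print(self_times(EXAMPLE))
    print(slowest_service(EXAMPLE))
```

In this invented trace the root span takes 500 ms end to end, but most of that time is spent inside the account service, which is the kind of conclusion a trace view is meant to make obvious.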