Microservices tracing with
Spring Cloud and Zipkin
Marcin Grzejszczak
Marcin Grzejszczak @mgrzejszczak, 24 June 2016
About me
Developer at Pivotal
Part of Spring Cloud Team
Working with OSS:
● Accurest - Consumer Driven Contracts verifier for Java
● JSON Assert - fluent JSON assertions
● Spock Subjects Collaborators Extension
● Gradle Test Profiler
● Up To Date Gradle Plugin
TWITTER: @MGrzejszczak
BLOG: http://TOOMUCHCODING.COM
Agenda
What is distributed tracing?
How to correlate logs with Spring Cloud Sleuth?
How to visualize latency with Spring Cloud Sleuth and Zipkin?
An ordinary system...
UI calls backend
UI -> BACKEND
Everything is awesome
CLICK 200
Until it’s not
CLICK 500
Time to debug
https://tonysbologna.files.wordpress.com/2015/09/mario-and-luigi.jpg?w=468&h=578&crop=1
It doesn’t look like this
More like this
On which server / instance
was the exception thrown?
SSH and grep for ERROR to find it?
Distributed tracing - terminology
Span
Trace
Logs (annotations)
Tags (binary annotations)
Span
The basic unit of work (e.g. sending an RPC)
● Spans are started and stopped
● They keep track of their timing information
● Once you create a span, you must stop it at some point in the future
● A span has a parent (except for the root span) and can have multiple children
Trace
A set of spans forming a tree-like structure.
● For example, if you are running a book store then
○ A trace could be retrieving a list of available books
○ Assuming that to retrieve the books you have to send 3 requests to 3 services,
then you could have at least 3 spans (1 for each hop) forming 1 trace
[Diagram: a single request traced across four services; every span carries Trace Id = X]
● The REQUEST arrives at SERVICE 1 with No Trace Id and No Span Id; SERVICE 1 handles it in Span Id = A
● SERVICE 1 calls SERVICE 2 in Span Id = B (Client Send, Server Received, Server Sent, Client Received)
● SERVICE 2 does its own work in Span Id = C
● SERVICE 2 calls SERVICE 3 in Span Id = D (Client Send, Server Received, Server Sent, Client Received); SERVICE 3 does its own work in Span Id = E
● SERVICE 2 calls SERVICE 4 in Span Id = F (Client Send, Server Received, Server Sent, Client Received); SERVICE 4 does its own work in Span Id = G
Span Id = A, Parent Id = null
Span Id = B, Parent Id = A
Span Id = C, Parent Id = B
Span Id = D, Parent Id = C
Span Id = E, Parent Id = D
Span Id = F, Parent Id = C
Span Id = G, Parent Id = F
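The parent links listed above are enough to rebuild the whole trace tree. Here is a minimal sketch in plain Java, assuming nothing beyond the seven span/parent pairs from the slide. The class and method names (SpanTreeSketch, childrenOf, pathFromRoot) are made up for illustration and are not part of Sleuth.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model for illustration only; this is not Sleuth's Span class.
final class SpanTreeSketch {

    // Maps each span id to its parent id (null marks the root span).
    static Map<String, String> parentsFromSlide() {
        Map<String, String> parents = new LinkedHashMap<>();
        parents.put("A", null);
        parents.put("B", "A");
        parents.put("C", "B");
        parents.put("D", "C");
        parents.put("E", "D");
        parents.put("F", "C");
        parents.put("G", "F");
        return parents;
    }

    // Collects the direct children of the given span id, in insertion order.
    static List<String> childrenOf(Map<String, String> parents, String spanId) {
        List<String> children = new ArrayList<>();
        for (Map.Entry<String, String> entry : parents.entrySet()) {
            if (spanId.equals(entry.getValue())) {
                children.add(entry.getKey());
            }
        }
        return children;
    }

    // Walks up the parent links and returns the path from the root to the span.
    static List<String> pathFromRoot(Map<String, String> parents, String spanId) {
        List<String> path = new ArrayList<>();
        for (String current = spanId; current != null; current = parents.get(current)) {
            path.add(0, current);
        }
        return path;
    }

    public static void main(String[] args) {
        Map<String, String> parents = parentsFromSlide();
        System.out.println(childrenOf(parents, "C"));   // [D, F]
        System.out.println(pathFromRoot(parents, "G")); // [A, B, C, F, G]
    }
}
```

Span C has two children (D and F) because SERVICE 2 calls both SERVICE 3 and SERVICE 4, which is what makes the trace a tree rather than a chain.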
Is it that simple?
How do you pass tracing information (incl. Trace ID)
between:
● different libraries?
● thread pools?
● asynchronous communication?
● …?
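To see why thread pools are a problem, here is a minimal sketch in plain Java. It assumes the trace id lives in a ThreadLocal, a common approach that the slides themselves do not confirm. TraceContextSketch, TRACE_ID and traced(...) are hypothetical names; Sleuth's real instrumentation is more involved.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical sketch; this is not how Sleuth is implemented.
final class TraceContextSketch {

    // Holds the trace id of the request handled by the current thread.
    static final ThreadLocal<String> TRACE_ID = new ThreadLocal<>();

    // Captures the caller's trace id and restores it inside the worker thread.
    static Runnable traced(Runnable task) {
        final String callerTraceId = TRACE_ID.get();
        return () -> {
            String previous = TRACE_ID.get();
            TRACE_ID.set(callerTraceId);
            try {
                task.run();
            } finally {
                TRACE_ID.set(previous); // do not leak the id to the next pooled task
            }
        };
    }

    // Runs a task on the pool and reports which trace id the task observed.
    static String traceIdSeenInPool(ExecutorService pool, boolean wrap) {
        final String[] seen = new String[1];
        Runnable task = () -> seen[0] = TRACE_ID.get();
        try {
            // get() waits for the worker and makes its write to seen[0] visible
            pool.submit(wrap ? traced(task) : task).get();
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        }
        return seen[0];
    }

    public static void main(String[] args) {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            TRACE_ID.set("X");
            System.out.println(traceIdSeenInPool(pool, false)); // null: the trace is lost
            System.out.println(traceIdSeenInPool(pool, true));  // X: the trace is propagated
        } finally {
            pool.shutdown();
        }
    }
}
```

Every hand-off to another thread needs a wrapper like traced(...), and forgetting a single one silently breaks the trace.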
What if you forget about a thread pool?
[Diagram: the four-service call flow again. The incoming REQUEST has NO TRACE; SERVICE 1 and the call into SERVICE 2 are labelled A, while the calls from SERVICE 2 to SERVICE 3 and SERVICE 4 are labelled B, C and D.]
Log correlation with Spring Cloud Sleuth
We take care of passing tracing information between threads / libraries / contexts for
● Hystrix
● RxJava
● RestTemplate
● Feign
● Messaging with Spring Integration
● Zuul
● ...
If you don’t do anything unexpected, there’s nothing you need to do to make
Sleuth work. Check the docs for more info.
Now let’s aggregate the logs!
Instead of SSHing to the machines, aggregate the logs!
● With Cloud Foundry’s (CF) Loggregator the logs from different instances are
streamed into a single place
● You can harvest your logs with Logstash Forwarder / Filebeat
● You can use the ELK stack to stream and visualize the logs
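Once the logs are in one place, correlating them means filtering by trace id. The sketch below assumes each log line carries a bracketed [appName,traceId,spanId,exportable] section. That layout, the sample lines, the ids and the class name LogCorrelationSketch are illustrative assumptions, so check the Sleuth documentation for the format your version emits.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper; the bracketed log layout below is an assumption.
final class LogCorrelationSketch {

    // Matches "[appName,traceId,spanId,exportable]" inside a log line.
    private static final Pattern TRACE_INFO =
            Pattern.compile("\\[([^,\\]]+),([^,\\]]+),([^,\\]]+),([^,\\]]+)\\]");

    // Returns the trace id found in the line, or null if the line carries none.
    static String traceIdOf(String logLine) {
        Matcher matcher = TRACE_INFO.matcher(logLine);
        return matcher.find() ? matcher.group(2) : null;
    }

    // Keeps only the lines that belong to the given trace, preserving order.
    static List<String> linesForTrace(List<String> aggregatedLines, String traceId) {
        List<String> matching = new ArrayList<>();
        for (String line : aggregatedLines) {
            if (traceId.equals(traceIdOf(line))) {
                matching.add(line);
            }
        }
        return matching;
    }

    public static void main(String[] args) {
        // Made-up log lines from two services, as they might look after aggregation.
        List<String> aggregated = Arrays.asList(
                "INFO [service1,aaaa,aaaa,true] Hello from service1",
                "INFO [service2,bbbb,bbbb,true] Unrelated request",
                "ERROR [service2,aaaa,cccc,true] Read timed out");
        // Prints the service1 line and the service2 error line, both from trace "aaaa".
        linesForTrace(aggregated, "aaaa").forEach(System.out::println);
    }
}
```

In practice a query in the log aggregation tool does the same job; the point is only that one trace id selects every line of a request, regardless of which instance wrote it.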
Spring Cloud Sleuth with Maven
<dependencyManagement>
    <dependencies>
        <dependency>
            <groupId>org.springframework.cloud</groupId>
            <artifactId>spring-cloud-dependencies</artifactId>
            <version>Brixton.SR1</version>
            <type>pom</type>
            <scope>import</scope>
        </dependency>
    </dependencies>
</dependencyManagement>
<dependency>
    <groupId>org.springframework.cloud</groupId>
    <artifactId>spring-cloud-starter-sleuth</artifactId>
</dependency>
Spring Cloud Sleuth with Gradle
dependencies {
    compile "org.springframework.cloud:spring-cloud-starter-sleuth"
}
dependencyManagement {
    imports {
        mavenBom "org.springframework.cloud:spring-cloud-dependencies:Brixton.SR1"
    }
}
[Diagram: a REQUEST to /start on SERVICE 1 is passed to SERVICE 2, which sends requests to SERVICE 3 and SERVICE 4]
● SERVICE 3 responds with “Hello from service3”
● SERVICE 4 responds with “Hello from service4”
● SERVICE 2 responds with “Hello from service2, response from service3 [Hello from service3] and from service4 [Hello from service4]”
[Diagram: a REQUEST to /readtimeout on SERVICE 1, which calls SERVICE 2; each REQUEST ends in BOOM! instead of a RESPONSE]
Log correlation with Spring Cloud Sleuth
DEMO
Great! We’ve found the exception!
But meanwhile....
The system is slow...
CLICK 200
One of the services is slow?
Which one?
How to measure that?
Let’s log events!
● Client Send (CS) - The client has made a request
● Server Received (SR) - The server side got the request and will start processing
● Server Send (SS) - Annotated upon completion of request processing
● Client Received (CR) - The client has successfully received the response from
the server side
CS 0 ms, SR 100 ms, SS 300 ms, CR 450 ms
Conclusions
● The request started at T=0 ms
● It took 450 ms for the client to receive a response
● Server side received the request at T=100 ms
● The request got processed on the server side in 200 ms
Why is there a delay between sending and receiving messages?!!11!one!?!1!
https://blogs.oracle.com/jag/resource/Fallacies.html
Distributed tracing - terminology
Span
Trace
Logs (annotations)
Tags (binary annotations)
Logs
A log represents an event in time associated with a span
● Every span has zero or more logs
● Each log is a timestamped event name
● The event name should be the stable name of some notable moment in the lifetime of a
span
● For instance, a span representing a browser page load might add an event for
each of the Performance.timing moments (check https://developer.mozilla.org/en-US/docs/Web/API/PerformanceTiming)
Main logs
● Client Send (CS)
○ The client has made a request - the span was started
● Server Received (SR)
○ The server side got the request and will start processing it
○ SR timestamp - CS timestamp = NETWORK LATENCY
CS 0 ms SR 100 ms
Main logs
● Server Send (SS)
○ Annotated upon completion of request processing
○ SS timestamp - SR timestamp = SERVER SIDE PROCESSING TIME
● Client Received (CR)
○ The client has successfully received the response from the server side
○ CR timestamp - CS timestamp = TIME NEEDED TO RECEIVE RESPONSE
○ CR timestamp - SS timestamp = NETWORK LATENCY
CS 0 ms SR 100 ms
SS 300 ms CR 450 ms
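The four formulas above can be checked against the numbers from the slide (CS 0 ms, SR 100 ms, SS 300 ms, CR 450 ms). This is a minimal sketch; CallTimingSketch and its method names are hypothetical, and it assumes all four timestamps are measured on comparable clocks, which across real hosts requires clock synchronization.

```java
// Hypothetical value class; timestamps are milliseconds relative to Client Send.
final class CallTimingSketch {
    final long clientSend;
    final long serverReceived;
    final long serverSend;
    final long clientReceived;

    CallTimingSketch(long clientSend, long serverReceived, long serverSend, long clientReceived) {
        this.clientSend = clientSend;
        this.serverReceived = serverReceived;
        this.serverSend = serverSend;
        this.clientReceived = clientReceived;
    }

    // SR - CS: network latency of the request
    long requestNetworkLatency() {
        return serverReceived - clientSend;
    }

    // SS - SR: server side processing time
    long serverProcessingTime() {
        return serverSend - serverReceived;
    }

    // CR - SS: network latency of the response
    long responseNetworkLatency() {
        return clientReceived - serverSend;
    }

    // CR - CS: time needed to receive the response
    long totalTime() {
        return clientReceived - clientSend;
    }

    public static void main(String[] args) {
        CallTimingSketch call = new CallTimingSketch(0, 100, 300, 450);
        System.out.println(call.requestNetworkLatency());  // 100
        System.out.println(call.serverProcessingTime());   // 200
        System.out.println(call.responseNetworkLatency()); // 150
        System.out.println(call.totalTime());              // 450
    }
}
```

The three partial values add up to the total (100 + 200 + 150 = 450), so the 250 ms that the server did not spend processing was spent on the network.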
Tag
Key-value pair
● Every span may have zero or more key/value Tags
● They do not have timestamps and simply annotate the spans.
● Example of default tags in Sleuth
○ message/payload-size
○ http.method
○ commandKey for Hystrix
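The difference between logs and tags is easiest to see side by side. This minimal sketch models a span that holds both: logs as timestamped event names, tags as plain key/value pairs. AnnotatedSpanSketch, the span name and the tag value are made up for illustration and do not reflect Sleuth's own Span API.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model for illustration only; this is not Sleuth's Span class.
final class AnnotatedSpanSketch {

    // A log: a timestamped event name.
    static final class Log {
        final long timestampMillis;
        final String event;

        Log(long timestampMillis, String event) {
            this.timestampMillis = timestampMillis;
            this.event = event;
        }
    }

    final String name;
    final List<Log> logs = new ArrayList<>();               // zero or more timestamped events
    final Map<String, String> tags = new LinkedHashMap<>(); // zero or more key/value pairs

    AnnotatedSpanSketch(String name) {
        this.name = name;
    }

    // Records a notable moment in the lifetime of the span.
    void logEvent(long timestampMillis, String event) {
        logs.add(new Log(timestampMillis, event));
    }

    // Annotates the span with a key/value pair; no timestamp is involved.
    void tag(String key, String value) {
        tags.put(key, value);
    }

    public static void main(String[] args) {
        AnnotatedSpanSketch span = new AnnotatedSpanSketch("call-service2"); // made-up span name
        span.logEvent(0, "cs");
        span.logEvent(450, "cr");
        span.tag("http.method", "GET");
        System.out.println(span.logs.size() + " logs, tags = " + span.tags);
    }
}
```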
How to visualise latency in
a distributed system?
The answer is: Zipkin
● Zipkin is a distributed tracing system
● It runs as a separate process (you can run it as a Spring Boot
application)
● It helps gather timing data needed to troubleshoot latency problems in
microservice architectures
● The front end is a "waterfall" style graph of service calls showing call durations
as horizontal bars
How does Zipkin work?
[Diagram: each APP sends its spans to the collectors, which store them in a DB; the UI queries for trace info via the API]
Spring Cloud Sleuth and Zipkin integration
● We take care of passing tracing information between threads / libraries /
contexts
● Upon closing of a Span we will send it to Zipkin
○ either via HTTP (spring-cloud-sleuth-zipkin)
○ or via Spring Cloud Stream (spring-cloud-sleuth-stream)
● You can run Zipkin Spring Cloud Stream Collector as a Spring Boot app (spring-cloud-sleuth-zipkin-stream)
○ you can add the Zipkin UI dependency to it!
Spring Cloud Sleuth Zipkin with Maven
<dependencyManagement>
    <dependencies>
        <dependency>
            <groupId>org.springframework.cloud</groupId>
            <artifactId>spring-cloud-dependencies</artifactId>
            <version>Brixton.SR1</version>
            <type>pom</type>
            <scope>import</scope>
        </dependency>
    </dependencies>
</dependencyManagement>
<dependency>
    <groupId>org.springframework.cloud</groupId>
    <artifactId>spring-cloud-starter-zipkin</artifactId>
</dependency>
Spring Cloud Sleuth Zipkin with Gradle
dependencies {
    compile "org.springframework.cloud:spring-cloud-starter-zipkin"
}
dependencyManagement {
    imports {
        mavenBom "org.springframework.cloud:spring-cloud-dependencies:Brixton.SR1"
    }
}
HOLD IT!
● If I have a billion services that emit a gazillion spans - won’t I kill Zipkin?
Sampling to the rescue!
● By default Spring Cloud Sleuth sends only 10% of requests to Zipkin
● You can change that by changing the property
spring.sleuth.sampler.percentage (for 100% pass 1.0)
● Or register a custom org.springframework.cloud.sleuth.Sampler
implementation
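To illustrate what a sampling decision can look like, here is a minimal sketch in plain Java. It deliberately does not implement org.springframework.cloud.sleuth.Sampler and is not how Sleuth's built-in sampler works. PercentageSamplerSketch and the idea of deciding from the trace id are assumptions chosen to make the example deterministic.

```java
// Hypothetical sampler; not Sleuth's org.springframework.cloud.sleuth.Sampler.
final class PercentageSamplerSketch {

    // Number of buckets (out of 100) whose traces get sampled.
    private final int threshold;

    // percentage uses the same scale as the property on the slide: 0.1 = 10%, 1.0 = 100%.
    PercentageSamplerSketch(double percentage) {
        if (percentage < 0.0 || percentage > 1.0) {
            throw new IllegalArgumentException("percentage must be between 0.0 and 1.0");
        }
        this.threshold = (int) Math.round(percentage * 100);
    }

    // Decides from the trace id, so every span of one trace gets the same decision.
    boolean isSampled(long traceId) {
        return Math.floorMod(traceId, 100L) < threshold;
    }

    public static void main(String[] args) {
        PercentageSamplerSketch tenPercent = new PercentageSamplerSketch(0.1);
        int sampled = 0;
        for (long traceId = 0; traceId < 1000; traceId++) {
            if (tenPercent.isSampled(traceId)) {
                sampled++;
            }
        }
        System.out.println(sampled + " of 1000 traces sampled"); // 100 of 1000 traces sampled
    }
}
```

Deciding per trace rather than per span keeps sampled traces complete: either all spans of a request reach Zipkin or none do.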
[Diagram: request/response flow between SERVICE 1 (/start), SERVICE 2 (/foo), SERVICE 3 (/bar), SERVICE 4 (/baz) and DEVOXX SERVICE (/devoxx)]
DEMO
Traced call
[Screenshots of the traced call, with these callouts:]
● TOTAL DURATION, from START to END
● SERVICE 2, CLIENT: CLIENT SENT and CLIENT RECEIVED
● SERVICE 4, SERVER: SERVER RECEIVED and SERVER SENT
● LATENCY: the diff between CLIENT SENT and SERVER RECEIVED
Zipkin for Brewery
● A test app for Spring Cloud end-to-end tests
● Source code:
https://github.com/spring-cloud-samples/brewery
● Around 10 applications involved
Summary
● Log correlation allows you to match logs for a given trace
● Distributed tracing allows you to quickly see latency issues in your system
● Zipkin is a great tool to visualize the latency graph and system dependencies
● Spring Cloud Sleuth integrates with Zipkin and grants you log correlation
THANK YOU
● https://github.com/marcingrzejszczak/vagrant-elk-box/tree/presentation - code for this presentation (clone and run getReadyForConference.sh - NOTE: you need Vagrant!)
● https://github.com/spring-cloud/spring-cloud-sleuth - Spring Cloud Sleuth repository
● http://cloud.spring.io/spring-cloud-sleuth/spring-cloud-sleuth.html - Sleuth’s documentation
● http://toomuchcoding.com/blog/2016/03/25/spring-cloud-sleuth-rc1-deployed/ - article about RC1 release
● https://github.com/openzipkin/zipkin-java - Repo with Spring Boot Zipkin server
● http://docssleuth-service1.cfapps.io/start - The service1 app from this presentation deployed to Pivotal Cloud Foundry - point of entry to the app
● http://docssleuth-zipkin-server.cfapps.io/ - Zipkin deployed to Pivotal Cloud Foundry
● http://docsbrewing-zipkin-server.cfapps.io - Zipkin deployed to PCF for Brewery Sample app
