Fault tolerant 
microservices 
BSkyB 
@chbatey
Who is this guy? 
● Enthusiastic nerd 
● Senior software engineer at BSkyB 
● Builds a lot of distributed applications 
● Apache Cassandra MVP
Agenda 
1. Setting the scene 
○ What do we mean by a fault? 
○ What is a microservice? 
○ Monolith application vs the micro(ish) service 
2. A worked example 
○ Identify an issue 
○ Reproduce/test it 
○ Show how to deal with the issue
So… what do applications look like? 
So... what do systems look like now? 
But different things go wrong... 
down 
slow network 
slow app 
2 second max 
GC :( 
missing packets
Fault tolerance 
1. Don’t take forever - Timeouts 
2. Don’t try if you can’t succeed 
3. Fail gracefully 
4. Know if it’s your fault 
5. Don’t whack a dead horse 
6. Turn broken stuff off 
Time for an example... 
● All examples are on github 
● Technologies used: 
○ Dropwizard 
○ Spring Boot 
○ Wiremock 
○ Hystrix 
○ Graphite 
○ Saboteur
Example: Movie player service
[Diagram: Shiny App instances, Play Movie, User Service, Device Service, Pin Service]
Testing microservices
You don’t know a service is fault tolerant if you don’t test faults
Isolated service tests
[Diagram: Shiny App, Acceptance Test, Play Movie, Mocks (User, Device, Pin service), Prime]
1 - Don’t take forever
● If at first you don’t succeed, don’t take forever to tell someone
● Timeout and fail fast
Which timeouts? 
● Socket connection timeout 
● Socket read timeout 
Your service hung for 30 seconds :( 
Customer 
You :(
Which timeouts? 
● Socket connection timeout 
● Socket read timeout 
● Resource acquisition 
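The first two timeouts in the list can be set on a plain `java.net.Socket`. The sketch below is illustrative only: the class name `SocketTimeouts`, the `demo` method and the timeout values are assumptions, not taken from the talk's examples. It connects to a local server socket that never replies and shows the read timeout firing.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

public class SocketTimeouts {
    // Connects with a connection timeout, then waits for data with a read timeout.
    // Returns true if the read timed out because the peer sent nothing.
    public static boolean readTimesOut(InetSocketAddress address,
                                       int connectTimeoutMs, int readTimeoutMs) {
        try (Socket socket = new Socket()) {
            socket.connect(address, connectTimeoutMs); // socket connection timeout
            socket.setSoTimeout(readTimeoutMs);        // socket read timeout, applied to each read
            try {
                socket.getInputStream().read();
                return false;
            } catch (SocketTimeoutException e) {
                return true;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Hypothetical demo: a local server socket that accepts connections but never replies.
    public static boolean demo() {
        try (ServerSocket silent = new ServerSocket(0, 1, InetAddress.getLoopbackAddress())) {
            InetSocketAddress address =
                    new InetSocketAddress(silent.getInetAddress(), silent.getLocalPort());
            return readTimesOut(address, 500, 200);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("read timed out: " + demo());
    }
}
```

Note that the read timeout applies to each blocking read, so a peer that trickles data can keep a request alive far longer than the configured value. Neither setting covers time spent waiting to acquire a thread or a pooled connection.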
Your service hung for 10 minutes :( 
Let’s think about this 
A little more detail 
Wiremock + Saboteur + Vagrant
● Vagrant - launches + provisions local VMs
● Saboteur - uses tc, iptables to simulate network issues
● Wiremock - used to mock HTTP dependencies
● Cucumber - acceptance tests
I can write an automated test for that?
[Diagram: Vagrant + VirtualBox VM running Wiremock (User Service, Device Service, Pin Service) and Saboteur; Play Movie Service; Acceptance Test: prime to drop traffic, reset]
Implementing reliable timeouts
● Homemade: Worker Queue + Thread pool (executor)
● Hystrix
● Spring Cloud Netflix
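The homemade option can be sketched with nothing but `java.util.concurrent`: run the call on a worker thread and stop waiting after a deadline. This is a minimal sketch rather than the talk's implementation; the class `TimeoutExample`, the helper `callWithTimeout` and the pool sizes are hypothetical.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class TimeoutExample {
    // Hypothetical helper: runs the call on a worker thread and gives up after timeoutMillis.
    public static String callWithTimeout(ExecutorService pool, Callable<String> call,
                                         long timeoutMillis, String fallback) {
        Future<String> future = pool.submit(call);
        try {
            return future.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            future.cancel(true); // interrupt the worker so it does not keep running
            return fallback;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return fallback;
        } catch (ExecutionException e) {
            return fallback; // the call itself failed
        }
    }

    public static void main(String[] args) {
        // Bounded pool: 2 threads, at most 10 queued calls (sizes are assumptions)
        ExecutorService pool = new ThreadPoolExecutor(2, 2, 0L, TimeUnit.MILLISECONDS,
                new ArrayBlockingQueue<>(10));
        try {
            System.out.println(callWithTimeout(pool, () -> "Scary String", 500, "timed out"));
            System.out.println(callWithTimeout(pool, () -> {
                Thread.sleep(10_000); // a dependency that hangs
                return "Really slow scary string";
            }, 200, "timed out"));
        } finally {
            pool.shutdownNow();
        }
    }
}
```

The caller's wait is capped by `Future.get`, whatever the dependency is doing, which is the property a socket-level timeout alone does not give.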
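A simple Spring RestController
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RestController;

@RestController
public class Resource {
    private static final Logger LOGGER = LoggerFactory.getLogger(Resource.class);

    @Autowired
    private ScaryDependency scaryDependency;

    @RequestMapping("/scary")
    public String callTheScaryDependency() {
        LOGGER.info("RestController: I wonder which thread I am on!");
        return scaryDependency.getScaryString();
    }
}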
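Scary dependency
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.stereotype.Component;

@Component
public class ScaryDependency {
    private static final Logger LOGGER = LoggerFactory.getLogger(ScaryDependency.class);

    public String getScaryString() {
        LOGGER.info("Scary dependency: I wonder which thread I am on!");
        if (System.currentTimeMillis() % 2 == 0) {
            return "Scary String";
        }
        try {
            Thread.sleep(10000); // simulate a dependency that hangs for 10 seconds
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // Thread.sleep throws a checked exception
        }
        return "Really slow scary string";
    }
}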
All on the tomcat thread
13:07:32.814 [http-nio-8080-exec-1] INFO info.batey.examples.Resource - RestController: I wonder which thread I am on!
13:07:32.896 [http-nio-8080-exec-1] INFO info.batey.examples.ScaryDependency - Scary dependency: I wonder which thread I am on!
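Seriously this simple now?
import com.netflix.hystrix.contrib.javanica.annotation.HystrixCommand;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.stereotype.Component;

@Component
public class ScaryDependency {
    private static final Logger LOGGER = LoggerFactory.getLogger(ScaryDependency.class);

    @HystrixCommand // the only change: the call now runs as a Hystrix command
    public String getScaryString() {
        LOGGER.info("Scary dependency: I wonder which thread I am on!");
        if (System.currentTimeMillis() % 2 == 0) {
            return "Scary String";
        }
        try {
            Thread.sleep(10000); // simulate a dependency that hangs for 10 seconds
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // Thread.sleep throws a checked exception
        }
        return "Really slow scary string";
    }
}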
What an annotation can do...
13:07:32.814 [http-nio-8080-exec-1] INFO info.batey.examples.Resource - RestController: I wonder which thread I am on!
13:07:32.896 [hystrix-ScaryDependency-1] INFO info.batey.examples.ScaryDependency - Scary dependency: I wonder which thread I am on!
Timeouts take home
● You can’t use network level timeouts for SLAs
● Test your SLAs - if someone says you can’t, hit them with a stick
● Scary things happen without network issues
2 - Don’t try if you can’t succeed 
Complexity
● When an application grows in complexity it will eventually start sending emails
Complexity
● When an application grows in complexity it will eventually contain queues and thread pools
Don’t try if you can’t succeed
● Executors with unbounded queues :(
○ newFixedThreadPool
○ newSingleThreadExecutor
○ newCachedThreadPool (unbounded threads rather than an unbounded queue)
● Bound your queues and threads
● Fail quickly when the queue / maxPoolSize is met
● Know your drivers
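To make the unbounded queue point concrete, the sketch below inspects the queue behind `Executors.newFixedThreadPool` and contrasts it with a pool built with explicit limits. The class `QueueBounds` and its method names are hypothetical, and the cast to `ThreadPoolExecutor` relies on how OpenJDK implements `newFixedThreadPool`, which is an assumption rather than a documented guarantee.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class QueueBounds {
    // Remaining capacity of the work queue behind Executors.newFixedThreadPool.
    // Assumes the OpenJDK implementation, which returns a ThreadPoolExecutor.
    public static int fixedPoolQueueCapacity() {
        ThreadPoolExecutor pool = (ThreadPoolExecutor) Executors.newFixedThreadPool(1);
        try {
            return pool.getQueue().remainingCapacity();
        } finally {
            pool.shutdown();
        }
    }

    // A pool with explicit limits: at most maxThreads workers and queueSize waiting tasks.
    public static ThreadPoolExecutor boundedPool(int maxThreads, int queueSize) {
        return new ThreadPoolExecutor(maxThreads, maxThreads, 0L, TimeUnit.MILLISECONDS,
                new ArrayBlockingQueue<>(queueSize),
                new ThreadPoolExecutor.AbortPolicy()); // reject instead of queueing forever
    }

    public static void main(String[] args) {
        // On OpenJDK this is Integer.MAX_VALUE: effectively unbounded
        System.out.println("fixed pool queue capacity: " + fixedPoolQueueCapacity());
        ThreadPoolExecutor bounded = boundedPool(4, 10);
        System.out.println("bounded pool queue capacity: " + bounded.getQueue().remainingCapacity());
        bounded.shutdown();
    }
}
```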
This is a functional requirement
● Set the timeout very high
● Use wiremock to add a large delay to the requests
● Set queue size and thread pool size to 1
● Send in 2 requests to use the thread and fill the queue
● What happens on the 3rd request?
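The steps above can be reproduced in miniature without any HTTP at all. In this sketch (the class `ThirdRequest` is hypothetical, and a latch stands in for the delayed Wiremock response) the pool has one thread and a queue of one, so the third submission should be rejected immediately.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class ThirdRequest {
    // Returns true if the third request is rejected rather than queued.
    public static boolean thirdRequestRejected() {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(1, 1, 0L, TimeUnit.MILLISECONDS,
                new ArrayBlockingQueue<>(1), new ThreadPoolExecutor.AbortPolicy());
        CountDownLatch release = new CountDownLatch(1);
        // Stands in for a call to a dependency with a large delay
        Runnable slowCall = () -> {
            try {
                release.await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        };
        try {
            pool.execute(slowCall); // request 1 occupies the only thread
            pool.execute(slowCall); // request 2 fills the queue
            try {
                pool.execute(slowCall); // request 3 has nowhere to go
                return false;
            } catch (RejectedExecutionException e) {
                return true; // failed fast instead of waiting
            }
        } finally {
            release.countDown(); // let the blocked calls finish
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println("third request rejected: " + thirdRequestRejected());
    }
}
```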
3 - Fail gracefully 
Expect rubbish 
● Expect invalid HTTP 
● Expect malformed response bodies 
● Expect connection failures 
● Expect huge / tiny responses 
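One way to act on this list is to parse every response body defensively rather than trusting it. The sketch below assumes a hypothetical dependency whose body should be the text `true` or `false`; the class `PinResponseParser` and the length limit are illustrative assumptions.

```java
import java.util.Optional;

public class PinResponseParser {
    // Assumed upper bound on a sane body for this dependency
    static final int MAX_BODY_LENGTH = 16;

    // Returns the parsed value, or empty if the body is missing, oversized or malformed.
    public static Optional<Boolean> parse(String body) {
        if (body == null) {
            return Optional.empty(); // connection failure or empty response
        }
        if (body.length() > MAX_BODY_LENGTH) {
            return Optional.empty(); // huge response
        }
        String trimmed = body.trim();
        if (trimmed.equalsIgnoreCase("true")) {
            return Optional.of(true);
        }
        if (trimmed.equalsIgnoreCase("false")) {
            return Optional.of(false);
        }
        return Optional.empty(); // rubbish
    }

    public static void main(String[] args) {
        System.out.println(parse("true"));
        System.out.println(parse("<html>502 Bad Gateway</html>"));
    }
}
```

Returning an empty `Optional` forces the caller to decide what a bad response means. By contrast, `Boolean.valueOf` quietly turns any unrecognised text into `false`.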
Testing with Wiremock
stubFor(get(urlEqualTo("/dependencyPath"))
        .willReturn(aResponse()
                .withFault(Fault.MALFORMED_RESPONSE_CHUNK)));

{
    "request": {
        "method": "GET",
        "url": "/fault"
    },
    "response": {
        "fault": "RANDOM_DATA_THEN_CLOSE"
    }
}

{
    "request": {
        "method": "GET",
        "url": "/fault"
    },
    "response": {
        "fault": "EMPTY_RESPONSE"
    }
}
4 - Know if it’s your fault 
What to record
● Metrics: Timings, errors, concurrent incoming requests, thread pool statistics, connection pool statistics
● Logging: Boundary logging, elasticsearch / logstash
● Request identifiers
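As a rough illustration of the metrics in the first bullet, the sketch below wraps a call and records timings, errors and the number of calls in flight. It is a hand-rolled stand-in, not the Codahale metrics API; the class `CallMetrics` and its methods are hypothetical.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Supplier;

public class CallMetrics {
    private final AtomicLong calls = new AtomicLong();
    private final AtomicLong errors = new AtomicLong();
    private final AtomicLong totalNanos = new AtomicLong();
    private final AtomicInteger inFlight = new AtomicInteger();

    // Runs the call and records its duration, outcome and concurrency.
    public <T> T record(Supplier<T> call) {
        inFlight.incrementAndGet();
        long start = System.nanoTime();
        try {
            return call.get();
        } catch (RuntimeException e) {
            errors.incrementAndGet();
            throw e; // metrics must not swallow the failure
        } finally {
            totalNanos.addAndGet(System.nanoTime() - start);
            calls.incrementAndGet();
            inFlight.decrementAndGet();
        }
    }

    public long calls() { return calls.get(); }
    public long errors() { return errors.get(); }
    public int inFlight() { return inFlight.get(); }

    // Mean duration in milliseconds across all recorded calls
    public double meanMillis() {
        long n = calls.get();
        return n == 0 ? 0.0 : totalNanos.get() / 1_000_000.0 / n;
    }
}
```

A mean hides slow outliers, so a real setup would more likely report percentiles from a metrics library.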
Graphite + Codahale 
[Graph: Response times]
Separate resource pools
● Don’t flood your dependencies
● Be able to answer the questions:
○ How many connections will you make to dependency X?
○ Are you getting close to your max connections?
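A per-dependency semaphore is one simple way to be able to answer both questions. This is a minimal sketch under assumptions: the class `DependencyLimiter` is hypothetical, and a real service would more likely rely on a bounded connection pool or on Hystrix's own isolation.

```java
import java.util.concurrent.Semaphore;
import java.util.function.Supplier;

public class DependencyLimiter {
    private final Semaphore permits;
    private final int maxConcurrent;

    // One limiter per dependency, so a slow dependency X cannot use up capacity meant for Y
    public DependencyLimiter(int maxConcurrent) {
        this.maxConcurrent = maxConcurrent;
        this.permits = new Semaphore(maxConcurrent);
    }

    // Runs the call if a permit is free, otherwise fails immediately.
    public <T> T call(Supplier<T> work) {
        if (!permits.tryAcquire()) {
            throw new IllegalStateException("dependency at max concurrency");
        }
        try {
            return work.get();
        } finally {
            permits.release();
        }
    }

    // How many calls are in progress right now
    public int inFlight() {
        return maxConcurrent - permits.availablePermits();
    }

    // How close we are to the limit, worth publishing as a metric
    public int max() {
        return maxConcurrent;
    }
}
```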
So easy with Dropwizard + Hystrix
@Override
public void initialize(Bootstrap<AppConfig> appConfigBootstrap) {
    // Publish Hystrix command metrics through the application's Codahale registry
    HystrixCodaHaleMetricsPublisher metricsPublisher =
            new HystrixCodaHaleMetricsPublisher(appConfigBootstrap.getMetricRegistry());
    HystrixPlugins.getInstance().registerMetricsPublisher(metricsPublisher);
}

metrics:
  reporters:
    - type: graphite
      host: 192.168.10.120
      port: 2003
      prefix: shiny_app
5 - Don’t whack a dead horse
[Diagram: Shiny App instances, Play Movie, User Service, Device Service, Pin Service]
What to do.. 
● Yes this will happen.. 
● Mandatory dependency - fail *really* fast 
● Throttling 
● Fallbacks 
Circuit breaker pattern 
Implementation with Hystrix
@GET
@Timed
public String integrate() {
    LOGGER.info("I best do some integration!");
    // Each dependency call is wrapped in its own Hystrix command
    String user = new UserServiceDependency(userService).execute();
    String device = new DeviceServiceDependency(deviceService).execute();
    Boolean pinCheck = new PinCheckDependency(pinService).execute();
    return String.format("[User info: %s] \n[Device info: %s] \n[Pin check: %s] \n", user, device, pinCheck);
}
Implementation with Hystrix
public class PinCheckDependency extends HystrixCommand<Boolean> {
    // Constructor and httpClient field are not shown on the slide

    @Override
    protected Boolean run() throws Exception {
        HttpGet pinCheck = new HttpGet("http://localhost:9090/pincheck");
        HttpResponse pinCheckResponse = httpClient.execute(pinCheck);
        String pinCheckInfo = EntityUtils.toString(pinCheckResponse.getEntity());
        return Boolean.valueOf(pinCheckInfo);
    }

    // Returned when run() fails, times out or the circuit is open
    @Override
    public Boolean getFallback() {
        return true;
    }
}
Triggering the fallback 
● Error threshold percentage 
● Bucket of time for the percentage 
● Minimum number of requests to trigger 
● Time before trying a request again 
● Disable 
● Per instance statistics 
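The first four settings can be seen working together in a toy circuit breaker. This is a simplified sketch and not how Hystrix is implemented: the class `SimpleCircuitBreaker`, its parameter names and the injected clock are all hypothetical, and it uses a single fixed bucket rather than a rolling window.

```java
import java.util.function.LongSupplier;
import java.util.function.Supplier;

public class SimpleCircuitBreaker {
    private final int errorThresholdPercent; // error threshold percentage
    private final int minimumRequests;       // minimum number of requests to trigger
    private final long bucketMillis;         // bucket of time for the percentage
    private final long sleepWindowMillis;    // time before trying a request again
    private final LongSupplier clock;        // injected so tests can control time

    private long bucketStart;
    private int requests;
    private int failures;
    private long openedAt = -1; // -1 means the circuit is closed

    public SimpleCircuitBreaker(int errorThresholdPercent, int minimumRequests,
                                long bucketMillis, long sleepWindowMillis, LongSupplier clock) {
        this.errorThresholdPercent = errorThresholdPercent;
        this.minimumRequests = minimumRequests;
        this.bucketMillis = bucketMillis;
        this.sleepWindowMillis = sleepWindowMillis;
        this.clock = clock;
        this.bucketStart = clock.getAsLong();
    }

    // synchronized for simplicity: this serialises calls, which a real implementation would avoid
    public synchronized <T> T call(Supplier<T> primary, Supplier<T> fallback) {
        long now = clock.getAsLong();
        boolean wasOpen = openedAt >= 0;
        if (wasOpen && now - openedAt < sleepWindowMillis) {
            return fallback.get(); // open: do not touch the dependency at all
        }
        if (now - bucketStart >= bucketMillis) { // start a fresh bucket
            bucketStart = now;
            requests = 0;
            failures = 0;
        }
        requests++;
        try {
            T result = primary.get();
            if (wasOpen) { // the trial request succeeded: close and start counting afresh
                openedAt = -1;
                requests = 0;
                failures = 0;
                bucketStart = now;
            }
            return result;
        } catch (RuntimeException e) {
            failures++;
            boolean overThreshold = requests >= minimumRequests
                    && failures * 100 >= errorThresholdPercent * requests;
            if (wasOpen || overThreshold) {
                openedAt = now; // open, or re-open after a failed trial request
            }
            return fallback.get();
        }
    }

    public synchronized boolean isOpen() {
        return openedAt >= 0 && clock.getAsLong() - openedAt < sleepWindowMillis;
    }
}
```

The statistics live in one breaker object, so each service instance makes its own decision from its own traffic. The "Disable" setting is left out of this sketch.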
6 - Turn off broken stuff 
● The kill switch 
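A kill switch can be as small as a flag checked before each call. The sketch below assumes a hypothetical `KillSwitch` class; how the flag gets flipped at runtime (admin endpoint, config reload) is left open.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.Supplier;

public class KillSwitch {
    private final AtomicBoolean enabled = new AtomicBoolean(true);

    // Turn the broken feature off without a redeploy
    public void disable() {
        enabled.set(false);
    }

    public void enable() {
        enabled.set(true);
    }

    // Runs the feature when enabled, otherwise returns the agreed default.
    public <T> T call(Supplier<T> feature, T whenDisabled) {
        return enabled.get() ? feature.get() : whenDisabled;
    }
}
```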
To recap 
1. Don’t take forever - Timeouts 
2. Don’t try if you can’t succeed 
3. Fail gracefully 
4. Know if it’s your fault 
5. Don’t whack a dead horse 
6. Turn broken stuff off 
Links
● Examples:
○ https://github.com/chbatey/spring-cloud-example
○ https://github.com/chbatey/dropwizard-hystrix
○ https://github.com/chbatey/vagrant-wiremock-saboteur
● Tech:
○ https://github.com/Netflix/Hystrix
○ https://www.vagrantup.com/
○ http://wiremock.org/
○ https://github.com/tomakehurst/saboteur
Questions?
● Thanks for listening!
● http://christopher-batey.blogspot.co.uk/
Developer takeaways 
● Learn about TCP 
● Love vagrant, docker etc to enable testing 
● Don’t trust libraries 
Hystrix cost - do this yourself 
Hystrix metrics
● Failure count
● Percentiles from Hystrix point of view
● Error percentages
How to test metric publishing?
● Stub out graphite and verify calls?
● Programmatically call graphite and verify numbers?
● Make metrics + logs part of the story demo

Fault tolerant microservices - LJC Skills Matter 4thNov2014
