Stress Test & Chaos Engineering
Diego Pacheco
What every engineer and manager thinks about:
❏ Functional requirements (FRs) ~ features
❏ Time / productivity
❏ Business logic that works
❏ But is that all...?
Scalability + Availability + Reliability
❏ As the business grows, will the code keep working?
❏ Will the user experience stay the same, or will it get slower?
❏ It may be fine for most users (p50) while a few users have a really bad experience (p99.9 and p99.99).
❏ Does the user trust the system? Not thinking about these three disciplines can destroy your brand really fast.
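To make the p50 vs. p99.9 point concrete, here is a minimal Scala sketch that computes nearest-rank percentiles over a made-up set of latencies. Nearest-rank is only one percentile definition; monitoring tools and Gatling may interpolate differently, so treat the numbers as illustrative.

```scala
object Percentiles {
  /** Nearest-rank percentile: the smallest sample such that at least p% of
    * all samples are less than or equal to it. */
  def percentile(samples: Seq[Double], p: Double): Double = {
    require(samples.nonEmpty, "need at least one sample")
    require(p > 0 && p <= 100, "p must be in (0, 100]")
    val sorted = samples.sorted
    val rank = math.ceil(p / 100.0 * sorted.size).toInt // 1-based rank
    sorted(rank - 1)
  }

  def main(args: Array[String]): Unit = {
    // Hypothetical latencies in ms: 995 fast requests and 5 very slow ones.
    val latencies = Seq.fill(995)(20.0) ++ Seq.fill(5)(2000.0)
    for (p <- Seq(50.0, 99.0, 99.9))
      println(f"p$p%.1f = ${percentile(latencies, p)}%.0f ms")
  }
}
```

With 995 requests at 20 ms and 5 at 2000 ms, p50 and p99 both report 20 ms while p99.9 reports 2000 ms: the median hides the users stuck at the tail.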
The 8 Fallacies of Distributed Computing and Actions to Take...
The Rise and Fall of fallbacks
❏ Hystrix
❏ Spring Cloud -> Resilience4J
❏ Fallback issues:
❏ Hard to test
❏ Fallbacks fail too
❏ Lack of continuous testing
❏ Fallbacks can make an outage even worse
❏ Amazon philosophy -> focus on making the code itself more resilient.
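The testing problem with fallbacks is that the fallback path only runs when the primary fails, which is rare, so it can rot unnoticed. A minimal Scala sketch, using a hypothetical `WithFallback` wrapper (not the Hystrix or Resilience4j API), shows one way to force that path in tests by injecting a failure. It also shows that a fallback which itself throws simply propagates the error to the caller.

```scala
import scala.util.{Failure, Success, Try}

/** Hypothetical wrapper (not a real library API): run `primary`, and if it
  * fails run `fallback`. `injectFailure` lets a test force the fallback path. */
final class WithFallback[A](primary: () => A, fallback: () => A) {
  var injectFailure: Boolean = false // flip on in tests or chaos experiments
  var fallbackCalls: Int = 0         // single-threaded sketch: no synchronization

  def call(): A = {
    val attempt: Try[A] =
      if (injectFailure) Failure(new RuntimeException("injected failure"))
      else Try(primary())
    attempt match {
      case Success(value) => value
      case Failure(_) =>
        fallbackCalls += 1
        fallback() // if the fallback itself throws, the error propagates
    }
  }
}
```

Running such a forced-failure test on every build is one way to address the "lack of continuous testing" point; it does not remove the extra code path, which is why the Amazon approach prefers hardening the primary path instead.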
Erlang | Akka | Amazon Philosophy
How to Do Proper Stress / Load Testing?
❏ Have a plan
❏ Which service to test? Why?
❏ Select endpoints to test (don't test them all)
❏ Have expectations in terms of latency and requests to handle
❏ Know where your service breaks, and figure out why.
❏ Test using batteries: 1, 5, 10, 50, 100, 1k, 2k, 5k, 10k, 50k, 100k, 1M, 100M...
❏ You must have observability. A dedicated environment is a must as well.
❏ Understand your metrics (which ones matter for each service)
❏ Automate stress tests in your build pipeline
❏ Have a platform: it could be a Jenkins job + scripts.
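The batteries idea can be sketched as a loop that steps through increasing load levels and stops at the first one that violates the expectations. In this Scala sketch, `runBattery` is a hypothetical stand-in for whatever actually generates load and collects metrics (for example a Gatling run plus your observability stack), and the thresholds are example values, not recommendations.

```scala
object Batteries {
  /** Metrics for one battery (load level); a hypothetical shape for this sketch. */
  final case class Result(users: Int, p99Ms: Double, errorRate: Double)

  /** Load levels from the slide, up to 100k. */
  val DefaultLevels: List[Int] =
    List(1, 5, 10, 50, 100, 1000, 2000, 5000, 10000, 50000, 100000)

  /** Runs batteries in increasing order and returns the first result that
    * violates the expectations, or None if every level passed.
    * `runBattery` is a stand-in for the real load generator. */
  def findBreakingPoint(levels: Seq[Int], maxP99Ms: Double, maxErrorRate: Double)(
      runBattery: Int => Result): Option[Result] =
    levels.iterator
      .map(runBattery) // lazy: a level only runs if all previous ones passed
      .find(r => r.p99Ms > maxP99Ms || r.errorRate > maxErrorRate)
}
```

The returned level is where the "figure out why" investigation starts; stopping early also avoids hammering a shared environment with loads the service has already shown it cannot handle.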
Stress / Load Testing with Gatling
https://gist.github.com/diegopacheco/faf7ceb2496e4ebdaded
docker run diegopacheco/time-microservice
./gradlew gatlingRun-com.github.diegopacheco.gatling.microservices.st.StressTest \
  -DGATLING_URL="http://172.17.0.2:8080"
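The command above runs a Gatling simulation class; the author's actual test is in the gist linked above. As a hedged sketch of what such a Scala simulation can look like with the Gatling 3.x DSL, the class below reads the target from the `GATLING_URL` system property (assuming the Gradle build forwards `-D` properties to the Gatling JVM), hits a hypothetical `/` endpoint, and ramps the load. The class name, endpoint, injection profile, and thresholds are all assumptions.

```scala
import io.gatling.core.Predef._
import io.gatling.http.Predef._
import scala.concurrent.duration._

class StressTestSketch extends Simulation {
  // Target URL from -DGATLING_URL; the localhost default is an assumption.
  private val baseUrl = System.getProperty("GATLING_URL", "http://localhost:8080")

  private val httpProtocol = http.baseUrl(baseUrl)

  // Hypothetical endpoint "/"; replace it with the endpoint you chose to test.
  private val scn = scenario("time-microservice stress")
    .exec(http("get time").get("/").check(status.is(200)))

  setUp(
    scn.inject(
      atOnceUsers(1),                              // smoke check
      rampUsers(100).during(30.seconds),           // ramp up
      constantUsersPerSec(50).during(60.seconds)   // sustained load
    )
  ).protocols(httpProtocol)
    .assertions(
      global.responseTime.percentile4.lt(500),     // p99 with default gatling.conf
      global.successfulRequests.percent.gt(99.0)
    )
}
```

`percentile4` is the fourth percentile configured in `gatling.conf`, which is the 99th by default. When an assertion is not met, Gatling reports the run as failed, which is what lets a build pipeline gate on the stress test.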
https://gatling.io/docs/current/cheat-sheet/
Chaos
❏ Test your infrastructure
❏ Are all ASGs (Auto Scaling Groups) in place?
❏ Does failover to another instance, AZ, or region work?
❏ Test your clusters:
❏ SQL | NoSQL | NewSQL
❏ Test your microservices' downstream dependencies:
❏ Timeouts
❏ Retries | Exponential backoff + jitter
❏ Chaos inside a box:
❏ Disk, CPU, memory, metadata...
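A minimal Scala sketch of retries with exponential backoff and jitter, using the "full jitter" variant (sleep a random time between 0 and min(cap, base × 2^attempt)) described in the AWS Architecture Blog post on backoff and jitter. The `sleep` and `Random` parameters are injectable so the retry logic itself can be tested; chaos experiments against downstream dependencies then check that it behaves as intended under real failures. Per-attempt timeouts are not shown and would still be needed on each call.

```scala
import scala.util.{Random, Try}

object Backoff {
  /** Full jitter: a random delay in [0, min(capMs, baseMs * 2^attempt)). */
  def fullJitterDelayMs(attempt: Int, baseMs: Long, capMs: Long, rnd: Random): Long = {
    val ceiling = math.min(capMs.toDouble, baseMs.toDouble * math.pow(2, attempt)).toLong
    if (ceiling <= 0) 0L else (rnd.nextDouble() * ceiling).toLong
  }

  /** Calls `op` up to `maxAttempts` times, sleeping with full jitter between
    * failed attempts. `sleep` and `rnd` are parameters so tests can stub them. */
  def retry[A](
      maxAttempts: Int,
      baseMs: Long,
      capMs: Long,
      rnd: Random = new Random(),
      sleep: Long => Unit = ms => Thread.sleep(ms)
  )(op: () => A): Try[A] = {
    require(maxAttempts >= 1, "maxAttempts must be at least 1")
    var attempt = 0
    var result = Try(op())
    while (result.isFailure && attempt + 1 < maxAttempts) {
      sleep(fullJitterDelayMs(attempt, baseMs, capMs, rnd))
      attempt += 1
      result = Try(op())
    }
    result
  }
}
```

The jitter spreads retries from many clients over time instead of having them hit a recovering dependency in synchronized waves. This sketch retries every failure; real code would usually retry only errors known to be transient.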
https://github.com/asobti/kube-monkey
https://docs.chaostoolkit.org/
https://github.com/pingcap/chaos-mesh
Exercises
Constraints
1. Stress tests need to be written in Scala.
2. You need to use Gatling.
1. Using the previous exercises or the time-microservice image, make the application run in Kubernetes.
2. Create a stress test with Gatling.
3. Create chaos in Kubernetes by killing pods, and make sure the app still works and the Gatling tests don't fail.
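For exercise 3, one way to check that the app survives pod kills is to keep a steady load running long enough to kill pods during the run (for example with kube-monkey or `kubectl delete pod`) and assert that no request failed. A hedged Gatling 3.x sketch follows; the class name, `/` endpoint, rate, and duration are assumptions. An assertion of zero failures will likely only pass if the Deployment runs more than one replica behind a Service, which is exactly what the exercise is probing.

```scala
import io.gatling.core.Predef._
import io.gatling.http.Predef._
import scala.concurrent.duration._

class ChaosSoakSketch extends Simulation {
  // Target URL from -DGATLING_URL; the localhost default is an assumption.
  private val baseUrl = System.getProperty("GATLING_URL", "http://localhost:8080")
  private val httpProtocol = http.baseUrl(baseUrl)

  // Hypothetical endpoint "/" on the time-microservice.
  private val scn = scenario("steady traffic during pod kills")
    .exec(http("get time").get("/").check(status.is(200)))

  // Steady open-model load, long enough to kill pods while it runs.
  setUp(scn.inject(constantUsersPerSec(20).during(5.minutes)))
    .protocols(httpProtocol)
    .assertions(global.failedRequests.count.is(0L)) // any failed request fails the run
}
```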
