CHAOS ENGINEERING – OR LET'S SHAKE THE TREE
Jimmy Dahlqvist, Knowit
" Failures are a given and
everything will eventually
fail over time "
Werner Vogels
CTO – Amazon.com
TRIBUTE
• Nora Jones
• Adrian Cockcroft
• Adrian Hornsby
History and background
2004 Amazon – Jesse Robbins, "Master of Disaster"
2010 Netflix – Greg Orzell. Chaos Monkey
2012 Netflix – Open-sources the Simian Army
2016 Gremlin Inc. is founded
2017 Netflix – Chaos engineering book. Chaos toolset.
2018 The concept spreads. ChaosConf starts.
" By running experiments on a
regular basis that simulate a
Regional outage, we were able to
identify any systemic weaknesses
early and fix them. "
Netflix Blog
Chaos Engineering is the discipline of
experimenting on a system
in order to build confidence in the system’s
capability to withstand turbulent conditions
in production.
Unit testing
[Diagram: Input -> Component X -> Output]
Integration testing
[Diagram: Input -> Component A -> Output / Input -> Component B -> Output]
Distributed system
[Diagram: Input -> Distributed System -> Output]
[Diagram: Input -> Distributed System -> Output. Corrupt?]
" Chaos doesn't cause problems.
It reveals them. "
Nora Jones
Slack - Head of chaos engineering
and human factors
Before practicing chaos
Socialize
Start small
Use an opt-in model, not an opt-out model.
Only include services that want to be "chaosed".
Start with a success!
Don't start in production.
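The opt-in idea can be sketched in a few lines of Python. This is only an illustration: the Service class, the chaos_opt_in flag and the environment names are assumptions made for this example, not part of any real chaos tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Service:
    """Hypothetical service record used only for this sketch."""
    name: str
    environment: str            # assumed values: "staging" or "production"
    chaos_opt_in: bool = False  # opt-in: nothing is targeted by default


def eligible_targets(services, allowed_environments=frozenset({"staging"})):
    """Return names of services that opted in AND run in an allowed environment."""
    return [
        s.name
        for s in services
        if s.chaos_opt_in and s.environment in allowed_environments
    ]


services = [
    Service("checkout", "production", chaos_opt_in=True),   # opted in, but production
    Service("search", "staging", chaos_opt_in=True),        # opted in, staging
    Service("billing", "staging"),                          # never opted in
]
print(eligible_targets(services))
```

Because the flag defaults to False, a service that nobody has discussed with its owners is never picked, which is the difference between opt-in and opt-out.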
Steady state
Define the steady state. Build a hypothesis about the
steady state: what does our system look like when it's
behaving normally?
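A minimal sketch of a steady-state hypothesis, assuming the steady state can be expressed as one metric that stays close to a known baseline. The metric name, baseline and tolerance below are made-up values for illustration, not numbers from the talk.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class SteadyState:
    """Hypothesis: the metric stays within `tolerance` of `baseline`."""
    metric: str       # hypothetical metric name
    baseline: float   # value observed when the system behaves normally
    tolerance: float  # allowed relative deviation, e.g. 0.05 means 5 %

    def holds(self, samples: list[float]) -> bool:
        """Return True if the mean of the samples is within tolerance."""
        if not samples:
            return False  # no data: the hypothesis cannot be confirmed
        deviation = abs(mean(samples) - self.baseline) / self.baseline
        return deviation <= self.tolerance


hypothesis = SteadyState(metric="orders_per_second", baseline=200.0, tolerance=0.05)
print(hypothesis.holds([198.0, 203.0, 201.0]))  # samples close to the baseline
print(hypothesis.holds([150.0, 140.0, 160.0]))  # samples far below the baseline
```

Checking the hypothesis before the experiment as well as during it helps separate "the system was already unhealthy" from "the experiment revealed a weakness".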
Monitoring
Understand your key business metrics and KPIs.
Netflix's key business metric is SPS (stream starts per second).
First experiment
Graceful restarts and degradations
Design your next experiments
" You have to know the past to
understand the present. "
Carl Sagan
Move to production
Don't forget about your customers!
Don't destroy the customer experience!
Make sure you can abort!
Only run during business hours.
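A sketch of what "make sure you can abort" might look like in code, under stated assumptions: the fault is injected and removed through two callbacks, health is judged from a stream of metric readings, and business hours are taken to be Monday to Friday, 09:00 to 17:00. All function names, the threshold and the hours are hypothetical.

```python
from datetime import datetime
from typing import Callable, Iterable


def within_business_hours(now: datetime) -> bool:
    """Assumed office hours: Monday to Friday, 09:00 to 17:00 local time."""
    return now.weekday() < 5 and 9 <= now.hour < 17


def run_experiment(
    inject: Callable[[], object],
    rollback: Callable[[], object],
    readings: Iterable[float],
    is_healthy: Callable[[float], bool],
    now: datetime,
) -> str:
    """Inject a fault, watch the metric, abort at the first unhealthy reading."""
    if not within_business_hours(now):
        return "skipped"  # nobody is around to react, so do not start
    inject()
    try:
        for value in readings:
            if not is_healthy(value):
                return "aborted"
        return "completed"
    finally:
        rollback()  # always remove the fault: on abort, on completion, on error


state = {"fault_active": False}
result = run_experiment(
    inject=lambda: state.update(fault_active=True),
    rollback=lambda: state.update(fault_active=False),
    readings=[100.0, 98.0, 60.0, 99.0],   # third reading drops below the threshold
    is_healthy=lambda v: v >= 90.0,
    now=datetime(2024, 1, 8, 10, 0),      # a Monday morning
)
print(result, state)
```

Putting the rollback in a finally block is one way to make the abort path the same code path as normal completion, so it gets exercised on every run instead of only in an emergency.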
Automate everything
Run often
Automatic safeguards
Percentage of traffic
Netflix Chaos Automation Platform (ChAP)
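The "percentage of traffic" and "automatic safeguards" points can be illustrated with a small sketch. This is not how ChAP is implemented; it is a generic example under the assumption that each request has an ID and that success or failure of requests in the experiment group can be recorded. The names in_experiment and Safeguard, and all thresholds, are invented for this example.

```python
import hashlib


def in_experiment(request_id: str, percent: float) -> bool:
    """Deterministically place roughly `percent` % of requests in the experiment."""
    digest = hashlib.sha256(request_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:4], "big") % 10_000  # bucket in 0..9999
    return bucket < percent * 100


class Safeguard:
    """Trips once the error rate in the experiment group exceeds a limit."""

    def __init__(self, max_error_rate: float, min_requests: int = 20) -> None:
        self.max_error_rate = max_error_rate
        self.min_requests = min_requests  # avoid tripping on a tiny sample
        self.requests = 0
        self.errors = 0
        self.tripped = False

    def record(self, ok: bool) -> None:
        """Record one request outcome and trip if the limit is exceeded."""
        self.requests += 1
        if not ok:
            self.errors += 1
        if (self.requests >= self.min_requests
                and self.errors / self.requests > self.max_error_rate):
            self.tripped = True


# Roughly 1 % of requests should end up in the experiment group.
selected = sum(in_experiment(f"req-{i}", percent=1.0) for i in range(100_000))
print(selected)

guard = Safeguard(max_error_rate=0.10)
for i in range(20):
    guard.record(ok=(i % 4 != 0))  # every fourth request fails
print(guard.tripped)
```

Hashing the request ID keeps the assignment stable, so the same request always lands in the same group, and a tripped safeguard can be used as the signal to stop the experiment without a human in the loop.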
Change of mindset
From "What happens IF this fails?"
to "What happens WHEN this fails?"
Lesson learnt
Takeaways
Everyone can be doing Chaos Engineering
Chaos Engineering is a learning opportunity
Be conscious of your customers, and involve the business
" Chaos doesn't cause problems.
It reveals them. "
Nora Jones
Slack - Head of chaos engineering
and human factors
Thanks!


Editor's Notes

  • #34: And do run it during business hours, when everyone is at the office. Monitor! Monitor! Monitor! At the first sign of a problem, abort the experiment. So make sure you can abort the experiment, and make sure you have validated that you can: if you hit the abort button, it must actually abort and not just keep running!