Chaos Engineering when
you’re not Netflix
Martez Reed
@greenreedtech
Indy DevOps Meetup
Introduction
Principal Training Solutions Engineer @ Puppet
Martez Reed
Chaos engineering when you’re not Netflix | Martez Reed
Linkedin: martezreed
Twitter: @greenreedtech
Github: martezr
What is chaos engineering?
Overview
“Chaos Engineering is the discipline of
experimenting on a system in order to build
confidence in the system’s capability to
withstand turbulent conditions in production.”
Chaos Engineering
https://principlesofchaos.org/
https://youtu.be/CZ3wIuvmHeM
Mastering Chaos - A Netflix guide to
microservices by Josh Evans
Another type of test
for validating assumptions
Why does Netflix use chaos engineering?
Details
• Online streaming video service
• 8,600+ employees
• $20 billion in revenue (2019)
Netflix
Why does Netflix use chaos engineering?
Details
• Invented Chaos Monkey in 2011
• Simian Army
• Latency Monkey
• Conformity Monkey
• Doctor Monkey
• Janitor Monkey
• Security Monkey
• Chaos Gorilla
Netflix Chaos Engineering
Why does Netflix use chaos engineering?
Details
• 1000+ microservices
• 20+ open source projects
• Spinnaker
• Zuul
• Simian Army
Netflix Technology
Not Netflix?
Things we assume about our architecture
Common Assumptions
• If an ESXi host fails, the
workloads will migrate to
another host
• If the primary firewall fails, traffic
will cut over to the secondary
• If the instance’s CPU is at 95%,
another one will be added to the
pool
• If a service stops, we’ll receive a
notification
Architectural Assumptions
What does chaos engineering look like?
Experiments
• Randomly restart an ESXi host
• Randomly restart the primary firewall
• Increase CPU usage on an instance
• Randomly restart a service
Chaos Engineering
Common Assumptions
• If an ESXi host fails, the
workloads will migrate to
another host
• If the primary firewall fails, traffic
will cut over to the secondary
• If the instance’s CPU is at 95%,
another one will be added to the
pool
• If a service stops, we’ll receive a
notification
What is the purpose of chaos engineering?
Challenge our assumptions of
what will happen
Chaos Engineering
Why would I want to use chaos engineering?
Benefits
• Validate resilient configuration
• Validate system monitoring
• Understand how systems behave during
a failure.
• Application response to database
outage
• Application response to Active
Directory outage
• Log information for outages
• Refine your incident management
process
Chaos Engineering Benefits
The Science of Chaos Engineering
What does healthy look like?
• The WordPress website is
accessible
Steady State
What we assume about the architecture
• If an instance in the auto scaling
group is unhealthy, the
application will continue to
respond
• If there is an availability zone
outage, the application will
continue to respond
Hypothesis
Validating our assumption
• Terminate a random instance in the
WordPress auto scaling
group
• Evaluate if the WordPress site is
still accessible
Experiment
Undo what we did
• No defined rollback as the
autoscaling group should
provision a new instance
• If the experiment fails, then
manual intervention is required
Rollback
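The four steps above (steady state, hypothesis, experiment, rollback) can be sketched as a small runner. This is a minimal illustration rather than how any particular tool works: the `Experiment` class, `run_experiment`, and the in-memory `pool` that stands in for the auto scaling group are all hypothetical names introduced here.

```python
import random
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Experiment:
    title: str
    steady_state: Callable[[], bool]      # "what does healthy look like"
    inject: Callable[[], object]          # the failure we introduce
    rollback: Optional[Callable[[], object]] = None  # "undo what we did"


def run_experiment(exp: Experiment) -> str:
    """Return 'skipped', 'passed', or 'failed' for one experiment run."""
    # Never inject a failure into a system that is already unhealthy.
    if not exp.steady_state():
        return "skipped"
    try:
        exp.inject()
        # The hypothesis holds if the system still looks healthy afterwards.
        return "passed" if exp.steady_state() else "failed"
    finally:
        # Roll back whether the hypothesis held or not.
        if exp.rollback is not None:
            exp.rollback()


# Stand-in for the WordPress auto scaling group (hypothetical instance IDs).
pool = ["i-aaa", "i-bbb"]
result = run_experiment(Experiment(
    title="Site survives the loss of one instance",
    steady_state=lambda: len(pool) > 0,
    inject=lambda: pool.pop(random.randrange(len(pool))),
))
```

No rollback is passed in this usage, which mirrors the slide: the auto scaling group is expected to replace the instance on its own.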
Experiment #1
How the application is designed
• Web server
• AWS autoscaling group
• max size: 2
• desired size: 1
• min size: 1
Architecture
[Diagram: Application Load Balancer in front of an auto scaling group with two instances]
What does healthy look like?
• Validate that the website returns
a 200 HTTP status
Steady State
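A steady-state check of this kind can be a few lines of standard-library Python. The sketch below is one possible shape, not the check used in the talk; the function names are hypothetical, and the URL you pass in would be your own site.

```python
import urllib.error
import urllib.request
from typing import Optional


def fetch_status(url: str, timeout: float = 5.0) -> Optional[int]:
    """Return the HTTP status code for url, or None if no response arrived."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # The server answered, but with an error status such as 500 or 503.
        return err.code
    except (urllib.error.URLError, OSError):
        # DNS failure, refused connection, timeout, and similar.
        return None


def steady_state_ok(status: Optional[int]) -> bool:
    """The steady state on this slide: the website returns HTTP 200."""
    return status == 200
```

Keeping the fetch and the verdict separate makes it easy to log the raw status code alongside the pass or fail result.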
What we assume about the architecture
• If the CPU usage gets too high,
scale out the auto scaling group
Hypothesis
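Writing the hypothesis down as an expected outcome makes the experiment result unambiguous. The sketch below uses the group sizes from the architecture slide; the 80% threshold is an assumption (the slide only says "too high"), and `ScalingGroup` and `expected_desired` are hypothetical names, not an AWS API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScalingGroup:
    min_size: int
    max_size: int
    desired: int


def expected_desired(group: ScalingGroup, cpu_percent: float,
                     scale_out_above: float = 80.0) -> int:
    """Desired capacity we expect to observe after the scaling policy reacts."""
    if cpu_percent > scale_out_above:
        # Scale out by one instance, but never beyond the group's maximum.
        return min(group.desired + 1, group.max_size)
    return group.desired


group = ScalingGroup(min_size=1, max_size=2, desired=1)
after_attack = expected_desired(group, cpu_percent=95.0)
```

During the experiment, the observed desired capacity can then be compared against `after_attack`.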
Validating our assumption
• Increase CPU usage on the
instance in the autoscaling group
to 95%
• Evaluate if the website is still
accessible
Experiment
[Diagram: Application Load Balancer and auto scaling group, with CPU usage at 95%]
Undo what we did
• Stop the experiment
action/attack to allow the CPU
usage to normalize
Rollback
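To make the attack and its rollback concrete, here is a rough sketch of a CPU burner that can be stopped on demand. It is an illustration only: `CpuAttack` and `burn_cpu` are hypothetical names, and a single Python thread keeps roughly one core busy for the target share of the time rather than driving the whole machine to 95%. Dedicated tools handle multi-core load far better.

```python
import threading
import time


def burn_cpu(stop: threading.Event, target: float = 0.95,
             period: float = 0.1) -> None:
    """Spin for `target` of every `period` seconds until `stop` is set."""
    if not 0.0 < target <= 1.0:
        raise ValueError("target must be in (0, 1]")
    busy = period * target
    while not stop.is_set():
        start = time.perf_counter()
        while time.perf_counter() - start < busy:
            pass  # busy-wait to consume CPU
        time.sleep(period - busy)  # idle for the rest of the period


class CpuAttack:
    def __init__(self, target: float = 0.95) -> None:
        self._stop = threading.Event()
        self._thread = threading.Thread(
            target=burn_cpu, args=(self._stop, target), daemon=True)

    def start(self) -> None:
        self._thread.start()

    def rollback(self, timeout: float = 2.0) -> bool:
        """Stop the attack; return True once the burner thread has exited."""
        self._stop.set()
        self._thread.join(timeout)
        return not self._thread.is_alive()
```

Calling `rollback()` is the "stop the experiment action/attack" step: once the thread exits, CPU usage is free to normalize.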
[Diagram: Application Load Balancer and auto scaling group, with CPU usage at 30%]
Experiment #2
How the application is designed
• Service B reads and writes
information to an S3 bucket
• Service A requests information
from Service B
Architecture
[Diagram: Service A (instance), Service B (instance), S3 bucket]
What does healthy look like?
• Service B is accessible using the
same request that Service A uses
(valid info or known error
message)
• Service A is accessible (HTTP 200,
500, etc.)
Steady State
What we assume about the architecture
• Service B locally queues any new
information, returns queued
information upon request and
returns an error message when
information is not available.
• Service A returns an error page
Hypothesis
Validating our assumption
• Prevent Service B’s instance
from accessing the S3 bucket
• Evaluate if Service A returns an
error page and if Service B
returns an error message
Experiment
Undo what we did
• Restore access to the S3 bucket
for Service B’s instance.
Rollback
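The behavior hypothesized for Service B can be modeled in memory to show what the experiment is checking. Everything here is a stand-in: `FakeBucket` simulates the S3 bucket (no AWS calls are made), and `ServiceB`, its methods, and the error text are hypothetical.

```python
class BucketUnavailable(Exception):
    """Raised by the fake bucket while access is blocked."""


class FakeBucket:
    def __init__(self) -> None:
        self.objects = {}
        self.reachable = True  # the experiment flips this to False

    def _check(self) -> None:
        if not self.reachable:
            raise BucketUnavailable()

    def put(self, key: str, value: str) -> None:
        self._check()
        self.objects[key] = value

    def get(self, key: str) -> str:
        self._check()
        return self.objects[key]


class ServiceB:
    def __init__(self, bucket: FakeBucket) -> None:
        self.bucket = bucket
        self.pending = {}  # local queue of writes not yet in the bucket

    def write(self, key: str, value: str) -> None:
        try:
            self.bucket.put(key, value)
        except BucketUnavailable:
            self.pending[key] = value  # queue locally instead of failing

    def read(self, key: str) -> dict:
        if key in self.pending:
            return {"ok": True, "value": self.pending[key]}
        try:
            return {"ok": True, "value": self.bucket.get(key)}
        except (BucketUnavailable, KeyError):
            return {"ok": False, "error": "information not available"}

    def flush(self) -> None:
        """After rollback, push queued writes; raises if still blocked."""
        for key in list(self.pending):
            self.bucket.put(key, self.pending[key])
            del self.pending[key]
```

Setting `bucket.reachable = False` plays the role of the experiment, and setting it back to `True` followed by `flush()` plays the role of the rollback.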
Adopting Chaos Engineering
Think about what to challenge
Assumptions
• The application supports the
failure of a single component
• The application supports the
failure of a cloud region
• The application gracefully
handles an Active Directory
outage
• The application handles latency
to the backend service
Identify An Assumption
Break some things
• Find a tool and start
experimenting
• Start small
• Develop nice reporting output for
consumption by others
Create an Experiment
Integrating chaos into deployments
• Incorporate chaos experiments into
testing in lower-level environments.
CI/CD Pipeline Integration
Pipeline stages: Provision Infrastructure, Validate Infrastructure, Test Infrastructure
Creating Chaos on a Schedule
Scheduling Chaos
• Focus on dev or test
environments to avoid breaking
production
• Ensure the steady state
evaluation accounts for
existing outages or
maintenance windows
• Agree upon a window in which
chaos can be performed
Scheduled Chaos
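A scheduler can enforce the agreed window and environments with a small guard before any experiment starts. The sketch below is an assumption-heavy example: the 10:00 to 15:00 weekday window, the environment names, and the `chaos_allowed` function are illustrative choices, not values from the talk.

```python
from datetime import datetime, time


def chaos_allowed(now: datetime, environment: str,
                  window_start: time = time(10, 0),
                  window_end: time = time(15, 0),
                  allowed_envs: tuple = ("dev", "test"),
                  weekdays_only: bool = True) -> bool:
    """Return True only inside the agreed window, in an allowed environment."""
    if environment not in allowed_envs:
        return False  # keep scheduled chaos out of production
    if weekdays_only and now.weekday() >= 5:
        return False  # Saturday is 5, Sunday is 6
    return window_start <= now.time() < window_end


# 3 January 2024 was a Wednesday.
ok = chaos_allowed(datetime(2024, 1, 3, 11, 0), "dev")
```

A scheduled job would call this guard first, then run the steady-state check, and only then inject the failure.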
Chaos Engineering Tools
Configuration Management for
Hybrid IT environments
CTO Advisor Virtual Conference
CNCF Chaos Engineering Projects
Chaos Engineering CNCF
https://landscape.cncf.io/category=chaos-engineering&format=card-mode&grouping=category
Overview
• Open source
• Randomly terminates AWS EC2
instances
Chaos Monkey
https://netflix.github.io/chaosmonkey/
Overview
• SaaS
• AWS, Azure, GCP, Kubernetes,
Remote Machine, etc.
• Online dashboard
Gremlin
Gremlin Chaos Engineering Platform
https://www.gremlin.com/
Overview
• Open source
• AWS, Azure, GCP, Kubernetes,
Istio, Cloud Foundry, etc
• Python3 application
Chaos Engineering Framework
ChaosToolkit
https://chaostoolkit.org/
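Chaos Toolkit experiments are declared as JSON (or YAML) documents. The sketch below builds one in Python for the WordPress example from earlier in the deck. The field names follow the Chaos Toolkit experiment format as best understood here and should be checked against the official documentation; the site URL and the `terminate_random_instance.sh` script are hypothetical placeholders.

```python
import json

SITE_URL = "http://wordpress.example.internal/"  # hypothetical address

experiment = {
    "version": "1.0.0",
    "title": "WordPress stays reachable when one instance is terminated",
    "description": "Terminate one instance in the auto scaling group "
                   "and confirm the site still answers.",
    # Steady state: checked before and after the method runs.
    "steady-state-hypothesis": {
        "title": "The website returns HTTP 200",
        "probes": [
            {
                "type": "probe",
                "name": "site-responds",
                "tolerance": 200,
                "provider": {"type": "http", "url": SITE_URL, "timeout": 5},
            }
        ],
    },
    # Method: the failure to inject.
    "method": [
        {
            "type": "action",
            "name": "terminate-random-instance",
            "provider": {
                "type": "process",
                # Hypothetical script that picks and terminates an instance.
                "path": "./terminate_random_instance.sh",
            },
        }
    ],
    # No rollback: the auto scaling group should replace the instance.
    "rollbacks": [],
}


def write_experiment(path: str) -> None:
    """Write the experiment document to a JSON file."""
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(experiment, fh, indent=2)
```

After `write_experiment("experiment.json")`, the file would be run with `chaos run experiment.json`.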
Overview
• Open source
• Kubernetes
• Helm Chart deployment
Kubernetes focused chaos engineering
Litmus Chaos
https://litmuschaos.io/
Overview
• Open source
• Kubernetes, Docker, VMware vSphere,
Remote machines, AWS
• OVA download (PoC) or Kubernetes
VMware chaos engineering
VMware Mangle
https://vmware.github.io/mangle/
Questions?
Principal Training Solutions Engineer @ Puppet
Martez Reed
Linkedin: martezreed
Twitter: @greenreedtech
Github: martezr
Slide Deck: https://www.slideshare.net/MartezReed/not-netflix-chaos-engineering
