@mayralois
Mushroom Cloud Effect
Alois Mayr
Technology Lead – Dynatrace
Monitoring for Highly Dynamic Container Environments
Source: http://www.schoonoart.de/
…there’s been the mushroom cloud effect
oh yeah, everything screwed up
The Mushroom Cloud Effect
or
What Happens When Containers Fail?
Biggest LatAm e-commerce Company
• ~US$2.5 billion revenue
• 4 sites: Americanas, Shoptime, Submarino, Soubarato
• ~150 hosts across 4 regions
• 5k-15k containers
• 1k-3k services
TL;DR
About Cloud-Scale Systems
Important Aspects…
• Lots of (micro-)services
• Lots of communication between services
• Service dependencies
• Versioning and API compatibilities
• Zero downtime
Platform-related Aspects
• Most often container-based
• Clustered for scalability
• Ephemeral containers
• Resilient architecture
• Cross AZ fail-overs
• SDN for communication
Deployments Are No Longer Static
• 7:00 a.m.: Low load, service running with minimum redundancy
• 12:00 p.m.: Scaled-up service during peak load, with failover of problematic node
• 7:00 p.m.: Scaled back down to lower load, moved to a different geolocation
Anatomy of dynamic environments
https://www.dynatrace.com/en/ruxit/
All About (Service) Dependencies
Failing containers…
…may or may not have an (immediate) impact on service performance
Cascading Failures Lead to a Mushroom Cloud Effect
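
How a single failure can fan out through service dependencies is easier to see with a small model. The following is a minimal sketch, not the tooling used in the talk: the topology, the entity names, and the function `impacted_by` are all hypothetical. It walks a dependency graph upward from a failed entity and collects everything that directly or transitively depends on it. Since a failing container may or may not have an immediate impact, the result should be read as "potentially affected", not "definitely broken".

```python
from collections import deque

# Hypothetical topology: each entity maps to the entities it depends on.
# Hosts sit at the bottom, containers in the middle, services on top.
DEPENDS_ON = {
    "web-frontend": ["checkout-service", "catalog-service"],
    "checkout-service": ["container-a"],
    "catalog-service": ["container-b"],
    "container-a": ["host-1"],
    "container-b": ["host-2"],
    "host-1": [],
    "host-2": [],
}


def impacted_by(failed, depends_on):
    """Return every entity that directly or transitively depends on `failed`."""
    # Invert the edges so we can walk from a dependency to its dependents.
    dependents = {name: [] for name in depends_on}
    for name, deps in depends_on.items():
        for dep in deps:
            dependents.setdefault(dep, []).append(name)

    impacted = set()
    queue = deque([failed])
    while queue:
        current = queue.popleft()
        for parent in dependents.get(current, []):
            if parent not in impacted:
                impacted.add(parent)
                queue.append(parent)
    return impacted
```

In this made-up topology, `impacted_by("host-1", DEPENDS_ON)` returns `container-a`, `checkout-service`, and `web-frontend`: one failing host at the bottom reaches the service at the top.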
The Hungry Container Breakdown
What was the problem?
• Shared /logs partition on host
• No log rotation, no archiving for app logs
• No proper log management used for Docker environment
• Shared /logs partition ran out of space
The Hungry Container Breakdown
How did the problem evolve over time?
• Container health checks failed
• Orchestration killed the container and rescheduled a new one
• Still no free space on /logs
• Termination and rescheduling repeated
• /var/lib/docker ran out of space
• Cluster nodes were no longer able to run any containers
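
Both stages of this breakdown were partitions filling up, which is a condition that can be watched for. Below is a minimal sketch of such a check, assuming a node-local script is acceptable. The function name `partitions_over_threshold` and the 80% default threshold are illustrative choices, not something the talk prescribes. It uses Python's standard `shutil.disk_usage`.

```python
import shutil


def partitions_over_threshold(paths, max_used_fraction=0.8, disk_usage=shutil.disk_usage):
    """Return {path: used_fraction} for paths whose filesystem exceeds the threshold.

    `disk_usage` is injectable so the check can be exercised without real disks.
    """
    over = {}
    for path in paths:
        usage = disk_usage(path)  # exposes .total, .used and .free in bytes
        if usage.total == 0:
            continue  # skip pseudo-filesystems that report no capacity
        used_fraction = usage.used / usage.total
        if used_fraction > max_used_fraction:
            over[path] = used_fraction
    return over


if __name__ == "__main__":
    # The two partitions named in the incident; adjust to the node's layout.
    try:
        print(partitions_over_threshold(["/logs", "/var/lib/docker"]))
    except FileNotFoundError as err:
        print(f"path not present on this machine: {err}")
```

A check like this only reports the symptom. It does not replace log rotation or a dedicated partition for /var/lib/docker.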
The Hungry Container Breakdown
How did the problem affect services?
• Services at the top of the graph
• Increased failure rates
• Many dependent Tomcat and DB services affected
The Hungry Container Breakdown
How could the problem have been avoided?
• Log management tools for app logs: --log-driver=none|syslog
• Remove container after it exits: --rm=true
• /var/lib/docker deserves its own partition
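
The slides point to log management tooling as the fix. For app logs written to a shared partition, size-capped rotation inside the application is one complementary safeguard. The sketch below uses Python's standard `logging.handlers.RotatingFileHandler`. The helper `build_rotating_logger`, the logger name, and the size limits are assumptions for illustration, and the incident's Tomcat services would use their own logging framework's equivalent.

```python
import logging
from logging.handlers import RotatingFileHandler


def build_rotating_logger(log_path, name="app", max_bytes=10 * 1024 * 1024, backup_count=5):
    """Create a logger whose files stay below about max_bytes * (backup_count + 1) in total."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    # Once the current file would exceed max_bytes it is renamed to <log_path>.1,
    # older backups shift up by one, and the oldest beyond backup_count is deleted.
    handler = RotatingFileHandler(log_path, maxBytes=max_bytes, backupCount=backup_count)
    handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
    logger.addHandler(handler)
    return logger
```

With the defaults shown, one service can occupy at most about 60 MB of log space, so a chatty or buggy container cannot fill the partition on its own.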
The Hungry Container Breakdown
Buggy Containers May Kill Your Nodes
Try to Break Your Clusters Early
(And be Prepared for Black Friday)
Break Your Clusters Early
Massive load testing!
Survive three days of pain
Include everything: services, containers, orchestration, EC2 instances
Testing everything
13.3k containers (+nodes)
3,451 services
Automation Needed to Pinpoint the Root Cause of Cascading Failures!
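
To illustrate what "pinpointing" can mean on a dependency graph, here is a deliberately naive heuristic. It is not Dynatrace's method, and the topology and the function `root_cause_candidates` are hypothetical. Among all unhealthy entities, it keeps only those with no unhealthy dependency of their own, on the assumption that trouble propagates upward from dependencies to dependents.

```python
# Hypothetical topology: each entity maps to the entities it depends on.
DEPENDS_ON = {
    "web-frontend": ["checkout-service", "catalog-service"],
    "checkout-service": ["container-a"],
    "catalog-service": ["container-b"],
    "container-a": ["host-1"],
    "container-b": ["host-2"],
    "host-1": [],
    "host-2": [],
}


def root_cause_candidates(unhealthy, depends_on):
    """Return the unhealthy entities that have no unhealthy dependency."""
    unhealthy = set(unhealthy)
    return {
        name
        for name in unhealthy
        if not any(dep in unhealthy for dep in depends_on.get(name, []))
    }
```

If `web-frontend`, `checkout-service`, `container-a`, and `host-1` are all reported unhealthy, the only candidate left is `host-1`. Real environments with thousands of services need far more signal than this (timing, metrics, deployment events), which is the case the slide makes for automation.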
Want to learn more?
Stop by our booth: G15
Thank you!

Editor's Notes

  • #3: Who I am and what we do. Dynatrace is the monitoring company with a solution for highly dynamic cloud and container environments. Many customers, lots of experience, production monitoring.
  • #4: There are many campfire stories we could go into, but today I’m going to talk about the Mushroom Cloud Effect. Go into context here: we see the mushroom cloud effect very often in dynamic container environments with lots of microservices, etc.
  • #9: All the services should work as designed. You need to be careful about API and service compatibility. Additional complexity through feature sets and cross-geolocation availability. Usually designed for zero downtime. Most of these apply to Netflix website services.
  • #10: However, such services are often run on PaaS or PaaS-like systems. Container-based. One container, one service. One service, multiple containers, somewhere across different hosts. Containers are ephemeral, regularly killed and started. Orchestration is used not only for scheduling but also for resilience at the container level.
  • #12: This is an example from one of our customers: B2W, the largest e-commerce provider in LatAm. Left-hand side: a single service in v5 provided by 5 containers, one of which is offline. The containers run on separate hosts in 2 datacenters. This is the technical part. The right-hand side is a mess: all services communicate with each other somehow. Service dependencies and communication need to be real-time.
  • #13: This is a much simpler environment that shows the flow of transactional apps.
  • #16: Example of a small mushroom cloud effect. Bottom: hosts. Middle part: processes / containers. Top: services and applications.