ZERO-DOWNTIME DEPLOYMENT
on K8S
the missing part
Bảo Huỳnh
Site Reliability Engineering
12-Jun-2020
AGENDA
1. Deployment & Replicas: are we really safe?
2. Understand Pod Eviction Lifecycle
3. Avoid the Outages
4. Beyond the Outages
1. Deployment & Replicas: really safe?
We have:
● Replicas: 2
● RollingUpdate strategy
● maxUnavailable: 1
* Everything seems robust enough to avoid downtime
* What happens if one pod disappears?
* What about existing & incoming traffic?
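The setup listed above could be expressed roughly as the following Deployment. This is only a sketch: the name `web`, the label `app: web` and the image tag are assumptions chosen for illustration, not values from the talk.

```yaml
# Sketch only: name, label and image tag are assumed for illustration.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web
spec:
  replicas: 2                # two pods serve traffic
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 1      # at most one pod may be down during a rollout
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
        - name: nginx
          image: nginx:1.25
          ports:
            - containerPort: 80
```

Nothing in this manifest says what happens to requests that are in flight, or still arriving, while a pod is being stopped.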
1. Deployment & Replicas: really safe?
Downtime will occur IF:
- Existing traffic is not handled properly
- The application does not handle graceful shutdown
- ……….
2. Understand Pod Eviction Lifecycle
● Triggered by kubectl delete / drain / upgrade
● The request 🡪 the node where the pod is located
● kubelet runs the preStop hook (if any), then sends SIGTERM to the pod's containers
● kubelet sends SIGKILL once the grace period (preStop + time to stop the app) expires
2. Understand Pod Eviction Lifecycle
Add a preStop hook to shut nginx down gracefully
🡪 Make sure the app finishes handling
existing connections before it quits
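A preStop hook of this kind might look like the pod template fragment below. It is a sketch under assumptions: the container name and image tag are made up, and the wait loop assumes `killall` is available in the image.

```yaml
# Pod template fragment (sketch; container name and image tag are assumed).
spec:
  containers:
    - name: nginx
      image: nginx:1.25
      lifecycle:
        preStop:
          exec:
            # "nginx -s quit" asks nginx to finish in-flight requests and exit.
            # The loop keeps the hook running until nginx has exited, because
            # SIGTERM (a fast shutdown for nginx) is sent once the hook returns.
            # Assumes killall exists in the image.
            command: ["/bin/sh", "-c", "nginx -s quit; while killall -0 nginx; do sleep 1; done"]
```

This covers connections that already exist. It does nothing about requests that are still being routed to the pod.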
2. Understand Pod Eviction Lifecycle
- Drain “node 1”
- kubelet starts terminating the nginx pod
- The preStop hook is executed
(nginx quit), then SIGTERM is sent
2. Understand Pod Eviction Lifecycle
+ A new request comes in
+ It is routed to the stopping nginx
+ Error….
2. Understand Pod Eviction Lifecycle
- Why does this sh*t happen?
- Why is stupid K8S still routing traffic to a “terminating” pod?
- said a CT Engineer -
3. Avoid the Outages
Recall the pod shutdown sequence
● Triggered by kubectl delete / drain / upgrade
● The request 🡪 the node where the pod is located
● kubelet runs the preStop hook (if any), then sends SIGTERM to the pod's containers
● kubelet sends SIGKILL once the grace period (preStop + time to stop the app) expires
……………………….
RIGHT, but NOT ENOUGH
3. Avoid the Outages
Figure 1: Sequence of events when a pod is deleted
3. Avoid the Outages
Figure 2: Timeline view of the pod deletion events
- Two flows run in parallel: [A] pod shutdown and [B] endpoint deregistration
- No guarantee that [A] finishes after [B]
3. Avoid the Outages
BUT HOW ???
3. Avoid the Outages
● Don’t do any work, just SLEEP
● … & wait for the deregistration flow (B) to complete
before starting graceful shutdown
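The sleep can be placed at the start of the preStop hook, before the graceful shutdown command. The fragment below is a sketch: the 15-second sleep and the 60-second grace period are assumed values to tune per cluster, the container name and image tag are made up, and the wait loop assumes `killall` is available in the image.

```yaml
# Pod template fragment (sketch; all durations and names are assumptions).
spec:
  # Must cover the sleep plus the time nginx needs to drain,
  # otherwise kubelet sends SIGKILL before shutdown completes.
  terminationGracePeriodSeconds: 60
  containers:
    - name: nginx
      image: nginx:1.25
      lifecycle:
        preStop:
          exec:
            # 1. sleep: keep serving while the endpoint is deregistered (flow B)
            # 2. nginx -s quit: then stop gracefully
            # 3. loop: hold the hook open until nginx has exited
            command: ["/bin/sh", "-c", "sleep 15; nginx -s quit; while killall -0 nginx; do sleep 1; done"]
```

The sleep does not make the ordering of the two flows guaranteed. It only gives the deregistration flow a head start, so the value should be comfortably longer than endpoint updates usually take to propagate.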
4. Beyond the Outages
- Introducing: PodDisruptionBudgets
- An indicator of the number of disruptions that
can be tolerated at a given time for a class of
pods (a budget of faults).
- If evicting a pod would leave fewer pods available than the
PodDisruptionBudget allows, the drain operation is halted
(it waits for new pods to come up and bring the count back above the
threshold)
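A minimal PodDisruptionBudget could look like the sketch below. The name `web-pdb`, the label `app: web` and the `minAvailable` value are assumptions for illustration.

```yaml
# Sketch only: name, label and threshold are assumed for illustration.
apiVersion: policy/v1        # clusters older than v1.21 use policy/v1beta1
kind: PodDisruptionBudget
metadata:
  name: web-pdb
spec:
  minAvailable: 1            # evictions are refused if they would go below this
  selector:
    matchLabels:
      app: web               # must match the labels of the pods to protect
```

With two replicas and `minAvailable: 1`, a node drain can evict only one of the pods at a time and has to wait for a replacement to become ready before evicting the other.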
Summary
Application:
- Handle SIGTERM for graceful shutdown
System:
- Apply a preStop lifecycle hook
- Add a sleep to make sure the pod's endpoint is deregistered from the Service
before shutdown, so no new traffic arrives
- Use PodDisruptionBudgets to avoid all pods going down at the same time
Questions & Answers
Appendix: Service Disruption
Involuntary disruptions:
- Hardware failure
- Node disappears from the cluster
Voluntary disruptions:
- Deployment upgrade
- Pod deletion
- Node upgrade
- Node drain