Highly available apps on k8s
- Prashant Kalkar
Agenda
● Get your container process right.
● Starting up healthy: liveness and readiness probes.
● Graceful pod shutdown.
● Working with a cloud load balancer.
● Scheduling pods
● Deployment strategies
● Disturbances
○ Voluntary
○ Involuntary
Container Processes
Unix Processes
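● The init process (PID 1) manages the lifecycle of its child processes.
● It forwards OS signals such as SIGTERM to them. SIGKILL cannot be caught or forwarded; the kernel delivers it directly.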
Container PID = 1
Command: bash -c "java $JAVA_OPTS -jar /opt/app/app.jar"
Container PID = 1. Graceful shutdown
● The PID 1 process should forward OS signals to the application process.
● Wait for the child processes to terminate.
● Exit with status 0.
Bash does not forward OS signals to its child processes, so SIGTERM never reaches the application. Once the grace period expires, the container is simply killed, with no graceful shutdown.
https://about.gitlab.com/blog/2022/05/17/how-we-removed-all-502-errors-by-caring-about-pid-1-in-kubernetes/
Fixing the PID 1 issue
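● Avoid the shell form of ENTRYPOINT/CMD, which runs the command under /bin/sh -c.
● Use the exec form (a JSON array), so the application itself runs as PID 1.
● If a shell wrapper is needed, start the application with exec so that it replaces the shell process.
● Replace bash with a proper init process (see tini).
Ensure your app handles the SIGTERM signal (most modern frameworks do).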
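The fixes above can be sketched as a Pod manifest. This is a minimal illustration, not the deck's own example: the pod name, image, and JAVA_OPTS value are hypothetical, and the jar path is taken from the command shown earlier.

```yaml
# Sketch only: name, image, and JAVA_OPTS value are hypothetical.
apiVersion: v1
kind: Pod
metadata:
  name: app
spec:
  containers:
    - name: app
      image: registry.example.com/app:1.0
      env:
        - name: JAVA_OPTS
          value: "-Xmx512m"
      # The shell expands $JAVA_OPTS, then exec replaces the shell with java,
      # so java becomes PID 1 and receives SIGTERM directly.
      command: ["sh", "-c", "exec java $JAVA_OPTS -jar /opt/app/app.jar"]
      # Alternative without any shell (exec form, no variable expansion):
      # command: ["java", "-jar", "/opt/app/app.jar"]
```

Kubernetes only expands the $(VAR) syntax in command, so $JAVA_OPTS is passed through untouched and expanded by the shell inside the container.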
Starting Up Healthy
Liveness and Readiness Probe
Readiness probe
● Ensures the application is running before traffic is sent to it.
● Reduces the chance of errors while the application is not ready.
● Can mark the pod as not ready to keep new traffic away under high load.
● Readiness plays a role in rolling deployments: a new pod counts as available only once it is ready.
Liveness probe
● Tries to recover a failed container by killing and restarting it.
● Ensure the liveness probe is not too aggressive, to prevent excessive restarts.
● Not recommended for clustered applications.
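A probe configuration along these lines might look as follows. The endpoint paths, port, image, and timing values are assumptions chosen for illustration; the liveness probe is deliberately more lenient than the readiness probe.

```yaml
# Sketch only: paths, port, image, and timings are hypothetical.
apiVersion: v1
kind: Pod
metadata:
  name: app
spec:
  containers:
    - name: app
      image: registry.example.com/app:1.0
      ports:
        - containerPort: 8080
      # Failing readiness removes the pod from Service endpoints
      # but does not restart the container.
      readinessProbe:
        httpGet:
          path: /ready
          port: 8080
        periodSeconds: 5
        failureThreshold: 3
      # Failing liveness restarts the container, so it tolerates
      # more consecutive failures (6 x 10s) before acting.
      livenessProbe:
        httpGet:
          path: /live
          port: 8080
        initialDelaySeconds: 30
        periodSeconds: 10
        timeoutSeconds: 2
        failureThreshold: 6
```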
Pod graceful shutdown
Pod termination Process
Best practices for Highly available apps on k8s.pptx
Kubelet pod termination in more detail
preStop hook to delay Pod termination
Ensure High availability
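● The preStop hook and terminationGracePeriodSeconds control the pod termination process (kubelet).
● Ensure preStop hook duration >= time taken to stop traffic + time required to complete ongoing requests.
● terminationGracePeriodSeconds also covers the preStop hook, so it must exceed the hook duration plus the application's own shutdown time.
● A long preStop hook can slow down rolling deployments.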
What about the cloud load balancer?
Consider this scenario:
● A rolling update of the deployment starts.
● Rolling out the new pods takes less time than the load balancer controller needs to register them and for their health state to turn »Healthy« in the target group.
● So even when a pod is ready and the deployment marks it as available, the LB is not yet sending traffic to it.
● Since the deployment's available replica count is satisfied, one of the healthy older pods starts terminating.
● This leaves fewer pods available for the service from the LB's point of view.
preStop hook longer than the LB health check interval
lifecycle:
  preStop:
    exec:
      # Sleeping before shutdown allows app to process requests that were
      # in-flight before the node is removed from load-balancing.
      # If using an external load balancer, you may need to increase this
      # duration to be greater than the LB's health check interval.
      # ${...} is a template placeholder: substitute the interval in seconds plus a margin.
      command: ["sleep", "${lb_healthcheck_interval + 5}"]
Pod readinessGates
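A readiness gate adds an extra condition to the pod's readiness. The sketch below assumes a hypothetical condition type, example.com/lb-target-healthy, that some load balancer controller would set to True once the pod is healthy in the target group; real controllers define their own condition names.

```yaml
# Sketch only: the condition type, name, and image are hypothetical.
apiVersion: v1
kind: Pod
metadata:
  name: app
spec:
  readinessGates:
    # The pod is Ready only when all containers are ready AND a controller
    # has set this condition to "True" in the pod's status.conditions.
    - conditionType: "example.com/lb-target-healthy"
  containers:
    - name: app
      image: registry.example.com/app:1.0
```

With such a gate in place, a rolling update would not count the new pod as available until the load balancer side reports it healthy, which addresses the scenario described earlier.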
Scheduling pods
Inter-pod Anti-affinity for high availability
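One way to express this, sketched with hypothetical names and a hypothetical image, is a preferred anti-affinity rule that asks the scheduler to keep replicas of the same app on different nodes.

```yaml
# Sketch only: names, labels, and image are hypothetical.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: app
spec:
  replicas: 3
  selector:
    matchLabels:
      app: app
  template:
    metadata:
      labels:
        app: app
    spec:
      affinity:
        podAntiAffinity:
          # "preferred" is a soft rule: pods still schedule if it cannot be met.
          preferredDuringSchedulingIgnoredDuringExecution:
            - weight: 100
              podAffinityTerm:
                labelSelector:
                  matchLabels:
                    app: app
                # One topology domain per node.
                topologyKey: kubernetes.io/hostname
      containers:
        - name: app
          image: registry.example.com/app:1.0
```

Using requiredDuringSchedulingIgnoredDuringExecution instead would make the rule strict, at the cost of pods staying Pending when there are fewer eligible nodes than replicas.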
Topology Spread Across Node and Zone
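A spread across both zones and nodes could be sketched as below. The names, image, and the choice of a strict zone rule with a soft node rule are assumptions for illustration.

```yaml
# Sketch only: names, labels, and image are hypothetical.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: app
spec:
  replicas: 6
  selector:
    matchLabels:
      app: app
  template:
    metadata:
      labels:
        app: app
    spec:
      topologySpreadConstraints:
        # Zones may differ by at most one matching pod (hard rule).
        - maxSkew: 1
          topologyKey: topology.kubernetes.io/zone
          whenUnsatisfiable: DoNotSchedule
          labelSelector:
            matchLabels:
              app: app
        # Nodes are balanced on a best-effort basis (soft rule).
        - maxSkew: 1
          topologyKey: kubernetes.io/hostname
          whenUnsatisfiable: ScheduleAnyway
          labelSelector:
            matchLabels:
              app: app
      containers:
        - name: app
          image: registry.example.com/app:1.0
```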
Deployment Strategies
Blue Green deployment
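One common way to realise blue/green on plain Kubernetes, not necessarily the approach shown in the original slides, is to run two Deployments labelled version: blue and version: green and point a Service at one of them. The names, labels, and ports below are hypothetical.

```yaml
# Sketch only: names, labels, and ports are hypothetical.
# Assumes two Deployments exist whose pods carry
# app: app plus version: blue or version: green.
apiVersion: v1
kind: Service
metadata:
  name: app
spec:
  selector:
    app: app
    # Change this value to "green" to cut all traffic over at once;
    # change it back to "blue" to roll back.
    version: blue
  ports:
    - port: 80
      targetPort: 8080
```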
Canary Deployment
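A simple label-based canary, shown here as a sketch and not as the method from the original slides, runs a small canary Deployment next to the stable one behind the same Service. The names, images, and the 9:1 replica ratio are hypothetical.

```yaml
# Sketch only: names, images, and replica counts are hypothetical.
apiVersion: v1
kind: Service
metadata:
  name: app
spec:
  selector:
    app: app          # matches both stable and canary pods
  ports:
    - port: 80
      targetPort: 8080
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: app-stable
spec:
  replicas: 9
  selector:
    matchLabels: {app: app, track: stable}
  template:
    metadata:
      labels: {app: app, track: stable}
    spec:
      containers:
        - name: app
          image: registry.example.com/app:1.0
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: app-canary
spec:
  replicas: 1         # roughly 1 in 10 pods runs the new version
  selector:
    matchLabels: {app: app, track: canary}
  template:
    metadata:
      labels: {app: app, track: canary}
    spec:
      containers:
        - name: app
          image: registry.example.com/app:1.1
```

The traffic split here follows the pod ratio only approximately; precise percentage routing usually needs an ingress controller or service mesh.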
Disturbances
Voluntary disturbance
● Draining a node
● Rolling deployments
● Deleting a pod
Pod Disruption Budgets (PDB)
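A minimal budget might look like the following; the name, label, and minAvailable value are hypothetical.

```yaml
# Sketch only: name, label, and threshold are hypothetical.
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: app-pdb
spec:
  # Evictions are refused if they would leave fewer than 2 ready pods.
  minAvailable: 2
  selector:
    matchLabels:
      app: app
```

The budget is enforced for evictions made through the Eviction API, such as those issued by kubectl drain. Deleting a pod or a Deployment directly bypasses it.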
Involuntary disturbance
Evictions
● Node out of resources
● Hardware failure
Preemption
● A pod is evicted to make room for a higher-priority pod.
How to reduce Impact for Involuntary disturbance
● QoS - Guaranteed, Burstable, BestEffort
● Cluster autoscaling (and low-priority empty pods)
● A PDB cannot prevent involuntary disturbances, but they do count against its budget.
Quality of Service
Guaranteed
Burstable
BestEffort
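The QoS class is derived from the resource requests and limits. As a sketch with hypothetical names and values, the pod below is classed as Guaranteed because every container sets CPU and memory limits equal to its requests.

```yaml
# Sketch only: name, image, and resource values are hypothetical.
apiVersion: v1
kind: Pod
metadata:
  name: app
spec:
  containers:
    - name: app
      image: registry.example.com/app:1.0
      resources:
        # requests == limits for both cpu and memory -> Guaranteed.
        # requests lower than limits (or only some set) -> Burstable.
        # no requests or limits at all -> BestEffort.
        requests:
          cpu: "500m"
          memory: 512Mi
        limits:
          cpu: "500m"
          memory: 512Mi
```

Under node memory pressure, BestEffort pods are generally evicted first and Guaranteed pods last.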
Fast scaling with Overprovisioning and Preemption
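The overprovisioning idea can be sketched with a negative-priority PriorityClass and a Deployment of placeholder pods. The names, replica count, and reserved resources are hypothetical; the pause image tag is an assumption.

```yaml
# Sketch only: names, sizes, and the image tag are assumptions.
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: overprovisioning
value: -10            # lower than the default priority of 0
globalDefault: false
description: "Placeholder pods that real workloads may preempt."
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: overprovisioning
spec:
  replicas: 2
  selector:
    matchLabels:
      app: overprovisioning
  template:
    metadata:
      labels:
        app: overprovisioning
    spec:
      priorityClassName: overprovisioning
      terminationGracePeriodSeconds: 0
      containers:
        - name: pause
          image: registry.k8s.io/pause:3.9
          resources:
            # Capacity held in reserve per placeholder pod.
            requests:
              cpu: "1"
              memory: 1Gi
```

When a real pod needs room, the scheduler preempts a placeholder so the pod starts immediately; the evicted placeholder then goes Pending, which prompts the cluster autoscaler to add a node in the background.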
Thank you!
