Chennai, India
755 02 01 268
Narayanan.kmu@gmail.com
bribebybytes.github.io/landing-page
NARAYANAN
KRISHNAMURTHY
Technical Architect, ADP India
Cloud Architect with 15 Years in IT
CLOUD DEVOPS TELECOM
Skills
Languages
English
Tamil
Hindi
Being Social
Bribe By Bytes
Hobby
Production Grade Kubernetes Applications
Have some Bad News & Good News!
Just because you containerized, Kubernetized, and Cloudified your
application doesn’t mean it’s automatically Reliable, Scalable, and Secure
Bad News First!
• Your Hardware will fail
• Your enterprise-grade application will fail
• Your Cloud will fail
• Your Kubernetes cluster will fail
Embrace it!
[QoS matrix: Reliable, Scalable, Available, Secured, Performance mapped across the Application Layer, Cluster Layer and Infrastructure Layer]
INVOLUNTARY DISRUPTIONS vs. VOLUNTARY DISRUPTIONS
[Cluster diagram: Master node; Worker 1 and Worker 2 running the login and Emp pods]
INVOLUNTARY DISRUPTIONS
CLUSTER ADMIN DELETES A POD BY MISTAKE ☹
INVOLUNTARY DISRUPTIONS
A HARDWARE FAILURE OF THE PHYSICAL MACHINE or VM
INVOLUNTARY DISRUPTIONS
CLUSTER ADMIN DELETES A NODE BY MISTAKE ☹
INVOLUNTARY DISRUPTIONS
POD GETS EVICTED FROM NODE DUE TO RESOURCE CONSTRAINTS
[Diagram: evicted pods come back as New Pods scheduled on the other worker]
VOLUNTARY DISRUPTIONS
DRAINING A NODE FOR REPAIR OR UPGRADE OR TO SCALE DOWN
[Diagram sequence: pods from the drained worker are rescheduled on the other worker; pods that cannot be placed wait in the PENDING QUEUE!]
• Cluster admin deletes a pod by mistake
• A hardware failure of the physical machine or Virtual Machine
• Cluster admin deletes a node by mistake
• Pod gets evicted from node due to resource constraints
• Draining a node for repair or upgrade or to scale down
• Application Upgrade
Good News: Solutions!
https://www.plectica.com/maps/I7WZTGITU/edit/RAKHSLAXT
Choose Right Controller/Storage Req
Pod Replicas
Application Upgrade Strategy
https://www.youtube.com/watch?v=c7ytxiddImw

Controllers: deployment | daemonset | statefulset | job
Storage: ephemeral | persistent
https://www.youtube.com/watch?v=GQJP9QdHHs8

deployment.spec.replicas
spec:
  replicas: 1

deployment.spec.strategy: Recreate – deletes all | RollingUpdate – one pod upgrade at a time
statefulset.spec.updateStrategy: RollingUpdate | OnDelete – only on Delete | Partition (canary)
daemonset.spec.updateStrategy: RollingUpdate | OnDelete
https://www.youtube.com/watch?v=c7ytxiddImw

Pod eviction during resource constraints | Node disk or mem pressures
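As a minimal sketch of the Deployment knobs above (the replica count and the rolling-update parameters are assumptions, not values from the talk):

apiVersion: apps/v1
kind: Deployment
metadata:
  name: login-deployment          # hypothetical, mirrors the demo app
  labels:
    app: login
spec:
  replicas: 3                     # deployment.spec.replicas
  strategy:                       # deployment.spec.strategy
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 1           # at most one pod down during an upgrade
      maxSurge: 1                 # at most one extra pod created during an upgrade
  selector:
    matchLabels:
      app: login
  template:
    metadata:
      labels:
        app: login
    spec:
      containers:
      - name: login
        image: "busybox:1"
        command:
        - sleep
        - "7200"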
Liveness and Readiness Probes
liveness | readiness
pods/probe/exec-liveness.yaml
apiVersion: v1
kind: Pod
metadata:
  labels:
    test: liveness
  name: liveness-exec
spec:
  containers:
  - name: liveness
    image: k8s.gcr.io/busybox
    args:
    - /bin/sh
    - -c
    - touch /tmp/healthy; sleep 30; rm -rf /tmp/healthy; sleep 600
    livenessProbe:
      exec:
        command:
        - cat
        - /tmp/healthy
      initialDelaySeconds: 5
      periodSeconds: 5
pods/probe/tcp-liveness-readiness.yaml
apiVersion: v1
kind: Pod
metadata:
  name: goproxy
  labels:
    app: goproxy
spec:
  containers:
  - name: goproxy
    image: k8s.gcr.io/goproxy:0.1
    ports:
    - containerPort: 8080
    readinessProbe:
      tcpSocket:
        port: 8080
      initialDelaySeconds: 5
      periodSeconds: 10
    livenessProbe:
      tcpSocket:
        port: 8080
      initialDelaySeconds: 15
      periodSeconds: 20
https://www.youtube.com/watch?v=u7sbDPmezAo
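Besides the exec and TCP probes shown above, an httpGet probe is common; a minimal sketch (the nginx image and the / path are assumptions):

apiVersion: v1
kind: Pod
metadata:
  name: http-probe-demo           # hypothetical name
spec:
  containers:
  - name: web
    image: nginx                  # any container exposing an HTTP endpoint
    ports:
    - containerPort: 80
    livenessProbe:
      httpGet:
        path: /                   # assumed health endpoint
        port: 80
      initialDelaySeconds: 10
      periodSeconds: 10
    readinessProbe:
      httpGet:
        path: /
        port: 80
      initialDelaySeconds: 5
      periodSeconds: 5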
Affinity and Anti Affinity
node-selectors | nodeAffinity | podAffinity and podAntiAffinity
pods/pod-nginx.yaml
apiVersion: v1
kind: Pod
metadata:
  name: nginx
  labels:
    env: test
spec:
  containers:
  - name: nginx
    image: nginx
    imagePullPolicy: IfNotPresent
  nodeSelector:
    disktype: ssd
pods/pod-with-node-affinity.yaml
apiVersion: v1
kind: Pod
metadata:
  name: with-node-affinity
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: kubernetes.io/e2e-az-name
            operator: In
            values:
            - e2e-az1
            - e2e-az2
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 1
        preference:
          matchExpressions:
          - key: another-node-label-key
            operator: In
            values:
            - another-node-label-value
  containers:
  - name: with-node-affinity
    image: k8s.gcr.io/pause:2.0
pods/pod-with-pod-affinity.yaml
apiVersion: v1
kind: Pod
metadata:
  name: with-pod-affinity
spec:
  affinity:
    podAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchExpressions:
          - key: security
            operator: In
            values:
            - S1
        topologyKey: topology.kubernetes.io/zone     # topologyKey is required for (anti-)affinity terms
    podAntiAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 100
        podAffinityTerm:
          labelSelector:
            matchExpressions:
            - key: security
              operator: In
              values:
              - S2
          topologyKey: topology.kubernetes.io/zone   # topologyKey is required for (anti-)affinity terms
  containers:
  - name: with-pod-affinity
    image: k8s.gcr.io/pause:2.0
Pod eviction during resource constraints | Node disk or mem pressures
Not all similar Pods flock together
Affinity and Anti Affinity
naive-dep-login vs. self-relialized-dep-login
01-affinity-antiaffinity1-naive-dep-login.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: login-deployment
  labels:
    app: login
spec:
  replicas: 1
  selector:
    matchLabels:
      app: login
  template:
    metadata:
      labels:
        app: login
    spec:
      containers:
      - name: login
        image: "busybox:1"
        command:
        - sleep
        - "7200"
01-affinity-antiaffinity2-self-relialized-dep-login.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: login-deployment
  labels:
    app: login
spec:
  replicas: 1
  selector:
    matchLabels:
      app: login
  template:
    metadata:
      labels:
        app: login
    spec:
      affinity:
        nodeAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
          - preference:
              matchExpressions:
              - key: color
                operator: In
                values:
                - blue
            weight: 1
      containers:
      - name: login
        image: "busybox:1"
        command:
        - sleep
        - "7200"
Taints and Tolerations
taint | toleration
Node affinity is a property of Pods that attracts them to a set of nodes (either as a preference or a hard requirement). Taints are the opposite -- they allow a node to repel a set of pods.
kubectl taint nodes node1 key=value:NoSchedule
pods/pod-with-toleration.yaml
apiVersion: v1
kind: Pod
metadata:
  name: nginx
  labels:
    env: test
spec:
  containers:
  - name: nginx
    image: nginx
    imagePullPolicy: IfNotPresent
  tolerations:
  - key: "example-key"
    operator: "Exists"
    effect: "NoSchedule"
One or more taints are applied to a node; this marks that the node should not accept any pods that do not tolerate the taints.
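For reference, the taint from the example above can be removed with the same command plus a trailing minus:

kubectl taint nodes node1 key=value:NoSchedule-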
Deleting a Pod! - #ClaGIFied
Actors: kubectl → Kube-api-server → Service controllers or kube-proxy → kubelet in Node → Container Runtime (e.g., Docker) → Containers
1. kubectl delete pod login-abcdf-123adfc
2. Pod is set to the ‘Terminating’ state and is no more considered a valid replica; controllers will start panicking
3. Pod is removed from Endpoints
4. pre-stop hook is triggered and executed
5. kubelet initiates SIGTERM: a SIGTERM signal is sent to each container (kill <process>)
6. 30 secs grace period
7. kubelet initiates SIGKILL: a SIGKILL signal is sent to each container (kill -9 <process>)
8. Pod is removed from the API Server; Pods are garbage collected, removed and cleaned up
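The pre-stop hook and the grace period in this flow are configured on the Pod spec; a minimal sketch (the 60-second grace period and the sleep-based hook are assumptions):

apiVersion: v1
kind: Pod
metadata:
  name: graceful-login                 # hypothetical name
spec:
  terminationGracePeriodSeconds: 60    # overrides the 30-second default
  containers:
  - name: login
    image: "busybox:1"
    command:
    - sleep
    - "7200"
    lifecycle:
      preStop:
        exec:
          command: ["/bin/sh", "-c", "sleep 10"]   # give time to drain in-flight work before SIGTERM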
Draining a Node! - #ClaGIFied
Actors: Kubectl → Kube-api-server → Service controllers or kube-proxy → kubelet in Node → Container Runtime (e.g., Docker) → Containers
1. kubectl drain node1
2. For every node: cordon it – mark it Unschedulable
3. For every Pod on the node: is the PDB met? If not, retry
4. Pod is set to the ‘Terminating’ state and is no more considered a valid replica
5. Pod is removed from Endpoints
6. pre-stop hook is triggered and executed
7. Initiate SIGTERM: a SIGTERM signal is sent to each container (kill <process>)
8. 30 secs grace period
9. Initiate SIGKILL: a SIGKILL signal is sent to each container (kill -9 <process>)
10. Pod is removed from the API Server; Pods are garbage collected, removed and cleaned up
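The drain flow above corresponds to these commands (node1 is a placeholder; on kubectl older than 1.20 the last drain flag is --delete-local-data):

kubectl cordon node1                                              # mark the node unschedulable
kubectl drain node1 --ignore-daemonsets --delete-emptydir-data    # evict pods while honouring PDBs
kubectl uncordon node1                                            # make the node schedulable again after repair/upgrade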
POD DISRUPTION BUDGET
A PDB limits the number of pods of a replicated
application that are down simultaneously from
voluntary disruptions.
How does that work?
Your Deployment
Pod Disruption Budget
PDB
e001/pdb.yaml
apiVersion: policy/v1beta1
kind: PodDisruptionBudget
metadata:
  name: login-budget    # object names must be lowercase DNS-compliant; ‘loginBudget’ would be rejected
spec:
  minAvailable: 1
  selector:
    matchLabels:
      app: login
e001/dep-login.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: login-deployment
  labels:
    app: login
spec:
  replicas: 1
  selector:
    matchLabels:
      app: login
  ...
Admin calls kubectl drain
Programmatically using Eviction API
https://www.youtube.com/watch?v=pNbkZMEDevs
VOLUNTARY DISRUPTIONS
Disclaimer: Not all disruptions are protected by a PDB.
Some examples include:
1. Deleting a deployment directly
2. Deleting a pod directly
How to determine the right value for my PDB?
• There is no single rule for this. A few examples:
1. You are running a Consul cluster on K8S and want to maintain a quorum of at least 3 server components for fault tolerance. In this case, specify the PDB’s minAvailable as 3.
2. You are running a StatefulSet for your database on K8S. Here you can specify a PDB to avoid disruption to that DB; you may need the respective team to take DB backups and confirm before you perform the disruption.
3. For a stateless microservice, you might require a minimum of 1 replica running at all times and set the PDB accordingly, as we saw in the demo earlier.
4. And the list goes on.
So the right PDB setup can differ for every workload you run in your cluster.
https://www.youtube.com/watch?v=pNbkZMEDevs
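As a sketch of example 1 above (the Consul object name and labels are assumptions; use apiVersion policy/v1 on Kubernetes 1.21+):

apiVersion: policy/v1beta1
kind: PodDisruptionBudget
metadata:
  name: consul-server-budget      # hypothetical name
spec:
  minAvailable: 3                 # keep the quorum of 3 server pods during voluntary disruptions
  selector:
    matchLabels:
      app: consul-server          # assumed label on the Consul server pods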
Cool tip on PDB controller
https://github.com/mikkeloscar/pdb-controller/
The controller simply gets all Pod Disruption Budgets for each namespace and compares them to Deployments and StatefulSets. For any resource with more than 1 replica and no matching Pod Disruption Budget, a default PDB will be created.
resources.requests(limits).cpu
Resource Constraints and PriorityClass
PriorityClass – Non-Namespaced object
containers:
- name: login
  image: "busybox:1"
  resources:
    requests:
      memory: "64Mi"
      cpu: "250m"
    limits:
      memory: "128Mi"
      cpu: "500m"
  command:
  - sleep
  - "7200"
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: high-priority
value: 1000000
globalDefault: false
description: "This priority class should be used for High Priority service pods only."
---
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: low-priority
value: 5000
globalDefault: false
description: "This priority class should be used for Low Priority service pods only."
resources.requests(limits).memory

Pod Spec with reference to PriorityClassName:
spec:
  priorityClassName: high-priority

Pod eviction during resource constraints
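Putting both together, a minimal sketch of a Pod that sets resource requests/limits and references the high-priority class defined above (the pod name is an assumption):

apiVersion: v1
kind: Pod
metadata:
  name: login-high-priority            # hypothetical name
spec:
  priorityClassName: high-priority     # references the PriorityClass defined above
  containers:
  - name: login
    image: "busybox:1"
    resources:
      requests:
        memory: "64Mi"
        cpu: "250m"
      limits:
        memory: "128Mi"
        cpu: "500m"
    command:
    - sleep
    - "7200"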
Topology Spread – Hosts/Zones/Regions
failure-domain.beta.kubernetes.io/zone (< 1.17) | topology.kubernetes.io/zone (>= 1.17)
pods/pod-with-pod-affinity.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: topo-emp-deployment
  labels:
    app: emp
spec:
  replicas: 2
  selector:
    matchLabels:
      app: emp
  template:
    metadata:
      labels:
        app: emp
    spec:
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 100
            podAffinityTerm:
              labelSelector:              # added so the rule actually matches the other emp pods
                matchLabels:
                  app: emp
              topologyKey: failure-domain.beta.kubernetes.io/zone
      containers:
      - name: with-pod-affinity
        image: k8s.gcr.io/pause:2.0
topologyKey: kubernetes.io/hostname
topologyKey: failure-domain.beta.kubernetes.io/region (< 1.17) | topology.kubernetes.io/region (>= 1.17)
Not all similar Pods flock together
Pods are HA during zonal or Region Failure
https://cloud.google.com/compute/docs/regions-zones
https://docs.microsoft.com/en-us/azure/availability-zones/az-overview
https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/using-regions-availability-zones.html#concepts-availability-zones
AZs are physically separated by a meaningful distance, many kilometers, from any other AZ, although all are within 100 km (60 miles) of each other.
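On Kubernetes 1.19+ the same spread can also be expressed with topologySpreadConstraints instead of podAntiAffinity; a hedged sketch (the maxSkew and whenUnsatisfiable values are assumptions):

apiVersion: apps/v1
kind: Deployment
metadata:
  name: topo-emp-deployment       # mirrors the example above
  labels:
    app: emp
spec:
  replicas: 2
  selector:
    matchLabels:
      app: emp
  template:
    metadata:
      labels:
        app: emp
    spec:
      topologySpreadConstraints:
      - maxSkew: 1                                  # at most 1 pod of imbalance between zones
        topologyKey: topology.kubernetes.io/zone
        whenUnsatisfiable: ScheduleAnyway           # soft constraint, like ‘preferred’ anti-affinity
        labelSelector:
          matchLabels:
            app: emp
      containers:
      - name: with-pod-affinity
        image: k8s.gcr.io/pause:2.0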
• Cluster admin deletes a pod by mistake
• A hardware failure of the physical machine or Virtual Machine
• Cluster admin deletes a node by mistake
• Pod gets evicted from node due to resource constraints
• Draining a node for repair or upgrade or to scale down
• Application Upgrade

• Choose Right Controller
• Pod Replicas
• Application Upgrade Strategy
• PDB
• Affinity and Anti Affinity / Taints and Tolerations
• Taints and Tolerations
• Topology Spread – Hosts/Zones/Regions
• Resource Constraints and PriorityClass
[QoS matrix: Reliable, Scalable, Available, Secured, Performance mapped across the Application Layer, Cluster Layer and Infrastructure Layer]
Chennai, India
755 02 01 268
Narayanan.kmu@gmail.com
bribebybytes.github.io/landing-page
NARAYANAN
KRISHNAMURTHY
Technical Architect, ADP India
Cloud Architect with 15 Years in IT
CLOUD DEVOPS TELECOM
Skills
Languages
English
Tamil
Hindi
Being Social
Bribe By Bytes
Hobby