Chennai, India
755 02 01 268
Narayanan.kmu@gmail.com
bribebybytes.github.io/landing-page
NARAYANAN
KRISHNAMURTHY
Technical Architect, ADP India
Cloud Architect with 15 Years in IT
CLOUD DEVOPS TELECOM
Skills
Languages
English
Tamil
Hindi
Being Social
Bribe By Bytes
Hobby
Production Grade Kubernetes Applications
Have some Bad News & Good News!
Just because you containerized, Kubernetized, and Cloudified your
application doesn’t mean it’s automatically Reliable, Scalable, and Secure
Bad News First!
• Your Hardware will fail
• Your enterprise-grade application will fail
• Your Cloud will fail
• Your Kubernetes cluster will fail
Embrace it!
[QoS matrix: Reliable, Scalable, Available, Secured, Performance mapped across the Application Layer, Cluster Layer and Infrastructure Layer]
INVOLUNTARY DISRUPTIONS vs. VOLUNTARY DISRUPTIONS
[Cluster diagram: Master node; Worker 1 and Worker 2 running the login and Emp pods]
INVOLUNTARY DISRUPTIONS
CLUSTER ADMIN DELETES A POD BY MISTAKE ☹
INVOLUNTARY DISRUPTIONS
A HARDWARE FAILURE OF THE PHYSICAL MACHINE or VM
INVOLUNTARY DISRUPTIONS
CLUSTER ADMIN DELETES A NODE BY MISTAKE ☹
INVOLUNTARY DISRUPTIONS
POD GETS EVICTED FROM NODE DUE TO RESOURCE CONSTRAINTS
[Diagram: evicted pods come back as New Pods scheduled on the other worker]
VOLUNTARY DISRUPTIONS
DRAINING A NODE FOR REPAIR OR UPGRADE OR TO SCALE DOWN
[Diagram sequence: pods from the drained worker are rescheduled on the other worker; pods that cannot be placed wait in the PENDING QUEUE!]
• Cluster admin deletes a pod by mistake
• A hardware failure of the physical machine or Virtual Machine
• Cluster admin deletes a node by mistake
• Pod gets evicted from node due to resource constraints
• Draining a node for repair or upgrade or to scale down
• Application Upgrade
Good News: Solutions!
https://www.plectica.com/maps/I7WZTGITU/edit/RAKHSLAXT
Choose Right Controller/Storage Req
Pod Replicas
Application Upgrade Strategy
https://www.youtube.com/watch?v=c7ytxiddImw

Controllers: deployment | daemonset | statefulset | job
Storage: ephemeral | persistent
https://www.youtube.com/watch?v=GQJP9QdHHs8

deployment.spec.replicas
spec:
  replicas: 1

deployment.spec.strategy: Recreate – deletes all | RollingUpdate – one pod upgrade at a time
statefulset.spec.updateStrategy: RollingUpdate | OnDelete – only on Delete | Partition (canary)
daemonset.spec.updateStrategy: RollingUpdate | OnDelete
https://www.youtube.com/watch?v=c7ytxiddImw

Pod eviction during resource constraints | Node disk or mem pressures
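As a minimal sketch of the Deployment knobs above (the replica count and the rolling-update parameters are assumptions, not values from the talk):

apiVersion: apps/v1
kind: Deployment
metadata:
  name: login-deployment          # hypothetical, mirrors the demo app
  labels:
    app: login
spec:
  replicas: 3                     # deployment.spec.replicas
  strategy:                       # deployment.spec.strategy
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 1           # at most one pod down during an upgrade
      maxSurge: 1                 # at most one extra pod created during an upgrade
  selector:
    matchLabels:
      app: login
  template:
    metadata:
      labels:
        app: login
    spec:
      containers:
      - name: login
        image: "busybox:1"
        command:
        - sleep
        - "7200"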
Liveness and Readiness Probes
liveness | readiness
pods/probe/exec-liveness.yaml
apiVersion: v1
kind: Pod
metadata:
  labels:
    test: liveness
  name: liveness-exec
spec:
  containers:
  - name: liveness
    image: k8s.gcr.io/busybox
    args:
    - /bin/sh
    - -c
    - touch /tmp/healthy; sleep 30; rm -rf /tmp/healthy; sleep 600
    livenessProbe:
      exec:
        command:
        - cat
        - /tmp/healthy
      initialDelaySeconds: 5
      periodSeconds: 5
pods/probe/tcp-liveness-readiness.yaml
apiVersion: v1
kind: Pod
metadata:
  name: goproxy
  labels:
    app: goproxy
spec:
  containers:
  - name: goproxy
    image: k8s.gcr.io/goproxy:0.1
    ports:
    - containerPort: 8080
    readinessProbe:
      tcpSocket:
        port: 8080
      initialDelaySeconds: 5
      periodSeconds: 10
    livenessProbe:
      tcpSocket:
        port: 8080
      initialDelaySeconds: 15
      periodSeconds: 20
https://www.youtube.com/watch?v=u7sbDPmezAo
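Besides the exec and TCP probes shown above, an httpGet probe is common; a minimal sketch (the nginx image and the / path are assumptions):

apiVersion: v1
kind: Pod
metadata:
  name: http-probe-demo           # hypothetical name
spec:
  containers:
  - name: web
    image: nginx                  # any container exposing an HTTP endpoint
    ports:
    - containerPort: 80
    livenessProbe:
      httpGet:
        path: /                   # assumed health endpoint
        port: 80
      initialDelaySeconds: 10
      periodSeconds: 10
    readinessProbe:
      httpGet:
        path: /
        port: 80
      initialDelaySeconds: 5
      periodSeconds: 5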
Affinity and Anti Affinity
node-selectors | nodeAffinity | podAffinity and podAntiAffinity
pods/pod-nginx.yaml
apiVersion: v1
kind: Pod
metadata:
  name: nginx
  labels:
    env: test
spec:
  containers:
  - name: nginx
    image: nginx
    imagePullPolicy: IfNotPresent
  nodeSelector:
    disktype: ssd
pods/pod-with-node-affinity.yaml
apiVersion: v1
kind: Pod
metadata:
  name: with-node-affinity
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: kubernetes.io/e2e-az-name
            operator: In
            values:
            - e2e-az1
            - e2e-az2
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 1
        preference:
          matchExpressions:
          - key: another-node-label-key
            operator: In
            values:
            - another-node-label-value
  containers:
  - name: with-node-affinity
    image: k8s.gcr.io/pause:2.0
pods/pod-with-pod-affinity.yaml
apiVersion: v1
kind: Pod
metadata:
  name: with-pod-affinity
spec:
  affinity:
    podAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchExpressions:
          - key: security
            operator: In
            values:
            - S1
        topologyKey: topology.kubernetes.io/zone     # topologyKey is required for (anti-)affinity terms
    podAntiAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 100
        podAffinityTerm:
          labelSelector:
            matchExpressions:
            - key: security
              operator: In
              values:
              - S2
          topologyKey: topology.kubernetes.io/zone   # topologyKey is required for (anti-)affinity terms
  containers:
  - name: with-pod-affinity
    image: k8s.gcr.io/pause:2.0
Pod eviction during resource constraints | Node disk or mem pressures
Not all similar Pods flock together
Affinity and Anti Affinity
naive-dep-login vs. self-relialized-dep-login
01-affinity-antiaffinity1-naive-dep-login.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: login-deployment
  labels:
    app: login
spec:
  replicas: 1
  selector:
    matchLabels:
      app: login
  template:
    metadata:
      labels:
        app: login
    spec:
      containers:
      - name: login
        image: "busybox:1"
        command:
        - sleep
        - "7200"
01-affinity-antiaffinity2-self-relialized-dep-login.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: login-deployment
  labels:
    app: login
spec:
  replicas: 1
  selector:
    matchLabels:
      app: login
  template:
    metadata:
      labels:
        app: login
    spec:
      affinity:
        nodeAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
          - preference:
              matchExpressions:
              - key: color
                operator: In
                values:
                - blue
            weight: 1
      containers:
      - name: login
        image: "busybox:1"
        command:
        - sleep
        - "7200"
Taints and Tolerations
taint | toleration
Node affinity is a property of Pods that attracts them to a set of nodes (either as a preference or a hard requirement). Taints are the opposite -- they allow a node to repel a set of pods.
kubectl taint nodes node1 key=value:NoSchedule
pods/pod-with-toleration.yaml
apiVersion: v1
kind: Pod
metadata:
  name: nginx
  labels:
    env: test
spec:
  containers:
  - name: nginx
    image: nginx
    imagePullPolicy: IfNotPresent
  tolerations:
  - key: "example-key"
    operator: "Exists"
    effect: "NoSchedule"
One or more taints are applied to a node; this marks that the node should not accept any pods that do not tolerate the taints.
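For reference, the taint from the example above can be removed with the same command plus a trailing minus:

kubectl taint nodes node1 key=value:NoSchedule-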
Deleting a Pod! - #ClaGIFied
Actors: kubectl → Kube-api-server → Service controllers or kube-proxy → kubelet in Node → Container Runtime (e.g., Docker) → Containers
1. kubectl delete pod login-abcdf-123adfc
2. Pod is set to the ‘Terminating’ state and is no more considered a valid replica; controllers will start panicking
3. Pod is removed from Endpoints
4. pre-stop hook is triggered and executed
5. kubelet initiates SIGTERM: a SIGTERM signal is sent to each container (kill <process>)
6. 30 secs grace period
7. kubelet initiates SIGKILL: a SIGKILL signal is sent to each container (kill -9 <process>)
8. Pod is removed from the API Server; Pods are garbage collected, removed and cleaned up
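The pre-stop hook and the grace period in this flow are configured on the Pod spec; a minimal sketch (the 60-second grace period and the sleep-based hook are assumptions):

apiVersion: v1
kind: Pod
metadata:
  name: graceful-login                 # hypothetical name
spec:
  terminationGracePeriodSeconds: 60    # overrides the 30-second default
  containers:
  - name: login
    image: "busybox:1"
    command:
    - sleep
    - "7200"
    lifecycle:
      preStop:
        exec:
          command: ["/bin/sh", "-c", "sleep 10"]   # give time to drain in-flight work before SIGTERM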
Draining a Node! - #ClaGIFied
Actors: Kubectl → Kube-api-server → Service controllers or kube-proxy → kubelet in Node → Container Runtime (e.g., Docker) → Containers
1. kubectl drain node1
2. For every node: cordon it – mark it Unschedulable
3. For every Pod on the node: is the PDB met? If not, retry
4. Pod is set to the ‘Terminating’ state and is no more considered a valid replica
5. Pod is removed from Endpoints
6. pre-stop hook is triggered and executed
7. Initiate SIGTERM: a SIGTERM signal is sent to each container (kill <process>)
8. 30 secs grace period
9. Initiate SIGKILL: a SIGKILL signal is sent to each container (kill -9 <process>)
10. Pod is removed from the API Server; Pods are garbage collected, removed and cleaned up
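The drain flow above corresponds to these commands (node1 is a placeholder; on kubectl older than 1.20 the last drain flag is --delete-local-data):

kubectl cordon node1                                              # mark the node unschedulable
kubectl drain node1 --ignore-daemonsets --delete-emptydir-data    # evict pods while honouring PDBs
kubectl uncordon node1                                            # make the node schedulable again after repair/upgrade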
POD DISRUPTION BUDGET
A PDB limits the number of pods of a replicated
application that are down simultaneously from
voluntary disruptions.
How does that work?
Your Deployment
Pod Disruption Budget
PDB
e001/pdb.yaml
apiVersion: policy/v1beta1
kind: PodDisruptionBudget
metadata:
  name: login-budget    # object names must be lowercase DNS-compliant; ‘loginBudget’ would be rejected
spec:
  minAvailable: 1
  selector:
    matchLabels:
      app: login
e001/dep-login.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: login-deployment
  labels:
    app: login
spec:
  replicas: 1
  selector:
    matchLabels:
      app: login
  ...
Admin calls kubectl drain
Programmatically using Eviction API
https://www.youtube.com/watch?v=pNbkZMEDevs
VOLUNTARY DISRUPTIONS
Disclaimer: Not all disruptions are protected by a PDB.
Some examples include:
1. Deleting a deployment directly
2. Deleting a pod directly
How to determine the right value for my PDB?
• There is no single rule for this. A few examples:
1. You are running a Consul cluster on K8S and want to maintain a quorum of at least 3 server components for fault tolerance. In this case, specify the PDB’s minAvailable as 3.
2. You are running a StatefulSet for your database on K8S. Here you can specify a PDB to avoid disruption to that DB; you may need the respective team to take DB backups and confirm before you perform the disruption.
3. For a stateless microservice, you might require a minimum of 1 replica running at all times and set the PDB accordingly, as we saw in the demo earlier.
4. And the list goes on.
So the right PDB setup can differ for every workload you run in your cluster.
https://www.youtube.com/watch?v=pNbkZMEDevs
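As a sketch of example 1 above (the Consul object name and labels are assumptions; use apiVersion policy/v1 on Kubernetes 1.21+):

apiVersion: policy/v1beta1
kind: PodDisruptionBudget
metadata:
  name: consul-server-budget      # hypothetical name
spec:
  minAvailable: 3                 # keep the quorum of 3 server pods during voluntary disruptions
  selector:
    matchLabels:
      app: consul-server          # assumed label on the Consul server pods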
Cool tip on PDB controller
https://github.com/mikkeloscar/pdb-controller/
The controller simply gets all Pod Disruption Budgets for each namespace and compares them to Deployments and StatefulSets. For any resource with more than 1 replica and no matching Pod Disruption Budget, a default PDB will be created.
resources.requests(limits).cpu
Resource Constraints and PriorityClass
PriorityClass – Non-Namespaced object
containers:
- name: login
  image: "busybox:1"
  resources:
    requests:
      memory: "64Mi"
      cpu: "250m"
    limits:
      memory: "128Mi"
      cpu: "500m"
  command:
  - sleep
  - "7200"
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: high-priority
value: 1000000
globalDefault: false
description: "This priority class should be used for High Priority service pods only."
---
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: low-priority
value: 5000
globalDefault: false
description: "This priority class should be used for Low Priority service pods only."
resources.requests(limits).memory

Pod Spec with reference to PriorityClassName:
spec:
  priorityClassName: high-priority

Pod eviction during resource constraints
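Putting both together, a minimal sketch of a Pod that sets resource requests/limits and references the high-priority class defined above (the pod name is an assumption):

apiVersion: v1
kind: Pod
metadata:
  name: login-high-priority            # hypothetical name
spec:
  priorityClassName: high-priority     # references the PriorityClass defined above
  containers:
  - name: login
    image: "busybox:1"
    resources:
      requests:
        memory: "64Mi"
        cpu: "250m"
      limits:
        memory: "128Mi"
        cpu: "500m"
    command:
    - sleep
    - "7200"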
Topology Spread – Hosts/Zones/Regions
failure-domain.beta.kubernetes.io/zone (< 1.17) | topology.kubernetes.io/zone (>= 1.17)
pods/pod-with-pod-affinity.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: topo-emp-deployment
  labels:
    app: emp
spec:
  replicas: 2
  selector:
    matchLabels:
      app: emp
  template:
    metadata:
      labels:
        app: emp
    spec:
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 100
            podAffinityTerm:
              labelSelector:              # added so the rule actually matches the other emp pods
                matchLabels:
                  app: emp
              topologyKey: failure-domain.beta.kubernetes.io/zone
      containers:
      - name: with-pod-affinity
        image: k8s.gcr.io/pause:2.0
topologyKey: kubernetes.io/hostname
topologyKey: failure-domain.beta.kubernetes.io/region (< 1.17) | topology.kubernetes.io/region (>= 1.17)
Not all similar Pods flock together
Pods are HA during zonal or Region Failure
https://cloud.google.com/compute/docs/regions-zones
https://docs.microsoft.com/en-us/azure/availability-zones/az-overview
https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/using-regions-availability-zones.html#concepts-availability-zones
AZs are physically separated by a meaningful distance, many kilometers, from any other AZ, although all are within 100 km (60 miles) of each other.
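On Kubernetes 1.19+ the same spread can also be expressed with topologySpreadConstraints instead of podAntiAffinity; a hedged sketch (the maxSkew and whenUnsatisfiable values are assumptions):

apiVersion: apps/v1
kind: Deployment
metadata:
  name: topo-emp-deployment       # mirrors the example above
  labels:
    app: emp
spec:
  replicas: 2
  selector:
    matchLabels:
      app: emp
  template:
    metadata:
      labels:
        app: emp
    spec:
      topologySpreadConstraints:
      - maxSkew: 1                                  # at most 1 pod of imbalance between zones
        topologyKey: topology.kubernetes.io/zone
        whenUnsatisfiable: ScheduleAnyway           # soft constraint, like ‘preferred’ anti-affinity
        labelSelector:
          matchLabels:
            app: emp
      containers:
      - name: with-pod-affinity
        image: k8s.gcr.io/pause:2.0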
• Cluster admin deletes a pod by mistake
• A hardware failure of the physical machine or Virtual Machine
• Cluster admin deletes a node by mistake
• Pod gets evicted from node due to resource constraints
• Draining a node for repair or upgrade or to scale down
• Application Upgrade

• Choose Right Controller
• Pod Replicas
• Application Upgrade Strategy
• PDB
• Affinity and Anti Affinity / Taints and Tolerations
• Taints and Tolerations
• Topology Spread – Hosts/Zones/Regions
• Resource Constraints and PriorityClass
[QoS matrix: Reliable, Scalable, Available, Secured, Performance mapped across the Application Layer, Cluster Layer and Infrastructure Layer]
Chennai, India
755 02 01 268
Narayanan.kmu@gmail.com
bribebybytes.github.io/landing-page
NARAYANAN
KRISHNAMURTHY
Technical Architect, ADP India
Cloud Architect with 15 Years in IT
CLOUD DEVOPS TELECOM
Skills
Languages
English
Tamil
Hindi
Being Social
Bribe By Bytes
Hobby