Stability anti-patterns in
cloud-native applications
#devfestRO
@ammbra1508
I am Ana
Solutions Architect @ IBM
Co-founder of Bucharest Software Craftsmanship
Community
HELLO!
What are anti-patterns?
An anti-pattern is a common response to a
recurring problem that is usually ineffective
and risks being highly counterproductive.
Wikipedia
System Stability
Users can continue to work with a system even when disruptions occur:
temporary shocks
continuous load stress
component failures
Application deployments in a specific
order
Examples and symptoms
Start and stop processes must follow a specific order when bringing up applications.
A deployment depends on the previous successful deployment of another application.
A service waits for another component to be available.
kubectl wait --for=condition=ready pod -l app=backend
Why is it bad?
While one deployment waits for another, the application is not fully functional.
When the awaited condition is never met, the next deployment cannot proceed and the process breaks.
Solutions at design level
Concurrently deploy and start all parts of an application.
Use retry patterns.
Use circuit-breaker patterns.
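The retry pattern mentioned above can be sketched as follows: instead of blocking a deployment until a dependency is ready, the caller starts right away and retries with a growing delay. This is a minimal sketch, not the speaker's implementation; `call_with_retry` and `FlakyBackend` are hypothetical names, and the choice of `ConnectionError` as the retryable error is an assumption.

```python
import time


def call_with_retry(operation, attempts=5, base_delay=0.1, sleep=time.sleep):
    """Call operation(), retrying with exponential backoff on ConnectionError."""
    for attempt in range(attempts):
        try:
            return operation()
        except ConnectionError:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the failure to the caller
            sleep(base_delay * 2 ** attempt)  # delay doubles after each failure


class FlakyBackend:
    """Hypothetical dependency that becomes ready only after a few calls."""

    def __init__(self, failures_before_ready):
        self.remaining_failures = failures_before_ready

    def fetch(self):
        if self.remaining_failures > 0:
            self.remaining_failures -= 1
            raise ConnectionError("backend not ready yet")
        return "ok"


# The delays are collected in a list instead of sleeping, to keep the demo fast.
delays = []
backend = FlakyBackend(failures_before_ready=3)
result = call_with_retry(backend.fetch, sleep=delays.append)
```

With this shape both services can be deployed concurrently; the caller simply tolerates the period in which its dependency is not yet available.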
Choose a deployment strategy
Blue/green deployments for instant rollout/rollback
[Figure: blue/green rollout in four steps. Steps 1-2: the load balancer (LB) routes to two v2 pods while two v3 pods run alongside. Steps 3-4: the LB routes to the two v3 pods while the two v2 pods remain.]
Choose a deployment strategy
Canary deployments when the user does the testing
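[Figure: canary rollout in four steps. Steps 1-2: the load balancer (LB) routes to two v3 pods while one v4 pod is started. Step 3: the LB routes to both v3 pods and the v4 pod. Step 4: all three pods behind the LB run v4.]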
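To make the canary idea concrete, the sketch below estimates how much traffic each version receives at steps 3 and 4 of the canary rollout. It assumes the load balancer spreads requests evenly across pods in round-robin fashion, which real load balancers only approximate; `route` is a hypothetical helper.

```python
from collections import Counter
from itertools import cycle, islice


def route(pods, requests):
    """Round-robin the given number of requests over pods, counted per version."""
    return Counter(islice(cycle(pods), requests))


# Step 3: two v3 pods and one v4 canary pod share the traffic.
step3 = route(["v3", "v3", "v4"], 300)
# Step 4: every pod behind the load balancer runs v4.
step4 = route(["v4", "v4", "v4"], 300)
```

Under that assumption the canary pod sees about one third of user requests, which is why the users effectively do the testing.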
Integration communication and
composition
Examples and symptoms
Synchronous call-and-response based system
Queue-based messaging systems
System-to-system messaging via SMTP or SMS
Integration via synchronous communication with software that forces the calling system to wait or stop what it is doing.
Containerizing the middleware as is.
Distorted usage of the declarative deployment pattern.
Why is it bad?
Requests may be sent but not receive a reply.
The provider claims to send a different response format.
Synchronous calls are vicious amplifiers that facilitate blockages.
Tightly coupled middleware amplifies shocks to the system.
Application Level Solutions (1)
Circuit Breaker pattern
✚ Consider delayed retries.
✚ Report, record and correlate state changes.
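A minimal circuit breaker along these lines might look like the sketch below. It is an illustration under assumptions, not the speaker's code and not any library's API: the class name, the consecutive-failure trigger, and the `transitions` list used to record state changes are all hypothetical.

```python
import time


class CircuitOpenError(Exception):
    """Raised when a call is rejected because the circuit is open."""


class CircuitBreaker:
    def __init__(self, max_failures=3, wait_seconds=30.0, clock=time.monotonic):
        self.max_failures = max_failures
        self.wait_seconds = wait_seconds
        self.clock = clock
        self.state = "CLOSED"
        self.failures = 0
        self.opened_at = None
        self.transitions = []  # recorded state changes, for reporting

    def _move_to(self, new_state):
        self.transitions.append((self.state, new_state))
        self.state = new_state

    def call(self, operation):
        if self.state == "OPEN":
            if self.clock() - self.opened_at < self.wait_seconds:
                raise CircuitOpenError("call rejected, circuit is open")
            self._move_to("HALF_OPEN")  # delayed retry: allow a trial call
        try:
            result = operation()
        except Exception:
            self.failures += 1
            if self.state == "HALF_OPEN" or self.failures >= self.max_failures:
                self.opened_at = self.clock()
                self._move_to("OPEN")
            raise
        self.failures = 0
        if self.state == "HALF_OPEN":
            self._move_to("CLOSED")
        return result


def failing():
    raise ConnectionError("provider down")


now = [0.0]  # fake clock so the demo does not have to wait
breaker = CircuitBreaker(max_failures=2, wait_seconds=10.0, clock=lambda: now[0])
for _ in range(2):
    try:
        breaker.call(failing)
    except ConnectionError:
        pass

try:
    breaker.call(lambda: "unused")
    rejected = False
except CircuitOpenError:
    rejected = True

now[0] = 11.0  # the wait has elapsed, so one trial call is let through
recovered = breaker.call(lambda: "fresh data")
```

The `transitions` list is the hook for the "report, record and correlate" advice: each entry could be logged or exported as a metric.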
Application Level Solutions (2)
Configuration example for Spring Boot with Resilience4j, annotated with:
Failure rate threshold
Sliding window
Wait duration
Allowed number of calls in half-open state
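How the four settings interact can be illustrated with a small count-based sliding window. This is a sketch of the general idea only: the field names below are hypothetical and are not Resilience4j property names, and the wait duration and half-open settings are carried in the config but not exercised here.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class BreakerConfig:
    failure_rate_threshold: float = 50.0   # percent of failed calls that opens the circuit
    sliding_window_size: int = 10          # number of most recent calls considered
    wait_duration_seconds: float = 5.0     # time to stay open (unused in this sketch)
    permitted_calls_in_half_open: int = 3  # trial calls when half-open (unused in this sketch)


class SlidingWindow:
    """Count-based window over the outcomes of the most recent calls."""

    def __init__(self, config):
        self.config = config
        self.outcomes = deque(maxlen=config.sliding_window_size)

    def record(self, success):
        self.outcomes.append(success)  # the oldest outcome drops out when full

    def failure_rate(self):
        if not self.outcomes:
            return 0.0
        failed = sum(1 for ok in self.outcomes if not ok)
        return 100.0 * failed / len(self.outcomes)

    def should_open(self):
        # Decide only once the window is full, so that a single early
        # failure does not trip the breaker.
        window_full = len(self.outcomes) == self.config.sliding_window_size
        return window_full and self.failure_rate() >= self.config.failure_rate_threshold


config = BreakerConfig(failure_rate_threshold=50.0, sliding_window_size=4)
window = SlidingWindow(config)
for success in [True, False, True, False]:
    window.record(success)
rate_after_four = window.failure_rate()
open_after_four = window.should_open()

window.record(True)
window.record(True)
rate_after_recovery = window.failure_rate()
open_after_recovery = window.should_open()
```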
Is this enough?
Yes, if your end user is happy that “real” data is not present in the response.
Yes, if your fallback method returns a uniform, satisfactory response.
Transition Solutions (1)
DETERMINE THE OUTAGE PATTERN
BASED ON MAINTENANCE WINDOW
CACHE SOLUTION
Stress testing + load testing
Transition Solutions (2)
Cache hit:
  Check whether the TTL of the first-layer cache has expired.
  NO: Return the response from Redis.
  YES: Go to the second-layer cache, find the entry, and try to call the real method.
    On success: Store the new result with a proper TTL.
    On failure: Extend the existing TTL to put the entry back into the first layer and return the result.
Cache miss:
  Try to call the real method.
    On success: Store the new result with a proper TTL.
    On failure: Return a result covering the miss. Record the failure.
helm3 install my-redis stable/redis
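The cache-hit and cache-miss flow on this slide can be sketched in a few lines. This is an illustration under assumptions rather than the speaker's implementation: plain dictionaries stand in for Redis, and `TwoLayerCache`, `real_method`, and `fallback` are hypothetical names.

```python
import time


class TwoLayerCache:
    def __init__(self, ttl_seconds=60.0, clock=time.monotonic):
        self.ttl_seconds = ttl_seconds
        self.clock = clock
        self.expires_at = {}  # first layer: key -> expiry time
        self.values = {}      # second layer: key -> last known good value
        self.failures = []    # recorded failures, for reporting

    def _store(self, key, value, now):
        self.values[key] = value
        self.expires_at[key] = now + self.ttl_seconds
        return value

    def get(self, key, real_method, fallback=None):
        now = self.clock()
        if key in self.values:                  # cache hit
            if now < self.expires_at[key]:      # TTL not expired: serve as is
                return self.values[key]
            try:
                value = real_method()           # TTL expired: try to refresh
            except Exception:
                # Extend the TTL and serve the entry we already have.
                self.expires_at[key] = now + self.ttl_seconds
                return self.values[key]
            return self._store(key, value, now)
        try:                                    # cache miss
            value = real_method()
        except Exception as error:
            self.failures.append((key, repr(error)))
            return fallback                     # result covering the miss
        return self._store(key, value, now)


def provider_down():
    raise ConnectionError("provider down")


now = [0.0]  # fake clock so the demo does not have to wait
cache = TwoLayerCache(ttl_seconds=60.0, clock=lambda: now[0])

first = cache.get("rates", lambda: "v1")    # miss: real method succeeds
now[0] = 30.0
second = cache.get("rates", provider_down)  # fresh hit: provider is not called
now[0] = 100.0
third = cache.get("rates", provider_down)   # expired, refresh fails: stale value
now[0] = 170.0
fourth = cache.get("rates", lambda: "v2")   # expired, refresh succeeds
missing = cache.get("unknown", provider_down, fallback="n/a")
```

The trade-off is the one the deck raises earlier: during an outage the end user receives older data instead of an error.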
Source: “Inside out” animation by Pixar
https://www.psychologies.co.uk/inside-out-interview-creators
Solutions at Kubernetes level (1)
Circuit breaker: Traefik
Solutions at Kubernetes level (2)
Circuit breaker: Istio, Traefik
Unlimited resources mirage
Examples and symptoms
Why is it bad?
Not setting memory or CPU requests and limits can result in an unlimited number of pods being scheduled on any node.
The container could use all of the available memory on its node, possibly invoking the OOM (out of memory) Killer.
If the namespace in which the container runs has a default memory limit, that default is assigned to the container.
Solutions at Kubernetes level
Set memory and CPU requests below their limits.
Control resource limits via ResourceQuotas and LimitRange in the namespace settings.
Keep the CPU request at 1 core or below and use ReplicaSets to scale it out.
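The first and third recommendations can be checked mechanically. The sketch below validates a container's `resources` block, represented as a plain dictionary; it is a simplified illustration that only understands the `m` CPU suffix and the `Ki`, `Mi`, and `Gi` memory suffixes, and `check_resources` is a hypothetical helper, not a Kubernetes API.

```python
MEMORY_UNITS = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30}


def parse_cpu(quantity):
    """Convert a CPU quantity such as '500m' or '1' to cores."""
    if quantity.endswith("m"):
        return int(quantity[:-1]) / 1000
    return float(quantity)


def parse_memory(quantity):
    """Convert a memory quantity such as '256Mi' to bytes (Ki/Mi/Gi only)."""
    for suffix, factor in MEMORY_UNITS.items():
        if quantity.endswith(suffix):
            return int(quantity[: -len(suffix)]) * factor
    return int(quantity)  # no suffix: plain bytes


def check_resources(resources):
    """Return a list of problems found in a container's resources block."""
    problems = []
    requests = resources.get("requests", {})
    limits = resources.get("limits", {})
    for name, parse in (("cpu", parse_cpu), ("memory", parse_memory)):
        if name not in requests or name not in limits:
            problems.append(f"{name}: request or limit is missing")
        elif parse(requests[name]) > parse(limits[name]):
            problems.append(f"{name}: request exceeds limit")
    if "cpu" in requests and parse_cpu(requests["cpu"]) > 1:
        problems.append("cpu: request above 1 core, consider more replicas")
    return problems


good = {"requests": {"cpu": "500m", "memory": "256Mi"},
        "limits": {"cpu": "1", "memory": "512Mi"}}
bad = {"requests": {"cpu": "2", "memory": "1Gi"},
       "limits": {"cpu": "1", "memory": "512Mi"}}

good_problems = check_resources(good)
bad_problems = check_resources(bad)
```

A check like this could run in a CI step before manifests are applied; enforcing the same rules inside the cluster is what the ResourceQuota and LimitRange settings named above are for.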
Scaling results
Examples and symptoms
[Figure: 2 hosts, 2 instances, 300 threads; 20 hosts, 70 instances, 3000 threads]
Any time you have a “many-to-one” or “many-to-few” relationship.
Amplifying the scaling effects through a “shared resource” or “commons project”.
Dangerously combining horizontal and vertical autoscaling.
Counterproductive usage of the predictable demands and elastic scale patterns.
Why is it bad?
One service can flood another with requests beyond its capacity.
Shared resources are a capacity constraint.
Solutions at design level
Approximate a shared-nothing architecture by reducing the number of callers of the shared resource.
Design for pairs of applications that each act as a failover for the other.
Solutions at Kubernetes level (1)
Provision cluster nodes to have the same resource footprint.
Ensure that the cluster autoscaler pod has enough resources.
To avoid delays in provisioning, over-provision your cluster.
https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/FAQ.md#how-can-i-configure-overprovisioning-with-cluster-autoscaler
Solutions at Kubernetes level (2)
Ensure that every pod has resource requests defined.
Validate that resource requests are close to actual usage.
Install metrics-server and configure custom/external metrics.
Specify PodDisruptionBudget for application pods.
Takeaways
Avoid deploying things in a specific order: applications should not wait because a dependency is not ready.
Consider setting memory and CPU limits to reduce the risk of resource contention, and validate that resource requests are close to actual usage.
Use Kubernetes’s self-healing mechanism, and implement retries and circuit breakers at both the application and Kubernetes level.
Avoid using both HPA and VPA together; consider installing metrics-server and adding custom metrics for horizontal scaling.
Thank YOU!