SlideShare a Scribd company logo
Service Mesh vs. Frameworks:
Where to put the resilience?
Michael Hofmann
https://hofmann-itconsulting.de
(1) Distributed Systems and Resilience
(2) Framework
(3) Service Mesh
(4) Framework and Service Mesh Characteristics
(5) Thoughts about Resilience
(6) Essential Requirements
(7) Conclusion
Agenda
Distributed Systems
➔ degree of distribution raises failure rate!
➔ compensation strategy: resilience!
slow response
timeout
aborted network connection
...
Typical Communication Errors Fallacies of Distributed Computing
The network is reliable.
Latency is zero.
Bandwidth is infinite.
The network is secure.
Topology doesn't change.
There is one administrator.
Transport cost is zero.
The network is homogeneous.
Hystrix
Alternative: Service Mesh?!
Resilience
Resilience4J
Failsafe
MicroProfile Fault Tolerance
…
Framework „DEATH“ Framework ACTIVE
Resilience Patterns
─
Timeout
─
Retry
─
Fallback
─
Circuit Breaker
─
Bulkhead
many more:
Uwe Friedrichsen: “Patterns of resilience” https://www.slideshare.net/ufried/patterns-of-resilience
@CircuitBreaker(successThreshold = 10,
requestVolumeThreshold = 4, failureRatio=0.5, delay = 1000)
public Connection serviceA() {
Connection conn = null;
counterForInvokingServiceA++;
conn = connectionService();
return conn;
}
MicroProfile Fault Tolerance
@Retry(maxRetries = 3)
@Fallback(fallbackMethod = "doFallback")
public Result doWork() {
return callServiceA(); // fallback on RuntimeException
}
private Result doFallback() {
return ...;
}
Service Mesh
The term service mesh is used to describe the
network of microservices that make up such
application and the interactions between them.
(istio.io)
Don’t manage a Service Mesh without tooling!
Requirements:
(1) manage calls on layer 7 (application layer, L7)
(2) resilience, routing, security and telemetry
(3) decentralized & transparent for services (implementation independent)
Istio Architecture
Resilience Patterns in Istio
✔
Timeout
✔
Retry
✔
CircuitBreaker
✔
Bulkhead
✗
Fallback?
✗
is a Fallback possible?
✗
less technical, more business driven
https://dzone.com/articles/fallbacks-are-overrated-architecting-for-resilienc
Resilience in Istio
$ kubectl apply -f - <<EOF
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
name: reviews
spec:
hosts:
- reviews
http:
- route:
- destination:
host: reviews
subset: v1
retries:
attempts: 3
perTryTimeout: 2s
EOF
$ kubectl apply -f - <<EOF
apiVersion: networking.istio.io/v1alpha3
kind: DestinationRule
metadata:
name: reviews
spec:
host: reviews
trafficPolicy:
connectionPool:
tcp:
maxConnections: 1
http:
http1MaxPendingRequests: 1
maxRequestsPerConnection: 1
outlierDetection:
consecutiveErrors: 1
interval: 1s
baseEjectionTime: 3m
maxEjectionPercent: 100
EOF
Resilience in Istio
Apply to sidecar
Resilience rules
— transparent for service
— act global on all sidecars
Fault Injection
MicroProfile with Istio setting
apiVersion:networking.istio.io/
v1alpha3
kind: VirtualService
metadata:
name: ratings
...
spec:
hosts:
- ratings
http:
- fault:
delay:
fixedDelay: 7s
percent: 100
MP_Fault_Tolerance_NonFallback_Enabled = false
Frameworks Characteristics
—
Java: a lot of different frameworks
—
Team decides framework?!?
—
Learning curve for every framework
—
Different frameworks behave different
—
Same framework in different version behave different
—
Same framework in different versions parallel in use
Frameworks Characteristics
➔ Change of framework:
➔ Replace all positions in code
➔ New behavior
➔ New deployment
➔ New tests
➔ Risk of chain reaction:
framework ➔ load balancing ➔ service registry
➔ Multiple service registries for every different framework?
Service Mesh Characteristics
—
Define new rule
—
Same behavior (… no framework change)
—
unchanged deployed service
—
new tests only for new rules
—
Client-side load balancing in sidecar
—
Service Registry based on endpoints in K8S
$ kubectl apply -f ...
Thoughts about Resilience
Resilience pattern still correct if communication behavior changes?
—
Modified behavior of partner
—
Modified communication partner
—
Modified infrastructure
—
Load changes during day
—
Side effects from other systems
—
Anticipate problems of tomorrow?
Thoughts about Resilience
—
Main problem: choose the right resilience pattern
—
Correct parameters for pattern?
—
Measure resilience
—
Mostly: try & error for suitable pattern/params
(main reason for end of life in hystrix)
—
Often: retry storm
—
Often: missing musketeer principle
(black sheep)
Essential Requirements
—
Modification: Quick and easy change of
(1) params for chosen pattern
(2) resilience pattern
—
Test
—
Monitoring
—
No black sheep
Essential Requirements
Istio Framework
Modification
+ Modify Params
- Change Pattern: Lifecycle
Test Fault Injection complicated
Monitoring + +
Black Sheep
No:
rule in all sidecars
$ kubectl apply -f ...
Conclusion
—
Comparable resilience patterns
—
Missing fallback in service mesh (but overrated)
—
Higher flexibility in service mesh
—
Fault injection easy in service mesh
Solve problems where they arise!
Service Mesh for L4-L7
Developer for L8 (original profession)

More Related Content

PPTX
PPTX
Vertx – reactive toolkit
PPTX
Planning to Fail #phpuk13
PDF
Big Data Streams Architectures. Why? What? How?
PDF
Architecting for failure - Why are distributed systems hard?
PDF
Clipper: A Low-Latency Online Prediction Serving System: Spark Summit East ta...
PPTX
Architecture of Falcon, a new chat messaging backend system build on Scala
PPTX
Designing distributed systems
Vertx – reactive toolkit
Planning to Fail #phpuk13
Big Data Streams Architectures. Why? What? How?
Architecting for failure - Why are distributed systems hard?
Clipper: A Low-Latency Online Prediction Serving System: Spark Summit East ta...
Architecture of Falcon, a new chat messaging backend system build on Scala
Designing distributed systems

Similar to Service Mesh vs. Frameworks: Where to put the resilience? (20)

PDF
Three Degrees of Mediation: Challenges and Lessons in building Cloud-agnostic...
PPTX
NoSQL Introduction, Theory, Implementations
PDF
AI&BigData Lab 2016. Сарапин Виктор: Размер имеет значение: анализ по требова...
PPT
Performance testing virtualized systems v5
PDF
Infinit's Next Generation Key-value Store - Julien Quintard and Quentin Hocqu...
PPTX
Planning to Fail #phpne13
PDF
Design (Cloud systems) for Failures
PDF
Amazon EKS 그리고 Service Mesh (김세호 솔루션즈 아키텍트, AWS) :: Gaming on AWS 2018
PPTX
Stephane Lapointe & Alexandre Brisebois: Développer des microservices avec Se...
PPTX
Microsoft Azure Cloud Basics Tutorial
PPTX
CLUSTER COMPUTING
PPTX
Distributed systems and scalability rules
PDF
170215 msa intro
PPTX
Introduction
PDF
Java Abs Dynamic Server Replication
DOCX
IEEE 2014 JAVA CLOUD COMPUTING PROJECTS Scalable analytics for iaa s cloud av...
DOCX
2014 IEEE JAVA CLOUD COMPUTING PROJECT Scalable analytics for iaas cloud avai...
PDF
Could the “C” in HPC stand for Cloud?
PPT
Fundamentals Of Transaction Systems - Part 1: Causality banishes Acausality ...
PDF
Azure and cloud design patterns
Three Degrees of Mediation: Challenges and Lessons in building Cloud-agnostic...
NoSQL Introduction, Theory, Implementations
AI&BigData Lab 2016. Сарапин Виктор: Размер имеет значение: анализ по требова...
Performance testing virtualized systems v5
Infinit's Next Generation Key-value Store - Julien Quintard and Quentin Hocqu...
Planning to Fail #phpne13
Design (Cloud systems) for Failures
Amazon EKS 그리고 Service Mesh (김세호 솔루션즈 아키텍트, AWS) :: Gaming on AWS 2018
Stephane Lapointe & Alexandre Brisebois: Développer des microservices avec Se...
Microsoft Azure Cloud Basics Tutorial
CLUSTER COMPUTING
Distributed systems and scalability rules
170215 msa intro
Introduction
Java Abs Dynamic Server Replication
IEEE 2014 JAVA CLOUD COMPUTING PROJECTS Scalable analytics for iaa s cloud av...
2014 IEEE JAVA CLOUD COMPUTING PROJECT Scalable analytics for iaas cloud avai...
Could the “C” in HPC stand for Cloud?
Fundamentals Of Transaction Systems - Part 1: Causality banishes Acausality ...
Azure and cloud design patterns
Ad

More from Michael Hofmann (13)

PDF
Service Specific AuthZ In The Cloud Infrastructure
PDF
New Ways To Production - Stress-Free Evolution Of Your Cloud Applications
PDF
Developer Experience Cloud Native - Become Efficient and Achieve Parity
PDF
The Easy Way to Secure Microservices
PDF
Service Mesh vs. Frameworks: Where to put the resilience?
PDF
Developer Experience Cloud Native - From Code Gen to Git Commit without a CI/...
PDF
Servicierung von Monolithen - Der Weg zu neuen Technologien bis hin zum Servi...
PDF
Service Mesh mit Istio und MicroProfile - eine harmonische Kombination?
PDF
Service Mesh - kilometer 30 in a microservice marathon
PDF
Service Mesh - Kilometer 30 im Microservices-Marathon
PDF
API-Economy bei Financial Services – Kein Stein bleibt auf dem anderen
PPTX
Microprofile.io - Cloud Native mit Java EE
PPTX
Microservices mit Java EE - am Beispiel von IBM Liberty
Service Specific AuthZ In The Cloud Infrastructure
New Ways To Production - Stress-Free Evolution Of Your Cloud Applications
Developer Experience Cloud Native - Become Efficient and Achieve Parity
The Easy Way to Secure Microservices
Service Mesh vs. Frameworks: Where to put the resilience?
Developer Experience Cloud Native - From Code Gen to Git Commit without a CI/...
Servicierung von Monolithen - Der Weg zu neuen Technologien bis hin zum Servi...
Service Mesh mit Istio und MicroProfile - eine harmonische Kombination?
Service Mesh - kilometer 30 in a microservice marathon
Service Mesh - Kilometer 30 im Microservices-Marathon
API-Economy bei Financial Services – Kein Stein bleibt auf dem anderen
Microprofile.io - Cloud Native mit Java EE
Microservices mit Java EE - am Beispiel von IBM Liberty
Ad

Recently uploaded (20)

PDF
How to Migrate SBCGlobal Email to Yahoo Easily
PPTX
Agentic AI Use Case- Contract Lifecycle Management (CLM).pptx
PPTX
ai tools demonstartion for schools and inter college
PDF
Internet Downloader Manager (IDM) Crack 6.42 Build 41
PDF
Understanding Forklifts - TECH EHS Solution
PDF
Design an Analysis of Algorithms I-SECS-1021-03
PPTX
CHAPTER 2 - PM Management and IT Context
PDF
How Creative Agencies Leverage Project Management Software.pdf
PPTX
Online Work Permit System for Fast Permit Processing
PDF
Softaken Excel to vCard Converter Software.pdf
PPTX
VVF-Customer-Presentation2025-Ver1.9.pptx
PPTX
Introduction to Artificial Intelligence
PPTX
Agentic AI : A Practical Guide. Undersating, Implementing and Scaling Autono...
PDF
How to Choose the Right IT Partner for Your Business in Malaysia
PDF
top salesforce developer skills in 2025.pdf
PDF
Flood Susceptibility Mapping Using Image-Based 2D-CNN Deep Learnin. Overview ...
PPTX
Transform Your Business with a Software ERP System
PDF
Design an Analysis of Algorithms II-SECS-1021-03
PDF
Internet Downloader Manager (IDM) Crack 6.42 Build 42 Updates Latest 2025
PPTX
Operating system designcfffgfgggggggvggggggggg
How to Migrate SBCGlobal Email to Yahoo Easily
Agentic AI Use Case- Contract Lifecycle Management (CLM).pptx
ai tools demonstartion for schools and inter college
Internet Downloader Manager (IDM) Crack 6.42 Build 41
Understanding Forklifts - TECH EHS Solution
Design an Analysis of Algorithms I-SECS-1021-03
CHAPTER 2 - PM Management and IT Context
How Creative Agencies Leverage Project Management Software.pdf
Online Work Permit System for Fast Permit Processing
Softaken Excel to vCard Converter Software.pdf
VVF-Customer-Presentation2025-Ver1.9.pptx
Introduction to Artificial Intelligence
Agentic AI : A Practical Guide. Undersating, Implementing and Scaling Autono...
How to Choose the Right IT Partner for Your Business in Malaysia
top salesforce developer skills in 2025.pdf
Flood Susceptibility Mapping Using Image-Based 2D-CNN Deep Learnin. Overview ...
Transform Your Business with a Software ERP System
Design an Analysis of Algorithms II-SECS-1021-03
Internet Downloader Manager (IDM) Crack 6.42 Build 42 Updates Latest 2025
Operating system designcfffgfgggggggvggggggggg

Service Mesh vs. Frameworks: Where to put the resilience?

  • 1. Service Mesh vs. Frameworks: Where to put the resilience? Michael Hofmann https://hofmann-itconsulting.de
  • 2. (1) Distributed Systems and Resilience (2) Framework (3) Service Mesh (4) Framework and Service Mesh Characteristics (5) Thoughts about Resilience (6) Essential Requirements (7) Conclusion Agenda
  • 3. Distributed Systems ➔ degree of distribution raises failure rate! ➔ compensation strategy: resilience! slow response timeout aborted network connection ... Typical Communication Errors Fallacies of Distributed Computing The network is reliable. Latency is zero. Bandwidth is infinite. The network is secure. Topology doesn't change. There is one administrator. Transport cost is zero. The network is homogeneous.
  • 4. Hystrix Alternative: Service Mesh?! Resilience Resilience4J Failsafe MicroProfile Fault Tolerance … Framework „DEATH“ Framework ACTIVE
  • 5. Resilience Patterns ─ Timeout ─ Retry ─ Fallback ─ Circuit Breaker ─ Bulkhead many more: Uwe Friedrichsen: “Patterns of resilience” https://www.slideshare.net/ufried/patterns-of-resilience
  • 6. @CircuitBreaker(successThreshold = 10, requestVolumeThreshold = 4, failureRatio=0.5, delay = 1000) public Connection serviceA() { Connection conn = null; counterForInvokingServiceA++; conn = connectionService(); return conn; } MicroProfile Fault Tolerance @Retry(maxRetries = 3) @Fallback(fallbackMethod = "doFallback") public Result doWork() { return callServiceA(); // fallback on RuntimeException } private Result doFallback() { return ...; }
  • 7. Service Mesh The term service mesh is used to describe the network of microservices that make up such application and the interactions between them. (istio.io) Don’t manage a Service Mesh without tooling! Requirements: (1) manage calls on layer 7 (application layer, L7) (2) resilience, routing, security and telemetry (3) decentralized & transparent for services (implementation independent)
  • 9. Resilience Patterns in Istio ✔ Timeout ✔ Retry ✔ CircuitBreaker ✔ Bulkhead ✗ Fallback? ✗ is a Fallback possible? ✗ less technical, more business driven https://dzone.com/articles/fallbacks-are-overrated-architecting-for-resilienc
  • 10. Resilience in Istio $ kubectl apply -f - <<EOF apiVersion: networking.istio.io/v1alpha3 kind: VirtualService metadata: name: reviews spec: hosts: - reviews http: - route: - destination: host: reviews subset: v1 retries: attempts: 3 perTryTimeout: 2s EOF $ kubectl apply -f - <<EOF apiVersion: networking.istio.io/v1alpha3 kind: DestinationRule metadata: name: reviews spec: host: reviews trafficPolicy: connectionPool: tcp: maxConnections: 1 http: http1MaxPendingRequests: 1 maxRequestsPerConnection: 1 outlierDetection: consecutiveErrors: 1 interval: 1s baseEjectionTime: 3m maxEjectionPercent: 100 EOF
  • 11. Resilience in Istio Apply to sidecar Resilience rules — transparent for service — act global on all sidecars Fault Injection MicroProfile with Istio setting apiVersion:networking.istio.io/ v1alpha3 kind: VirtualService metadata: name: ratings ... spec: hosts: - ratings http: - fault: delay: fixedDelay: 7s percent: 100 MP_Fault_Tolerance_NonFallback_Enabled = false
  • 12. Frameworks Characteristics — Java: a lot of different frameworks — Team decides framework?!? — Learning curve for every framework — Different frameworks behave different — Same framework in different version behave different — Same framework in different versions parallel in use
  • 13. Frameworks Characteristics ➔ Change of framework: ➔ Replace all positions in code ➔ New behavior ➔ New deployment ➔ New tests ➔ Risk of chain reaction: framework ➔ load balancing ➔ service registry ➔ Multiple service registries for every different framework?
  • 14. Service Mesh Characteristics — Define new rule — Same behavior (… no framework change) — unchanged deployed service — new tests only for new rules — Client-side load balancing in sidecar — Service Registry based on endpoints in K8S $ kubectl apply -f ...
  • 15. Thoughts about Resilience Resilience pattern still correct if communication behavior changes? — Modified behavior of partner — Modified communication partner — Modified infrastructure — Load changes during day — Side effects from other systems — Anticipate problems of tomorrow?
  • 16. Thoughts about Resilience — Main problem: choose the right resilience pattern — Correct parameters for pattern? — Measure resilience — Mostly: try & error for suitable pattern/params (main reason for end of life in hystrix) — Often: retry storm — Often: missing musketeer principle (black sheep)
  • 17. Essential Requirements — Modification: Quick and easy change of (1) params for chosen pattern (2) resilience pattern — Test — Monitoring — No black sheep
  • 18. Essential Requirements Istio Framework Modification + Modify Params - Change Pattern: Lifecycle Test Fault Injection complicated Monitoring + + Black Sheep No: rule in all sidecars $ kubectl apply -f ...
  • 19. Conclusion — Comparable resilience patterns — Missing fallback in service mesh (but overrated) — Higher flexibility in service mesh — Fault injection easy in service mesh Solve problems where they arise! Service Mesh for L4-L7 Developer for L8 (original profession)