Better Traffic Control
with Envoy
Mark McBride
1/31/2018
Why Care about Traffic Control
Generalizations: Kubernetes leads to a bunch of good things.
• Creating new services is easier.
• Deploying new service versions is easier.
• Deploying smaller services is easier.
Why Care about Traffic Control
But the good things aren’t free.
• New code needs to be (safely!) integrated with your request flow.
• Additional abstractions have < 100% reliability.
• Longer call chains introduce more chances for failure.
Goals of Traffic Control
• Resilience
• Distributed systems are never “up” [1]. Dealing with failures should be
straightforward.
• Routing
• Introducing new code to the call chain is a common operation. It
should be straightforward.
[1] Charity Majors, https://opensource.com/article/17/7/state-systems-administration
The Setup
• Create scenarios using augmented Envoy examples
• Use wrk to drive load against the system and measure results
• Curl, because no demo is complete without some curl
• A preview of envoy-tools to observe Envoy stats directly
Control Requires Visibility
• Making unobservable changes is not advised.
• Envoy comes with great tools out of the box.
• Stats on listeners, clusters, protocols, and more.
• An admin server for direct observation and control.
• envoy-tools (coming soon!) – a repository of tools that provide a more
approachable interface.
The Examples
Adding Reality to Examples
• Add configurable latency and success rate
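A minimal sketch of what "configurable latency and success rate" could look like for a demo backend. The class name `FlakyService`, its fields, and the 200/503 status codes are assumptions for illustration, not the code used in the talk.

```python
import random
import time
from dataclasses import dataclass


@dataclass
class FlakyService:
    """Hypothetical demo backend with tunable delay and failure rate."""

    success_rate: float = 0.9  # probability that a call returns 200
    latency_s: float = 0.0     # artificial delay added to every call

    def handle(self, rng: random.Random) -> int:
        # Delay first, then decide success or failure at random.
        time.sleep(self.latency_s)
        return 200 if rng.random() < self.success_rate else 503
```

Passing the random generator in makes a run repeatable, which helps when comparing results before and after a config change.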
Adding Reality to Examples
Retries
Envoy supports retry policies attached to
routes
• Select error codes to retry on.
• Configure timeouts for each retry.
• Configure number of retries.
• Configure number of retries.
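The three knobs above can be sketched as a small retry loop. This is an illustration of the idea only: `RetryPolicy`, its field names, and `call_with_retries` are hypothetical and do not mirror Envoy's configuration schema or implementation.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet


@dataclass(frozen=True)
class RetryPolicy:
    """Hypothetical policy object; field names are illustrative."""

    retry_on: FrozenSet[int] = frozenset({502, 503, 504})  # codes worth retrying
    num_retries: int = 3                                   # retries after the first try
    per_try_timeout_s: float = 0.25                        # budget handed to each attempt


def call_with_retries(attempt: Callable[[float], int], policy: RetryPolicy) -> int:
    """Call `attempt` until it returns a non-retriable status or retries run out."""
    status = attempt(policy.per_try_timeout_s)
    for _ in range(policy.num_retries):
        if status not in policy.retry_on:
            break
        status = attempt(policy.per_try_timeout_s)
    return status
```

With this shape the client only sees the final status, so upstream failures that succeed on a later attempt never reach it.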
Retries
• No failures!
Retries—a Closer Look
Failures when calling service1
No failures returned to client
Safe Retries
• Usually you don’t want to retry all
requests.
• Side effects are important to consider.
• Atomicity is important to consider.
• Computational expense is important to
consider.
• Add more routes, and configure retries
accordingly.
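One way to picture "more routes, with retries configured accordingly" is an ordered route table where the first match decides the retry budget. The `Route` type, the paths, and the retry counts below are made up for illustration.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Route:
    """Hypothetical route entry; not Envoy's route schema."""

    prefix: str
    method: Optional[str]  # None matches any HTTP method
    num_retries: int


ROUTES = (
    Route(prefix="/", method="POST", num_retries=0),        # side effects: never retry
    Route(prefix="/reports", method="GET", num_retries=1),  # expensive: retry once
    Route(prefix="/", method=None, num_retries=3),          # everything else
)


def retries_for(method: str, path: str, routes: Sequence[Route] = ROUTES) -> int:
    """Return the retry budget of the first route matching method and path."""
    for route in routes:
        if path.startswith(route.prefix) and route.method in (None, method):
            return route.num_retries
    return 0  # no route matched: do not retry
```

Order matters here: the specific, restrictive routes come first and the catch-all comes last.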
Load Shedding
• Sometimes you get more traffic than you can handle.
• Envoy supports request limits on a per-cluster basis.
• Envoy also supports two priority groups, allowing you to save slots for
important traffic.
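A rough sketch of per-cluster request limits with separate slots per priority group. `ClusterLimits` and the priority names "default" and "high" are assumptions for this example; the point is that a full pool rejects immediately instead of queueing.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class ClusterLimits:
    """Hypothetical per-cluster limiter with one slot pool per priority."""

    max_requests: Dict[str, int]
    in_flight: Dict[str, int] = field(default_factory=dict)

    def try_acquire(self, priority: str) -> bool:
        used = self.in_flight.get(priority, 0)
        if used >= self.max_requests[priority]:
            return False  # pool is full: shed the request right away
        self.in_flight[priority] = used + 1
        return True

    def release(self, priority: str) -> None:
        # Call once for every successful try_acquire when the request ends.
        self.in_flight[priority] -= 1
```

Because each priority has its own pool, a flood of default-priority traffic cannot use up the slots reserved for important requests.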
Without Circuit Breakers
Failures are fine, but p99 latency is slowwwwww
as requests just back up
Also, POST requests are totally offline because
we’re swamped with GETs
Without Circuit Breakers
Retries overflow, which is slow
With Circuit Breakers
Gobs of failures, but p99 latency is still good.
Also, POST requests are available.
Also, we told clients to back off with the
x-envoy-overloaded response header.
With Circuit Breakers
Pending requests overflow, which is fast!
An Overview of Routing
• Endpoint metadata for richer routing primitives
• Probabilistic distribution of traffic across multiple clusters
• 1% of traffic to my-great-rewrite, 99% to legacy
• 1% of traffic to v2 of my service, 99% to v1
• Header-based routing to cluster subsets
• If “x-canary” is set, route to endpoints with a version label of v2
• Priority routing, which we saw in the circuit breaking example
• Zone-aware routing
Traffic Shifting the Hard Way
• Multiple clusters
• Multiple routes
Header-based Canary
• When we specify the canary header, the route matches and we (and
only we) are routed to service1a
• When the header is not present, the route doesn’t match and we go on to
the next route, sending traffic to service1
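The two cases above amount to a first-match decision on one header. A small sketch, assuming the canary header is named `x-canary` (the name used on the routing overview slide); the function name is hypothetical.

```python
from typing import Mapping

CANARY_HEADER = "x-canary"  # assumed header name


def pick_cluster_by_header(headers: Mapping[str, str]) -> str:
    """Send requests carrying the canary header to service1a, the rest to service1."""
    # HTTP header names are case-insensitive, so compare in lower case.
    if any(name.lower() == CANARY_HEADER for name in headers):
        return "service1a"  # canary route matched
    return "service1"       # fell through to the next route
```

Only callers who set the header see the new code, so a bad canary affects nobody else.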
Probabilistic Rollout
• With the runtime match, we choose this route 25% of the time,
sending 25% of our traffic to service1a
Traffic Shifted
25% of traffic to service1a
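The percentage split can be sketched as a random draw per request. This illustrates the idea rather than Envoy's runtime mechanism; `pick_cluster_by_percent` and its parameters are hypothetical.

```python
import random


def pick_cluster_by_percent(rng: random.Random, canary_percent: int = 25) -> str:
    """Route roughly `canary_percent` of requests to service1a, the rest to service1."""
    # randrange(100) is uniform over 0..99, so `< 25` holds for 25 of 100 values.
    if rng.randrange(100) < canary_percent:
        return "service1a"
    return "service1"
```

The split is only approximate over a short window; it converges on the configured percentage as request volume grows.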
The Easy Way
• Restarting servers on every config change is tedious in this demo.
• It’s even more tedious in production.
• Envoy provides a better way: the xDS APIs.
xDS APIs
• CDS - discover clusters, which are logical groupings of endpoints.
• A cluster definition can have a reference to an EDS endpoint
• EDS - discover endpoints for a cluster.
• LDS - discover listeners for an Envoy
• A listener’s filter chain can have a reference to an RDS endpoint
• RDS - discover routes for a filter chain
Dynamic Config
• The xDS APIs give you a central point-of-control to manage a fleet of
Envoys
• Bridge service discovery (e.g. from Kubernetes) to Envoy
• Bridge routing config (e.g. from Houston) to Envoy
Advanced Routing with EDS
• CDS (cluster discovery service) defines groups of endpoints.
• EDS (endpoint discovery service) discovers the actual endpoints for
clusters.
• EDS allows you to attach metadata to an endpoint.
• Our multi-cluster example can be collapsed to a metadata-based
approach on a single cluster.
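A sketch of the metadata-based approach: one cluster, with each endpoint carrying labels, and a route that selects the subset whose labels match. The `Endpoint` type, the addresses, and the `version` label values are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import List, Mapping, Sequence


@dataclass
class Endpoint:
    """Hypothetical endpoint record with attached metadata labels."""

    address: str
    metadata: Mapping[str, str]


def select_subset(endpoints: Sequence[Endpoint], wanted: Mapping[str, str]) -> List[Endpoint]:
    """Keep endpoints whose metadata contains every wanted key/value pair."""
    return [
        e for e in endpoints
        if all(e.metadata.get(key) == value for key, value in wanted.items())
    ]


# One cluster holding both versions, instead of one cluster per version.
SERVICE1 = [
    Endpoint("10.0.0.1:8080", {"version": "v1"}),
    Endpoint("10.0.0.2:8080", {"version": "v1"}),
    Endpoint("10.0.0.3:8080", {"version": "v2"}),
]
```

For example, `select_subset(SERVICE1, {"version": "v2"})` returns only the third endpoint, while an empty selector returns the whole cluster.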
Even Easier with Houston
• A CDS/EDS server with integrations to EC2, ECS, Kubernetes,
Consul, DC/OS, or JSON files
• An LDS/RDS server with an intuitive route configuration UI
• Stats parsing, forwarding, and change tracking
Questions/Contact
Mark McBride
mark@turbinelabs.io
Twitter - @mccv
http://www.turbinelabs.io
