Pierre Souchay
Twitter: @vizionr
Github: pierresouchay
Building a Service
Mesh at Criteo with
Consul and
HAProxy
Leading Auth/Discovery/Sec Team @Criteo
Dealing with 270k+ services, 50k Consul nodes in 12 DCs
1st external contributor to Consul
Author of consul-templaterb
Menu
• Starters
    • Consul @Criteo
• Main
    • Why Infrastructure Service Mesh?
• Cheeses (many of them)
    • How we built HAProxy-Connect
• Sweets
    • Features
Consul History at Criteo
When, Why, How?
More servers every year
DC: 12 (9 prod)
Servers: 50k+
Services: 4k+
Instances: 270k+
HTTP req/s: 7M+
BigData: 180+ PB
Kafka msg/s: 10M+
Criteo is a major Advertiser
2013 - Hosts
• Front-Facing LB
• Web-Apps (few 100's)
• Micro-Services (few 100's)
• Many backends (few 100's)
HAProxyConf 2019: Building a Service Mesh at Criteo with Consul and HAProxy
2015 - Mesos
• Containers
• Frequent changes
• Many services/machine
• Different Provisioning
Sounds almost good enough but...
1. Provisioning time is an issue globally (F5)
2. Database polling shows its limits
3. Services both in containers and machines?
4. More latency introduced by new Load-Balancers
Time to move on
What is Consul?
• OSS distributed discovery system
• Each agent declares its services
• IP/PORT + health + TAGS + META
• Anyone can discover all services
• Notified when services change
• Provides DNS!
• ...
• Fault tolerance
• Multi DC/Clouds / Legacy compatible
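
As an illustration of the "Provides DNS!" point, here is a minimal sketch of how the DNS name of a service is composed in Consul's standard lookup format, `[tag.]<service>.service[.<datacenter>].<domain>`. The service name `service1`, the tag `http` and the datacenter `dc1` are placeholders chosen for this example, not values from the talk.

```python
def consul_dns_name(service, tag=None, datacenter=None, domain="consul"):
    """Build the DNS name under which Consul exposes a service.

    Format: [tag.]<service>.service[.<datacenter>].<domain>
    """
    labels = []
    if tag:
        labels.append(tag)          # optional tag filter comes first
    labels += [service, "service"]  # fixed "service" label follows the name
    if datacenter:
        labels.append(datacenter)   # optional: query another datacenter
    labels.append(domain)           # "consul" unless the domain was changed
    return ".".join(labels)


if __name__ == "__main__":
    # Placeholder names, for illustration only.
    print(consul_dns_name("service1"))
    print(consul_dns_name("service1", tag="http", datacenter="dc1"))
```

Any resolver pointed at a Consul agent can then look up healthy instances with such a name, which is what makes legacy applications compatible without code changes.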
Step 1: Register * in Consul
Deployer
✓ Add Health-Checks
✓ Add Tags
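
To make Step 1 concrete, the sketch below builds the kind of JSON body a deployer could send to the local Consul agent (`PUT /v1/agent/service/register`) to declare a service with tags, metadata and an HTTP health check. The helper name `build_registration`, the `/health` path, the intervals and all sample values are assumptions for illustration; the talk does not show Criteo's actual deployer.

```python
import json


def build_registration(name, address, port, tags, meta, health_path="/health"):
    """Build a service definition for Consul's agent registration endpoint.

    The health path and check timings are illustrative assumptions.
    """
    return {
        "Name": name,
        "ID": f"{name}-{address}-{port}",  # unique per instance on the agent
        "Address": address,
        "Port": port,
        "Tags": list(tags),
        "Meta": dict(meta),
        "Check": {
            "HTTP": f"http://{address}:{port}{health_path}",
            "Interval": "10s",
            "Timeout": "2s",
        },
    }


if __name__ == "__main__":
    payload = build_registration(
        "service1", "10.0.0.100", 8080,
        tags=["http"], meta={"version": "1.0"},
    )
    # This JSON would be the body of PUT /v1/agent/service/register.
    print(json.dumps(payload, indent=2))
```

Registering through the local agent keeps the health check close to the instance: the agent that runs the check is the one that declared the service.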
Step 2: re-implement our libraries
• HTTP Client Side LB
• Database Access
• Kafka
• Memcached/Couchbase
Step 3: Load Balancers
Mission complete!
Time for beers!
Can we go further?
Can we simplify?
The client-side load balancing (CSLB) in our apps is hard to maintain
• 10k lines of code in our SDKs
• TLS is hard
• 3 SDKs => 3 implementations
Consul Connect High-Level View
• A set of APIs in Consul
• Plug any LB
• Envoy by default
HAProxy-Connect
• Open-Source
• Built in Go for static/easy deployment
• Implementation of Consul Connect
• Designed/Built by Criteo
• Will be transferred to HAProxy Technologies!
HAProxy-Connect fully configures HAProxy
• No need to create any HAProxy configuration
• Listens to all changes in the Consul topology
• TLS between apps for free!
• Bare-Metal / Containers compatible
TLS
Mutual TLS
TLS provided by Consul for the client and target
Client validates the server it is talking to
Mutual authentication, server knows caller
Client/Server have the public CA of Consul, so they can validate each other
Rotation is supported
Transparent mechanism supported by Consul
Can work with HashiCorp's Vault transparently
At Service1 level
"very static"
Frontend 'my_exposed_service_tls'
• Config TLS
• SPOE filter (will see later)
Backend 'localhost' to service1
• Config TLS
# Ingress
## Ingress frontend, handling TLS
frontend front_downstream
    mode http
    bind 0.0.0.0:21000 name front_downstream_bind crt crt.pem ca-file ca.pem ssl verify required
    # Configure SPOE
    filter spoe engine intentions config spoe.conf
    # Reject based on SPOE response
    tcp-request content reject unless { var(sess.connect.auth) -m int eq 1 }
    default_backend back_downstream

## Backend in plain text
backend back_downstream
    mode http
    server downstream_node 127.0.0.1:31415
At App1 level
"very dynamic"
Frontend 'service1_target_plain'
• Listen on plain text to localhost
Backend 'service1_targets_tls' to service1
• Config TLS client
• Validates service1_TLS using CA
• service1_instance000
• service1_instance001
• ...
# Egress to service1
## Frontend listening on localhost, in plaintext
frontend front_service1
    mode http
    bind 127.0.0.1:31417 name front_service1_bind
    timeout client 30000
    default_backend back_service1

## Backend handling TLS
backend back_service1
    mode http
    balance leastconn
    timeout server 60000
    timeout connect 1000
    server srv_0 10.0.0.100:21000 enabled ssl weight 1 crt crt.pem ca-file ca.pem
    server srv_1 10.0.0.101:21000 enabled ssl weight 1 crt crt.pem ca-file ca.pem
    server srv_2 10.0.0.102:21000 enabled ssl weight 1 crt crt.pem ca-file ca.pem
    server srv_3 10.0.0.103:21000 enabled ssl weight 1 crt crt.pem ca-file ca.pem
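
The egress backend is the "very dynamic" part: its server list follows the instances Consul reports. As a rough illustration only, the sketch below renders such a backend from a list of (IP, port) pairs. The function `render_backend` is hypothetical; HAProxy-Connect itself drives HAProxy through the dataplane API rather than by templating text, so this only shows the shape of the result.

```python
def render_backend(service, instances, crt="crt.pem", ca="ca.pem"):
    """Render an HAProxy backend with one TLS server line per instance.

    `instances` is a list of (ip, port) tuples, e.g. as discovered in Consul.
    """
    lines = [
        f"backend back_{service}",
        "    mode http",
        "    balance leastconn",
    ]
    for index, (ip, port) in enumerate(instances):
        lines.append(
            f"    server srv_{index} {ip}:{port} "
            f"enabled ssl weight 1 crt {crt} ca-file {ca}"
        )
    return "\n".join(lines)


if __name__ == "__main__":
    # Sample addresses taken from the slide's configuration.
    targets = [("10.0.0.100", 21000), ("10.0.0.101", 21000)]
    print(render_backend("service1", targets))
```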
Service1 can also have targets
What about authorization?
Intentions
Implementation of Intentions
• HAProxy-Connect also listens to SPOE requests
• HAProxy can thus validate whether appX is allowed to target a service
• You can get a global graph of calls across µServices!
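What is SPOE
• SPOE (Stream Processing Offload Engine) lets HAProxy hand data to an external agent (SPOA) and act on its answer
• Here, the agent is HAProxy-Connect
SPOE configuration
• Send the TLS client certificate for validation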
HAProxy-Connect as SPOA
• Decodes the TLS certificate (SPIFFE)
• Extracts the name of the caller (Consul Service)
• Calls the Consul agent with the service name to validate the intention
• If not OK => Connection is closed
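
The agent's decision can be sketched in a few lines. Consul Connect certificates carry a SPIFFE ID of the form `spiffe://<trust-domain>/ns/<namespace>/dc/<datacenter>/svc/<service>`; the code below extracts the caller's service name and returns 1 or 0, mirroring the `sess.connect.auth` check in the ingress frontend. The in-memory `intentions` table and the function names are assumptions for illustration: the real agent asks the Consul agent whether the intention allows the call.

```python
from urllib.parse import urlparse


def service_from_spiffe(uri):
    """Extract the Consul service name from a Connect SPIFFE ID."""
    parsed = urlparse(uri)
    if parsed.scheme != "spiffe":
        raise ValueError(f"not a SPIFFE ID: {uri!r}")
    parts = parsed.path.strip("/").split("/")
    # Expected path: ns/<namespace>/dc/<datacenter>/svc/<service>
    if len(parts) != 6 or (parts[0], parts[2], parts[4]) != ("ns", "dc", "svc"):
        raise ValueError(f"unexpected SPIFFE path: {parsed.path!r}")
    return parts[5]


def spoe_verdict(uri, target, intentions, default_allow=False):
    """Return 1 to allow the connection, 0 to reject it.

    `intentions` maps (caller, target) to True/False. It stands in for the
    call to the Consul agent and is purely hypothetical.
    """
    try:
        caller = service_from_spiffe(uri)
    except ValueError:
        return 0  # unreadable identity: reject
    return 1 if intentions.get((caller, target), default_allow) else 0


if __name__ == "__main__":
    rules = {("app1", "service1"): True}
    caller_id = "spiffe://example.consul/ns/default/dc/dc1/svc/app1"
    print(spoe_verdict(caller_id, "service1", rules))
```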
Under the hood
• HAProxy-Connect
    1. Generates the initial HAProxy configuration (basic conf with ctrl socket + SPOE)
    2. Starts HAProxy
    3. Starts dataplane-api with the generated config
• Uses dataplane-API
    • REST API to control the HAProxy configuration dynamically
    • Removes the complexity of socat & friends
Optimizations
Server slots are over-allocated
• Reloads are evil, so we over-allocate server slots in powers of 2
• HAProxy-Connect enables/disables slots according to their state
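
A minimal sketch of the slot idea, under stated assumptions: the backend is sized to the next power of 2 above the number of live instances, and unused slots are kept disabled so that instances can come and go without a reload. The placeholder address `127.0.0.1:1` for empty slots and the function names are invented for this example.

```python
def slot_count(n_servers, minimum=1):
    """Return the smallest power of 2 (times `minimum`) that fits n_servers."""
    size = minimum
    while size < n_servers:
        size *= 2
    return size


def assign_slots(addresses, placeholder="127.0.0.1:1"):
    """Map live addresses onto a power-of-2 sized list of server slots.

    Slots beyond the live instances point to a placeholder address
    (an assumption of this sketch) and are marked disabled.
    """
    slots = []
    for index in range(slot_count(len(addresses))):
        if index < len(addresses):
            slots.append((f"srv_{index}", addresses[index], "enabled"))
        else:
            slots.append((f"srv_{index}", placeholder, "disabled"))
    return slots


if __name__ == "__main__":
    live = ["10.0.0.100:21000", "10.0.0.101:21000", "10.0.0.102:21000"]
    for slot in assign_slots(live):
        print(slot)
```

With three live instances the backend holds four slots, so a fourth instance only needs a slot to be enabled; a reload is needed only when the count crosses the next power of 2.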
Convergence
• HAProxy-Connect applies configuration changes
• Watches the global result
• If it differs, re-applies all the changes until it converges => still works if someone else is playing with us
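
The convergence loop can be sketched as follows: compare the desired state with what is actually observed, apply the difference, and repeat until both match. The callbacks `read_state` and `apply_change` are hypothetical stand-ins for the calls made to HAProxy; the round limit is an assumption added so the sketch cannot loop forever.

```python
def converge(desired, read_state, apply_change, max_rounds=10):
    """Re-apply changes until the observed state equals the desired one.

    Returns the number of rounds in which changes had to be applied.
    """
    rounds = 0
    while True:
        actual = read_state()
        diff = {k: v for k, v in desired.items() if actual.get(k) != v}
        if not diff:
            return rounds  # observed state matches: converged
        if rounds >= max_rounds:
            raise RuntimeError(f"no convergence after {max_rounds} rounds")
        for name, state in diff.items():
            apply_change(name, state)
        rounds += 1


if __name__ == "__main__":
    # In-memory stand-in for the running HAProxy state.
    running = {"srv_0": "disabled"}
    wanted = {"srv_0": "enabled", "srv_1": "disabled"}
    print(converge(wanted, lambda: dict(running), running.__setitem__))
    print(running)
```

Because the loop always re-reads the real state, a change made by another actor between two rounds simply shows up as a new difference and is corrected on the next pass.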
Statistics
• Minimalist but efficient
HAProxy stats page
• For EGRESS: with full details
• For INGRESS
Prometheus
Statistics EGRESS (App1): per target/global
Stats INGRESS (Service1)
Logs
• HAProxy-Connect collects all of them
• Can be redirected to syslog/system if you want
Cool stuff / QA
• Open-Source!
    • https://github.com/criteo/HAProxy-consul-connect
• Implemented SPOP to implement SPOA for Golang
    • https://github.com/criteo/HAProxy-spoe-go
• Some others can benefit too!
    • https://github.com/clems4ever/HAProxy-ldap-auth
• Thanks to https://github.com/Aestek
@vizionr
Github: pierresouchay
