Scale Kubernetes to Support 50,000 Services
Haibin Michael Xie, Senior Staff Engineer/Architect, Huawei
Agenda
• Challenges while scaling services
• Solutions and prototypes
• Performance data
• Q&A
What are the Challenges while Scaling Services
• Control plane (Master, kubelet, kube-proxy)
  • Deploy services and pods
  • Propagate endpoints
  • Add/remove services in load balancer
  • Propagate endpoints
• Data plane (load balancer)
[Diagram: the Master runs the API Server, Controller Manager, Scheduler, and ETCD; each Node runs a Kubelet, a KubeProxy, a Load Balancer, and Pods]
Control Plane
[Diagram: the Endpoints Controller in the Controller Manager watches services and pods through the API Server (backed by ETCD) and writes endpoints; N nodes each run a KubeProxy that watches endpoints]
• N nodes per cluster, M pods per second
• QPS: N*M endpoints per second
• Service deployed → /registry/services/specs/default/my-service
• Pod deployed and scheduled → Endpoints → /registry/services/endpoints/default/my-service
Control Plane
[Same diagram: the Endpoints Controller writes endpoints through the API Server to N watching KubeProxies]
• N nodes per cluster, M pods per second
• QPS: N*M endpoints per second
• Load: N*M*(M+1)/2 addresses per second
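The load figure follows from endpoints being written as whole objects: when the kth pod of a service is scheduled, the updated Endpoints object carries k addresses, so M sequential pod additions push 1 + 2 + … + M = M*(M+1)/2 addresses to each of the N watching nodes. A small sketch of this arithmetic (the function name and the N, M values are illustrative, not from the talk):

```python
def control_plane_load(n_nodes, m_pods_per_s):
    # Each endpoint update is broadcast to every kube-proxy (N watchers).
    qps = n_nodes * m_pods_per_s
    # The k-th update of a growing Endpoints object carries k addresses,
    # so M updates carry 1 + 2 + ... + M = M*(M+1)/2 addresses per watcher.
    load = n_nodes * m_pods_per_s * (m_pods_per_s + 1) // 2
    return qps, load

qps, load = control_plane_load(1000, 10)
print(qps, load)  # 10000 endpoint updates/s, 55000 addresses/s
```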
Control Plane Solution
1. Partition the endpoints object into multiple objects
• Pros: reduces Endpoints object size
• Cons: increases the number of objects and requests
2. Central load balancer
• Pros: reduces connections and requests to the API server
• Cons: one more hop in service routing, requires strong HA, limited LB scalability
3. Batch creating/updating endpoints
• Timer based, no change to data structures in ETCD
• Pros: reduces QPS
• Cons: E2E latency increases by the batch interval
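Option 3 can be sketched as a coalescing buffer that accumulates endpoint changes and flushes them once per timer tick, so many pod events per interval collapse into one write per service. The 0.5 s interval comes from the talk; the class and method names below are illustrative assumptions, not the actual kube-proxy or controller code:

```python
import threading
import time

class BatchingEndpointsWriter:
    """Illustrative sketch: coalesce endpoint updates, flush once per interval."""

    def __init__(self, flush_fn, interval=0.5):
        self.flush_fn = flush_fn   # e.g. one API-server write per service
        self.interval = interval
        self.pending = {}          # service -> latest address list
        self.lock = threading.Lock()

    def update(self, service, addresses):
        # Later updates overwrite earlier ones, so M pod events within
        # one interval become a single request for that service.
        with self.lock:
            self.pending[service] = list(addresses)

    def flush(self):
        with self.lock:
            batch, self.pending = self.pending, {}
        for service, addresses in batch.items():
            self.flush_fn(service, addresses)

    def run(self, ticks):
        # Timer loop: one flush per interval.
        for _ in range(ticks):
            time.sleep(self.interval)
            self.flush()
```

For example, 200 pod additions to one service within a single interval result in one flushed request carrying the final 200 addresses, which is the QPS reduction the tables below measure.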
Control Plane Solution
[Diagram: a timer in the Endpoints Controller batches endpoint writes before they pass through the API Server to N watching KubeProxies]
• N nodes per cluster, M pods per second
• Without batching: QPS N*M per second, load N*M*(M+1)/2 addresses per second
• With timer and batch: QPS N per second, load N*M addresses per second
Batch Processing Requests Reduction
One batch per 0.5 second. QPS reduced by 98%.

Pods per Service   Number of Services   EndPoints Controller # of Requests
                                        Before   After   Reduction
10                 100                  551      10      98.2%
10                 150                  785      14      98.2%
10                 200                  1105     17      98.5%

Test setup: 1 master, 4 slaves; 16 cores at 2.60 GHz, 48 GB RAM
Batch Processing E2E Latency Reduction
Latency reduced by 60+%.

Pods per Service   Number of Services   E2E Latency (Seconds)
                                        Before   After   Reduction
10                 100                  8.5      3.5     59.1%
10                 150                  13.5     5.3     60.9%
10                 200                  22.8     7.8     65.8%
Data Plane
• What is IPTables?
  • iptables is a user-space application that configures the Linux kernel firewall (implemented on top of Netfilter) through chains and rules.
• What is Netfilter? A framework provided by the Linux kernel that allows customization of networking-related operations, such as packet filtering, NAT, and port translation.
• Issues with IPTables as a load balancer
  • Latency to access a service (routing latency)
  • Latency to add/remove rules
IPTables Example
# iptables -t nat -L -n
Chain PREROUTING (policy ACCEPT)
target prot opt source destination
KUBE-SERVICES all -- anywhere anywhere /* kubernetes service portals */ ← 1
DOCKER all -- anywhere anywhere ADDRTYPE match dst-type LOCAL
Chain KUBE-SEP-G3MLSGWVLUPEIMXS (1 references) ← 4
target prot opt source destination
MARK all -- 172.16.16.2 anywhere /* default/webpod-service: */ MARK set 0x4d415351
DNAT tcp -- anywhere anywhere /* default/webpod-service: */ tcp to:172.16.16.2:80
Chain KUBE-SEP-OUBP2X5UG3G4CYYB (1 references)
target prot opt source destination
MARK all -- 192.168.190.128 anywhere /* default/kubernetes: */ MARK set 0x4d415351
DNAT tcp -- anywhere anywhere /* default/kubernetes: */ tcp to:192.168.190.128:6443
Chain KUBE-SEP-PXEMGP3B44XONJEO (1 references) ← 4
target prot opt source destination
MARK all -- 172.16.91.2 anywhere /* default/webpod-service: */ MARK set 0x4d415351
DNAT tcp -- anywhere anywhere /* default/webpod-service: */ tcp to:172.16.91.2:80
Chain KUBE-SERVICES (2 references) ← 2
target prot opt source destination
KUBE-SVC-N4RX4VPNP4ATLCGG tcp -- anywhere 192.168.3.237 /* default/webpod-service: cluster IP */ tcp dpt:http
KUBE-SVC-6N4SJQIF3IX3FORG tcp -- anywhere 192.168.3.1 /* default/kubernetes: cluster IP */ tcp dpt:https
KUBE-NODEPORTS all -- anywhere anywhere /* kubernetes service nodeports; NOTE: this must be the last rule in this chain */ ADDRTYPE match dst-type
LOCAL
Chain KUBE-SVC-6N4SJQIF3IX3FORG (1 references)
target prot opt source destination
KUBE-SEP-OUBP2X5UG3G4CYYB all -- anywhere anywhere /* default/kubernetes: */
Chain KUBE-SVC-N4RX4VPNP4ATLCGG (1 references) ← 3
target prot opt source destination
KUBE-SEP-G3MLSGWVLUPEIMXS all -- anywhere anywhere /* default/webpod-service: */ statistic mode random probability 0.50000000000
KUBE-SEP-PXEMGP3B44XONJEO all -- anywhere anywhere /* default/webpod-service: */
IPTables Service Routing Performance

                 1 Service (µs)   1000 Services (µs)   10000 Services (µs)   50000 Services (µs)
First Service    575              614                  1023                  1821
Middle Service   575              602                  1048                  4174
Last Service     575              631                  1050                  7077

In this test, there is one entry per service in the KUBE-SERVICES chain.
Where is the latency generated?
• Long list of rules in a chain
• Enumerating the list to find a service and pod
Latency to Add IPTables Rules
• Where is the latency generated?
  • Updates are not incremental: copy all rules, make changes, save all rules back
  • IPTables is locked during the rule update
• Time to add one rule with 5k services (40k rules): 11 minutes
• With 20k services (160k rules): 5 hours
Data Plane Solution
• Restructure IPTables using a search tree (performance benefit)
• Replace IPTables with IPVS (performance and beyond)
Restructure IPTables by Search Tree
• Service VIP range: 10.10.0.0/16
• CIDR list = [16, 24], defines the tree layout
• Create 3 services: 10.10.1.5, 10.10.1.100, 10.10.100.1
[Diagram: tree rooted at 10.10.0.0/16, with subnets 10.10.1.0/24 (VIPs 10.10.1.5 and 10.10.1.100) and 10.10.100.0/24 (VIP 10.10.100.1)]
• Search-tree-based service routing time complexity: O(m), where m is the tree depth
• Original service routing time complexity: O(n)
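A minimal sketch of the search-tree idea, assuming the [16, 24] CIDR layout from the slide: a lookup descends one level per CIDR prefix instead of scanning every service rule. The dict-based tree and the function names are illustrative, not the actual iptables chain layout:

```python
import ipaddress

CIDR_LIST = [16, 24]  # tree layout from the slide

def build_tree(vips):
    """Group VIPs by successively longer prefixes: one dict level per CIDR."""
    tree = {}
    for vip in vips:
        node = tree
        for prefix in CIDR_LIST:
            net = ipaddress.ip_network("%s/%d" % (vip, prefix), strict=False)
            node = node.setdefault(net, {})
        node[vip] = True  # leaf: the service VIP itself
    return tree

def lookup(tree, addr):
    """Descend one level per CIDR: O(m) with m = tree depth,
    instead of the O(n) scan of a flat per-service chain."""
    node = tree
    for prefix in CIDR_LIST:
        net = ipaddress.ip_network("%s/%d" % (addr, prefix), strict=False)
        if net not in node:
            return False
        node = node[net]
    return addr in node

tree = build_tree(["10.10.1.5", "10.10.1.100", "10.10.100.1"])
print(lookup(tree, "10.10.1.100"), lookup(tree, "10.10.2.1"))  # True False
```

Non-matching traffic is rejected after at most m prefix checks, and adding a service only touches one branch, which is the performance benefit the slide claims.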
What is IPVS
• A transport-layer load balancer that directs requests for TCP- and UDP-based services to real servers.
• Like IPTables, IPVS is built on top of Netfilter.
• Supports 3 load balancing modes: NAT, DR, and IP tunneling.
IPVS vs. IPTables
IPTables:
• Operates on tables provided by the Linux firewall
• More flexible for manipulating packets at different stages: pre-routing, post-routing, forward, input, output
• More operations: SNAT, DNAT, rejecting packets, port translation, etc.
Why use IPVS?
• Better performance (hashing vs. chains)
• More load balancing algorithms
  • Round robin, source/destination hashing
  • Least load, least connections, or locality; can assign weights to servers
• Supports server health checks and connection retry
• Supports sticky sessions
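The "hashing vs. chains" point can be seen in a toy comparison: IPTables walks a rule chain until the service matches, so lookup cost grows with the service's position, while IPVS hashes the VIP directly. The data and function names below are illustrative, not from the talk:

```python
def chain_lookup(rules, vip):
    """IPTables-style: walk the chain; cost grows with rule position (O(n))."""
    for steps, rule in enumerate(rules, start=1):
        if rule == vip:
            return steps  # number of comparisons performed
    return None

def hash_lookup(table, vip):
    """IPVS-style: hash the VIP; cost independent of table size (O(1))."""
    return table.get(vip)

# One chain entry per service, 50,000 services as in the routing table above.
vips = ["10.10.%d.%d" % (i // 256, i % 256) for i in range(50_000)]
rules = vips
table = {v: i for i, v in enumerate(vips)}

print(chain_lookup(rules, vips[-1]))  # 50000 comparisons for the last service
print(hash_lookup(table, vips[-1]) is not None)  # one hash probe
```

This mirrors the routing-latency table above, where the last of 50,000 services costs roughly 4x the first in IPTables mode.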
IPVS Load Balancing Mode in Kubernetes
• Not publicly released yet
• No Kubernetes behavior change; complete functionality: external IP, nodePort, etc.
• Kube-proxy startup parameter mode=IPVS, in addition to the original modes: mode=userspace and mode=iptables
• Kube-proxy lines of code: 11,800
• IPVS mode adds 680 lines of code, depending on the Seesaw library
IPVS vs. IPTables Latency to Add Rules

# of Services   1      5,000    20,000
# of Rules      8      40,000   160,000
IPTables        2 ms   11 min   5 hours
IPVS            2 ms   2 ms     2 ms

Measured with iptables and ipvsadm. Observations:
• In IPTables mode, the latency to add a rule increases significantly as the number of services increases
• In IPVS mode, the latency to add a VIP and backend IPs does not increase as the number of services increases
IPVS vs. IPTables Network Bandwidth
Measured with qperf. Each service exposes 4 ports (4 entries in the KUBE-SERVICES chain).
Bandwidth, QPS, and latency show similar patterns.

# of services              1      1000          5000          10000         25000         50000
ith service                first  first  last   first  last   first  last   first  last   first  last
IPTables bandwidth (MB/s)  66.6   64     56     50     38.6   15     6      0      0      0      0
IPVS bandwidth (MB/s)      65.3   61.7   55.3   53.5   53.8   43     43.5   30     28.5   24     23.8
More Perf/Scalability Work Done
• Scale nodes and pods in single cluster
• Reduce E2E latency of deploying pods/services
• Increase pod deployment throughput
• Improve scheduling performance
Thank You
haibin.michael.xie@huawei.com