Baker: Scaling OVN with Kubernetes API Server
Han Zhou
OpenStack Summit Boston 2017
Why OVN?
OVS is GREAT.
OVN makes it GREATER!
OVN Challenges
● OVN is distributed, but not fully …
○ Can we distribute Northd?
[Diagram: Central node (Northd, NB and SB OVSDB) serving many HVs, each running OVN-Controller and OVS]
OVN Challenges
● OVSDB SB
○ No clustering (yet)
[Diagram: Central node (Northd, NB and SB OVSDB) serving many HVs, each running OVN-Controller and OVS]
It is nothing but distributed state management ...
Scale-out with Baker
● Distributed northd
○ Computes lflows for the local HV only
● Scale-out central cluster
○ K8S API server framework
○ Backed by ETCD
○ Clustering
● Distributed agents
○ Watch for local objects only
○ Translate objects to NB DB
[Diagram: Central cluster of Baker API servers backed by ETCD, reached over a RESTful API by the Baker Agent on each HV; each HV also runs Northd, NB, SB, OVN-Controller and OVS]
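The "watch for local objects only, translate objects to NB DB" role of the agent can be sketched roughly as below. This is a minimal illustration, not Baker's code: the `PortEvent` type, the event kinds, and the operation dictionaries are all hypothetical, and the dictionaries are not real OVSDB transaction syntax. Only the `Logical_Switch_Port` table name and its `name` and `addresses` columns come from the OVN NB schema.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, List, Tuple


@dataclass(frozen=True)
class PortEvent:
    """Hypothetical watch event for a logical port object."""
    kind: str                   # "ADDED" or "DELETED"
    name: str                   # logical port name
    hv: str                     # hypervisor the port lives on
    addresses: Tuple[str, ...]  # e.g. ("00:00:00:00:00:01 10.0.0.5",)


def local_events(events: Iterable[PortEvent], local_hv: str) -> Iterator[PortEvent]:
    """Keep only the events for ports on this hypervisor."""
    return (e for e in events if e.hv == local_hv)


def to_nb_ops(events: Iterable[PortEvent]) -> List[dict]:
    """Translate events into abstract operations on the local NB DB."""
    ops = []
    for e in events:
        if e.kind == "ADDED":
            ops.append({"op": "insert", "table": "Logical_Switch_Port",
                        "row": {"name": e.name, "addresses": list(e.addresses)}})
        elif e.kind == "DELETED":
            ops.append({"op": "delete", "table": "Logical_Switch_Port",
                        "name": e.name})
    return ops


stream = [
    PortEvent("ADDED", "p1", "hv1", ("00:00:00:00:00:01 10.0.0.5",)),
    PortEvent("ADDED", "p2", "hv2", ("00:00:00:00:00:02 10.0.0.6",)),
    PortEvent("DELETED", "p1", "hv1", ()),
]
# The agent on hv1 never sees p2, so its local NB DB stays small.
ops = to_nb_ops(local_events(stream, "hv1"))
```

In a real deployment the filtering would presumably happen on the server side of the watch, so that irrelevant objects are never sent to the HV at all; the client-side filter here only keeps the sketch self-contained.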
One more thing ...
● Northd and ovn-controller are both distributed
● They process data related to the local HV only
But what does this mean?
In terms of overlay ...
● Logical-to-physical mapping states (port-binding) for connectivity
● Doesn’t scale when everyone talks to everyone else in a *large* zone
○ Maybe not the case for public cloud, or small-to-medium enterprise cloud.
○ But it is a typical use case for eBay’s private cloud.
Are we solving the right problem?
● Connectivity vs. segmentation
● L2 segmentation vs. L3 segmentation
● Address set (L3) based segmentation
○ ACL: default deny, whitelist access
○ IPAM:
■ Use IPs efficiently
■ Summarize CIDRs to reduce address set size
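As a small illustration of the CIDR summarization idea, Python's standard `ipaddress` module can collapse adjacent and overlapping networks into fewer entries. The member list below is made up for the example, and the slides do not say how Baker or its IPAM performs the summarization.

```python
import ipaddress


def summarize(cidrs):
    """Collapse a list of IPv4 CIDR strings into the fewest covering networks."""
    nets = [ipaddress.ip_network(c) for c in cidrs]
    return [str(n) for n in ipaddress.collapse_addresses(nets)]


# Hypothetical address set members: four consecutive host addresses
# plus a /24 that already contains the /25 listed after it.
members = [
    "10.0.0.0/32", "10.0.0.1/32", "10.0.0.2/32", "10.0.0.3/32",
    "10.0.1.0/24", "10.0.1.128/25",
]
summary = summarize(members)
print(summary)  # ['10.0.0.0/30', '10.0.1.0/24']
```

Six entries shrink to two, and every HV whose ACLs reference this address set has less data to process.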
Flat network
● Reuse OVN abstraction and pipeline
○ Port security
○ ARP proxy
○ ACL
○ LB
○ …
○ But NOT overlay
● Use localnet port to connect to physical network directly
● Data to be processed by each HV depends on the size of the AddressSets used by ACLs that apply to ports on the HV
Baker Object Model
● Similar to the OVN NB schema
○ Logical Port
■ Addresses
■ Port security
○ ACL
○ Address Set
○ Load balancer (TBD)
○ ...
● Differences
○ No Logical Switch (local)
○ Port-SecGroup binding
○ ACL: SecGroup instead of individual ports in inport/outport
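One possible reading of this object model, written as plain Python dataclasses. Every class and field name here is a guess made for illustration; the slides do not give Baker's actual schema. Only the `from-lport` / `to-lport` directions and the `$name` address set reference in the match string follow OVN NB conventions.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class LogicalPort:
    name: str
    addresses: List[str]
    port_security: List[str] = field(default_factory=list)


@dataclass
class AddressSet:
    name: str
    addresses: List[str]


@dataclass
class PortSecGroupBinding:
    """Links a port to a security group; there is no Logical Switch object."""
    port: str
    sec_group: str


@dataclass
class ACL:
    sec_group: str   # a group, not an individual port in inport/outport
    direction: str   # "from-lport" or "to-lport"
    match: str       # may reference an address set as $name
    action: str


def ports_in_group(bindings: List[PortSecGroupBinding], group: str) -> List[str]:
    """Resolve a security group to the ports currently bound to it."""
    return [b.port for b in bindings if b.sec_group == group]


bindings = [
    PortSecGroupBinding("p1", "web"),
    PortSecGroupBinding("p2", "web"),
    PortSecGroupBinding("p3", "db"),
]
acl = ACL("web", "to-lport", "ip4.src == $web_clients", "allow-related")
web_ports = ports_in_group(bindings, acl.sec_group)
```

Because the ACL names a group, adding or removing a port only touches a binding object and leaves the ACL itself unchanged.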
Neutron Plugin
● Supports security groups, with API extensions
○ Address set - supports external IPs from legacy systems
○ Security group rule packet logging
Scalability - Control plane throughput
● Test
○ E2E: Neutron - Baker - OVS
○ Simulated 1k HVs on 10 BMs
■ OVS/OVN 2.7
○ 1 node Neutron + MySQL
○ 1 node Baker API server + ETCD
■ K8s 1.6 pre-release, etcd 3.0
● Result for single client (parallel test TBD)
○ Result impacted by SG (address set) size
○ ~3 ports/sec for SG size 1K
Scalability - Latency
● Test
○ E2E from Neutron to OVS flow installation for the port created
■ Create port from Neutron, bind port in OVS on HV
■ Wait:
● ovn-nbctl wait-until Logical_Switch_Port <port> up=true
● ovn-nbctl --wait=hv sync
○ Create ports on top of existing 10K ports, 1K HVs, SG size 1K
○ 10K * 3 (flows/ACL) = 30K flows / ovs port
● Result
○ Avg 2 sec
Improvement - ovn-controller
● Flow computation blocks flow installation
● Improvement: avoid repeated computation while in-flight messages to OVS are pending
● Test result (SG size 10k, flow installation for 10 ports on HV):
○ 10k * 3 * 10 = 300k OVS flows
○ Before: 50 min
○ After: 16 sec
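The idea behind this improvement can be shown with a toy model of a controller loop. This is not ovn-controller's code: the `FlowInstaller` class, its counters, and the `simulate` driver are invented for the sketch. It only demonstrates why skipping recomputation while earlier messages to OVS are still unacknowledged reduces the number of full computations.

```python
class FlowInstaller:
    """Toy controller loop: recompute flows when inputs change."""

    def __init__(self):
        self.in_flight = 0      # messages sent to OVS, not yet acknowledged
        self.computations = 0   # number of full flow computations performed
        self.dirty = False      # inputs changed since the last computation

    def on_change(self):
        self.dirty = True

    def on_ack(self, n):
        self.in_flight = max(0, self.in_flight - n)

    def run_once(self, skip_when_in_flight):
        if not self.dirty:
            return
        if skip_when_in_flight and self.in_flight > 0:
            return  # stay dirty; recompute once OVS has caught up
        self.computations += 1  # stands in for the expensive computation
        self.in_flight += 1     # stands in for the flow messages sent
        self.dirty = False


def simulate(skip, iterations=10, ack_every=5):
    """Inputs change every iteration; OVS acknowledges every ack_every iterations."""
    c = FlowInstaller()
    for i in range(iterations):
        c.on_change()
        c.run_once(skip)
        if (i + 1) % ack_every == 0:
            c.on_ack(c.in_flight)
    return c.computations


print(simulate(skip=False))  # 10
print(simulate(skip=True))   # 2
```

In the toy run the final state is still computed from the latest inputs; only the intermediate recomputations, whose results would have queued up behind pending messages anyway, are avoided.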
Other Lessons learned
● Postpone ACL expansion from Neutron to HV
○ Introduce port-group binding object in Baker
○ Use port-group instead of lport in “inport/outport” in ACL
○ Baker agent expands ACL on HV for local lports only
○ Benefit:
■ Reduced Neutron overhead
■ Reduced API calls from Neutron to Baker
■ Reduced data size in Baker
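A rough sketch of what agent-side expansion could look like, assuming a group-level ACL is stored once and turned into per-port ACLs only for ports that live on the HV. The dictionary layout and the `expand_acl` helper are hypothetical; the mapping of `from-lport` to `inport` and `to-lport` to `outport` follows OVN ACL semantics.

```python
def expand_acl(acl, bindings, local_ports):
    """Expand one port-group ACL into per-port ACLs for local ports only.

    acl:         dict with 'port_group', 'direction', 'match', 'action'
    bindings:    iterable of (port_name, port_group) pairs
    local_ports: set of port names present on this hypervisor
    """
    port_field = "inport" if acl["direction"] == "from-lport" else "outport"
    expanded = []
    for port, group in bindings:
        if group == acl["port_group"] and port in local_ports:
            expanded.append({
                "direction": acl["direction"],
                "match": f'{port_field} == "{port}" && ({acl["match"]})',
                "action": acl["action"],
            })
    return expanded


bindings = [("p1", "web"), ("p2", "web"), ("p3", "db")]
group_acl = {
    "port_group": "web",
    "direction": "to-lport",
    "match": "ip4.src == $web_clients",
    "action": "allow-related",
}
# Only p1 and p3 live on this HV, so the ACL is expanded for p1 alone.
local_acls = expand_acl(group_acl, bindings, local_ports={"p1", "p3"})
```

The central store holds one ACL per group regardless of how many ports join the group, which is where the reduced API calls and data size would come from.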
Other Lessons learned
● Baker RESTful API: use Protobuf instead of JSON-RPC
○ 10-20% throughput increase for SG size 1k-10k
○ Lower CPU cost on API server
Thanks!
Q & A
