© 2018 LUMINA NETWORKS, INC.
SDN Meetup
Hitless Controller Upgrade
Introduction
Towards a Hitless upgrade.
• Traditional Network Upgrades
– Closed Systems
• HW and Control Plane Bundled (from a single vendor)
• HW upgrade sometimes requires a control plane refresh
– Line card needs a new OS and/or routing engine (RE) upgrade.
– Large Events
• Sometimes Months of Planning
• Failure is handled by rollback
– End game: lots of small, automated upgrades.
Brutal Automation is the only way
It's easy to regress to inefficient practices.
• Arash Ashouriha, deputy chief technology officer of Deutsche Telekom AG (NYSE: DT), said the only way his company could now succeed was through a process of "brutal automation."
THE HAGUE -- SDN NFV World Congress 2017
Controller Upgrade CI/CD Toolsets
Software Practices and Toolsets that need to be employed.
• Upgrades MUST be Automated.
• Automated Dev Test Framework.
– NO Shortcuts!
• Pre-Validation Checks.
• Hands-Off Upgrade Process (no engineer intervention).
• Post-Validation Checks.
• Automated Rollback.
• Post-Rollback Validations.
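The stage ordering on this slide (pre-validation, hands-off upgrade, post-validation, automated rollback, post-rollback validation) can be sketched as a small driver. This is a minimal Python illustration, not Lumina's tooling: `run_upgrade` and the stage callables are hypothetical names, and each stage is assumed to return `True` on success.

```python
from typing import Callable, List

# Hypothetical stage type: each stage is a callable that returns True on success.
Stage = Callable[[], bool]


def run_upgrade(pre_checks: List[Stage], upgrade: Stage, post_checks: List[Stage],
                rollback: Stage, post_rollback_checks: List[Stage]) -> str:
    """Drive one upgrade with no manual steps.

    Returns "aborted", "upgraded", "rolled_back" or "rollback_failed".
    """
    # Pre-validation: touch nothing unless every check passes.
    if not all(check() for check in pre_checks):
        return "aborted"
    # Upgrade, then post-validation; any failure falls through to rollback.
    if upgrade() and all(check() for check in post_checks):
        return "upgraded"
    # Automated rollback, then post-rollback validation.
    if rollback() and all(check() for check in post_rollback_checks):
        return "rolled_back"
    return "rollback_failed"


if __name__ == "__main__":
    ok, fail = (lambda: True), (lambda: False)
    print(run_upgrade([ok], ok, [fail], ok, [ok]))  # rolled_back
```

The point of returning a single outcome string is that a CI/CD job can gate on it without anyone reading logs; a failed rollback is reported separately because it is the one case that needs a human.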
Data and Control Layer Separation
• Data Plane
– Rule driven
• OpenFlow rules
• Configured by application on controller
– Isolated from control plane
– Benefit: no control traffic between nodes
– Decisions made by application
– Any "white box" switch with an OpenFlow (OF) interface
– Flows and groups are static until reprogrammed
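Because the data plane is purely rule driven, a switch keeps forwarding from its installed table with no controller involvement. The sketch below models that lookup under simplifying assumptions: a single table, exact-match fields only (a field is wildcarded by leaving it out), and hypothetical `FlowEntry`/`FlowTable` names. As in OpenFlow, the highest-priority matching entry wins and no match is a table miss.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class FlowEntry:
    priority: int
    match: Dict[str, object]  # hypothetical exact-match fields, e.g. {"in_port": 1}
    actions: List[str]        # hypothetical action strings, e.g. ["output:2"]


@dataclass
class FlowTable:
    entries: List[FlowEntry] = field(default_factory=list)

    def lookup(self, packet: Dict[str, object]) -> Optional[List[str]]:
        """Return the actions of the highest-priority matching entry, or None on a miss."""
        candidates = [
            e for e in self.entries
            if all(packet.get(k) == v for k, v in e.match.items())
        ]
        if not candidates:
            return None  # table miss
        # OpenFlow leaves equal-priority overlaps undefined; max() just takes the first.
        return max(candidates, key=lambda e: e.priority).actions
```

Nothing in `lookup` consults a controller, which is the property the upgrade options rely on: the table only changes when something reprograms it.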
Data and Control Layer Separation
• Control Plane
– Application/"Flow Manager"
• Controller acts as message bus
• Application calculates flows/groups
– Receives LLDP from nodes
• Topology built
– Shares/distributes network state to all controllers
– Drives potential for "hitless" upgrade
– Has its challenges…
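Building topology from LLDP can be pictured as pairing up observations of the form "switch/port X received an LLDP frame advertising switch/port Y". A minimal sketch, assuming that event shape (a hypothetical representation, not the controller's actual data model) and the design choice of accepting a link only once it has been seen in both directions:

```python
from typing import FrozenSet, Iterable, Set, Tuple

# Hypothetical representation of one end of a link: (switch id, port number).
Endpoint = Tuple[str, int]
# One LLDP observation: (receiving endpoint, endpoint advertised in the frame).
LldpEvent = Tuple[Endpoint, Endpoint]


def build_links(events: Iterable[LldpEvent]) -> Set[FrozenSet[Endpoint]]:
    """Turn LLDP observations into undirected links.

    A link is accepted only if LLDP was seen in both directions, which filters
    out half-discovered or unidirectional adjacencies.
    """
    seen = set(events)
    return {frozenset((rx, tx)) for rx, tx in seen if (tx, rx) in seen}
```

Storing links as `frozenset` pairs makes s1:1 to s2:1 and s2:1 to s1:1 the same link, which is what a shared, replicated topology store would want.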
Challenges with OpenFlow Hitless Upgrade
Can it be hitless?
Types of changes we need to understand:
• Controller APP Change
– Path computation change that requires an algorithm change
– Service Change (new way of using abstracted resources)
• Controller Change
– Project Updates: OpenFlow plugin, stats manager, topology manager, etc.
– Plugin Updates: OpenFlow 1.3 -> 1.4
– MD-SAL/Model changes: YANG model changes
• Dataplane Pipeline
– No Pipeline Change >>>> HITLESS ☺
• Flows, Groups, Tables stay the same
– Pipeline Change
• Existing flows, groups, and tables do not support the new pipeline
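Whether an upgrade falls into the "no pipeline change" bucket can be checked mechanically by diffing pipeline descriptors. In this sketch the descriptor (table id mapped to the match fields that table uses) and both function names are hypothetical; a real check would also compare group types and flow action shapes.

```python
from typing import Dict, FrozenSet, Set

# Hypothetical pipeline descriptor: table id -> match fields that table uses.
Pipeline = Dict[int, FrozenSet[str]]


def pipeline_changes(old: Pipeline, new: Pipeline) -> Dict[str, Set[int]]:
    """Classify table-level differences between two pipeline descriptors."""
    shared = old.keys() & new.keys()
    return {
        "added": set(new.keys() - old.keys()),
        "removed": set(old.keys() - new.keys()),
        "changed": {t for t in shared if old[t] != new[t]},
    }


def hitless_candidate(old: Pipeline, new: Pipeline) -> bool:
    # Only an upgrade that leaves the pipeline untouched is treated as a hitless candidate.
    return not any(pipeline_changes(old, new).values())
```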
Controller APP Change
• Can you overlay a PCE (Path Computation Element) change?
• New LSP mesh / SR topology (node SIDs)
• Even if you could handle a new label base, you still need to handle:
– Match Duplication (on ingress)
• How would you handle this?
– Action Duplication (on egress)
• Resource Limits
– Group limits: the stats manager tracks lots of groups, and clustering then replicates that data
– Flow limits
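One way to see the match-duplication problem: OpenFlow 1.3 identifies a flow entry by table, priority and match, and by default an ADD with an identical table/priority/match replaces the existing entry instead of coexisting with it. An overlaid PCE computation that reuses an ingress match therefore collides with the old one. The sketch below, using hypothetical helper names and illustrative label values, finds such collisions between the current and proposed flow sets.

```python
from typing import Dict, Iterable, Set, Tuple

# A flow as OpenFlow identifies it: (table_id, priority, match).
# The match is a sorted tuple of (field, value) pairs so it is hashable.
FlowKey = Tuple[int, int, Tuple[Tuple[str, object], ...]]


def flow_key(table_id: int, priority: int, match: Dict[str, object]) -> FlowKey:
    # Hypothetical helper: field names are unique, so sorting never compares values.
    return (table_id, priority, tuple(sorted(match.items())))


def duplicate_matches(current: Iterable[FlowKey], proposed: Iterable[FlowKey]) -> Set[FlowKey]:
    """Flows the overlay would install on top of an identical existing entry."""
    return set(current) & set(proposed)
```

Any non-empty result means the old and new paths cannot run side by side in that table, so the overlay needs a different priority, table or cut-over plan.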
Controller Infrastructure
• Plugin Changes
– Experimenter (mechanism for proprietary messages within the protocol)
– Version Bump
• Controller Project Changes
– Is Hitless Upgrade Considered Part of the Project?
– Namespace
– Functionality
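A plugin version bump interacts with OpenFlow's HELLO negotiation: a controller upgraded to speak 1.4 still ends up on 1.3 with a switch that only supports 1.3. A minimal sketch of the rule in the OpenFlow 1.3.1+ specifications: when both sides send a version bitmap, the highest common version wins; otherwise the smaller header version is used. `negotiate` is a hypothetical helper; the wire values are those carried in the `ofp_header` version byte.

```python
from typing import Optional, Set

# Values of the ofp_header version byte for each OpenFlow release.
OFP_VERSIONS = {0x01: "1.0", 0x02: "1.1", 0x03: "1.2", 0x04: "1.3", 0x05: "1.4", 0x06: "1.5"}


def negotiate(local_version: int, remote_version: int,
              local_bitmap: Optional[Set[int]] = None,
              remote_bitmap: Optional[Set[int]] = None) -> Optional[int]:
    """Return the version both ends use after HELLO, or None if there is none.

    With version bitmaps on both sides, pick the highest common version;
    otherwise use the smaller of the two header versions.
    """
    if local_bitmap is not None and remote_bitmap is not None:
        common = local_bitmap & remote_bitmap
        return max(common) if common else None
    return min(local_version, remote_version)


if __name__ == "__main__":
    # Upgraded controller (1.4) talking to a 1.3-only switch.
    print(OFP_VERSIONS[negotiate(0x05, 0x04)])  # 1.3
```

The practical consequence for a hitless upgrade is that the new plugin must keep handling every version still deployed in the fleet, not just the one it was bumped to.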
No Pipeline Change
No PCE change or pipeline change (easiest scenario), but we still have to be aware of:
• Group Limits
• Flow Limits
• Stats Manager
– Reconciling Flows
– General Load (lots of data)
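Even in the easiest scenario, a pre-check for group and flow limits might compare per-switch counts against capacity with some headroom. The 80% threshold, the `Capacity` type and the input shapes are illustrative assumptions; real limits could come from the switch's OpenFlow table-features and group-features replies or from vendor documentation.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Capacity:
    # Hypothetical per-switch limits.
    max_flows: int
    max_groups: int


def over_limit(current: Dict[str, Dict[str, int]], limits: Dict[str, Capacity],
               headroom: float = 0.8) -> List[str]:
    """Return switches whose flow or group count exceeds headroom * limit."""
    flagged = []
    for switch, counts in current.items():
        cap = limits[switch]
        if (counts["flows"] > headroom * cap.max_flows
                or counts["groups"] > headroom * cap.max_groups):
            flagged.append(switch)
    return sorted(flagged)
```

Leaving headroom matters because reconciliation after an upgrade can briefly push extra flows, and every group also adds load on the stats manager and the replicated datastore.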
Pipeline Change
• Flow and/or group type changes
– Flow actions you may need to change
• Ingress flow now has a new action?
– Group tables you may need to change
• Change from an ALL group to a hierarchy
– New tables
• Table reassignment
• Flow and group tables perform different functions
• Packet match lookups/forwarding
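Moving from a flat ALL group to a hierarchy means groups chain to other groups, which imposes an ordering on reprogramming. This sketch assumes the switch rejects a bucket that references a group not yet installed, and uses a hypothetical `{group_id: referenced_group_ids}` description; it orders adds leaves-first with Python's standard `graphlib` (3.9+) and deletes in reverse.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set

# Hypothetical hierarchy description: group id -> ids of groups its buckets chain to.
GroupGraph = Dict[int, Set[int]]


def install_order(groups: GroupGraph) -> List[int]:
    """Order group adds so every referenced group already exists.

    Leaf groups come first and parents last; graphlib raises CycleError on loops.
    """
    return list(TopologicalSorter(groups).static_order())


def delete_order(groups: GroupGraph) -> List[int]:
    # Remove parents before the groups they reference.
    return list(reversed(install_order(groups)))
```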
Node Upgrades
• Switch OS upgrade
– Remove from service
• Reroute any transit services
• Got ingress or egress services?
– They are dual-homed, right? If they aren't, well...
– Upgrade
– Check
– Place Back into Service.
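Before removing a node from service, the dual-homing question can be answered automatically. A minimal sketch, assuming a hypothetical inventory that maps each ingress/egress service to the set of nodes it is attached to; transit services are out of scope because they are rerouted.

```python
from typing import Dict, List, Set


def stranded_services(node: str, attachments: Dict[str, Set[str]]) -> List[str]:
    """Services that lose every attachment point if `node` is taken out of service.

    `attachments` is a hypothetical service id -> attached nodes inventory.
    """
    return sorted(svc for svc, nodes in attachments.items() if nodes == {node})
```

A non-empty result is a reason to stop the node upgrade before it starts rather than discover the single-homed service during the outage.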
Controller & Application Upgrades
• Option A
• Single cluster
• Disconnect switches: the data plane keeps forwarding, and flow/group state persists on the switches
• Perform upgrade
• Re-deploy
• Reconnect Switches
• Outage window can be managed reliably
• Not completely hitless
Multi-Site Cluster/Controller Groups
Not so easy
• Option B
• A fallback cluster
• Increased redundancy, increased cost
• Point switches to this cluster: if the datastore is shared across both clusters, you can upgrade one cluster at a time
• Will this be hitless?
• Key lies in what is actually being upgraded
• However, rollback can be hitless if required
• Saves production state in case of emergency
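Option B's one-cluster-at-a-time flow, with the untouched cluster as the rollback target, can be modelled in a few lines. This is a sketch under the slide's stated condition that the datastore is shared across both clusters; `Fabric`, `upgrade_with_fallback` and the `validate` callback are hypothetical names, and the model only tracks which cluster each switch points at.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class Fabric:
    # Hypothetical model: switch -> cluster it is pointed at, cluster -> software version.
    switch_master: Dict[str, str]
    cluster_version: Dict[str, str] = field(default_factory=dict)

    def point_all(self, cluster: str) -> None:
        for sw in self.switch_master:
            self.switch_master[sw] = cluster


def upgrade_with_fallback(fabric: Fabric, active: str, standby: str, new_version: str,
                          validate: Callable[[Fabric], bool]) -> bool:
    """Upgrade the idle cluster, move switches onto it, fall back on failure."""
    old_version = fabric.cluster_version[standby]
    fabric.cluster_version[standby] = new_version  # upgrade the cluster carrying no switches
    fabric.point_all(standby)                      # switches now follow the upgraded cluster
    if validate(fabric):
        return True
    fabric.point_all(active)                       # rollback: the untouched cluster still holds production state
    fabric.cluster_version[standby] = old_version  # restore the standby to its previous version
    return False
```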
How we do it
Not so easy
• Avoiding initial data plane impact
– Prepare
• Stop running controller process
• Disconnect controllers from switches
• Environment tools - orchestration/monitoring systems
– Checks
• Switch connections
• Controller status
• Data plane
– Upgrade
Automation Tools
• Software provisioning/IT automation
• Completely hands-off, process-driven upgrade
• Operations-ready process: tested and proven
• Powerful automation tool: Ansible
• Concepts of roles, playbooks, and inventories
– Pre-Check
• Ability to check for existing packages/files/information
• Make decisions based on OS
• Run native/non-native commands directly on servers
– Upgrade
• Copy, move and edit files
• Extract and install packages
• Native Linux functionality built into native Ansible modules
– Post-Check
• Validation
• File checksum (cksum) checks
• Application Config
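The post-check "file checksum" step amounts to comparing deployed files against an expected manifest. This sketch swaps the `cksum` CRC for SHA-256 from Python's `hashlib`, and the `{relative path: sha256}` manifest format plus both function names are hypothetical; in practice the same comparison would run inside the automation tool's post-check role.

```python
import hashlib
from pathlib import Path
from typing import Dict, List


def sha256_of(path: Path) -> str:
    """Hash a file in 64 KiB chunks so large packages are not read into memory at once."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def checksum_mismatches(manifest: Dict[str, str], root: Path) -> List[str]:
    """Return manifest entries that are missing under `root` or whose hash differs."""
    bad = []
    for rel, expected in sorted(manifest.items()):
        target = root / rel
        if not target.is_file() or sha256_of(target) != expected:
            bad.append(rel)
    return bad
```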
In-house DevOps Tools
• Compare and validate datastore with switches
• Used to understand the current state of the network:
– Nodes?
• LLDP received?
– Links?
• Is topology built internally?
• Is the appropriate topology datastore populated correctly?
– Flows?
• Comparison of operational and config datastores
• Are flows reported on switches and in the operational datastore?
• Verify correct flow and group calculation
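The flow questions on this slide reduce to three set differences between the config datastore, the operational datastore, and what the switches report. A minimal sketch, assuming flows have already been normalised to comparable ids; `reconcile` and the string ids are hypothetical, not the in-house tool itself.

```python
from typing import Dict, Set


def reconcile(config: Set[str], operational: Set[str], on_switch: Set[str]) -> Dict[str, Set[str]]:
    """Compare hypothetical flow ids across config, operational and switch views."""
    return {
        # Intended by the application but never reported as operational.
        "missing_in_operational": config - operational,
        # Reported as operational but not intended by any config.
        "unexpected_in_operational": operational - config,
        # Operational datastore says installed, but the switch does not report it.
        "operational_not_on_switch": operational - on_switch,
    }
```

Running it before and after an upgrade gives a simple, automatable definition of "nothing changed" for the post-checks.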
Challenges
• Lab and Production environment differences
• Users/Permissions
• Directory Structure
• Addressing schemes
• Resource limitation
• Hard to get an "identical" production environment
• Inventory management
• Variables, secrets, package versioning
• Process needs to be "bulletproof"
• Tested, refined, feedback, etc.
• CI/CD
• Accounting for differences between lab and production can be tricky
• Product Changes/Customer tool changes
• Changes in orchestration applications
• Application namespace changes and functionality changes
• Regression testing needs to be thorough and capture corner cases
• Appropriate testing framework
Way around the challenges
• Automation, automation, automation
• Know the environment/product well enough to automate the entire process
• Automated Testing framework - thorough use case and functionality testing
• No changes implemented that aren’t tested
• No engineering "hands on" during upgrade
• The goal: anyone can run the upgrade
• Knowledge
– Knowledge is in the process
– Knowledge is in the automation and toolset / CI/CD
– Efficiency, effectiveness: not reliant on individuals or their knowledge in a constantly changing industry
Thank you!
