RIFT: Routing In Fat Trees
A New DC Routing Protocol
Antoni Przygienda & Zhaohui (Jeffrey) Zhang
Juniper Distinguished Engineer
Background: DC Technologies Evolution
• Tree to CLOS topology
• Tree: core/aggregation/access layers
• Folded CLOS, or Fat Tree
• Spine & Leaf
• Layer 2 switching to Layer 3 routing
• Layer 3 routing underlay with Layer 2/3 overlay
• Layer 3 underlay routing: IGP → eBGP → RIFT
• For scaling, convergence and Opex considerations
Issues with LSR IGP
• Link State Routing protocols like OSPF/ISIS have the following issues
when used in large scale DCs
• Failure Impact Scope (aka Blast Radius)
• A small change (e.g. a single link up/down on a leaf) is flooded throughout an IGP
area, triggering SPF recalculation on every node
• Rich connections make flooding unnecessarily redundant and inefficient
• Every node holds host routes of all servers (aggregation limits prefix mobility
and leads to blackholes on failures)
• As a result, eBGP was introduced as a DC underlay routing
technology
• RFC 7938
Issues with eBGP Solution
• The eBGP solution has the following prominent issues
• Cannot take advantage of the well-defined network topology
• E.g. ideally a leaf (tier-3) node only needs a default route, and a tier-2 node only needs a default route
and routes for destinations south of it
• This cannot be done due to black-holing upon link failure
• A node needs to keep all paths learnt from different neighbors
• A leaf node connecting to 32 tier-2 nodes needs to keep 32 paths for every prefix in the DC
• eBGP peering configuration burden
• There are other subtle issues making the eBGP band-aid solution not that
simple
• See appendix slide
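The path-count bullet above can be made concrete with a little arithmetic. This is only a sketch: the 32 uplinks come from the slide, while the prefix count of 10,000 and both function names are hypothetical values chosen for illustration.

```python
# Hypothetical numbers for illustration; only the 32 uplinks come from the slide.

def ebgp_paths_on_leaf(uplinks: int, prefixes: int) -> int:
    """Paths a leaf stores when it keeps one path per prefix per tier-2 neighbor."""
    return uplinks * prefixes


def default_route_entries(uplinks: int) -> tuple[int, int]:
    """Ideal case from the slide: one default route, with one next hop per uplink.

    Returns (number of routes, number of next hops on that route).
    """
    return 1, uplinks


if __name__ == "__main__":
    print(ebgp_paths_on_leaf(32, 10_000))  # 320000
    print(default_route_entries(32))       # (1, 32)
```

Under these assumed numbers the leaf holds 320,000 paths where a single default route with 32 next hops would be enough, if black-holing on link failure could be avoided.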
RIFT: LINK-STATE UP, DISTANCE VECTOR DOWN & BOUNCE
RIFT 2017, Juniper Confidential
Northbound LSR
• Link State flooded northbound to the top tier
• With flooding reduction
• Each node has full view of the southbound topology
• A top tier node has full set of prefixes from the SPF calculation
• A middle tier node has only information necessary for its level
• All destinations south of the node, from its SPF calculation
• Default route (next slide)
• Potential disaggregated routes (next slide)
• Fast convergence and ECMP benefits of LSR
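To illustrate why northbound-only flooding shrinks the impact scope, the sketch below compares which nodes receive a leaf's link state in a flat IGP area versus when it is flooded north only. The fabric, the node names and both functions are hypothetical, and the sketch ignores flooding reduction and the one-hop southbound reflection described on the next slide.

```python
from collections import deque

# Hypothetical 3-tier fabric: each node maps to its northbound neighbors.
NORTH = {
    "leaf1": ["agg1", "agg2"],
    "leaf2": ["agg1", "agg2"],
    "leaf3": ["agg3", "agg4"],
    "leaf4": ["agg3", "agg4"],
    "agg1": ["top1", "top2"],
    "agg2": ["top1", "top2"],
    "agg3": ["top1", "top2"],
    "agg4": ["top1", "top2"],
    "top1": [],
    "top2": [],
}


def northbound_scope(origin: str) -> set[str]:
    """Nodes that receive origin's link state when it is flooded north only."""
    seen: set[str] = set()
    queue = deque(NORTH[origin])
    while queue:
        node = queue.popleft()
        if node in seen:
            continue
        seen.add(node)
        queue.extend(NORTH[node])  # keep climbing towards the top tier
    return seen


def flat_igp_scope(origin: str) -> set[str]:
    """In a single flat IGP area every other node receives the update."""
    return set(NORTH) - {origin}


if __name__ == "__main__":
    print(sorted(northbound_scope("leaf1")))  # ['agg1', 'agg2', 'top1', 'top2']
    print(len(flat_igp_scope("leaf1")))       # 9
```

In this toy fabric a change on leaf1 reaches 4 nodes instead of 9, and the other leaves and the agg3/agg4 pair never see it.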
Southbound Distance Vector Routing
• Default route and automatically disaggregated routes (when needed)
advertised one-hop southbound
• When a level-2 node A detects that another level-2 node B cannot reach one
of A’s south destinations P, it advertises P via southbound DVR
• That way a south level-3 node will route P traffic only towards A (via the more specific
route) not towards B (via the default route)
• A node’s local link state is advertised one-hop southbound and then
reflected one-hop northbound
• So that node A can detect if node B can reach A’s south destinations
• Other than that, link state is not propagated south, greatly reducing impact
scope
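The disaggregation decision described above can be sketched as a set comparison. This is a simplified illustration, not the protocol's actual computation: it assumes node A already knows, through the reflected link state, which south destinations each same-level peer can reach. The function name and the prefixes are hypothetical.

```python
def prefixes_to_disaggregate(own_reach: set[str],
                             peers_reach: dict[str, set[str]]) -> set[str]:
    """Return the prefixes node A should advertise southbound as more specifics.

    own_reach:   prefixes A can reach via its own southbound links.
    peers_reach: for each same-level peer, the prefixes that peer can reach,
                 as learned from link state reflected by the level below.
    A prefix is disaggregated if at least one peer cannot reach it, because
    traffic following the default route via that peer would be black-holed.
    """
    missing: set[str] = set()
    for reach in peers_reach.values():
        missing |= own_reach - reach
    return missing


if __name__ == "__main__":
    node_a = {"10.0.1.0/24", "10.0.2.0/24"}
    # Peer B has lost its link towards the leaf that owns 10.0.1.0/24.
    peers = {"B": {"10.0.2.0/24"}}
    print(prefixes_to_disaggregate(node_a, peers))  # {'10.0.1.0/24'}
```

When every peer reaches everything A reaches, the result is empty and only the default route is advertised southbound.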
AUTOMATIC DE-AGGREGATION
• SOUTH REPRESENTATION OF THE RED
SPINE IS REFLECTED BY THE GREEN
LAYER
• LOWER RED SPINE SEES THAT UPPER
NODE HAS NO ADJACENCY TO THE
ONLY AVAILABLE NEXT-HOP TO P1
• LOWER RED NODE DISAGGREGATES
P1
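The effect of a disaggregated P1 on the node below can be shown with an ordinary longest-prefix-match lookup. The routing table, the addresses and the next-hop names A and B are hypothetical; the sketch only shows that the more specific route wins over the default, which is what steers P1 traffic away from the node that cannot reach it.

```python
import ipaddress


def lookup(rib: dict[str, list[str]], dst: str) -> list[str]:
    """Return the next hops of the longest prefix in rib that contains dst."""
    addr = ipaddress.ip_address(dst)
    best_net = None
    best_hops: list[str] = []
    for prefix, next_hops in rib.items():
        net = ipaddress.ip_network(prefix)
        if addr in net and (best_net is None or net.prefixlen > best_net.prefixlen):
            best_net, best_hops = net, next_hops
    return best_hops


if __name__ == "__main__":
    # Leaf routing table after node A has disaggregated P1 = 10.0.1.0/24.
    rib = {
        "0.0.0.0/0": ["A", "B"],   # default route via both upper nodes
        "10.0.1.0/24": ["A"],      # more specific route, only via A
    }
    print(lookup(rib, "10.0.1.7"))  # ['A']
    print(lookup(rib, "10.0.2.7"))  # ['A', 'B']
```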
Zero Touch Provisioning
• Only top tier nodes need to be configured
• Nodes that must be leaves or have leaf-leaf connection may be configured
• Nodes with specific configuration can be mixed with others
• Upon connection nodes will fully auto-configure themselves and form
adjacencies in a well defined north/south topology
• With optional east-west connections
• ZTP makes DC fabric like RAM banks
• No one configures RAM banks and CAS/RAS manually in a laptop
• DC fabric HW is largely commodity already
• DC fabric OPEX must and will commoditize
• RIFT enables that
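A rough sketch of how nodes might place themselves in the north/south hierarchy when only the top tier is configured: each unconfigured node takes the highest level seen among its neighbors, minus one. This is a deliberately simplified model of the idea, not the protocol's actual ZTP procedure, which involves more state. The topology, the level numbers and the function name are assumptions.

```python
def derive_levels(links: list[tuple[str, str]],
                  configured: dict[str, int]) -> dict[str, int]:
    """Derive a level for every node from the configured top-tier levels.

    links:      undirected adjacencies between nodes.
    configured: levels set by hand (here: only the top tier).
    """
    neighbors: dict[str, set[str]] = {}
    for a, b in links:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)

    levels = dict(configured)
    changed = True
    while changed:  # repeat until no node changes its level
        changed = False
        for node, adjacent in neighbors.items():
            if node in configured:
                continue
            offers = [levels[n] for n in adjacent if n in levels]
            if not offers:
                continue  # no neighbor has a level yet
            derived = max(offers) - 1
            if derived >= 0 and levels.get(node) != derived:
                levels[node] = derived
                changed = True
    return levels


if __name__ == "__main__":
    links = [
        ("leaf1", "agg1"), ("leaf1", "agg2"),
        ("leaf2", "agg1"), ("leaf2", "agg2"),
        ("agg1", "top1"), ("agg2", "top1"),
    ]
    print(derive_levels(links, {"top1": 2}))
```

With the hypothetical fabric above, configuring only top1 at level 2 places agg1 and agg2 at level 1 and both leaves at level 0.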
Other Features of RIFT
• Optimal Reduction and Load-Balancing of Flooding
• Mobility Support
• Built-in support for rapid prefix moving from one leaf to another
• Key/Value Store
• Fabric Bandwidth Balancing: weighted all paths routing (RIFT is loop-free)
• Northbound: modify the distance of default route received from a neighbor based on available BW through
that neighbor
• Southbound: during SPF consider available BW through lower level nodes
• Segment Routing Support
• Leaf-to-leaf Procedures
• Allow E-W traffic strictly for local prefixes
• Policy Guided Prefixes
• Moved to a separate draft
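The fabric bandwidth balancing item above says the distance of a default route is adjusted according to the bandwidth available through each neighbor. The slide does not give the formula, so the sketch below shows only the intended effect under an assumed rule: traffic is shared across next hops in proportion to available bandwidth. The function name, the neighbor names and the bandwidth figures are hypothetical.

```python
def next_hop_shares(available_bw: dict[str, float]) -> dict[str, float]:
    """Split traffic across next hops in proportion to available bandwidth.

    available_bw maps each northbound neighbor to the bandwidth (any consistent
    unit) still usable through it. Proportional sharing is an assumption made
    for illustration; the slide only states that the route distance is modified.
    """
    total = sum(available_bw.values())
    if total <= 0:
        return {nh: 0.0 for nh in available_bw}
    return {nh: bw / total for nh, bw in available_bw.items()}


if __name__ == "__main__":
    # Neighbor B has lost most of its uplink capacity.
    print(next_hop_shares({"A": 300.0, "B": 100.0}))  # {'A': 0.75, 'B': 0.25}
```

Plain ECMP would send half of the traffic to each neighbor regardless of the capacity left behind it, which is the case weighted routing is meant to avoid.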
Summary of RIFT Advantages
• Advantages of Link-State and Distance
Vector
• Fastest Possible Convergence
• Automatic Detection of Topology
• Minimal Routes/Info on TORs
• High Degree of ECMP
• Fast Decommissioning of Nodes
• Maximum Propagation Speed with Flexible
# Prefixes in an Update
• No Disadvantages of Link-State or
Distance Vector
• Reduced and Balanced Flooding
• Automatic Neighbor Detection
• Unique RIFT Advantages
• True ZTP
• Minimal Blast Radius on Failures
• Can Utilize All Paths Through Fabric
Without Looping
• Automatic Disaggregation on Failures
• Simple Leaf Implementation that Can Scale
Down to Servers
• Key-Value Store
• Horizontal Links Used for Protection Only
• Supports Non-Equal Cost Multipath and
Can Replace MC-LAG
• Optimal Flooding Reduction and Load-Balancing
Standardization/Industry Status
• IETF Standard Track Working Group
• https://datatracker.ietf.org/wg/rift/about/
• Juniper co-chair
• Base protocol standard targeted at Feb 2019
• YANG and other things forthcoming
• Juniper main author with Cisco/Comcast co-authors
• Other design team members from Bloomberg, HPE, Mellanox and
open source community
• Two independent implementations
• Juniper: available on-box (JUNOS) or standalone package (for public
evaluation)
• Open Source: by Bruno Rijsman (on-going)
Appendices
eBGP Solution Nuances
• ASN related knobs/tricks
• Numbering scheme to control path-hunting
• Allow-own-as knob to reuse private ASNs
• “Relaxed ECMP Multipath” since ECMP over different AS does not normally work with eBGP
• Additional work to get beyond 65K ASNs
• To see fabric topology BGP-LS must be run on top
• Add-path to support multi-homing, N-ECMP and prevent oscillation
• Vendor-specific provisioning & configuration
• Emerging work on peer auto-discovery diametrically opposite to BGP design principles
• “Violations” of BGP FSM (e.g. restart and MRAI timers)
• Reliance on peer-groups to prevent withdraw dispersion and path hunting upon server link
failures
• Less than successful attempts at prefix summary without black-holing/micro-loop
REQUIREMENTS BREAKDOWN (RFC7938+) FOR A “MINIMAL OPEX FABRIC”
Peer Discovery/True ZTP/Preventing Cabling Violations ⚠️ ⚠️
Minimal Amount of Routes/Information on ToRs
High Degree of ECMP (BGP needs lots of knobs, memory, own-AS-path
violations) and ideally NEC and LFA
⚠️
Non Equal Cost Multi-Path, ECMP Independent Anycast, MC-LAG
Replacement
Traffic Engineering by Next-Hops, Prefix Modifications
See All Links in Topology to Support PCE/SR ⚠️
Carry Opaque Configuration Data (Key-Value) Efficiently ⚠️
Take a Node out of Production Quickly and Without Disruption
Automatic Disaggregation on Failures to Prevent Black-Holing and
Back-Hauling
Minimal Blast Radius on Failures (On Failure Smallest Possible Part of
the Network “Shakes”)
Fastest Possible Convergence on Failures
Bandwidth Load Balancing
Simplest Initial Implementation
