Switches, Penguins and
One Bad Cable
Alex Balk
Back on AUGUST 13th 2015
ADI NAVEH
ALEX BALK
ALEX KARASIK
CHEN SHABI
DAFNA FRANK
DORI SHMUEL
GAI RADZI
GERARDO LARACUENTE
GUY MAZUZ
RYAN MCQUILLAN
SHAHAF SAGES
YAFIT MELES
We Are
Core
OUR PURPOSE
OUTBRAIN helps people discover
content that they find interesting.
250 Billion
Content Recommendations
Every Month
½ Billion
People Worldwide
OUTBRAIN
BY THE NUMBERS
THE BEGINNING
Building a Layer 3 network with Cumulus Linux
Two Main Networking Challenges
AVAILABILITY
MANAGEMENT
4 racks
80 nodes
320 nodes
1 switch
AVAILABILITY
MANAGEMENT
[Diagram: three Nodes, Stack A, Stack B, Backbone]
SCALE IS ABOUT
DOING MORE WITH LESS
6 Million
Metrics generated every minute
150 Releases
To production every day
OUTBRAIN
BY THE NUMBERS
SCALE IS ABOUT
TURNING THE LIGHTS ON
Network (gasp!) was 100% Manual!
• Every change = risk
• Switching stack proprietary
• Debugging = fight or just a hit-n-miss
• Lead time to set up new stack measured in weeks!
• No way to scale to the next 10X
June 2017
OUTBRAIN OFFICES
New Data Center = Clos Fabric — running BGP end-to-end
[Diagram: three Nodes, Leaf A, Leaf B, three Spines]
No bonding.
No backbone.
Everything is just a router!
All possible paths to all possible destinations are constructed hop-by-hop
[Diagram: three Nodes, Leaf A, Leaf B, three Spines]
ECMP = “Send it down any available path, they’re all the same”.
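A minimal sketch of the ECMP idea above, not the deck's actual configuration. The leaf and spine names, the three-spine count and the SHA-256 hash are all assumptions made for illustration; real switches hash in hardware with a vendor-specific function. It shows that every spine yields an equal-cost path between two leaves, and that hashing a flow's 5-tuple keeps each flow pinned to one of them.

```python
import hashlib

# Hypothetical two-tier Clos fabric; names and sizes are illustrative only.
LEAVES = ["leaf-a", "leaf-b"]
SPINES = ["spine-1", "spine-2", "spine-3"]


def equal_cost_paths(src_leaf, dst_leaf):
    """Return every shortest leaf-to-leaf path: one per spine."""
    if src_leaf == dst_leaf:
        return [[src_leaf]]
    return [[src_leaf, spine, dst_leaf] for spine in SPINES]


def pick_path(flow, paths):
    """Choose a path by hashing the flow's 5-tuple.

    The same flow always maps to the same path, so its packets
    are not reordered; different flows spread across the paths.
    """
    key = "|".join(str(field) for field in flow).encode()
    digest = hashlib.sha256(key).digest()
    index = int.from_bytes(digest[:4], "big") % len(paths)
    return paths[index]


if __name__ == "__main__":
    paths = equal_cost_paths("leaf-a", "leaf-b")
    # (src IP, dst IP, protocol, src port, dst port); made-up values
    flow = ("10.0.1.5", "10.0.2.9", 6, 40000, 443)
    print(len(paths), "equal-cost paths; this flow uses", pick_path(flow, paths))
```

Adding a spine in this model simply adds one more entry to every path list, which is the sense in which the fabric scales out.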
A Network That is Now
SIMPLE
PREDICTABLE
SCALABLE
DEVICE MANAGEMENT
CABLE MANAGEMENT
SETUP TIME
MONITORING
TESTING
SCALE IS ABOUT
BUILDING THE RIGHT CULTURE
SCALE IS ABOUT
CHOOSING THE RIGHT TOOLS
5 DAYS To bootstrap the new datacenter
99% Of code worked as expected
1 Bad cable...out of 3,000
Thank
You
Hardcore Tech Stuff
Slides shamelessly “borrowed”
from Adi Naveh’s internal tech talk
Infranet Team
Gai
Adi Yafit
Chen
Traditional Network Topology
Aggregation
Core
Access
Traditional Network Topology
Access
Aggregation
Core
Services
Clients
North-South Traffic
Load Balancers Load Balancers
ISP
Traditional Network Topology in Data Center
Access
Aggregation
Core
Services
North-South Traffic
East-West Traffic


Editor's Notes

  • #19: Publishers from all around the world
  • #31: In fact, when we designed the new datacenter, we wrote Chef cookbooks to automate provisioning and config. We wrote unit and integration tests using Chef’s toolchain, and set up a CI pipeline for the code. We even simulated the entire datacenter, switches, servers and all, using Vagrant. It worked so well that bootstrapping the new datacenter took us just 5 days. Think about it: the first time we ever saw a real Dell switch running Linux was when we arrived onsite for the buildout. And yet, 99% of our code worked as expected. In 5 days, we were able to set up a LAN, VPN, server provisioning, DNS and LDAP, and dealt with some quirky BIOS configs (on the servers, mind you, not the switches). We even hooked Cumulus’ built-in cabling validation to our Prometheus-based monitoring system, so that right after we turned monitoring on, we got an alert on one bad cable. Out of 3,000.
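
The cabling check described in note #31 can be sketched roughly as follows. This is an illustration under stated assumptions, not Cumulus' validation code or the team's actual exporter: the cabling plan, the observed-neighbor data (for example, as it might be learned via LLDP) and the metric name `cabling_link_ok` are all hypothetical. It compares the expected cabling with what each port actually sees and renders one gauge per link in the Prometheus text exposition format, so a single miswired cable shows up as a single `0`.

```python
# Hypothetical cabling plan: (switch, port) -> (expected remote device, remote port).
EXPECTED = {
    ("leaf-a", "swp1"): ("spine-1", "swp1"),
    ("leaf-a", "swp2"): ("spine-2", "swp1"),
    ("leaf-b", "swp1"): ("spine-1", "swp2"),
    ("leaf-b", "swp2"): ("spine-2", "swp2"),
}


def find_cabling_mismatches(expected, observed):
    """Return (link, wanted, got) for every link that is missing or miswired."""
    mismatches = []
    for link, want in sorted(expected.items()):
        got = observed.get(link)  # None means no neighbor was seen on that port
        if got != want:
            mismatches.append((link, want, got))
    return mismatches


def to_metric_lines(expected, observed):
    """Render one gauge sample per planned link: 1 = matches the plan, 0 = does not."""
    bad = {link for link, _, _ in find_cabling_mismatches(expected, observed)}
    lines = []
    for switch, port in sorted(expected):
        value = 0 if (switch, port) in bad else 1
        lines.append(f'cabling_link_ok{{switch="{switch}",port="{port}"}} {value}')
    return lines


if __name__ == "__main__":
    # Made-up observation: one cable on leaf-b landed on the wrong spine.
    observed = dict(EXPECTED)
    observed[("leaf-b", "swp2")] = ("spine-1", "swp3")
    print("\n".join(to_metric_lines(EXPECTED, observed)))
```

An alerting rule on any sample equal to 0 would then flag the bad link as soon as monitoring is turned on.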