1© 2018 Mellanox Technologies
OpenNebula TechDay, Frankfurt
Sept. 26 2018 | Arne Heitmann, Staff System Engineer, EMEA
Considerations on Smart
Cloud Implementations
2© 2018 Mellanox Technologies
Agenda
 Introduction Mellanox
 Faster storage and networks
 Linbit architecture with RDMA/RoCE
 RDMA/RoCE
 ASAP2 (OVS Offloading)
 Overlay Networking
 EVPN/VXLAN routing
 Conclusion
3© 2018 Mellanox Technologies
Mellanox Overview
 Ticker: MLNX
 Worldwide Offices
 Company Headquarters
• Yokneam, Israel
• Sunnyvale, California
 Employees worldwide
• ~ 2,900
4© 2018 Mellanox Technologies
Comprehensive End-to-End Ethernet Product Portfolio
NICs
Cables
Optics
Switches
Automation & Monitoring
Management software
5© 2018 Mellanox Technologies
Unique Engine in Mellanox Ethernet Switch
 Mellanox switches are powered by
Mellanox superior ASIC
 Wire-speed, cut-through switching at any
packet size
 Zero Jitter
 Low power
 10GbE to 100GbE
 DAC passive copper for 10/25/40/50/100GbE
 Active copper, by comparison, is more expensive, less reliable, and suffers from interoperability issues
 Active Fiber for 10/25/40/50/100GbE
6© 2018 Mellanox Technologies
New Storage Media Require Faster Networks
 Transition to faster storage media requires
faster networks
 Flash SSDs move the bottleneck from the
storage to the network
 What does it take to saturate one 10Gb/s link?
• 24 x HDDs
• 2 x SATA SSDs
• 1 x SAS SSD
• NVMe…
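 A rough back-of-the-envelope check (assuming ~50 MB/s sustained per HDD and ~550 MB/s per SATA SSD): 10Gb/s is about 1.2 GB/s, so roughly two dozen HDDs or a pair of SATA SSDs fill the link, while a single NVMe drive at 3+ GB/s outruns it on its own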
7© 2018 Mellanox Technologies
DRBD and RDMA – Architectural Advantage
8© 2018 Mellanox Technologies
Aside: RoCE – RDMA over Converged Ethernet
 Remote Direct Memory Access (RDMA) is a technology that enables data to be read from and written to a remote server
without involving either one’s CPU, which results in:
 Reduced latency
 Increased throughput
 The CPU freed up to perform other tasks
 Twice the bandwidth with less than
half the CPU utilization
 RoCE needs a lossless network!
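To see these effects directly, the perftest tools that ship with RDMA stacks (e.g. as part of MLNX_OFED) give a quick bandwidth check; the device name and address below are placeholders:

```bash
# On the server: listen for an RDMA write bandwidth test on the first ConnectX device
ib_write_bw -d mlx5_0 --report_gbits

# On the client: run RDMA writes against the server (192.0.2.10 is a placeholder address);
# the data path bypasses both CPUs, so host CPU load stays near zero while the link saturates
ib_write_bw -d mlx5_0 --report_gbits 192.0.2.10
```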
9© 2018 Mellanox Technologies
RoCE Done Right!
[Chart: other switches — pause time (microseconds) vs. time (seconds) shows heavy pauses; the application is blocked by the switch]
[Chart: Spectrum — pause time stays at zero throughout; the application runs smoothly]
10© 2018 Mellanox Technologies
Best Congestion Management For RoCE
 Configuration
 4 hosts connected to 1 switch in a star topology
 ECN enabled, PFC enabled
 3 sources to 1 common destination
 Results
 The other ASIC sends pauses to the hosts; Spectrum sends none
[Chart: Spectrum switch — pause time (microseconds) vs. time (seconds): no pauses]
[Chart: other ASIC-based switch — up to 21% pause time]
11© 2018 Mellanox Technologies
Better RoCE with Fast Congestion Notification
[Diagram: legacy congestion notification marks packets entering the queue; Fast Congestion Notification marks packets exiting the queue]
 Fast Congestion Notification
• Packets marked as they leave queue
• Up to 10ms faster alerts
• Servers react faster
• Reduces average queue depth
- Lowers real world latency
• Improves application performance
 Legacy Congestion Notification:
• Packets marked as they enter queue
• Notification delayed until queue empties
Delay inside switch is equivalent to placing server hundreds of miles away
12© 2018 Mellanox Technologies
Predictable QoS with RoCE
 Configuration
 7 hosts connected to 1 switch in a star topology
 ECN enabled, PFC enabled
 6 sources to 1 common destination
 Results
 Non-equal bandwidth distribution on other ASICs
[Chart: other ASIC-based switch — KBytes/second vs. time: one source host gets 50% of the total bandwidth, while the five others get only 10% each]
[Chart: Spectrum switch — each host gets an equal 16.66% of the total bandwidth]
13© 2018 Mellanox Technologies
RoCE runs best in Lossless Networks
 Configuration can be complex
 2 modes (see the sketch below)
 Enhanced mode for experts
 User mode for easy configuration, covering 98% of implementations
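What those two modes look like in practice: a minimal sketch, assuming Mellanox Onyx switches recent enough to ship the one-command `roce` helper and hosts running MLNX_OFED (interface and port names are placeholders):

```bash
# --- Switch side (Mellanox Onyx) ---
# "User mode": one command sets up PFC, ECN, and buffer configuration for lossless RoCE
switch (config) # roce

# "Enhanced mode": experts configure the pieces explicitly, e.g. PFC per port
switch (config) # dcb priority-flow-control enable force
switch (config) # interface ethernet 1/1 dcb priority-flow-control mode on force

# --- Host side (MLNX_OFED) ---
# Enable PFC for priority 3, the priority commonly used for RoCE traffic
mlnx_qos -i eth2 --pfc 0,0,0,1,0,0,0,0
```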
14© 2018 Mellanox Technologies
NEO Simplifies RoCE Provisioning
 Automated setup of RoCE across entire fabric
 Mellanox switches
 Mellanox NICs
 Ideal for End-to-End Mellanox deployments
 No manual configuration needed
15© 2018 Mellanox Technologies
Para-Virtualized vs. SR-IOV
Single Root I/O Virtualization (SR-IOV)
 PCIe device presents multiple
instances to the OS/Hypervisor
 Enables Application Direct Access
 Bare metal performance for VM
 Reduces CPU overhead
 Enables many advanced NIC
features (e.g. DPDK, RDMA, ASAP2)
[Diagram: para-virtualized path — VMs reach the NIC through a vSwitch in the hypervisor; SR-IOV path — the NIC's eSwitch exposes a Physical Function (PF) to the hypervisor and Virtual Functions (VFs) directly to the VMs]
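Creating VFs uses the standard Linux sysfs and iproute2 interfaces; the interface name, VF count, and MAC below are placeholders:

```bash
# Expose 4 Virtual Functions on the physical function (PF) enp3s0f0
echo 4 > /sys/class/net/enp3s0f0/device/sriov_numvfs

# Pin a MAC address to VF 0 before handing it to a VM, so the guest sees a stable identity
ip link set enp3s0f0 vf 0 mac 52:54:00:aa:bb:01

# The VFs appear as PCI devices that can be passed through to guests (e.g. via libvirt/KVM)
lspci | grep -i "virtual function"
```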
16© 2018 Mellanox Technologies
Introduction to OVS (Open vSwitch)
 Software component that typically provides switching between Virtual Machines
 Target application: Multi-server virtualization deployments
 OVS is an open project. Code and materials at http://openvswitch.org/
 OVS Main Functionality
• Bridge traffic between Virtual-Machines (VMs) on the same Host
• Bridge traffic between VMs and the outside world
• Migration of VMs with all of their associated configuration:
- L2 learning table, L3 forwarding state, ACLs, QoS, Policy and more
• OpenFlow support
• VM tagging and manipulation
• Flow-based switching
• Support for tunneling: VXLAN, GRE and more
 OVS works on KVM, XenServer, OpenStack
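For orientation, a minimal OVS topology can be built with ovs-vsctl; the bridge, port, peer address, and VNI below are placeholders:

```bash
# Create an integration bridge and attach a VM-facing port
ovs-vsctl add-br br-int
ovs-vsctl add-port br-int vnet0

# Add a VXLAN tunnel port toward a peer hypervisor (192.0.2.2 and VNI 100 are placeholders)
ovs-vsctl add-port br-int vxlan0 -- set interface vxlan0 type=vxlan \
    options:remote_ip=192.0.2.2 options:key=100

# Inspect the resulting configuration
ovs-vsctl show
```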
17© 2018 Mellanox Technologies
Open Virtual Switch (OVS) Challenges
 Virtual switches such as Open vSwitch (OVS) are used as the
forwarding plane in the hypervisor
 Virtual switches implement extensive support for SDN (e.g.
enforce policies) and are widely used by the industry
 Supports L2-L3 networking features:
 L2 & L3 Forwarding, NAT, ACL, Connection Tracking etc.
 Flow based
 OVS Challenges:
 Poor packet performance: <1M PPS with 2-4 cores
 Heavy CPU consumption: even with 12 cores, OVS cannot reach one third of 100G NIC speed
 Bad user experience: high and unpredictable latency, packet drops
 Solution
 Offload the OVS data plane into the Mellanox NIC using ASAP2 technology (example on the next slide)
18© 2018 Mellanox Technologies
ASAP2 SR-IOV - Example
• Enable the SR-IOV data path with the OVS control plane
• Use Open vSwitch as the management interface and offload the OVS data plane to the
Mellanox embedded switch (eSwitch) using ASAP2 Direct
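A sketch of the usual enablement sequence, assuming a ConnectX-5 at PCI address 0000:03:00.0 with VFs already created (all addresses and names are placeholders):

```bash
# Unbind the VFs before switching the eSwitch mode (required by the mlx5 driver)
echo 0000:03:00.2 > /sys/bus/pci/drivers/mlx5_core/unbind

# Move the embedded switch from legacy SR-IOV mode to switchdev mode,
# which creates a representor netdev per VF for OVS to manage
devlink dev eswitch set pci/0000:03:00.0 mode switchdev

# Tell OVS to push matched flows down into the NIC hardware
ovs-vsctl set Open_vSwitch . other_config:hw-offload=true
systemctl restart openvswitch

# Attach a VF representor to the bridge; its flows are now offload candidates
ovs-vsctl add-port br-int eth0_0

# Verify which datapath flows actually landed in hardware (newer OVS releases)
ovs-appctl dpctl/dump-flows type=offloaded
```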
19© 2018 Mellanox Technologies
Virtualized Datapath Options Today
 Legacy vSwitch (kernel datapath) — kernel space; the default for OpenStack; switching, bonding, overlay, live migration
 Accelerated vSwitch (Open vSwitch over DPDK) — user space; direct I/O to the NIC or vNIC; switching, bonding, overlay
 SR-IOV (Single-Root I/O Virtualization) — hardware path through the NIC (VF0, VF1); line rate with no CPU overhead; relies on the ToR for switching
20© 2018 Mellanox Technologies
OVS over DPDK vs. OVS Offload – ConnectX-5
ConnectX-5 provides a significant performance boost
 Without adding CPU resources
[Chart: OVS over DPDK — 7.6 MPPS with 4 dedicated hypervisor cores; OVS offload — 66 MPPS with 0 dedicated cores]

Test             | ASAP2 Direct | OVS DPDK        | Benefit
1 flow, VXLAN    | 66M PPS      | 7.6M PPS (VLAN) | 8.6X
60K flows, VXLAN | 19.8M PPS    | 1.9M PPS        | 10.4X
21© 2018 Mellanox Technologies
ASAP2 – Facts (current)
 Offloads:
 Match 12 Tuple and Forward to VM/Network or Drop
 Ethernet Layer 2
 IP (v4 /v6)
 TCP /UDP
 Action:
 Forwarding
 Drop/Allow
 VxLAN Encap/Decap
 VLAN Push/Pop
 ConnectX-4 Lx: Per port
 ConnectX-5: Per flow
 Header Re-write (ConnectX-5): Up to and including Layer 4
 VF mirroring: Mirroring traffic from one VF to another in the same eSwitch
 VF LAG with LACP: Active/Backup and Bonding (50Gb/s from 2 ports of 25GbE)
 Supported OS (as of today):
 RHEL 7.4
 RHEL 7.5
 CentOS 7.2
 Packages required:
 MLNX_OFED 4.4
 iproute2 4.12 and up
 Open vSwitch 2.8 and up (for CentOS 7.2, from Mellanox.com)
22© 2018 Mellanox Technologies
Overlay Networks
 Traditional VLAN-based networks
 Layer 2 segmentation
 VLAN scalability is limited to 4K segments
 No support for VM mobility
 Overlay networks with VXLAN
 Layer 2 over Layer 3 segmentation
 VXLAN scales to 16M segments
 Support for VM mobility
 Multi tenant isolation
 Overlay networks run as independent virtual networks on top of a physical network infrastructure
23© 2018 Mellanox Technologies
VXLAN Overview
 VXLAN - Virtual eXtensible Local Area Network:
 A standard overlay protocol that enables multiple layer 2
logical networks over a single underlay layer 3 network
 Each virtual network is a VXLAN logical layer 2 segment
 Encapsulates MAC-based layer 2 Ethernet frames in layer 3
UDP/IP packets
 Uses a 24-bit VXLAN network identifier (VNI) in the VXLAN
header, hence scales to 16 million layer 2 segments
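For illustration, the same construct exists in plain Linux via iproute2; the VNI, device names, and address below are placeholders:

```bash
# Create VNI 100, encapsulating over UDP port 4789 (the IANA-assigned VXLAN port);
# nolearning suits deployments where a control plane (e.g. EVPN) distributes MACs
ip link add vxlan100 type vxlan id 100 dstport 4789 local 10.0.0.1 nolearning

# Attach the VXLAN device to a Linux bridge so local VMs/containers join the segment
ip link add br100 type bridge
ip link set vxlan100 master br100
ip link set vxlan100 up && ip link set br100 up
```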
24© 2018 Mellanox Technologies
VTEP - VXLAN Tunnel End Point
 VTEP on the host
 VXLAN agents run in the host's hypervisor
 Encapsulation/de-encapsulation in software
 Degraded performance
 Mellanox network adapters support VXLAN offloads;
encapsulation/de-encapsulation can be offloaded to
the NIC (see the check below)
 VTEP on the ToR
 VXLAN agents run on ToRs
 Encapsulation/de-encapsulation in switch hardware
 Efficient performance
 Cumulus Linux running on Mellanox switches
supports VTEP on the switch
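A quick way to check whether a NIC exposes these offloads is ethtool's feature flags (the interface name is a placeholder):

```bash
# tx-udp_tnl-segmentation indicates the NIC can segment VXLAN-encapsulated traffic in hardware
ethtool -k enp3s0f0 | grep tnl
```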
25© 2018 Mellanox Technologies
VTEP on the Host - Accelerating Overlay Networks
 Virtual overlay networks simplify management and VM migration
 Overlay Accelerators in NIC enable Bare Metal Performance
26© 2018 Mellanox Technologies
VTEP on the ToR
 VTEP on the ToR enables scalability and flexibility
 Multitenancy / integration of legacy services
27© 2018 Mellanox Technologies
Why BGP-EVPN + VXLAN?
 BGP-EVPN is an open controller-less solution
 Controllers are centralized and limit the scale of the solution
 Controllers lock customers into certain technologies and increase costs
 BGP-EVPN with VXLAN is a better alternative to legacy VPLS/VLL
28© 2018 Mellanox Technologies
VXLAN Bridging with EVPN
 Ethernet Virtual Private Network (EVPN)
 Often used to implement controller-less VXLAN
 Standards-based control plane for VXLAN, defined in RFC 7432
 Relies on multi-protocol BGP (MP-BGP) for information
exchange
 Key features include:
 VNI membership exchange between VTEPs
 Exchange of host MAC and IP addresses
 Support for host/VM mobility (MAC and IP moves)
 Support for inter-VXLAN routing
 Support for layer 3 multi-tenancy with VRFs
 Support for dual-attached hosts via VXLAN active-active
mode.
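On switches running Linux network OSes (e.g. Cumulus Linux with FRR), a minimal EVPN control-plane sketch looks like this; the ASN and neighbor address are placeholders:

```
router bgp 65001
 neighbor 10.0.0.254 remote-as external
 !
 address-family l2vpn evpn
  neighbor 10.0.0.254 activate
  ! announce every locally configured VNI to EVPN peers
  advertise-all-vni
 exit-address-family
```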
29© 2018 Mellanox Technologies
VXLAN Routing Modes
 EVPN supports three VXLAN routing modes:
 Centralized routing:
 Specific VTEPs act as designated Layer 3 gateways and route between subnets
 Other VTEPs just act as bridges
 Distributed asymmetric routing:
 Every VTEP participates in routing
 For a given flow, only the ingress VTEP routes; the egress VTEP acts as a bridge
 Distributed symmetric routing:
 Every VTEP participates in routing
 Both the ingress VTEP and the egress VTEP participate in routing
30© 2018 Mellanox Technologies
Distributed VXLAN Routing
 Distributed asymmetric routing:
 Each VTEP acts as a layer 3 gateway, performing routing for its
attached hosts
 Only the ingress VTEP performs routing; the egress VTEP only
performs bridging
 Advantages:
 Easy to deploy, with no additional special VNIs
 Fewer routing hops between VXLANs
 Disadvantage:
 Each VTEP must be provisioned with all VLANs/VNIs
 Distributed symmetric routing:
 Each VTEP acts as a layer 3 gateway, performing routing for its
attached hosts
 Both the ingress VTEP and the egress VTEP route the packets
 A special transit VNI, called the L3VNI, is used for all routed
VXLAN traffic (configuration sketch below)
 Advantage:
 Each VTEP only needs its local VLANs and VNIs, plus the L3VNI and
its associated VLAN
 Disadvantages:
 More complex configuration
 An extra routing hop that may add latency
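In FRR/Cumulus terms, symmetric routing ties each tenant VRF to its transit L3VNI; a minimal sketch with hypothetical tenant name, VNI, and ASN:

```
! map the tenant VRF to L3VNI 4001, which carries all routed VXLAN traffic for this tenant
vrf tenant1
 vni 4001
exit-vrf
!
router bgp 65001 vrf tenant1
 address-family l2vpn evpn
  ! export the tenant's routes as EVPN type-5 (IP prefix) routes
  advertise ipv4 unicast
 exit-address-family
```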
31© 2018 Mellanox Technologies
Conclusion
 Cloud infrastructures with highly virtualized topologies, storage, and compute
 Provide flexibility at any scale
 Require intelligent use of protocol and feature tool sets
 Fast, distributed storage requires higher bandwidth and efficient CPU management
 RoCE done right accelerates storage performance
 VMs require internal and external communication over a virtually switched infrastructure
 ASAP2 helps take the load off OVS, and thus the CPU, while optimizing the communication path
 Highly virtualized environments need to extend L2 segmentation beyond VLAN limits and across L3
boundaries
 Overlay networking with VXLAN adds scale
 VXLAN with EVPN adds flexibility and manageability
 VXLAN routing adds agility
32© 2018 Mellanox Technologies
Thank You
33© 2018 Mellanox Technologies
For Further Reading
Addendum
 RoCE/RDMA:
 Mellanox RoCE Homepage
 Boosting Performance With RDMA – A Case Study
 What is RDMA?
 RDMA/RoCE Solutions
 Recommended Network Configuration Examples for RoCE Deployment
 ASAP2:
 Mellanox ASAP2 Homepage
 Getting started with Mellanox ASAP^2
 The Ideal Network for Containers and NFV Microservices
 Overlay Networking/VXLAN/EVPN
 EVPN with Mellanox Switches
 Top 3 considerations for picking your BGP EVPN VXLAN infrastructure
 VXLAN is finally simple, use EVPN and set up VXLAN in 3 steps
 Mellanox Ethernet Solutions
 Mellanox Open Ethernet Switches
 Mellanox Open Ethernet Switches Product Brief