Ethernet Services over IPoIB
Ali Ayoub, Mellanox Technologies
March, 2012

www.openfabrics.org

Background
• What is IPoIB?
– IP encapsulation over InfiniBand
– RFC 4391/4392
– Provides IP services over InfiniBand fabric

• Benefits:
– Acts like a data-link layer within the TCP/IP stack
• Socket-based apps run transparently
– Allows users to run unmodified IP-based applications on InfiniBand
– Supports NIC offloads (CSUM, TSO, etc.) and other performance features such as NAPI, TSS, RSS, etc.

IPoIB Packet Format
• IPoIB integrated as Data Link layer
• Sockets API is not affected
• TCP/IP stack, top to bottom: L5 Sockets API, L4, L3, L2 (IPoIB), L1 (IB)
• Packet frame: IB Header | IPoIB Header | IP Header | Payload
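The packet frame above can be illustrated with a small Python sketch. It assumes the 4-byte encapsulation header described in RFC 4391 (a 16-bit protocol type followed by 16 reserved bits); the function names are hypothetical, and the IB header, which the InfiniBand layer adds, is left out.

```python
import struct
from typing import Tuple

# Protocol type values carried in the IPoIB header (same values as Ethernet ethertypes).
ETH_P_IP = 0x0800
ETH_P_ARP = 0x0806

IPOIB_HDR_LEN = 4  # 16-bit type + 16-bit reserved field (assumed per RFC 4391)


def ipoib_encap(proto: int, payload: bytes) -> bytes:
    """Prepend the IPoIB encapsulation header to an IP (or ARP) packet."""
    return struct.pack("!HH", proto, 0) + payload


def ipoib_decap(frame: bytes) -> Tuple[int, bytes]:
    """Split an IPoIB frame into (protocol type, payload)."""
    if len(frame) < IPOIB_HDR_LEN:
        raise ValueError("frame shorter than the IPoIB header")
    proto, _reserved = struct.unpack("!HH", frame[:IPOIB_HDR_LEN])
    return proto, frame[IPOIB_HDR_LEN:]
```

Note that nothing in this header carries an Ethernet source or destination address, which is the root of the limitations listed on the following slides.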

Limitations
• IPoIB is limited to IP applications
– No Ethernet header encapsulation
• IPoIB network interface doesn’t act like a standard Ethernet network device
– Non-standard Ethernet interface utilities & characteristics
– 20-byte link-layer address
– Ethernet link-layer (MAC) address is 6 bytes
– DHCP requires using a dhcp-client identifier

– Host administrator must be aware of PKEYs
• While Ethernet interfaces support VLANs
• vconfig command cannot be used

Limitations cont.
• Link Layer setting, modification, migration
– IPoIB Link Layer address is based on QPN/GID
• Cannot be controlled by the user

– Host administrator cannot set or re-configure the link layer address (MAC)
[Figure: IPoIB Link Layer (RFC 4391)]
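To make the QPN/GID dependency concrete, here is a minimal Python sketch of the 20-byte IPoIB link-layer address. It assumes the layout given in RFC 4391: one reserved (flags) byte, a 24-bit QPN, and a 128-bit GID. The function names and the example GID are hypothetical.

```python
import struct
from typing import Tuple

IPOIB_ALEN = 20  # 1 reserved/flags byte + 3-byte QPN + 16-byte GID
ETH_ALEN = 6     # an Ethernet MAC, for comparison


def pack_ipoib_addr(qpn: int, gid: bytes, flags: int = 0) -> bytes:
    """Build a 20-byte IPoIB link-layer address from its components."""
    if not 0 <= qpn < (1 << 24):
        raise ValueError("QPN must fit in 24 bits")
    if not 0 <= flags < (1 << 8):
        raise ValueError("flags must fit in 8 bits")
    if len(gid) != 16:
        raise ValueError("GID must be 16 bytes")
    # First 32 bits: flags in the top byte, QPN in the low 24 bits.
    return struct.pack("!I", (flags << 24) | qpn) + gid


def unpack_ipoib_addr(addr: bytes) -> Tuple[int, int, bytes]:
    """Return (flags, qpn, gid) from a 20-byte IPoIB link-layer address."""
    if len(addr) != IPOIB_ALEN:
        raise ValueError("IPoIB link-layer address must be 20 bytes")
    (word,) = struct.unpack("!I", addr[:4])
    return word >> 24, word & 0xFFFFFF, addr[4:]
```

Because both the QPN and the GID are assigned by the fabric and the driver rather than by the administrator, the address cannot simply be overwritten the way an Ethernet MAC can.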

Limitations cont.
• IPoIB cannot be used in fully-virtualized or para-virtualized environments
– Hypervisor networking models normally use “bridged mode”
– A virtual switch (vSwitch) is an Ethernet L2 switch
– An IPoIB NIC cannot be enslaved to a vSwitch
• Promiscuous mode is unsupported
– Promiscuous mode on the uplink simplifies vSwitch functionality

IPoIB & Network Virtualization
• VM “sees” an Ethernet network device (e1000, virtio, vmxnet, etc.)
• Packet sent out by the VM includes an Ethernet header
• vSwitch forwards the Ethernet packet to the host network interface
• Native IPoIB driver serves only IP packets (doesn’t expect an Ethernet header)
[Figure: Network Virtualization Model, showing the hypervisor with virtual interface(s) (vifX), virtual switch(es) (vSwitchX: vSwitch0, vSwitch1), and uplink(s) (pifX: pif0, pif1)]
Reference http://www.vmware.com/resources/techresources/997

Solution
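• Create a new Ethernet network device over the IPoIB interface
• Managed by the “eIPoIB” kernel module
• Registers a standard Ethernet interface into the operating system
• Same “Look & Feel” as an Ethernet NIC
– ifconfig, vconfig, ethtool, promiscuous, etc.
• eIPoIB is a shim layer driver between the TCP/IP stack and the native IPoIB
[Figure: layer stack, top to bottom: TCP/IP, Ethernet, eIPoIB shim layer, IPoIB, IB layer. IP frames pass between TCP/IP and Ethernet, Ethernet frames between Ethernet and the eIPoIB shim layer, IP frames between the shim layer and IPoIB, and IPoIB frames between IPoIB and the IB layer. On transmit the shim layer strips the Ethernet header and IPoIB pushes the IPoIB header; on receive IPoIB strips the IPoIB header and the shim layer pushes the Ethernet header.]

eIPoIB vs. IPoIB Interface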
• ifconfig ib0

• ifconfig eth3

eIPoIB Operations in a Nutshell
• Initialization:
– Register Ethernet network interface (eth0)
– Map it to native IPoIB interface (ib0)
• Transmit:
– Packet to be sent has an Ethernet header
– eIPoIB strips the Ethernet header
– Pass the IP packet to the underlying IPoIB interface
• Non IP/ARP/RARP packets are dropped
– Native IPoIB interface sends the packet out
• Receive:
– Packet received is an IP packet
– Native IPoIB hands the IP packet over to the eIPoIB layer
– eIPoIB pushes the Ethernet header
– Packet is forwarded to upper layers as a regular Ethernet frame
[Figure: same layer stack as on the Solution slide: TCP/IP, Ethernet, eIPoIB shim layer, IPoIB, IB layer]
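The transmit and receive steps above can be sketched in Python. This is an illustration under stated assumptions, not the eIPoIB driver code: it collapses the eIPoIB and native IPoIB steps into one function per direction, assumes the 4-byte IPoIB header of RFC 4391, treats "IP" as both IPv4 and IPv6, and ignores VLAN tags. All names are hypothetical.

```python
import struct
from typing import Optional

ETH_HDR_LEN = 14    # dst MAC (6) + src MAC (6) + ethertype (2)
IPOIB_HDR_LEN = 4   # 16-bit type + 16-bit reserved (assumed per RFC 4391)

# Ethertypes the shim serves: IPv4, IPv6, ARP, RARP. Everything else is dropped.
SERVED_TYPES = {0x0800, 0x86DD, 0x0806, 0x8035}


def eipoib_xmit(eth_frame: bytes) -> Optional[bytes]:
    """Transmit: strip the Ethernet header, push the IPoIB header.

    Returns None when the frame is dropped.
    """
    if len(eth_frame) < ETH_HDR_LEN:
        return None
    (ethertype,) = struct.unpack("!H", eth_frame[12:14])
    if ethertype not in SERVED_TYPES:
        return None  # non IP/ARP/RARP packets are dropped
    return struct.pack("!HH", ethertype, 0) + eth_frame[ETH_HDR_LEN:]


def eipoib_recv(ipoib_frame: bytes, dst_mac: bytes, src_mac: bytes) -> bytes:
    """Receive: strip the IPoIB header, push an Ethernet header.

    The two MACs are supplied by the caller; how they are derived is the
    subject of the MAC Translation slides.
    """
    if len(ipoib_frame) < IPOIB_HDR_LEN:
        raise ValueError("frame shorter than the IPoIB header")
    proto, _reserved = struct.unpack("!HH", ipoib_frame[:IPOIB_HDR_LEN])
    return dst_mac + src_mac + struct.pack("!H", proto) + ipoib_frame[IPOIB_HDR_LEN:]
```

A round trip through `eipoib_xmit` and `eipoib_recv` with the original MAC addresses reproduces the original Ethernet frame, since only the link-layer header changes on the way.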

eIPoIB Frame Flow
[Figure: frame contents at each layer on the transmit path]
• TCP/IP (IP frame): IP Header | IP Data (TCP/UDP)
• Ethernet (Ethernet frame): Ethernet Header | Ethernet Data, where Ethernet Data is IP Header | IP Data (TCP/UDP)
• eIPoIB shim layer: strip Ethernet header
• IPoIB: push IPoIB header, giving IPoIB Header | IP Data
• IB layer (IPoIB frame): InfiniBand Header | IPoIB Header | IP Data | CRC

eIPoIB/IPoIB Interoperability
• Same wire protocol
• eIPoIB “talks” IPoIB
[Figure: an eIPoIB host stack (TCP/IP, Ethernet, eIPoIB shim layer, IPoIB, IB layer) next to a native IPoIB host stack (TCP/IP, IPoIB, IB layer). Frame on the wire: InfiniBand Header | IPoIB Header | Ethernet Data | CRC]

eIPoIB Networking Model (KVM)

[Figure: KVM networking model; labelled interfaces: eth0, ib0]

Reference http://www.linux-kvm.org/wiki

eIPoIB Design (KVM)
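[Figure: KVM design stack, top to bottom: VMs (vm1, vm2, ..., vm.x) with virtual interfaces vif0.1 to vif0.x; QEMU tap devices tap0.1 to tap0.x; bridge br0; eth0; IPoIB interfaces ib0.0 to ib0.x; ib0; IB port 0. The host TCP/IP stack and CMA appear alongside.]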

eIPoIB – Closer Look
• Promiscuous mode
• MAC Translation

Promiscuous Mode
• Normally the vSwitch uplink is put in promiscuous mode
– InfiniBand doesn’t support promiscuous mode
• Promiscuous mode is simulated by:
a. Snooping the src.mac and VLAN of outgoing packets
b. The OS notifying the driver when a new MAC/VLAN needs to be “served”. For example, the driver can get a notification from the libvirt library (available for KVM/Xen) when a new VM virtual NIC is created
• Multicast promiscuous support requires IGMP/MLD snooping at the eIPoIB level
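The two unicast mechanisms above, (a) snooping and (b) explicit notification, can be sketched as a table of served (MAC, VLAN) pairs. This is a hypothetical illustration in Python: the class and method names are invented, only a single 802.1Q tag is parsed, and multicast (IGMP/MLD) snooping is not covered.

```python
import struct
from typing import Optional, Set, Tuple

ETH_P_8021Q = 0x8100  # ethertype of an 802.1Q VLAN tag


class ServedTable:
    """Hypothetical table of (MAC, VLAN) pairs the uplink should serve."""

    def __init__(self) -> None:
        self._served: Set[Tuple[bytes, Optional[int]]] = set()

    def add(self, mac: bytes, vlan: Optional[int] = None) -> None:
        """Path (b): an explicit notification that a MAC/VLAN must be served."""
        self._served.add((bytes(mac), vlan))

    def snoop_tx(self, eth_frame: bytes) -> None:
        """Path (a): learn the source MAC and VLAN of an outgoing frame."""
        if len(eth_frame) < 14:
            return
        src = eth_frame[6:12]
        (ethertype,) = struct.unpack("!H", eth_frame[12:14])
        vlan = None
        if ethertype == ETH_P_8021Q and len(eth_frame) >= 18:
            (tci,) = struct.unpack("!H", eth_frame[14:16])
            vlan = tci & 0x0FFF  # low 12 bits of the tag are the VLAN ID
        self._served.add((src, vlan))

    def is_served(self, mac: bytes, vlan: Optional[int] = None) -> bool:
        """Would the simulated promiscuous uplink accept this MAC/VLAN?"""
        return (bytes(mac), vlan) in self._served
```

Snooping alone only learns a VM after it has sent its first packet, which is one reason an explicit notification path is useful as well.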

MAC Translation
• Requirements:
– eIPoIB exposes an Ethernet MAC
• Local Ethernet MAC (LEMAC), 6 bytes long
– eIPoIB neighbors are seen as Ethernet neighbors
• Remote Ethernet MAC (REMAC), 6 bytes long
– Local IPoIB MAC is used on the wire
• Local IPoIB MAC (LIMAC), 20 bytes long
– IPoIB neighbors’ MAC is used on the wire
• Remote IPoIB MAC (RIMAC), 20 bytes long

MAC Translation cont.
Receive Flow
• Receiver QP => dst.mac
• SQPN/SLID => src.mac
– Remember QPN/LID to IPoIB-MAC (QPN/GID) mapping
• IPoIB header => Ethernet ethertype
• Replace IPoIB header by Ethernet header

Transmit Flow
• src.mac => QP (child interface)
– Source MAC normally controlled by the host/hypervisor admin
• dst.mac => IPoIB-MAC
• Ethernet header ethertype => IPoIB header
• Strip Ethernet header, and hand over packet to IPoIB

ARP/NDP packets are modified in the TX/RX flow, so the MAC addresses in the packet payload are updated accordingly
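The neighbor lookups in both flows can be sketched as a pair of dictionaries. This Python sketch is an assumption-laden illustration: the slides do not say how a 6-byte REMAC is chosen for a remote IPoIB neighbor, so the mapping is supplied by the caller, and the class and method names are hypothetical. It assumes the QPN occupies bytes 1 to 3 of the 20-byte IPoIB address, as in RFC 4391.

```python
from typing import Dict, Optional, Tuple


class MacTranslator:
    """Hypothetical neighbor table translating between REMAC and RIMAC."""

    def __init__(self) -> None:
        self._tx: Dict[bytes, bytes] = {}            # REMAC -> RIMAC
        self._rx: Dict[Tuple[int, int], bytes] = {}  # (SQPN, SLID) -> REMAC

    def learn(self, remac: bytes, rimac: bytes, slid: int) -> None:
        """Remember a neighbor: its Ethernet MAC, IPoIB MAC, and source LID."""
        if len(remac) != 6 or len(rimac) != 20:
            raise ValueError("REMAC must be 6 bytes and RIMAC 20 bytes")
        qpn = int.from_bytes(rimac[1:4], "big")  # QPN field of the IPoIB MAC
        self._tx[bytes(remac)] = bytes(rimac)
        self._rx[(qpn, slid)] = bytes(remac)

    def tx_lookup(self, dst_mac: bytes) -> Optional[bytes]:
        """Transmit flow: dst.mac => IPoIB-MAC."""
        return self._tx.get(bytes(dst_mac))

    def rx_lookup(self, sqpn: int, slid: int) -> Optional[bytes]:
        """Receive flow: SQPN/SLID => src.mac."""
        return self._rx.get((sqpn, slid))
```

The same table would also be consulted when rewriting the hardware addresses inside ARP/NDP payloads, which this sketch leaves out.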

Questions?


