Simple is better
Building fast IPv6 transition mechanisms
on Snabb Switch
31 January 2016 – FOSDEM 2016
Katerina Barone-Adesi kbarone@igalia.com
Andy Wingo wingo@igalia.com
Snabb Switch
A toolkit for building network functions
High performance, flexible, hackable
data plane
The Tao of Snabb
Simple > Complex
Small > Large
Commodity > Proprietary
Simple > Complex
How do we compose network functions
from smaller parts?
Build the inside of a network function
like composing UNIX pipelines
intel10g | reassemble | lwaftr | fragment
| intel10g
Apps independently developed, linked
together at run-time
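The pipeline idea above can be sketched in a few lines of Python. This is only a toy model under stated assumptions: the `App`, `connect` and `run` names, and the two stand-in stages `tag` and `fragment`, are hypothetical and are not Snabb's API.

```python
from collections import deque

class App:
    """Toy app: drains packets from an input link and pushes results onward."""
    def __init__(self, fn):
        self.fn = fn          # per-packet transform; returns a list of packets
        self.input = deque()  # link feeding this app
        self.output = None    # next app's input link, set by connect()

    def push(self):
        while self.input:
            for out in self.fn(self.input.popleft()):
                self.output.append(out)

def connect(apps, sink):
    """Link independently written apps at run time, like `a | b` in a shell."""
    for left, right in zip(apps, apps[1:]):
        left.output = right.input
    apps[-1].output = sink

def run(apps):
    """One pass over the graph: each app drains its input once, in order."""
    for app in apps:
        app.push()

# Hypothetical stand-ins for real stages such as lwaftr or fragment.
def tag(pkt):
    return [b"v6:" + pkt]

def fragment(mtu):
    def split(pkt):
        return [pkt[i:i + mtu] for i in range(0, len(pkt), mtu)]
    return split

sink = deque()
apps = [App(tag), App(fragment(4))]
connect(apps, sink)
apps[0].input.append(b"hello")
run(apps)
print(list(sink))
```

Each stage only knows its own input and output links, which is what lets stages be developed separately and wired together late.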
Simple > Complex
What is a packet?
struct packet {
  unsigned char data[10*1024];  /* payload buffer: up to 10 KiB */
  uint16_t length;              /* number of bytes of data in use */
};
Small > Large
Early code budget: 10000 lines
Build in a minute
Constraints driving creativity
Small > Large
Secret weapon: LuaJIT
High performance with minimal fuss
Small > Large
Minimize dependencies
1 minute build budget includes LuaJIT
and all deps
Deliverable is single binary
Small > Large
Writing our own drivers, in Lua
User-space networking
● The data plane is our domain, not the kernel’s
● Not DPDK’s either!
● Fits in 10000-line budget
Commodity > Proprietary
Open source (Apache 2.0)
Commodity > Proprietary
Open data sheets
Intel 82599 10Gb, soon up to 100Gb
Soon: Mellanox (they have agreed to
release the data sheet!)
Also Linux tap interfaces, virtio host and
guest
Commodity > Proprietary
Double down on 64-bit x86 servers
Prefer CPU over NIC where possible
Embrace the memory hierarchy
Storytime!
“We need to do work on data... but
there’s just so much of it and it’s really
far away.”
Storytime!
Modern x86: who’s winning?
Clock speeds have been flat for years
Main memory just as far away
HPC people are winning
“We need to do work on data... but
there’s just so much of it and it’s really
far away.”
Three primary improvements:
● CPU can work on more data per cycle, once data in registers
● CPU can load more data per cycle, once it’s in cache
● CPU can make more parallel fetches to L3 and RAM at once
Networking folks can win too
Instead of chasing zero-copy and tying
yourself to ever-more-proprietary
features of your NIC, just take the hit
once: DDIO into L3.
Copy if you need to – copies within L3
are not expensive.
Software will eat the world!
Networking folks can win too
Once in L3, you have:
● wide loads and stores via AVX2 and soon AVX-512 (64 bytes!)
● pretty good instruction-level parallelism: up to 16 concurrent L2 misses per core on Haswell
● wide SIMD: checksum in software!
● software, not firmware
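To make "checksum in software" concrete, here is a scalar sketch of the 16-bit ones' complement Internet checksum (RFC 1071). It shows only the arithmetic; it is not SIMD and not Snabb's implementation, and the function name is an assumption for illustration. A SIMD version would compute the same sum over wider lanes.

```python
def internet_checksum(data: bytes) -> int:
    """16-bit ones' complement checksum over data, as described in RFC 1071."""
    if len(data) % 2:
        data += b"\x00"                      # pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]  # sum big-endian 16-bit words
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)  # fold carries back in
    return ~total & 0xFFFF

# Example words taken from the worked example in RFC 1071.
sample = bytes.fromhex("0001f203f4f5f6f7")
print(hex(internet_checksum(sample)))
```

Appending the checksum to the data and summing again yields zero, which is how a receiver verifies it.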
</storytime>
So what about the lwAFTR?
IPv6 transition on Snabb: a lwAFTR
Why IPv6?
● The IPv4 address space is exhausted
- IANA top level exhaustion in 2011
- 4/5 Regional Internet Registries exhausted
- September 2012 in Europe
- September 2015 in the US
- AfriNIC within the next few years
● The internet is still growing
● Moving to IPv6 helps
IPv6 transition mechanisms
● Users want everything to continue working
… including IPv4 websites, networked games,
etc
● Some user equipment cannot do IPv6
● Several options: NAT64, 464XLAT, DS-Lite...
Why Lightweight 4over6?
● Similar to DS-Lite, but less centralized state
● Share IPv4 addresses between users
● Each user gets a port range
● Allows providers to have a simpler architecture:
pure IPv6, not dual-stacked IPv4 and IPv6, in
their internal network
● Standardized as RFC 7596 in 2015
Two main parts: B4 and AFTR
● Both encapsulate and decapsulate IPv4-in-IPv6
● Each user (subscriber) has a B4
● The network provider has one or more AFTRs,
which store per-subscriber (not per-flow)
information
● The information: The B4's IPv6 address, IPv4
address, and port range.
lw4o6 architecture
lw4o6 address sharing
IPv4 is tunnelled in IPv6
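As a rough illustration of the tunnelling both the B4 and the AFTR perform, the sketch below wraps an IPv4 packet in a fixed 40-byte IPv6 header whose next-header field is 4 (IPv4 encapsulation). The function names, the documentation-prefix addresses and the default hop limit of 64 are assumptions for the example; fragmentation, ICMP and binding-table checks are left out.

```python
import struct
from ipaddress import IPv6Address

IPPROTO_IPIP = 4       # next-header value: the payload is an IPv4 packet
IPV6_HEADER_LEN = 40   # fixed IPv6 header size in bytes

def encapsulate(ipv4_packet: bytes, src: str, dst: str, hop_limit: int = 64) -> bytes:
    """Prepend an IPv6 header (no extension headers) to an IPv4 packet."""
    header = struct.pack(
        "!IHBB",
        6 << 28,            # version 6, traffic class 0, flow label 0
        len(ipv4_packet),   # payload length
        IPPROTO_IPIP,
        hop_limit,          # assumed default, not taken from the slides
    )
    return header + IPv6Address(src).packed + IPv6Address(dst).packed + ipv4_packet

def decapsulate(ipv6_packet: bytes) -> bytes:
    """Strip the IPv6 header and return the inner IPv4 packet."""
    if ipv6_packet[6] != IPPROTO_IPIP:
        raise ValueError("payload is not an encapsulated IPv4 packet")
    (length,) = struct.unpack("!H", ipv6_packet[4:6])
    return ipv6_packet[IPV6_HEADER_LEN:IPV6_HEADER_LEN + length]

inner = bytes(range(20))  # stand-in for a real IPv4 packet
outer = encapsulate(inner, "2001:db8::1", "2001:db8::2")
```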
Snabb lwAFTR
● Started July 2015
● Proof of concept data plane October 2015
● It's already usable and fast.
● http://github.com/igalia/snabbswitch/
- lwaftr* branches
- Apache License v2
Performance
● Hardware: two 10-gigabit NICs
- Intel 82599ES, SFI/SFP+
● Xeon processor: E5-2620 v3 @ 2.40GHz
● Snabb-lwaftr alpha release
● 550-byte packets
● Over 4 million packets/second
→ over 17 gigabit/second handled on one core
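The throughput figure follows directly from the packet size and packet rate. A quick check, using 4 million packets per second as the lower bound the slide gives:

```python
PACKET_BYTES = 550
PACKETS_PER_SECOND = 4_000_000  # lower bound quoted on the slide

bits_per_second = PACKET_BYTES * 8 * PACKETS_PER_SECOND
gbps = bits_per_second / 1e9
print(f"{gbps:.1f} Gbit/s")
```

With two 10-gigabit NICs the theoretical ceiling is 20 Gbit/s, so the result sits close to line rate for this packet size.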
Challenges
● Correctly handling ICMP
- conveying failure information, for instance to
an IPv4 host if a failure occurs within the tunnel
● Speed
● Speed with a lot of subscribers
● Correctness
● Hairpinning
Hairpinning: client-to-client traffic
And we’re back
Implementation challenges
Binding table lookup - Port partition
When to hairpin?
Virtualization
Policy
Configuration
Binding table lookup
Say, Belgium: millions of tunnels
Per-tunnel: IPv4, IPv6 of B4, port set ID
At least 4 + 16 + 2 = 22 bytes
2M entries: 44MB
You can’t fit it in L3.
Binding table lookup
So always budget for an L3 cache miss –
but only one!
4 MPPS in: 250 ns/packet
One cache miss RTT (80 ns) within
budget
Many fetches can happen in that RTT
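The numbers on the two slides above can be reproduced with a few lines of arithmetic. The L3 size below is an assumption (15 MB, the published cache size for the Xeon E5-2620 v3 used in the performance test); everything else comes from the slides.

```python
ENTRY_BYTES = 4 + 16 + 2          # IPv4 + IPv6 of B4 + port set ID
ENTRIES = 2_000_000
table_bytes = ENTRY_BYTES * ENTRIES

L3_BYTES = 15 * 10**6             # assumption: 15 MB L3 on the test CPU
fits_in_l3 = table_bytes <= L3_BYTES

PACKETS_PER_SECOND = 4_000_000
budget_ns = 1e9 / PACKETS_PER_SECOND   # time available per packet
MISS_RTT_NS = 80                       # one L3 miss round trip, from the slide
slack_ns = budget_ns - MISS_RTT_NS     # what is left for everything else

print(table_bytes, fits_in_l3, budget_ns, slack_ns)
```

One miss leaves 170 ns per packet; a second dependent miss would take most of what remains, which is why the design aims for exactly one.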
Binding table lookup: v1
Open-addressed robin-hood hash table
with linear probing
Result probably right where we first look
for it, otherwise in adjacent memory,
might fetch adjacent cache lines
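A minimal sketch of an open-addressed Robin Hood table with linear probing follows. It is an illustration of the general technique, not the lwAFTR's actual table: the class name, the wrap-around probing and the use of Python's built-in `hash` on integer keys are all assumptions made to keep the example short.

```python
class RobinHoodTable:
    """Open addressing, linear probing, Robin Hood displacement."""
    def __init__(self, size):
        self.size = size
        self.slots = [None] * size   # each slot: (key, value, displacement)
        self.count = 0

    def _home(self, key):
        return hash(key) % self.size

    def insert(self, key, value):
        if self.count >= self.size:
            raise RuntimeError("table full")
        i, dist = self._home(key), 0
        while True:
            slot = self.slots[i]
            if slot is None:                 # free slot: place entry here
                self.slots[i] = (key, value, dist)
                self.count += 1
                return
            if slot[0] == key:               # existing key: update in place
                self.slots[i] = (key, value, dist)
                return
            if slot[2] < dist:               # resident is "richer": swap and
                self.slots[i] = (key, value, dist)   # carry it onward
                key, value, dist = slot
            i = (i + 1) % self.size
            dist += 1

    def lookup(self, key):
        i, dist = self._home(key), 0
        while True:
            slot = self.slots[i]
            # An empty slot, or a resident closer to home than we are,
            # proves the key is absent.
            if slot is None or slot[2] < dist:
                return None
            if slot[0] == key:
                return slot[1]
            i = (i + 1) % self.size
            dist += 1

table = RobinHoodTable(8)
for k in (2, 1, 9):          # 1 and 9 share home slot 1
    table.insert(k, k * 10)
```

Because displaced entries stay next to their home slot, a lookup touches one short, contiguous run of memory, which is the property the slide relies on.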
Binding table lookup: v2
Maximum probe length around 8 for 2e6
entries, 40% occupancy
Stream in all 8 entries at once in parallel
Branchless binary search over those 8
entries
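The search over a fixed window of 8 entries can be written without data-dependent branches in the loop: each comparison result is used as a number rather than as a jump condition. The sketch below assumes the 8 streamed entries are sorted by hash, which is what makes binary search applicable; the function name and the sample values are hypothetical. Python itself is not branch-free, so this shows only the shape of the technique.

```python
def find_in_window(window, target):
    """Return the index of target in a sorted 8-entry window, or -1."""
    assert len(window) == 8
    i = 0
    for step in (4, 2, 1):
        # The comparison yields 0 or 1; multiplying by step advances i
        # without an if/else on the data.
        i += step * (window[i + step - 1] < target)
    return i if window[i] == target else -1

window = [3, 8, 15, 16, 23, 42, 57, 91]   # hypothetical hash values
print(find_in_window(window, 23), find_in_window(window, 4))
```

Three comparisons always suffice for 8 entries, so the cost per lookup is fixed regardless of where the match sits.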
Binding table lookup: v3
Stream in all 8 entries at once in parallel,
for multiple packets in parallel
32 packets at a time: amortized 50 ns/lookup
Worst-case bounds!
Port partitioning
Different IPv4 addresses can have their
ports partitioned in different ways
Need f(ipv4, port) -> params
Current solution: partition IPv4 space
into ranges with same parameters, use
binary search
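A sketch of the range-based lookup described above, using binary search over sorted range starts. The class and field names (`psid_length`, `shift`) and the address ranges are assumptions for illustration, and the port set ID formula assumes the simplest layout, where the ID occupies `psid_length` bits of the port starting at bit `shift`.

```python
import bisect
from ipaddress import IPv4Address

class PortPartitionMap:
    """Maps an IPv4 address to the partition parameters of its range."""
    def __init__(self, ranges):
        # ranges: iterable of (first_ip, last_ip, psid_length, shift)
        self.rows = sorted(
            (int(IPv4Address(lo)), int(IPv4Address(hi)), psid_length, shift)
            for lo, hi, psid_length, shift in ranges
        )
        self.starts = [row[0] for row in self.rows]

    def params(self, ipv4):
        """Return (psid_length, shift), or None if the address is unmanaged."""
        addr = int(IPv4Address(ipv4))
        idx = bisect.bisect_right(self.starts, addr) - 1  # last start <= addr
        if idx < 0 or addr > self.rows[idx][1]:
            return None
        return self.rows[idx][2], self.rows[idx][3]

def port_set_id(port, psid_length, shift):
    """Extract the port set ID bits from a 16-bit port number."""
    return (port >> shift) & ((1 << psid_length) - 1)

partition = PortPartitionMap([
    ("198.51.100.0", "198.51.100.127", 4, 12),  # hypothetical ranges
    ("192.0.2.0", "192.0.2.255", 6, 10),
])
print(partition.params("192.0.2.77"), port_set_id(1024, 6, 10))
```

A lookup that returns `None` also answers whether an address is managed by this lwAFTR at all.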
Hairpinning
Problem: after decapsulating IPv4
packet, send to internet or re-tunnel back
to IPv6?
Answer: use the port partition lookup as a
quick check; if the destination matches, hairpin
Yay software
Virtualization
Want to make a virtualized lwaftr
Missing virtio-net implementation
Work by Virtual Open Systems; thanks!
Usual workload: One Snabb-NFV per
interface on the host
Same performance
Yay software!
Policy
Ingress/egress filtering
Pflua! https://github.com/Igalia/pflua
As an app!
Configuration
Compile binding table from text
Update/control plane TBD
Future work
Yang
Smaller packets
Integrate ILP binding table fetch
40Gb
Thanks!
kbarone@igalia.com
wingo@igalia.com
https://github.com/SnabbCo/snabbswitch
https://github.com/Igalia/snabbswitch
ps. We are hiring!


Snabb Switch: Riding the HPC wave to simpler, better network appliances (FOSDEM 2016)
