On the feasibility of 40 Gbps network data capture and
retention with general purpose hardware
SAC 2018 | Pau, France
Guillermo Julián-Moreno
Rafael Leira
Jorge E. López de Vergara
Francisco Gómez-Arribas
Iván González
April 10, 2018
Naudit HPCN &
Escuela Politécnica Superior, Universidad
Autónoma de Madrid
Outline
1. Introduction
2. Design and implementation
3. Results
4. Conclusions
Introduction
Motivation
Why would we want to capture and store traffic?
• Online analysis and monitoring (e.g., flow records, traffic volume dashboards, IDS).
• Data retention for specialized analysis or policy requirements (e.g., GDPR).
The 10 GbE standard is widespread, and higher speeds (40 GbE, but also 100 GbE) are now appearing:
• 10 GbE is the “last” standard that we can process with a single core.
• 40 GbE and higher speeds require parallelism: how?
Purpose of the system
• Receive the traffic at 40 Gbps as efficiently as possible.
• Timestamp the incoming traffic.
• Store the network frames on disk at 40 Gbps.
• Use commercial off-the-shelf hardware to reduce costs.
Design and implementation
Previous architecture
[Diagram: NIC writes frames via DMA into a descriptor ring (head/tail pointers) and the data buffer]
• Single thread copying frames to the intermediate buffer.
• Write files by blocks and use padding at the end of each file.
• Return the descriptor’s ownership to the card after the copy, so no allocations are needed (a condensed sketch of this loop follows).
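The sketch below condenses that single-thread copy loop. It assumes a generic descriptor ring; the struct fields and names (rx_desc, rx_copy_loop) are illustrative, not the actual driver API.

    #include <stdbool.h>
    #include <stddef.h>
    #include <stdint.h>
    #include <string.h>

    /* Illustrative receive descriptor: the NIC sets `done` once the frame
     * pointed to by `data` has been written via DMA. */
    struct rx_desc {
        void    *data;
        uint16_t len;
        bool     done;
    };

    /* Single-thread copy loop: copy each ready frame into the intermediate
     * buffer, then return the descriptor to the card so it can be reused
     * without any new allocation. */
    static void rx_copy_loop(struct rx_desc *ring, size_t ring_size,
                             uint8_t *buffer, size_t buf_size)
    {
        size_t head = 0, offset = 0;

        for (;;) {
            struct rx_desc *d = &ring[head];
            if (!d->done)
                continue;                       /* busy-wait for the NIC        */

            if (offset + d->len > buf_size)
                offset = 0;                     /* wrap the intermediate buffer */
            memcpy(buffer + offset, d->data, d->len);
            offset += d->len;

            d->done = false;                    /* give ownership back to NIC   */
            head = (head + 1) % ring_size;
        }
    }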
Reading from NIC and copying to buffer
[Diagram: RX threads copying frames from the NIC descriptor ring into the intermediate buffer]
• The usual approach to parallelism is RSS queues; the problem is ensuring a uniform traffic distribution between queues.
• Given our limited scope, we switch to a single queue with fixed descriptor assignments per thread: uniform distribution and no synchronization required for reading.
• A single atomic counter provides the buffer write offset: as fast as possible and no deadlocks possible (see the sketch below).
• Add padding to the beginning and end of the files to avoid frames overrunning file boundaries.
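Below is a minimal sketch of the single atomic write offset. The names (buffer_reserve, BUF_SIZE) are ours and the real driver is more involved, but the point is that one fetch-and-add is the only synchronization the RX threads need.

    #include <stdatomic.h>
    #include <stddef.h>
    #include <stdint.h>

    #define BUF_SIZE (1ULL << 30)   /* illustrative 1 GiB intermediate buffer */

    /* Write offset shared by all RX threads. */
    static atomic_uint_fast64_t write_offset;

    /* Reserve `len` bytes in the intermediate buffer. A single fetch-and-add
     * is lock-free, so threads never wait on each other and cannot deadlock.
     * Returns the position (modulo BUF_SIZE) where the frame can be copied. */
    static uint64_t buffer_reserve(size_t len)
    {
        uint64_t off = atomic_fetch_add_explicit(&write_offset, len,
                                                 memory_order_relaxed);
        return off % BUF_SIZE;
    }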
Client reading
[Diagram: kernel RX threads 0–3 allocate buffer space and copy frames; the userspace client is notified of new data and reads it]
• Userspace clients get last written byte via syscalls and set their last read byte.
• RX thread 0 updates s, the space available in the buffer. No thread writes more than ⌊s/n⌋ bytes in a batch (a sketch of this budget rule follows).
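A minimal sketch of that budget rule follows; the names (avail_space, NUM_RX_THREADS, batch_budget) are illustrative. RX thread 0 recomputes the free space s from the client's read position, and every RX thread limits its next batch to ⌊s/n⌋ bytes so the writers cannot overrun an active reader.

    #include <stdatomic.h>
    #include <stdint.h>

    #define NUM_RX_THREADS 4            /* n: RX threads sharing the buffer    */
    #define BUF_SIZE (1ULL << 30)       /* illustrative buffer size            */

    static atomic_uint_fast64_t write_offset;  /* total bytes written so far   */
    static atomic_uint_fast64_t read_offset;   /* last byte read by the client */
    static atomic_uint_fast64_t avail_space;   /* s: free bytes, set by RX 0   */

    /* RX thread 0 only: recompute the space left between writers and reader. */
    static void update_available_space(void)
    {
        uint64_t w = atomic_load(&write_offset);
        uint64_t r = atomic_load(&read_offset);
        atomic_store(&avail_space, BUF_SIZE - (w - r));
    }

    /* Any RX thread: maximum number of bytes it may copy in its next batch. */
    static uint64_t batch_budget(void)
    {
        return atomic_load(&avail_space) / NUM_RX_THREADS;    /* floor(s / n)  */
    }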
Writing to disk
Two options for the write process:
• Regular files written in 4 MB blocks; needs a fast filesystem.
• Writes distributed across several NVMe disks with SPDK.
Features to reduce hardware requirements (both sketched below):
• A simple filtering system that matches byte “strings” at fixed positions.
• Selective storage: only store the first N bytes of each frame.
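Both features are simple enough to sketch in a few lines of C. The rule struct and function names below are hypothetical, meant only to illustrate "match fixed-offset byte strings, then keep at most the first N bytes of each frame".

    #include <stdbool.h>
    #include <stddef.h>
    #include <stdint.h>
    #include <string.h>

    /* One filter rule: the frame must contain the given bytes at `offset`. */
    struct byte_rule {
        size_t         offset;
        const uint8_t *bytes;
        size_t         len;
    };

    /* Returns true if the frame matches every rule. */
    static bool frame_matches(const uint8_t *frame, size_t frame_len,
                              const struct byte_rule *rules, size_t n_rules)
    {
        for (size_t i = 0; i < n_rules; i++) {
            if (rules[i].offset + rules[i].len > frame_len)
                return false;
            if (memcmp(frame + rules[i].offset, rules[i].bytes, rules[i].len))
                return false;
        }
        return true;
    }

    /* Selective storage: store at most the first `snaplen` bytes of a frame. */
    static size_t stored_length(size_t frame_len, size_t snaplen)
    {
        return frame_len < snaplen ? frame_len : snaplen;
    }

For example, a rule with offset 12 and bytes {0x08, 0x00} would keep only frames whose EtherType is IPv4.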
Results
Hardware used
             Traffic generator       RX Server 1             RX Server 2
CPU          Intel Xeon E5-1620 v2   Intel Xeon E5-1620 v2   2 × Intel Xeon E5-2630 v4
Clock        3.70 GHz                3.70 GHz                2.20 GHz
Cores        4                       4                       2 × 10
Memory       32 GB                   32 GB                   2 × 64 GB
NIC          Intel XL710             Intel XL710             Intel XL710
Storage      SATA RAID               SATA RAID               6 × NVMe
Est. cost    7,000 €                 7,000 €                 10,000 €
Table 1: Specifications of the servers used for testing. HyperThreading was disabled.
Storage speed
[Plot: disk write rate (Gbps) vs. number of disks (2–6), comparing software RAID, SPDK and the theoretical maximum speed]
Figure 1: Performance of the NVMe disk array.
Traffic generation
[Plot: generation rate (Gbps) vs. frame size (100–1500 bytes), showing the send rate against the theoretical maximum rate]
Figure 2: Synthetic traffic rates achieved with our custom, DPDK-based traffic generator. We also made a version capable of sending large PCAP files at line rate.
Timestamping accuracy
Frames                   Mean      Std. dev.
All                      1738 ns   3296 ns
One out of every eight   55 ns     287 ns
Table 2: Timestamping accuracy. The Intel NIC posts descriptors in batches of eight, so we have to take that into account for the accuracy (see the sketch below).
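One way to read the table: descriptors are returned in groups of eight, so a single clock read ends up shared by a whole batch and only the first frame of each batch carries a timestamp close to its true arrival time. A hedged sketch of that per-batch clock read (the helper name is ours, not the actual driver code):

    #include <stdint.h>
    #include <time.h>

    /* Hypothetical per-batch timestamping: take one clock sample when a batch
     * of eight descriptors is collected and assign it to all frames in it, so
     * only the first frame of each batch gets a near-exact arrival time. */
    static uint64_t batch_timestamp_ns(void)
    {
        struct timespec ts;
        clock_gettime(CLOCK_REALTIME, &ts);
        return (uint64_t)ts.tv_sec * 1000000000ULL + (uint64_t)ts.tv_nsec;
    }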
Traffic capture
[Plot: capture rate (Gbps) and loss / port-drop percentage vs. frame size (100–1500 bytes), against the send rate and the theoretical maximum rate]
Figure 3: Results of the first test: retrieval of the frames from the NIC. The bottleneck is the card for small frame sizes.
[Plot: capture rate (Gbps) and loss / port-drop percentage vs. frame size (100–1500 bytes), against the send rate and the theoretical maximum rate]
Figure 4: Results of the second test: writing of the frames to a null device.
Traffic storage
[Plot: capture rate (Gbps) and loss percentage vs. frame size (100–1500 bytes), against the send rate and the theoretical maximum rate]
Figure 5: Results of the third test: traffic storage using SPDK.
Traffic storage
Name         Size     Avg. frame size   Send rate    Loss %
CAIDA        222 GB   787.91 B          39.78 Gbps   < 0.01
University   4.3 GB   910.08 B          39.82 Gbps   0
Table 3: Performance on reception of traffic capture files.
Conclusions
Results
• We have created and open-sourced a system capable of capturing, timestamping
and storing network traffic at 40 Gbps.
• Not using RSS parallelism is feasible and useful in our limited-scope system.
• The one-copy mechanism and synchronization algorithms allow our system to store
line-rate traffic at frame sizes of 300 bytes and above (enough for the majority of
environments).
• We have created a testbed capable of saturating 40 GbE links for frames of size 96
bytes or greater.
Future work
• Improve the selective-storage approach: more effective filters (ASCII/BPF) or limits
based on RX rate.
• A detailed comparison of the frame reordering and timestamp inaccuracies between our approach and RSS queues.
• Port this system to virtual machines with SR-IOV virtual functions.
Questions?
