Towards a Packet Traffic Genome Project as a Method for Realtime Sub-Flow Traffic Analysis
Marat Zhanikeev
maratishe@gmail.com
maratishe.github.io
Tokyo Univ. of Science
NETSAP/COMPSACW 2018 @NII/Tokyo
PDF → bit.do/180727
#packets #traffic #subflow #GA #genome #datacenter
(several) General Observations
• in actual genome research, the concept of junk DNA is firmly established (?)
• ... but has anyone heard of a GA (genetic algorithm) with gene memory?
• related example: when optimizing VM placement in DCs, would you ① bin-pack all VMs at regular intervals, or rather ② formulate a migration-centric, cost-aware problem [01]?
◦ hint: full repacking is not feasible, as it causes massive VM placement jitter
• proposal: apply the GA-with-memory concept to traffic processing
◦ note: the genome metaphor and GA overlap by accident; GA is not a strong requirement (any optimization method should do)
• closest rivals: the “let’s look at traffic at subflow/flowlet/burst/train level” crowd [03]
[01] M. Zhanikeev, “Optimizing Virtual Machine Migration for Energy-Efficient Clouds,” IEICE Trans. Comm. (2014)
[03] K. He et al., “Presto: Edge-based Load Balancing for Fast Datacenter Networks,” ACM SIGCOMM Computer Communication Review (2015)
Marat Zhanikeev – maratishe@gmail.com Towards a Packet Traffic Genome Project as a Method for Realtime Sub-Flow Traffic Analysis – bit.do/180727 2/13
First, Capture Packets Effectively
• with advanced capture drivers and massively multicore processors [02], packet capture is not a problem even in modern DCs (10Gbps and above)
[Figure: packet capture in a data center; labels: Global Networks, Gateway, Data Center Internals, Switch, Mirror, Capture Manager, multiple CPUs (…), Storage]
[02] M. Zhanikeev, “A lock-free shared memory design for high-throughput multicore packet traffic capture,” IJNM (2014)
Raw Packets to Genome
• using 2-3 intervals on a log scale, cut each flow into bursts/trains and classify them by ① packet count and ② interval (gap)
• two consecutive lowercase-letter patterns (lazy keep-alives?) are not recorded; instead, they are used to cut the genome into words
[Figure: captured packets (per flow) classified on two axes, increasing packet gap vs. # of packets in a burst, into the letters F; A B C; a b c; example genome words: aBcF aFaFbF]
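The procedure on this slide can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the author's implementation: the burst cut (10 µs), the two log-scale gap cuts, the packet-count thresholds, the rule that a burst is labeled by the gap preceding it, and the reading of F as the largest bursts (top of the figure) are all hypothetical choices, as are the helper names `to_genome` and `split_words`.

```python
import math

# Hypothetical parameters; the slide only says "2-3 intervals on a log scale".
BURST_CUT_US = 10.0       # a gap longer than this (microseconds) starts a new burst
GAP_CUTS = (2.0, 4.0)     # cuts on log10(1 + gap_us), i.e. roughly 100 us and 10 ms
BIG_BURST = 4             # bursts with at least this many packets get an uppercase letter
HUGE_BURST = 32           # assumption: the largest bursts map to 'F'


def gap_class(gap_us: float) -> int:
    """Return 0, 1 or 2: how many log-scale cuts the gap reaches."""
    x = math.log10(1.0 + gap_us)
    return sum(1 for cut in GAP_CUTS if x >= cut)


def letter(count: int, gap_us: float) -> str:
    """Classify one burst by (1) packet count and (2) the gap preceding it."""
    if count >= HUGE_BURST:
        return "F"
    base = "abc"[gap_class(gap_us)]
    return base.upper() if count >= BIG_BURST else base


def to_genome(timestamps_us: list[float]) -> str:
    """Turn the sorted packet timestamps of one flow into a string of burst letters."""
    letters = []
    count, gap_before = 1, 0.0  # the first burst is treated as having no preceding gap
    for prev, cur in zip(timestamps_us, timestamps_us[1:]):
        gap = cur - prev
        if gap > BURST_CUT_US:
            letters.append(letter(count, gap_before))
            count, gap_before = 1, gap
        else:
            count += 1
    if timestamps_us:
        letters.append(letter(count, gap_before))
    return "".join(letters)


def split_words(genome: str) -> list[str]:
    """Drop each pair of consecutive lowercase letters and cut the genome into words there."""
    words, current, i = [], [], 0
    while i < len(genome):
        if i + 1 < len(genome) and genome[i].islower() and genome[i + 1].islower():
            if current:
                words.append("".join(current))
                current = []
            i += 2  # the pair itself is not recorded
        else:
            current.append(genome[i])
            i += 1
    if current:
        words.append("".join(current))
    return words
```

For example, `to_genome([0, 1, 2, 3, 50, 10050, 10051])` gives `"Aac"` (a 4-packet burst, a lone packet after a short gap, a 2-packet burst after a ~10 ms gap), and `split_words("aBcFccaFaFbF")` gives `["aBcF", "aFaFbF"]`.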
Looking for Patterns in Genome Strings
[Figure: # of consecutive gaps (0 to 14) vs. log(1 + gap) in microseconds (0 to 5), with regions labeled F, A, B, C, a, b, c]
• meaningful sequences represent valid dynamics in traffic: see C1...C6
[Figure: No. of consecutive gaps (0 to 16) vs. log(1 + gap) (0 to 5), with sequences labeled C1 to C6]
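One plausible way to produce points like those in the plots above is to bin each gap by log10(1 + gap) and measure runs of consecutive gaps that stay in the same bin. The sketch below assumes that reading of the "consecutive gaps" axis, a base-10 logarithm, and an arbitrary bin width of 0.5; the function name `gap_runs` is hypothetical.

```python
import math
from itertools import groupby


def gap_runs(timestamps_us: list[float], bin_width: float = 0.5) -> list[tuple[float, int]]:
    """Return (bin center on the log10(1 + gap) scale, run length) for each run of same-bin gaps."""
    gaps = [cur - prev for prev, cur in zip(timestamps_us, timestamps_us[1:])]
    bins = [math.floor(math.log10(1.0 + g) / bin_width) for g in gaps]
    # groupby merges only adjacent equal bins, i.e. consecutive gaps of similar size
    return [((k + 0.5) * bin_width, len(list(run))) for k, run in groupby(bins)]
```

For packets at 0, 1, 2, 3, 1003 and 2003 µs, this yields three ~1 µs gaps in a row followed by two ~1 ms gaps: `[(0.25, 3), (3.25, 2)]`.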
Yet More Patterns
• few As, various Fs, and other patterns are obvious
[Figure: count distribution (0 to 1) of the letters F, A, B, C, a, b, c; panels: wand, wide]
Meaningful Genome Sequences
• again, there are many valid sequences, but ① they make up a small overall ratio (a feature, not a bug) and ② long words are increasingly rare
[Figure: occurrence probability (0 to 0.03) vs. gene length (0 to 45); panels: wand, wide]
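The curves above relate word (gene) length to occurrence probability. A minimal sketch of that computation, assuming the words have already been cut from the genome and that "gene length" means the number of letters in a word; `length_distribution` is a hypothetical name.

```python
from collections import Counter


def length_distribution(words: list[str]) -> dict[int, float]:
    """Map each word length to the fraction of all words that have that length."""
    if not words:
        return {}
    counts = Counter(len(w) for w in words)
    total = len(words)
    return {length: n / total for length, n in sorted(counts.items())}
```

For example, `length_distribution(["aB", "Fa", "aBcF", "aFaFbF"])` returns `{2: 0.5, 4: 0.25, 6: 0.25}`; on real traces, the tail of this mapping is where long words become increasingly rare.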
Genome → Traffic State Classification
[Figure: cluster affiliation (0 to 7) over time sequence (0 to 500); two panels (wand, wide) with cluster counts 66 291 20 57 33 23 10 and 73 316 68 14 13 6 10]
• simple k-means clustering
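A minimal sketch of simple k-means over genome data. The slides do not say which features were clustered; here, as an assumption, each time window is described by the share of each letter (F, A, B, C, a, b, c) in that window's genome string. With k = 7, counting the labels per cluster would give lines like the "Cluster counts" above. The names `letter_profile`, `sq_dist` and `kmeans` are hypothetical; this is plain Lloyd's algorithm in the standard library, not the author's code.

```python
import random

LETTERS = "FABCabc"


def letter_profile(genome: str) -> list[float]:
    """Feature vector for one time window: share of each letter in its genome string."""
    total = len(genome) or 1
    return [genome.count(ch) / total for ch in LETTERS]


def sq_dist(p: list[float], q: list[float]) -> float:
    """Squared Euclidean distance between two feature vectors."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points: list[list[float]], k: int, iters: int = 100, seed: int = 0):
    """Lloyd's k-means: returns (cluster label per point, cluster centers)."""
    centers = random.Random(seed).sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        # assignment step: each point joins its nearest center
        labels = [min(range(k), key=lambda j: sq_dist(pt, centers[j])) for pt in points]
        # update step: move each center to the mean of its members (keep it if empty)
        new_centers = []
        for j in range(k):
            members = [pt for pt, lab in zip(points, labels) if lab == j]
            if members:
                new_centers.append([sum(col) / len(members) for col in zip(*members)])
            else:
                new_centers.append(centers[j])
        if new_centers == centers:
            break
        centers = new_centers
    return labels, centers
```

Usage: `labels, _ = kmeans([letter_profile(w) for w in windows], k=7)`, then `collections.Counter(labels)` gives the per-cluster counts. Random initialization can land in a local optimum, so in practice several seeds are usually tried.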
Genome → Traffic State Classification (2)
• the same k-means data, showing that the overall majority of states are of the same type (idle)
• having 80% of states of the same kind is perfect for the junk DNA element in optimizations
[Figure: occurrence count as log(1 + y) (0.2 to 2.2) vs. decreasing sequence (0 to 240); panels: wand, wide]
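The rank-ordered view above, and the share of the dominant state, can be computed from any sequence of state labels (for instance, the cluster labels of consecutive time windows). A minimal sketch; the function name `rank_profile` is hypothetical, and base-10 is an assumption since the slide only says log(1 + y).

```python
import math
from collections import Counter


def rank_profile(labels: list) -> tuple[list[float], float]:
    """Return (log10(1 + count) per state in decreasing order of count, share of the most common state)."""
    counts = sorted(Counter(labels).values(), reverse=True)
    if not counts:
        return [], 0.0
    curve = [math.log10(1 + c) for c in counts]
    return curve, counts[0] / sum(counts)
```

With labels `["idle"] * 4 + ["bulk"]`, the dominant share is 0.8, i.e. the kind of 80% majority the slide describes as ideal for the junk DNA element.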
Wrapup and Future Work
• even plain k-means reveals a drastically uneven distribution of traffic states (judging from genome sequences), which is perfect for the junk DNA element
• next step: migrate to a Bayesian classifier as a better technique
◦ HMI: humans identify and define patterns; machines remember them by incorporating the patterns as junk DNA (at runtime)
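As one possible form of the Bayesian step (an assumption; the slides do not specify the model), a multinomial naive Bayes classifier over letter counts can learn from time windows that a human has labeled, matching the HMI idea of humans defining patterns and machines remembering them. The labels ("idle", "bulk"), the Laplace smoothing parameter `alpha`, and the names `train`/`classify` are hypothetical.

```python
import math
from collections import Counter

LETTERS = "FABCabc"


def train(labeled: dict[str, list[str]], alpha: float = 1.0) -> dict:
    """Fit per-label log-priors and Laplace-smoothed log-likelihoods of each letter."""
    total_windows = sum(len(genomes) for genomes in labeled.values())
    model = {}
    for label, genomes in labeled.items():
        counts = Counter("".join(genomes))
        n = sum(counts[ch] for ch in LETTERS)
        log_prior = math.log(len(genomes) / total_windows)
        log_lik = {ch: math.log((counts[ch] + alpha) / (n + alpha * len(LETTERS))) for ch in LETTERS}
        model[label] = (log_prior, log_lik)
    return model


def classify(model: dict, genome: str) -> str:
    """Pick the label with the highest posterior score; letters outside LETTERS are ignored."""
    def score(label: str) -> float:
        log_prior, log_lik = model[label]
        return log_prior + sum(log_lik[ch] for ch in genome if ch in log_lik)
    return max(model, key=score)
```

Unlike k-means, new human-labeled windows can be added and the model refit cheaply, since training only accumulates letter counts per label.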
That’s all, thank you ...
Advanced Packet Capture
[Figure: PF_RING-based capture spanning kernel (NIC Driver, PF_RINGs) and user space (several Capture threads (…), Manager); other labels: meter, Network, To/from Collector]
• PF_RING is one of the best (kernel-level) methods for streamlining packet capture
• beyond that, efficiency is a matter of splitting the job across multiple cores [02]
• the concept of massively multicore processors applies here as well
[02] M. Zhanikeev, “A lock-free shared memory design for high-throughput multicore packet traffic capture,” IJNM (2014)
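Splitting capture work across cores usually means keeping every packet of a flow on the same worker, since per-flow state (here, the flow's genome string) should not be split between cores. The sketch below shows only that flow-affinity idea with a direction-independent hash; it is not the lock-free shared-memory design of [02], and the packet tuple layout and the names `flow_key` and `dispatch` are hypothetical.

```python
import zlib
from collections import defaultdict

# Hypothetical packet layout: (src_ip, dst_ip, src_port, dst_port, proto, timestamp_us)


def flow_key(src: str, dst: str, sport: int, dport: int, proto: str) -> bytes:
    """Direction-independent flow key: both directions of a connection give the same key."""
    a, b = sorted([(src, sport), (dst, dport)])
    return f"{a[0]}:{a[1]}|{b[0]}:{b[1]}|{proto}".encode()


def dispatch(packets: list[tuple], n_workers: int) -> dict[int, list[tuple]]:
    """Assign each packet to a worker queue by a stable hash of its flow key."""
    queues = defaultdict(list)
    for pkt in packets:
        worker = zlib.crc32(flow_key(*pkt[:5])) % n_workers
        queues[worker].append(pkt)
    return dict(queues)
```

CRC32 is used instead of Python's built-in `hash()` because string hashing is randomized per process, which would make the flow-to-worker mapping differ between runs.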
Trace-based Analysis
• real packet traces are preferred to simulated traffic
• the WAND [04] and WIDE [05] traffic archives are among the best known in this area; this paper uses both for comparison
[04] “WAND: Waikato Internet Traffic Storage,” https://wand.net.nz/wits (current)
[05] “MAWI Working Group Traffic Archive: Packet Traces from WIDE Backbone,” http://mawi.wide.ad.jp/mawi (current)
