Use case: Merging heterogeneous network measurement data
Jorge E. López de Vergara and Javier Aracil
Jorge.lopez_vergara@uam.es
Credits to Rubén García-Valcárcel, Iván González, Rafael
Leira, Víctor Moreno, David Muelas, Javier Ramos, Paula
Roquero, Carlos Vega and the rest of the HPCN-UAM team
SMART Internet Monitoring Study 3rd Workshop,
Barcelona, Spain, 22nd April 2015
Contents
•  Introduction
•  Technologies
–  High-speed traffic measurements
–  Data integration alternatives
–  Hadoop for network measurements
–  Log processing
•  Conclusions
Merging heterogeneous network measurement data 2
Introduction
•  Network measurement data can come from different sources
–  Network-oriented sources:
•  SNMP MIB instances
•  NetFlow records
•  Pcap files
•  …
–  Application-oriented sources:
•  Logs
–  Some standard (Apache web log)
–  Some proprietary (application specific)
•  Important for dealing with encrypted traffic
–  It is necessary to provide ways to merge them
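One way to merge such sources can be sketched as follows: each source is parsed into a common event record and the records are interleaved by timestamp. This is a minimal sketch, not the authors' implementation. The common schema (`ts`, `source`, `client`, `bytes`, `detail`) and the flow field names (`start_epoch`, `src_ip`, `dst_ip`, `dst_port`, `bytes`) are assumptions made for illustration; only the Apache Common Log Format layout is standard.

```python
import heapq
import re
from datetime import datetime

# Apache Common Log Format: host ident user [time] "request" status bytes
APACHE_RE = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\d+|-)'
)


def parse_apache_line(line):
    """Turn one Apache access-log line into a common event dict (or None)."""
    m = APACHE_RE.match(line)
    if m is None:
        return None
    ts = datetime.strptime(m.group("time"), "%d/%b/%Y:%H:%M:%S %z")
    size = m.group("size")
    return {
        "ts": ts.timestamp(),
        "source": "apache",
        "client": m.group("host"),
        "bytes": 0 if size == "-" else int(size),
        "detail": m.group("request"),
    }


def normalize_flow(flow):
    """Map a flow record (hypothetical field names) to the same schema."""
    return {
        "ts": float(flow["start_epoch"]),
        "source": "flow",
        "client": flow["src_ip"],
        "bytes": int(flow["bytes"]),
        "detail": f'{flow["dst_ip"]}:{flow["dst_port"]}',
    }


def merge_events(*streams):
    """Merge event streams that are each already sorted by time."""
    return list(heapq.merge(*streams, key=lambda e: e["ts"]))


# Illustrative input data (made up for this sketch).
log_lines = [
    '10.0.0.5 - - [22/Apr/2015:10:00:02 +0000] "GET /index.html HTTP/1.1" 200 5120',
]
flows = [
    {"start_epoch": 1429696801, "src_ip": "10.0.0.5", "dst_ip": "192.0.2.10",
     "dst_port": 80, "bytes": 640},
    {"start_epoch": 1429696803, "src_ip": "10.0.0.5", "dst_ip": "192.0.2.10",
     "dst_port": 80, "bytes": 6000},
]

apache_events = [e for e in map(parse_apache_line, log_lines) if e]
flow_events = [normalize_flow(f) for f in flows]
timeline = merge_events(apache_events, flow_events)

for event in timeline:
    print(event["ts"], event["source"], event["client"], event["bytes"])
```

Once every source shares one record shape, the same grouping and filtering code can run over network and application data alike.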
High-speed traffic measurements
•  Requirements
–  Capture at core networks
–  Links of 10 Gbps and faster, sometimes virtualized
–  No packet drops
•  Available off-the-shelf resources that help with this demanding task
–  Intel 10G+ network cards
•  Intel DPDK
•  Other developments: HPCAP at UAM
–  Mellanox 10G+ network cards
•  Mellanox Messaging Accelerator (VMA)
–  Multicore processors
•  CPU affinity and isolation for key tasks
–  Large amounts of RAM
•  Use of hugepages and mmap
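The last two items can be illustrated with a small sketch: a process pinned to one core, writing packets into a fixed-size ring that lives in an mmap region. This is not how HPCAP or DPDK are implemented; it only shows the idea in Python. The slot size, the ring capacity and the `PacketRing` class are assumptions made for illustration, and the mapping here is an ordinary anonymous one (a real capture engine would request hugepages from the kernel, which this sketch does not do).

```python
import mmap
import os
import struct

SLOT_SIZE = 2048   # bytes reserved per packet slot (assumed value)
NUM_SLOTS = 1024   # ring capacity in packets (assumed value)


def pin_to_first_allowed_core():
    """Pin this process to one core; return the core id, or None if unsupported."""
    if not hasattr(os, "sched_setaffinity"):   # only available on Linux
        return None
    try:
        core = min(os.sched_getaffinity(0))
        os.sched_setaffinity(0, {core})
        return core
    except OSError:
        return None


class PacketRing:
    """Fixed-size ring of packet slots stored in an anonymous mmap region."""

    def __init__(self, slots=NUM_SLOTS, slot_size=SLOT_SIZE):
        self.slots = slots
        self.slot_size = slot_size
        self.buf = mmap.mmap(-1, slots * slot_size)   # anonymous mapping
        self.head = 0   # count of packets written so far
        self.tail = 0   # count of packets read so far

    def write(self, packet):
        """Store one packet; return False if the ring is full (a drop)."""
        if len(packet) > self.slot_size - 2:
            raise ValueError("packet larger than slot")
        if self.head - self.tail == self.slots:
            return False
        off = (self.head % self.slots) * self.slot_size
        struct.pack_into("<H", self.buf, off, len(packet))   # 2-byte length prefix
        self.buf[off + 2:off + 2 + len(packet)] = packet
        self.head += 1
        return True

    def read(self):
        """Return the oldest unread packet, or None if the ring is empty."""
        if self.tail == self.head:
            return None
        off = (self.tail % self.slots) * self.slot_size
        (length,) = struct.unpack_from("<H", self.buf, off)
        data = bytes(self.buf[off + 2:off + 2 + length])
        self.tail += 1
        return data


pinned_core = pin_to_first_allowed_core()
ring = PacketRing(slots=4, slot_size=64)
ring.write(b"\x45\x00\x00\x1c")   # stand-in for captured packet bytes
first = ring.read()
print(pinned_core, first)
```

The `write` method returning `False` corresponds to a dropped packet, which is exactly what the "no packet drops" requirement rules out: the reader must keep up, or the ring must be sized for the worst burst.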
High-speed traffic measurements
[Figure: HPCAP / M3OMON architecture. Labels: packet arrival, NIC, packet ring; kernel-level processing module (HPCAP traffic sniffer); packet buffer in volatile memory; user-level processing module (M3OMON: packet dumper, flow manager with flow table, flow exporter); packet-level traces, flow-level traces and aggregated statistics traces in non-volatile memory; M3OMON API offering packet access, flow access and MRTG access to applications 1 to N. Accesses are marked as write, read (real-time) or read (on demand).]
Data integration alternatives
•  SQL databases
–  Pros: reliable, normalized schemas, consistent with defined constraints
–  Cons: slow; materialized views are needed to speed up queries, but creating them is costly and sometimes cannot be done concurrently
→ Not suitable for high-speed network measurements
•  Plain files, no SQL
–  Pros: much faster
–  Cons: inconsistencies, lack of normalization
→ Necessary for high-speed network measurements, provided their limitations are kept in mind
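The consistency side of this trade-off can be sketched in a few lines. The example below stores the same records in an SQL table with constraints and in a plain CSV buffer; it says nothing about speed, only about what each store accepts. The table layout and the sample records are assumptions made for illustration.

```python
import csv
import io
import sqlite3

# Sample records (made up): one exact duplicate and one impossible byte count.
rows = [
    ("flow-1", "10.0.0.5", 5120),
    ("flow-2", "10.0.0.7", 800),
    ("flow-1", "10.0.0.5", 5120),   # duplicate key
    ("flow-3", "10.0.0.9", -40),    # negative byte count
]

# SQL: the schema's constraints reject bad rows at insert time.
db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE flows ("
    " flow_id TEXT PRIMARY KEY,"
    " src_ip TEXT NOT NULL,"
    " bytes INTEGER NOT NULL CHECK (bytes >= 0))"
)
rejected = 0
for row in rows:
    try:
        db.execute("INSERT INTO flows VALUES (?, ?, ?)", row)
    except sqlite3.IntegrityError:
        rejected += 1
sql_count = db.execute("SELECT COUNT(*) FROM flows").fetchone()[0]

# Plain file: every row is appended as-is, with no checks.
plain = io.StringIO()
csv.writer(plain).writerows(rows)
file_count = len(plain.getvalue().splitlines())

print(sql_count, rejected, file_count)
```

With plain files, the duplicate and the invalid record survive, so any check of this kind has to be done by the processing code that reads the files later.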
Hadoop for network measurements
•  General scenario
[Figure: general scenario. Labels: users' network (PCs), Internet; raw traffic (PCAP) and processed traffic (flows, produced by a flow process); other data sources (logs); HDFS; Hadoop MapReduce jobs (HTTPS, DNS); Hive (SerDe, HQL); admin queries; results: time series, popular web pages, malfunctioning equipment, most used PCs, predictions. Note on the figure: Hadoop must be deployed on many nodes to work properly.]
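The kind of MapReduce job named in the scenario (for example, finding the most used PCs) can be sketched in plain Python as a map step, a shuffle step and a reduce step. This is a single-process illustration of the programming model, not Hadoop code; the flow fields (`src_ip`, `bytes`) and the sample data are assumptions.

```python
from collections import defaultdict
from itertools import chain

# Sample flow records (made up for this sketch).
flows = [
    {"src_ip": "10.0.0.5", "dst_port": 443, "bytes": 5120},
    {"src_ip": "10.0.0.7", "dst_port": 53, "bytes": 120},
    {"src_ip": "10.0.0.5", "dst_port": 53, "bytes": 90},
    {"src_ip": "10.0.0.9", "dst_port": 443, "bytes": 2048},
    {"src_ip": "10.0.0.7", "dst_port": 443, "bytes": 300},
]


def map_flow(flow):
    """Map step: emit (key, value) pairs, here (source host, bytes)."""
    yield flow["src_ip"], flow["bytes"]


def shuffle(pairs):
    """Shuffle step: group all values that share a key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_bytes(key, values):
    """Reduce step: total bytes per host."""
    return key, sum(values)


def run_job(records):
    mapped = chain.from_iterable(map_flow(r) for r in records)
    reduced = [reduce_bytes(k, v) for k, v in shuffle(mapped).items()]
    return sorted(reduced, key=lambda kv: kv[1], reverse=True)


most_used = run_job(flows)
print(most_used)
```

In a real cluster the map and reduce functions run on many nodes over HDFS blocks, which is why the figure notes that the deployment needs many nodes to pay off.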
Hadoop for network measurements
•  DNS analysis
[Figure: DNS analysis pipeline. Labels: DNS traffic, intelligent capture module, Whois database, RADIUS records, Hadoop long-term storage, NoSQL database, data visualization.]
Log processing
•  Requirements (real scenario):
–  Process 3M events per second (about 5 Gbps)
•  Put together application logs and network flows
–  Several disks in parallel are necessary to store the events at the required rate
–  Fast access to time series and aggregated statistics
→ What operators demand
–  Slower access to raw data
→ What IT analysts demand
•  Elasticsearch and Kibana provide some support, but they need to be tuned
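The figures in the requirements can be turned into a rough sizing calculation. The event rate and the 5 Gbps figure come from the slide; the sustained write rate per disk (150 MB/s) is an assumption chosen only to make the arithmetic concrete, so the resulting disk count is illustrative.

```python
import math

EVENTS_PER_SECOND = 3_000_000   # from the requirement
INPUT_RATE_BPS = 5e9            # "about 5 Gbps"
DISK_WRITE_MB_PER_S = 150.0     # ASSUMPTION: sustained write rate of one disk


def bytes_per_event(events_per_s, rate_bps):
    """Average event size implied by the event rate and the bit rate."""
    return rate_bps / 8 / events_per_s


def disks_needed(rate_bps, disk_mb_per_s):
    """Minimum number of disks written in parallel to absorb the input rate."""
    total_mb_per_s = rate_bps / 8 / 1e6
    return math.ceil(total_mb_per_s / disk_mb_per_s)


avg_event_size = bytes_per_event(EVENTS_PER_SECOND, INPUT_RATE_BPS)
total_mb_per_s = INPUT_RATE_BPS / 8 / 1e6
disks = disks_needed(INPUT_RATE_BPS, DISK_WRITE_MB_PER_S)

print(f"{avg_event_size:.0f} bytes/event, {total_mb_per_s:.0f} MB/s, {disks} disks")
```

Under these assumptions, each event averages roughly 208 bytes and the store must absorb 625 MB/s, which a single disk of the assumed speed cannot sustain.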
Kibana UI
[Figure: screenshot of the Kibana user interface.]
Conclusions
•  Need for different network measurement data sources
–  Combine network and application data
–  Necessary to find sources of problems
•  Is the slowdown caused by the network, the server or the application?
•  This question can only be answered if all information from different layers is provided and analyzed
•  Huge amount of data → it is necessary to work with fast processing systems
•  Availability of processing tools from the Cloud Computing community
–  It is necessary to adapt them to the network measurement processes