DICE Horizon 2020 Project
Grant Agreement no. 644869
http://www.dice-h2020.eu Funded by the Horizon 2020
Framework Programme of the European Union
Monitoring in Big Data
Frameworks
Gabriel Iuhasz
Institute e-Austria Timisoara
26 November 2015
Overview
o Introduction
o Cloud Computing and Big Data
o Monitoring Tools
o Monitoring Requirements and Solutions
o Conclusions
Introduction
o Big Data in Cloud computing
o Volume, Velocity, Variety and Veracity
o Cost Reduction, Rapid provisioning/time to market,
Flexibility/scalability
o DevOps and Cloud
o Development and Operations
o Communication, Collaboration, Integration,
Automation
o DevOps Monitoring
o Measurement is a key aspect of DevOps
Big Data in Cloud Computing
o Challenges of Big Data On Cloud
o Low Latency real-time data
o Virtualization overhead
o Multi-tenancy overhead
o Scalability
o Lack of RDBMS support
o Availability
o Data integrity/privacy
Hadoop Ecosystem
Cloudera
HortonWorks
Monitoring Architecture
o Cross layer monitoring of big data platforms
o Types of metrics are highly dependent on the type of the
application
o Have to be decided on a platform/application basis
o Centralized Monitoring
o All resource states are sent to a centralized monitoring server
o Metrics are continuously polled from monitored components
o Single point of failure
o Lacks scalability
o Decentralized Monitoring
o No single point of failure
o Central authority is diffused
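The centralized variant described above can be illustrated with a minimal sketch. It assumes each monitored component is represented by a callable that returns its current metrics; the names `poll_once`, `agents` and the metric keys are hypothetical and do not come from any specific tool. The sketch also makes the weakness visible: every node's state passes through one process, so that process is the single point of failure.

```python
from typing import Callable, Dict

# Hypothetical stand-in for a monitored component: a callable returning its metrics.
Agent = Callable[[], Dict[str, float]]


def poll_once(agents: Dict[str, Agent]) -> Dict[str, Dict[str, float]]:
    """Central server polls every monitored node and keeps the latest state."""
    state: Dict[str, Dict[str, float]] = {}
    for node, agent in agents.items():
        try:
            state[node] = agent()
        except Exception:
            # An unreachable node is recorded as empty instead of aborting the poll.
            state[node] = {}
    return state


def failing_agent() -> Dict[str, float]:
    """Simulates a node that cannot be reached."""
    raise ConnectionError("node unreachable")


agents = {
    "node-1": lambda: {"cpu": 0.42},
    "node-2": failing_agent,
}
snapshot = poll_once(agents)
```

In a real deployment the loop would run on a timer and the callables would be network requests; a decentralized design would spread this loop across several peers instead.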
Tools
o Hadoop Performance Monitoring UI
o Lightweight monitoring UI for Hadoop server
o Uses Hadoop metrics (using Sinks)
o SequenceIQ
o Based on ELK stack and Docker containers
o ElasticSearch can be easily scaled horizontally
o Logstash server on client side
o Ganglia
o Scalable distributed monitoring system
o Low per-node overhead
o Focused on System Metrics
o Gmond, gmetad and Web Front-end
Tools II
o Apache Chukwa
o Built on top of HDFS
o Easily scalable
o Potentially high overhead
o Hadoop Vaidya
o Rule Based diagnostic tool for M/R jobs
o Performs post-run results analysis
o Nagios
o Plugin based architecture
o Uses a centralized server to collect metrics
o Possible to create a hierarchical deployment
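As an illustration of the plugin-based architecture mentioned for Nagios, here is a minimal sketch of a check written in Python. It relies on the standard Nagios plugin convention of exit codes 0 (OK), 1 (WARNING), 2 (CRITICAL) and 3 (UNKNOWN) plus one line of status text with optional performance data after a `|`. The thresholds, the `LOAD` label and the function names are assumptions chosen for the example.

```python
import os

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
STATUS_NAMES = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def check_load(load1: float, warn: float = 4.0, crit: float = 8.0):
    """Map a 1-minute load average to a (exit_code, output_line) pair."""
    if load1 >= crit:
        code = CRITICAL
    elif load1 >= warn:
        code = WARNING
    else:
        code = OK
    # Performance data format: label=value;warn;crit
    line = f"LOAD {STATUS_NAMES[code]} - load1={load1} | load1={load1};{warn};{crit}"
    return code, line


def main() -> int:
    """Entry point a real plugin would pass to sys.exit(); Unix-only load source."""
    try:
        load1 = os.getloadavg()[0]
    except (OSError, AttributeError):
        print("LOAD UNKNOWN - load average not available")
        return UNKNOWN
    code, line = check_load(load1)
    print(line)
    return code
```

A deployed plugin would end with `sys.exit(main())` so that the central Nagios server can read the exit code; that call is left out here so the sketch can be imported without terminating the interpreter.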
Requirements
o Difficulties in cloud monitoring
o Scale
o Velocity or Timeliness
o Constant changes
o The need for scalability and automation
o Easy re-configurability
o Lightweight metrics collectors
o Identifying pertinent metrics
DICE Overview
[Diagram labels: Platform-Indep. Model; Domain Models; Architecture Model; Platform-Specific Model; Platform Description; DICE MARTE; DICE IDE; Continuous Validation; Continuous Monitoring; Data Awareness; Deployment & Continuous Integration; Big Data QA Models]
DICE Monitoring Platform
o RESTful Web Service
o Used to deploy and configure all core/auxiliary components
o Used to query ElasticSearch
o Exports metrics in: JSON, CSV, OSLC Perf. Mon 2.0 (RDF+XML)
o Used for auto-scaling of monitoring solution
o ELK Stack
o Extremely flexible/configurable
o Horizontally scalable
o Can accept various input and output formats
o ETL via Logstash server (filters)
o Logstash-forwarder secure transmission (new Beats Data Shippers)
o Visualization using Kibana4
o Collectd
o Statistics collection daemon
o A lot of plugins available
o Simple configuration
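The query-and-export role of the web service can be sketched as follows. This is not the DICE implementation: it only shows, under stated assumptions, how an ElasticSearch query body for a recent time window might be built and how the `hits.hits[]._source` documents of a search response could be flattened to CSV. The field names (`cpu_user`, `host`) and the function names are hypothetical, and the sample response is hand-written rather than real output.

```python
import csv
import io
from typing import Any, Dict, List


def build_query(metric_field: str, since: str = "now-15m", size: int = 100) -> Dict[str, Any]:
    """Build a query body selecting recent documents that contain a given metric."""
    return {
        "size": size,
        "sort": [{"@timestamp": {"order": "asc"}}],
        "query": {
            "bool": {
                "filter": [
                    {"exists": {"field": metric_field}},
                    {"range": {"@timestamp": {"gte": since, "lte": "now"}}},
                ]
            }
        },
    }


def hits_to_csv(response: Dict[str, Any], fields: List[str]) -> str:
    """Flatten the _source of each hit into CSV rows, keeping only `fields`."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fields, extrasaction="ignore", lineterminator="\n")
    writer.writeheader()
    for hit in response["hits"]["hits"]:
        # Missing fields are written as empty cells (DictWriter's default restval).
        writer.writerow(hit["_source"])
    return buf.getvalue()


# Hand-written sample in the shape of a search response (hypothetical values).
sample_response = {
    "hits": {
        "hits": [
            {"_source": {"@timestamp": "2015-11-26T10:00:00Z", "host": "node-1",
                         "cpu_user": 12.5, "plugin": "cpu"}},
            {"_source": {"@timestamp": "2015-11-26T10:00:10Z", "host": "node-2"}},
        ]
    }
}

query_body = build_query("cpu_user")
csv_text = hits_to_csv(sample_response, ["@timestamp", "host", "cpu_user"])
```

The query body would be sent to the index's search endpoint by whatever HTTP client the service uses; JSON export is then just the response itself, while CSV needs the flattening step shown here.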
DICE Monitoring Platform II
DICE Monitoring Platform Scaled
DICE Monitoring Platform Variant
Conclusions
o We have given a short overview of current
monitoring platforms
o Identified key requirements for Big Data Monitoring
o Scaling, Autonomy, Timeliness
o Automation via Chef recipes
o Presented the current Architecture of the DICE
Monitoring Platform
o Currently collecting from: HDFS, YARN, Spark, Storm, Kafka
o In the near future: Cassandra, possibly Trident
o Creating the full lambda architecture based anomaly
detection platform
o ElasticSearch used as serving layer
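The slides do not say which detection method the planned platform will use, so the following is only a toy sketch of how the three lambda-architecture layers could divide the work. A simple standard-deviation threshold is swapped in as the detection rule, and a plain dictionary stands in for the serving layer that the slides assign to ElasticSearch. All function names and values are hypothetical.

```python
import statistics
from typing import Dict, List


def batch_view(history: List[float]) -> Dict[str, float]:
    """Batch layer: recompute a baseline from the full metric history."""
    return {"mean": statistics.mean(history), "stdev": statistics.pstdev(history)}


def speed_view(recent: List[float], baseline: Dict[str, float], k: float = 3.0) -> List[float]:
    """Speed layer: flag recent points more than k standard deviations from the baseline."""
    if baseline["stdev"] == 0:
        return [x for x in recent if x != baseline["mean"]]
    return [x for x in recent if abs(x - baseline["mean"]) > k * baseline["stdev"]]


def serving_view(history: List[float], recent: List[float]) -> Dict[str, object]:
    """Serving layer stand-in: merge the batch baseline with speed-layer findings."""
    baseline = batch_view(history)
    return {"baseline": baseline, "anomalies": speed_view(recent, baseline)}


history = [10.0, 12.0, 11.0, 9.0, 10.0, 11.0, 9.0, 8.0]
recent = [10.5, 25.0, 9.0]
result = serving_view(history, recent)
```

The point of the split is that the expensive baseline can be rebuilt periodically over all stored data, while the cheap comparison runs on each new measurement as it arrives.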
Thank You!
Questions?

Monitoring in Big Data Frameworks @ Big Data Meetup, Timisoara, 2015


Editor's Notes

  • #4: - DevOps is a design philosophy that emphasizes collaboration and communication while automating the process of software delivery and infrastructure changes
  • #5: RDBMS
  • #13: Quality-Aware Development for Big Data applications