Monitoring using Open source technologies
Utkarsh Bhatnagar
• Senior Software Engineer @ Sony Interactive Entertainment (PlayStation).
• An active contributor to Grafana.
• Project initiator for wizzy – a user-friendly CLI tool for Grafana
GitHub - https://github.com/utkarshcmu
Email – utkarsh.cmu@gmail.com
GrafanaCon 2016 Speaker - https://www.youtube.com/watch?v=llRhdvV25rg
Hi, I am Jack.
Requirements:
• 50,000 unique metrics from one source
• Data points every minute
• About 72 million data points per day
• Data retention 60 days
• User-friendly UI with customization options
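The daily figure follows from the metric count and the one-minute interval. A quick check of that arithmetic is sketched below. The 60-day total is an added estimate that assumes every point is kept at one-minute resolution for the whole retention period, which the slides do not state. Variable names are illustrative.

```python
# Back-of-the-envelope sizing for the single-source requirements.
UNIQUE_METRICS = 50_000                 # from the requirements
POINTS_PER_METRIC_PER_DAY = 24 * 60     # one data point every minute
RETENTION_DAYS = 60                     # from the requirements

points_per_day = UNIQUE_METRICS * POINTS_PER_METRIC_PER_DAY

# Assumption: no downsampling, so every point is stored for all 60 days.
points_retained = points_per_day * RETENTION_DAYS

print(f"{points_per_day:,} data points per day")
print(f"{points_retained:,} data points held at full retention")
```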
Mission accomplished!
1 metrics source
50,000 unique metrics
72 million data points per day
Team 1 Requirements:
• 100,000 unique metrics
• About 200 million data points per day
Team 2 Requirements:
• 400,000 unique metrics
• About 600 million data points per day
Team 3 Requirements:
• 500,000 unique metrics
• About 2 billion data points per day
Team 4 Requirements:
• 800,000 unique metrics
• About 5 billion data points per day
And more………
Should he continue with Graphite?
Should he ask to reduce metrics or datapoints?
How to dynamically scale Graphite?
Does Grafana support other datasources?
OpenTSDB / InfluxDB / KairosDB / Prometheus?
How to scale the infrastructure to support a variable load of metrics?
Challenges:
• Multiple teams
• Millions of unique metrics
• Above 10 billion data points a day
• Process 3 million logs every minute and generate metrics
• Reprocessing of metrics and logs if needed
• Provide real time monitoring for all of the above using Grafana!
Team 1 Requirements:
• 100,000 unique metrics
• About 200 million data points per day
Team 2 Requirements:
• 500,000 unique metrics
• About 2 billion data points per day
Team 3 Requirements:
• 3 million logs a minute
• Generate metrics in real time
And more………
POC works for:
1 metrics source
50,000 unique metrics
72 million data points per day
Team 1 requirements:
1 metrics source
100,000 unique metrics
200 million data points per day
Team 2 requirements:
1 metrics source
500,000 unique metrics
2 billion data points per day
Clustering Graphite
[Diagram: CARBON RELAY; multiple nodes, each CARBON CACHE + WHISPER + GRAPHITE WEB; two further GRAPHITE WEB instances; LOAD BALANCER]
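The slides do not say how the relay was configured. Carbon's relay can route by rules or by consistent hashing. The sketch below illustrates only the consistent-hashing idea, where each metric name always lands on the same cache node. It is a simplified stand-in, not Graphite's own implementation, and the node names and replica count are made up.

```python
import bisect
import hashlib


class HashRing:
    """Simplified consistent-hash ring mapping metric names to cache nodes."""

    def __init__(self, nodes, replicas=100):
        self._ring = []  # sorted list of (position, node)
        for node in nodes:
            # Several virtual positions per node even out the distribution.
            for i in range(replicas):
                self._ring.append((self._position(f"{node}:{i}"), node))
        self._ring.sort()
        self._positions = [pos for pos, _ in self._ring]

    @staticmethod
    def _position(key):
        # First 8 hex digits of MD5 give a stable 32-bit ring position.
        return int(hashlib.md5(key.encode("utf-8")).hexdigest()[:8], 16)

    def node_for(self, metric):
        # Find the first virtual node at or after the metric, wrapping around.
        idx = bisect.bisect_left(self._positions, self._position(metric))
        return self._ring[idx % len(self._ring)][1]


# Hypothetical node names; a real cluster would list host and port pairs.
ring = HashRing(["cache-a", "cache-b", "cache-c"])
for name in ("team2.api.latency.p99", "team2.api.requests.count"):
    print(name, "->", ring.node_for(name))
```

Because the mapping depends only on the metric name and the node list, any relay with the same configuration routes a given metric to the same node.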
Team 3 requirements:
Over 5000 log sources
3 million logs per minute
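One way to picture generating metrics from logs is a per-minute count keyed by source and log level. The sketch below assumes a hypothetical line format (`<epoch seconds> <source> <level> <message>`). The deck does not describe the real log format or the processing stack, so treat this as an illustration of the idea only.

```python
from collections import Counter


def minute_bucket(epoch_seconds):
    """Round a timestamp down to the start of its minute."""
    return epoch_seconds - (epoch_seconds % 60)


def logs_to_metrics(lines):
    """Count log lines per (minute, source, level).

    Assumes the hypothetical format: "<epoch> <source> <level> <message>".
    """
    counts = Counter()
    for line in lines:
        parts = line.split(" ", 3)
        if len(parts) < 3:
            continue  # skip lines with too few fields
        epoch, source, level = int(parts[0]), parts[1], parts[2]
        counts[(minute_bucket(epoch), source, level)] += 1
    return counts


sample = [
    "1480000025 web-01 ERROR upstream timed out",
    "1480000050 web-01 ERROR upstream timed out",
    "1480000062 web-02 INFO request served",
    "1480000085 web-01 INFO request served",
]
metrics = logs_to_metrics(sample)
for (minute, source, level), count in sorted(metrics.items()):
    # One data point per series, in Graphite's "path value timestamp" form.
    print(f"logs.{source}.{level.lower()}.count {count} {minute}")
```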
Alerting
Graphite Stats
- Apps using a stats library written by Alexander Filipchik
Custom metrics
- From other sources
(Subject to effort and time)
• More than 3 million unique metrics supported
- creation and deletion happen all the time
• More than 11 billion data points written per day
- across all TSDBs
• About 40 billion events processed per day
- log and metric events in near real time (within 30 seconds)
• More than 3,000 requests per minute to Grafana dashboards
- around 7,000 requests per minute during outages
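Converting the daily totals to per-second averages gives a feel for the sustained load. The sketch below uses the figures above as lower bounds. Real traffic is presumably bursty, so peaks would be higher than these averages.

```python
SECONDS_PER_DAY = 24 * 60 * 60

# Daily totals quoted on the slide, treated as lower bounds.
daily_volumes = {
    "data points written": 11_000_000_000,
    "events processed": 40_000_000_000,
}

# Average rate per second for each daily total.
per_second = {
    label: per_day / SECONDS_PER_DAY for label, per_day in daily_volumes.items()
}
for label, rate in per_second.items():
    print(f"{label}: ~{rate:,.0f} per second")

dashboard_requests_per_minute = 3_000
dashboard_requests_per_second = dashboard_requests_per_minute / 60
print(f"dashboard requests: {dashboard_requests_per_second:.0f} per second")
```

That works out to roughly 127,000 data points and 463,000 events per second on average, against about 50 dashboard requests per second.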
https://grafana.net/plugins
http://grafana.org/
http://docs.grafana.org/
https://github.com/grafana/grafana
https://raintank.slack.com
• Move
• Copy
• Extract
• Insert
• Remove
• Rows
• Panels
• Template variables
• Dashboard tags
Version Control
• Dashboards
• Datasources
• Orgs
• Rows
• Panels
• Template variables
• Dashboard tags
Grafana in multiple environments
• Production
• Staging
• Testing
• Development
Generate GIFs of important dashboards
• Last 24 hours
• By a dashboard tag
• Customized dashboa
External features
• Upload/store/download dashboards to/in/from AWS S3, respectively.
• Search/download community dashboards from Grafana.net
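To make the row and panel operations listed earlier (move, copy, extract, insert, remove) concrete, the sketch below moves a panel between rows on a plain dictionary. The dictionary is shaped like the rows-and-panels dashboard JSON that Grafana used at the time of this talk. This is not wizzy's implementation or command syntax, and the function name and sample dashboard are hypothetical.

```python
import copy


def move_panel(dashboard, src_row, panel_index, dst_row):
    """Return a copy of the dashboard with one panel moved to another row.

    Row and panel positions are zero-based list indexes.
    """
    result = copy.deepcopy(dashboard)  # leave the caller's dashboard untouched
    panel = result["rows"][src_row]["panels"].pop(panel_index)
    result["rows"][dst_row]["panels"].append(panel)
    return result


# Minimal stand-in for a dashboard; real dashboards carry many more fields.
dashboard = {
    "title": "Team 2 overview",
    "tags": ["team2"],
    "rows": [
        {
            "title": "Ingest",
            "panels": [
                {"id": 1, "title": "Points/sec"},
                {"id": 2, "title": "Queue depth"},
            ],
        },
        {"title": "Query", "panels": []},
    ],
}

updated = move_panel(dashboard, src_row=0, panel_index=1, dst_row=1)
print([p["title"] for p in updated["rows"][1]["panels"]])
```

Working on a copy keeps the original dashboard available for comparison, which suits a workflow where dashboards are kept under version control.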
https://utkarshcmu.github.io/wizzy-site/
https://utkarshcmu.github.io/wizzy-site/home/
https://github.com/utkarshcmu/wizzy
https://raintank.slack.com/messages/wizzy/