Talk @ GrafanaCon 2016
Utkarsh Bhatnagar
• Senior Software Engineer @ Sony Interactive Entertainment (PlayStation).
• An active contributor to Grafana.
• Project initiator for wizzy – a user-friendly CLI tool for Grafana
GitHub - https://github.com/utkarshcmu
Email – utkarsh.bhatnagar@sony.com
Hi, I am Jack.
Requirements:
• 50,000 unique metrics from one source
• Data points every minute
• About 72 million data points per day
• Data retention of 60 days
• User-friendly UI with room for customization
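The figures in this list can be cross-checked with a little arithmetic. The sketch below is not from the talk: it assumes every metric reports exactly once per minute, and it assumes Graphite's Whisper format (12 bytes per stored point, file headers ignored) purely to get a rough storage number for the 60-day retention.

```python
SECONDS_PER_DAY = 24 * 60 * 60
# Assumption: Whisper stores each point as a 4-byte timestamp + 8-byte double.
WHISPER_BYTES_PER_POINT = 12


def points_per_day(unique_metrics: int, interval_seconds: int) -> int:
    """Data points written per day when every metric reports once per interval."""
    return unique_metrics * (SECONDS_PER_DAY // interval_seconds)


def retention_bytes(unique_metrics: int, interval_seconds: int,
                    retention_days: int,
                    bytes_per_point: int = WHISPER_BYTES_PER_POINT) -> int:
    """Rough storage needed to keep full-resolution data for retention_days."""
    daily = points_per_day(unique_metrics, interval_seconds)
    return daily * retention_days * bytes_per_point


daily = points_per_day(50_000, 60)
stored = retention_bytes(50_000, 60, 60)
print(f"{daily:,} data points per day")
print(f"{stored / 1e9:.2f} GB for 60 days at full resolution")
```

With 50,000 metrics at one point per minute, the daily total matches the 72 million on the slide.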
Mission accomplished!
1 metrics source
50,000 unique metrics
72 million data points per day
Team 1 Requirements:
• 100,000 unique metrics
• About 200 million data points per day
Team 2 Requirements:
• 400,000 unique metrics
• About 600 million data points per day
Team 3 Requirements:
• 500,000 unique metrics
• About 2 billion data points per day
Team 4 Requirements:
• 800,000 unique metrics
• About 5 billion data points per day
And more…
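To compare these requests with each other, it can help to convert them to a sustained ingest rate and an implied reporting interval. This is a back-of-the-envelope sketch, assuming each team's points are spread evenly over the day and over its metrics; the function names are illustrative.

```python
SECONDS_PER_DAY = 86_400

# (unique metrics, data points per day) as listed on the slide
teams = {
    "Team 1": (100_000, 200_000_000),
    "Team 2": (400_000, 600_000_000),
    "Team 3": (500_000, 2_000_000_000),
    "Team 4": (800_000, 5_000_000_000),
}


def ingest_rate(points_per_day: int) -> float:
    """Average data points per second over a day."""
    return points_per_day / SECONDS_PER_DAY


def implied_interval_seconds(unique_metrics: int, points_per_day: int) -> float:
    """Seconds between points for one metric, assuming uniform reporting."""
    return SECONDS_PER_DAY * unique_metrics / points_per_day


for name, (metrics, points) in teams.items():
    print(f"{name}: ~{ingest_rate(points):,.0f} points/s, "
          f"one point per metric every ~{implied_interval_seconds(metrics, points):.1f} s")

total_points = sum(points for _, points in teams.values())
print(f"All four teams: {total_points:,} points/day")
```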
Should he continue with Graphite?
Should he ask teams to reduce metrics or data points?
How to dynamically scale Graphite?
Does Grafana support other data sources?
OpenTSDB / InfluxDB / KairosDB / Prometheus?
How to scale infrastructure to support a variable load of metrics?
Challenges:
• Multiple teams
• Millions of unique metrics
• Above 10 billion data points a day
• Process 3 million logs every minute and generate metrics
• Reprocessing of metrics and logs if needed
• Provide real-time monitoring for all of the above using Grafana!
Team 1 Requirements:
• 100,000 unique metrics
• About 200 million data points per day
Team 2 Requirements:
• 500,000 unique metrics
• About 2 billion data points per day
Team 3 Requirements:
• 3 million logs a minute
• Generate metrics in real time
And more…
POC works for:
1 metrics source
50,000 unique metrics
72 million data points per day
Team 1 requirements:
1 metrics source
100,000 unique metrics
200 million data points per day
Team 2 requirements:
1 metrics source
500,000 unique metrics
2 billion data points per day
Clustering Graphite
[Diagram: a carbon relay in front of several Graphite nodes, each running carbon-cache + Whisper + Graphite-web; two further Graphite-web instances; a load balancer]
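One common way for a relay to spread metrics across cache nodes is consistent hashing, so that a given metric name always lands on the same node. The talk does not say which relay method was used; the sketch below is a simplified illustration of the idea, not carbon-relay's actual code, and the node names and `HashRing` class are hypothetical.

```python
import bisect
import hashlib


class HashRing:
    """Minimal consistent-hash ring mapping metric names to storage nodes."""

    def __init__(self, nodes, replicas=100):
        # Each node gets `replicas` positions on the ring to even out the load.
        self._ring = sorted(
            (self._hash(f"{node}:{i}"), node)
            for node in nodes
            for i in range(replicas)
        )
        self._keys = [position for position, _ in self._ring]

    @staticmethod
    def _hash(key: str) -> int:
        # First 32 bits of the MD5 digest, used only as a ring position.
        return int(hashlib.md5(key.encode("utf-8")).hexdigest()[:8], 16)

    def node_for(self, metric: str) -> str:
        """Return the first node clockwise from the metric's ring position."""
        index = bisect.bisect(self._keys, self._hash(metric)) % len(self._keys)
        return self._ring[index][1]


# Hypothetical node names standing in for the Graphite nodes in the diagram.
ring = HashRing(["graphite-1", "graphite-2", "graphite-3"])
for metric in ["team2.api.latency.p99", "team2.api.requests.count"]:
    print(metric, "->", ring.node_for(metric))
```

Because placement depends only on the metric name and the node list, adding a node moves only a fraction of the metrics rather than reshuffling all of them.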
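Team 2 requirements:
1 metrics source
500,000 unique metrics
2 billion data points per day
[Diagram: the Graphite cluster in abbreviated form: CR (carbon relay), G G G . . . (Graphite nodes), GW GW (Graphite-web), LB (load balancer)]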
Team 3 requirements:
Over 5000 log sources
3 million logs per minute
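Generating metrics from logs usually comes down to parsing each line, bucketing it by time, and emitting counts. The following is a minimal single-process sketch of that idea, not the pipeline from the talk: the log line layout (`<epoch> <source> <level> <message>`) and the `logs.` metric prefix are assumptions, and the output uses Graphite's plaintext format (`path value timestamp`).

```python
from collections import Counter


def logs_to_metrics(lines, prefix="logs"):
    """Count log lines per (source, level, minute) and render Graphite plaintext lines."""
    counts = Counter()
    for line in lines:
        parts = line.split(" ", 3)
        if len(parts) < 4:
            continue  # skip lines that do not match the assumed layout
        epoch, source, level, _message = parts
        try:
            minute = int(epoch) // 60 * 60  # truncate to the start of the minute
        except ValueError:
            continue  # skip lines without a numeric timestamp
        counts[(source, level.lower(), minute)] += 1
    return [
        f"{prefix}.{source}.{level}.count {count} {minute}"
        for (source, level, minute), count in sorted(counts.items())
    ]


# Hypothetical sample lines in the assumed layout.
sample = [
    "1479340800 web01 ERROR timeout talking to db",
    "1479340815 web01 ERROR timeout talking to db",
    "1479340830 web01 INFO request served",
    "1479340865 web02 ERROR disk full",
]
for metric_line in logs_to_metrics(sample):
    print(metric_line)
```

At 3 million logs per minute from thousands of sources, the same counting step would have to be partitioned across many workers; this sketch only shows the per-line logic.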
Alerting
Application metrics
- Apps using a stats library written by Alexander Filipchik (Principal Engineer @ PlayStation)
Custom metrics
- From other sources
• More than 4 million unique metrics supported
- creation and deletion happen all the time
• More than 11 billion data points written per day
- across all TSDBs
• Processing about 40 billion events per day
- log and metric events in near real time (within 30 seconds)
• More than 3000 requests per minute to Grafana dashboards
- around 7000 during outages
(Subject to effort and time)
Alerting
Sep 21st, 2015
Nov 17th, 2016
Grafana Pull Requests:
• Total – 144
• Accepted – 128
• Declined – 14
• Open – 2
https://utkarshcmu.github.io/wizzy/
• Prod, Stage and Dev installations of Grafana
• Move/copy rows and panels from one dashboard to another
• Version control your dashboards
• Manage Grafana entities such as orgs via the CLI
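To illustrate what copying a row between dashboards involves, here is a small sketch that works directly on dashboard JSON. It is not wizzy's implementation: it assumes the row-based dashboard layout of Grafana versions from that period (a `rows` list, each row holding `panels` with a numeric `id`), and the `copy_row` helper is hypothetical.

```python
import copy


def copy_row(src_dashboard: dict, row_index: int, dst_dashboard: dict) -> dict:
    """Return a copy of dst_dashboard with one row from src_dashboard appended."""
    result = copy.deepcopy(dst_dashboard)
    row = copy.deepcopy(src_dashboard["rows"][row_index])

    # Panel ids must stay unique within a dashboard, so renumber the copied panels.
    used_ids = [panel["id"]
                for existing in result.get("rows", [])
                for panel in existing.get("panels", [])]
    next_id = max(used_ids, default=0) + 1
    for panel in row.get("panels", []):
        panel["id"] = next_id
        next_id += 1

    result.setdefault("rows", []).append(row)
    return result


# Hypothetical dashboards for illustration.
source = {"title": "A", "rows": [{"title": "CPU", "panels": [
    {"id": 1, "title": "user"}, {"id": 2, "title": "system"}]}]}
target = {"title": "B", "rows": [{"title": "Memory", "panels": [
    {"id": 1, "title": "free"}]}]}

merged = copy_row(source, 0, target)
print([row["title"] for row in merged["rows"]])
print([panel["id"] for row in merged["rows"] for panel in row["panels"]])
```

Keeping dashboards as JSON files like these is also what makes them easy to put under version control.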
