Overprov:
A Tool for Cluster Overprovisioning Detection
Del Bao
Problem
ad_backend cpu.idle uswest2-prod
Problem (2)
bizfeed oldgen gc count a day
Problem (3)
generic cassandra byte_percentfree
What does the tool do?
Design Goals
• save cost in the long run
• based on simple rules
• eliminate false positives
• extensible
Code Structure
● run()
# build one detector per cluster and run all of its analyzers
for cluster_name in clusters:
    dt = detector.ClusterOverprovDetector(
        product,
        ecosystem,
        cluster_name,
        metric_list,
        start,
        stop,
        signalfx_auth_token,
    )
    dt.execute()
● metric_list
metric_list_cass = [
    ModuleClass('overprov.analyzers.cpu_idle_analyzer', 'CpuIdleAnalyzer'),
    ModuleClass('overprov.analyzers.cass_gc_count_analyzer', 'CassGcCountAnalyzer'),
    ModuleClass('overprov.analyzers.cass_disk_free_analyzer', 'CassDiskFreeAnalyzer'),
]
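A minimal sketch of how a (module path, class name) list like this could be resolved into classes at run time. The slides do not show how ModuleClass is defined or loaded, so the namedtuple and the load_class helper below are assumptions for illustration only.

```python
import importlib
from collections import namedtuple

# Hypothetical stand-in for the deck's ModuleClass: a (module path, class name) pair.
ModuleClass = namedtuple("ModuleClass", ["module", "cls"])


def load_class(entry):
    """Import entry.module and return the attribute named entry.cls."""
    module = importlib.import_module(entry.module)
    return getattr(module, entry.cls)


# Shown with a standard-library class so the sketch runs anywhere;
# in the tool the entries would point at overprov.analyzers.* modules.
demo_list = [ModuleClass("collections", "OrderedDict")]
loaded = [load_class(entry) for entry in demo_list]
```

Keeping the list as plain strings means a new analyzer can be registered without the detector importing it directly.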
You can extend it
• create your own analyzer
• pass in your own start and stop days
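A sketch of what a custom analyzer might look like. The real analyzer base class and its method names are not shown in the slides; BaseAnalyzer, is_overprovisioned, MemFreeAnalyzer, and the 60% threshold are all hypothetical.

```python
from abc import ABC, abstractmethod


class BaseAnalyzer(ABC):
    """Hypothetical analyzer interface; not the tool's actual base class."""

    @abstractmethod
    def is_overprovisioned(self, values):
        """Return True if the metric values suggest overprovisioning."""


class MemFreeAnalyzer(BaseAnalyzer):
    """Flags a cluster whose free-memory percentage never drops below a threshold."""

    def __init__(self, threshold_pct=60.0):
        # Assumed rule of thumb, chosen only for this example.
        self.threshold_pct = threshold_pct

    def is_overprovisioned(self, values):
        # No data is treated as "not overprovisioned" to avoid false positives.
        if not values:
            return False
        return min(values) > self.threshold_pct
```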
Assumptions
• static check, so daily/hourly resolution (e.g., p95) is fine.
• the cluster is assumed to be roughly balanced, so the max/min across the cluster's hosts in a region represents the entire cluster
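The two assumptions can be sketched as a small aggregation: take a p95 per host over the window, then the max across hosts as the cluster-level value. The function names, the nearest-rank percentile, and the sample data are assumptions, not the tool's actual computation.

```python
def percentile(values, pct):
    """Nearest-rank percentile of a non-empty list; pct is an integer in 1..100."""
    ordered = sorted(values)
    # Integer ceiling of pct * n / 100, avoiding floating-point rounding.
    rank = max(1, (pct * len(ordered) + 99) // 100)
    return ordered[rank - 1]


def cluster_p95_busy(idle_by_host):
    """Max across hosts of each host's p95 CPU busy percentage (100 - idle)."""
    return max(
        percentile([100.0 - idle for idle in samples], 95)
        for samples in idle_by_host.values()
    )


# Hypothetical daily cpu.idle samples per host in one region.
idle_by_host = {
    "host-a": [90.0, 85.0, 95.0, 80.0],
    "host-b": [70.0, 75.0, 88.0, 92.0],
}
busiest = cluster_p95_busy(idle_by_host)
```

If even the busiest host stays well below capacity at p95, the balanced-cluster assumption lets that one number stand in for the whole cluster.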
What it’s Not
• Fleetmiser
– instantaneous autoscaling of the spot fleet for seagull clusters
– driven by a signal at a 10-minute interval
• Paasta
– similar to the above, but only for paasta services
Demo
• virtualenv_run/bin/overprov -p cassandra -c ad_backend --start 60 --stop 30 -e prod -k ./api_token
• virtualenv_run/bin/overprov -p cassandra -c ad_backend --start 60 --stop 30 -e prod -k ./api_token --debug
• virtualenv_run/bin/overprov -p elasticsearch -c ads144 --start 60 --stop 30 -e prod -k ./api_token
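A sketch of a parser that accepts the flags used in the demo commands. The flags themselves come from the slides; the destination names and the reading of --start/--stop as "days before today" are assumptions.

```python
import argparse
from datetime import date, timedelta


def build_parser():
    """Parser for the demo flags; dest names are hypothetical."""
    parser = argparse.ArgumentParser(prog="overprov")
    parser.add_argument("-p", dest="product", required=True)
    parser.add_argument("-c", dest="cluster", required=True)
    parser.add_argument("-e", dest="ecosystem", required=True)
    parser.add_argument("-k", dest="token_file", required=True)
    parser.add_argument("--start", type=int, required=True)
    parser.add_argument("--stop", type=int, required=True)
    parser.add_argument("--debug", action="store_true")
    return parser


def window(args, today):
    """Assumed meaning: --start/--stop count days back from `today`."""
    return today - timedelta(days=args.start), today - timedelta(days=args.stop)


demo_args = build_parser().parse_args(
    "-p cassandra -c ad_backend --start 60 --stop 30 -e prod -k ./api_token".split()
)
```

Under that reading, `--start 60 --stop 30` would analyze the 30-day window that ended 30 days ago.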
Questions

Editor's Notes

  • #7: add a diagram to go with the design
  • #8: def execute(self):
        """Run through all registered analyzers and print result tables.

        Each analyzer may generate data for only some regions due to lack of
        data, e.g., an outage, a SignalFlow issue, etc. To make this robust, we
        create the region list on the fly and then assemble the table data in
        another loop. If a region is missing, its result will be None, which
        is also treated as not overprovisioned.
        """
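The two-loop approach described in this docstring can be sketched as follows. The body of execute() is not shown in the slides, so the function name assemble_table and the {analyzer name: {region: bool}} input shape are assumptions.

```python
def assemble_table(results_by_analyzer):
    """Build (header, rows) from {analyzer name: {region: bool}} (hypothetical shape)."""
    # First loop: collect every region that any analyzer reported, in first-seen order.
    regions = []
    for per_region in results_by_analyzer.values():
        for region in per_region:
            if region not in regions:
                regions.append(region)

    # Second loop: one row per analyzer; a region with no data becomes None.
    rows = []
    for name, per_region in results_by_analyzer.items():
        rows.append([name] + [per_region.get(region) for region in regions])
    return ["analyzer"] + regions, rows


# Example: the second analyzer has no data for useast1.
demo_results = {
    "cpu_idle": {"uswest2": True, "useast1": False},
    "gc_count": {"uswest2": True},
}
header, rows = assemble_table(demo_results)
```

Because a missing cell is None rather than True, a region with gaps is never reported as overprovisioned on incomplete data.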