Data Analysis
  Nicholas Scott

  nscott@nagios.com
Disclaimer


   Math may occur later.


   I apologize in advance.




                        2012   2
Abstract


   Introduction
   Capacity Planning Component
           Features
           Different Forecasting Methods
                 When to use
   RRD Analysis Tool
           Statistics Pillow Talk




Introduction


   Nagios Data Gathering Attributes
        SO MUCH DATA (TOO MUCH?)
        Generally noisy
   Sources usually not simple
        How many factors are affecting service X on a
         given host Y?
        We have data showing X is like this but why?




Capacity Planning Terminology


   Residuals – Variation that exists after fitting
   Period – A frame of time where a pattern cycles
   through a complete iteration
   Example:




Capacity Planning

   [Embedded video: capacityplanning.mp4]




Capacity Planning


   Holt-Winters
        Great next-step forecasting for complex
         systems
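
   As a rough illustration of the next-step forecasting named above, here is a minimal sketch of additive Holt-Winters (triple exponential smoothing) in Python 3. It is not the capacity planning component's code: the function name, the default smoothing weights alpha, beta and gamma, and the choice of initial level and seasonal values are all assumptions made for the example.

```python
def holt_winters_additive(series, period, alpha=0.5, beta=0.1, gamma=0.1, horizon=1):
    """Forecast `horizon` steps ahead with additive Holt-Winters.

    Hypothetical helper for illustration; needs at least two full periods.
    """
    if len(series) < 2 * period:
        raise ValueError("need at least two full periods of data")

    # Initial level: mean of the first period (an assumed, common choice).
    level = sum(series[:period]) / period
    # Initial trend: average per-step change between the first two periods.
    trend = sum((series[period + i] - series[i]) / period
                for i in range(period)) / period
    # Initial seasonal values: deviation of the first period from its mean.
    seasonal = [series[i] - level for i in range(period)]

    for t in range(period, len(series)):
        value = series[t]
        s = seasonal[t % period]
        prev_level = level
        # Exponentially weighted level, trend and seasonal updates.
        level = alpha * (value - s) + (1 - alpha) * (prev_level + trend)
        trend = beta * (level - prev_level) + (1 - beta) * trend
        seasonal[t % period] = gamma * (value - level) + (1 - gamma) * s

    n = len(series)
    return [level + h * trend + seasonal[(n + h - 1) % period]
            for h in range(1, horizon + 1)]
```

   Because each update feeds on the previous level, trend and seasonal values, a single outlier in the first two periods carries forward into every later step.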




Capacity Planning


   Gets dicey for anything more; tradeoffs




Capacity Planning


   Least Squares
        Better for simple trending, obviously
        Finds trend line that minimizes the sum of the
          residuals squared
        Less computationally expensive than HW
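
   The trend line that minimizes the sum of the squared residuals can be sketched with NumPy's least-squares solver. This is an illustrative Python 3 sketch, not the component's implementation; the function name and the sample timestamps are assumptions.

```python
import numpy as np

def least_squares_trend(times, values):
    """Fit values ~ slope * times + intercept by ordinary least squares."""
    t = np.asarray(times, dtype=float)
    y = np.asarray(values, dtype=float)
    # Design matrix: one column for the slope, one column of ones for the intercept.
    A = np.column_stack([t, np.ones_like(t)])
    (slope, intercept), *_ = np.linalg.lstsq(A, y, rcond=None)
    # Residuals: the variation that remains after fitting.
    residuals = y - (slope * t + intercept)
    return slope, intercept, residuals

# Hypothetical samples taken 300 seconds apart.
slope, intercept, residuals = least_squares_trend([0, 300, 600, 900], [1.0, 2.0, 3.0, 4.0])
```

   Extrapolating is then just evaluating slope * future_time + intercept, which gives a direction rather than an exact future value.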




Capacity Planning


   Good choice for noisy data
   Possible future mean value




Capacity Planning


   Linear Algebra is fun
   Linear Algebra is grindy
   Linear Algebra is a great way to really think
   about algorithms
   RRD Python abstraction class is available




Capacity Planning


   Quadratic/Cubic Fit
   Naive Experimental
   Fits a polynomial of given order to data
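
   Fitting a polynomial of a given order can be sketched with numpy.polyfit, which also minimizes the sum of squared residuals. The function name and arguments below are hypothetical and are not taken from the component.

```python
import numpy as np

def polynomial_forecast(times, values, order, future_times):
    """Fit a polynomial of the given order and evaluate it at future_times."""
    # Coefficients come back highest power first.
    coeffs = np.polyfit(times, values, order)
    return np.polyval(coeffs, future_times)

# Hypothetical quadratic data: values = times squared.
times = np.arange(6)
forecast = polynomial_forecast(times, times ** 2, order=2, future_times=[6, 7])
```

   Higher orders follow the existing samples more closely but can swing widely outside them, so the order stays a user decision.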




Capacity Planning


   For quadratic or cubic datasets
   User decision




RRD Analysis Tool


   Goals
       General stats, mean, variance, etc
       Also do derivatives, multiple order derivatives
       Bivariate correlation


   Dependencies:
       Python >= 2.4
       numpy, rrdtool, scipy, matplotlib, mako



RRD Analysis Tool


   Example running of this thing:
   ./analyze.py -H localhost -S Current_Load -s




RRD Analysis Tool


   Why do you want to smooth your stuff?
        Noise noise noise
        Comedy Option: Pretty graphs


   Mean
   Stddev
   Variance
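
   A small sketch of smoothing and the summary statistics listed above. The slides do not say which smoothing method analyze.py uses, so the simple moving average here is an assumption, as are the function names.

```python
import numpy as np

def moving_average(values, window):
    """Smooth a series with an unweighted moving average of `window` samples."""
    kernel = np.ones(window) / window
    # mode="valid" keeps only positions where the whole window fits.
    return np.convolve(values, kernel, mode="valid")

def summary(values):
    """Mean, population standard deviation and population variance."""
    v = np.asarray(values, dtype=float)
    return {"mean": v.mean(), "stddev": v.std(), "variance": v.var()}

smoothed = moving_average([1, 2, 3, 4, 5], window=3)
stats = summary([2, 4, 4, 4, 5, 5, 7, 9])
```

   A wider window removes more noise but also flattens short spikes that may matter.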



RRD Analysis Tool


   Derivatives
        Quick refresher:
        \[ \frac{\Delta x}{\Delta y} \]
   Actual form we'll use:
        \[ \frac{y_t - y_{t-1}}{t_t - t_{t-1}} = \frac{y_t - y_{t-1}}{\text{RRD Resolution}} \]
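
   The difference quotient on this slide can be sketched with numpy.diff, assuming evenly spaced samples so that the time step equals the RRD resolution. The function name is hypothetical, and unlike the tool this sketch simply drops the first sample instead of supplying a value for the missing previous point.

```python
import numpy as np

def finite_difference(values, resolution, order=1):
    """Apply (y_t - y_{t-1}) / resolution `order` times."""
    d = np.asarray(values, dtype=float)
    for _ in range(order):
        # Each pass shortens the series by one sample.
        d = np.diff(d) / resolution
    return d

# Hypothetical byte counter sampled every 300 seconds.
counter = [0, 300, 900, 1800]
rate = finite_difference(counter, resolution=300)                   # bytes/sec
acceleration = finite_difference(counter, resolution=300, order=2)  # bytes/sec^2
```

   Repeating the same step gives the second and higher derivatives, at the cost of one sample per order.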


RRD Analysis Tool


   Uses?


   Relatable to physics?
        Position
        Velocity
        Acceleration
        Jerk (seriously)




RRD Analysis Tool


   Example, first derivative on CPU Load:
   analyze.py -H localhost -S Current_Load -d 1




RRD Analysis Tool


   Direct use case?




   Back to bytes/sec




RRD Analysis Tool


   Second derivative (acceleration)
   analyze.py -H localhost -S Root_Partition -d 1,2




RRD Analysis Tool


   Bivariate Analysis
        Compare two possibly related variables
        Define a relationship
        Graph them on the same graph
        Find Pearson's Correlation Coefficient
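
   Pearson's correlation coefficient for two equally sampled series can be sketched as below. The strength labels follow the bands quoted in the speaker notes (below .1 none, .1 to .3 small, .3 to .5 medium, otherwise strong); the function names are hypothetical and applying the bands to the absolute value of r is an assumption.

```python
import numpy as np

def pearson_r(x, y):
    """Pearson's r: covariance divided by the product of the standard deviations."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx = x - x.mean()
    dy = y - y.mean()
    return float((dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum()))

def strength(r):
    """Label the magnitude of r using the bands from the speaker notes."""
    magnitude = abs(r)
    if magnitude < 0.1:
        return "none"
    if magnitude < 0.3:
        return "small"
    if magnitude < 0.5:
        return "medium"
    return "strong"

r = pearson_r([1, 2, 3, 4], [2, 4, 6, 8])
label = strength(r)
```

   The sign of r gives the direction: a negative value means one series tends to fall when the other rises.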




RRD Analysis Tool


   Example:
   analyze.py -H localhost,localhost -S _HOST_,PING




RRD Analysis Tool


   Example:
   analyze.py -H localhost,localhost -S HTTP,Current_Load




RRD Analysis Tool


   Example:
   analyze.py -H localhost,localhost -S Current_Load,Root_Partition





Nagios Conference 2012 - Nicholas Scott - Advanced Data Analytics For Nagios


Editor's Notes

  • #3: Try to keep this applicable to real life, as this is the Nagios World Conference; I just like the math portion of it. Looking for hardcore application? Wittenberg is presenting right now and it's very applicative. However, I will foray into implementation a bit and, since I like programming, into some tips on what I learned when implementing these. Statistics: I like it, perhaps some things I overlooked. Haile story.
  • #4: Cover the new CP (capacity planning) component for Nagios XI: some of the features (dates, extrapolation, RRD data validity exclusions), sprinkled with the how and why behind what's going on. RRD data analysis tool: derivatives, bivariate comparisons, correlation. It is free; I put it together for fun. Contact me if you want it, whether for a project or for personal use.
  • #5: Nagios collects data every 5 minutes, and, god help us, our uptime... Each service is a complex function: how would you write a function to represent all the factors that affect the service's perfdata? After thinking about that, are you sure? The financial sector deals with this every day. The goal is to make this data usable, which is the heart of forecasting and analysis: understanding the numbers better. It seems abstract at first, and it takes time.
  • #6: The capacity planning component was designed so that you don't have to know much to get some forecasting going.
  • #7: Periods: time over which a pattern may repeat itself. Extrapolation is limited to 4 * period. Methods: a few more are in development, but the current set is a 'good start'. All are self-projecting, rather than cause-and-effect.
  • #8: Without going through the formula (well, kind of). Smoothed value: exponentially weighted. Trend value: represents variations of the time series that happen at a lower frequency. Seasonal value: represents items that occur across trends; could be construed as the trend of the trend. Calculates the initial trend as follows: split the two known periods, sum (second period_t - first period_t) with each difference divided by L, then divide that sum by L.
  • #9: Feeds back on itself: if the difference from period 1 to period 2 contained some strange outlier, it will be represented, and exaggerated, in the next steps. However, there is something satisfying about having a somewhat educated guess as to what a stat is going to be in several weeks or months. A shortcoming of Holt-Winters, though, is that outliers can destroy it. Smoothing may be necessary or preferred; it is not currently implemented, is on the to-do list for a future release, and presents its own issues. Would like to discuss the implementation as it's fascinating, but we'll move on as it's also time consuming.
  • #10: Should not be used to predict future values, but to predict future direction. Should be treated as more of a “this should be around this level at this time.” It will, however, be wrong when dealing with an exponential or quadratic dataset, although that wouldn't be noticeable if the extrapolation period were short enough, e.g. derivatives.
  • #11: Good for noisy data, as it is meant only as a trender. The actual graph line shows where the least squares of the residuals will be in the future. Aside: fun to implement. If you're interested in linear algebra you'll have a blast.
  • #12: Do it if you like linear algebra, or just want to hone your programming prowess; doing any sort of matrix operations will make you better at algorithms. Don't look for a pot of gold at the end: it's hard to do clever stuff that severely reduces the time complexity of basic matrix operations. The RRD abstraction class is available through the stats thing I wrote about; it makes getting info out of the RRD take less thought.
  • #13: Much like least squares, fits a polynomial to have the minimum sum of the squared residuals. Geared more towards items where you would expect exponential growth. Given that it's for exponential growth, it can be very touchy; the more data you have to compare with, the better it will be, which goes for every one of these, but this one in particular.
  • #14: Once again, this is for anticipated exponential datasets. User decision: are you expecting quadratic or cubic growth or decay, or do you want to plan for it?
  • #15: Looking to take a crack at some general stat data with an eye on Nagios. Analysis stuff has been around a while; just looking to make something specific to Nagios and RRDs. Take a look at what these definitions actually mean to a network operation, or the usual Nagios setup. If you want to use it, or help develop it, feel free.
  • #16: Looking to take a crack at some general stat data with an eye on Nagios. Analysis stuff has been around a while; just looking to make something specific to Nagios and RRDs. Take a look at what these definitions actually mean to a network operation, or the usual Nagios setup.
  • #17: Weird random stuff happens, and this weird random stuff throws off statistical analysis. Kind of strange if you think about it philosophically; however, this isn't philosophy, this is math, and there are rules. Would you have wanted that spike to 5 to register as a critical? That speaks to the noise, as we'll see when we go into the derivatives. Stddev: helps to understand the outliers and to set up normal distributions for calculating the odds of what future values may be. Variance: can help identify a multiplicative trend when mean and variance are increasing with some period.
  • #18: Our use case is that x = RRD data, with y being the time value at which those values occurred. Since we're not in math class, no need to do this "as h approaches 0" business. This actually makes our job pretty easy. Obviously we'll need a y_t-1 value, which we'll just leave as 0 as we...
  • #19: Every day. Every single time you see a bytes/sec reading, that's a delta, and that's all this is trying to do. Why is the current byte count useless to us? Do our brains not keep its state? Probably. Can we apply that to other metrics? Would it be useful? When would it not be useful? The byte count is always increasing; CPU load is not. Can we relate this to physics? If we can, we can use its entire wealth of information; however, the nature may be different.
  • #20: Do you care what the rate of change of your CPU load is per 300 seconds? What does the mean actually symbolize here? Or any of them? Interpret: Mean: the CPU load was slowly growing. Max: magnitude of the highest rate of positive increase, and we can see the time that it happened; not when it peaked, but when it started its rise to it. Min: same thing.
  • #21: Root partition on a Nagios test box, obviously a very active Nagios box. Obviously not an active hard drive, and these values are nothing to worry about. Keep in mind that peaks of actual bytes happen when the derivative goes from positive to negative through zero. Helps isolate the actual times of events.
  • #22: Now we get back to the second derivative, which, if you remember, is similar to acceleration. How fast was the rate of change changing? What does this mean? At zero the velocity is at its local max/min. Cycle is back as far as timing goes: d(d(cos)). F = ma: is there something we could assign to be m, F? Might show the relative magnitude of impulse.
  • #23: Correlation. We have all these services/hosts; are they related? We can postulate, but we don't know for sure. If there are lags we wouldn't really know, but let's start simple. Graph 'em. Find Pearson's.
  • #24: We can see that there is definitely a relationship: two different checks that are checking local ping, but are getting slightly different results. Transcends that, though. We can imagine a line on that graph that would do a pretty good job of representing those points. Strength of r: 0 - .09: none; .1 - .3: small; .3 - .5: medium; else strong.
  • #25: Hard to pull the relationship out of this graph. R shows a medium NEGATIVE correlation, meaning that when one goes up, the other goes down. Would've been hard to pull that out without a little help. Strength of r: 0 - .09: none; .1 - .3: small; .3 - .5: medium; else strong.
  • #26: Shows an example of no, or very weak, correlation.
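
The note on the root partition derivative says that peaks in the raw values occur where the first derivative passes from positive to negative through zero. A minimal sketch of locating those points, assuming evenly spaced samples (the function name is hypothetical and this is not part of analyze.py):

```python
import numpy as np

def peak_indices(values):
    """Indices where the first difference goes from positive to non-positive."""
    d = np.diff(np.asarray(values, dtype=float))
    # d[i] is the change from sample i to i + 1, so the peak sits at i + 1.
    return [i + 1 for i in range(len(d) - 1) if d[i] > 0 and d[i + 1] <= 0]

# Hypothetical series with local peaks at positions 2 and 6.
peaks = peak_indices([0, 1, 3, 2, 1, 2, 5, 4])
```

On noisy data this flags every small wiggle, which is one more reason to smooth before differentiating.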