Fault Prediction using Logistic Regression
Using Python 3.5
17-Mar-17 binayakdutta@gmail.com 1
Preface
• This deck illustrates the considerations and method for applying logistic regression, and analytics in general
• For illustration, a hypothetical wind-turbine-based electricity generation system is considered, along with its associated IT systems
• The learning and testing data is mocked up; range-bound random weights are added to mimic natural randomness in the data
• Detailing the statistics behind the models or the overall data/analytics architecture is not the focus and is saved for later
• Python 3.5 and its libraries (pandas, statsmodels and matplotlib) are used to implement the regression models
Approach to Modeling
• Acquire, Prepare, Analyze
• Select & Develop Model
• Train, Test & Evaluate Model
• Implement Model in Solution
• Re-evaluate & Re-train Model
Problem Statement & Solution Overview
[Figure: solution overview diagram. Components: wind turbines, sensor data, asset & work management system, weather data, asset & incident data, data preparation for analytics, fault prediction analytics, MIS data, preventive alerts, preventive maintenance, incident & action log]
Problem: For wind turbines, a duration-based maintenance policy does not factor in the dynamic operating statistics of the assets or their ambient conditions. As a result, interventions are sometimes too early and sometimes too late.
Solution: Preserve data on each asset, its condition and fault history, as well as weather conditions. Build an analytic model that learns from past data and predicts failures.
Acquire, Prepare, Analyze Data
In this context, let's assume that the following information is available or can be made available:
• Equipment Identifier: uniquely identifies an instance of a wind turbine
• Equipment Type: classifies whether the wind turbine transmission is Geared or Direct Drive
• Tower Height: height of the wind turbine tower (in meters)
• Historical Daily Status of Equipment (Operational/Failed) [to be predicted]
• Weather Data (temperature and wind speed) for each equipment location, which can be obtained from external sources
We will further assume that:
• All such data sets are available for historical dates and will be available for future dates
For our exercise, we will use two data sets:
• January data, used to train the model (file name: Logit_Wind_Data_Train_File.csv)
• February data, used to test the model (file name: Logit_Wind_Data_Test_File.csv)
Acquire, Prepare, Analyze Data
Read and analyze the available data points
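A minimal sketch of this read-and-profile step with pandas follows. It assumes the CSV columns are named EQUIP_ID, TYPE, TOWER_HT, BLADE_RPM, TEMPERATURE, WIND_SPEED and OPS_STATUS (hypothetical names inferred from the labels used elsewhere in this deck), and the helper names are illustrative. A tiny inline sample stands in for Logit_Wind_Data_Train_File.csv so the sketch runs on its own.

```python
import io

import pandas as pd

# Hypothetical column names, inferred from the labels used in this deck.
EXPECTED_COLUMNS = ["EQUIP_ID", "TYPE", "TOWER_HT", "BLADE_RPM",
                    "TEMPERATURE", "WIND_SPEED", "OPS_STATUS"]


def load_turbine_data(source):
    """Read turbine data from a CSV path or file-like object and check its columns."""
    df = pd.read_csv(source)
    missing = set(EXPECTED_COLUMNS) - set(df.columns)
    if missing:
        raise ValueError("missing columns: %s" % sorted(missing))
    return df


def summarize(df):
    """Print a quick profile of the data and return the UP/DOWN counts."""
    print(df.head())
    print(df.describe())  # count, mean, std, min, max for numeric columns
    balance = df["OPS_STATUS"].value_counts()  # rows with status 1 (UP) vs 0 (DOWN)
    print(balance)
    return balance


if __name__ == "__main__":
    # Stand-in for Logit_Wind_Data_Train_File.csv so the sketch is self-contained.
    sample = io.StringIO(
        "EQUIP_ID,TYPE,TOWER_HT,BLADE_RPM,TEMPERATURE,WIND_SPEED,OPS_STATUS\n"
        "WT01,GEARED,60,14,22,8,1\n"
        "WT02,DIRECT,70,3,48,27,0\n"
        "WT03,GEARED,50,18,30,11,1\n"
    )
    summarize(load_turbine_data(sample))
```

With the real files, the call would simply be load_turbine_data("Logit_Wind_Data_Train_File.csv").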
Acquire, Prepare, Analyze Data
• OPS_STATUS, our dependent variable, takes only binary values: for most records it is UP (1), while for a small portion it is DOWN (0)
• TOWER HEIGHT takes one of three values: 50, 60 or 70 meters
• BLADE RPM ranges from 0 to 30 revolutions per minute
• TEMPERATURE fluctuates between 0 and 60 degrees Celsius
• WIND speed varies between 0 and 30 meters per second in the given sample
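The preface notes that the data is mocked up with range-bound random weights, but the generator itself is not shown. The sketch below is one hypothetical way to produce data matching the ranges above. The column names, the coefficients and the failure mechanism (higher RPM, temperature and wind speed, and a geared transmission, all lower the odds of being UP, while tower height has no effect) are illustrative assumptions, not the deck's actual data.

```python
import numpy as np
import pandas as pd


def make_mock_turbine_data(n_rows=1000, seed=0):
    """Generate a hypothetical turbine data set within the ranges listed above.

    Column names and coefficients are illustrative assumptions only.
    Tower height is deliberately given no effect on the outcome.
    """
    rng = np.random.default_rng(seed)
    df = pd.DataFrame({
        "EQUIP_ID": ["WT%02d" % (i % 50) for i in range(n_rows)],
        "TYPE": rng.choice(["GEARED", "DIRECT"], size=n_rows),
        "TOWER_HT": rng.choice([50, 60, 70], size=n_rows),
        "BLADE_RPM": rng.uniform(0, 30, size=n_rows),
        "TEMPERATURE": rng.uniform(0, 60, size=n_rows),
        "WIND_SPEED": rng.uniform(0, 30, size=n_rows),
    })
    # Hypothetical log-odds of OPS_STATUS == 1 (UP).
    log_odds = (7.0
                - 0.08 * df["BLADE_RPM"]
                - 0.06 * df["TEMPERATURE"]
                - 0.10 * df["WIND_SPEED"]
                - 0.80 * (df["TYPE"] == "GEARED"))
    p_up = 1.0 / (1.0 + np.exp(-log_odds))
    # Bernoulli draw, so the outcome is random rather than a fixed function of inputs.
    df["OPS_STATUS"] = (rng.uniform(size=n_rows) < p_up).astype(int)
    return df


if __name__ == "__main__":
    mock = make_mock_turbine_data()
    print(mock.describe())
    print(mock["OPS_STATUS"].value_counts(normalize=True))  # share of UP vs DOWN
```

Generating the outcome from a known logistic model has a side benefit: a correctly fitted model should recover roughly these coefficients and find no effect for TOWER_HT.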
Prepare Data & Develop Model
• Create dummy variables for the categorical attribute
• Set up the predictor variables
• Set up the logistic regression model
Train, Test & Evaluate Model with Training and Test Data Sets
• The low z-statistic and high p-value for [TOWER_HT] mean we cannot reject the null hypothesis that its coefficient is zero (suggesting that this predictor has no significant influence on the outcome)
• The high standard error for [INTERCEPT] indicates high uncertainty in that coefficient's estimate (possibly suggesting other, unidentified predictor variables)
• The standard errors, z-statistics and p-values of the other predictors look healthy
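The z and P>|z| columns discussed above come from the Wald test: z is the coefficient divided by its standard error, and the two-sided p-value is read from the standard normal distribution. A stdlib-only sketch for checking individual rows of a summary follows. The helper names are illustrative, and the numbers in the usage lines are made up for illustration. With a fitted statsmodels result, the same quantities are available as result.tvalues and result.pvalues.

```python
import math


# Hypothetical helpers; names are illustrative.
def wald_test(coef, std_err):
    """Return (z, two-sided p-value) for H0: coefficient == 0."""
    z = coef / std_err
    # erfc(|z| / sqrt(2)) equals 2 * (1 - Phi(|z|)) for the standard normal CDF Phi.
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_value


def is_significant(coef, std_err, alpha=0.05):
    """True if H0 can be rejected at level alpha, i.e. the p-value is below alpha."""
    return wald_test(coef, std_err)[1] < alpha


if __name__ == "__main__":
    # Hypothetical values: a small coefficient with a large standard error
    # (the TOWER_HT pattern described above) versus a clearly non-zero one.
    print(wald_test(0.004, 0.010), is_significant(0.004, 0.010))
    print(wald_test(-0.100, 0.012), is_significant(-0.100, 0.012))
```

A high p-value is not evidence that the effect is exactly zero. It only means the data cannot distinguish the coefficient from zero, which is the basis for dropping TOWER_HT.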
Train, Test & Evaluate Model with Training and Test Data Sets
[Figure: model inputs TYPE = GEARED, BLADE RPM, TEMPERATURE, WIND SPEED and INTERCEPT, with the predicted OPS_STATUS]
• A predicted OPS_STATUS close to 1 indicates Up & Running
• A predicted OPS_STATUS close to 0 indicates a potential failure
• A predicted OPS_STATUS close to 0.5 indicates an indecisive prediction
As the evaluation suggested that [TOWER_HT] is inconsequential for [OPS_STATUS], we drop it from the predictor list and recalibrate the model.
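Each prediction described on this slide is the logistic function applied to a linear combination of the inputs: p = 1 / (1 + exp(-(b0 + b1·x1 + ...))). A minimal sketch of scoring one turbine-day by hand follows. The helper name and the coefficient values are hypothetical, not the ones fitted in the deck. Dropping TOWER_HT simply means refitting without it, so it no longer appears among the coefficients.

```python
import math


# Hypothetical helper; the name is illustrative.
def predict_up_probability(params, features):
    """Probability that OPS_STATUS == 1 (Up & Running) for one observation.

    params   -- fitted coefficients keyed by name, including 'const' (the intercept)
    features -- predictor values keyed by the same names (without 'const')
    """
    log_odds = params["const"] + sum(params[name] * value
                                     for name, value in features.items())
    # Numerically stable logistic function (avoids overflow in math.exp).
    if log_odds >= 0:
        return 1.0 / (1.0 + math.exp(-log_odds))
    e = math.exp(log_odds)
    return e / (1.0 + e)


if __name__ == "__main__":
    # Hypothetical coefficients after TOWER_HT has been dropped.
    params = {"const": 7.0, "TYPE_GEARED": -0.8, "BLADE_RPM": -0.08,
              "TEMPERATURE": -0.06, "WIND_SPEED": -0.10}
    calm_day = {"TYPE_GEARED": 1, "BLADE_RPM": 12, "TEMPERATURE": 20, "WIND_SPEED": 6}
    stormy_day = {"TYPE_GEARED": 1, "BLADE_RPM": 29, "TEMPERATURE": 55, "WIND_SPEED": 28}
    print(predict_up_probability(params, calm_day))    # about 0.97: Up & Running
    print(predict_up_probability(params, stormy_day))  # about 0.10: potential failure
```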
Alternate Test with Test Data
• As it stands, our model has been evaluated and tuned using the training data (Jan-17)
• The model seems to behave well on a couple of sample test records
• As a next step, let's see how our model (trained on Jan-17 data) predicts faults for Feb-17
• Let's devise a simple method of our own to score the predictions. Since the outcome is binary (Up/Down or 1/0), we use the following rules:
  - Predictions above 0.75 are considered 1 (Up/Running)
  - Predictions below 0.25 are considered 0 (Down/Fault)
  - Predictions between 0.25 and 0.75 are considered indecisive
  - If a prediction matches the actual OPS_STATUS, award one point (+1)
  - If a prediction does not match the actual value, deduct one point (-1)
  - If a prediction is indecisive, no points are awarded or deducted
  - Finally, sum the points and divide by the sample size to obtain the mean score
• The worst-case mean is -1 (all predictions wrong) and the best case is +1 (all predictions correct)
• Let us see how close our model comes to the best case (or the worst case)
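The scoring rule above can be written as a short function. This is a minimal sketch, assuming predicted probabilities and actual OPS_STATUS values arrive as equal-length sequences (for example, model predictions on the Feb-17 file and that file's OPS_STATUS column). The helper names are hypothetical, and the thresholds default to the 0.25/0.75 cut-offs stated above.

```python
# Hypothetical helpers; names are illustrative.
def classify(p, low=0.25, high=0.75):
    """Map a predicted probability to 1 (Up), 0 (Down) or None (indecisive)."""
    if p > high:
        return 1
    if p < low:
        return 0
    return None


def alternate_score(probabilities, actuals, low=0.25, high=0.75):
    """Mean of +1 (correct), -1 (wrong) and 0 (indecisive) over all predictions.

    Ranges from -1 (every prediction wrong) to +1 (every prediction right).
    """
    if len(probabilities) != len(actuals) or len(probabilities) == 0:
        raise ValueError("need two non-empty sequences of equal length")
    points = 0
    for p, actual in zip(probabilities, actuals):
        call = classify(p, low, high)
        if call is None:
            continue  # indecisive: no points given or taken
        points += 1 if call == actual else -1
    return points / len(probabilities)


if __name__ == "__main__":
    probs = [0.92, 0.10, 0.50, 0.81]
    actual = [1, 0, 1, 0]
    # +1 (correct Up), +1 (correct Down), 0 (indecisive), -1 (wrong Up)
    print(alternate_score(probs, actual))  # 0.25
```

Because indecisive predictions count toward the denominator but earn no points, the score rewards a model that is both confident and right, unlike plain accuracy.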
Alternate Test with Test Data
A score of 0.78 on our alternate test (on the -1 to +1 scale defined above) is not too bad ☺
Implement & Set Up the Learning, Tuning and Re-evaluation Process
Analytics is about continuous learning, and IS processes should support this philosophy.
• So while our model appears to fit the training and test data well, it should be fine-tuned regularly throughout its lifecycle by:
  - Learning from bigger and newer data sets, e.g. fitting the model on Jan-17 plus Feb-17 data (as the learning set) and testing against Mar-17 data
  - Looking out for new predictor variables that may reduce the standard error of [INTERCEPT]
  - Comparing its predictions with those of other analytic models
  - Accommodating changing business needs
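The first retraining point (train on all months to date, test on the next month) is an expanding-window scheme. A minimal pandas sketch of the splitting step follows. It assumes the combined data has a date column, here called DATE (a hypothetical name), and the helper name is illustrative. Each training frame would then be passed to whatever fitting routine is in use.

```python
import pandas as pd


# Hypothetical helper; DATE is an assumed column name.
def expanding_monthly_splits(df, date_col="DATE"):
    """Yield (test_month, train, test): train on every earlier month, test on the next.

    With Jan-17, Feb-17 and Mar-17 data this yields Jan -> Feb, then Jan+Feb -> Mar.
    """
    months = pd.to_datetime(df[date_col]).dt.to_period("M")
    ordered = sorted(months.unique())
    for test_month in ordered[1:]:
        train = df[months < test_month]
        test = df[months == test_month]
        yield test_month, train, test


if __name__ == "__main__":
    history = pd.DataFrame({
        "DATE": ["2017-01-05", "2017-01-20", "2017-02-03", "2017-03-10", "2017-03-11"],
        "OPS_STATUS": [1, 0, 1, 1, 0],
    })
    for month, train, test in expanding_monthly_splits(history):
        print(month, "train rows:", len(train), "test rows:", len(test))
```

Testing only on months that come after the training window keeps the evaluation honest: the model is never scored on data from the period it learned from.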
THANK YOU
As we close, here are some teasers to keep up the momentum:
• Can multiple linear regression be used instead?
• What is the difference between MLE (Maximum Likelihood Estimation) and OLS (Ordinary Least Squares) regression?
• Instead of logit, can probit be used?
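On the second teaser: logistic regression coefficients are found by maximizing the likelihood, usually iteratively, whereas OLS minimizes squared error in closed form. The numpy-only sketch below contrasts the two on a hypothetical single-predictor data set, using a hand-rolled Newton-Raphson loop for the MLE fit. The helper names and data are illustrative. OLS on a 0/1 outcome is the linear probability model, whose fitted values are not constrained to [0, 1]. For the third teaser, statsmodels also offers sm.Probit with the same interface as sm.Logit.

```python
import numpy as np


# Hypothetical helpers; names are illustrative.
def fit_logit_mle(X, y, max_iter=50, tol=1e-10):
    """Logistic regression by maximum likelihood using Newton-Raphson.

    X must already include an intercept column of ones.
    """
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        gradient = X.T @ (y - p)                             # score of the log-likelihood
        information = X.T @ (X * (p * (1.0 - p))[:, None])  # negative Hessian
        step = np.linalg.solve(information, gradient)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta


def fit_ols(X, y):
    """Ordinary least squares on the 0/1 outcome (a linear probability model)."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta


if __name__ == "__main__":
    # Hypothetical data: one predictor (wind speed) plus an intercept.
    rng = np.random.default_rng(42)
    n = 2000
    wind = rng.uniform(0, 30, n)
    X = np.column_stack([np.ones(n), wind])
    p_true = 1.0 / (1.0 + np.exp(-(3.0 - 0.15 * wind)))
    y = (rng.uniform(size=n) < p_true).astype(float)
    b_mle, b_ols = fit_logit_mle(X, y), fit_ols(X, y)
    print("logit (MLE):", b_mle)   # coefficients on the log-odds scale
    print("OLS:        ", b_ols)   # coefficients on the probability scale
```

At the MLE solution the score equations X^T(y - p) = 0 hold, just as the normal equations X^T(y - Xb) = 0 hold for OLS. The two estimators answer different questions because p is a nonlinear function of Xb.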

