Predictive Analytics
Dr. Brian ANG
Senior Lecturer and Consultant
Data Science
brian_ang@nus.edu.sg
#ISSLearningFest 1
© 2022 National University of Singapore. All Rights Reserved
What is Predictive Analytics?
• Predictive: predict or forecast future trends or events, or the likelihood of an event happening.
• Analytics: analyse currently available data using computational approaches.
• Predictive analytics: to predict or forecast future trends and events based on currently available data.
• Benefits: higher profit, cost savings, better resource allocation, better efficiency.
Example Applications of Predictive Analytics
• Medical
• Finance
• Marketing
• Sales Forecast
• Predictive Maintenance
• Environmental Prediction
Icons in this slide deck are from Flaticon.com
Stages of Predictive Analytics Model Development
• Business Objectives and Problem Statement Identification
• Data Collection, Exploration and Preparation
• Model Development & Testing
• Model Deployment
• Model Monitoring & Maintenance
Business Objectives and Problem Statement Identification
• Organisations have to identify the need for the predictive analytics model. This is typically user driven.
• Identify the different stakeholders involved and how the predictive analytics model will affect them.
• Consider the cost versus the benefit of adopting the model.
Identifying the Stakeholders
Who are the stakeholders?
Anyone who has an interest in, or is affected by, the predictive analytics project.
Internal stakeholders
• Project team
• Project sponsors
• Approval authorities/management
• Supporting departments
External stakeholders
• Vendors
• External clients
• Other organisations
Cost versus Benefit Analysis of Predictive Analytics Models
Costs in terms of, e.g.,
- Infrastructure
- Manpower
- Maintenance
Benefits in terms of, e.g.,
- Cost savings and efficiency gains from better resource allocation using predictive analytics.
- Increased profit from a better understanding of which factors contribute to sales.
Data Collection, Exploration and Preparation
• Data Collection
• Data Exploration & Analysis
• Data Processing
• Training and Testing Data Split
Data Collection & Sources of Data
Origins
• Within the department/organisation
• External (affiliated) organisations
• Engage vendors for data collection
• Open source data
• Local and overseas sources
Data Exploration & Pre-Processing
• Check whether there are missing data, outliers, erroneous data, etc.
• Perform data pre-processing to transform the data into a form that can be used for model training.
• Existing or newly collected data may not be ready for model training. E.g., the relevant features or attributes may need to be extracted and arranged into table columns and rows.
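The checks above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the records, the column names `age` and `cost`, the helper `explore`, and the 1.5 × IQR outlier rule are all hypothetical choices, not something the slides prescribe.

```python
import statistics

def explore(rows, column):
    """Report row indices with missing values and IQR-based outliers for one column."""
    values = [row.get(column) for row in rows]
    missing = [i for i, v in enumerate(values) if v is None]
    present = [v for v in values if v is not None]
    # Quartiles of the values that are present (assumed rule: 1.5 * IQR fences).
    q1, _, q3 = statistics.quantiles(present, n=4)
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    outliers = [i for i, v in enumerate(values)
                if v is not None and not (low <= v <= high)]
    return {"missing": missing, "outliers": outliers}

# Hypothetical records: one missing age and one implausibly large cost.
rows = [
    {"age": 25, "cost": 12000},
    {"age": 31, "cost": 15000},
    {"age": None, "cost": 18000},
    {"age": 47, "cost": 21000},
    {"age": 52, "cost": 24000},
    {"age": 38, "cost": 950000},
    {"age": 60, "cost": 27000},
]
report_age = explore(rows, "age")
report_cost = explore(rows, "cost")
print(report_age, report_cost)
```

Whether a flagged row is dropped, corrected or kept is a judgment call that depends on the data and the business problem.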
Training & Testing Data Split
A data set can be divided into the following components:
• Training/Development Dataset
Used for development of the model during the training phase
• Testing/Validation Dataset (hold-out dataset)
Used to evaluate how well a model performs on unseen data
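A minimal sketch of such a split, assuming a random shuffle and a 20% hold-out fraction. The function name `train_test_split`, the fraction and the seed are illustrative choices, not values given in the slides.

```python
import random

def train_test_split(samples, test_fraction=0.2, seed=0):
    """Shuffle a copy of the samples and hold out a fraction for testing."""
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be between 0 and 1")
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is reproducible
    n_test = max(1, round(len(shuffled) * test_fraction))
    # First n_test shuffled samples are held out; the rest are used for training.
    return shuffled[n_test:], shuffled[:n_test]

samples = list(range(100))  # stand-in for 100 data records
train, test = train_test_split(samples, test_fraction=0.2, seed=42)
print(len(train), len(test))
```

The key property is that the testing samples never appear in the training set, so the test score reflects performance on unseen data.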
Training & Testing Data Split
Cross-validation
• Randomly select which samples of the data set go into the training and testing portions.
• Repeat this N times.
• Present the results as the average of the N runs, together with the standard deviation.
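The procedure described here (a fresh random split, repeated N times, reported as mean and standard deviation) is often called repeated random sub-sampling. A rough sketch follows. The toy nearest-centroid classifier, the synthetic one-feature data, and all function names are assumptions made purely so the example runs on its own.

```python
import random
import statistics

def fit_centroids(train):
    """Toy 1-D classifier: store the mean feature value of each class."""
    by_label = {}
    for x, label in train:
        by_label.setdefault(label, []).append(x)
    return {label: statistics.fmean(xs) for label, xs in by_label.items()}

def predict(centroids, x):
    """Assign x to the class whose centroid is closest."""
    return min(centroids, key=lambda label: abs(x - centroids[label]))

def accuracy(centroids, data):
    """Percentage of samples in data that are classified correctly."""
    correct = sum(1 for x, label in data if predict(centroids, x) == label)
    return 100.0 * correct / len(data)

def repeated_holdout(data, n_runs=10, test_fraction=0.3, seed=0):
    """Re-split at random n_runs times and collect the testing accuracy of each run."""
    rng = random.Random(seed)
    n_test = round(len(data) * test_fraction)
    scores = []
    for _ in range(n_runs):
        shuffled = data[:]
        rng.shuffle(shuffled)
        test, train = shuffled[:n_test], shuffled[n_test:]
        scores.append(accuracy(fit_centroids(train), test))
    return scores

# Synthetic data: two classes whose single feature is centred at 0 and at 3.
rng = random.Random(1)
data = ([(rng.gauss(0.0, 1.0), 0) for _ in range(100)]
        + [(rng.gauss(3.0, 1.0), 1) for _ in range(100)])

scores = repeated_holdout(data, n_runs=10)
mean_acc = statistics.fmean(scores)
std_acc = statistics.stdev(scores)
print(f"testing accuracy: {mean_acc:.1f} (± {std_acc:.2f})")
```

Reporting the standard deviation alongside the mean shows how sensitive the result is to the particular random split.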
Model Development & Testing
• Model development uses the training data to produce a proposed model.
• The proposed model is applied to the testing data to produce prediction output.
• The accepted model should perform well on both the training and testing datasets.
Predictive Analytics Model Examples
Regression
• Predicts numeric quantities.
• E.g., predict revenue based on marketing expenditure, or car sales based on car features.
Classification
• Predicts categorical quantities.
• E.g., predict whether a customer will buy a product or not, or which of a few diseases a patient is likely to contract.
Forecasting
• Predicts future quantities based on previous trends.
• E.g., forecast the next few months' temperature based on historical data.
Regression Model Examples
Fit a straight line for a given set of points.
• Simple Linear Regression model: $y = b_0 + b_1 x_1 + e$
• Quadratic regression model: $y = b_0 + b_1 x_1 + b_2 x_1^2 + e$
• Multiple Linear Regression model: $y = b_0 + b_1 x_1 + b_2 x_2 + b_3 x_3 + b_4 x_4 + b_5 x_5 + e$
[Figure: Healthcare Cost (0 to 90,000) plotted against Age (10 to 70), showing the actual values, the predicted values and the residual between them.]
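As an illustration of the simple linear regression model, the sketch below fits $b_0$ and $b_1$ by ordinary least squares and computes the residuals (actual minus predicted). The age and cost values are made-up numbers, not data from the slide's chart, and the function name is hypothetical.

```python
def fit_simple_linear_regression(x, y):
    """Ordinary least squares for y = b0 + b1 * x with a single predictor."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b1 = sxy / sxx               # slope
    b0 = mean_y - b1 * mean_x    # intercept
    return b0, b1

# Hypothetical observations: age versus healthcare cost.
ages = [20, 30, 40, 50, 60]
costs = [15000, 24000, 38000, 47000, 61000]

b0, b1 = fit_simple_linear_regression(ages, costs)
predicted = [b0 + b1 * a for a in ages]
residuals = [actual - pred for actual, pred in zip(costs, predicted)]
print(b0, b1, residuals)
```

With an intercept in the model, the least-squares residuals sum to zero, which is a quick sanity check on the fit.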
Classification Model Examples
Image from: https://www.ibm.com/cloud/blog/ai-vs-machine-learning-vs-deep-learning-vs-neural-networks
Image from: https://en.wikipedia.org/wiki/Random_forest
Forecasting Model Examples
Auto-Regressive Integrated Moving Average (ARIMA) or Seasonal ARIMA models: (p,d,q) (P,D,Q)s
• (p,d,q): non-seasonal component
• (P,D,Q)s: seasonal component, where s is the seasonal period
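Fitting an ARIMA or seasonal ARIMA model normally relies on a statistics library, which is not shown here. The sketch below only illustrates what the orders d and D and the period s refer to: d counts ordinary (lag-1) differences, and D counts seasonal (lag-s) differences. The series is a made-up example with a steady trend and a pattern that repeats every 4 steps.

```python
def difference(series, lag=1):
    """Return series[t] - series[t - lag] for every t >= lag."""
    return [series[t] - series[t - lag] for t in range(lag, len(series))]

# Hypothetical quarterly series: upward trend plus a repeating pattern of period 4.
series = [10, 14, 12, 8, 14, 18, 16, 12, 18, 22, 20, 16]

first_diff = difference(series, lag=1)      # one ordinary difference (d = 1)
seasonal_diff = difference(series, lag=4)   # one seasonal difference (D = 1, s = 4)
print(first_diff)
print(seasonal_diff)
```

In this example the seasonal difference removes the repeating pattern entirely and leaves a constant, while the ordinary difference still shows the within-year swings.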
Evaluation of Regression & Time Series Models
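Error: $e_i = y_i - \hat{y}_i$, where $y_i$ is the actual value and $\hat{y}_i$ is the predicted value.
Root Mean Square Error: $\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n} e_i^2}$
Mean Absolute Percent Deviation: $\mathrm{MAPD} = \frac{100\%}{n}\sum_{i=1}^{n}\left|\frac{e_i}{y_i}\right|$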
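Both metrics can be computed directly from paired actual and predicted values. The following is a small sketch with made-up numbers. It takes the error as actual minus predicted, and it assumes no actual value is zero, since MAPD divides by the actual value.

```python
import math

def rmse(actual, predicted):
    """Root mean square error: square root of the mean squared error."""
    errors = [a - p for a, p in zip(actual, predicted)]
    return math.sqrt(sum(e * e for e in errors) / len(errors))

def mapd(actual, predicted):
    """Mean absolute percent deviation, in percent (actual values must be non-zero)."""
    ratios = [abs((a - p) / a) for a, p in zip(actual, predicted)]
    return 100.0 * sum(ratios) / len(ratios)

# Hypothetical actual and predicted values.
actual = [100.0, 200.0, 400.0]
predicted = [110.0, 190.0, 400.0]

rmse_value = rmse(actual, predicted)
mapd_value = mapd(actual, predicted)
print(round(rmse_value, 3), round(mapd_value, 3))
```

RMSE is expressed in the units of the target variable, whereas MAPD is a percentage and so is easier to compare across series of different scale.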
Evaluation of Classification Models
• Accuracy = (c / n) × 100%
Where
- c is the total number of correctly classified samples
- n is the total number of samples
• Is accuracy the only evaluation metric?
Confusion Matrix
Consider a binary classification problem. We can further analyse the model performance by breaking down the results by actual value versus predicted value:
• True Negative (TN): actual negative, predicted negative
• False Positive (FP): actual negative, predicted positive
• False Negative (FN): actual positive, predicted negative
• True Positive (TP): actual positive, predicted positive
Confusion Matrix
                   Predicted Negative     Predicted Positive
Actual Negative    True Negative (TN)     False Positive (FP)
Actual Positive    False Negative (FN)    True Positive (TP)

- Accuracy = (TP+TN)/(TP+TN+FP+FN) × 100%
- Specificity = TN/(TN+FP)
Example: Percentage of patients correctly predicted as not having a certain disease, or percentage of transactions correctly predicted as not fraudulent.
- Sensitivity = TP/(TP+FN)
Example: Percentage of patients correctly predicted as having a certain disease, or percentage of transactions correctly predicted as fraudulent.
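To make the definitions concrete, the sketch below counts the four outcomes from paired actual and predicted labels and then applies the three formulas. The label lists and function names are hypothetical, with 1 taken to be the positive class.

```python
def confusion_counts(actual, predicted, positive=1):
    """Count TN, FP, FN and TP for a binary classification problem."""
    counts = {"TN": 0, "FP": 0, "FN": 0, "TP": 0}
    for a, p in zip(actual, predicted):
        if a == positive:
            counts["TP" if p == positive else "FN"] += 1
        else:
            counts["FP" if p == positive else "TN"] += 1
    return counts

def accuracy(c):
    """Share of all samples that are classified correctly."""
    return (c["TP"] + c["TN"]) / sum(c.values())

def specificity(c):
    """Share of actual negatives that are predicted negative."""
    return c["TN"] / (c["TN"] + c["FP"])

def sensitivity(c):
    """Share of actual positives that are predicted positive."""
    return c["TP"] / (c["TP"] + c["FN"])

# Hypothetical labels for eight samples (1 = positive, 0 = negative).
actual = [1, 0, 0, 1, 0, 1, 0, 0]
predicted = [1, 0, 1, 0, 0, 1, 0, 0]

counts = confusion_counts(actual, predicted)
print(counts, accuracy(counts), specificity(counts), sensitivity(counts))
```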
Try it Out
Model 1
                   Predicted Negative     Predicted Positive
Actual Negative    765                    55
Actual Positive    154                    26
Accuracy = (765+26)/1000 = 79.1%
Specificity = 765/(765+55) = 0.933
Sensitivity = 26/(26+154) = 0.144

Model 2
                   Predicted Negative     Predicted Positive
Actual Negative    605                    215
Actual Positive    42                     138
Accuracy = (605+138)/1000 = 74.3%
Specificity = 605/(605+215) = 0.738
Sensitivity = 138/(42+138) = 0.767

Model 1 has the higher accuracy, but Model 2 has far higher sensitivity. One may be more interested in sensitivity, e.g., in identifying patients who are going to get a certain disease or transactions that are fraudulent.
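The figures for the two models can be checked with a short script. The counts are the ones in the two confusion matrices of this slide, while the helper `rates` and the idea of ranking the models by a single metric are illustrative assumptions.

```python
def rates(tn, fp, fn, tp):
    """Accuracy, specificity and sensitivity from the four confusion-matrix counts."""
    total = tn + fp + fn + tp
    return {
        "accuracy": (tp + tn) / total,
        "specificity": tn / (tn + fp),
        "sensitivity": tp / (tp + fn),
    }

models = {
    "Model 1": rates(tn=765, fp=55, fn=154, tp=26),
    "Model 2": rates(tn=605, fp=215, fn=42, tp=138),
}

# The preferred model depends on which metric matters for the problem at hand.
best_by_accuracy = max(models, key=lambda name: models[name]["accuracy"])
best_by_sensitivity = max(models, key=lambda name: models[name]["sensitivity"])

for name, r in models.items():
    print(name, {metric: round(value, 3) for metric, value in r.items()})
print(best_by_accuracy, best_by_sensitivity)
```

The two rankings disagree, which is why accuracy alone is not enough when missing a positive case is costly.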
Selecting the Best Model
Accuracy    Model 1a         Model 1b        Model 2a         Model 2b         Model 3
Training    80.5 (± 0.3)     83.5 (± 2.3)    82.5 (± 1.35)    81.5 (± 0.3)     83.2 (± 1.3)
Testing     77.8 (± 0.25)    78.5 (± 3.8)    79.8 (± 0.22)    75.8 (± 0.15)    80.7 (± 0.28)

• One may try different models, or the same model with different hyper-parameters.
• Compare across the various models before choosing the best one.
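One possible way to compare the candidates is to rank them by mean testing accuracy and also look at the gap between training and testing accuracy. The numbers below are the ones in the table. The ranking criterion itself is an assumption for illustration, since the slide does not name a winner or a rule.

```python
# (mean, standard deviation) of accuracy, copied from the table above.
results = {
    "Model 1a": {"train": (80.5, 0.3), "test": (77.8, 0.25)},
    "Model 1b": {"train": (83.5, 2.3), "test": (78.5, 3.8)},
    "Model 2a": {"train": (82.5, 1.35), "test": (79.8, 0.22)},
    "Model 2b": {"train": (81.5, 0.3), "test": (75.8, 0.15)},
    "Model 3": {"train": (83.2, 1.3), "test": (80.7, 0.28)},
}

# Assumed criterion: highest mean testing accuracy first.
ranked = sorted(results, key=lambda name: results[name]["test"][0], reverse=True)

# Gap between training and testing accuracy; a large gap may hint at overfitting.
gaps = {name: r["train"][0] - r["test"][0] for name, r in results.items()}
largest_gap = max(gaps, key=gaps.get)

for name in ranked:
    mean, std = results[name]["test"]
    print(f"{name}: testing {mean} (± {std}), train-test gap {gaps[name]:.1f}")
```

The standard deviation matters too: Model 1b has a testing spread of ± 3.8, so its result is much less stable across runs than the others.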
Model Deployment
Consideration examples:
• Communication plans for staff or users of the analytics model, with a timeline and action items for the deployment.
• Which teams are involved in the deployment? Are the various teams aware and sufficiently engaged?
• When, where and how to deploy the model?
Model Monitoring and Maintenance
• After the model is deployed, it has to be monitored to ensure that it is working as intended.
• It needs to be maintained so that it stays up to date and relevant.
• In some cases (but not always), new data is added to the older data and the whole model is retrained.
• Some models allow incremental training, i.e., the whole model does not need to be retrained.
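As one illustration of incremental training, a simple linear regression can be kept up to date by storing only running sums. Each new batch updates the sums, and the coefficients come out the same as if the model had been refitted on all the data. The class name, the method name `partial_fit` and the data are hypothetical choices for this sketch.

```python
class IncrementalLinearRegression:
    """Simple linear regression y = b0 + b1 * x maintained from running sums."""

    def __init__(self):
        self.n = 0
        self.sum_x = 0.0
        self.sum_y = 0.0
        self.sum_xx = 0.0
        self.sum_xy = 0.0

    def partial_fit(self, xs, ys):
        """Fold a new batch into the running sums without revisiting older data."""
        for x, y in zip(xs, ys):
            self.n += 1
            self.sum_x += x
            self.sum_y += y
            self.sum_xx += x * x
            self.sum_xy += x * y
        return self

    def coefficients(self):
        """Least-squares intercept and slope computed from the stored sums."""
        mean_x = self.sum_x / self.n
        mean_y = self.sum_y / self.n
        sxx = self.sum_xx - self.n * mean_x ** 2
        sxy = self.sum_xy - self.n * mean_x * mean_y
        b1 = sxy / sxx
        return mean_y - b1 * mean_x, b1

model = IncrementalLinearRegression()
model.partial_fit([20, 30, 40], [15000, 24000, 38000])  # older data
model.partial_fit([50, 60], [47000, 61000])             # newly collected batch
b0, b1 = model.coefficients()
print(b0, b1)
```

Not every model type can be updated this way, which is why some deployments fall back on retraining from the combined old and new data.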
How Often Should Models be Updated?
Model review & update may be performed:
• At regular intervals
• When performance has degraded
• Ad hoc
• When new and better algorithms are available
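The "performance has degraded" trigger can be automated with a simple check. The sketch below flags a model for review when its recent average accuracy falls more than a tolerance below a baseline. The function name, the 2-point tolerance, the baseline and the accuracy figures are all hypothetical.

```python
def needs_review(baseline_accuracy, recent_accuracies, tolerance=2.0):
    """Return True when the recent mean accuracy is more than `tolerance` points below baseline."""
    recent_mean = sum(recent_accuracies) / len(recent_accuracies)
    return (baseline_accuracy - recent_mean) > tolerance

baseline = 80.7  # hypothetical accuracy (%) measured when the model was accepted

stable = needs_review(baseline, [80.1, 79.9, 80.5])    # small fluctuation only
degraded = needs_review(baseline, [78.0, 76.5, 75.2])  # sustained drop
print(stable, degraded)
```

A check like this complements, rather than replaces, reviews at regular intervals, because it only detects problems once labelled outcomes are available.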
Stages of Predictive Analytics Model Development
• Business Objectives and Problem Statement Identification
• Data Collection, Exploration and Preparation
• Model Development & Testing
• Model Deployment
• Model Monitoring & Maintenance
https://www.iss.nus.edu.sg/