To Explain or To Predict?
Predictive Analytics in IS Research
3rd Taiwan Summer Workshop
on Information Management
July 2015
Galit Shmuéli
Galit Shmueli (徐茉莉)
www.galitshmueli.com
❶ 1994-2000
Israel Institute of Technology
MSc + PhD, Statistics
❷ 2000-2002
Carnegie Mellon Univ.
Visiting Assistant Prof., Dept. of Statistics
❸ 2002-2012
Univ. of Maryland College Park
Assistant then Associate Prof. of Statistics & Management Science
R H Smith School of Business
2008-2014
Rigsum Institute (Bhutan)
Co-Director, Rigsum Research Lab
❹ 2011-2014
Indian School of Business
SRITNE Chaired Prof. of Data Analytics, Associate Prof. of Statistics & Info Systems
2014-… NTHU
Institute of Service Science
Director, Center for Service Innovation & Analytics
Research in Data Analytics
www.galitshmueli.com
• Statistical strategy
• ‘Entrepreneurial’ statistical & data mining modeling (new conditions & environments)
• Business analytics
In progress…
www.iss.nthu.edu.tw
Road Map
Definitions
Explanatory-dominated MIS
Explanatory modeling ≠ predictive modeling
Why?
Different modeling paths
Explanatory power vs. predictive power
How do I use this?
Definitions
Explanatory modeling:
Theory-based, statistical testing of causal hypotheses
Explanatory power:
Strength of relationship in statistical model
Definitions
Predictive modeling:
Empirical method for predicting new observations
Predictive power:
Ability to accurately predict new observations
Explain, Predict, Describe
Matching Game
Social Sciences
(MIS included)
Machine
learning
Statistics
Statistical modeling in
MIS research
Purpose: test causal theory (“explain”)
Association-based statistical models
Prediction nearly absent
Explanatory modeling à la MIS
Start with a causal theory
Generate causal hypotheses on constructs
Operationalize constructs → Measurable variables
Fit statistical model
Statistical inference → Causal conclusions
In MIS, data analysis is mainly used for testing causal theory.
“If it explains, it predicts”
“Empirical prediction alone
is un-scientific”
Some statisticians share this view:
The two goals in analyzing data... I prefer to describe
as “management” and “science”. Management seeks
profit... Science seeks truth.
- Parzen, Statistical Science 2001
Prediction in top research journals in
Information Systems
Predictive goal?
Predictive modeling?
Predictive assessment?
1990-2006
52 “predictive” articles among 1,072
in Information Systems top journals
generate new theory
develop measures
compare theories
improve theory
assess relevance
evaluate predictability
Why Predict? for Scientific Research
Shmueli & Koppius, “Predictive Analytics in IS Research”
MIS Quarterly, 2011
“A good explanatory model will also
predict well”
“You must understand the underlying
causes in order to predict”
Philosophy of Science
“Explanation and prediction have the
same logical structure”
Hempel & Oppenheim, 1948
“It becomes pertinent to investigate the
possibilities of predictive procedures
autonomous of those used for explanation”
Helmer & Rescher, 1959
“Theories of social and human behavior
address themselves to two distinct goals of
science: (1) prediction and (2) understanding”
Dubin, Theory Building, 1969
Why statistical
explanatory modeling
differs from
predictive modeling
Explanatory Model:
Test/quantify causal effect for
“average” record in population
Predictive Model:
Predict new individual
observations
Different Scientific Goals
Different generalization
Theory vs. its manifestation
?
Notation
Theoretical constructs: $\mathcal{X}$, $\mathcal{Y}$
Causal theoretical model: $\mathcal{Y} = \mathcal{F}(\mathcal{X})$
Measurable variables: $X$, $Y$
Statistical model: $E(Y) = f(X)$
Four aspects
1. Theory – Data
2. Causation – Association
3. Retrospective – Prospective
4. Bias – Variance
$\mathcal{Y} = \mathcal{F}(\mathcal{X})$
$E(Y) = f(X)$
“The goal of finding models that are
predictively accurate differs from the
goal of finding models that are true.”
Best explanatory model ≠ Best predictive model
Point #1
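The bias-variance aspect can be made concrete with the standard decomposition of expected prediction error under squared-error loss. This is a sketch, not a formula shown in the talk: here $f$ is the statistical model from the notation above, $\hat{f}$ (an added symbol) is the model estimated from data, and $x$ is a new observation's inputs.

```latex
\mathrm{EPE}(x)
  = E\{Y - \hat{f}(x)\}^2
  = \underbrace{E\{Y - f(x)\}^2}_{\text{irreducible noise}}
  + \underbrace{\{E[\hat{f}(x)] - f(x)\}^2}_{\text{bias}^2}
  + \underbrace{E\{\hat{f}(x) - E[\hat{f}(x)]\}^2}_{\text{estimation variance}}
```

Explanatory modeling concentrates on the bias term, so that $f$ represents the theory faithfully. Predictive modeling minimizes the sum of bias and variance, so a deliberately "wrong" (biased) model can predict better when it lowers estimation variance enough.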
Four aspects
1. Theory – Data
2. Causation – Association
3. Retrospective – Prospective
4. Bias – Variance
$\mathcal{Y} = \mathcal{F}(\mathcal{X})$
$E(Y) = f(X)$
Predict ≠ Explain
+ ?
“we tried to benefit from an extensive
set of attributes describing each of the
movies in the dataset. Those attributes
certainly carry a significant signal and
can explain some of the user behavior.
However… they could not help at all
for improving the [predictive]
accuracy.”
Bell et al., 2008
Predict ≠ Explain
Explain ≠ Predict
The FDA considers two products
bioequivalent if the 90% CI of the
relative mean of the generic to brand
formulation is within 80%-125%
“We are planning to… develop predictive models for bioavailability
and bioequivalence”
Lester M. Crawford, 2005
Acting Commissioner of Food & Drugs
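The bioequivalence rule quoted above is a statement about an average, and it can be sketched in a few lines. This is a simplified illustration only: it assumes paired measurements per subject, works on the log scale, and uses a normal critical value rather than the t-based crossover analysis a real submission would use. The data and the function names `ratio_ci_90` and `is_bioequivalent` are hypothetical.

```python
import math
import statistics
from statistics import NormalDist

def ratio_ci_90(generic, brand):
    """Approximate 90% CI for the generic/brand mean ratio (log scale, paired data)."""
    diffs = [math.log(g) - math.log(b) for g, b in zip(generic, brand)]
    mean = statistics.fmean(diffs)
    se = statistics.stdev(diffs) / math.sqrt(len(diffs))
    z = NormalDist().inv_cdf(0.95)  # two-sided 90% interval, normal approximation
    return math.exp(mean - z * se), math.exp(mean + z * se)

def is_bioequivalent(generic, brand, low=0.80, high=1.25):
    """True when the whole interval lies inside the 80%-125% window."""
    lo, hi = ratio_ci_90(generic, brand)
    return low <= lo and hi <= high

# Hypothetical exposure measurements for eight subjects.
brand = [100, 110, 95, 105, 98, 102, 108, 97]
ratios = [1.02, 0.97, 1.05, 0.99, 1.01, 0.96, 1.03, 1.00]
generic = [b * r for b, r in zip(brand, ratios)]

print(ratio_ci_90(generic, brand))
print(is_bioequivalent(generic, brand))
```

The check passes or fails on the population-average ratio alone. It says nothing about how well the generic's effect can be predicted for a particular patient, which is the gap the quoted call for predictive models points at.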
“For a long time, we thought that
Tamoxifen was roughly 80%
effective for breast cancer
patients.
But now we know much more:
we know that it’s 100% effective
in 70%-80% of the patients, and
ineffective in the rest.”
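The quote contrasts an average effect with individual outcomes, and a tiny simulation shows why the two readings produce the same summary number. The responder share of 0.75 (picked from inside the quoted 70%-80% range), the all-or-nothing effect, and the population size are assumptions for illustration.

```python
import random

rng = random.Random(42)
n_patients = 10_000
responder_share = 0.75  # assumed; the quote gives 70%-80%

# Each patient either responds fully (1.0) or not at all (0.0).
individual_effect = [1.0 if rng.random() < responder_share else 0.0
                     for _ in range(n_patients)]

# The population-average effect is what an "average record" analysis reports.
average_effect = sum(individual_effect) / n_patients

print(f"average effect: {average_effect:.2f}")
print(f"distinct individual effects: {sorted(set(individual_effect))}")
```

The average lands near 0.75, yet no simulated patient has an effect of 0.75. Predicting for a new individual requires knowing which group that patient falls in, not the population mean.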
Goal Definition → Design & Collection → Data Preparation → EDA → Variables? → Methods? → Evaluation, Validation & Model Selection → Model Use & Reporting
Study design & data collection
Hierarchical data
Observational or experiment?
Primary or secondary data?
Instrument (reliability + validity vs. measurement accuracy)
How much data?
How to sample?
Data Preprocessing
reduced-feature models
missing
partitioning
PCA
SVD
Interactive visualization
Data exploration & reduction
Which Variables?
Multicollinearity?
causation associations
endogeneity
ex-post availability
A, B, A*B?
ensembles
Shrinkage models
variance bias
Methods / Models
Blackbox / interpretable
Mapping to theory
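The trade that shrinkage models make between variance and bias can be sketched with a small simulation. This is illustrative and not taken from the talk: the one-predictor, no-intercept model, the penalty `lam`, the sample sizes, and the function names are all assumptions.

```python
import random
import statistics

def ols_slope(xs, ys):
    """Least-squares slope for y = b*x (no intercept)."""
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)

def ridge_slope(xs, ys, lam):
    """Ridge slope: same numerator, with the penalty lam added to the denominator."""
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + lam)

rng = random.Random(0)
true_slope, lam = 1.0, 5.0  # hypothetical values
ols_estimates, ridge_estimates = [], []

# Re-estimate the slope on many small samples to see bias and variance.
for _ in range(2000):
    xs = [rng.gauss(0, 1) for _ in range(10)]
    ys = [true_slope * x + rng.gauss(0, 2) for x in xs]
    ols_estimates.append(ols_slope(xs, ys))
    ridge_estimates.append(ridge_slope(xs, ys, lam))

for name, est in [("OLS", ols_estimates), ("ridge", ridge_estimates)]:
    bias = statistics.fmean(est) - true_slope
    print(f"{name:5s} bias={bias:+.3f} variance={statistics.pvariance(est):.3f}")
```

The shrunken estimate is pulled toward zero, which makes it a worse description of the true effect but a steadier one across samples. Whether that steadiness improves prediction depends on the penalty and the data, so it has to be checked on holdout data.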
Evaluation, Validation
& Model Selection
Training data
Empirical model
Holdout data
Predictive power
Over-fitting analysis
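A minimal sketch of the training/holdout comparison behind an over-fitting analysis, using only the standard library. The simulated data, the 50/50 partition, and the two models (a fitted line and a 1-nearest-neighbour memorizer) are assumptions chosen for illustration.

```python
import random
import statistics

def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    return (sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)) ** 0.5

def fit_line(xs, ys):
    """Simple least-squares line; returns a prediction function."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    intercept = my - slope * mx
    return lambda x: intercept + slope * x

def fit_1nn(xs, ys):
    """1-nearest-neighbour: predicts the outcome of the closest training point."""
    pairs = list(zip(xs, ys))
    return lambda x: min(pairs, key=lambda p: abs(p[0] - x))[1]

rng = random.Random(0)
xs = [rng.uniform(0, 10) for _ in range(200)]
ys = [2.0 + 0.5 * x + rng.gauss(0, 1) for x in xs]

# Partition once: fit on the first half, assess on the untouched second half.
train_x, hold_x = xs[:100], xs[100:]
train_y, hold_y = ys[:100], ys[100:]

results = {}
for name, fit in [("line", fit_line), ("1-NN", fit_1nn)]:
    model = fit(train_x, train_y)
    results[name] = (rmse(train_y, [model(x) for x in train_x]),
                     rmse(hold_y, [model(x) for x in hold_x]))
    print(f"{name:5s} train RMSE={results[name][0]:.3f} holdout RMSE={results[name][1]:.3f}")
```

The memorizer reproduces its training data exactly, so its training error is zero by construction. The gap between that and its holdout error is the over-fitting signal, and only the holdout column speaks to predictive power.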
Theoretical model
Empirical model
Data
Validation
Model fit ≠ Explanatory power
Inference
Model Use
test causal theory
generate new theory
develop measures
compare theories
improve theory
assess relevance
Evaluate predictability
Predictive performance
Over-fitting analysis
Null hypothesis
Naïve/baseline
Point #2
Explanatory Power ≠ Predictive Power
Cannot infer one from the other
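One way to see the point is to compare a significance test with a holdout assessment against a naive baseline on the same data. The sketch below is an assumption-laden illustration: simulated data with a small true slope of 0.1, a large sample, a normal approximation for the p-value, and the training mean as the naive benchmark.

```python
import math
import random
import statistics

def fit_ols(xs, ys):
    """Return intercept, slope, and the slope's standard error."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    intercept = my - slope * mx
    sse = sum((y - intercept - slope * x) ** 2 for x, y in zip(xs, ys))
    se = math.sqrt(sse / (len(xs) - 2) / sxx)
    return intercept, slope, se

def rmse(actual, predicted):
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

rng = random.Random(1)
n = 20_000
xs = [rng.gauss(0, 1) for _ in range(n)]
ys = [0.1 * x + rng.gauss(0, 1) for x in xs]  # small but real effect
train_x, hold_x = xs[: n // 2], xs[n // 2 :]
train_y, hold_y = ys[: n // 2], ys[n // 2 :]

# Explanatory view: is the slope statistically distinguishable from zero?
intercept, slope, se = fit_ols(train_x, train_y)
z = slope / se
p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided, normal approximation

# Predictive view: does the model beat the naive baseline on holdout data?
baseline = statistics.fmean(train_y)
rmse_model = rmse(hold_y, [intercept + slope * x for x in hold_x])
rmse_naive = rmse(hold_y, [baseline] * len(hold_y))
improvement = 1 - rmse_model / rmse_naive

print(f"slope={slope:.3f} p-value={p_value:.2g}")
print(f"holdout RMSE: model={rmse_model:.3f} naive={rmse_naive:.3f} improvement={improvement:.1%}")
```

With these settings the slope is highly significant, while the holdout error barely moves relative to the baseline. The reverse also happens in practice, which is why each kind of power needs its own assessment.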
out-of-sample
Performance Metrics
type I, II errors
goodness-of-fit
p-values
over-fitting
costs
prediction accuracy
interpretation
Training vs. holdout
R²
Explanatory Power
Predictive Power
The predictive power of an
explanatory model has important
scientific value
Relevance, reality check, predictability
Current State in Social Sciences
(and MIS)
“While the value of scientific prediction… is beyond
question… the inexact sciences [do not] have…the
use of predictive expertise well in hand.”
Helmer & Rescher, 1959
Distinction blurred
Unfamiliarity with predictive
modeling/assessment
Prediction underappreciated
How does this impact
Scientific Research?
State-of-the-art in Industry
Distinction blurred
Prediction over-appreciated
“Big Data” synonymous with prediction
How does this impact an
organization’s actions?
…and our lives?
What can be done?
Acknowledge difference
Learn/teach prediction
Leverage prediction in research
BUT
focus on its scientific uses:
generate new theory
develop measures
compare theories
improve theory
assess relevance
evaluate predictability
Why Predict? for Scientific Research
Shmueli & Koppius, “Predictive Analytics in IS Research”
MIS Quarterly, 2011
Shmueli (2010) “To Explain or To Predict?”, Statistical Science
Shmueli & Koppius (2011) “Predictive Analytics in IS Research”, MISQ

Predictive analytics in Information Systems Research (TSWIM 2015 keynote)