© Experian Limited 2007. All rights reserved. Experian and the marks used herein are service marks or registered trademarks of Experian Limited.
Other product and company names mentioned herein may be the trademarks of their respective owners. No part of this copyrighted work
may be reproduced, modified, or distributed in any form or manner without the prior written permission of Experian Limited.
Confidential and proprietary.
Stepwise Logistic Regression
Lecture for FMI (Faculty of Mathematics and Informatics) Students, 27.05.2010
Alexander Efremov
Agenda
Introduction
Applications of the Logistic Regression
System Identification & Stepwise Regression
Part I. Logistic Regression Model Development
Logistic Model
Maximum Likelihood Estimator
Potential Problems
Model Analysis and Validation
Part II. Stepwise Logistic Regression (SWR)
Basic Idea
SWR Algorithm
Potential Problems
Summary
Introduction
Applications of Logistic Regression
Medicine – diagnostics, modeling of disease growth, treatment effects
Psychology – modeling of learning processes, evaluation of psychological tests
Economics – risk analysis, analysis of national debt, occupational choice
Marketing – product consumption, effects of retailers' actions
Criminology – risk factors for committing a criminal act
Sociology – employment, graduation, voting analysis
Ecology – modeling population growth
Linguistics – language change
Chemistry – reaction models
Media – news effects, copycat reactions
Finance – credit scoring, fraud detection
Physics, Biology, etc.
[Figure: the logistic model]
Introduction
System Under Investigation
Individuals (raw data) => System => Model
Introduction
System Identification Stages
Part I. Logistic Regression Model Development
Logistic Model
[Figure panels: linear relation vs. logistic relation]
Part I. Logistic Regression Model Development
Logistic Model
$k$ – index of the current individual, $k = 1, \dots, N$
$N$ – number of observations
$y_k$ – dependent variable (observed outcome: 1 if "good", 0 otherwise)
$\hat{y}_k$ – model output (predicted probability of "good")
$\theta_0$ – intercept
$\theta_i$ – the $(i+1)$-th model parameter, $i = 1, \dots, n$
$x_{i,k}$ – the $i$-th independent variable of individual $k$
Logistic relation, general form:
$$\hat{y}_k = \frac{e^{M_k}}{1 + e^{M_k}} = \frac{1}{1 + e^{-M_k}}$$
"Linear" logistic regression model:
$$M_k = \theta_0 + \theta_1 x_{1,k} + \dots + \theta_n x_{n,k}$$
$$\hat{y}_k = \frac{1}{1 + e^{-(\theta_0 + \theta_1 x_{1,k} + \dots + \theta_n x_{n,k})}}$$
$$\ln \frac{\hat{y}_k}{1 - \hat{y}_k} = \theta_0 + \theta_1 x_{1,k} + \dots + \theta_n x_{n,k}$$
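A minimal Python sketch of the relations above, assuming scalar inputs and illustrative parameter values (the function names are hypothetical, not from the lecture). It evaluates $M_k$, maps it through the logistic function and checks that the logit recovers $M_k$.

```python
import math

def linear_predictor(theta, x):
    """M_k = theta_0 + theta_1*x_{1,k} + ... + theta_n*x_{n,k}; theta = [theta_0, ..., theta_n]."""
    return theta[0] + sum(t * xi for t, xi in zip(theta[1:], x))

def logistic(m):
    """y_hat = e^M / (1 + e^M) = 1 / (1 + e^-M); branch on the sign of M so exp() cannot overflow."""
    if m >= 0:
        return 1.0 / (1.0 + math.exp(-m))
    z = math.exp(m)
    return z / (1.0 + z)

def logit(p):
    """Inverse relation: ln(y_hat / (1 - y_hat)) gives back M."""
    return math.log(p / (1.0 - p))

theta = [-1.0, 0.5, 2.0]   # illustrative theta_0, theta_1, theta_2
x_k = [1.0, 0.25]          # illustrative x_{1,k}, x_{2,k}
m_k = linear_predictor(theta, x_k)
print(m_k, logistic(m_k))  # 0.0 0.5
```

The sign branch matters only for extreme $M_k$: both forms of the general relation are mathematically equal, but each one is numerically safe on a different half of the real line.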
Part I. Logistic Regression Model Development
Logistic Model
Notation
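Parameter vector: $\theta = [\theta_0\ \theta_1\ \dots\ \theta_n]^T \in \mathbb{R}^{n+1}$
Regression vector: $\varphi_k = [1\ x_{1,k}\ \dots\ x_{n,k}]^T \in \mathbb{R}^{n+1}$
Logistic model:
$$\hat{y}_k = \frac{1}{1 + e^{-(\theta_0 + \theta_1 x_{1,k} + \dots + \theta_n x_{n,k})}} = \frac{1}{1 + e^{-\varphi_k^T \theta}}$$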
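The vector notation maps directly onto array code. The sketch below (NumPy assumed; function names and numbers are illustrative) stacks the regression vectors $\varphi_k^T$ as the rows of an $N \times (n+1)$ matrix and evaluates $\hat{y}_k$ for all individuals with one matrix product.

```python
import numpy as np

def regression_matrix(X):
    """Stack phi_k^T = [1, x_{1,k}, ..., x_{n,k}] as the rows of an N x (n+1) matrix."""
    X = np.asarray(X, dtype=float)
    X = X.reshape(X.shape[0], -1)          # accept a single variable given as a 1-D array
    return np.column_stack([np.ones(X.shape[0]), X])

def predict(theta, Phi):
    """y_hat_k = 1 / (1 + exp(-phi_k^T theta)) for every individual k at once."""
    return 1.0 / (1.0 + np.exp(-(Phi @ np.asarray(theta, dtype=float))))

X = [[0.0, 1.0], [2.0, -1.0], [1.0, 0.5]]  # N = 3 individuals, n = 2 variables (made-up values)
Phi = regression_matrix(X)
theta = [0.5, 1.0, -0.5]                   # illustrative [theta_0, theta_1, theta_2]
print(Phi.shape, predict(theta, Phi))
```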
Part I. Logistic Regression Model Development
Residual
The Residual
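$$y_k = \hat{y}_k + e_k = \frac{1}{1 + e^{-\varphi_k^T \theta}} + e_k$$
$$e_k = y_k - \hat{y}_k = \begin{cases} 1 - \hat{y}_k, & \text{for } y_k = 1 \\ -\hat{y}_k, & \text{for } y_k = 0 \end{cases}$$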
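For a binary $y_k$ the residual takes only the two forms listed above. A small NumPy sketch with made-up model outputs, assuming $y_k$ is coded 1/0 (the helper name is hypothetical):

```python
import numpy as np

def residuals(y, y_hat):
    """e_k = y_k - y_hat_k, written as the two cases of the definition."""
    y = np.asarray(y, dtype=float)
    y_hat = np.asarray(y_hat, dtype=float)
    # 1 - y_hat_k when y_k = 1, and -y_hat_k when y_k = 0
    return np.where(y == 1.0, 1.0 - y_hat, -y_hat)

y = [1, 0, 1, 0]
y_hat = [0.9, 0.2, 0.4, 0.7]    # made-up model outputs in (0, 1)
e = residuals(y, y_hat)
print(e)                        # each residual lies strictly between -1 and 1
```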
Sources of Uncertainty
Unavailable significant factors
Simplified relations
Time-varying performance
Database errors
Fraud
Part I. Logistic Regression Model Development
Maximum Likelihood Estimator
Cost Function
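Model output: $\hat{y}_k = P(y_k = 1 \mid \varphi_k)$
Likelihood contribution: $l_{k,\theta} = \hat{y}_k^{\,y_k} (1 - \hat{y}_k)^{1 - y_k}$
Likelihood function: $L_\theta = \prod_{k=1}^{N} l_{k,\theta}$
Log-likelihood function: $\ln L_\theta = \sum_{k=1}^{N} \left( y_k \ln \hat{y}_k + (1 - y_k) \ln (1 - \hat{y}_k) \right)$
Maximum likelihood criterion: $\max_\theta \ln L_\theta \iff \min_\theta \left( -2 \ln L_\theta \right)$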
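A sketch of the likelihood and the log-likelihood for given outputs $\hat{y}_k$ (NumPy assumed, made-up values, hypothetical function names). For realistic $N$ the product form underflows towards 0, which is one practical reason to work with $\ln L$, and hence with $-2 \ln L$, in the criterion.

```python
import numpy as np

def log_likelihood(y, y_hat):
    """ln L = sum_k [ y_k ln(y_hat_k) + (1 - y_k) ln(1 - y_hat_k) ]."""
    y = np.asarray(y, dtype=float)
    y_hat = np.asarray(y_hat, dtype=float)
    return float(np.sum(y * np.log(y_hat) + (1.0 - y) * np.log(1.0 - y_hat)))

def likelihood(y, y_hat):
    """L = prod_k y_hat_k^{y_k} (1 - y_hat_k)^{1 - y_k} (shown only for comparison)."""
    y = np.asarray(y, dtype=float)
    y_hat = np.asarray(y_hat, dtype=float)
    return float(np.prod(y_hat ** y * (1.0 - y_hat) ** (1.0 - y)))

y = [1, 0, 1, 1]
y_hat = [0.8, 0.3, 0.6, 0.9]     # made-up model outputs
ln_L = log_likelihood(y, y_hat)
print(ln_L, -2.0 * ln_L)         # maximising ln L is the same as minimising -2 ln L
```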
Part I. Logistic Regression Model Development
Maximum Likelihood Estimator
Cost Function (-2 ln L) for a Real-Life Case
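Part I. Logistic Regression Model Development
Maximum Likelihood Estimator
Taylor Series Expansion
Cost function: $f^{(i)} = f(\hat{\theta}^{(i)}) = -\ln L_{\hat{\theta}^{(i)}}$
Gradient: $g^{(i)} = \nabla f(\hat{\theta}^{(i)})$
Hessian: $H^{(i)} = \nabla^2 f(\hat{\theta}^{(i)})$
$$f(\hat{\theta}^{(i)} + \Delta\theta^{(i)}) = f^{(i)} + g^{(i)T} \Delta\theta^{(i)} + \tfrac{1}{2} \Delta\theta^{(i)T} H^{(i)} \Delta\theta^{(i)} + O\left(\|\Delta\theta^{(i)}\|^3\right)$$
Cost function models:
Linear model: $M^{(i)}(\Delta\theta) = f^{(i)} + g^{(i)T} \Delta\theta$
Quadratic model: $M^{(i)}(\Delta\theta) = f^{(i)} + g^{(i)T} \Delta\theta + \tfrac{1}{2} \Delta\theta^T H^{(i)} \Delta\theta$
Estimates update: $\hat{\theta}^{(i+1)} = \hat{\theta}^{(i)} + \Delta\theta^{(i)}$, with $\Delta\theta^{(i)} = \,?$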
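For the logistic model with $f = -\ln L$ the gradient and Hessian have standard closed forms, $g = \Phi^T(\hat{y} - y)$ and $H = \Phi^T W \Phi$ with $W = \mathrm{diag}(\hat{y}_k(1 - \hat{y}_k))$, where $\Phi$ stacks the regression vectors $\varphi_k^T$ as rows. The sketch below (NumPy, synthetic data, hypothetical function names) evaluates them and compares the quadratic model with the exact cost for a small step.

```python
import numpy as np

def cost(theta, Phi, y):
    """f(theta) = -ln L = sum_k [ln(1 + e^{M_k}) - y_k M_k], M_k = phi_k^T theta (stable via logaddexp)."""
    m = Phi @ theta
    return float(np.sum(np.logaddexp(0.0, m) - y * m))

def gradient_hessian(theta, Phi, y):
    """g = Phi^T (y_hat - y) and H = Phi^T W Phi with W = diag(y_hat_k (1 - y_hat_k))."""
    y_hat = 1.0 / (1.0 + np.exp(-(Phi @ theta)))
    g = Phi.T @ (y_hat - y)
    w = y_hat * (1.0 - y_hat)
    H = Phi.T @ (Phi * w[:, None])
    return g, H

def quadratic_model(theta, delta, Phi, y):
    """Second-order Taylor model f + g^T delta + 0.5 delta^T H delta."""
    g, H = gradient_hessian(theta, Phi, y)
    return cost(theta, Phi, y) + g @ delta + 0.5 * delta @ H @ delta

rng = np.random.default_rng(0)
Phi = np.column_stack([np.ones(20), rng.normal(size=(20, 2))])   # synthetic regression matrix
y = (rng.random(20) < 0.5).astype(float)
theta = np.array([0.1, -0.3, 0.2])
delta = np.array([1e-3, -2e-3, 1e-3])
# The gap is of order |delta|^3, i.e. tiny for a small step
print(cost(theta + delta, Phi, y) - quadratic_model(theta, delta, Phi, y))
```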
Part I. Logistic Regression Model Development
Maximum Likelihood Estimator
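Gradient:
$$g = \left[ \frac{\partial f}{\partial \theta_0}\ \ \frac{\partial f}{\partial \theta_1}\ \ \cdots\ \ \frac{\partial f}{\partial \theta_n} \right]^T \in \mathbb{R}^{n+1}$$
Hessian:
$$H = \begin{bmatrix} \frac{\partial^2 f}{\partial \theta_0^2} & \frac{\partial^2 f}{\partial \theta_0 \partial \theta_1} & \cdots & \frac{\partial^2 f}{\partial \theta_0 \partial \theta_n} \\ \frac{\partial^2 f}{\partial \theta_1 \partial \theta_0} & \frac{\partial^2 f}{\partial \theta_1^2} & \cdots & \frac{\partial^2 f}{\partial \theta_1 \partial \theta_n} \\ \vdots & \vdots & \ddots & \vdots \\ \frac{\partial^2 f}{\partial \theta_n \partial \theta_0} & \frac{\partial^2 f}{\partial \theta_n \partial \theta_1} & \cdots & \frac{\partial^2 f}{\partial \theta_n^2} \end{bmatrix} \in \mathbb{R}^{(n+1) \times (n+1)}$$
First-order methods (e.g. Steepest Descent): $\Delta\theta = -\alpha g$
Second-order methods (e.g. Newton-Raphson): $\Delta\theta = -\alpha H^{-1} g$
[Figure: iterations 1, 2, ... from the initial estimate $\theta^{(0)}$ towards the optimum $\theta_{opt}$, ending at the estimate $\theta^*$, for each method]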
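A sketch comparing the two update rules on synthetic data, assuming NumPy, a fixed step $\alpha = 0.005$ for steepest descent and $\alpha = 1$ for Newton-Raphson (both values are illustrative choices, not from the lecture). $H^{-1} g$ is obtained by solving a linear system rather than by forming the inverse.

```python
import numpy as np

def grad_hess(theta, Phi, y):
    """Gradient and Hessian of f = -ln L for the logistic model."""
    y_hat = 1.0 / (1.0 + np.exp(-(Phi @ theta)))
    g = Phi.T @ (y_hat - y)
    H = Phi.T @ (Phi * (y_hat * (1.0 - y_hat))[:, None])
    return g, H

def steepest_descent(Phi, y, alpha=0.005, tol=1e-8, max_iter=100_000):
    """First-order method: delta = -alpha * g with a fixed alpha."""
    theta = np.zeros(Phi.shape[1])
    for i in range(max_iter):
        g, _ = grad_hess(theta, Phi, y)
        if np.linalg.norm(g) < tol:
            return theta, i
        theta = theta - alpha * g
    return theta, max_iter

def newton_raphson(Phi, y, alpha=1.0, tol=1e-8, max_iter=100):
    """Second-order method: delta = -alpha * H^{-1} g, with H^{-1} g from solving H d = g."""
    theta = np.zeros(Phi.shape[1])
    for i in range(max_iter):
        g, H = grad_hess(theta, Phi, y)
        if np.linalg.norm(g) < tol:
            return theta, i
        theta = theta - alpha * np.linalg.solve(H, g)
    return theta, max_iter

rng = np.random.default_rng(1)
x = rng.normal(size=(200, 2))
Phi = np.column_stack([np.ones(200), x])
p = 1.0 / (1.0 + np.exp(-(0.3 + 1.0 * x[:, 0] - 0.7 * x[:, 1])))   # synthetic "true" model
y = (rng.random(200) < p).astype(float)

theta_sd, it_sd = steepest_descent(Phi, y)
theta_nr, it_nr = newton_raphson(Phi, y)
print(it_sd, it_nr)   # NR needs far fewer iterations to reach the same estimate
```

The fixed $\alpha$ for steepest descent has to be small enough for stability on this data, which is exactly why it needs many more iterations than the curvature-scaled Newton step.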
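Part I. Logistic Regression Model Development
Maximum Likelihood Estimator
Steepest Descent: $\Delta\theta = -\alpha g$
Newton-Raphson (NR): $\Delta\theta = -\alpha H^{-1} g$
NR with Line Search: $\Delta\theta = -\alpha^* H^{-1} g$
NR with Quadratic Interpolation: $\Delta\theta = -\alpha^* H^{-1} g$
[Figure: four panels, one per method, each showing iterations 1, 2, ... from $\theta^{(0)}$ towards $\theta_{opt}$, ending at $\theta^*$]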
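The two NR refinements differ only in how $\alpha^*$ is chosen. The sketch below uses a backtracking (Armijo) search as one common line-search variant, and a parabola fitted through $f(0)$, $f'(0)$ and $f(1)$ along the Newton direction for the quadratic interpolation. The constants, the clipping of $\alpha^*$ to [0.1, 1] and the Newton-decrement stopping rule are assumptions made for this illustration; NumPy and synthetic data are assumed too.

```python
import numpy as np

def cost(theta, Phi, y):
    """f(theta) = -ln L = sum_k [ln(1 + e^{M_k}) - y_k M_k]."""
    m = Phi @ theta
    return float(np.sum(np.logaddexp(0.0, m) - y * m))

def grad_hess(theta, Phi, y):
    """Gradient and Hessian of f for the logistic model."""
    y_hat = 1.0 / (1.0 + np.exp(-(Phi @ theta)))
    return Phi.T @ (y_hat - y), Phi.T @ (Phi * (y_hat * (1.0 - y_hat))[:, None])

def alpha_backtracking(f, theta, d, slope, c=1e-4):
    """Line search: halve alpha until f drops by at least c * alpha * |slope| (Armijo condition)."""
    alpha, f0 = 1.0, f(theta)
    while f(theta + alpha * d) > f0 + c * alpha * slope and alpha > 1e-8:
        alpha *= 0.5
    return alpha

def alpha_quadratic(f, theta, d, slope):
    """Fit q(a) = f(0) + slope*a + curv*a^2 through f(0), f'(0), f(1); return its minimiser in [0.1, 1]."""
    curv = f(theta + d) - f(theta) - slope
    if curv <= 0.0:
        return 1.0                          # the parabola has no minimum: take the full step
    return float(np.clip(-slope / (2.0 * curv), 0.1, 1.0))

def newton_line_search(Phi, y, choose_alpha, tol=1e-10, max_iter=100):
    """NR iterations theta <- theta - alpha* H^{-1} g, with alpha* supplied by choose_alpha."""
    def f(th):
        return cost(th, Phi, y)
    theta = np.zeros(Phi.shape[1])
    for i in range(max_iter):
        g, H = grad_hess(theta, Phi, y)
        d = -np.linalg.solve(H, g)          # Newton direction -H^{-1} g
        slope = float(g @ d)                # f'(0) along d, equal to -g^T H^{-1} g < 0
        if -slope / 2.0 < tol:              # stop once the Newton decrement is small
            return theta, i
        theta = theta + choose_alpha(f, theta, d, slope) * d
    return theta, max_iter

rng = np.random.default_rng(1)
x = rng.normal(size=(200, 2))
Phi = np.column_stack([np.ones(200), x])
p = 1.0 / (1.0 + np.exp(-(0.3 + 1.0 * x[:, 0] - 0.7 * x[:, 1])))   # synthetic "true" model
y = (rng.random(200) < p).astype(float)
for name, chooser in [("line search", alpha_backtracking), ("quadratic interpolation", alpha_quadratic)]:
    theta, iterations = newton_line_search(Phi, y, chooser)
    print(name, iterations, theta)
```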
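Part I. Logistic Regression Model Development
Potential Problems
Numerical problems: the update $\hat{\theta}^{(i+1)} = \hat{\theta}^{(i)} - \alpha H^{-1} g$ involves a matrix inversion, hence decompositions such as SVD, EVD or QR are used
Local minima [Figure: -2 ln L]
Model overfitting [Figure: observed $y_k$ and the outputs of two models, $y_{1,k}$ and $y_{2,k}$, plotted against $k$]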
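One way to avoid the explicit inverse in the update is to solve $H \Delta\theta = -g$ with an SVD-based least-squares routine, which also copes with a singular Hessian. A NumPy sketch, assuming a perfectly collinear input as the failure case (the helper name `newton_step` and the threshold `rcond=1e-10` are illustrative choices):

```python
import numpy as np

def newton_step(g, H, rcond=1e-10):
    """Newton step delta = -H^{-1} g computed without forming H^{-1}.

    np.linalg.lstsq solves H delta = -g through an SVD and treats singular values
    below rcond * s_max as zero, so a (nearly) singular Hessian gives the
    minimum-norm step instead of huge numbers.
    """
    delta, _, rank, s = np.linalg.lstsq(H, -g, rcond=rcond)
    return delta, rank, s

rng = np.random.default_rng(3)
x = rng.normal(size=50)
Phi = np.column_stack([np.ones(50), x, x])    # x enters twice: perfectly collinear columns
y = (rng.random(50) < 0.5).astype(float)
y_hat = np.full(50, 0.5)                      # model output at theta = 0
g = Phi.T @ (y_hat - y)
H = Phi.T @ (Phi * (y_hat * (1.0 - y_hat))[:, None])
delta, rank, s = newton_step(g, H)
print(rank, s, delta)   # rank 2 of 3; the step is split equally between the duplicated columns
```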
Part I. Logistic Regression Model Development
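Frequently Used Statistics for Model Analysis
Individual estimate measures
Standard error: $\sigma_{\hat{\theta}_i} = \sqrt{[H^{-1}]_{ii}}$, where $H$ is the Hessian of $-\ln L$ at the final estimate $\hat{\theta}$
Wald statistic: $W_i = \dfrac{(\hat{\theta}_i - \theta_i)^2}{\sigma^2_{\hat{\theta}_i}} = \dfrac{\hat{\theta}_i^2}{\sigma^2_{\hat{\theta}_i}} \sim \chi^2_1$ under $H_0: \theta_i = 0$
p-value: $\Pr(\chi^2_1 > W_i)$
Overall model measures
Coefficient of determination ($R^2$)
generalized $R^2$: $R^2 = 1 - e^{-\frac{2}{N}\left(\ln L_{\hat{\theta}} - \ln L_0\right)}$, where $L_0$ is the likelihood of the intercept-only model
generalized max-rescaled $R^2$: $R_m^2 = R^2 / R_s^2$, with $R_s^2 = 1 - e^{\frac{2}{N} \ln L_0}$
Cost function: $f(\hat{\theta}) = -2 \ln L_{\hat{\theta}}$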
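A sketch that computes the measures above for a fitted model, assuming NumPy, synthetic data and the intercept-only model as the reference $L_0$ (its ML output is the sample mean of $y_k$). The $\chi^2_1$ tail probability is written with `math.erfc`, using $\Pr(\chi^2_1 > w) = \Pr(|Z| > \sqrt{w})$, so no statistics library is needed. All function names are hypothetical.

```python
import math
import numpy as np

def fit_logistic(Phi, y, tol=1e-10, max_iter=100):
    """Newton-Raphson ML estimate; returns theta_hat and the Hessian of -ln L at theta_hat."""
    theta = np.zeros(Phi.shape[1])
    for _ in range(max_iter):
        y_hat = 1.0 / (1.0 + np.exp(-(Phi @ theta)))
        g = Phi.T @ (y_hat - y)
        H = Phi.T @ (Phi * (y_hat * (1.0 - y_hat))[:, None])
        step = np.linalg.solve(H, g)
        theta = theta - step
        if np.max(np.abs(step)) < tol:
            break
    y_hat = 1.0 / (1.0 + np.exp(-(Phi @ theta)))
    return theta, Phi.T @ (Phi * (y_hat * (1.0 - y_hat))[:, None])

def chi2_1_pvalue(w):
    """Pr(chi^2_1 > w) = Pr(|Z| > sqrt(w)) = erfc(sqrt(w / 2))."""
    return math.erfc(math.sqrt(w / 2.0))

def model_statistics(Phi, y):
    N = len(y)
    theta, H = fit_logistic(Phi, y)
    se = np.sqrt(np.diag(np.linalg.inv(H)))          # sigma_i = sqrt([H^{-1}]_ii)
    wald = (theta / se) ** 2                          # W_i under H0: theta_i = 0
    m = Phi @ theta
    ln_L = float(np.sum(y * m - np.logaddexp(0.0, m)))
    p0 = y.mean()                                     # intercept-only model output
    ln_L0 = N * (p0 * math.log(p0) + (1.0 - p0) * math.log(1.0 - p0))
    r2 = 1.0 - math.exp(-2.0 / N * (ln_L - ln_L0))   # generalized R^2
    r2_s = 1.0 - math.exp(2.0 / N * ln_L0)           # maximum attainable generalized R^2
    return {"theta": theta, "se": se, "wald": wald,
            "p": [chi2_1_pvalue(w) for w in wald],
            "R2": r2, "R2_max_rescaled": r2 / r2_s, "-2lnL": -2.0 * ln_L}

rng = np.random.default_rng(2)
x = rng.normal(size=300)
Phi = np.column_stack([np.ones(300), x])
y = (rng.random(300) < 1.0 / (1.0 + np.exp(-(-0.2 + 1.5 * x)))).astype(float)
stats = model_statistics(Phi, y)
print(stats["theta"], stats["se"], stats["p"], stats["R2"], stats["R2_max_rescaled"])
```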
Part I. Logistic Regression Model Development
Frequently Used Statistics for Model Analysis
Modified criteria
Akaike Information Criterion: $AIC = -2 \ln L_{\hat{\theta}} + 2 n_{\hat{\theta}}$
Schwarz Criterion: $SC = -2 \ln L_{\hat{\theta}} + n_{\hat{\theta}} \ln N$
where $n_{\hat{\theta}}$ is the number of estimated parameters
Minimum Description Length (MDL), Final Prediction Error (FPE), etc.
Model Validation
Data split into development and validation samples
[Figure: AIC and -2 ln L]
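Both criteria are penalised versions of $-2 \ln L$. A sketch, assuming $n_{\hat{\theta}}$ counts all estimated parameters including the intercept, with hypothetical log-likelihood values and an illustrative 70/30 development/validation split (the split fraction and all names are assumptions):

```python
import math
import numpy as np

def aic(ln_L, n_params):
    """AIC = -2 ln L + 2 n."""
    return -2.0 * ln_L + 2.0 * n_params

def schwarz(ln_L, n_params, N):
    """SC = -2 ln L + n ln N; penalises extra parameters more than AIC once N > e^2."""
    return -2.0 * ln_L + n_params * math.log(N)

def dev_val_split(N, dev_fraction=0.7, seed=0):
    """Randomly split row indices into development and validation samples."""
    idx = np.random.default_rng(seed).permutation(N)
    n_dev = int(round(dev_fraction * N))
    return idx[:n_dev], idx[n_dev:]

# Hypothetical fits: three extra parameters raise ln L only from -520.0 to -518.5
for ln_L, n in [(-520.0, 4), (-518.5, 7)]:
    print(n, aic(ln_L, n), schwarz(ln_L, n, N=1000))
```

In this made-up comparison the smaller model has both the lower AIC and the lower SC, although its $-2 \ln L$ is higher.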
Part II. Stepwise Logistic Regression
Stepwise Logistic Regression – Basic Idea
$x_o$, $x_e$ – the sets of variables currently out of / entered in the model
$x_{o,i}$ – the most significant variable in $x_o$; $x_{e,i}$ – the least significant variable in $x_e$
SLE – Significance Level to Enter
SLS – Significance Level to Stay
[Figure: SWR scheme]
Part II. Stepwise Logistic Regression
Stepwise Logistic Regression – Basic Idea
Available information
Part II. Stepwise Logistic Regression
Stepwise Logistic Regression – Basic Idea
[Figure, step 1: Initialization]
Part II. Stepwise Logistic Regression
Stepwise Logistic Regression – Basic Idea
Forward Selection [Figure: steps 1 and 2]
Part II. Stepwise Logistic Regression
Stepwise Logistic Regression – Basic Idea
Forward Selection [Figure: steps 1, 2 and 3]
Part II. Stepwise Logistic Regression
Stepwise Logistic Regression – Basic Idea
Backward Elimination [Figure: steps 2 and 3]
Part II. Stepwise Logistic Regression
Step 0. Initialization
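Logistic model: $\hat{y}_k = \dfrac{1}{1 + e^{-\varphi_k^T \theta}}$
1. Intercept model: $\varphi_k = 1$, $\theta \in \mathbb{R}$
2. Full model: $\varphi_k = [1\ x_{1,k}\ \dots\ x_{n,k}]^T$, $\theta \in \mathbb{R}^{n+1}$
3. One-factor model
Check for Entry
Score chi-square for all potential models: $S_i = g_i^T H_i^{-1} g_i$
Maximum score chi-square: $l_1 = \arg\max_i S_i$
p-value & threshold: $\text{p-value}_{l_1} < SLE$
Model Determination (Optimization): $\varphi_k = [1\ x_{l_1,k}]^T$, $\theta \in \mathbb{R}^2$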
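The score chi-square ranks candidates without fitting every extended model: $g$ and $H$ of the extended model are evaluated at the current estimates with the candidate's coefficient set to 0. A NumPy sketch of Step 0 on synthetic data; SLE = 0.05 and all function names are illustrative assumptions.

```python
import math
import numpy as np

def grad_hess(theta, Phi, y):
    """Gradient and Hessian of f = -ln L."""
    y_hat = 1.0 / (1.0 + np.exp(-(Phi @ theta)))
    return Phi.T @ (y_hat - y), Phi.T @ (Phi * (y_hat * (1.0 - y_hat))[:, None])

def fit(Phi, y, tol=1e-10, max_iter=100):
    """Newton-Raphson ML estimate for the current model."""
    theta = np.zeros(Phi.shape[1])
    for _ in range(max_iter):
        g, H = grad_hess(theta, Phi, y)
        step = np.linalg.solve(H, g)
        theta = theta - step
        if np.max(np.abs(step)) < tol:
            break
    return theta

def score_chi_square(Phi, x_candidate, y):
    """S = g^T H^{-1} g of the model extended by x_candidate, evaluated at [theta_hat, 0]."""
    theta0 = np.append(fit(Phi, y), 0.0)      # current estimates, candidate coefficient = 0
    g, H = grad_hess(theta0, np.column_stack([Phi, x_candidate]), y)
    return float(g @ np.linalg.solve(H, g))

# Step 0 on synthetic data: x_1 drives y, x_2 and x_3 are pure noise
rng = np.random.default_rng(4)
X = rng.normal(size=(300, 3))
y = (rng.random(300) < 1.0 / (1.0 + np.exp(-(0.2 + 1.2 * X[:, 0])))).astype(float)
Phi = np.ones((300, 1))                        # 1. intercept model: phi_k = 1
S = [score_chi_square(Phi, X[:, j], y) for j in range(X.shape[1])]
l1 = int(np.argmax(S))                         # maximum score chi-square
p_value = math.erfc(math.sqrt(S[l1] / 2.0))    # Pr(chi^2_1 > S_l1)
SLE = 0.05                                     # illustrative threshold
print(S, l1, p_value, p_value < SLE)
```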
Part II. Stepwise Logistic Regression
Step 1. Forward Selection
1. Check for Entry
Score chi-square of all potential models: $S_j = g_j^T H_j^{-1} g_j$
Maximum score chi-square: $l_i = \arg\max_j S_j$
p-value & threshold: $\text{p-value}_{l_i} < SLE$
2. Model Determination (Optimization): $\varphi_k = [1\ x_{l_1,k}\ \dots\ x_{l_i,k}]^T$, $\theta \in \mathbb{R}^{i+1}$
3. Statistics for Model Analysis
Individual Estimate Measures
standard error
Wald statistic & p-value
Part II. Stepwise Logistic Regression
Step 1. Forward Selection
3. Statistics for Model Analysis (part 2)
Overall Model Measures
Coefficients of determination
Cost function
Modified criteria
Akaike Information Criterion (AIC)
Schwarz Criterion (SC)
Part II. Stepwise Logistic Regression
Stepwise Logistic Regression
[Figure: SWR scheme]
Part II. Stepwise Logistic Regression
Step 2. Backward Elimination
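1. Check for Removal
Wald statistic & p-value of all variables in the current model
p-value & threshold: $\max_j \text{p-value}_{l_j} > SLS$
2. Model Determination (Optimization): the variable $x_{l_j}$ with the largest p-value is removed, $\varphi_k = [1\ x_{l_1,k}\ \dots\ x_{l_{j-1},k}\ x_{l_{j+1},k}\ \dots\ x_{l_i,k}]^T$, $\theta \in \mathbb{R}^{i}$
3. Statistics for Model Analysis
Individual Estimate Measures
standard error
Wald statistic & p-value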
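A sketch of the removal check, assuming NumPy, SLS = 0.05 as an illustrative default and a hypothetical helper `backward_step` that returns the variable to drop (or `None`). The intercept is never a removal candidate.

```python
import math
import numpy as np

def fit_with_se(Phi, y, tol=1e-10, max_iter=100):
    """Newton-Raphson estimate plus standard errors sqrt([H^{-1}]_ii) at the optimum."""
    theta = np.zeros(Phi.shape[1])
    for _ in range(max_iter):
        y_hat = 1.0 / (1.0 + np.exp(-(Phi @ theta)))
        H = Phi.T @ (Phi * (y_hat * (1.0 - y_hat))[:, None])
        step = np.linalg.solve(H, Phi.T @ (y_hat - y))
        theta = theta - step
        if np.max(np.abs(step)) < tol:
            break
    y_hat = 1.0 / (1.0 + np.exp(-(Phi @ theta)))
    H = Phi.T @ (Phi * (y_hat * (1.0 - y_hat))[:, None])
    return theta, np.sqrt(np.diag(np.linalg.inv(H)))

def backward_step(X, y, entered, SLS=0.05):
    """Return the entered variable with the largest Wald p-value if that p-value exceeds SLS, else None."""
    Phi = np.column_stack([np.ones(len(y))] + [X[:, j] for j in entered])
    theta, se = fit_with_se(Phi, y)
    wald = (theta[1:] / se[1:]) ** 2                        # skip the intercept
    p = [math.erfc(math.sqrt(w / 2.0)) for w in wald]      # Pr(chi^2_1 > W)
    worst = int(np.argmax(p))
    return entered[worst] if p[worst] > SLS else None

rng = np.random.default_rng(6)
X = rng.normal(size=(300, 2))                 # column 0 drives y, column 1 is noise
y = (rng.random(300) < 1.0 / (1.0 + np.exp(-(1.2 * X[:, 0])))).astype(float)
print(backward_step(X, y, [0, 1]))            # 1 if the noise variable fails SLS, otherwise None
```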
Part II. Stepwise Logistic Regression
Step 2. Backward Elimination
3. Statistics for Model Analysis (part 2)
Overall Model Measures
Coefficients of determination
Cost function
Modified criteria
Akaike Information Criterion (AIC)
Schwarz Criterion (SC)
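Putting Steps 0 to 2 together gives the loop below: a compact sketch, not the lecture's or any package's implementation. It assumes NumPy and SLE = SLS = 0.05, and it stops when no variable meets SLE or when the variable just entered is immediately removed again (one simple guard against cycling, chosen for this illustration). All names are hypothetical.

```python
import math
import numpy as np

def grad_hess(theta, Phi, y):
    """Gradient and Hessian of f = -ln L."""
    y_hat = 1.0 / (1.0 + np.exp(-(Phi @ theta)))
    return Phi.T @ (y_hat - y), Phi.T @ (Phi * (y_hat * (1.0 - y_hat))[:, None])

def fit(Phi, y, tol=1e-10, max_iter=100):
    """Newton-Raphson ML estimate."""
    theta = np.zeros(Phi.shape[1])
    for _ in range(max_iter):
        g, H = grad_hess(theta, Phi, y)
        step = np.linalg.solve(H, g)
        theta = theta - step
        if np.max(np.abs(step)) < tol:
            break
    return theta

def p_chi2_1(stat):
    """Pr(chi^2_1 > stat)."""
    return math.erfc(math.sqrt(max(stat, 0.0) / 2.0))

def design(X, cols):
    """Regression matrix [1, x_{l_1}, ..., x_{l_i}] for the entered columns."""
    return np.column_stack([np.ones(X.shape[0])] + [X[:, j] for j in cols])

def stepwise(X, y, SLE=0.05, SLS=0.05, max_steps=50):
    entered = []                                    # x_e; all other columns form x_o
    for _ in range(max_steps):
        out = [j for j in range(X.shape[1]) if j not in entered]
        if not out:
            break
        # Forward selection: score chi-square of every variable still out of the model
        theta = fit(design(X, entered), y)
        scores = []
        for j in out:
            g, H = grad_hess(np.append(theta, 0.0), design(X, entered + [j]), y)
            scores.append(float(g @ np.linalg.solve(H, g)))
        best = int(np.argmax(scores))
        if p_chi2_1(scores[best]) >= SLE:
            break                                   # no candidate meets SLE
        entered.append(out[best])
        # Backward elimination: Wald p-values of the entered variables
        Phi = design(X, entered)
        theta = fit(Phi, y)
        _, H = grad_hess(theta, Phi, y)
        se = np.sqrt(np.diag(np.linalg.inv(H)))
        p = [p_chi2_1((t / s) ** 2) for t, s in zip(theta[1:], se[1:])]
        worst = int(np.argmax(p))
        if p[worst] > SLS:
            removed = entered.pop(worst)
            if removed == out[best]:
                break                               # just-entered variable removed: stop
    return entered

rng = np.random.default_rng(5)
X = rng.normal(size=(400, 4))                       # columns 0 and 1 matter, 2 and 3 are noise
m = 0.3 + 1.2 * X[:, 0] - 1.0 * X[:, 1]
y = (rng.random(400) < 1.0 / (1.0 + np.exp(-m))).astype(float)
selected = stepwise(X, y)
print(selected)
```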
Part II. Stepwise Logistic Regression
Potential Problems in Stepwise Regression
Local Minima & Initial Conditions
Numerical Problems (SVD, EVD, QR, etc.)
Model Overfitting
Summary
Introduction
Applications of the Logistic Regression
System Identification & Stepwise Regression
Part I. Logistic Regression Model Development
Logistic Model
Maximum Likelihood Estimator
Potential Problems
Model Analysis and Validation
Part II. Stepwise Logistic Regression (SWR)
Basic Idea
SWR Algorithm
Potential Problems
Summary
Thank You!
http://anp.tu-sofia.bg/aefremov/index.htm

More Related Content

DOCX
Assignment on Cloud Computing
PPT
14. Query Optimization in DBMS
PPTX
Grid Computing
PDF
Mod05lec23(map reduce tutorial)
PPTX
Logic programming in python
PPTX
Overview: Building Open Source Cloud Computing Environments
PPTX
Hyper threading
PPT
ECG ANALYSIS IN CLOUD COMPUTING
Assignment on Cloud Computing
14. Query Optimization in DBMS
Grid Computing
Mod05lec23(map reduce tutorial)
Logic programming in python
Overview: Building Open Source Cloud Computing Environments
Hyper threading
ECG ANALYSIS IN CLOUD COMPUTING

What's hot (20)

PDF
8. R Graphics with R
 
PPTX
Hyper threading technology
PPTX
Green cloud computing
PPTX
Basic constituent elements
PPTX
Hadoop
PPT
Unit-3_BDA.ppt
PDF
CPU vs. GPU presentation
PDF
Introduction to R Programming
PDF
VTU 6th Sem Elective CSE - Module 5 cloud computing
PDF
Introduction to Apache Spark
PPT
Map reduce in BIG DATA
PPT
Customizing the look and-feel of DSpace
PPT
Lecture 4 mobile database system
PPT
Data Federation
PPTX
An Introduction To NoSQL & MongoDB
DOC
Data structures project
PPTX
Cloud computing seminar
PPTX
Megastore by Google
PDF
Implementation of k means algorithm on Hadoop
8. R Graphics with R
 
Hyper threading technology
Green cloud computing
Basic constituent elements
Hadoop
Unit-3_BDA.ppt
CPU vs. GPU presentation
Introduction to R Programming
VTU 6th Sem Elective CSE - Module 5 cloud computing
Introduction to Apache Spark
Map reduce in BIG DATA
Customizing the look and-feel of DSpace
Lecture 4 mobile database system
Data Federation
An Introduction To NoSQL & MongoDB
Data structures project
Cloud computing seminar
Megastore by Google
Implementation of k means algorithm on Hadoop
Ad

Viewers also liked (20)

PPTX
Logistic regression
PDF
Fault prediction using logistic regression (Python)
PPTX
Logistic regression with SPSS examples
PDF
Logistic regression
PDF
Intro to Classification: Logistic Regression & SVM
PPTX
Logistic regression
PDF
Implementation of linear regression and logistic regression on Spark
PDF
Spss course session-II
DOCX
Multivariate Techniques
DOCX
Dissertation Paper
PPT
Solving stepwise regression problems
PPT
Chapter05
PDF
SAPC 2009 - Patient satisfaction with Primary Care
PPT
Statistics Case Study - Stepwise Multiple Regression
PDF
Logistic regression teaching
PDF
Foundations for Scaling ML in Apache Spark
PPT
e-Commerce Academy - Winning Consumer Market from Online to Offline in Mobile...
PPTX
It's All E-commerce
PPTX
Frameworks and development of supply chain information architecture
PDF
A competency based human resources architecture - ppt
Logistic regression
Fault prediction using logistic regression (Python)
Logistic regression with SPSS examples
Logistic regression
Intro to Classification: Logistic Regression & SVM
Logistic regression
Implementation of linear regression and logistic regression on Spark
Spss course session-II
Multivariate Techniques
Dissertation Paper
Solving stepwise regression problems
Chapter05
SAPC 2009 - Patient satisfaction with Primary Care
Statistics Case Study - Stepwise Multiple Regression
Logistic regression teaching
Foundations for Scaling ML in Apache Spark
e-Commerce Academy - Winning Consumer Market from Online to Offline in Mobile...
It's All E-commerce
Frameworks and development of supply chain information architecture
A competency based human resources architecture - ppt
Ad

Similar to Stepwise Logistic Regression - Lecture for Students /Faculty of Mathematics and Informatics/ (20)

DOCX
Logistic Regression in machine learning.docx
PDF
Logistic regression vs. logistic classifier. History of the confusion and the...
PPTX
Supervised learning - Linear and Logistic Regression( AI, ML)
PPTX
Machine_Learning.pptx
PDF
Unit2_Linear Regression_Performance Metrics.pdf
PPTX
Predictive analytics and Type of Predictive Analytics
PDF
3ml.pdf
PDF
Applied Logistic Regression 3rd Edition David Hosmer
PDF
Logistic Regression Classifier - Conceptual Guide
PPTX
Logistic Regression | Logistic Regression In Python | Machine Learning Algori...
PPT
Data Analysison Regression
PPTX
Logistic Regression power point presentation.pptx
PDF
Logistic regression
PPTX
Classification Algortyhm of Machine Learning
PDF
Logistic Regression: Behind the Scenes
PDF
7_logistic-regression presentation sur la regression logistique.pdf
PPTX
Presentation1
PDF
Logistic regression in_python_tutorial
PPTX
lec+5+_part+1 cloud .pptx
PDF
Logistic regression in Machine Learning
Logistic Regression in machine learning.docx
Logistic regression vs. logistic classifier. History of the confusion and the...
Supervised learning - Linear and Logistic Regression( AI, ML)
Machine_Learning.pptx
Unit2_Linear Regression_Performance Metrics.pdf
Predictive analytics and Type of Predictive Analytics
3ml.pdf
Applied Logistic Regression 3rd Edition David Hosmer
Logistic Regression Classifier - Conceptual Guide
Logistic Regression | Logistic Regression In Python | Machine Learning Algori...
Data Analysison Regression
Logistic Regression power point presentation.pptx
Logistic regression
Classification Algortyhm of Machine Learning
Logistic Regression: Behind the Scenes
7_logistic-regression presentation sur la regression logistique.pdf
Presentation1
Logistic regression in_python_tutorial
lec+5+_part+1 cloud .pptx
Logistic regression in Machine Learning

Recently uploaded (20)

PPT
Teaching material agriculture food technology
PDF
Empathic Computing: Creating Shared Understanding
PDF
Approach and Philosophy of On baking technology
PDF
Unlocking AI with Model Context Protocol (MCP)
PDF
Electronic commerce courselecture one. Pdf
PDF
MIND Revenue Release Quarter 2 2025 Press Release
PDF
Optimiser vos workloads AI/ML sur Amazon EC2 et AWS Graviton
PPTX
Machine Learning_overview_presentation.pptx
PDF
A comparative analysis of optical character recognition models for extracting...
PDF
cuic standard and advanced reporting.pdf
PDF
Network Security Unit 5.pdf for BCA BBA.
PDF
Assigned Numbers - 2025 - Bluetooth® Document
PDF
NewMind AI Weekly Chronicles - August'25-Week II
PPTX
A Presentation on Artificial Intelligence
PPTX
ACSFv1EN-58255 AWS Academy Cloud Security Foundations.pptx
PDF
Dropbox Q2 2025 Financial Results & Investor Presentation
PPTX
20250228 LYD VKU AI Blended-Learning.pptx
PDF
Reach Out and Touch Someone: Haptics and Empathic Computing
PDF
Agricultural_Statistics_at_a_Glance_2022_0.pdf
PDF
Review of recent advances in non-invasive hemoglobin estimation
Teaching material agriculture food technology
Empathic Computing: Creating Shared Understanding
Approach and Philosophy of On baking technology
Unlocking AI with Model Context Protocol (MCP)
Electronic commerce courselecture one. Pdf
MIND Revenue Release Quarter 2 2025 Press Release
Optimiser vos workloads AI/ML sur Amazon EC2 et AWS Graviton
Machine Learning_overview_presentation.pptx
A comparative analysis of optical character recognition models for extracting...
cuic standard and advanced reporting.pdf
Network Security Unit 5.pdf for BCA BBA.
Assigned Numbers - 2025 - Bluetooth® Document
NewMind AI Weekly Chronicles - August'25-Week II
A Presentation on Artificial Intelligence
ACSFv1EN-58255 AWS Academy Cloud Security Foundations.pptx
Dropbox Q2 2025 Financial Results & Investor Presentation
20250228 LYD VKU AI Blended-Learning.pptx
Reach Out and Touch Someone: Haptics and Empathic Computing
Agricultural_Statistics_at_a_Glance_2022_0.pdf
Review of recent advances in non-invasive hemoglobin estimation

Stepwise Logistic Regression - Lecture for Students /Faculty of Mathematics and Informatics/

  • 1. © Experian Limited 2007. All rights reserved. Experian and the marks used herein are service marks or registered trademarks of Experian Limited. Other product and company names mentioned herein may be the trademarks of their respective owners. No part of this copyrighted work may be reproduced, modified, or distributed in any form or manner without the prior written permission of Experian Limited. Confidential and proprietary. Stepwise Logistic Regression Lecture for FMI Students 27.05.2010 Alexander Efremov
  • 2. © Experian Limited 2007. All rights reserved. Confidential and proprietary. 2 Agenda Introduction Applications of the Logistic Regression System Identification & Stepwise Regression Part I. Logistic Regression Model Development Logistic Model Maximum Likelihood Estimator Potential Problems Model Analysis and Validation Part II. Stepwise Logistic Regression (SWR) Basic Idea SWR Algorithm Potential Problems Summary
  • 3. © Experian Limited 2007. All rights reserved. Confidential and proprietary. 3 Agenda Introduction Applications of the Logistic Regression System Identification & Stepwise Regression Part I. Logistic Regression Model Development Logistic Model Maximum Likelihood Estimator Potential Problems Model Analysis and Validation Part II. Stepwise Logistic Regression (SWR) Basic Idea SWR Algorithm Potential Problems Summary
  • 4. © Experian Limited 2007. All rights reserved. Confidential and proprietary. 4 Introduction Applications of the Logistic Regression Medicine – diagnostics, modeling of disease growth, treatment effect Psychology – learn process modeling, psychological tests evaluation Economics – risk analysis, countries debt investigation, occupational choices Marketing – products consumption, retailers actions effect Criminology – risk factors for performing of criminal act Sociology – employment, graduation, vote analysis Ecology – modeling population growth linguistics – language changes Chemistry – reaction models Media – news effects, copycat reaction Finance – credit scoring, fraud detection Physics, Biology, etc. The Logistic Model
  • 5. © Experian Limited 2007. All rights reserved. Confidential and proprietary. 5 Introduction System Under Investigation Individuals /rough data/ => System => Model => =>
  • 6. © Experian Limited 2007. All rights reserved. Confidential and proprietary. 6 Introduction System Identification Stages
  • 7. © Experian Limited 2007. All rights reserved. Confidential and proprietary. 7 Agenda Introduction Applications of the Logistic Regression System Identification & Stepwise Regression Part I. Logistic Regression Model Development Logistic Model Maximum Likelihood Estimator Potential Problems Model Analysis and Validation Part II. Stepwise Logistic Regression (SWR) Basic Idea SWR Algorithm Potential Problems Summary
  • 8. © Experian Limited 2007. All rights reserved. Confidential and proprietary. 8 Agenda Introduction Applications of the Logistic Regression System Identification & Stepwise Regression Part I. Logistic Regression Model Development Logistic Model Maximum Likelihood Estimator Potential Problems Model Analysis and Validation Part II. Stepwise Logistic Regression (SWR) Basic Idea SWR Algorithm Potential Problems Summary
  • 9. © Experian Limited 2007. All rights reserved. Confidential and proprietary. 9 Part I. Logistic Regression Model Development Logistic Model Linear relation Logistic relation
  • 10. © Experian Limited 2007. All rights reserved. Confidential and proprietary. 10 k kyˆ ky N – index of current individual – intercept – number of observations – the i+1-th model parameter – dependent variable – the i-th independent variable /prob. of good/ – model output – i-th independent variable /predicted prob. of good/ Part I. Logistic Regression Model Development Logistic Model Logistic Relation – General Form “Linear” Log. Regression Model k k M M k e e y + = 1 ˆ kMk e y − + = 1 1 ˆ knnkk xxM ,,110 ... θθθ +++= )...( ,,110 1 1 ˆ knnk xxk e y θθθ +++− + = knnky y xx k k ,,110ˆ1 ˆ ...ln θθθ +++=− 0θ iθ kix , ni ,1= Nk ,1=
  • 11. © Experian Limited 2007. All rights reserved. Confidential and proprietary. 11 Part I. Logistic Regression Model Development Logistic Model Notation Parameters vector Regression vector Logistic model 1+ ∈ n Rθ 1+ ∈ n k Rϕ T n ]...[ 10 θθθθ = T knkk xx ]...1[ ,,1=ϕ θϕθθθ T kknnk ee y xxk −+++− + = + = 1 1 1 1 ˆ )...( ,,110
  • 12. © Experian Limited 2007. All rights reserved. Confidential and proprietary. 12 Part I. Logistic Regression Model Development Residual The Residual kkkk eye e y T k +=+ + = − ˆ 1 1 θϕ    =− =− =−= 0,ˆ 1,ˆ1 ˆ for for kk kk kkk yy yy yye Sources of Uncertainty Unavailable significant factors Simplified relations Time-varying performance Database errors Fraud
  • 13. © Experian Limited 2007. All rights reserved. Confidential and proprietary. 13 Agenda Introduction Applications of the Logistic Regression System Identification & Stepwise Regression Part I. Logistic Regression Model Development Logistic Model Maximum Likelihood Estimator Potential Problems Model Analysis and Validation Part II. Stepwise Logistic Regression (SWR) Basic Idea SWR Algorithm Potential Problems Summary
  • 14. © Experian Limited 2007. All rights reserved. Confidential and proprietary. 14 Part I. Logistic Regression Model Development Maximum Likelihood Estimator Cost Function Model output Likelihood contribution Likelihood function Log-likelihood function Maximum Likelihood Criterion kk y k y kk yyl − −= 1 , )ˆ1(ˆθ θ θ θ θ LL ln2minlnmax −⇔ ∏ = = N k klL 1 ,θθ ∑ = −−+= N k kkkk yyyyL 1 ))ˆ1ln()1(ˆln(ln θ )|1(ˆ kkk yPy ϕ==
  • 15. © Experian Limited 2007. All rights reserved. Confidential and proprietary. 15 Part I. Logistic Regression Model Development Maximum Likelihood Estimator Cost Function /-2 Log L/ for a Real Life Case
  • 16. © Experian Limited 2007. All rights reserved. Confidential and proprietary. 16 Tailor Series Expansion Cost Function Models Linear model Quadratic model Part I. Logistic Regression Model Development Maximum Likelihood Estimator )()()1( ˆˆ iii θθθ ∆+=+ )()()( ˆ )( )( iTiii gfM θ θ ∆+= )()()( 2 1)()()( ˆ )( )()( iiTiiTiii HgfM θθθ θ ∆∆+∆+= 3 )()()( 2 1)()()( ˆ )( ˆ )()( OHgff iiTiiTiii +∆∆+∆+= ∆+ θθθ θθθ )( ˆ )( iTi fg θ ∇= )( ˆ 2)( ii fH θ ∇= Cost function Gradient Hessian )( ˆ )( ˆ ln ii Lf θθ −= ?)( =∆ i θ Estimates Update
  • 17. © Experian Limited 2007. All rights reserved. Confidential and proprietary. 17 Part I. Logistic Regression Model Development Maximum Likelihood Estimator Gradient Hessian I-st Order Methods II-nd Order Method /e.g. Steepest Descent/ /e.g. Newton-Raphson/ gαθ −=∆ gH 1− −=∆ αθ [ ] 1 10 + ∂ ∂ ∂ ∂ ∂ ∂ ∈= nTfff Rg nθθθ L 11 2 2 1 2 0 2 1 2 2 1 2 01 2 0 2 10 2 2 0 2 +×+ ∂ ∂ ∂∂ ∂ ∂∂ ∂ ∂∂ ∂ ∂ ∂ ∂∂ ∂ ∂∂ ∂ ∂∂ ∂ ∂ ∂ ∈                   = nn fff fff fff RH nnn n n θθθθθ θθθθθ θθθθθ L MOMM L L θ (0) 1 2 θ*θopt 1 2 θ (0) θ* θopt
  • 18. © Experian Limited 2007. All rights reserved. Confidential and proprietary. 18 Steepest Newton- Descent Raphson (NR) NR with NR with Line Search Quadratic Interpolation 1 2 θ (0) θ* θopt θ (0) 1 2 θ*θopt Part I. Logistic Regression Model Development Maximum Likelihood Estimator gαθ −=∆ gH 1− −=∆ αθ gH 1* − −=∆ αθ gH 1* − −=∆ αθ θ (0) 1 2 θ*θopt θ (0) 1 2 θ*θopt
  • 19. © Experian Limited 2007. All rights reserved. Confidential and proprietary. 19 Agenda Introduction Applications of the Logistic Regression System Identification & Stepwise Regression Part I. Logistic Regression Model Development Logistic Model Maximum Likelihood Estimator Potential Problems Model Analysis and Validation Part II. Stepwise Logistic Regression (SWR) Basic Idea SWR Algorithm Potential Problems Summary
  • 20. © Experian Limited 2007. All rights reserved. Confidential and proprietary. 20 Numerical Problems Matrix inversion, hence SVD, EVD, QR, etc. Local Minima Part I. Logistic Regression Model Development Potential problems Model Overfitting αθθ −=+ )()1( ˆˆ ii 1− H g -2lnL k y2,k yk 1,ky
  • 21. © Experian Limited 2007. All rights reserved. Confidential and proprietary. 21 Agenda Introduction Applications of the Logistic Regression System Identification & Stepwise Regression Part I. Logistic Regression Model Development Logistic Model Maximum Likelihood Estimator Potential Problems Model Analysis and Validation Part II. Stepwise Logistic Regression (SWR) Basic Idea SWR Algorithm Potential Problems Summary
  • 22. © Experian Limited 2007. All rights reserved. Confidential and proprietary. 22 Part I. Logistic Regression Model Development Frequently Used Statistics for Model Analysis Individual Estimate Measures Standard error Wald statistic p-value Overall Model Measures Coefficient of determination (R2) generalized R2 gen. max. resc. R2 Cost function 2 1 ˆ)ˆ( ~2 ˆ 2 2 ˆ 2 χ θθ σ θ σ θθ i i i ii iW == − N LL eR θθ ˆln0 ˆln 2 12 − −= 1 0 ˆln2 1 −−= N L esR θ Rs R mR 22 = )( ˆ )( ˆ ln2 ii Lf θθ −= iH i )][diag( 1 ˆ − =θ σ 2 1Pr χ> χ p-value WWi
  • 23. Part I. Logistic Regression Model Development: Frequently Used Statistics for Model Analysis (continued)
    Modified criteria
      Akaike Information Criterion: $AIC_{\hat\theta} = -2\ln L_{\hat\theta} + 2n$
      Schwarz Criterion: $SC_{\hat\theta} = -2\ln L_{\hat\theta} + n\ln N$
      Minimum Description Length (MDL), Final Prediction Error (FPE), etc.
    Model validation: the data are split into development and validation samples [Figure: AIC and $-2\ln L$]
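A minimal NumPy sketch of the modified criteria and of a development/validation split. It assumes that $n$ counts every estimated coefficient including the intercept (conventions differ) and uses a random 70/30 split; the helper names and the split ratio are hypothetical.

```python
import numpy as np


def fit_logistic(Phi, y, tol=1e-10, max_iter=100):
    """Newton-Raphson ML estimate of theta for y_hat = 1 / (1 + exp(-Phi theta))."""
    theta = np.zeros(Phi.shape[1])
    for _ in range(max_iter):
        p = 1.0 / (1.0 + np.exp(-Phi @ theta))
        H = Phi.T @ (Phi * (p * (1.0 - p))[:, None])
        step = np.linalg.solve(H, Phi.T @ (p - y))
        theta = theta - step
        if np.max(np.abs(step)) < tol:
            break
    return theta


def log_lik(theta, Phi, y):
    eta = Phi @ theta
    return float(np.sum(y * eta - np.logaddexp(0.0, eta)))


def criteria(theta, Phi, y):
    """Return (-2 ln L, AIC, SC) with n = len(theta) and N = number of observations."""
    n, N = len(theta), len(y)
    m2lnL = -2.0 * log_lik(theta, Phi, y)
    return m2lnL, m2lnL + 2.0 * n, m2lnL + n * np.log(N)


def dev_val_split(N, dev_fraction=0.7, seed=0):
    """Random split of observation indices into development and validation samples."""
    idx = np.random.default_rng(seed).permutation(N)
    cut = int(round(dev_fraction * N))
    return idx[:cut], idx[cut:]
```

A model is fitted on the development indices only; recomputing $-2\ln L$ on the validation indices with the same $\hat\theta$ exposes overfitting when the validation value deteriorates while the development value keeps improving.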
  • 26. Part II. Stepwise Logistic Regression: Basic Idea
    $x_o$, $x_e$: the sets of all variables outside of and entered in the model
    $x_{oi}$, $x_{ei}$: the most significant variable outside the model and the least significant variable in it
    SLE: Significance Level to Enter
    SLS: Significance Level to Stay
    [Figure: SWR]
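The interplay of SLE and SLS can be sketched as a loop that alternates one forward step (score test on $x_o$) with one backward step (Wald test on $x_e$). This is a minimal NumPy sketch under stated assumptions, not the lecture's implementation: all helpers are hypothetical, `X` holds the candidate variables as columns, and the loop stops when nothing enters or when the variable just entered would be removed again (an assumed safeguard against cycling).

```python
import math

import numpy as np


def fit_logistic(Phi, y, tol=1e-10, max_iter=100):
    """Newton-Raphson ML fit; returns theta and the Hessian H of -ln L at theta."""
    theta = np.zeros(Phi.shape[1])
    for _ in range(max_iter):
        p = 1.0 / (1.0 + np.exp(-Phi @ theta))
        H = Phi.T @ (Phi * (p * (1.0 - p))[:, None])
        step = np.linalg.solve(H, Phi.T @ (p - y))
        theta = theta - step
        if np.max(np.abs(step)) < tol:
            break
    p = 1.0 / (1.0 + np.exp(-Phi @ theta))
    return theta, Phi.T @ (Phi * (p * (1.0 - p))[:, None])


def chi2_1_sf(w):
    """Pr(chi^2_1 > w) = erfc(sqrt(w / 2))."""
    return math.erfc(math.sqrt(max(w, 0.0) / 2.0))


def design(X, cols):
    """Regressor matrix with rows phi_k = [1, x_{l_1,k}, ..., x_{l_i,k}]."""
    return np.column_stack([np.ones(len(X))] + [X[:, j] for j in cols])


def score_chi2(Phi_in, theta_in, x_new, y):
    """S = g^T H^{-1} g of the enlarged model at the restricted estimate [theta_in, 0]."""
    Phi = np.column_stack([Phi_in, x_new])
    p = 1.0 / (1.0 + np.exp(-Phi_in @ theta_in))
    g = Phi.T @ (p - y)
    H = Phi.T @ (Phi * (p * (1.0 - p))[:, None])
    return float(g @ np.linalg.solve(H, g))


def stepwise(X, y, SLE=0.05, SLS=0.05, max_steps=100):
    entered = []  # x_e; the remaining columns form x_o
    for _ in range(max_steps):
        # forward selection: most significant out-of-model variable by score chi-square
        out = [j for j in range(X.shape[1]) if j not in entered]
        if not out:
            break
        Phi_in = design(X, entered)
        theta_in, _ = fit_logistic(Phi_in, y)
        S = {j: score_chi2(Phi_in, theta_in, X[:, j], y) for j in out}
        best = max(S, key=S.get)
        if chi2_1_sf(S[best]) >= SLE:
            break
        entered.append(best)
        # backward elimination: least significant entered variable by Wald p-value
        theta, H = fit_logistic(design(X, entered), y)
        se = np.sqrt(np.diag(np.linalg.inv(H)))
        p = [chi2_1_sf(w) for w in ((theta / se) ** 2)[1:]]  # skip the intercept
        worst = int(np.argmax(p))
        if p[worst] > SLS:
            removed = entered.pop(worst)
            if removed == best:
                break  # would cycle: stop
    return entered
```

The returned list holds the column indices of $x_e$ in order of entry.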
  • 27. Part II. Stepwise Logistic Regression: Basic Idea. Available information [diagram]
  • 28. Part II. Stepwise Logistic Regression: Basic Idea. Step 1: Initialization [diagram]
  • 29. Part II. Stepwise Logistic Regression: Basic Idea. Forward Selection [diagram: steps 1 and 2]
  • 30. Part II. Stepwise Logistic Regression: Basic Idea. Forward Selection [diagram: steps 1 to 3]
  • 31. Part II. Stepwise Logistic Regression: Basic Idea. Backward Elimination [diagram: steps 2 and 3]
  • 33. Part II. Stepwise Logistic Regression: Step 0. Initialization
    Logistic model: $\hat y_k = \frac{1}{1 + e^{-\varphi_k^T \theta}}$
    1. Intercept model: $\varphi_k = 1$, $\theta \in \mathbb{R}$
    2. Full model: $\varphi_k = [1\ x_{1,k}\ \cdots\ x_{n,k}]^T$, $\theta \in \mathbb{R}^{n+1}$
    3. One-factor model
       Check for entry: score chi-square $S_i = g_i^T H_i^{-1} g_i$ for all potential models
       Maximum score chi-square: $l_1 = \arg\max_i S_i$
       p-value and threshold: $\text{p-value}_{l_1} < \text{SLE}$
       Model determination (optimization): $\varphi_k = [1\ x_{l_1,k}]^T$, $\theta \in \mathbb{R}^2$
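Step 0 can be sketched as follows. A minimal NumPy sketch, assuming that $g_i$ and $H_i$ are the gradient and Hessian of $-\ln L$ for the model enlarged by candidate $x_i$, evaluated at the intercept-only estimate with the new coefficient set to 0 (the usual score test, which needs no refit). The function names are hypothetical.

```python
import math

import numpy as np


def fit_logistic(Phi, y, tol=1e-10, max_iter=100):
    """Newton-Raphson ML estimate of theta for y_hat = 1 / (1 + exp(-Phi theta))."""
    theta = np.zeros(Phi.shape[1])
    for _ in range(max_iter):
        p = 1.0 / (1.0 + np.exp(-Phi @ theta))
        H = Phi.T @ (Phi * (p * (1.0 - p))[:, None])
        step = np.linalg.solve(H, Phi.T @ (p - y))
        theta = theta - step
        if np.max(np.abs(step)) < tol:
            break
    return theta


def score_chi2(Phi_in, theta_in, x_new, y):
    """S = g^T H^{-1} g for the model enlarged by x_new, at the restricted estimate [theta_in, 0]."""
    Phi = np.column_stack([Phi_in, x_new])
    p = 1.0 / (1.0 + np.exp(-Phi_in @ theta_in))
    g = Phi.T @ (p - y)
    H = Phi.T @ (Phi * (p * (1.0 - p))[:, None])
    return float(g @ np.linalg.solve(H, g))


def chi2_1_sf(w):
    """Pr(chi^2_1 > w) = erfc(sqrt(w / 2))."""
    return math.erfc(math.sqrt(max(w, 0.0) / 2.0))


def step0(X, y, SLE=0.05):
    """Fit the intercept model, score-test every column of X; return (l_1 or None, scores)."""
    Phi0 = np.ones((len(y), 1))  # intercept model: phi_k = 1
    theta0 = fit_logistic(Phi0, y)
    S = np.array([score_chi2(Phi0, theta0, X[:, i], y) for i in range(X.shape[1])])
    l1 = int(np.argmax(S))  # maximum score chi-square
    return (l1 if chi2_1_sf(S[l1]) < SLE else None), S
```

For the intercept-only starting model this score statistic reduces to $N r^2$, where $r$ is the sample correlation between $x_i$ and $y$, so ranking candidates by $S_i$ is the same as ranking them by $|r|$.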
  • 34. Part II. Stepwise Logistic Regression: Step 1. Forward Selection
    1. Check for entry
       Score chi-square of all potential models: $S_j = g_j^T H_j^{-1} g_j$
       Maximum score chi-square: $l_i = \arg\max_j S_j$
       p-value and threshold: $\text{p-value}_{l_i} < \text{SLE}$
    2. Model determination (optimization): $\varphi_k = [1\ x_{l_1,k}\ \cdots\ x_{l_i,k}]^T$, $\theta \in \mathbb{R}^{i+1}$
    3. Statistics for model analysis
       Individual estimate measures: standard error, Wald statistic and p-value
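One forward-selection step (check for entry, refit, report) can be sketched like this. A minimal NumPy sketch, assuming the score test and Wald statistics defined earlier, with `X` holding the candidate variables as columns and `entered` listing the indices $l_1, \dots, l_{i-1}$ already in the model; all names are hypothetical.

```python
import math

import numpy as np


def fit_logistic(Phi, y, tol=1e-10, max_iter=100):
    """Newton-Raphson ML fit; returns theta and the Hessian H of -ln L at theta."""
    theta = np.zeros(Phi.shape[1])
    for _ in range(max_iter):
        p = 1.0 / (1.0 + np.exp(-Phi @ theta))
        H = Phi.T @ (Phi * (p * (1.0 - p))[:, None])
        step = np.linalg.solve(H, Phi.T @ (p - y))
        theta = theta - step
        if np.max(np.abs(step)) < tol:
            break
    p = 1.0 / (1.0 + np.exp(-Phi @ theta))
    return theta, Phi.T @ (Phi * (p * (1.0 - p))[:, None])


def chi2_1_sf(w):
    """Pr(chi^2_1 > w) = erfc(sqrt(w / 2))."""
    return math.erfc(math.sqrt(max(w, 0.0) / 2.0))


def design(X, cols):
    """Regressor matrix with rows phi_k = [1, x_{l_1,k}, ..., x_{l_i,k}]."""
    return np.column_stack([np.ones(len(X))] + [X[:, j] for j in cols])


def score_chi2(Phi_in, theta_in, x_new, y):
    """S = g^T H^{-1} g of the enlarged model at the restricted estimate [theta_in, 0]."""
    Phi = np.column_stack([Phi_in, x_new])
    p = 1.0 / (1.0 + np.exp(-Phi_in @ theta_in))
    g = Phi.T @ (p - y)
    H = Phi.T @ (Phi * (p * (1.0 - p))[:, None])
    return float(g @ np.linalg.solve(H, g))


def forward_step(X, y, entered, SLE=0.05):
    """Enter the out-of-model column with the largest score chi-square if its p-value < SLE."""
    out = [j for j in range(X.shape[1]) if j not in entered]
    if not out:
        return entered, None, None
    Phi_in = design(X, entered)
    theta_in, _ = fit_logistic(Phi_in, y)
    S = {j: score_chi2(Phi_in, theta_in, X[:, j], y) for j in out}
    l_i = max(S, key=S.get)  # maximum score chi-square
    if chi2_1_sf(S[l_i]) >= SLE:
        return entered, None, None  # nothing enters: forward selection stops
    new = entered + [l_i]
    theta, H = fit_logistic(design(X, new), y)  # model determination, theta in R^{i+1}
    se = np.sqrt(np.diag(np.linalg.inv(H)))
    wald = (theta / se) ** 2
    stats = {"theta": theta, "se": se, "wald": wald,
             "p": np.array([chi2_1_sf(w) for w in wald])}
    return new, l_i, stats
```

Calling `forward_step` repeatedly, starting from `entered = []`, reproduces pure forward selection; it returns `None` for the entered index once no candidate passes the SLE threshold.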
  • 35. Part II. Stepwise Logistic Regression: Step 1. Forward Selection
    3. Statistics for model analysis (part 2)
       Overall model measures: coefficients of determination, cost function
       Modified criteria: Akaike Information Criterion (AIC), Schwarz Criterion (SC)
  • 36. Part II. Stepwise Logistic Regression: Stepwise Logistic Regression [diagram: SWR]
  • 37. Part II. Stepwise Logistic Regression: Step 2. Backward Elimination
    1. Check for removal
       Wald statistic and p-value of all potential models
       p-value and threshold: $\max_j \text{p-value}_{l_j} > \text{SLS}$, in which case the variable $x_{l_j}$ with the largest p-value is removed
    2. Model determination (optimization): $\varphi_k = [1\ x_{l_1,k}\ \cdots\ x_{l_{j-1},k}\ x_{l_{j+1},k}\ \cdots\ x_{l_i,k}]^T$, $\theta \in \mathbb{R}^{i}$
    3. Statistics for model analysis
       Individual estimate measures: standard error, Wald statistic and p-value
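A matching sketch of one backward-elimination step, under the same assumptions (NumPy, $H$ as the Hessian of $-\ln L$, hypothetical names): the Wald p-values of the entered variables are compared with SLS, and the least significant one is removed. The intercept is never tested, and the model is refitted on the returned set in the next step.

```python
import math

import numpy as np


def fit_logistic(Phi, y, tol=1e-10, max_iter=100):
    """Newton-Raphson ML fit; returns theta and the Hessian H of -ln L at theta."""
    theta = np.zeros(Phi.shape[1])
    for _ in range(max_iter):
        p = 1.0 / (1.0 + np.exp(-Phi @ theta))
        H = Phi.T @ (Phi * (p * (1.0 - p))[:, None])
        step = np.linalg.solve(H, Phi.T @ (p - y))
        theta = theta - step
        if np.max(np.abs(step)) < tol:
            break
    p = 1.0 / (1.0 + np.exp(-Phi @ theta))
    return theta, Phi.T @ (Phi * (p * (1.0 - p))[:, None])


def chi2_1_sf(w):
    """Pr(chi^2_1 > w) = erfc(sqrt(w / 2))."""
    return math.erfc(math.sqrt(max(w, 0.0) / 2.0))


def design(X, cols):
    """Regressor matrix with rows phi_k = [1, x_{l_1,k}, ..., x_{l_i,k}]."""
    return np.column_stack([np.ones(len(X))] + [X[:, j] for j in cols])


def backward_step(X, y, entered, SLS=0.05):
    """Remove the entered column with the largest Wald p-value if that p-value exceeds SLS."""
    if not entered:
        return entered, None
    theta, H = fit_logistic(design(X, entered), y)
    se = np.sqrt(np.diag(np.linalg.inv(H)))
    p = [chi2_1_sf(w) for w in ((theta / se) ** 2)[1:]]  # skip the intercept
    j = int(np.argmax(p))
    if p[j] <= SLS:
        return entered, None  # every variable stays
    return entered[:j] + entered[j + 1:], entered[j]
```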
  • 38. Part II. Stepwise Logistic Regression: Step 2. Backward Elimination
    3. Statistics for model analysis (part 2)
       Overall model measures: coefficients of determination, cost function
       Modified criteria: Akaike Information Criterion (AIC), Schwarz Criterion (SC)
  • 40. Part II. Stepwise Logistic Regression: Potential Problems in Stepwise Regression
    Local minima and initial conditions
    Numerical problems (SVD, EVD, QR, etc.)
    Model overfitting
  • 41. Summary
    Introduction: applications of logistic regression; system identification and stepwise regression
    Part I. Logistic Regression Model Development: logistic model; maximum likelihood estimator; potential problems; model analysis and validation
    Part II. Stepwise Logistic Regression (SWR): basic idea; SWR algorithm; potential problems
  • 42. Stepwise Logistic Regression. Lecture for FMI Students, 27.05.2010. Alexander Efremov
    Thank You!
    http://anp.tu-sofia.bg/aefremov/index.htm