0 Model Interpretation setting.pdf
Overall Description
The present work presents tools for model interpretation derived
from Partial Dependency Plots (in many different guises, explained in the text),
contrasted with posterior probabilities, hereby called scores.
The work comprises four PowerPoint documents, with a possible fifth (if I get to
it), numbered 0 to 4. Document 0 describes overall issues and introduces the working
data set and models.
At the risk of spoiling the end results, the Multivariate section provides insights at
(almost) the observation level, and requires univariate and bivariate support.
This conclusion is quite surprising to me, since I expected Univariate and
Bivariate interpretation to be rendered lacking. But the reality of context is far more
complex than expected, and model interpretations are as varied as the different
contexts available in the data, which should not be dismissed too eagerly.
This work is based mostly on visualization, and I have tried to avoid statistical
inference and lengthy tables.
Abstract
Statistical and data science models: are they interpretive black
boxes? Let's try for NO.
Molnar's (2018) "Interpretable Machine Learning" is a big effort at finding
solutions. Our presentation is humbler: visual tools for model
interpretation based on partial dependency plots and their variants,
such as collapsed PDPs created by the presenter, some of which may
be polemical and debatable. Almost no use of statistical inference.
Audience should be versed in model creation, with at least some insight into partial
dependency plots. Presentation based on a simple working example with 8 predictors and
one binary target variable.
Not possible to detail exhaustively every method described in this presentation.
Extensive document in preparation. Presentation requires 3 hours and wide awake
audience. Double time if not awake. Sleepers will be punished accordingly.
Slides Marked **** can be skipped for easier first reading.
Contents: Model Interpretation (MI)
1. Introduction and General Notes
2. Confounding
3. Model Interpretation (MI) and Categorization: UMI, BMI, MMI.
4. Binary Target Study
4.1: Report of coefficients, estimates, etc.
4.2: Models Structures
4.3: GOF and model Interpretation
5. Univariate Model Interpretation (UMI): profiles and the Model Interpretation area.
6. Partial Dependency Plots (PDPs) and their variants. UMI.
7. PDPs and Bivariate Model Interpretation (BMI).
7.1: UMI vs. BMI.
8. Multivariate Model Interpretation: MMI.
9. Future Steps.
10. Observation-level Interpretation.
11. References.
Overall comments and introduction.
Presentation by way of example focusing on Fraud/Default
Data set and continuing previous chapters available on
web (standard class for Principal Analytics Prep).
Aim: study interpretation/diagnosis mostly via Partial
Dependency Plots of logistic regression, Classification
Trees and Gradient Boosting.
Presentation(s) available at
https://www.slideshare.net/LeonardoAuslender/visual-
tools-for-interpretation-of-machine-learning-models
At present there are many written opinions and distinctions about this topic; no room or
desire to discuss them all. See Molnar's (2018) book for an overall view, O'Rourke (2018),
and Doshi-Velez et al. (2017).
Overall comments and introduction (cont 1).
No discussion about imbalanced data set modeling
or other modeling issues such as model selection.
This presentation introduces novel visual concepts
as well as tools derived from Partial Dependency
Plots (PDP):
-Overall PDP
-Collapsed PDP and residuals
-Marginal PDP
-PDP vs. actual scores, ….
and how they assist in model interpretation.
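As a concrete reference for the PDP variants listed above, here is a minimal sketch of the overall PDP computation (the toy model, data, and `partial_dependence` helper are invented for illustration, not the deck's actual code): for each grid value of the chosen predictor, that column is overwritten for every observation and the model's scores are averaged.

```python
import numpy as np

def partial_dependence(predict, X, j, grid):
    """Overall PDP: for each grid value of feature j, overwrite that
    column for every observation and average the model's predictions."""
    pdp = []
    for v in grid:
        Xv = X.copy()
        Xv[:, j] = v
        pdp.append(predict(Xv).mean())
    return np.array(pdp)

# Toy score: a logistic-style probability driven mainly by feature 0.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))
predict = lambda M: 1.0 / (1.0 + np.exp(-(2.0 * M[:, 0] + 0.5 * M[:, 1])))

grid = np.linspace(-2.0, 2.0, 9)
pdp = partial_dependence(predict, X, 0, grid)
# pdp rises monotonically with feature 0, mirroring its positive effect.
```

The collapsed and marginal variants modify this averaging step; they are developed later in the deck.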
Model Interpretation (MI) and model building issues.
1) Why/where does the model make mistakes (large residuals, outliers, etc.)?
2) Which/when do attributes (alone or in a group) end up being important?
3) Why are the others not important?
4) Do observation-level predictions differ by model?
However, the immediate aim is NOT interpretation at the observation level (why
a case is predicted sick/churner/innocent…), but at the model level.
Objectives of MI (cont. 1)
Why not directly at observation level?
Suppose a model to predict entertainment-type preference for
a database of families in large cities. Since it is not possible to
obtain updated family preferences consistently (i.e., the data
are 'soft'), models are necessarily not interpretable at the
specific family level.
Contrariwise, disease diagnostic prediction is closer to
individual explanation and interpretability (data typically
'hard').
MOTTO: Posterior probability follows Data + Model
algorithm/s. Interpretation follows primarily probability but
must include data (i.e., context) ➔
Model Interpretation categorization.
Just as in EDA, but on model results (i.e., predictions) rather than on the initial data,
there are three types of MI:
Univariate Model Interpretation (UMI): one variable at a time vis-à-vis
predictions/probabilities. EASIEST to understand and a huge source of "makes
sense" discourse. E.g., classical linear model interpretations; reasons to
decline a bank loan, etc.
Bivariate Model Interpretation (BMI): Looking at pairs of variables to interpret
model results. Correlation measures immediately spring to mind.
Multivariate Model Interpretation (MMI): Overall model interpretation, most
difficult and valuable.
Typically, most work results in UMI and perhaps BMI. Will aim for MMI as well.
Aside: Does Occam’s razor help?
“Pluralitas non est ponenda sine necessitate” (“plurality should not be posited
without necessity”) ➔ this can lead to interpreting and then choosing a model, or
choosing a model and then interpreting ➔ it does not help us.
Model Interpretation presentation
We will present results in UMI, BMI and MMI order, and at end, compare
across the three methodologies.
Aim is to find insights and contradictions when generalizing UMI without
validating interpretation in BMI and MMI.
And likewise, to verify strong UMI results that are still prevalent in BMI and
MMI.
Confounding rears its ugly head.
See earlier chapters for review and
examples.
Must read; not elaborated herein.
Golden Days of Linear Regression Interpretation ***
Based on the “ceteris paribus” assumption, which fails in the case of
even relatively small VIFs. At present, the rule of thumb is VIF >=
10 (R-sq = .90 among predictors) ➔ unstable model (see earlier
slides in shareware …).
“Ceteris paribus” exercise: keeping all other predictors
constant, an increase in …. But if the R-sq among predictors is
even 10%, it is not possible to keep all predictors constant while
increasing the variable of interest by 1, as per the ceteris paribus
frame of analysis.
Advantages, however: EASY to conceptualize, because
practice follows the notion of mostly bivariate correlation
(keeping all else constant reduces the relationship to just one variable
vs. predictions ➔ UMI). But this is wrong with even small bivariate
correlations, and mostly wrong in the multivariate case. Let us see …
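The VIF rule of thumb can be made concrete with a short sketch (synthetic predictors; names and the data-generating process are illustrative): VIF_j = 1 / (1 − R²_j), where R²_j comes from regressing predictor j on the remaining predictors.

```python
import numpy as np

def vif(X, j):
    """Variance inflation factor of column j: regress it on the other
    columns (plus intercept) and return 1 / (1 - R^2)."""
    others = np.delete(X, j, axis=1)
    A = np.column_stack([np.ones(len(X)), others])
    coef, *_ = np.linalg.lstsq(A, X[:, j], rcond=None)
    resid = X[:, j] - A @ coef
    r2 = 1.0 - resid.var() / X[:, j].var()
    return 1.0 / (1.0 - r2)

rng = np.random.default_rng(4)
n = 5000
z = rng.normal(size=n)
x1 = z + rng.normal(scale=0.3, size=n)   # x1 and x2 share the factor z
x2 = z + rng.normal(scale=0.3, size=n)
x3 = rng.normal(size=n)                  # independent predictor
X = np.column_stack([x1, x2, x3])
# vif(X, 0) is large (collinear pair); vif(X, 2) stays near 1.
```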
Confusion on signs of coefficients and interpretation: simple LR case.

In the simple linear regression $Y_i = \alpha + \beta X_i + \varepsilon_i$, the estimated slope is

$$\hat{\beta}_{xy} = r_{xy}\,\frac{s_y}{s_x}, \qquad sg(\hat{\beta}_{xy}) = sg(r_{xy}).$$

➔ Corr(X, Y) = $\hat{\beta}$ if SD(Y) = SD(X), that is, if both variables are
standardized; otherwise the two at least share the same sign, and the
interpretation from correlation holds in the simple regression case.
Notice that the regression of X on Y is NOT the inverse of the
regression of Y on X, because of SD(X) and SD(Y).
20 5/4/2022
In multiple linear regression, the previous relationship does not hold
because predictors can be correlated ($r_{XZ}$, weighted by $r_{YZ}$), hinting at
collinearity and/or relationships of suppression/enhancement (paper on
suppression/enhancement in shareware.net) ➔

In the multivariate case, e.g. $Y = \alpha + \beta_{YX.Z}\,X + \beta_{YZ.X}\,Z + \varepsilon$
(notation emphasizing "partial"), the estimated coefficient of X is

$$\hat{\beta}_{YX.Z} = \frac{s_Y}{s_X}\cdot\frac{r_{YX} - r_{YZ}\,r_{XZ}}{1 - r_{XZ}^{2}},$$

so that, since $1 - r_{XZ}^{2} > 0$,

$$sg(\hat{\beta}_{YX.Z}) = sg(r_{YX} - r_{YZ}\,r_{XZ}) \ne sg(r_{YX})
\quad \text{when} \quad abs(r_{YX}) < abs(r_{YZ}\,r_{XZ}) \text{ and } sg(r_{YX}) = sg(r_{YZ}\,r_{XZ}).$$
Comment on Linear Model Interpretation
Even in traditional UMI land, multivariate relations
given by Partial- and semi-partial correlations
must be part of the interpretation.
Note that while correlation is a bivariate
relationship, partial and semi-partial correlations can be
extended to the multivariate setting. In the case of a binary
target, these relationships are not fully analyzed.
However, even BMI, and certainly MMI, are not so often
performed.
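The sign behavior of partial coefficients described above can be verified numerically (a sketch on synthetic data; the data-generating process is invented for illustration). Here the marginal correlation of Y with X is positive, while the partial slope of X is negative — exactly the suppression/enhancement situation:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 10_000
Z = rng.normal(size=n)
X = 0.9 * Z + rng.normal(scale=0.4, size=n)   # X strongly correlated with Z
Y = -1.0 * X + 2.0 * Z + rng.normal(size=n)   # true partial slope of X is -1

r_yx = np.corrcoef(Y, X)[0, 1]
r_yz = np.corrcoef(Y, Z)[0, 1]
r_xz = np.corrcoef(X, Z)[0, 1]

# Partial slope of X via the correlation formula above.
beta_yx_z = (Y.std() / X.std()) * (r_yx - r_yz * r_xz) / (1 - r_xz ** 2)

# Cross-check against the OLS fit of Y on [1, X, Z].
A = np.column_stack([np.ones(n), X, Z])
coef, *_ = np.linalg.lstsq(A, Y, rcond=None)
```

The formula-based value and the OLS coefficient agree to numerical precision, since the partial-correlation expression is exact sample algebra, not an approximation.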
EDA and Model Interpretation
EDA analyzes data sets without reference to a dependent or target variable
(DV), which is instead done by modeling. Thus, MI = EDA + Predictions
Analysis.
Nevertheless, for given value(s) of the DV or of the predicted values, UMI, BMI
and MMI can utilize EDA tools. For instance, a histogram of posterior
model probabilities is part of Model UEDA and thus part of UMI.
Thus, MI is based on the relationship of predictions (and residuals) vis-à-vis
single predictors, pairs, triads, tetrads, etc. And this translates into
different techniques such as original PDPs, pair PDPs, triads, etc., to be
reviewed below.
NB: We utilize binning and rescaling of variable ranges for easier visual
interpretation. The number of bins is 10, mostly for UMI analysis, and 3
otherwise. We do not discuss issues of optimal binning, left to the reader.
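The 10-bin profile used for UMI can be sketched as follows (equal-width bins assumed, since the deck leaves optimal binning to the reader; the score is a stand-in posterior probability, not the fraud model's output):

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.uniform(0.0, 100.0, size=1000)            # one predictor
score = 1.0 / (1.0 + np.exp(-(x - 50.0) / 10.0))  # stand-in posterior probability

# 10 equal-width bins over the predictor's range; mean score per bin.
edges = np.linspace(x.min(), x.max(), 11)
idx = np.clip(np.digitize(x, edges) - 1, 0, 9)
bin_means = np.array([score[idx == b].mean() for b in range(10)])
# bin_means is the binned UMI profile of the score against x.
```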
Searching for important variables en route to answering the
modeling question.
QUESTION: what are the minimum components to make a car go along
a highway?
1) Engine
2) Tires
3) Steering wheel
4) Transmission
5) Gas
6) ….. Other MMI aspects and interrelations.
Take just one of them out, and the car won't MOVE ➔ there is NO
SINGLE most important variable. Instead, a minimum irreducible set of
them is NECESSARY. In the Data Science case with n → ∞, possibly
many subsets of 'important' variables for (n, p) subsets.
Typically, "suspect VARIABLES" are a good starting point of
research. "STARTING" is the key word.
Basic DATA Set(s) Information (Model M2)

- TRN DATA set: train; num obs: 3,595
- VAL DATA set: none; num obs: 0
- TST DATA set: none; num obs: 0
- Dep. Var: fraud
- TRN % Events: 20.389 (VAL/TST % Events: n/a)
Data set: Definition by way of Example
• Health insurance company: ophthalmologic
insurance claims.
• Is a claim valid or fraudulent? Binary
target.
• Full description and analysis of this data
set in
https://www.slideshare.net/LeonardoAuslender
(lectures at Principal Analytics Prep).
While presenting the results of 3 models, we'll concentrate on the 'best' model for
interpretation, for brevity's sake, except to mention specific examples of
different model interpretations across models.
Requested Models: Names & Descriptions.

Model # | Full Model Name | Model Description
2 | 002_M2_TRN_GRAD_BOOSTING | Gradient Boosting
4 | 004_M2_TRN_LOGISTIC_STEPWISE | Logistic STEPWISE TRN
5 | 005_M2_TRN_TREE | TREE model
Original Vars + Labels (Model M2)

Var # | Variable | Label
1 | FRAUD | Fraudulent activity yes/no
2 | TOTAL_SPEND | Total spent on opticals
3 | DOCTOR_VISITS | Total visits to a doctor
4 | NO_CLAIMS | No. of claims made recently
5 | MEMBER_DURATION | Membership duration
6 | OPTOM_PRESC | Number of opticals claimed
7 | SPEND_PER_CLAIM | Expenses per claim
8 | CLAIMS_PER_DURATION | Claims per duration
Overall MI: Comparison of Models' Posterior Probabilities and Histograms.
Similar, but not identical: Logistic & TREES achieve [0, 1].
The probability distributions are very different ➔ model interpretation must be dependent
on model selection. It is possible to 'mix' all models into one (an Ensemble); not in this
ppt (see slides in shareware).
0 Model Interpretation setting.pdf
Some conclusions and comments so far: (cont.)
Probability distributions differ in:
1) Extreme points: Logistic and TREES achieve [0; 1], not necessarily
other methods, as GradBoost in our case.
2) Very different % obs in Models’ probability bins.
3) % events per bin fairly linear, except for Logistic ‘drop’ at 0.7. Grad
Boosting has higher % events for higher probability levels than other
2 models.
4) After about 0.4 of posterior probability, the 3 methods have similar
distributions; quite different in the segment 0 to < 0.4. Notice GB and
TREE having a large proportion of observations at lower probability
levels, compared to Logistic.
5) Relative, but not absolute, MI information can be inferred. % Events
differ across models ➔ different probability estimates, especially
above the 0 to < 0.4 segment. Since higher probability levels reflect higher
% events, MI is necessarily different.
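The per-bin quantities these conclusions rest on (% of observations and % of events per posterior-probability bin) can be sketched as follows, with simulated scores standing in for the three models' actual posteriors:

```python
import numpy as np

rng = np.random.default_rng(3)
p = rng.beta(2.0, 2.0, size=2000)   # stand-in posterior probabilities
y = rng.binomial(1, p)              # binary target consistent with the scores

edges = np.linspace(0.0, 1.0, 11)   # 10 probability bins: [0, 0.1), ...
idx = np.clip(np.digitize(p, edges) - 1, 0, 9)
pct_obs = np.array([np.mean(idx == b) for b in range(10)]) * 100.0
pct_events = np.array([y[idx == b].mean() if np.any(idx == b) else np.nan
                       for b in range(10)]) * 100.0
# pct_obs compares score distributions across models; pct_events per bin
# rises with the bin's probability level when the scores are calibrated.
```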
Let's get into data details for the sake of completeness.
Quick EDA area: U(nivariate) EDA = UEDA.
Note the "small" CLAIMS_PER_DURATION
and NO_CLAIMS values at p95.
4.2: PROBABILITIES, PARAMETERS, IMPORTANCE
4.2.1: Logistic Regression Details.
Note: importance and coefficients share one column, as do
p-values and number of rules. Note that the models do not share all
variables. Interestingly, CLAIMS_PER_DURATION is #1 for the tree
methods, yet it was not selected by Logistic.
Coefficients, p-values and Importance (Vars × Models; per model: Coeff/Importance and PVal/Nrules).

Variable | M2_TRN_GRAD_BOOSTING (Imp / Nrules) | M2_TRN_LOGISTIC_STEPWISE (Coeff / PVal) | M2_TRN_TREE (Imp / Nrules)
CLAIMS_PER_DURATION | 1.0000 / 26 | — | 1.0000 / 5
DOCTOR_VISITS | 0.4035 / 20 | -0.0180 / 0.014 | 0.2895 / 2
MEMBER_DURATION | 0.5643 / 26 | -0.0065 / 0.000 | 0.3650 / 2
NO_CLAIMS | 0.2483 / 6 | 0.7137 / 0.000 | —
OPTOM_PRESC | 0.5963 / 21 | 0.2185 / 0.000 | 0.5383 / 5
SPEND_PER_CLAIM | 0.2202 / 8 | 0.0000 / 0.001 | —
TOTAL_SPEND | 0.6148 / 29 | -0.0000 / 0.000 | 0.4404 / 3
INTERCEPT | — | -0.5160 / 0.000 | —
Logistic Selection Steps (Model M2_TRN_LOGISTIC_STEPWISE; no effects removed).

Step | Effect Entered | # in model | P-value
1 | no_claims | 1 | .00
2 | member_duration | 2 | .00
3 | optom_presc | 3 | .00
4 | total_spend | 4 | .00
5 | spend_per_claim | 5 | .00
6 | doctor_visits | 6 | .01
4.2.2: Specific Tree-based Methods, EDA and Diagnostics.
Some conclusions and comments so far:
. Logistic stepwise did not select NUM_MEMBERS,
which is shown with the lowest relative importance in GB and
Trees. More importantly, CLAIMS_PER_DURATION is deemed
most important by the tree methods, and disregarded by
Logistic. Notice that Logistic Regression does not have an
agreed-upon scale of importance; by default, we use odds
ratios.
. CLAIMS_PER_DURATION is deemed the most important single variable for
GB and TREE, but Logistic deems NO_CLAIMS as #1 and
OPTOM_PRESC as #2 (via odds ratios), while GB differed.
. The remaining variables have odds ratios of 1, which seems to
indicate a similar effect across them, while GB/TREE distinguish
relative importance after the first two variables.
Strongly summarized area for brevity's sake, added just for completeness.
GOF ranks (1 = best; one rank per GOF measure).

Model Name | AUROC | Avg Square Error | Class Rate | Cum Lift 3rd bin | Cum Resp Rate 3rd | Gini | P-R AUC | Precision Rate | Rsquare Cramer-Tjur | Unw. Mean | Unw. Median
005_M2_TRN_GRAD_BOOSTING | 1 | 1 | 2 | 1 | 1 | 1 | 1 | 2 | 1 | 1.22 | 1
007_M2_TRN_LOGISTIC_STEPWISE | 3 | 3 | 1 | 3 | 3 | 3 | 3 | 3 | 3 | 2.78 | 3
008_M2_TRN_TREE | 2 | 2 | 3 | 2 | 2 | 2 | 2 | 1 | 2 | 2.00 | 2
➔ Gradient Boosting is our champion; we omit the usual
ROC curves, precision-recall curves, etc.
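The unweighted mean ranks that crown Gradient Boosting can be reproduced directly (ranks transcribed from the table above; 1 = best):

```python
import numpy as np

# Ranks per GOF measure (AUROC, avg square error, class rate, cum lift,
# cum resp rate, Gini, P-R AUC, precision rate, Rsq Cramer-Tjur).
ranks = {
    "GRAD_BOOSTING":     [1, 1, 2, 1, 1, 1, 1, 2, 1],
    "LOGISTIC_STEPWISE": [3, 3, 1, 3, 3, 3, 3, 3, 3],
    "TREE":              [2, 2, 3, 2, 2, 2, 2, 1, 2],
}
mean_rank = {m: float(np.mean(r)) for m, r in ranks.items()}
champion = min(mean_rank, key=mean_rank.get)   # lowest mean rank wins
```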
Tree representation up to 4 levels, Model 'M2_TRN_LG' (intermediate predictions in parentheses; the trailing value on each leaf line is its final prediction).
7 Vars: CLAIMS_PER_DURATION, DOCTOR_VISITS, MEMBER_DURATION, NO_CLAIMS, OPTOM_PRESC, SPEND_PER_CLAIM, TOTAL_SPEND

CLAIMS_PER_DURATION < 0.00791 (0.153)
  MEMBER_DURATION < 155.5 (0.213)
    OPTOM_PRESC < 4.5 (0.197)
      OPTOM_PRESC < 1.5 (0.178) → 0.178
      OPTOM_PRESC >= 1.5 (0.258) → 0.258
    OPTOM_PRESC >= 4.5 (0.514)
      OPTOM_PRESC < 6.5 (0.399) → 0.399
      OPTOM_PRESC >= 6.5 (0.622) → 0.622
  MEMBER_DURATION >= 155.5 (0.113)
    CLAIMS_PER_DURATION < 0.00376 (0.099)
      OPTOM_PRESC >= 3.5 (0.204) → 0.204
      OPTOM_PRESC < 3.5 (0.093) → 0.093
    CLAIMS_PER_DURATION >= 0.00376 (0.262)
      OPTOM_PRESC < 2.5 (0.235) → 0.235
      OPTOM_PRESC >= 2.5 (0.39) → 0.390
CLAIMS_PER_DURATION >= 0.00791 (0.572)
  CLAIMS_PER_DURATION < 0.017 (0.469)
    OPTOM_PRESC < 2.5 (0.421)
      CLAIMS_PER_DURATION >= 0.01272 (0.496) → 0.496
      CLAIMS_PER_DURATION < 0.01272 (0.386) → 0.386
    OPTOM_PRESC >= 2.5 (0.61)
      OPTOM_PRESC >= 6.5 (0.8) → 0.800
      OPTOM_PRESC < 6.5 (0.571) → 0.571
  CLAIMS_PER_DURATION >= 0.017 (0.755)
    NO_CLAIMS < 3.5 (0.652)
      OPTOM_PRESC >= 4.5 (0.845) → 0.845
      OPTOM_PRESC < 4.5 (0.633) → 0.633
    NO_CLAIMS >= 3.5 (0.859)
      NO_CLAIMS < 5.5 (0.796) → 0.796
      NO_CLAIMS >= 5.5 (0.938) → 0.938
Tree representation up to 4 levels, Model 'M2_TRN_GB' (intermediate predictions in parentheses; the trailing value on each leaf line is its final prediction).
7 Vars: CLAIMS_PER_DURATION, DOCTOR_VISITS, MEMBER_DURATION, NO_CLAIMS, OPTOM_PRESC, SPEND_PER_CLAIM, TOTAL_SPEND

CLAIMS_PER_DURATION < 0.00583 (0.15)
  TOTAL_SPEND < 4150 (0.583)
    MEMBER_DURATION < 190 (0.686)
      OPTOM_PRESC >= 1.5 (0.87) → 0.870
      OPTOM_PRESC < 1.5 (0.63) → 0.630
    MEMBER_DURATION >= 190 (0.25)
      TOTAL_SPEND >= 3400 (0.151) → 0.151
      TOTAL_SPEND < 3400 (0.348) → 0.348
  TOTAL_SPEND >= 4150 (0.143)
    OPTOM_PRESC < 3.5 (0.13)
      MEMBER_DURATION < 182.5 (0.165) → 0.165
      MEMBER_DURATION >= 182.5 (0.087) → 0.087
    OPTOM_PRESC >= 3.5 (0.329)
      MEMBER_DURATION < 118.5 (0.556) → 0.556
      MEMBER_DURATION >= 118.5 (0.234) → 0.234
CLAIMS_PER_DURATION >= 0.00583 (0.527)
  CLAIMS_PER_DURATION < 0.01954 (0.433)
    OPTOM_PRESC < 0.5 (0.246)
      SPEND_PER_CLAIM >= 4016.67 (0.233) → 0.233
      SPEND_PER_CLAIM < 4016.67 (0.354) → 0.354
    OPTOM_PRESC >= 0.5 (0.548)
      OPTOM_PRESC >= 3.5 (0.797) → 0.797
      OPTOM_PRESC < 3.5 (0.492) → 0.492
  CLAIMS_PER_DURATION >= 0.01954 (0.803)
    NO_CLAIMS < 4.5 (0.742)
      DOCTOR_VISITS >= 3 (0.788) → 0.788
      DOCTOR_VISITS < 3 (0.632) → 0.632
    NO_CLAIMS >= 4.5 (0.91)
      CLAIMS_PER_DURATION < 0.02491 (0.851) → 0.851
      CLAIMS_PER_DURATION >= 0.02491 (0.92) → 0.920
Curiously, while node numbers don't mean anything across models, it is obvious that
GB and LG share a similar structure despite being very different algorithms. However, tree
representations are just approximations, except in the TREE case.
Discussion of the comparison of tree representations between LG and GB.
The two methods split initially on CLAIMS_PER_DURATION, but at very
different values (0.00791 (LG) vs. 0.00583 (GB)). Remember that the actual
logistic regression results had dropped CLAIMS_PER_DURATION.
Later levels obviously differ, since the initial split is quite different.
Therefore, these two models should 'a priori' differ in model
interpretation.
0 Model Interpretation setting.pdf

More Related Content

PDF
4_5_Model Interpretation and diagnostics part 4.pdf
PDF
4_5_Model Interpretation and diagnostics part 4_B.pdf
PDF
Visual Tools for explaining Machine Learning Models
PPTX
MODULE-2.pptx machine learning notes for vtu 6th sem cse
PPTX
MODULE-3edited.pptx machine learning modulk
PPTX
Get hands-on with Explainable AI at Machine Learning Interpretability(MLI) Gym!
PDF
Machine Learning.pdf
PDF
Machine learning Mind Map
4_5_Model Interpretation and diagnostics part 4.pdf
4_5_Model Interpretation and diagnostics part 4_B.pdf
Visual Tools for explaining Machine Learning Models
MODULE-2.pptx machine learning notes for vtu 6th sem cse
MODULE-3edited.pptx machine learning modulk
Get hands-on with Explainable AI at Machine Learning Interpretability(MLI) Gym!
Machine Learning.pdf
Machine learning Mind Map

Similar to 0 Model Interpretation setting.pdf (20)

PDF
Human in the loop: Bayesian Rules Enabling Explainable AI
PPTX
Predictive analytics in Information Systems Research (TSWIM 2015 keynote)
PDF
Learning to learn Model Behavior: How to use "human-in-the-loop" to explain d...
PPTX
The 10 Algorithms Machine Learning Engineers Need to Know.pptx
PPTX
Hima_Lakkaraju_XAI_ShortCourse.pptx
PDF
Machine Learning Model Validation (Aijun Zhang 2024).pdf
PDF
Chapter 02-logistic regression
PDF
Machine learning meetup
PPTX
Theory and Practice of Integrating Machine Learning and Conventional Statisti...
PPT
Free Ebooks Download ! Edhole.com
PPTX
Statistical Modeling in 3D: Explaining, Predicting, Describing
PDF
Unit---5.pdf of ba in srcc du gst before exam
PPTX
Big Data - To Explain or To Predict? Talk at U Toronto's Rotman School of Ma...
PDF
Shmueli
PDF
Conference_paper.pdf
PPTX
When Models Meet Data: From ancient science to todays Artificial Intelligence...
PPTX
To explain or to predict
PPTX
EDA by Sastry.pptx
PDF
Machine Learning - Principles
PPTX
Step by Step guide to executing an analytics project
Human in the loop: Bayesian Rules Enabling Explainable AI
Predictive analytics in Information Systems Research (TSWIM 2015 keynote)
Learning to learn Model Behavior: How to use "human-in-the-loop" to explain d...
The 10 Algorithms Machine Learning Engineers Need to Know.pptx
Hima_Lakkaraju_XAI_ShortCourse.pptx
Machine Learning Model Validation (Aijun Zhang 2024).pdf
Chapter 02-logistic regression
Machine learning meetup
Theory and Practice of Integrating Machine Learning and Conventional Statisti...
Free Ebooks Download ! Edhole.com
Statistical Modeling in 3D: Explaining, Predicting, Describing
Unit---5.pdf of ba in srcc du gst before exam
Big Data - To Explain or To Predict? Talk at U Toronto's Rotman School of Ma...
Shmueli
Conference_paper.pdf
When Models Meet Data: From ancient science to todays Artificial Intelligence...
To explain or to predict
EDA by Sastry.pptx
Machine Learning - Principles
Step by Step guide to executing an analytics project
Ad

More from Leonardo Auslender (20)

PDF
PDF
Ensembles.pdf
PDF
Suppression Enhancement.pdf
PDF
4_2_Ensemble models and gradient boosting2.pdf
PDF
4_2_Ensemble models and grad boost part 2.pdf
PDF
4_2_Ensemble models and grad boost part 3.pdf
PDF
4_3_Ensemble models and grad boost part 2.pdf
PDF
4_2_Ensemble models and grad boost part 1.pdf
PDF
4_1_Tree World.pdf
PDF
Classification methods and assessment.pdf
PDF
Linear Regression.pdf
PDF
4 MEDA.pdf
PDF
2 UEDA.pdf
PDF
3 BEDA.pdf
PDF
PDF
0 Statistics Intro.pdf
PDF
4 2 ensemble models and grad boost part 3 2019-10-07
PDF
4 2 ensemble models and grad boost part 2 2019-10-07
PDF
4 2 ensemble models and grad boost part 1 2019-10-07
Ensembles.pdf
Suppression Enhancement.pdf
4_2_Ensemble models and gradient boosting2.pdf
4_2_Ensemble models and grad boost part 2.pdf
4_2_Ensemble models and grad boost part 3.pdf
4_3_Ensemble models and grad boost part 2.pdf
4_2_Ensemble models and grad boost part 1.pdf
4_1_Tree World.pdf
Classification methods and assessment.pdf
Linear Regression.pdf
4 MEDA.pdf
2 UEDA.pdf
3 BEDA.pdf
0 Statistics Intro.pdf
4 2 ensemble models and grad boost part 3 2019-10-07
4 2 ensemble models and grad boost part 2 2019-10-07
4 2 ensemble models and grad boost part 1 2019-10-07
Ad

Recently uploaded (20)

PDF
Data Engineering Interview Questions & Answers Cloud Data Stacks (AWS, Azure,...
PDF
Data Engineering Interview Questions & Answers Batch Processing (Spark, Hadoo...
PPT
Predictive modeling basics in data cleaning process
PDF
[EN] Industrial Machine Downtime Prediction
PPTX
Topic 5 Presentation 5 Lesson 5 Corporate Fin
PPTX
QUANTUM_COMPUTING_AND_ITS_POTENTIAL_APPLICATIONS[2].pptx
PDF
Galatica Smart Energy Infrastructure Startup Pitch Deck
PDF
Microsoft Core Cloud Services powerpoint
PPTX
AI Strategy room jwfjksfksfjsjsjsjsjfsjfsj
PPTX
Qualitative Qantitative and Mixed Methods.pptx
PPTX
SAP 2 completion done . PRESENTATION.pptx
PDF
How to run a consulting project- client discovery
PDF
Oracle OFSAA_ The Complete Guide to Transforming Financial Risk Management an...
PPT
DATA COLLECTION METHODS-ppt for nursing research
PPTX
Data_Analytics_and_PowerBI_Presentation.pptx
PDF
22.Patil - Early prediction of Alzheimer’s disease using convolutional neural...
PDF
REAL ILLUMINATI AGENT IN KAMPALA UGANDA CALL ON+256765750853/0705037305
PDF
Mega Projects Data Mega Projects Data
PDF
annual-report-2024-2025 original latest.
PPTX
Database Infoormation System (DBIS).pptx
Data Engineering Interview Questions & Answers Cloud Data Stacks (AWS, Azure,...
Data Engineering Interview Questions & Answers Batch Processing (Spark, Hadoo...
Predictive modeling basics in data cleaning process
[EN] Industrial Machine Downtime Prediction
Topic 5 Presentation 5 Lesson 5 Corporate Fin
QUANTUM_COMPUTING_AND_ITS_POTENTIAL_APPLICATIONS[2].pptx
Galatica Smart Energy Infrastructure Startup Pitch Deck
Microsoft Core Cloud Services powerpoint
AI Strategy room jwfjksfksfjsjsjsjsjfsjfsj
Qualitative Qantitative and Mixed Methods.pptx
SAP 2 completion done . PRESENTATION.pptx
How to run a consulting project- client discovery
Oracle OFSAA_ The Complete Guide to Transforming Financial Risk Management an...
DATA COLLECTION METHODS-ppt for nursing research
Data_Analytics_and_PowerBI_Presentation.pptx
22.Patil - Early prediction of Alzheimer’s disease using convolutional neural...
REAL ILLUMINATI AGENT IN KAMPALA UGANDA CALL ON+256765750853/0705037305
Mega Projects Data Mega Projects Data
annual-report-2024-2025 original latest.
Database Infoormation System (DBIS).pptx

0 Model Interpretation setting.pdf

  • 3. Overall Description The present work aims at presenting tools for model interpretation derived from Partial Dependency Plots (in many different guises, explained in the text), and contrasted to osterior probabilities, hereby called scores. The work comprises 4 Powerpoint Documents, with a possible fifth (if I get to it), numbered 0 to 4. 0 describes overall issues, introduces the working data set and models. At the risk of spoiling end results, the Multivariate section provides insights at (almost) the observation level, and requires univariate and bivariate support. This conclusion is quite surprising to me since I thought that Univariate and Bivariate would be rendered lacking. But context reality is far more complex than expected, and model interpretations are as varied as the different contexts available in the data, that should not be dismissed all too eagerly. This work is based mostly on visualitzation and I have tried to avoid statistical inference and lengthy tables.
  • 4. Abstract Statistical and data science models: Are they Interpretive black- boxes ? Let’s try for NO. Molnar’s (2018) “Interpretable Machine Learning”: big effort in finding solutions. Our presentation is humbler: visual tools for model interpretation based on partial dependency plots and their variants, such as collapsed PDPs created by the presenter, some of which may be polemical and debatable. Almost no use of statistical inference. Audience should be versed in models creation, and at least some insight into partial dependency plots. Presentation based on simple working example with 8 predictors and one binary target variable. Not possible to detail exhaustively every method described in this presentation. Extensive document in preparation. Presentation requires 3 hours and wide awake audience. Double time if not awake. Sleepers will be punished accordingly. Slides Marked **** can be skipped for easier first reading.
  • 5. Contents: Model Interpretation (MI) 1. Introduction and General Notes 2. Confounding 3. Model Interpretation (MI) and Categorization: UMI, BMI, MMI. 4. Binary Target Study 4.1: Report of coefficients, estimates, etc. 4.2: Models Structures 4.3: GOF and model Interpretation 5. Univariate Model Interpretation: UMI 5. Profile and Model Interpretation area. Univariate Model Interpreation UMI. 6. Partial Dependency plots (PDPs) and their variants. UMI. 6. Bivariate Model Interpretation: BMI 7. PDPs and Bivariate Model Interpretation (BMI.) 7.1: UMI vs. BMI. 8. Multivariate model interpretation.: MMI 9. Future Steps 10. Observation level Interpretation. 11. References
  • 7. Overall comments and introduction. Presentation by way of example focusing on Fraud/Default Data set and continuing previous chapters available on web (standard class for Principal Analytics Prep). Aim: study interpretation/diagnosis mostly via Partial Dependency Plots of logistic regression, Classification Trees and Gradient Boosting. Presentation(s) available at https://www.slideshare.net/LeonardoAuslender/visual- tools-for-interpretation-of-machine-learning-models At present, lots of written opinions and distinctions about topic. No room or desire to discuss them all. See Molnar’s (2018) book for an overall view, O’Rourke (2018), Doshi- Velez et (2017).
  • 8. Overall comments and introduction (cont 1). No discussion about imbalanced data set modeling or other modeling issues such as model selection. This presentation introduces novel visual concepts as well as tools derived from Partial Dependency Plots (PDP): -Overall PDP -Collapsed PDP and residuals -Marginal PDP -PDP vs. actual scores, …. and how they assist in model interpretation.
  • 9. Model Interpretation (MI) and model building issues. 1) Why/where model makes mistakes (large residuals, outliers, etc.)? 2) Which/when attributes (alone / group) end up being important? 3) Why non-importants? 4) Observation level predictions differ by models? However, immediate aim is NOT interpretations at observation level (why predicted sick/churner/innocent…) but
  • 10. Objectives of MI (cont. 1) Why not directly at observation level? Suppose model to predict entertainment type preference for database of families in large cities. Since not possible to obtain updated family preferences consistently, (i.e., data are ‘soft’), models necessarily are not interpretable at specific family levels. Contrariwise, disease diagnostic prediction is closer to individual explanation and interpretablity (data typically ‘hard’). MOTTO: Posterior probability follows Data + Model algorithm/s. Interpretation follows primarily probability but must include data (i.e., context) ➔
  • 13. Model Interpretation categorization. Just as in EDA (but on model results, i.e., predictions), not on initial data), three types of MI: Univariate Model Interpretation (UMI): One variable at a time vis-à-vis predictions/probs. EASIEST to understand and huge source of “makes sense” discourse. E.g., Classical linear models interpretations;, reasons to decline a bank loan, etc. Bivariate Model Interpretation (BMI): Looking at pairs of variables to interpret model results. Correlation measures immediately spring to mind. Multivariate Model Interpretation (MMI): Overall model interpretation, most difficult and valuable. Typically, most work results in UMI and perhaps BMI. Will aim for MMI as well. Aside: Does Occam’s razor help? “Pluralitas non est ponenda sine necessitate. “ ➔ can lead to interpret and then choose model, or choose model and then interpret ➔ does not help us.
  • 14. Model Interpretation presentation We will present results in UMI, BMI and MMI order, and at end, compare across the three methodologies. Aim is to find insights and contradictions when generalizing UMI without validating interpretation in BMI and MMI. And likewise, to verify strong UMI results that are still prevalent in BMI and MMI.
  • 16. Confounding rears its ugly head. See earlier chapters for review and examples. Must read, not elaborated Herein.
  • 18. Golden Days of Linear Regression Interpretation *** Based on “ceteris paribus” assumption that fails In case of even relatively small VIFs. At present, rule of thumb VIF >= 10 (R-sq = .90 among predictors) ➔ unstable model (see earlier slides in shareware …). “Ceteris paribus” exercise: Keeping all other predictors constant, an increase in …. But if R-sq among predictors is even 10%, not possible to keep all predictors constant while increasing by 1 the variable of interest, as per ceteris paribus frame of analysis. Advantages however: EASY to conceptualize because practice follows notion of mostly bivariate correlation (keeping all else constant, reduces relationship to just 1 var vs. predictions ➔ UMI). But wrong with even small bivariate corrs and mostly wrong in multivariate case. Let us see …..
  • 19. Confusion on signs of coefficients and interpretation: simple LR case. The fitted simple regression line can be written as Ŷ_i = Ȳ + r_xy (s_y / s_x)(X_i − X̄), so that β̂₁ = r_xy · s_y / s_x and sg(β̂₁) = sg(r_xy). ➔ β̂₁ = Corr(X, Y) if SD(Y) = SD(X), i.e., if both variables are standardized; otherwise the slope at least shares the sign of the correlation, and the interpretation from correlation holds in the simple regression case. Notice that the regression of X on Y is NOT the inverse of the regression of Y on X, because the roles of SD(X) and SD(Y) interchange.
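The slope-correlation identity above can be checked from scratch on made-up data (a sketch of mine, not the working data set):

```python
import statistics as st

def slope_and_corr(x, y):
    """OLS slope of y on x, and the Pearson correlation, from first principles."""
    mx, my = st.fmean(x), st.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sxx, sxy / (sxx * syy) ** 0.5

x = [1, 2, 3, 4, 5]          # illustrative data, chosen arbitrarily
y = [2, 1, 4, 3, 6]
beta1, r_xy = slope_and_corr(x, y)
# beta1 equals r_xy * s_y / s_x, so it always carries the sign of r_xy:
print(abs(beta1 - r_xy * st.stdev(y) / st.stdev(x)) < 1e-12)  # True
```

Swapping the roles of x and y rescales the slope by (s_x/s_y)², which is why the regression of X on Y is not the inverse of the regression of Y on X.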
  • 20. (5/4/2022) In multiple linear regression, the previous relationship does not hold, because predictors can be correlated (r_XZ, weighted by r_YZ), hinting at collinearity and/or relationships of suppression/enhancement (paper on suppression/enhancement in shareware.net). ➔ E.g., with Y = a + βX + γZ + ε, the estimated partial coefficient of X (emphasizing “partial”) is β̂_YX.Z = (s_y / s_x) · (r_YX − r_YZ · r_XZ) / (1 − r²_XZ). Hence sg(β̂_YX.Z) = sg(r_YX − r_YZ · r_XZ), which equals sg(r_YX) only when abs(r_YX) dominates abs(r_YZ · r_XZ); otherwise the partial coefficient's sign can differ from that of the bivariate correlation.
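A tiny numeric sketch of the sign flip (the correlations are assumed for illustration, not taken from the working data):

```python
def std_partial_beta(r_yx, r_yz, r_xz):
    """Standardized coefficient of X when Y is regressed on both X and Z."""
    return (r_yx - r_yz * r_xz) / (1.0 - r_xz ** 2)

# Assumed correlations: Y-X weakly positive, but Z correlates with both.
r_yx, r_yz, r_xz = 0.10, 0.60, 0.50
b = std_partial_beta(r_yx, r_yz, r_xz)
# sg(beta_YX.Z) = sg(r_YX - r_YZ*r_XZ): here 0.10 < 0.60*0.50, so the sign flips.
print(round(b, 4))  # -0.2667
```

So a predictor positively correlated with the target can carry a negative partial coefficient, which is exactly why the ceteris-paribus reading breaks down.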
  • 21. Comment on Linear Model Interpretation. Even in traditional UMI land, the multivariate relations given by partial and semi-partial correlations must be part of the interpretation. Note that while correlation is a bivariate relationship, partial and semi-partial correlations extend to the multivariate setting. In the case of a binary target, these relationships are not fully analyzed. However, even BMI, and certainly MMI, are not so often performed.
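For reference, the partial and semi-partial correlations mentioned above can be computed directly from the pairwise correlations (the numeric inputs below are assumed, for illustration only):

```python
import math

def partial_corr(r_yx, r_yz, r_xz):
    """Correlation of Y and X after removing Z's influence from both."""
    return (r_yx - r_yz * r_xz) / math.sqrt((1 - r_yz ** 2) * (1 - r_xz ** 2))

def semipartial_corr(r_yx, r_yz, r_xz):
    """Correlation of Y with the part of X not explained by Z."""
    return (r_yx - r_yz * r_xz) / math.sqrt(1 - r_xz ** 2)

# Illustrative (assumed) correlations:
print(round(partial_corr(0.10, 0.60, 0.50), 4))      # -0.2887
print(round(semipartial_corr(0.10, 0.60, 0.50), 4))  # -0.2309
```

The semi-partial is never larger in magnitude than the partial, since its denominator omits the factor that shrinks Y's residual variance.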
  • 23. EDA and Model Interpretation. EDA analyzes data sets without reference to a dependent or target variable (DV); that reference is what modeling adds. Thus, MI = EDA + Predictions Analysis. Nevertheless, for given value(s) of the DV or of the predicted values, UMI, BMI and MMI can utilize EDA tools. For instance, a histogram of posterior model probabilities is part of Model UEDA and thus part of UMI. Thus, MI is based on the relationship of predictions (and residuals) vis-à-vis single predictors, pairs, triads, tetrads, etc. This translates into different techniques, such as original PDPs, pair PDPs, triads, etc., to be reviewed below. NB: We bin and rescale variable ranges for easier visual interpretation. The number of bins is 10, mostly for UMI analysis, and 3 otherwise. We do not discuss issues of optimal binning, left to the reader.
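As a sketch of the binning step just described (toy scores and outcomes of my own, not the working data set), one UMI building block is the % of events per posterior-probability bin:

```python
def pct_events_by_bin(probs, events, n_bins=10):
    """Bin posterior probabilities on [0, 1] and return the % of events
    (target = 1) observed in each bin; None for empty bins."""
    counts = [[0, 0] for _ in range(n_bins)]      # [n_obs, n_events] per bin
    for p, y in zip(probs, events):
        i = min(int(p * n_bins), n_bins - 1)      # top edge 1.0 -> last bin
        counts[i][0] += 1
        counts[i][1] += y
    return [round(100.0 * e / n, 1) if n else None for n, e in counts]

# Hypothetical posterior probabilities and binary outcomes:
probs = [0.05, 0.12, 0.15, 0.55, 0.85, 0.95]
events = [0, 1, 0, 0, 1, 1]
print(pct_events_by_bin(probs, events))
```

Plotting the returned list against bin midpoints gives the “% events per bin” curves compared across models in the slides that follow.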
  • 25. Searching for important variables en route to answering the modeling question. QUESTION: what are the minimum components needed to make a car go along a highway? 1) Engine 2) Tires 3) Steering wheel 4) Transmission 5) Gas 6) ….. plus other MMI aspects and interrelations. Take just one of them out, and the car won’t MOVE ➔ there EXISTS NO SINGLE most important variable. Instead, a minimum irreducible set of them is NECESSARY. In the Data Science case with n → ∞, there are possibly many subsets of ‘important’ variables for (n, p) subsets. Typically, “suspect VARIABLES” are a good starting point of research. “STARTING” is the key word.
  • 29. Basic DATA set(s) Information. Model 1 (M2): TRN data set ‘train’, 3,595 obs; VAL and TST data sets: none (0 obs). Dependent variable: fraud. TRN % events: 20.389 (VAL/TST % events: n/a).
  • 30. Data set: Definition by way of Example • Health insurance company: Ophthalmologic insurance claims • Is the claim valid or fraudulent? Binary target. • Full description and analysis of this data set in https://www.slideshare.net/LeonardoAuslender (lectures at Principal Analytics Prep).
  • 31. While presenting the 3 models’ results, we will concentrate on the ‘best’ model for interpretation, for brevity’s sake, except to mention specific examples of different model interpretations across models. Requested Models: Names & Descriptions:
Model 2: 002_M2_TRN_GRAD_BOOSTING — Gradient Boosting
Model 4: 004_M2_TRN_LOGISTIC_STEPWISE — Logistic Stepwise (TRN)
Model 5: 005_M2_TRN_TREE — TREE model
  • 32. Original Vars + Labels (Model M2):
1. FRAUD — Fraudulent activity yes/no (target)
2. TOTAL_SPEND — Total spent on opticals
3. DOCTOR_VISITS — Total visits to a doctor
4. NO_CLAIMS — Number of claims made recently
5. MEMBER_DURATION — Membership duration
6. OPTOM_PRESC — Number of opticals claimed
7. SPEND_PER_CLAIM — Expenses per claim
8. CLAIMS_PER_DURATION — Claims per duration
  • 34. Similar, not identical. Logistic & Trees achieve [0, 1].
  • 35. Probability distributions are very different ➔ model interpretation must depend on model selection. It is possible to ‘mix’ all models into one (an Ensemble), but not in this ppt (see slides in shareware).
  • 37. Some conclusions and comments so far (cont.): Probability distributions differ in: 1) Extreme points: Logistic and TREES achieve [0, 1]; other methods do not necessarily, e.g., GradBoost in our case. 2) Very different % of obs in the models’ probability bins. 3) % events per bin is fairly linear, except for the Logistic ‘drop’ at 0.7. Gradient Boosting has a higher % of events at higher probability levels than the other 2 models. 4) Above a posterior probability of about 0.4, the 3 methods have similar distributions; they are quite different in the segment [0, 0.4). Notice GB and TREE have a large proportion of observations at lower probability levels, compared to Logistic. 5) Relative, but not absolute, MI information can be inferred. % events differ across models ➔ different probability estimates, especially above the segment [0, 0.4). Since higher probability levels reflect higher % events, MI is necessarily different.
  • 38. Let’s get into data details for the sake of completeness. Quick EDA area. U(nivariate) EDA = UEDA.
  • 39. Note the “small” CLAIMS_PER_DURATION and “NO_CLAIMS” values at p95.
  • 42. Coefficients, p-values and Importance (Vars × Models). Note: importance and coefficients share one column, as do p-values and number of rules. Note that the models do not share all variables. Interestingly, CLAIMS_PER_DURATION is #1 for the tree methods, yet it was not selected by Logistic.
Variable | GB (Importance / Nrules) | Logistic (Coeff / p-value) | TREE (Importance / Nrules)
CLAIMS_PER_DURATION | 1.0000 / 26 | — | 1.0000 / 5
DOCTOR_VISITS | 0.4035 / 20 | -0.0180 / 0.014 | 0.2895 / 2
MEMBER_DURATION | 0.5643 / 26 | -0.0065 / 0.000 | 0.3650 / 2
NO_CLAIMS | 0.2483 / 6 | 0.7137 / 0.000 | —
OPTOM_PRESC | 0.5963 / 21 | 0.2185 / 0.000 | 0.5383 / 5
SPEND_PER_CLAIM | 0.2202 / 8 | 0.0000 / 0.001 | —
TOTAL_SPEND | 0.6148 / 29 | -0.0000 / 0.000 | 0.4404 / 3
INTERCEPT | — | -0.5160 / 0.000 | —
  • 43. Logistic Selection Steps (Model M2_TRN_LOGISTIC_STEPWISE; no effects were removed):
Step | Effect Entered | # in Model | p-value
1 | no_claims | 1 | .00
2 | member_duration | 2 | .00
3 | optom_presc | 3 | .00
4 | total_spend | 4 | .00
5 | spend_per_claim | 5 | .00
6 | doctor_visits | 6 | .01
  • 47. Some conclusions and comments so far: Logistic stepwise did not select NUM_MEMBERS, which shows the lowest relative importance in GB and Trees. More importantly, “claims_per_duration” is deemed most important by the tree methods and disregarded by logistic. Notice that Logistic Regression has no agreed-upon scale of importance; by default, we use odds ratios. CLAIMS_PER_DURATION is deemed the most important single variable for GB and TREE, but logistic deems NO_CLAIMS #1 and OPTOM_PRESC #2 (via odds ratios), where GB differs. The remaining variables have odds ratios of 1, which seems to indicate a similar effect across them, while GB/TREE distinguish relative importance beyond the first two variables.
  • 48. Strongly summarized area for brevity’s sake, added just for completeness.
  • 49. GOF ranks, per GOF measure, in this order: AUROC, Avg Squared Error, Class Rate, Cum Lift 3rd bin, Cum Resp Rate 3rd, Gini, P-R AUC, Precision Rate, R-square Cramer-Tjur.
Model | Ranks | Unw. Mean | Unw. Median
005_M2_TRN_GRAD_BOOSTING | 1 1 2 1 1 1 1 2 1 | 1.22 | 1
007_M2_TRN_LOGISTIC_STEPWISE | 3 3 1 3 3 3 3 3 3 | 2.78 | 3
008_M2_TRN_TREE | 2 2 3 2 2 2 2 1 2 | 2.00 | 2
➔ Gradient Boosting is our champion; we omit the usual ROCs, precision-recall curves, etc.
  • 51. Tree representation(s) up to 4 levels, Model ‘M2_TRN_LG’ (intermediate prediction in parentheses; level-4 prediction after ‘→’). 7 Vars: CLAIMS_PER_DURATION, DOCTOR_VISITS, MEMBER_DURATION, NO_CLAIMS, OPTOM_PRESC, SPEND_PER_CLAIM, TOTAL_SPEND.
CLAIMS_PER_DURATION < 0.00791 (0.153)
  MEMBER_DURATION < 155.5 (0.213)
    OPTOM_PRESC < 4.5 (0.197)
      OPTOM_PRESC < 1.5 → 0.178
      OPTOM_PRESC >= 1.5 → 0.258
    OPTOM_PRESC >= 4.5 (0.514)
      OPTOM_PRESC < 6.5 → 0.399
      OPTOM_PRESC >= 6.5 → 0.622
  MEMBER_DURATION >= 155.5 (0.113)
    CLAIMS_PER_DURATION < 0.00376 (0.099)
      OPTOM_PRESC >= 3.5 → 0.204
      OPTOM_PRESC < 3.5 → 0.093
    CLAIMS_PER_DURATION >= 0.00376 (0.262)
      OPTOM_PRESC < 2.5 → 0.235
      OPTOM_PRESC >= 2.5 → 0.390
CLAIMS_PER_DURATION >= 0.00791 (0.572)
  CLAIMS_PER_DURATION < 0.017 (0.469)
    OPTOM_PRESC < 2.5 (0.421)
      CLAIMS_PER_DURATION >= 0.01272 → 0.496
      CLAIMS_PER_DURATION < 0.01272 → 0.386
    OPTOM_PRESC >= 2.5 (0.61)
      OPTOM_PRESC >= 6.5 → 0.800
      OPTOM_PRESC < 6.5 → 0.571
  CLAIMS_PER_DURATION >= 0.017 (0.755)
    NO_CLAIMS < 3.5 (0.652)
      OPTOM_PRESC >= 4.5 → 0.845
      OPTOM_PRESC < 4.5 → 0.633
    NO_CLAIMS >= 3.5 (0.859)
      NO_CLAIMS < 5.5 → 0.796
      NO_CLAIMS >= 5.5 → 0.938
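To show how the printed rules read, here is a sketch that hard-codes only the first two levels of the LG surrogate tree above (split points and intermediate probabilities taken from the slide; the function name is mine). Deeper levels would nest the same way:

```python
def lg_surrogate_depth2(claims_per_duration, member_duration):
    """First two levels of the LG surrogate tree; returns the intermediate
    probability printed at the corresponding level-2 node."""
    if claims_per_duration < 0.00791:
        # Low claims-per-duration branch: membership duration decides next.
        return 0.213 if member_duration < 155.5 else 0.113
    # High claims-per-duration branch: a second split on the same variable.
    return 0.469 if claims_per_duration < 0.017 else 0.755

# A short-tenure, low-claims member falls in the (0.213) node:
print(lg_surrogate_depth2(0.001, 100.0))  # 0.213
```

This is only a reading aid for the table, not the surrogate-fitting procedure itself.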
  • 52. Tree representation(s) up to 4 levels, Model ‘M2_TRN_GB’ (intermediate prediction in parentheses; level-4 prediction after ‘→’). 7 Vars: CLAIMS_PER_DURATION, DOCTOR_VISITS, MEMBER_DURATION, NO_CLAIMS, OPTOM_PRESC, SPEND_PER_CLAIM, TOTAL_SPEND.
CLAIMS_PER_DURATION < 0.00583 (0.15)
  TOTAL_SPEND < 4150 (0.583)
    MEMBER_DURATION < 190 (0.686)
      OPTOM_PRESC >= 1.5 → 0.870
      OPTOM_PRESC < 1.5 → 0.630
    MEMBER_DURATION >= 190 (0.25)
      TOTAL_SPEND >= 3400 → 0.151
      TOTAL_SPEND < 3400 → 0.348
  TOTAL_SPEND >= 4150 (0.143)
    OPTOM_PRESC < 3.5 (0.13)
      MEMBER_DURATION < 182.5 → 0.165
      MEMBER_DURATION >= 182.5 → 0.087
    OPTOM_PRESC >= 3.5 (0.329)
      MEMBER_DURATION < 118.5 → 0.556
      MEMBER_DURATION >= 118.5 → 0.234
CLAIMS_PER_DURATION >= 0.00583 (0.527)
  CLAIMS_PER_DURATION < 0.01954 (0.433)
    OPTOM_PRESC < 0.5 (0.246)
      SPEND_PER_CLAIM >= 4016.67 → 0.233
      SPEND_PER_CLAIM < 4016.67 → 0.354
    OPTOM_PRESC >= 0.5 (0.548)
      OPTOM_PRESC >= 3.5 → 0.797
      OPTOM_PRESC < 3.5 → 0.492
  CLAIMS_PER_DURATION >= 0.01954 (0.803)
    NO_CLAIMS < 4.5 (0.742)
      DOCTOR_VISITS >= 3 → 0.788
      DOCTOR_VISITS < 3 → 0.632
    NO_CLAIMS >= 4.5 (0.91)
      CLAIMS_PER_DURATION < 0.02491 → 0.851
      CLAIMS_PER_DURATION >= 0.02491 → 0.920
  • 53. Curiously, while node numbers don’t mean anything across models, it is obvious that GB and LG share a similar structure despite being very different algorithms. However, these tree representations are just approximations, except in the TREE model’s case.
  • 54. Discussion of the comparison of tree representations between LG and GB. The two methods split initially on Claims_per_duration, but at very different values: 0.00791 (LG) vs. 0.00583 (GB). Remember that the actual logistic regression had dropped Claims_per_duration. The later levels obviously differ, since the initial split is quite different. Therefore, these two models should ‘a priori’ differ in model interpretation.