Interpretability in ML & Sparse Linear Models
Unchitta Kanjanasaratool
Mentor: Denali Molitor
Department of Mathematics unchitta@ucla.edu
Interpretability in ML: Why It’s Important

Definition (non-mathematical): interpretability is the degree to which a human can consistently explain or interpret why a model makes certain decisions.

It has several components: model transparency, holistic model interpretability, modular-level interpretability, and local interpretability for a single prediction or a group of predictions.

Doshi-Velez & Kim (2017) propose three levels at which to evaluate interpretability:
1. Application level
2. Human level
3. Function level
Interpretability in ML: Why It’s Important

Interpretability helps explain why a model makes certain predictions.

We may not always need interpretability, e.g. in low-risk or extensively studied problems, but knowing the why can help us understand how and why a model might fail.

Transparency & ethics: the EU, for example, mandates that automated decisions must be explainable and must respect fundamental rights.

Interpretability also satisfies human curiosity and supports learning.
Interpretable Models: Linear Regression

Assume a linear relationship between the input and response variables,

Y = β₀ + β₁X₁ + … + βₚXₚ + ϵ.

We can then predict

ŷ = β̂₀ + β̂₁X₁ + … + β̂ₚXₚ,

where the β̂’s are the coefficients that solve the residual sum of squares (RSS) minimization problem over the n observations, namely

β̂ = arg min over β₀, …, βₚ of Σᵢ₌₁ⁿ (yᵢ − ŷᵢ)².

The linear coefficients (the β’s) make this model easy to interpret on a modular level: they let us see how much influence each variable has on the prediction.
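The fit above can be sketched with an off-the-shelf least-squares solve; a minimal example on hypothetical synthetic data (numpy assumed):

```python
import numpy as np

# Hypothetical data: n = 100 observations, p = 3 features.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
beta_true = np.array([2.0, 0.0, -1.5])
y = 1.0 + X @ beta_true + 0.1 * rng.normal(size=100)

# Prepend a column of ones so beta_hat[0] plays the role of the intercept β0.
X1 = np.column_stack([np.ones(len(X)), X])

# Least squares: beta_hat minimizes the RSS ||y - X1 @ beta||^2.
beta_hat, *_ = np.linalg.lstsq(X1, y, rcond=None)
y_hat = X1 @ beta_hat
rss = np.sum((y - y_hat) ** 2)
```

Each entry of `beta_hat` is then read off directly as the estimated influence of the corresponding variable.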
Sparse Models & Lasso Regression

In practice, however, interpreting linear models can still be hard if there are too many variables. Certain variations of linear regression can help with this problem, such as sparse models. An example is the “least absolute shrinkage and selection operator,” or lasso, which minimizes

RSS subject to Σⱼ₌₁ᵖ |βⱼ| ≤ s.

This is equivalent to minimizing the regularized quantity

RSS + λ Σⱼ₌₁ᵖ |βⱼ|,

where the second term is a constant λ times the L1 norm of the coefficients. (λ is normally chosen by cross-validation to minimize the estimated prediction error.)
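One standard way to minimize the penalized form is cyclic coordinate descent with a soft-thresholding update. A minimal numpy sketch, under assumptions not in the slides (hypothetical data, no intercept, objective 0.5·RSS + λ‖β‖₁, fixed iteration count):

```python
import numpy as np

def lasso_cd(X, y, lam, n_iter=200):
    """Cyclic coordinate descent for 0.5 * RSS + lam * sum(|beta_j|).
    A sketch: assumes centered data and no intercept."""
    n_samples, n_features = X.shape
    beta = np.zeros(n_features)
    col_norms = np.sum(X ** 2, axis=0)
    for _ in range(n_iter):
        for j in range(n_features):
            # Partial residual with feature j's current contribution removed.
            r = y - X @ beta + X[:, j] * beta[j]
            rho = X[:, j] @ r
            # Soft-thresholding update: if |rho| <= lam, beta_j is set exactly to 0.
            beta[j] = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_norms[j]
    return beta

# Hypothetical data where only features 0 and 3 carry signal.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = X @ np.array([3.0, 0.0, 0.0, -2.0, 0.0]) + 0.1 * rng.normal(size=100)
beta = lasso_cd(X, y, lam=10.0)
```

With a suitable λ, the coefficients of the irrelevant features are driven to 0 while the relevant ones stay close to their true values (slightly shrunk by the penalty).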
Sparse Models & Lasso Regression

Intuitively, lasso penalizes large coefficients and shrinks them toward zero. Less intuitively (but we will see why), it also yields a model in which a number of the coefficients are exactly 0. This makes it a sparse regression model: it penalizes large models (i.e. lots of features) and performs variable selection.

The penalization is controlled via λ: the larger it is, the sparser the model. As λ approaches 0, lasso yields the same results as the least squares estimates (standard linear regression).
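The effect of λ is easiest to see in the special case of an orthonormal design, where the lasso solution has a known closed form: soft-threshold each least squares coefficient. A small sketch (the OLS values are hypothetical):

```python
import numpy as np

def soft_threshold(z, lam):
    """Per-coefficient lasso solution when the design matrix has orthonormal
    columns: shrink each least squares estimate toward 0, and set any whose
    magnitude falls below lam exactly to 0."""
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)

ols = np.array([3.0, -0.4, 1.2, 0.1, -2.5])  # hypothetical OLS estimates
results = {lam: soft_threshold(ols, lam) for lam in (0.0, 0.5, 1.5)}
for lam, b in results.items():
    print(lam, b, "nonzeros:", np.count_nonzero(b))
```

At λ = 0 the output equals the least squares estimates; as λ grows, more and more coefficients become exactly 0.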
Sparsity & Interpretability

Having fewer variables often means better interpretability. The model is explained by only a small number of significant features, which reduces complexity and increases explainability. This is especially important when the data has hundreds or thousands of features; the full model’s complexity may be beyond human comprehension, and there may not be enough observations to fit it reliably.

Other methods for introducing sparsity in linear models include feature selection procedures such as subset selection, step-wise procedures (e.g. forward and backward selection), and sparse PCA.
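Forward step-wise selection, one of the procedures mentioned above, can be sketched as a greedy loop: repeatedly add the feature that most reduces the RSS. A minimal illustration on hypothetical data (no intercept; the number of features k is fixed here rather than chosen by validation):

```python
import numpy as np

def forward_selection(X, y, k):
    """Greedy forward step-wise selection: at each step, add the feature
    that most reduces the RSS, stopping after k features."""
    selected = []
    for _ in range(k):
        best_j, best_rss = None, np.inf
        for j in range(X.shape[1]):
            if j in selected:
                continue
            cols = selected + [j]
            # Least squares fit on the candidate feature subset.
            beta, *_ = np.linalg.lstsq(X[:, cols], y, rcond=None)
            rss = np.sum((y - X[:, cols] @ beta) ** 2)
            if rss < best_rss:
                best_j, best_rss = j, rss
        selected.append(best_j)
    return selected

# Hypothetical data where only features 0 and 3 carry signal.
rng = np.random.default_rng(1)
X = rng.normal(size=(100, 8))
y = X @ np.array([3.0, 0, 0, -2.0, 0, 0, 0, 0]) + 0.1 * rng.normal(size=100)
```

Unlike lasso, which shrinks and selects jointly via the penalty, this procedure makes hard include/exclude decisions one feature at a time.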
Sparse Property of Lasso

The variable selection property of lasso comes from the L1 regularizer. Consider the figure for a 2D problem. The red ellipses are the contours of the RSS, and the solid green areas are the constraint regions: |β₁| + |β₂| ≤ s for lasso (left) and β₁² + β₂² ≤ s for ridge regression (right). Because the lasso region has corners on the axes, the expanding RSS contours often first touch it at a corner, where one of the coefficients is exactly 0; the ridge disk has no corners, so its solutions are rarely exactly 0. In higher dimensions the lasso constraint region becomes a polytope (a shape with flat sides and sharp corners).

[Figure courtesy of An Introduction to Statistical Learning, G. James et al.]
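This corner-touching geometry can be checked numerically with a toy 2D objective. The quadratic below stands in for the RSS contours, with a hypothetical unconstrained optimum at (2, 0.5) and constraint budget s = 1; a brute-force grid finds the constrained minimizer:

```python
import numpy as np

# Minimize a quadratic centered at (2, 0.5) over the lasso diamond |b1| + |b2| <= 1.
b1, b2 = np.meshgrid(np.linspace(-1, 1, 401), np.linspace(-1, 1, 401))
obj = (b1 - 2.0) ** 2 + (b2 - 0.5) ** 2
# Small tolerance so grid points sitting on the boundary count as feasible.
feasible = np.abs(b1) + np.abs(b2) <= 1.0 + 1e-9
obj = np.where(feasible, obj, np.inf)
i = np.unravel_index(np.argmin(obj), obj.shape)
best = (b1[i], b2[i])
```

The minimizer lands on the corner (1, 0): even though the unconstrained optimum has b2 = 0.5, the L1 constraint drives b2 exactly to 0.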
Example: UCI Bike Sharing Dataset

Here we try to predict the number of bike rentals as a function of 11 variables. As λ increases, the response variable depends on a smaller number of predictors.
Example: UCI Bike Sharing Dataset

[Figure annotation: likely the most significant features.]
Resources

An Introduction to Statistical Learning with Applications in R, Gareth James et al.
Interpretable Machine Learning: A Guide for Making Black Box Models Explainable, Christoph Molnar