Advanced Regression
and Model Selection
UpGrad Live Session - Ankit Jain
Model Selection Techniques
● If you are looking for a good place to start when choosing a machine learning algorithm for your dataset, here are some general guidelines.
● How large is your training set?
○ Small: prefer high-bias/low-variance classifiers (e.g. Naive Bayes) over low-bias/high-variance classifiers (e.g. KNN) to avoid overfitting (illustrated in the sketch below).
○ Large: low-bias/high-variance classifiers tend to produce more accurate models.
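To make the size heuristic concrete, here is a minimal sketch in R (my illustration, not from the deck; it assumes the e1071 and class packages and uses the built-in iris data, and the outcome varies with the seed):

library(e1071)
library(class)

set.seed(42)
idx   <- sample(nrow(iris), 15)        # deliberately tiny training set
train <- iris[idx, ]
test  <- iris[-idx, ]

# High-bias/low-variance: Naive Bayes
nb_acc <- mean(predict(naiveBayes(Species ~ ., data = train), test)
               == test$Species)

# Low-bias/high-variance: 1-nearest-neighbour
knn_acc <- mean(knn(train[, 1:4], test[, 1:4],
                    cl = train$Species, k = 1) == test$Species)

c(naive_bayes = nb_acc, knn_k1 = knn_acc)

With so few rows, the simpler, more biased model is often the steadier of the two.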
Adv/Disadv of Various Algorithms
● Naive Bayes:
○ Very simple to implement, as it's essentially just a bunch of counts.
○ If conditional independence holds, it converges faster than, say, Logistic Regression, and thus requires less training data.
○ If you want something fast and easy that performs well, NB is a good choice.
○ Biggest disadvantage is that it can't learn interactions between features.
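To see the "bunch of counts" point, a quick sketch with the e1071 package (one common NB implementation; the package choice is mine, not the deck's):

library(e1071)

nb <- naiveBayes(Species ~ ., data = iris)
nb$apriori               # class frequencies, i.e. the prior counts
nb$tables$Petal.Length   # per-class mean and sd for one feature

The fitted model is nothing more than these per-class tables; prediction just multiplies them together under the independence assumption.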
Adv/Disadv of Various Algorithms
● Logistic Regression:
○ Lots of ways to regularize the model, and no need to worry about correlated features the way you do in Naive Bayes.
○ Nice probabilistic interpretation, which is helpful in problems like churn prediction.
○ Online algorithm: easy to update the model with new data (using an online gradient descent method).
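A minimal sketch of the probabilistic interpretation using base R's glm() (my toy example: the transmission column of mtcars stands in for a churn-style binary target):

# Logistic regression via glm with a binomial family
fit <- glm(am ~ mpg + wt, data = mtcars, family = binomial)

# type = "response" returns P(am = 1), one probability per observation
head(predict(fit, type = "response"))

For the regularized variants mentioned above, glmnet(..., family = "binomial") adds the L1/L2 penalties discussed later in the deck.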
Adv/Disadv of Various Algorithms
● Decision Trees:
○ Easy to explain and interpret (at least for some people).
○ Easily handle feature interactions.
○ No need to worry about outliers or whether the data is linearly separable.
○ Don't support online learning, so rebuilding the model every time new data arrives can be painful.
○ Tend to overfit easily. Solution: ensemble methods such as Random Forests.
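A minimal sketch of both points: a single interpretable tree, then a Random Forest to curb the overfitting (assumes the rpart and randomForest packages; the dataset choice is mine):

library(rpart)
library(randomForest)

# One tree: the printed splits are easy to read out and explain
tree <- rpart(Species ~ ., data = iris)
print(tree)

# An ensemble of 200 trees: averaging reduces variance and tames overfitting
set.seed(7)
rf <- randomForest(Species ~ ., data = iris, ntree = 200)
rf$err.rate[200, "OOB"]   # out-of-bag error of the full ensemble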
Adv/Disadv of Various Algorithms
● SVM:
○ High accuracy on many datasets.
○ With an appropriate kernel, can work well even if the data isn't linearly separable in the base feature space.
○ Popular in text-processing applications, where high dimensionality is the norm.
○ Memory intensive, hard to interpret, and kind of annoying to run and tune.
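A minimal sketch with e1071 (an assumption on my part; any libsvm wrapper works): the RBF kernel handles non-linear boundaries, and tune.svm() grid-searches the two fussy hyperparameters, cost and gamma:

library(e1071)

set.seed(3)
fit <- svm(Species ~ ., data = iris, kernel = "radial")
mean(predict(fit, iris) == iris$Species)   # training accuracy

# The tuning grind: grid search over cost and gamma
tuned <- tune.svm(Species ~ ., data = iris,
                  gamma = 10^(-2:1), cost = 10^(0:2))
tuned$best.parameters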
ADVANCED REGRESSION
Linear Regression Issues
● Sensitivity to outliers.
● Multicollinearity leads to high variance of the estimator.
● Prone to overfitting when there are a lot of variables.
● Hard to interpret when the number of predictors is large; we need a smaller subset that exhibits the strongest effects.
Regularization Techniques
● Regularization techniques typically work by penalizing the magnitude of the feature coefficients while also minimizing the error between predicted and actual observations.
● Different types of penalization:
○ Ridge Regression: penalizes the squared coefficients.
○ Lasso Regression: penalizes the absolute values of the coefficients.
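In standard notation (my rendering, consistent with the bullets above), the two penalized least-squares objectives are:

\hat{\beta}^{\text{ridge}} = \arg\min_{\beta} \; \lVert y - X\beta \rVert_2^2 + \lambda \sum_{j=1}^{p} \beta_j^2

\hat{\beta}^{\text{lasso}} = \arg\min_{\beta} \; \lVert y - X\beta \rVert_2^2 + \lambda \sum_{j=1}^{p} \lvert \beta_j \rvert

Larger lambda means heavier shrinkage; lambda = 0 recovers ordinary least squares.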
Why penalize on model coefficients?
● Model 1: y = beta0 + beta1*x, with fitted beta1 = -0.58
● Model 2: y = beta0 + beta1*x + … + beta10*x^10, with fitted beta1 = -1.4e05
● Fitting the same data with a high-order polynomial drives the coefficients to extreme magnitudes; penalizing coefficient size reins this in.
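A minimal sketch reproducing the effect in R (synthetic data of my choosing; the exact magnitudes depend on the seed, but the blow-up is typical):

set.seed(10)
x <- seq(0, 1, length.out = 30)
y <- 1 - 0.6 * x + rnorm(30, sd = 0.1)

m1  <- lm(y ~ x)                         # Model 1: straight line
m10 <- lm(y ~ poly(x, 10, raw = TRUE))   # Model 2: degree-10 polynomial

coef(m1)["x"]      # a modest slope near the true -0.6
range(coef(m10))   # raw polynomial coefficients explode in magnitude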
Ridge Regression
● L2 penalty
● Pros
○ Works when variables >> rows
○ Handles multicollinearity
○ Increased bias and lower variance compared to Linear Regression
● Cons
○ Doesn't produce a parsimonious model
Let's see a collinearity example in R (one possible version is sketched below).
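The slides don't include the code itself, so this is my reconstruction of such a collinearity example (assumes the glmnet package):

library(glmnet)

set.seed(1)
n  <- 50
x1 <- rnorm(n)
x2 <- x1 + rnorm(n, sd = 0.01)   # x2 is nearly a copy of x1
y  <- 3 * x1 + rnorm(n)

# OLS: with near-duplicate predictors, the two coefficients are
# unstable -- often huge and of opposite sign.
coef(lm(y ~ x1 + x2))

# Ridge (alpha = 0 in glmnet): the shared effect is split into two
# similar, stable coefficients.
fit <- cv.glmnet(cbind(x1, x2), y, alpha = 0)
coef(fit, s = "lambda.min")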
Example: Leukemia Prediction
● Leukemia data, Golub et al., Science, 1999
● There are 38 training samples and 34 test samples, with ~7000 genes in total (p >> n)
● Xij is the gene expression value for sample i and gene j
● Sample i has either tumor type AML or ALL
● We want to select genes relevant to tumor type:
○ eliminate the trivial genes
○ do grouped selection, as many genes are highly correlated
● Ridge Regression can help with this modeling task
Grouped Selection
● If two predictors are highly correlated, their estimated coefficients will be similar.
● If some variables are exactly identical, they will have the same coefficients.
Ridge is good for grouped selection but not good for eliminating trivial genes.
LASSO
● Pros
○ Allows p >> n
○ Enforces sparsity in the parameters
● Cons
○ If a group of predictors is highly correlated, LASSO tends to pick only one of them and shrink the others to zero
○ Cannot do grouped selection; tends to select one variable per group
LASSO is good for eliminating trivial genes but not good for grouped selection.
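A minimal sketch of both behaviours with glmnet (alpha = 1 gives the LASSO; the simulated data are mine):

library(glmnet)

set.seed(2)
n <- 100
z <- rnorm(n)
X <- cbind(x1 = z + rnorm(n, sd = 0.05),   # x1-x3: a highly correlated group
           x2 = z + rnorm(n, sd = 0.05),
           x3 = z + rnorm(n, sd = 0.05),
           x4 = rnorm(n))                  # x4: a trivial predictor
y <- 2 * z + rnorm(n)

fit <- cv.glmnet(X, y, alpha = 1)
# Sparse fit: x4 is dropped, and typically only one of x1-x3 survives.
coef(fit, s = "lambda.1se")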
Elastic Net
● Weighted combination of the L1 and L2 penalties
● Helps enforce sparsity
● Encourages a grouping effect among highly correlated predictors
In the gene selection problem, it can achieve both goals: removing trivial genes and doing grouped selection.
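A minimal sketch mirroring the LASSO setup above (in glmnet, 0 < alpha < 1 blends the two penalties; alpha = 0.5 is an arbitrary midpoint of mine):

library(glmnet)

set.seed(2)
n <- 100
z <- rnorm(n)
X <- cbind(x1 = z + rnorm(n, sd = 0.05),   # the same correlated trio
           x2 = z + rnorm(n, sd = 0.05),
           x3 = z + rnorm(n, sd = 0.05),
           x4 = rnorm(n))                  # and the same trivial predictor
y <- 2 * z + rnorm(n)

fit_en <- cv.glmnet(X, y, alpha = 0.5)
# Expect x1-x3 retained with similar coefficients (grouping) while
# the trivial x4 is shrunk to zero (sparsity).
coef(fit_en, s = "lambda.1se")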
Other Advanced Regression Methods
Poisson Regression
○ Typically used when the Y variable follows a Poisson distribution (typically counts of events within a time window t)
○ Example: the number of times a customer will visit an e-commerce website next month
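A minimal sketch with base R's glm() (simulated visit counts of my own; a real analysis would use observed data):

set.seed(4)
n      <- 200
tenure <- runif(n, 0, 5)                         # years as a customer
visits <- rpois(n, lambda = exp(0.5 + 0.3 * tenure))

fit <- glm(visits ~ tenure, family = poisson)

# Coefficients are on the log scale; exponentiating gives
# multiplicative effects on the expected count.
exp(coef(fit))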
Piecewise Linear Regression
● Polynomial regression won't work perfectly here, as it has a strong tendency to overfit or underfit.
● Instead, splitting the curve into separate linear pieces and building a linear model for each piece leads to better results.
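A minimal sketch in base R (toy data with a single knot at x = 5, both my choices): a hinge term lets the slope change at the knot while keeping the fitted line continuous:

set.seed(5)
x <- runif(200, 0, 10)
y <- ifelse(x < 5, 2 * x, 10 - 1.5 * (x - 5)) + rnorm(200, sd = 0.5)

# pmax(x - 5, 0) is zero below the knot and (x - 5) above it
fit <- lm(y ~ x + pmax(x - 5, 0))
coef(fit)   # slope below the knot, plus the change in slope above it

In practice the knot location can itself be estimated from the data (for example with the segmented package).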
QUESTIONS
