Linear regression Machine Learning; Mon Apr 21, 2008
Motivation
Motivation Given a newly observed predictor value, what prediction should we make for the target?
Motivation Problem: We want a general way of obtaining a distribution p(x, t) fitted to observed data. If we don't try to interpret the distribution, then any distribution with non-zero value at the data points will do. We will use theory from last week to construct generic approaches to learning distributions from data.
Motivation Problem: We want a general way of obtaining a distribution p(x, t) fitted to observed data. If we don't try to interpret the distribution, then any distribution with non-zero value at the data points will do. We will use theory from last week to construct generic approaches to learning distributions from data. In this lecture: linear (normal/Gaussian) models.
Linear Gaussian Models In a linear Gaussian model, we model p(x, t) as a conditional Gaussian distribution whose x-dependent mean depends linearly on a set of weights w.
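The slide's formula was an image and is not in the extracted text; as a hedged reconstruction, the standard form (following the PRML treatment whose section numbers the deck cites) is

$$ p(t \mid x, \mathbf{w}, \beta) \;=\; \mathcal{N}\bigl(t \mid y(x, \mathbf{w}),\, \beta^{-1}\bigr), \qquad \beta = 1/\sigma^2, $$

with the mean y(x, w) linear in the weights w.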
Example
Example
General linear in input ...or adding a pseudo-input x_0 = 1
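Spelled out (a standard identity, not copied from the slide text): with the pseudo-input, the bias is absorbed into the weight vector,

$$ y(\mathbf{x}, \mathbf{w}) \;=\; w_0 + \sum_{j=1}^{D} w_j x_j \;=\; \mathbf{w}^{\top}\mathbf{x}, \qquad x_0 = 1. $$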
Non-linear in input (but still in weights)
Non-linear in input (but still in weights) But remember that we do not know the “true” underlying function...
Non-linear in input (but still in weights) ...nor the noise around the function...
General linear model Basis functions. Sometimes called “features”.
Examples of basis functions Polynomials Gaussians Sigmoids
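The formulas for these basis functions were images on the slide; the usual parameterizations (an assumption about what was shown) are

$$ \phi_j(x) = x^j, \qquad \phi_j(x) = \exp\!\Bigl(-\tfrac{(x-\mu_j)^2}{2s^2}\Bigr), \qquad \phi_j(x) = \sigma\!\Bigl(\tfrac{x-\mu_j}{s}\Bigr), \quad \sigma(a) = \frac{1}{1+e^{-a}}. $$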
Estimating parameters Log likelihood: Observed data:
Estimating parameters Log likelihood: Maximizing w.r.t. w means minimizing E, the error function. Observed data:
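Since the slide's equations are not in the extracted text, here is the standard form they refer to: for i.i.d. observations (x_n, t_n), n = 1, ..., N,

$$ \ln p(\mathbf{t} \mid \mathbf{w}, \beta) \;=\; \frac{N}{2}\ln\beta \;-\; \frac{N}{2}\ln 2\pi \;-\; \beta\, E(\mathbf{w}), \qquad E(\mathbf{w}) = \frac{1}{2}\sum_{n=1}^{N}\bigl(t_n - \mathbf{w}^{\top}\boldsymbol{\phi}(x_n)\bigr)^2, $$

so maximizing over w is exactly minimizing the sum-of-squares error E(w).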
Estimating parameters
Estimating parameters
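The derivation on these slides is lost to extraction, but the result it leads to is the usual normal-equations solution (stated here in its textbook form, not copied from the slide):

$$ \mathbf{w}_{\mathrm{ML}} \;=\; \bigl(\boldsymbol{\Phi}^{\top}\boldsymbol{\Phi}\bigr)^{-1}\boldsymbol{\Phi}^{\top}\mathbf{t}, \qquad \Phi_{nj} = \phi_j(x_n), $$

where Φ is the N x M design matrix built from the basis functions.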
Estimating parameters Notice:  This is not just pure mathematics but an actual algorithm for estimating (learning) the parameters!
Estimating parameters Notice:  This is not just pure mathematics but an actual algorithm for estimating (learning) the parameters! C with GSL and CBLAS
Estimating parameters Notice: This is not just pure mathematics but an actual algorithm for estimating (learning) the parameters! Octave/Matlab
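The C (GSL/CBLAS) and Octave/Matlab listings shown on these slides did not survive extraction. As a stand-in, here is a minimal sketch in C using GSL's least-squares fitter; the polynomial basis and toy data are my own choices for illustration, not the lecture's.

```c
/* Minimal sketch: maximum-likelihood (least-squares) fit of a general
 * linear model with basis functions 1, x, x^2, using GSL.
 * Build: gcc fit.c -lgsl -lgslcblas -lm
 * (Toy data and basis choice are illustrative, not from the slides.) */
#include <stdio.h>
#include <gsl/gsl_matrix.h>
#include <gsl/gsl_vector.h>
#include <gsl/gsl_multifit.h>

int main(void)
{
    const size_t N = 10;  /* number of observations    */
    const size_t M = 3;   /* number of basis functions */

    gsl_matrix *Phi = gsl_matrix_alloc(N, M);  /* design matrix             */
    gsl_vector *t   = gsl_vector_alloc(N);     /* targets                   */
    gsl_vector *w   = gsl_vector_alloc(M);     /* fitted weights            */
    gsl_matrix *cov = gsl_matrix_alloc(M, M);  /* covariance of the weights */
    double chisq;                              /* residual sum of squares   */

    for (size_t n = 0; n < N; n++) {
        double x = (double) n / (N - 1);
        gsl_matrix_set(Phi, n, 0, 1.0);        /* phi_0(x) = 1 (pseudo-input) */
        gsl_matrix_set(Phi, n, 1, x);          /* phi_1(x) = x                */
        gsl_matrix_set(Phi, n, 2, x * x);      /* phi_2(x) = x^2              */
        gsl_vector_set(t, n, 1.0 + 2.0 * x);   /* noise-free toy targets      */
    }

    /* Least-squares fit; the solution equals the normal-equations
     * result w = (Phi^T Phi)^-1 Phi^T t. */
    gsl_multifit_linear_workspace *work = gsl_multifit_linear_alloc(N, M);
    gsl_multifit_linear(Phi, t, w, cov, &chisq, work);

    for (size_t j = 0; j < M; j++)
        printf("w[%zu] = %g\n", j, gsl_vector_get(w, j));

    gsl_multifit_linear_free(work);
    gsl_matrix_free(Phi);  gsl_vector_free(t);
    gsl_vector_free(w);    gsl_matrix_free(cov);
    return 0;
}
```

gsl_multifit_linear computes the same least-squares solution as the normal equations above (internally via an SVD), which is why linking against CBLAS is needed.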
Geometrical interpretation Geometrically  y  is the projection of  t  onto the space spanned by the features:
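In symbols (using the design matrix introduced above), the vector of fitted values is

$$ \mathbf{y} \;=\; \boldsymbol{\Phi}\,\mathbf{w}_{\mathrm{ML}} \;=\; \boldsymbol{\Phi}\bigl(\boldsymbol{\Phi}^{\top}\boldsymbol{\Phi}\bigr)^{-1}\boldsymbol{\Phi}^{\top}\mathbf{t}, $$

i.e. the orthogonal projection of t onto the column space of Φ.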
Bayesian linear regression For the Bayesian approach we need a prior over the parameters w and β = 1/σ². The conjugate prior for a Gaussian is Gaussian: (the posterior parameters are functions of the observed values)
Bayesian linear regression For the Bayesian approach we need a prior over the parameters w and β = 1/σ². The conjugate prior for a Gaussian is Gaussian: The proof is not exactly like before, but similar, and uses the linear-Gaussian results from section 2.3.3.
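The posterior itself was given as an image. For the commonly presented simplification in which β is treated as known and the prior is a zero-mean isotropic Gaussian p(w) = N(w | 0, α⁻¹ I) (an assumption about the slide's setup), the posterior is

$$ p(\mathbf{w} \mid \mathbf{t}) = \mathcal{N}(\mathbf{w} \mid \mathbf{m}_N, \mathbf{S}_N), \qquad \mathbf{m}_N = \beta\,\mathbf{S}_N\boldsymbol{\Phi}^{\top}\mathbf{t}, \qquad \mathbf{S}_N^{-1} = \alpha\mathbf{I} + \beta\,\boldsymbol{\Phi}^{\top}\boldsymbol{\Phi}, $$

and both m_N and S_N are indeed functions of the observed values.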
Example
Bayesian linear regression The predictive distribution for future observations is also Gaussian (again a result from 2.3.3):
Bayesian linear regression The predictive distribution for future observations is also Gaussian (again a result from 2.3.3):
Bayesian linear regression The predictive distribution for future observations is also Gaussian (again a result from 2.3.3): Both the mean and the variance of this distribution depend on the new input!
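For reference, the standard form of this predictive distribution (under the same assumptions as above) is

$$ p(t \mid x, \mathbf{t}) = \mathcal{N}\bigl(t \mid \mathbf{m}_N^{\top}\boldsymbol{\phi}(x),\ \sigma_N^2(x)\bigr), \qquad \sigma_N^2(x) = \frac{1}{\beta} + \boldsymbol{\phi}(x)^{\top}\mathbf{S}_N\,\boldsymbol{\phi}(x), $$

so both the mean and the variance vary with the new input x.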
Example
Over-fitting Problem: Over-fitting is always a problem when we fit generic models to data. With nested models, the ML parameters will never prefer a simpler model over a more complex one...
Maximum likelihood problems
Bayesian model selection We can take a more Bayesian approach and select a model based on posterior model probabilities:
Bayesian model selection We can take a more Bayesian approach and select a model based on posterior model probabilities: The normalizing factor is the same for all models:
Bayesian model selection We can take a more Bayesian approach and select a model based on posterior model probabilities: The prior captures our preferences among the models.
Bayesian model selection We can take a more Bayesian approach and select a model based on posterior model probabilities: The likelihood captures the data's preferences among the models.
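In symbols (the standard form of the formula these slides display):

$$ p(\mathcal{M}_i \mid \mathcal{D}) \;=\; \frac{p(\mathcal{M}_i)\, p(\mathcal{D} \mid \mathcal{M}_i)}{\sum_j p(\mathcal{M}_j)\, p(\mathcal{D} \mid \mathcal{M}_j)} \;\propto\; p(\mathcal{M}_i)\, p(\mathcal{D} \mid \mathcal{M}_i), $$

where the denominator is the normalizing factor shared by all models.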
The marginal likelihood The likelihood of the model is the integral over all the model's parameters:
The marginal likelihood The likelihood of the model is the integral over all the model's parameters: which is also the normalizing factor for the posterior:
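Written out (again reconstructing formulas that were images on the slides):

$$ p(\mathcal{D} \mid \mathcal{M}_i) = \int p(\mathcal{D} \mid \mathbf{w}, \mathcal{M}_i)\, p(\mathbf{w} \mid \mathcal{M}_i)\, d\mathbf{w}, \qquad p(\mathbf{w} \mid \mathcal{D}, \mathcal{M}_i) = \frac{p(\mathcal{D} \mid \mathbf{w}, \mathcal{M}_i)\, p(\mathbf{w} \mid \mathcal{M}_i)}{p(\mathcal{D} \mid \mathcal{M}_i)}. $$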
Implicit over-fitting penalty Assume this is the shape of the prior and the posterior (figure: a broad prior and a narrower posterior over w).
Implicit over-fitting penalty Assume this is the shape of the prior and the posterior. By proportionality, the posterior has the same shape as p(D | w) p(w) (figure labels: prior, p(D | w) p(w)).
Implicit over-fitting penalty Assume this is the shape of the prior and the posterior. By proportionality, the marginal-likelihood integral of p(D | w) p(w) is approximately “width” times “height”.
Implicit over-fitting penalty The penalty term is increasingly negative as the posterior becomes “pointy” compared to the prior. Close fitting to data is implicitly penalized, and the marginal likelihood is a trade-off between maximizing the posterior and minimizing this penalty.
Implicit over-fitting penalty The penalty increases with the number of parameters M. Close fitting to data is implicitly penalized, and the marginal likelihood is a trade-off between maximizing the posterior and minimizing this penalty.
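The picture on these slides corresponds to the usual rough argument (my reconstruction of the sketch): approximate the prior as flat with width Δw_prior and the posterior as concentrated in a region of width Δw_posterior around w_MAP. Then

$$ p(\mathcal{D}) \;\simeq\; p(\mathcal{D} \mid w_{\mathrm{MAP}})\, \frac{\Delta w_{\mathrm{posterior}}}{\Delta w_{\mathrm{prior}}}, \qquad \ln p(\mathcal{D}) \;\simeq\; \ln p(\mathcal{D} \mid \mathbf{w}_{\mathrm{MAP}}) + M \ln\frac{\Delta w_{\mathrm{posterior}}}{\Delta w_{\mathrm{prior}}}, $$

where the first form is for a single parameter and the second assumes M parameters with comparable width ratios. The log-ratio term is negative, and its magnitude grows as the posterior narrows relative to the prior and as M grows; this is the implicit over-fitting penalty.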
On average we prefer the true model The penalty increases with the number of parameters M. This doesn't mean we always prefer the simplest model! One can show that the expected log ratio of the two marginal likelihoods is non-negative (equation after these slides), with zero only when the two models assign the data the same marginal likelihood, i.e. on average the right model is the preferred model.
On average we prefer the true model The penalty increases with the number of parameters M. This doesn't mean we always prefer the simplest model! One can show that the expected log ratio of the two marginal likelihoods is non-negative (equation after these slides), with zero only when the two models assign the data the same marginal likelihood, i.e. on average the right model is the preferred model. The log ratio is negative when we prefer the second model and positive when we prefer the first.
On average we prefer the true model The penalty increases with the number of parameters M. This doesn't mean we always prefer the simplest model! One can show that the expected log ratio of the two marginal likelihoods is non-negative (equation after these slides), with zero only when the two models assign the data the same marginal likelihood, i.e. on average the right model is the preferred model. On average, we will not prefer the second model when the first is true...
On average we prefer the true model The penalty increases with the number of parameters M. This doesn't mean we always prefer the simplest model! One can show that the expected log ratio of the two marginal likelihoods is non-negative (equation after these slides), with zero only when the two models assign the data the same marginal likelihood, i.e. on average the right model is the preferred model. Close fitting to data is implicitly penalized, and the marginal likelihood is a trade-off between maximizing the posterior and minimizing this penalty.
On average we prefer the true model
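The inequality referred to above is, in its textbook form (the slide's own equation was an image): averaging the log Bayes factor over data sets drawn from the first model,

$$ \int p(\mathcal{D} \mid \mathcal{M}_1)\, \ln\frac{p(\mathcal{D} \mid \mathcal{M}_1)}{p(\mathcal{D} \mid \mathcal{M}_2)}\, d\mathcal{D} \;\ge\; 0, $$

with equality only when p(D | M₂) = p(D | M₁). The integral is a Kullback-Leibler divergence, so when the first model is the true one it achieves at least as high a marginal likelihood on average.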
Summary Linear Gaussians as generic densities ML or Bayesian estimation for training Over-fitting is an inherent problem in ML estimation Bayesian methods avoid the over-fitting caused by maximization (but are still vulnerable to model mis-specification)
