Module 2
Regression
• In machine learning, a regression problem is the problem of
predicting the value of a numeric variable from observed values
of the input variables.
• The value of the output variable is a number, such as an integer
or a floating-point value.
• These outputs are often quantities, such as amounts and sizes. The
input variables may be discrete or real-valued.
• Different regression models
• There are various types of regression techniques available to make
predictions.
• These techniques differ mainly in three aspects: the number
and type of independent variables, the type of the dependent variable,
and the shape of the regression line.
• Simple linear regression: There is only one continuous independent
variable x, and the assumed relation between the independent variable
and the dependent variable y is y = a + bx.
• Multivariate linear regression: There is more than one independent
variable, say x1, . . . , xn, and the assumed relation between the
independent variables and the dependent variable is y = a0 + a1x1 + ⋯
+ anxn.
• Polynomial regression: There is only one continuous independent
variable x and the assumed model is y = a0 + a1x + ⋯ + anx^n.
• Logistic regression: The dependent variable is binary, that is, a
variable which takes only the values 0 and 1. The assumed model
involves certain probability distributions; typically the probability
that y = 1 is modeled as a logistic (sigmoid) function of the inputs.
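As a rough sketch of these four models in code (not part of the original slides; the data and coefficient values below are invented for illustration), each can be fit in a few lines with scikit-learn:

import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=(100, 1))

# Simple linear regression: y = a + b*x
y_lin = 2.0 + 3.0 * x[:, 0] + rng.normal(0, 1, 100)
simple = LinearRegression().fit(x, y_lin)

# Multivariate linear regression: y = a0 + a1*x1 + a2*x2
X = rng.uniform(0, 10, size=(100, 2))
y_multi = 1.0 + 2.0 * X[:, 0] - 0.5 * X[:, 1] + rng.normal(0, 1, 100)
multi = LinearRegression().fit(X, y_multi)

# Polynomial regression: y = a0 + a1*x + a2*x^2 (still linear in the a's)
x_poly = PolynomialFeatures(degree=2, include_bias=False).fit_transform(x)
poly = LinearRegression().fit(x_poly, 1.0 + 0.5 * x[:, 0] ** 2 + rng.normal(0, 1, 100))

# Logistic regression: y is binary (0 or 1)
y_bin = (x[:, 0] + rng.normal(0, 1, 100) > 5).astype(int)
logit = LogisticRegression().fit(x, y_bin)

print(simple.intercept_, simple.coef_)   # recovered estimates of a and b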
Errors in Machine Learning
•Irreducible errors are errors that will always be present in a machine learning model because of
unknown variables; their values cannot be reduced.
•Reducible errors are errors whose values can be further reduced to improve a model. They arise
because our model’s output function does not match the desired output function, and they can be
optimized away.
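A standard way to make this split precise (the decomposition is not stated on the slide, but it is the usual formalization) is to write the expected squared error of a prediction ŷ at a point as
E[(y − ŷ)^2] = Bias(ŷ)^2 + Var(ŷ) + σ^2,
where σ^2 is the variance of the irreducible noise. The first two terms make up the reducible error, and the next sections examine them one at a time.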
What is Bias?
• To make predictions, our model analyzes our data and finds patterns in it.
Using these patterns, we can make generalizations about certain instances in
our data. After training, the model has learned these patterns and applies
them to the test set to make predictions.
• Bias is the difference between the average prediction of our model and the
correct values. Equivalently, bias reflects the simplifying assumptions that
our model makes about the data in order to be able to predict new data.
• When the bias is high, the assumptions made by our model are too simple, and
the model can’t capture the important features of our data.
• This means that our model hasn’t captured the patterns in the training data,
and hence it cannot perform well on the testing data either.
• If this is the case, our model cannot perform on new data and cannot be sent
into production.
• This situation, where the model cannot find patterns in our training set and
hence fails on both seen and unseen data, is called Underfitting.
• The figure below shows an example of underfitting. As we can see, the model
has found no patterns in our data: the line of best fit is a straight line
that does not pass through any of the data points. The model has failed to
train properly on the given data and cannot predict new data either.
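A minimal sketch of underfitting in code (an assumed example, not from the slides): fitting a straight line to data generated from a clearly non-linear curve gives a large error on the training data and on new data alike.

import numpy as np

rng = np.random.default_rng(0)
x_train = rng.uniform(-3, 3, 50)
x_test = rng.uniform(-3, 3, 50)
y_train = x_train ** 2 + rng.normal(0, 0.3, 50)   # true pattern is quadratic
y_test = x_test ** 2 + rng.normal(0, 0.3, 50)

line = np.polyfit(x_train, y_train, deg=1)        # degree-1 model: y = a + b*x

def mse(x, y):
    return np.mean((np.polyval(line, x) - y) ** 2)

print(mse(x_train, y_train), mse(x_test, y_test))  # both errors are large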
Variance
• Variance is the very opposite of bias. During training, we allow our model to
‘see’ the data a certain number of times so that it can find patterns in it.
• If the model does not work on the data for long enough, it will not find the
patterns, and bias occurs. On the other hand, if our model is allowed to view
the data too many times, it will learn very well, but only for that data.
• It will capture most patterns in the data, but it will also learn from the
unnecessary parts of the data, that is, from the noise.
• We can define variance as the model’s sensitivity to fluctuations in the
data. A model that learns from noise will come to consider trivial features
as important.
• In the figure above, we can see that our model has learned the training data
extremely well, and that data has taught it to identify cats.
• But when given new data, such as a picture of a fox, our model predicts it as
a cat, because that is what it has learned. This happens when the variance is
high: our model captures all the features of the data given to it, including
the noise, tunes itself to that data, and predicts it very well; but when
given new data, it cannot predict well, because it has become too specific to
the training data.
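The mirror image of the earlier underfitting sketch (again an invented example): a degree-15 polynomial fit to 16 noisy points memorizes the training data, including its noise, and then fails on fresh points drawn from the same curve.

import numpy as np

rng = np.random.default_rng(1)
x_train = np.linspace(-1, 1, 16)
x_test = rng.uniform(-1, 1, 100)
y_train = np.sin(3 * x_train) + rng.normal(0, 0.2, 16)
y_test = np.sin(3 * x_test) + rng.normal(0, 0.2, 100)

wiggly = np.polyfit(x_train, y_train, deg=15)   # enough freedom to fit the noise
train_mse = np.mean((np.polyval(wiggly, x_train) - y_train) ** 2)
test_mse = np.mean((np.polyval(wiggly, x_test) - y_test) ** 2)
print(train_mse)   # near zero: the training data is predicted "very well"
print(test_mse)    # large: the model is too specific to the training data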
Bias-Variance Tradeoff
• For any model, we have to find the perfect balance between Bias and
Variance.
• This ensures that we capture the essential patterns in our model while
ignoring the noise present in it. This is called the Bias-Variance Tradeoff.
It helps optimize the error in our model and keeps it as low as possible.
• An optimized model will be sensitive to the patterns in our data, but at the
same time will be able to generalize to new data. For this, both the bias and
the variance should be low, so as to prevent overfitting and underfitting.
• We can see that when bias is high, the error on both the training and testing
sets is also high.
• If we have high variance, the model performs well on the training set (there
the error is low) but gives a high error on the testing set.
• We can see that there is a region in the middle where the error on both the
training and testing sets is low, and the bias and variance are in balance.
The best fit is when the predictions are concentrated at the center, i.e., at the bull’s eye. We can see
that as we get farther and farther away from the center, the error in our model increases. The best
model is one where bias and variance are both low.
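The tradeoff can be seen numerically in a sketch like the following (an assumed experiment, not from the slides): sweeping the polynomial degree, the training error falls steadily, while the test error traces a U-shape whose bottom is the balance point between bias and variance.

import numpy as np

rng = np.random.default_rng(2)
x_train = rng.uniform(-1, 1, 40)
x_test = rng.uniform(-1, 1, 200)
y_train = np.sin(3 * x_train) + rng.normal(0, 0.2, 40)
y_test = np.sin(3 * x_test) + rng.normal(0, 0.2, 200)

for deg in (1, 3, 5, 9, 15):
    p = np.polyfit(x_train, y_train, deg)
    train = np.mean((np.polyval(p, x_train) - y_train) ** 2)
    test = np.mean((np.polyval(p, x_test) - y_test) ** 2)
    print(f"degree={deg:2d}  train MSE={train:.3f}  test MSE={test:.3f}")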
Theorem of total probability
• Let B1, B2, …, BN be mutually exclusive events whose union equals the
sample space S. We refer to these sets as a partition of S.
• An event A can be represented as:
A = (A ∩ B1) ∪ (A ∩ B2) ∪ … ∪ (A ∩ BN)
• Since B1, B2, …, BN are mutually exclusive, then
P(A) = P(A ∩ B1) + P(A ∩ B2) + … + P(A ∩ BN)
• And therefore
P(A) = P(A|B1)*P(B1) + P(A|B2)*P(B2) + … + P(A|BN)*P(BN)
= Σi P(A | Bi) * P(Bi)
This is known as exhaustive conditionalization, or marginalization.
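A tiny numeric sketch of the theorem in Python (the numbers are invented), with a three-set partition:

# P(A) by conditioning on a partition B1, B2, B3 of the sample space
p_B = [0.5, 0.3, 0.2]            # P(Bi); a partition, so these sum to 1
p_A_given_B = [0.1, 0.4, 0.8]    # P(A | Bi)
p_A = sum(pb * pa for pb, pa in zip(p_B, p_A_given_B))
print(p_A)                       # 0.05 + 0.12 + 0.16 = 0.33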
Bayes theorem
• P(A ∩ B) = P(B) * P(A | B) = P(A) * P(B | A)
• Rearranging gives:
P(B | A) = P(A | B) * P(B) / P(A)
where P(B | A) is the posterior probability, P(A | B) is the conditional
probability (the likelihood), P(B) is the prior of B, and P(A) is the prior
of A (the normalizing constant).
This is known as Bayes’ Theorem or Bayes’ Rule, and it is (one of) the most
useful relations in probability and statistics.
Bayes’ Theorem is definitely the fundamental relation in Statistical Pattern
Recognition.
Bayes theorem (cont’d)
• Let B1, B2, …, BN be a partition of the sample space S.
Suppose that event A occurs; what is the probability of
event Bj?
• P(Bj | A) = P(A | Bj) * P(Bj) / P(A)
= P(A | Bj) * P(Bj) / Σj P(A | Bj) * P(Bj)
Here the Bj are different models / hypotheses: P(Bj | A) is the posterior
probability, P(A | Bj) the likelihood, P(Bj) the prior of Bj, and P(A) the
normalizing constant (given by the theorem of total probability).
Given the observation A, should you choose the model that maximizes P(Bj | A)
or P(A | Bj)? That depends on how much you know about the Bj!
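The following sketch makes that question concrete (all numbers are invented): with the same likelihoods P(A | Bj), maximizing the posterior (the MAP choice) and maximizing the likelihood (the ML choice) can select different hypotheses once the priors P(Bj) are taken into account.

likelihood = {"B1": 0.9, "B2": 0.2}   # P(A | Bj)
prior = {"B1": 0.05, "B2": 0.95}      # P(Bj): B1 is assumed rare a priori

unnorm = {b: likelihood[b] * prior[b] for b in prior}
evidence = sum(unnorm.values())       # P(A), by the theorem of total probability
posterior = {b: p / evidence for b, p in unnorm.items()}

print(max(likelihood, key=likelihood.get))   # ML picks B1 (largest likelihood)
print(max(posterior, key=posterior.get))     # MAP picks B2 (prior dominates)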
Another example
• We’ve talked about the casino’s dice: 99% are fair, 1% are loaded (a loaded
die shows a six 50% of the time)
• We said that if we randomly pick a die and roll it, we have a 17% chance of
getting a six
• If we get three sixes in a row, what’s the chance that the die is loaded?
• How about five sixes in a row?
• P(loaded | 666)
= P(666 | loaded) * P(loaded) / P(666)
= 0.5^3 * 0.01 / (0.5^3 * 0.01 + (1/6)^3 * 0.99)
≈ 0.21
• P(loaded | 66666)
= P(66666 | loaded) * P(loaded) / P(66666)
= 0.5^5 * 0.01 / (0.5^5 * 0.01 + (1/6)^5 * 0.99)
≈ 0.71
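The same computation written out in Python (a direct transcription of the arithmetic above):

prior_loaded, prior_fair = 0.01, 0.99

def posterior_loaded(n_sixes):
    # P(loaded | n sixes) = P(data | loaded) * P(loaded) / P(data)
    like_loaded = 0.5 ** n_sixes       # loaded die: P(six) = 0.5
    like_fair = (1 / 6) ** n_sixes     # fair die:   P(six) = 1/6
    evidence = like_loaded * prior_loaded + like_fair * prior_fair
    return like_loaded * prior_loaded / evidence

print(posterior_loaded(3))   # ~0.21
print(posterior_loaded(5))   # ~0.71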
Classification : Bayes’ decision theory