Chapter 14 Part II

     ISDS 2001
      Matt Levy
Using the Estimated Regression Equation
for Estimation and Prediction
Recall that when we use simple linear regression, we are making an
assumption about the relationship between x and y.

Once we have established a good fit, through r2 and
other measures, we can use the estimated regression equation for
estimation and prediction.

We do this by developing the following:
  Point Estimates
  Interval Estimates
  Confidence Intervals for a mean value of y
  Prediction Intervals for individual values of y
Estimation
We are interested in developing two distinct estimates and intervals
for those estimates:

  An estimate of the mean value of y for a specific x.
  An estimate of an individual value of y.

For a mean value, we develop a confidence interval.
For an individual value, we develop a prediction interval.

This distinction is based on more than simply the number (x) input into the
regression equation; it is based on what the number represents.

For example, the x input and resulting ŷ may be the same in both cases, but
if we are predicting an individual y (vs. the mean of y) there will be a wider
margin of error for the interval.
Developing the Confidence Interval for
the Mean Value of y
To do this, let's first define some terms:

  xp = the value of the independent variable (usually given)

  yp = the actual value of the dependent variable when x = xp

  E(yp) = the expected value of y, given xp.

  ŷp = b0 + b1xp = the point estimate of E(yp) when x = xp.
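As a minimal sketch of the definitions above, using hypothetical toy data (not the textbook's example), the point estimate ŷp = b0 + b1xp can be computed directly from the least-squares formulas:

```python
# Hypothetical toy data (not from the textbook): 5 (x, y) observations.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Least-squares estimates: b1 = Sxy / Sxx, b0 = y_bar - b1 * x_bar
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
s_xx = sum((xi - x_bar) ** 2 for xi in x)
b1 = s_xy / s_xx
b0 = y_bar - b1 * x_bar

x_p = 3.5                  # the given value of the independent variable
y_hat_p = b0 + b1 * x_p    # point estimate of E(yp) when x = xp
```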
Developing the Confidence Interval for
the Mean Value of y
In general, we cannot expect ŷp to equal E(yp) exactly.

So to make an inference about how close they are, we need the standard
deviation of ŷp (sŷp).

Equations 14.22 and 14.23 derive the variance and standard deviation,
respectively.
Consequently, the confidence interval for E(yp) is as follows:

   ŷp ± tα/2 · sŷp, based on a t-distribution with n − 2 degrees of freedom.
Note that the estimate of y is most precise when xp is equal or very close to the mean of x,
meaning we will have a tighter confidence interval. Figure 14.8 and the equation below it illustrate this concept.
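One way to sketch the confidence-interval computation, again with hypothetical toy data (the 95% t critical value is hard-coded from a t table rather than computed, and sŷp uses the standard-deviation form from Equation 14.23):

```python
import math

# Hypothetical toy data (not from the textbook).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n
s_xx = sum((xi - x_bar) ** 2 for xi in x)
b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / s_xx
b0 = y_bar - b1 * x_bar

# Standard error of the estimate: s = sqrt(SSE / (n - 2))
sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
s = math.sqrt(sse / (n - 2))

x_p = 3.5
y_hat_p = b0 + b1 * x_p

# Standard deviation of y_hat_p: s * sqrt(1/n + (x_p - x_bar)^2 / Sxx)
# Note the (x_p - x_bar)^2 term: the interval is tightest when x_p = x_bar.
s_yhat = s * math.sqrt(1 / n + (x_p - x_bar) ** 2 / s_xx)

t_crit = 3.182  # t_{0.025, n-2} = t_{0.025, 3}, taken from a t table
lower = y_hat_p - t_crit * s_yhat
upper = y_hat_p + t_crit * s_yhat
```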
Developing the Prediction Interval for
an Individual Value of y
Applicable when we are trying to predict an individual value of y.

The prediction interval will be considerably wider than the confidence interval.

To develop the prediction interval, we need the 2 components that comprise the
variance for the prediction interval (s2ind):

1. The variance of individual y values about the mean E(yp), given by s2
2. The variance associated with using ŷp to estimate E(yp), given by s2ŷp

So that s2ind = s2 + s2ŷp, and sind = √(s2ind).


And the prediction interval is given by:
  ŷp ± tα/2 · sind, based on a t-distribution with n − 2 degrees of freedom.
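Continuing the same hypothetical sketch, the prediction interval changes only the standard error: sind = √(s2 + s2ŷp), which is why it is always wider than the confidence interval at the same xp:

```python
import math

# Hypothetical toy data (not from the textbook).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n
s_xx = sum((xi - x_bar) ** 2 for xi in x)
b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / s_xx
b0 = y_bar - b1 * x_bar

sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
s = math.sqrt(sse / (n - 2))          # standard error of the estimate

x_p = 3.5
y_hat_p = b0 + b1 * x_p

# Standard error for the mean value (confidence interval)...
s_yhat = s * math.sqrt(1 / n + (x_p - x_bar) ** 2 / s_xx)
# ...and for an individual value (prediction interval): the extra s**2 term.
s_ind = math.sqrt(s ** 2 + s_yhat ** 2)

t_crit = 3.182  # t_{0.025, n-2} = t_{0.025, 3}, from a t table
ci = (y_hat_p - t_crit * s_yhat, y_hat_p + t_crit * s_yhat)
pi = (y_hat_p - t_crit * s_ind, y_hat_p + t_crit * s_ind)
```

The prediction interval `pi` always contains the confidence interval `ci`, since s2ind adds the full variance s2 of individual y values on top of s2ŷp.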
Residual Analysis
The residual is the difference between the observed value of the dependent
variable and its estimated value (yi − ŷi).

We use residual analysis to analyze the validity of our assumptions. For most
models, we make the following assumptions:
 1. E(ε) = 0
 2. The variance of ε, denoted by σ2, is the same for all values of x.
 3. The values of ε are independent.
 4. The error term ε has a normal distribution.

These assumptions are the theoretical basis for the t-test and F-test. Therefore
it is important that the residuals be analyzed to further support model
validity.

We analyze residuals using graphical plots of the following:
 1. Residuals against the independent variable (x)
 2. Residuals against the predicted values (ŷi).
 3. Standardized Residual Plot.
 4. Normal Probability Plot
Residual Analysis
Residual Plot Against x
   x is on the horizontal axis, (yi - ŷi) on the vertical axis.
   Should loosely resemble a horizontal band.

Residual Plot against ŷ
   ŷ is on the horizontal axis, (yi - ŷi) on the vertical axis.
   Should loosely resemble a horizontal band.

Standardized Residual Plot
    Divide each residual by its standard deviation:
        s(yi − ŷi) = s√(1 − hi), where hi is the leverage of observation i.
        See Equations 14.30 and 14.31.
    When looking at the plot, roughly 95% of the values should fall between −2 and +2.

Normal Probability Plot
   Uses the concept of normal scores, see Figure 14.15

For the residual plots we are visually looking for points scattered within a
horizontal band; for the normal probability plot, points should fall close to a
straight line. Other patterns may indicate violated assumptions.
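The standardized residuals described above can be sketched numerically with the same hypothetical toy data; in simple linear regression the leverage has the closed form hi = 1/n + (xi − x̄)2/Sxx:

```python
import math

# Hypothetical toy data (not from the textbook).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n
s_xx = sum((xi - x_bar) ** 2 for xi in x)
b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / s_xx
b0 = y_bar - b1 * x_bar

residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
s = math.sqrt(sum(e ** 2 for e in residuals) / (n - 2))

# Leverage of observation i: h_i = 1/n + (x_i - x_bar)^2 / Sxx
leverage = [1 / n + (xi - x_bar) ** 2 / s_xx for xi in x]

# Standardized residual: e_i / (s * sqrt(1 - h_i))
std_resid = [e / (s * math.sqrt(1 - h)) for e, h in zip(residuals, leverage)]
```

Plotting `std_resid` against x (or ŷ) gives the standardized residual plot; in a well-behaved model roughly 95% of the values land between −2 and +2.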
Residual Analysis:  Outliers and
Influential Observations
An outlier is a data point that does not fit the trend when shown
visually.

It may represent erroneous data, or something that may warrant more
careful examination.

Outliers have the potential to heavily influence our predictive ability in
regression.

For simple linear regression, we can simply use a scatter diagram to
detect outliers. For multiple regression, we must use the standardized
residuals.

Refer to Figure 14.20 for what it looks like to have a very influential
observation.
Residual Analysis:  Outliers and
Influential Observations
Observations with extreme values for the independent
variable are called high leverage points.

For these troublesome data points we introduce a new
measure called the leverage of observation (hi).

See Equation 14.33 for the formula for hi.

Using leverage alone usually works when the residual is small.

For large residuals with high leverage another measure known
as Cook's D statistic is used. We discuss this in Chapter 15.
End of Chapter 14

Let me know if you have any questions.

Please read the chapter.

Please do your homework.
