University of South Florida
Linear Regression Concepts
Dr. S. Shivendu

Objectives
01 Identify the mathematical basis of linear regression.
02 Differentiate statistical inferences about relationships based on regression output.
03 Analyze the concepts of p-value, hypothesis testing, and confidence intervals, and their interpretation.

Agenda
• Regression Analysis: Introduction
• Linear Regression: Concepts
• Assumptions: Concepts
• Coefficient Confidence Intervals: Concepts
• Prediction Confidence Intervals: Concepts

Models
A mathematical model is a mathematical expression of some phenomenon.
Models describe relationships between variables.
Two types of models:
• Deterministic models
• Probabilistic models

Deterministic Models
Hypothesize exact relationships.
Suitable when the relationship is certain and known.
Example: Force is exactly mass times acceleration
• F = m·a

Probabilistic Models
Suitable when the relationship is not certain and not all factors that affect the outcome are known.
Hypothesize two components:
• A deterministic component and a random error
Example: Sales volume (y) is 10 times advertising spending (x) plus random error
• $y = 10x + \varepsilon$
• The random error may be due to factors other than advertising
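
The sales example can be simulated in a few lines. This is only an illustrative sketch: the advertising levels, the normal error distribution with standard deviation 5, and the fixed seed are assumptions, not values from the slides.

```python
import random

random.seed(0)  # fixed seed (an assumption) so repeated runs give the same draws


def deterministic_part(x):
    """Deterministic component from the slide: sales = 10 * advertising."""
    return 10 * x


advertising = [1, 2, 3, 4, 5]  # hypothetical spending levels

# Assumed error model: normal, mean 0, standard deviation 5.
errors = [random.gauss(0, 5) for _ in advertising]

# Probabilistic model: deterministic component plus random error.
sales = [deterministic_part(x) + e for x, e in zip(advertising, errors)]

for x, e, y in zip(advertising, errors, sales):
    print(f"x = {x}: 10x = {deterministic_part(x)}, error = {e:+.2f}, y = {y:.2f}")
```

Each printed row separates what the model explains (10x) from what it leaves to chance (the error).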

Regression Models
Answer the question: “What is the relationship between the variables?”
Equations used:
• One numerical dependent (response) variable
• One or more numerical or categorical independent (explanatory) variables
Used mainly for estimating the strength of the relationship and for prediction.

Regression Modeling Steps
• Hypothesize the deterministic relationship between the response (dependent) variable and one or more explanatory (independent) variables in the population
• Specify the probability distribution of the random error term. Estimate the standard deviation of the error
• Estimate unknown model parameters
• Interpret the estimated parameters
What is a parameter?

Model Specification is Based on Theory
• Theory of field (e.g., Sociology)
• Mathematical theory
• Previous research
• “Common sense”

Types of Regression Models
• Simple (1 explanatory variable): linear or non-linear
• Multiple (2+ explanatory variables): linear or non-linear

Linear Regression Models
Relationship between variables is a linear function:
$y = \beta_0 + \beta_1 x + \varepsilon$
• $y$: dependent (response) variable
• $x$: independent (explanatory) variable
• $\beta_0$: population y-intercept
• $\beta_1$: population slope
• $\varepsilon$: random error

Population Linear Regression Model
Observed value: $y_i = \beta_0 + \beta_1 x_i + \varepsilon_i$
Population regression line: $E(y) = \beta_0 + \beta_1 x$
$\varepsilon_i$ = random error
[Figure: observed values around the population regression line, y against x]

Sample Linear Regression Model
Observed value: $y_i = \hat{\beta}_0 + \hat{\beta}_1 x_i + \hat{\varepsilon}_i$
Fitted line: $\hat{y}_i = \hat{\beta}_0 + \hat{\beta}_1 x_i$
$\hat{\varepsilon}_i$ = random error
[Figure: fitted sample line with an observed value and an unsampled observation, y against x]

Estimating Parameters: Least Squares Method
Regression Modeling Steps
• Hypothesize deterministic component
• Estimate unknown model parameters
• Specify probability distribution of random error term
• Evaluate model
• Use model for prediction and estimation

Scattergram
Plot of all $(x_i, y_i)$ pairs
Suggests how well the model will fit
[Figure: scattergram of y against x, both axes running from 0 to 60]

Thinking Challenge
How would you draw a line through the points?
How would you determine which line fits best?
[Figure: scattergram of y against x, both axes running from 0 to 60]

Least Squares
“Best fit” means the differences between the actual y values and the estimated or predicted y values are a minimum.
• But positive differences offset negative ones
Least squares minimizes the sum of the squared differences (SSE):
$\sum_{i=1}^{n} (y_i - \hat{y}_i)^2 = \sum_{i=1}^{n} \hat{\varepsilon}_i^2$
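
A small numeric sketch shows why raw differences are not enough. The five data points are hypothetical and were chosen only for illustration. Both candidate lines below have differences that sum to zero, yet their SSE values differ.

```python
xs = [1, 2, 3, 4, 5]  # hypothetical data, not from the slides
ys = [1, 1, 2, 2, 4]


def sum_of_differences(b0, b1):
    """Sum of raw differences y - y_hat; positives can cancel negatives."""
    return sum(y - (b0 + b1 * x) for x, y in zip(xs, ys))


def sse(b0, b1):
    """Sum of squared differences, the quantity least squares minimizes."""
    return sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))


# Each candidate is an (intercept, slope) pair.
candidates = {
    "flat line at the mean of y": (2.0, 0.0),
    "least squares line": (-0.1, 0.7),
}

for name, (b0, b1) in candidates.items():
    print(f"{name}: sum of differences = {sum_of_differences(b0, b1):.3f}, "
          f"SSE = {sse(b0, b1):.3f}")
```

For these data the flat line has an SSE of 6.0 and the least squares line has an SSE of 1.1, so only the squared criterion separates them.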
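
Least Squares Graphically
For a single point, here the second: $y_2 = \hat{\beta}_0 + \hat{\beta}_1 x_2 + \hat{\varepsilon}_2$
Fitted line: $\hat{y}_i = \hat{\beta}_0 + \hat{\beta}_1 x_i$
LS minimizes $\sum_{i=1}^{n} \hat{\varepsilon}_i^2 = \hat{\varepsilon}_1^2 + \hat{\varepsilon}_2^2 + \hat{\varepsilon}_3^2 + \hat{\varepsilon}_4^2$
[Figure: four points with deviations $\hat{\varepsilon}_1$ to $\hat{\varepsilon}_4$ from the fitted line, y against x]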

Coefficient Equations
Prediction equation:
$\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x$
Slope:
$\hat{\beta}_1 = \dfrac{SS_{xy}}{SS_{xx}} = \dfrac{\sum_{i=1}^{n} x_i y_i - \dfrac{\left(\sum_{i=1}^{n} x_i\right)\left(\sum_{i=1}^{n} y_i\right)}{n}}{\sum_{i=1}^{n} x_i^2 - \dfrac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}}$
y-intercept:
$\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}$
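
The coefficient equations translate directly into code. The sketch below uses a hypothetical five-point dataset (an assumption, not data given in the slides), chosen because its estimates agree with the -0.1 and 0.7 shown in the deck's computer output.

```python
xs = [1, 2, 3, 4, 5]  # hypothetical advertising values
ys = [1, 1, 2, 2, 4]  # hypothetical sales values
n = len(xs)

# Raw sums used by the computational formulas.
sum_x = sum(xs)
sum_y = sum(ys)
sum_xy = sum(x * y for x, y in zip(xs, ys))
sum_x2 = sum(x * x for x in xs)

# SS_xy and SS_xx as defined in the slope equation.
ss_xy = sum_xy - sum_x * sum_y / n
ss_xx = sum_x2 - sum_x ** 2 / n

slope = ss_xy / ss_xx                      # beta_1 hat
intercept = sum_y / n - slope * sum_x / n  # beta_0 hat = y_bar - slope * x_bar

print(f"SS_xy = {ss_xy}, SS_xx = {ss_xx}")
print(f"slope = {slope:.4f}, intercept = {intercept:.4f}")
```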

Interpretation of Coefficients
Slope ($\hat{\beta}_1$)
• Estimated y changes by $\hat{\beta}_1$ for each 1-unit increase in x
• If $\hat{\beta}_1 = 2$, then Sales (y) is expected to increase by 2 for each 1-unit increase in Advertising (x)
Y-Intercept ($\hat{\beta}_0$)
• The average value of y when x = 0
• If $\hat{\beta}_0 = 4$, then Average Sales (y) is expected to be 4 when Advertising (x) is 0

Parameter Estimation Computer Output
Parameter Estimates
Variable   DF   Parameter Estimate   Standard Error   T for H0: Param=0   Prob>|T|
INTERCEP    1              -0.1000           0.6350              -0.157     0.8849
ADVERT      1               0.7000           0.1914               3.656     0.0354
The Parameter Estimate column gives $\hat{\beta}_0$ (INTERCEP) and $\hat{\beta}_1$ (ADVERT):
$\hat{y} = -0.1 + 0.7x$

Coefficient Interpretation Solution
Slope (β̂1)
• Sales Volume (y) is expected to increase by .7 units for each $1 increase in Advertising (x)
Y-Intercept (β̂0)
• Average value of Sales Volume (y) is -.10 units when Advertising (x) is 0
• Difficult to explain to marketing manager
• Expect some sales without advertising

Probability Distribution of Random Error
Regression Modeling Steps
• Hypothesize deterministic component
• Estimate unknown model parameters
• Specify probability distribution of random error term
• Evaluate model
• Use model for prediction and estimation

Linear Regression Assumptions
• The mean of the probability distribution of the error, ε, is 0
• The probability distribution of the error, ε, is approximately normal
• The probability distribution of the error has a constant variance
• Errors are independent

Error Probability Distribution
[Figure: the line E(y) = β0 + β1x plotted as y against x, marked at x1, x2, and x3]

Random Error Variation
• Variation of actual y from predicted y, $\hat{y}$
• Measured by the standard error of the regression model: the sample standard deviation of ε, denoted s
• Affects several factors, such as parameter significance and prediction accuracy
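
Variation Measures
Fitted line: $\hat{y}_i = \hat{\beta}_0 + \hat{\beta}_1 x_i$
• Total sum of squares: $(y_i - \bar{y})^2$
• Unexplained sum of squares or SSE: $(y_i - \hat{y}_i)^2$
• Explained sum of squares: $(\hat{y}_i - \bar{y})^2$
Each sum of squares adds the term shown over all observations i.
[Figure: a point $(x_i, y_i)$ with its deviations from the fitted line and from the mean $\bar{y}$, y against x]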
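
The three variation measures can be computed side by side. This sketch assumes a hypothetical five-point dataset and its least squares estimates (-0.1 and 0.7). The identity total = explained + unexplained holds for a least squares line with an intercept, not for an arbitrary line.

```python
xs = [1, 2, 3, 4, 5]  # hypothetical data, not from the slides
ys = [1, 1, 2, 2, 4]
b0, b1 = -0.1, 0.7    # least squares estimates for these data

y_bar = sum(ys) / len(ys)
y_hat = [b0 + b1 * x for x in xs]

sst = sum((y - y_bar) ** 2 for y in ys)                # total sum of squares
sse = sum((y - yh) ** 2 for y, yh in zip(ys, y_hat))   # unexplained (SSE)
ssr = sum((yh - y_bar) ** 2 for yh in y_hat)           # explained

print(f"total = {sst:.2f}, explained = {ssr:.2f}, unexplained = {sse:.2f}")
```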

Estimation of Variance of Error $\sigma^2$
$s^2 = \dfrac{SSE}{n - 2}$, where $SSE = \sum (y_i - \hat{y}_i)^2$
$s = \sqrt{s^2} = \sqrt{\dfrac{SSE}{n - 2}}$
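
As a worked sketch of the estimator, the code below computes SSE, s², and s for a hypothetical five-point dataset and its least squares line (both are assumptions for illustration).

```python
import math

xs = [1, 2, 3, 4, 5]  # hypothetical data, not from the slides
ys = [1, 1, 2, 2, 4]
b0, b1 = -0.1, 0.7    # least squares estimates for these data
n = len(xs)

# SSE: sum of squared differences between observed and predicted y.
sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))

# Divide by n - 2 because two parameters (intercept and slope) were estimated.
s_squared = sse / (n - 2)
s = math.sqrt(s_squared)

print(f"SSE = {sse:.2f}, s^2 = {s_squared:.4f}, s = {s:.4f}")
```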

Residual Analysis
The residual for observation i, $e_i$, is the difference between its observed and predicted value:
$e_i = Y_i - \hat{Y}_i$
Check the assumptions of regression by examining the residuals:
• Examine for linearity assumption
• Evaluate independence assumption
• Evaluate normal distribution assumption
• Examine for constant variance for all levels of X (homoscedasticity)
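
Computing the residuals is the first step of any of these checks. The sketch below, on hypothetical data with an assumed least squares fit, lists each residual against its x value; those pairs are what a residual plot would show.

```python
xs = [1, 2, 3, 4, 5]  # hypothetical data, not from the slides
ys = [1, 1, 2, 2, 4]
b0, b1 = -0.1, 0.7    # least squares estimates for these data

# Residual e_i = observed value minus predicted value.
residuals = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]

for x, e in zip(xs, residuals):
    print(f"x = {x}: residual = {e:+.2f}")

# For a least squares line with an intercept, the residuals sum to zero
# (up to floating-point rounding).
print(f"absolute sum of residuals = {abs(sum(residuals)):.6f}")
```

With only five points no pattern can be judged reliably. With real data one would look for curvature, trends over the order of collection, or a changing spread.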

Residual Analysis for Linearity
[Figure: two columns, "Not Linear" and "Linear"; each shows a plot of Y against x above a plot of residuals against x]

Residual Analysis for Independence
[Figure: residuals plotted against X in panels labeled "Not Independent" and "Independent"]

Check for Normality
• Examine the Stem-and-Leaf Display of the Residuals
• Examine the Boxplot of the Residuals
• Examine the Histogram of the Residuals
• Construct a Normal Probability Plot of the Residuals

Residual Analysis for Normality
When using a normal probability plot, normal errors will approximately display in a straight line.
[Figure: normal probability plot of Percent (0 to 100) against Residual (-3 to 3)]

Residual Analysis for Equal Variance
[Figure: two columns, "Non-constant variance" and "Constant variance"; each shows a plot of Y against x above a plot of residuals against x]

Interpreting the Model - Testing for Significance
Regression Modeling Steps
• Hypothesize deterministic component
• Estimate unknown model parameters
• Specify probability distribution of random error term
• Interpret model

Test of Slope Coefficient
• Shows if there is a linear relationship between x and y
• Involves population slope $\beta_1$
• Theoretical basis is sampling distribution of slope
Hypotheses:
• $H_0$: $\beta_1 = 0$ (No Linear Relationship)
• $H_a$: $\beta_1 \neq 0$ (Linear Relationship)

Sampling Distribution of Sample Slopes
All possible sample slopes $\hat{\beta}_1$:
• Sample 1: 2.5
• Sample 2: 1.6
• Sample 3: 1.8
• Sample 4: 2.1
• and so on, for a very large number of sample slopes
[Figure: population line with Sample 1 and Sample 2 lines, y against x, beside the sampling distribution of $\hat{\beta}_1$ labeled with $\beta_1$ and $S_{\hat{\beta}_1}$]

Slope Coefficient Test Statistic
$t = \dfrac{\hat{\beta}_1}{S_{\hat{\beta}_1}}$, with $df = n - 2$
where
$S_{\hat{\beta}_1} = \dfrac{s}{\sqrt{SS_{xx}}}$
$SS_{xx} = \sum_{i=1}^{n} x_i^2 - \dfrac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}$

Test of Slope Coefficient Computer Output
Parameter Estimates
Variable   DF   Parameter Estimate   Standard Error   T for H0: Param=0   Prob>|T|
INTERCEP    1              -0.1000           0.6350              -0.157     0.8849
ADVERT      1               0.7000           0.1914               3.656     0.0354
In the ADVERT row, the Parameter Estimate is $\hat{\beta}_1$, the Standard Error is $S_{\hat{\beta}_1}$, T is $t = \hat{\beta}_1 / S_{\hat{\beta}_1}$, and Prob>|T| is the P-Value.
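
The ADVERT row can be approximately reproduced by hand. The sketch below uses a hypothetical five-point dataset (an assumption; the slides do not list the data) whose fit agrees with the output above to about three decimals. The closed-form t CDF used for the p-value is valid only for 3 degrees of freedom, which is what n = 5 gives; other sample sizes would need a t-table or a statistics library.

```python
import math

xs = [1, 2, 3, 4, 5]  # hypothetical advertising values
ys = [1, 1, 2, 2, 4]  # hypothetical sales values
n = len(xs)

x_bar = sum(xs) / n
y_bar = sum(ys) / n
# Deviation forms of SS_xx and SS_xy (equal to the computational formulas).
ss_xx = sum((x - x_bar) ** 2 for x in xs)
ss_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))

b1 = ss_xy / ss_xx
b0 = y_bar - b1 * x_bar

sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
s = math.sqrt(sse / (n - 2))

se_b1 = s / math.sqrt(ss_xx)  # standard error of the slope
t_stat = b1 / se_b1           # test statistic for H0: beta_1 = 0


def t_cdf_df3(t):
    """CDF of Student's t with exactly 3 degrees of freedom (closed form)."""
    u = t / math.sqrt(3)
    return 0.5 + (u / (1 + u * u) + math.atan(u)) / math.pi


# Two-sided p-value for Ha: beta_1 != 0.
p_value = 2 * (1 - t_cdf_df3(abs(t_stat)))

print(f"slope = {b1:.4f}, std error = {se_b1:.4f}")
print(f"t = {t_stat:.3f}, p-value = {p_value:.4f}")
```

At the 0.05 level the p-value is below the threshold, so H0 of no linear relationship would be rejected for these data.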

Prediction with Regression Models
Types of predictions
• Point estimates
• Interval estimates
What is predicted?
• Population mean response E(y) for given x (a point on the population regression line)
• Individual response (y) for given x

Confidence Interval Estimate for Mean Value of y at $x = x_p$
$\hat{y} \pm t_{\alpha/2} \, s \sqrt{\dfrac{1}{n} + \dfrac{(x_p - \bar{x})^2}{SS_{xx}}}$
$df = n - 2$
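
A worked sketch of the interval for the mean response follows. The data are hypothetical, the point x_p = 4 is an arbitrary choice, and the critical value 3.182 is the standard t-table entry for 95% confidence with 3 degrees of freedom. All three are assumptions for illustration.

```python
import math

xs = [1, 2, 3, 4, 5]  # hypothetical data, not from the slides
ys = [1, 1, 2, 2, 4]
n = len(xs)

x_bar = sum(xs) / n
y_bar = sum(ys) / n
ss_xx = sum((x - x_bar) ** 2 for x in xs)
ss_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
b1 = ss_xy / ss_xx
b0 = y_bar - b1 * x_bar
s = math.sqrt(sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys)) / (n - 2))

T_CRIT = 3.182  # t_{alpha/2} for alpha = 0.05 and df = 3, from a t-table
x_p = 4         # hypothetical x value at which the mean of y is estimated

y_hat_p = b0 + b1 * x_p
half_width = T_CRIT * s * math.sqrt(1 / n + (x_p - x_bar) ** 2 / ss_xx)
lower, upper = y_hat_p - half_width, y_hat_p + half_width

print(f"point estimate = {y_hat_p:.3f}")
print(f"95% CI for the mean of y at x = {x_p}: ({lower:.3f}, {upper:.3f})")
```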

Factors Affecting Interval Width
Level of confidence (1 – α)
• Width increases as confidence increases
Data dispersion (s)
• Width increases as variation increases
Sample size
• Width decreases as sample size increases
Distance of $x_p$ from mean $\bar{x}$
• Width increases as distance increases
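
Each factor can be varied in isolation to see its effect on the half-width of the interval for the mean response. All numbers below are hypothetical. Changing n alone while holding the critical value and SS_xx fixed is a simplification, since in practice both also change with the sample.

```python
import math


def half_width(t_crit, s, n, x_p, x_bar, ss_xx):
    """Half-width of the confidence interval for the mean of y at x_p."""
    return t_crit * s * math.sqrt(1 / n + (x_p - x_bar) ** 2 / ss_xx)


# Hypothetical baseline: 95% confidence with 3 degrees of freedom.
base = dict(t_crit=3.182, s=0.6055, n=5, x_p=4, x_bar=3, ss_xx=10)

scenarios = {
    "baseline": base,
    "higher confidence (99%, larger t)": {**base, "t_crit": 5.841},
    "more dispersion (larger s)": {**base, "s": 1.2},
    "larger sample (n = 50)": {**base, "n": 50},
    "x_p farther from the mean": {**base, "x_p": 5},
}

widths = {name: half_width(**params) for name, params in scenarios.items()}

for name, w in widths.items():
    print(f"{name}: half-width = {w:.3f}")
```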

Prediction Interval of Individual Value of y at $x = x_p$
$\hat{y} \pm t_{\alpha/2} \, s \sqrt{1 + \dfrac{1}{n} + \dfrac{(x_p - \bar{x})^2}{SS_{xx}}}$
$df = n - 2$
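
The prediction interval differs from the interval for the mean only by the extra 1 under the square root. The sketch below computes both half-widths at the same point. The data, the point x_p = 4, and the t-table critical value 3.182 (95% confidence, 3 degrees of freedom) are assumptions for illustration.

```python
import math

xs = [1, 2, 3, 4, 5]  # hypothetical data, not from the slides
ys = [1, 1, 2, 2, 4]
n = len(xs)

x_bar = sum(xs) / n
y_bar = sum(ys) / n
ss_xx = sum((x - x_bar) ** 2 for x in xs)
ss_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
b1 = ss_xy / ss_xx
b0 = y_bar - b1 * x_bar
s = math.sqrt(sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys)) / (n - 2))

T_CRIT = 3.182  # t_{alpha/2} for alpha = 0.05 and df = 3, from a t-table
x_p = 4         # hypothetical x value for the new individual observation

y_hat_p = b0 + b1 * x_p
distance_term = 1 / n + (x_p - x_bar) ** 2 / ss_xx

mean_half = T_CRIT * s * math.sqrt(distance_term)      # mean response
pred_half = T_CRIT * s * math.sqrt(1 + distance_term)  # individual response
lower, upper = y_hat_p - pred_half, y_hat_p + pred_half

print(f"95% prediction interval at x = {x_p}: ({lower:.3f}, {upper:.3f})")
print(f"half-widths: mean = {mean_half:.3f}, individual = {pred_half:.3f}")
```

The individual interval is wider because it also accounts for the random error of a single new observation.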

Key Takeaway
• The statistical interpretation is the value proposition of the linear regression model
• The statistical interpretation depends on assumptions of the linear model being met
• Understanding outliers is critical for drawing meaningful inferences from the linear regression model