Regression Analysis
Multiple Regression
[ Cross-Sectional Data ]
Learning Objectives
• Explain the linear multiple regression model [for cross-sectional data]
• Interpret linear multiple regression computer output
• Explain multicollinearity
• Describe the types of multiple regression models
Regression Modeling Steps
• Define problem or question
• Specify model
• Collect data
• Do descriptive data analysis
• Estimate unknown parameters
• Evaluate model
• Use model for prediction
Simple vs. Multiple
Simple regression:
• $\beta$ represents the unit change in Y per unit change in X.
• Does not take into account any other variable besides the single independent variable.
Multiple regression:
• $\beta_i$ represents the unit change in Y per unit change in $X_i$.
• Takes into account the effect of the other $\beta_i$s.
• “Net regression coefficient.”
Assumptions
• Linearity - the Y variable is linearly related to the value of the X variable.
• Independence of Error - the error (residual) is independent for each value of X.
• Homoscedasticity - the variation around the line of regression is constant for all values of X.
• Normality - the values of Y are normally distributed at each value of X.
Goal
Develop a statistical model that can predict the values of a dependent (response) variable based upon the values of the independent (explanatory) variables.
Simple Regression
A statistical model that utilizes one quantitative independent variable “X” to predict the quantitative dependent variable “Y.”
Multiple Regression
A statistical model that utilizes two or more quantitative and qualitative explanatory variables (x1,..., xp) to predict a quantitative dependent variable Y.
Caution: have at least two quantitative explanatory variables (rule of thumb)
Multiple Regression Model
[Figure: Y plotted against X1 and X2, with error e]
Hypotheses
• H0: $\beta_1 = \beta_2 = \beta_3 = \cdots = \beta_P = 0$
• H1: At least one regression coefficient is not equal to zero
Hypotheses (alternate format)
H0: $\beta_i = 0$
H1: $\beta_i \neq 0$
Types of Models
• Positive linear relationship
• Negative linear relationship
• No relationship between X and Y
• Positive curvilinear relationship
• U-shaped curvilinear
• Negative curvilinear relationship
Multiple Regression Models
• Linear
  – Linear
  – Dummy Variable
  – Interaction
• Non-Linear
  – Polynomial
  – Square Root
  – Log
  – Reciprocal
  – Exponential
Multiple Regression Equations
This is too
complicated! You’ve got to
be kiddin’!
Linear Model
Relationship between one dependent & two or more independent variables is a linear function
$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_P X_P + \varepsilon$
• $Y$: dependent (response) variable
• $X_1, \ldots, X_P$: independent (explanatory) variables
• $\beta_1, \ldots, \beta_P$: population slopes
• $\beta_0$: population Y-intercept
• $\varepsilon$: random error
Method of Least Squares
• The straight line that best fits the data.
• Determine the straight line for which the sum of the squared differences between the actual values (Y) and the values that would be predicted from the fitted line of regression (Y-hat) is as small as possible.
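The least-squares criterion can be written out as a short formula. This is a sketch in assumed notation: $b_0, b_1, \ldots, b_P$ stand for the sample estimates of the coefficients (following the b0 + b1X form used on the interval-bands slide), and n is the sample size.

```latex
\min_{b_0, b_1, \ldots, b_P} \; \sum_{i=1}^{n} \left( Y_i - \hat{Y}_i \right)^2,
\qquad
\hat{Y}_i = b_0 + b_1 X_{1i} + b_2 X_{2i} + \cdots + b_P X_{Pi}
```

Squaring each difference keeps positive and negative differences from cancelling, so the criterion penalizes misses in either direction.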
Measures of Variation
• Explained variation (sum of squares due to regression)
• Unexplained variation (error sum of squares)
• Total sum of squares
Coefficient of Multiple Determination
When null hypothesis is rejected, a relationship between Y and the X variables exists.
Strength measured by $R^2$ [ several types ]
Coefficient of Multiple Determination
$R^2_{Y.123 \ldots P}$
The proportion of the variation in Y that is explained by the set of explanatory variables selected
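In symbols, the three measures of variation and the coefficient of multiple determination fit together as sketched here. This is the standard decomposition; $\bar{Y}$ (the sample mean of Y) and the summation over the n observations are notation assumed for illustration.

```latex
SST = \sum_{i=1}^{n} (Y_i - \bar{Y})^2, \qquad
SSR = \sum_{i=1}^{n} (\hat{Y}_i - \bar{Y})^2, \qquad
SSE = \sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2

SST = SSR + SSE, \qquad
R^2_{Y.12 \ldots P} = \frac{SSR}{SST} = 1 - \frac{SSE}{SST}
```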
Standard Error of the Estimate
$s_{Y.X}$
the measure of variability around the line of regression
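As a worked formula, the standard error of the estimate is usually computed from the error sum of squares as follows. The symbols n (sample size) and P (number of explanatory variables) are assumed here; n - P - 1 is the residual degrees of freedom.

```latex
s_{Y.X} = \sqrt{\frac{SSE}{n - P - 1}}
        = \sqrt{\frac{\sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2}{n - P - 1}}
```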
Confidence interval estimates
» True mean $\mu_{Y.X}$
» Individual $\hat{Y}_i$
Interval Bands [from simple regression]
[Figure: interval bands around the fitted line $\hat{Y}_i = b_0 + b_1 X$, plotted as Y versus X, with $\bar{X}$ and a given X marked]
Multiple Regression Equation
$\hat{Y} = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_P x_P$
where:
$\beta_0$ = y-intercept {a constant value}
$\beta_1$ = slope of Y with variable x1, holding the effects of variables x2, x3, ..., xP constant
$\beta_P$ = slope of Y with variable xP, holding all other variables’ effects constant
Who is in Charge?
Mini-Case
Predict the consumption of home heating oil during January for homes located around Screne Lakes. Two explanatory variables are selected: average daily atmospheric temperature (°F) and the amount of attic insulation (inches).
Oil (Gal)    Temp (°F)    Insulation (in)
275.30       40           3
363.80       27           3
164.30       40           10
 40.80       73           6
 94.30       64           6
230.90       34           6
366.70        9           6
300.60        8           10
237.80       23           10
121.40       63           3
 31.40       65           10
203.50       41           6
441.10       21           3
323.00       38           3
 52.50       58           10
Mini-Case
Develop a model for estimating heating oil used for a single-family home in the month of January based on average temperature and amount of insulation in inches.
Mini-Case
• What preliminary conclusions can home owners draw from the data?
• What could a home owner expect heating oil consumption (in gallons) to be if the outside temperature is 15 °F when the attic insulation is 10 inches thick?
Multiple Regression Equation [mini-case]
Dependent variable: Gallons Consumed
----------------------------------------------------------------------
                             Standard       T
Parameter       Estimate     Error          Statistic     P-Value
----------------------------------------------------------------------
CONSTANT        562.151      21.0931        26.6509       0.0000
Insulation      -20.0123     2.34251        -8.54313      0.0000
Temperature     -5.43658     0.336216       -16.1699      0.0000
----------------------------------------------------------------------
R-squared = 96.561 percent
R-squared (adjusted for d.f.) = 95.9879 percent
Standard Error of Est. = 26.0138
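The computer output above can be approximated by hand. The following is a minimal sketch, assuming the data table is transcribed correctly and using plain Python rather than the statistics package that produced the slide. The helper `solve` and all variable names are hypothetical, introduced only for illustration. It solves the normal equations for the least-squares coefficients, then computes R-squared and the standard error of the estimate.

```python
# Mini-case data transcribed from the table above.
oil = [275.3, 363.8, 164.3, 40.8, 94.3, 230.9, 366.7, 300.6,
       237.8, 121.4, 31.4, 203.5, 441.1, 323.0, 52.5]
temp = [40, 27, 40, 73, 64, 34, 9, 8, 23, 63, 65, 41, 21, 38, 58]
insulation = [3, 3, 10, 6, 6, 6, 6, 10, 10, 3, 10, 6, 3, 3, 10]


def solve(a, b):
    """Solve the square system a x = b by Gauss-Jordan elimination with partial pivoting."""
    size = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(size):
        pivot = max(range(col, size), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(size):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][size] / m[i][i] for i in range(size)]


# Design rows: 1 for the intercept, then temperature and insulation.
rows = [[1.0, t, ins] for t, ins in zip(temp, insulation)]
k = len(rows[0])  # number of estimated coefficients, intercept included

# Normal equations (X'X) b = X'y give the least-squares coefficients.
xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
xty = [sum(r[i] * y for r, y in zip(rows, oil)) for i in range(k)]
b0, b_temp, b_insul = solve(xtx, xty)

fitted = [b0 + b_temp * t + b_insul * ins for t, ins in zip(temp, insulation)]
mean_oil = sum(oil) / len(oil)
sse = sum((y - f) ** 2 for y, f in zip(oil, fitted))  # unexplained variation
sst = sum((y - mean_oil) ** 2 for y in oil)           # total variation
r2 = 1 - sse / sst
std_err = (sse / (len(oil) - k)) ** 0.5               # residual d.f. = n - P - 1

print(f"intercept={b0:.3f}  temp={b_temp:.5f}  insulation={b_insul:.4f}")
print(f"R-squared={r2:.5f}  standard error={std_err:.4f}")
```

If the transcription is right, the printed values should agree with the Estimate column, the R-squared line, and the Standard Error of Est. line in the output above, up to rounding.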
Multiple Regression Equation [mini-case]
Y-hat = 562.15 - 5.44x1 - 20.01x2
where: x1 = temperature [degrees F]
x2 = attic insulation [inches]
Multiple Regression Equation [mini-case]
Y-hat = 562.15 - 5.44x1 - 20.01x2
thus:
• For a home with zero inches of attic insulation and an outside temperature of 0 °F, 562.15 gallons of heating oil would be consumed.
[ caution .. data boundaries .. extrapolation ]
[Figure: Y versus X, showing interpolation within the relevant range of X and extrapolation on either side of it]
Multiple Regression Equation [mini-case]
Y-hat = 562.15 - 5.44x1 - 20.01x2
• For each 1 °F increase in temperature, for a given amount of attic insulation, predicted heating oil consumption drops by 5.44 gallons.
Multiple Regression Equation [mini-case]
Y-hat = 562.15 - 5.44x1 - 20.01x2
• For each additional inch of attic insulation, at a given temperature, predicted heating oil consumption drops by 20.01 gallons.
Multiple Regression Prediction [mini-case]
Y-hat = 562.15 - 5.44x1 - 20.01x2
with x1 = 15 °F and x2 = 10 inches
Y-hat = 562.15 - 5.44(15) - 20.01(10)
= 280.45 gallons consumed
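The prediction and the extrapolation caution can be combined in a short sketch. The function names and the range check are hypothetical, not part of the slides. The coefficients are the rounded ones from the fitted equation, and the ranges are the smallest and largest values in the mini-case data table. Checking each variable separately is only a rough guard, since a point can sit inside both ranges and still be far from any observed combination.

```python
# Rounded coefficients from the fitted mini-case equation.
B0, B_TEMP, B_INSUL = 562.15, -5.44, -20.01

# Smallest and largest values observed in the mini-case data table.
TEMP_RANGE = (8, 73)    # degrees F
INSUL_RANGE = (3, 10)   # inches


def predict_oil(temp_f, insulation_in):
    """Predicted January heating oil consumption in gallons."""
    return B0 + B_TEMP * temp_f + B_INSUL * insulation_in


def within_data_range(temp_f, insulation_in):
    """True when both inputs lie inside the observed ranges (interpolation)."""
    return (TEMP_RANGE[0] <= temp_f <= TEMP_RANGE[1]
            and INSUL_RANGE[0] <= insulation_in <= INSUL_RANGE[1])


for temp_f, insulation_in in [(15, 10), (0, 0)]:
    label = "interpolation" if within_data_range(temp_f, insulation_in) else "extrapolation"
    print(f"{temp_f} F, {insulation_in} in: "
          f"{predict_oil(temp_f, insulation_in):.2f} gallons ({label})")
```

The second case, zero insulation at 0 °F, is the intercept interpretation from the earlier slide, and the check flags it as lying outside the data.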
Coefficient of Multiple Determination [mini-case]
$R^2_{Y.12} = 0.9656$
96.56 percent of the variation in heating oil can be explained by the variation in temperature and insulation.
Coefficient of Multiple Determination
• Proportion of variation in Y ‘explained’ by all X variables taken together
• $R^2_{Y.12} = \frac{\text{Explained variation}}{\text{Total variation}} = \frac{SSR}{SST}$
• Never decreases when new X variable is added to model
  – Only Y values determine SST
  – Disadvantage when comparing models
Coefficient of Multiple Determination (adjusted)
• Proportion of variation in Y ‘explained’ by all X variables taken together
• Reflects
  – Sample size
  – Number of independent variables
• Smaller [more conservative] than $R^2_{Y.12}$
• Used to compare models
Coefficient of Multiple Determination (adjusted)
$R^2_{(\text{adj})\,Y.123 \ldots P}$
The proportion of the variation in Y that is explained by the set of independent [explanatory] variables selected, adjusted for the number of independent variables and the sample size.
Coefficient of Multiple Determination (adjusted) [Mini-Case]
$R^2_{\text{adj}} = 0.9599$
95.99 percent of the variation in heating oil consumption can be explained by the model, adjusted for number of independent variables and the sample size
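The adjustment can be checked with the usual formula, sketched here with n = 15 observations and P = 2 explanatory variables from the mini-case. The formula is the standard textbook definition and is assumed rather than quoted from the slides.

```latex
R^2_{\text{adj}} = 1 - \left(1 - R^2\right)\frac{n - 1}{n - P - 1}
                 = 1 - (1 - 0.96561)\,\frac{15 - 1}{15 - 2 - 1}
                 = 1 - 0.03439 \times \frac{14}{12}
                 \approx 0.9599
```

This agrees with the "R-squared (adjusted for d.f.)" line of the computer output.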
Coefficient of Partial Determination
• Proportion of variation in Y ‘explained’ by variable XP holding all others constant
• Must estimate separate models
• Denoted $R^2_{Y1.2}$ in two X variables case
  – Coefficient of partial determination of X1 with Y holding X2 constant
• Useful in selecting X variables
Coefficient of Partial Determination [p. 878]
$R^2_{Y1.234 \ldots P}$
The coefficient of partial determination of variable Y with x1, holding constant the effects of variables x2, x3, x4, ..., xP.
Coefficient of Partial Determination [Mini-Case]
$R^2_{Y1.2} = 0.9561$
For a fixed (constant) amount of insulation, 95.61 percent of the variation in heating oil can be explained by the variation in average atmospheric temperature. [p. 879]
Coefficient of Partial Determination [Mini-Case]
$R^2_{Y2.1} = 0.8588$
For a fixed (constant) temperature, 85.88 percent of the variation in heating oil can be explained by the variation in amount of insulation.
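The two partial coefficients can be cross-checked without refitting separate models. The sketch below uses a known identity, not the method named on the slide: for a single coefficient, the partial coefficient of determination equals t² / (t² + residual degrees of freedom). The function name is hypothetical, and the t statistics are taken from the mini-case computer output.

```python
def partial_r2_from_t(t_stat, n, p):
    """Partial R-squared for one coefficient from its t statistic.

    n is the sample size and p the number of explanatory variables,
    so the residual degrees of freedom are n - p - 1.
    """
    df_resid = n - p - 1
    return t_stat ** 2 / (t_stat ** 2 + df_resid)


# t statistics from the mini-case output (n = 15 homes, P = 2 variables).
r2_temp_given_insul = partial_r2_from_t(-16.1699, n=15, p=2)
r2_insul_given_temp = partial_r2_from_t(-8.54313, n=15, p=2)

print(f"temperature | insulation: {r2_temp_given_insul:.4f}")
print(f"insulation | temperature: {r2_insul_given_temp:.4f}")
```

The results should round to the 0.9561 and 0.8588 quoted on the mini-case slides.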
Testing Overall Significance
• Shows if there is a linear relationship between all X variables together & Y
• Uses p-value
• Hypotheses
  – H0: $\beta_1 = \beta_2 = \cdots = \beta_P = 0$
    » No linear relationship
  – H1: At least one coefficient is not 0
    » At least one X variable affects Y
Testing Model Portions
• Examines the contribution of a set of X variables to the relationship with Y
• Null hypothesis:
  – Variables in set do not significantly improve the model when all other variables are included
• Must estimate separate models
• Used in selecting X variables
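For the overall test, the p-value comes from an F statistic. The slides do not show it, so the following is a sketch under the standard definition, F = (SSR / P) / (SSE / (n - P - 1)), rewritten in terms of R-squared. The function name is hypothetical, and the inputs are the mini-case values.

```python
def overall_f(r2, n, p):
    """Overall F statistic from R-squared, sample size n, and p explanatory variables."""
    explained_per_df = r2 / p
    unexplained_per_df = (1 - r2) / (n - p - 1)
    return explained_per_df / unexplained_per_df


# Mini-case: R-squared = 0.96561, n = 15 homes, P = 2 explanatory variables.
f_stat = overall_f(0.96561, n=15, p=2)
print(f"F = {f_stat:.2f} on 2 and 12 degrees of freedom")
```

A large F relative to the F distribution with P and n - P - 1 degrees of freedom corresponds to a small p-value, which leads to rejecting H0.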
Diagnostic Checking
• H0 retain or reject
  If reject - {p-value ≤ 0.05}
• $R^2_{\text{adj}}$
• Correlation matrix
• Partial correlation matrix
Multicollinearity
• High correlation between X variables
• Coefficients measure combined effect
• Leads to unstable coefficients depending on X variables in model
• Always exists; matter of degree
• Example: Using both total number of rooms and number of bedrooms as explanatory variables in same model
Detecting Multicollinearity
• Examine correlation matrix
  – Correlations between pairs of X variables are higher than their correlations with the Y variable
• Few remedies
  – Obtain new sample data
  – Eliminate one correlated X variable
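The correlation check can be sketched on the mini-case data. The code computes the Pearson correlation between the two explanatory variables. It also computes a variance inflation factor (VIF), a common diagnostic that the slides do not mention and that is added here as an assumption. With only two X variables the VIF reduces to 1 / (1 - r²).

```python
from math import sqrt

# Explanatory variables from the mini-case data table.
temp = [40, 27, 40, 73, 64, 34, 9, 8, 23, 63, 65, 41, 21, 38, 58]
insulation = [3, 3, 10, 6, 6, 6, 6, 10, 10, 3, 10, 6, 3, 3, 10]


def pearson_r(x, y):
    """Sample Pearson correlation coefficient between two equal-length lists."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


r = pearson_r(temp, insulation)
vif = 1 / (1 - r ** 2)  # two-variable case only

print(f"correlation between temperature and insulation: {r:.4f}")
print(f"variance inflation factor: {vif:.4f}")
```

A correlation near zero and a VIF near 1 would suggest that multicollinearity is not a concern for these two variables.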
Evaluating Multiple Regression Model Steps
• Examine variation measures
• Do residual analysis
• Test parameter significance
  – Overall model
  – Portions of model
  – Individual coefficients
• Test for multicollinearity
Dummy-Variable Regression Model
• Involves categorical X variable with two levels
  – e.g., female-male, employed-not employed, etc.
• Variable levels coded 0 & 1
• Assumes only intercept is different
  – Slopes are constant across categories
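The "only the intercept is different" assumption can be seen in a small sketch. The coefficients and category labels below are made up for illustration and do not come from the slides. The model is Y-hat = b0 + b1·X1 + b2·D, where D is the 0/1 dummy.

```python
# Hypothetical coefficients, chosen only to illustrate the structure.
B0, B1, B2 = 10.0, 2.0, 5.0

# 0/1 coding of a two-level categorical variable (hypothetical labels).
DUMMY_CODE = {"not employed": 0, "employed": 1}


def predict(x1, category):
    """Y-hat = b0 + b1*x1 + b2*D, with D the 0/1 dummy for the category."""
    d = DUMMY_CODE[category]
    return B0 + B1 * x1 + B2 * d


# The gap between the two categories is b2 at every value of x1,
# which is what "same slopes, different intercepts" means.
gaps = [predict(x1, "employed") - predict(x1, "not employed") for x1 in (0, 5, 10)]
print(gaps)
```

Letting the slope differ between categories would require an interaction term between X1 and the dummy.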
Dummy-Variable Model Relationships
[Figure: Y versus X1 for females and males, shown as two parallel lines with the same slope b1 and intercepts b0 and b0 + b2]
Dummy Variables
• Permits use of qualitative data (e.g., seasonal, class standing, location, gender).
• 0, 1 coding (nominal data)
• As part of diagnostic checking, incorporate outliers (i.e., large residuals) and influence measures.
Interaction Regression Model
• Hypothesizes interaction between pairs of X variables
  – Response to one X variable varies at different levels of another X variable
• Contains two-way cross product terms
  $Y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_1 x_2 + \varepsilon$
• Can be combined with other models, e.g. dummy variable models
Effect of Interaction
• Given: $Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \beta_3 X_{1i} X_{2i} + \varepsilon_i$
• Without interaction term, effect of X1 on Y is measured by $\beta_1$
• With interaction term, effect of X1 on Y is measured by $\beta_1 + \beta_3 X_2$
  – Effect increases as X2i increases (when $\beta_3 > 0$)
Interaction Example
Y = 1 + 2X1 + 3X2 + 4X1X2
For X2 = 0: Y = 1 + 2X1 + 3(0) + 4X1(0) = 1 + 2X1
For X2 = 1: Y = 1 + 2X1 + 3(1) + 4X1(1) = 4 + 6X1
Effect (slope) of X1 on Y does depend on X2 value
[Figure: Y (0 to 12) versus X1 (0 to 1.5), showing the two lines Y = 1 + 2X1 and Y = 4 + 6X1]
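The example can be verified numerically. This sketch uses the slide's equation Y = 1 + 2X1 + 3X2 + 4X1X2. The function names are hypothetical. The slope of X1 is measured as the change in Y when X1 rises by one unit with X2 held fixed.

```python
def y(x1, x2):
    """The slide's interaction model: Y = 1 + 2*X1 + 3*X2 + 4*X1*X2."""
    return 1 + 2 * x1 + 3 * x2 + 4 * x1 * x2


def slope_of_x1(x2):
    """Change in Y per unit change in X1, holding X2 fixed."""
    return y(1, x2) - y(0, x2)


for x2 in (0, 1):
    print(f"X2 = {x2}: intercept = {y(0, x2)}, slope of X1 = {slope_of_x1(x2)}")
```

The slope matches β1 + β3·X2 = 2 + 4·X2 from the previous slide, so it is 2 when X2 = 0 and 6 when X2 = 1.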
Inherently Linear Models
• Non-linear models that can be expressed in linear form
  – Can be estimated by least squares in linear form
• Require data transformation
Curvilinear Model Relationships
[Figure: four panels of Y versus X1 showing curvilinear relationships]
Logarithmic Transformation
$Y = \beta_0 + \beta_1 \ln x_1 + \beta_2 \ln x_2 + \varepsilon$
[Figure: Y versus X1, with curves for $\beta_1 > 0$ and $\beta_1 < 0$]
Square-Root Transformation
$Y_i = \beta_0 + \beta_1 \sqrt{X_{1i}} + \beta_2 \sqrt{X_{2i}} + \varepsilon_i$
[Figure: Y versus X1, with curves for $\beta_1 > 0$ and $\beta_1 < 0$]
Reciprocal Transformation
$Y_i = \beta_0 + \beta_1 \frac{1}{X_{1i}} + \beta_2 \frac{1}{X_{2i}} + \varepsilon_i$
[Figure: Y versus X1, with curves for $\beta_1 > 0$ and $\beta_1 < 0$, each approaching an asymptote]
Exponential Transformation
$Y_i = e^{\beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i}} \, \varepsilon_i$
[Figure: Y versus X1, with curves for $\beta_1 > 0$ and $\beta_1 < 0$]
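Taking the natural log of the exponential model gives ln Y = β0 + β1·X1 + β2·X2 + ln ε, which is linear in the coefficients and can be estimated by least squares. The sketch below shows the idea under simplifying assumptions: one explanatory variable, made-up coefficients, and no error term, so the fit recovers the coefficients exactly. None of the numbers come from the slides.

```python
from math import exp, log

# Hypothetical "true" coefficients for Y = exp(beta0 + beta1 * X).
BETA0, BETA1 = 0.5, 0.3

x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [exp(BETA0 + BETA1 * xi) for xi in x]  # curved in X on the original scale

# Data transformation: ln Y is a straight-line function of X.
ln_y = [log(yi) for yi in y]

# Simple least-squares slope and intercept on the transformed data.
mean_x = sum(x) / len(x)
mean_ln_y = sum(ln_y) / len(ln_y)
sxy = sum((xi - mean_x) * (li - mean_ln_y) for xi, li in zip(x, ln_y))
sxx = sum((xi - mean_x) ** 2 for xi in x)
b1 = sxy / sxx
b0 = mean_ln_y - b1 * mean_x

print(f"recovered intercept = {b0:.4f}, recovered slope = {b1:.4f}")
```

With real data the error term makes the recovered values estimates rather than exact matches, and predictions on the log scale need to be transformed back to the original units.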
Overview
• Explained the linear multiple regression model
• Interpreted linear multiple regression computer output
• Explained multicollinearity
• Described the types of multiple regression models
Source of Elaborate Slides
Prentice Hall, Inc
Levine et al., First Edition
Regression Analysis
[Multiple Regression]
*** End of Presentation ***
Questions?
Editor's Notes

  • #2: As a result of this class, you will be able to...
  • #23: Note: 1. As we move farther from the mean, the bands get wider. 2. The prediction interval bands are wider. Why? (extra Syx)
  • #32: Extrapolation: prediction outside the range of X values used to develop the equation. Interpolation: prediction within the range of X values used to develop the equation. The range is based on the smallest & largest X values.
  • #37: SSR is sum of squares regression (not residual; that’s SSE).
  • #45: Less chance of error than separate t-tests on each coefficient. Doing a series of t-tests leads to a higher overall Type I error than $\alpha$.