2. Outline
• Simple linear regression model
– Model parameters
– Distribution of error terms
• Estimation of regression parameters
– Method of least squares
– Maximum likelihood
3. Data for Simple Linear Regression
• Observe i=1,2,...,n pairs of variables
• Each pair often called a case
• Yi = ith response variable
• Xi = ith explanatory variable
4. Simple Linear Regression Model
• Yi = β0 + β1Xi + εi
• β0 is the intercept
• β1 is the slope
• εi is a random error term
– E(εi) = 0 and σ²(εi) = σ²
– εi and εj are uncorrelated
5. Simple Linear Normal Error Regression Model
• Yi = β0 + β1Xi + εi
• β0 is the intercept
• β1 is the slope
• εi is a Normally distributed random error with mean 0 and variance σ²
• εi and εj are uncorrelated → independent (for Normal errors, uncorrelated implies independent)
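To see what the Normal error model describes, it can help to simulate from it. A minimal SAS sketch, with assumed values β0 = 10, β1 = 2, σ = 1.5 and n = 20 (all chosen for illustration, not from the slides):

data sim;
  call streaminit(12345);                   * fix the seed for reproducibility;
  do i = 1 to 20;
    x = i;                                  * explanatory variable;
    y = 10 + 2*x + rand('normal', 0, 1.5);  * Y = beta0 + beta1*X + Normal(0, sigma) error;
    output;
  end;
run;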
6. Model Parameters
• β0 : the intercept
• β1 : the slope
• σ² : the variance of the error term
7. Features of Both Regression Models
• Yi = β0 + β1Xi + εi
• E(Yi) = β0 + β1Xi + E(εi) = β0 + β1Xi
• Var(Yi) = 0 + Var(εi) = σ²
– Mean of Yi determined by the value of Xi
– All possible means fall on a line
– The Yi vary about this line
8. Features of Normal Error Regression Model
• Yi = β0 + β1Xi + εi
• If εi is Normally distributed, then Yi is N(β0 + β1Xi, σ²) (A.36)
• This does not imply that the collection of Yi is Normally distributed: each Yi is Normal, but each has a different mean
9. Fitted Regression Equation and Residuals
• Ŷi = b0 + b1Xi
– b0 is the estimated intercept
– b1 is the estimated slope
• ei : residual for the ith case
• ei = Yi – Ŷi = Yi – (b0 + b1Xi)
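In SAS, fitted values and residuals come from an OUTPUT statement in PROC REG. A minimal sketch, assuming the Pisa data are in a data set named a1 with variables year and lean (the input name a1 is an assumption; a2 is the data set used on the next slide):

proc reg data=a1;
  model lean = year;
  output out=a2 p=pred r=resid;  * save fitted values (pred) and residuals (resid);
run;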
11. Plot the Residuals
proc gplot data=a2;
  plot resid*year / vref=0;
  where lean ne .;
run;
Continuation of pisa.sas
Uses the residual data set created by the OUTPUT statement above
vref=0 adds a horizontal reference line at zero to the plot
13. Least Squares
• Want to find the “best” b0 and b1
• Minimize Q = Σ(Yi – (b0 + b1Xi))²
• Use calculus: take the derivative with respect to b0 and with respect to b1, set the two resulting equations equal to zero, and solve for b0 and b1 (see the sketch after this list)
• See KNNL pgs 16-17
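Setting the two partial derivatives of Q to zero yields the normal equations. A worked sketch:

$$
\frac{\partial Q}{\partial b_0} = -2\sum_{i=1}^{n}\left(Y_i - b_0 - b_1 X_i\right) = 0,
\qquad
\frac{\partial Q}{\partial b_1} = -2\sum_{i=1}^{n} X_i \left(Y_i - b_0 - b_1 X_i\right) = 0
$$

Equivalently, ΣYi = n·b0 + b1·ΣXi and ΣXiYi = b0·ΣXi + b1·ΣXi². Solving this pair gives the estimates on the next slide.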
14. Least Squares Solution
• These are also the maximum likelihood estimators for the Normal error model; see KNNL pp 30-32

$$
b_1 = \frac{\sum_{i}(X_i - \bar{X})(Y_i - \bar{Y})}{\sum_{i}(X_i - \bar{X})^{2}},
\qquad
b_0 = \bar{Y} - b_1 \bar{X}
$$
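These formulas can also be verified by hand. A sketch, assuming the same a1 data set with X = year and Y = lean, skipping cases with missing lean:

proc means data=a1 noprint;
  where lean ne .;
  var year lean;
  output out=m mean=xbar ybar;   * sample means of X and Y;
run;
data _null_;
  if _n_ = 1 then set m;         * xbar and ybar are retained across iterations;
  set a1 end=eof;
  if lean ne . then do;
    sxy + (year - xbar)*(lean - ybar);  * accumulates the numerator of b1;
    sxx + (year - xbar)**2;             * accumulates the denominator of b1;
  end;
  if eof then do;
    b1 = sxy / sxx;
    b0 = ybar - b1*xbar;
    put b1= b0=;                 * write the estimates to the log;
  end;
run;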
15. Maximum Likelihood
• Yi ~ N(β0 + β1Xi, σ²), independent, so the density of Yi is

$$
f_i = \frac{1}{\sqrt{2\pi\sigma^{2}}}\, e^{-\frac{(Y_i - \beta_0 - \beta_1 X_i)^{2}}{2\sigma^{2}}}
$$

• L = f1 f2 ··· fn (likelihood function)
• Find β0 and β1 which maximize L
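A one-step sketch of why maximizing L reproduces the least squares estimates: taking logs,

$$
\ln L = -\frac{n}{2}\ln\!\left(2\pi\sigma^{2}\right) - \frac{1}{2\sigma^{2}}\sum_{i=1}^{n}\left(Y_i - \beta_0 - \beta_1 X_i\right)^{2}
$$

For any fixed σ², maximizing ln L over β0 and β1 means minimizing Σ(Yi – β0 – β1Xi)², which is exactly the least squares criterion; hence the ML estimators of β0 and β1 are b0 and b1 (KNNL pp 30-32).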
17. Standard output from Proc REG
Analysis of Variance

Source            DF   Sum of Squares   Mean Square   F Value   Pr > F
Model              1            15804         15804    904.12   <.0001
Error             11        192.28571      17.48052
Corrected Total   12            15997

Root MSE          4.18097    R-Square   0.9880
Dependent Mean  693.69231    Adj R-Sq   0.9869
Coeff Var         0.60271

Root MSE is s, the estimate of σ: s = √MSE = √(SSE/dfE)
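A quick arithmetic check of the summary statistics, using the printed (rounded) table values:

$$
s = \sqrt{MSE} = \sqrt{192.28571/11} = \sqrt{17.48052} = 4.18097,
\qquad
\text{Coeff Var} = 100 \cdot \frac{4.18097}{693.69231} = 0.60271
$$

and R-Square = SSM/SST = 15804/15997 ≈ 0.988, matching the printed value up to rounding.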
18. Properties of Least Squares Line
• The line always goes through the point of means (X̄, Ȳ)
• The residuals sum to zero:

$$
\sum_i e_i = \sum_i \left(Y_i - b_0 - b_1 X_i\right) = \sum_i Y_i - n b_0 - b_1 \sum_i X_i = n\left(\bar{Y} - b_0 - b_1 \bar{X}\right) = 0
$$

• Other properties on pgs 23-24
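A quick check of the first property: substituting X = X̄ into the fitted equation and using b0 = Ȳ – b1X̄,

$$
\hat{Y} = b_0 + b_1\bar{X} = \left(\bar{Y} - b_1\bar{X}\right) + b_1\bar{X} = \bar{Y}
$$

so the point (X̄, Ȳ) always lies on the least squares line.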
19. Background Reading
• Chapter 1
– 1.6 : Estimation of regression function
– 1.7 : Estimation of error variance
– 1.8 : Normal regression model
• Chapter 2
– 2.1 and 2.2 : inference concerning the β's
• Appendix A
– A.4, A.5, A.6, and A.7