HYPOTHESIS TESTING
MATH DETAILS
The OLS estimators of the slope and the intercept are
$$\hat\beta_1 = \frac{\sum_{i=1}^{n}(X_i-\bar X)(Y_i-\bar Y)}{\sum_{i=1}^{n}(X_i-\bar X)^2} = \frac{s_{XY}}{s_X^2} \qquad (4.7)$$

$$\hat\beta_0 = \bar Y - \hat\beta_1 \bar X. \qquad (4.8)$$
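As a minimal sketch, (4.7) and (4.8) translate directly into code. The function name `ols_simple` and the toy data are illustrative assumptions, not part of the slides. The ratio of raw sums equals $s_{XY}/s_X^2$ because the $1/(n-1)$ factors in the sample covariance and variance cancel.

```python
def ols_simple(x, y):
    """OLS intercept and slope for a regression of y on a single regressor x,
    computed directly from equations (4.7) and (4.8)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Numerator of (4.7): sum of cross-products of deviations from the means
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    # Denominator of (4.7): sum of squared deviations of X
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("X has no variation, so the slope is not defined")
    beta1_hat = sxy / sxx                  # (4.7)
    beta0_hat = y_bar - beta1_hat * x_bar  # (4.8)
    return beta0_hat, beta1_hat


# Hypothetical toy data lying exactly on Y = 1 + 2X
b0, b1 = ols_simple([0, 1, 2, 3], [1, 3, 5, 7])
print(b0, b1)  # 1.0 2.0
```

Because the toy points lie exactly on a line, OLS recovers that line's intercept and slope.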
MEASURES OF FIT (SW SECTION 4.3)
• Two regression statistics provide complementary measures of how well the regression
line “fits” or explains the data:
• The regression R2 measures the fraction of the variance of Y that is explained by X; it is
unitless and ranges between zero (no fit) and one (perfect fit)
• The standard error of the regression (SER) measures the magnitude of a typical
regression residual in the units of Y.
THE REGRESSION R2 IS THE FRACTION OF
THE SAMPLE VARIANCE OF Yi “EXPLAINED”
BY THE REGRESSION

$$Y_i = \hat Y_i + \hat u_i = \text{OLS prediction} + \text{OLS residual}$$

$$\text{sample var}(Y) = \text{sample var}(\hat Y_i) + \text{sample var}(\hat u_i) \quad (\text{why?})$$

$$\text{total sum of squares} = \text{``explained'' SS} + \text{``residual'' SS}$$

Definition of $R^2$:

$$R^2 = \frac{ESS}{TSS} = \frac{\sum_{i=1}^{n}(\hat Y_i - \bar{\hat Y})^2}{\sum_{i=1}^{n}(Y_i - \bar Y)^2}$$

• $R^2 = 0$ means ESS = 0
• $R^2 = 1$ means ESS = TSS
• $0 \le R^2 \le 1$
• For regression with a single X, $R^2$ = the square of the correlation coefficient between X and Y
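A small sketch can check the sums-of-squares decomposition and the single-regressor identity $R^2 = r_{XY}^2$ numerically. The function name `fit_measures` and the five data points are hypothetical, chosen only for illustration.

```python
def fit_measures(x, y):
    """Return (TSS, ESS, SSR, R^2, squared correlation) for OLS of y on x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    y_hat = [b0 + b1 * xi for xi in x]              # OLS predictions
    u_hat = [yi - yh for yi, yh in zip(y, y_hat)]   # OLS residuals
    y_hat_bar = sum(y_hat) / n  # equals y_bar when the regression has an intercept
    tss = syy
    ess = sum((yh - y_hat_bar) ** 2 for yh in y_hat)
    ssr = sum(u ** 2 for u in u_hat)
    r2 = ess / tss
    corr_sq = sxy ** 2 / (sxx * syy)  # squared sample correlation of X and Y
    return tss, ess, ssr, r2, corr_sq


tss, ess, ssr, r2, corr_sq = fit_measures([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(f"TSS={tss:.2f} ESS={ess:.2f} SSR={ssr:.2f} R2={r2:.3f} corr^2={corr_sq:.3f}")
# TSS=6.00 ESS=3.60 SSR=2.40 R2=0.600 corr^2=0.600
```

The output shows TSS = ESS + SSR and $R^2$ equal to the squared correlation, as the bullets state.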
DO DISTRICTS WITH SMALLER
CLASSES HAVE HIGHER TEST SCORES?
[Figure: scatterplot of test score v. student-teacher ratio]
What does this figure show?
WE NEED TO GET SOME NUMERICAL EVIDENCE
ON WHETHER DISTRICTS WITH LOW STRS HAVE
HIGHER TEST SCORES – BUT HOW?
1. Compare average test scores in districts with low STRs to those with high STRs
(“estimation”)
2. Test the “null” hypothesis that the mean test scores in the two types of districts are
the same, against the “alternative” hypothesis that they differ (“hypothesis testing”)
3. Estimate an interval for the difference in the mean test scores, high v. low STR districts
(“confidence interval”)
INITIAL DATA ANALYSIS: COMPARE
DISTRICTS WITH “SMALL” (STR < 20)
AND “LARGE” (STR ≥ 20) CLASS SIZES
Class size    Average score (Ȳ)    Standard deviation (s_Y)    n
Small         657.4                19.4                        238
Large         650.0                17.9                        182
1. Estimation of Δ = difference between group means
2. Test the hypothesis that Δ = 0
3. Construct a confidence interval for Δ
1. ESTIMATION
$$\bar Y_{small} - \bar Y_{large} = \frac{1}{n_{small}}\sum_{i=1}^{n_{small}} Y_i - \frac{1}{n_{large}}\sum_{i=1}^{n_{large}} Y_i = 657.4 - 650.0 = 7.4$$
Is this a large difference in a real-world sense?
• Standard deviation across districts = 19.1
• Difference between 60th and 75th percentiles of test score distribution
is 667.6 – 659.4 = 8.2
• Is this a big enough difference to be important for school reform
discussions, for parents, or for a school committee?
2. HYPOTHESIS TESTING
Difference-in-means test: compute the t-statistic (remember this?),

$$t = \frac{\bar Y_s - \bar Y_l}{\sqrt{\dfrac{s_s^2}{n_s} + \dfrac{s_l^2}{n_l}}} = \frac{\bar Y_s - \bar Y_l}{SE(\bar Y_s - \bar Y_l)}$$

where $SE(\bar Y_s - \bar Y_l)$ is the “standard error” of $\bar Y_s - \bar Y_l$, the subscripts s and l refer to “small” and “large” STR districts, and

$$s_s^2 = \frac{1}{n_s - 1}\sum_{i=1}^{n_s}(Y_i - \bar Y_s)^2 \quad \text{(etc.)}$$
2. HYPOTHESIS TESTING
Compute the difference-of-means t-statistic:
Class size    Ȳ        s_Y     n
small         657.4    19.4    238
large         650.0    17.9    182

$$t = \frac{\bar Y_s - \bar Y_l}{\sqrt{\dfrac{s_s^2}{n_s} + \dfrac{s_l^2}{n_l}}} = \frac{657.4 - 650.0}{\sqrt{\dfrac{19.4^2}{238} + \dfrac{17.9^2}{182}}} = \frac{7.4}{1.83} = 4.05$$

Since $|t| = 4.05 > 1.96$, reject (at the 5% significance level) the null hypothesis that the two means are the same.
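The arithmetic above can be reproduced with a short sketch that uses only the summary statistics from the table. The function name `diff_in_means_t` is a hypothetical helper; the standard error is the unpooled one from the formula above.

```python
import math


def diff_in_means_t(mean_s, sd_s, n_s, mean_l, sd_l, n_l):
    """Difference-in-means t-statistic with the unpooled standard error
    SE = sqrt(s_s^2 / n_s + s_l^2 / n_l). Returns (t, SE)."""
    se = math.sqrt(sd_s ** 2 / n_s + sd_l ** 2 / n_l)
    return (mean_s - mean_l) / se, se


# Summary statistics for small vs. large STR districts, from the table above
t, se = diff_in_means_t(657.4, 19.4, 238, 650.0, 17.9, 182)
reject_at_5pct = abs(t) > 1.96
print(f"SE = {se:.2f}, t = {t:.2f}, reject at 5%: {reject_at_5pct}")
# SE = 1.83, t = 4.05, reject at 5%: True
```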
3. CONFIDENCE INTERVAL
A 95% confidence interval for the difference between the means is,
$$(\bar Y_s - \bar Y_l) \pm 1.96 \times SE(\bar Y_s - \bar Y_l) = 7.4 \pm 1.96 \times 1.83 = (3.8,\ 11.0)$$
Two equivalent statements:
1. The 95% confidence interval for Δ doesn’t include 0;
2. The hypothesis that Δ = 0 is rejected at the 5% level.
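A sketch of the interval calculation, assuming the large-sample normal critical value 1.96. The helper name `normal_ci` is hypothetical. The final assertion checks the two equivalent statements directly: 0 lies outside the interval exactly when $|t| > 1.96$.

```python
import math


def normal_ci(estimate, se, z=1.96):
    """Large-sample confidence interval: estimate +/- z * SE (z = 1.96 for 95%)."""
    return estimate - z * se, estimate + z * se


delta_hat = 657.4 - 650.0
se = math.sqrt(19.4 ** 2 / 238 + 17.9 ** 2 / 182)
lo, hi = normal_ci(delta_hat, se)
print(f"95% CI for Delta: ({lo:.1f}, {hi:.1f})")  # (3.8, 11.0)

# Equivalence of the two statements: 0 outside the CI <=> |t| > 1.96
assert (not lo <= 0 <= hi) == (abs(delta_hat / se) > 1.96)
```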
HYPOTHESIS TESTING – OLS REGRESSION
HYPOTHESIS TESTING AND THE STANDARD ERROR OF β̂1
The objective is to test a hypothesis, like β1 = 0, using data – to reach a
tentative conclusion whether the (null) hypothesis is correct or incorrect.
General setup
Null hypothesis and two-sided alternative:
H0: β1 = β1,0 vs. H1: β1 ≠ β1,0
where β1,0 is the hypothesized value under the null.
Null hypothesis and one-sided alternative:
H0: β1 = β1,0 vs. H1: β1 < β1,0
GENERAL APPROACH: CONSTRUCT T-
STATISTIC, AND COMPUTE P-VALUE (OR
COMPARE TO THE N(0,1) CRITICAL VALUE)
• In general:

$$t = \frac{\text{estimator} - \text{hypothesized value}}{\text{standard error of the estimator}}$$

where the SE of the estimator is the square root of an estimator of the variance of the estimator.

• For testing the mean of Y:

$$t = \frac{\bar Y - \mu_{Y,0}}{s_Y/\sqrt{n}}$$

• For testing β1:

$$t = \frac{\hat\beta_1 - \beta_{1,0}}{SE(\hat\beta_1)},$$

where $SE(\hat\beta_1)$ = the square root of an estimator of the variance of the sampling distribution of $\hat\beta_1$.
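The general recipe "(estimator − hypothesized value) / SE" can be sketched as one function, then specialized to the mean of Y with $SE = s_Y/\sqrt{n}$. The names `t_stat` and `t_stat_mean` and the sample values are hypothetical; the same `t_stat` applies to $\hat\beta_1$ once $SE(\hat\beta_1)$ is available.

```python
import math
import statistics


def t_stat(estimate, hypothesized, se):
    """t = (estimator - hypothesized value) / standard error of the estimator."""
    return (estimate - hypothesized) / se


def t_stat_mean(sample, mu_0):
    """t-statistic for H0: E(Y) = mu_0, using SE(Y_bar) = s_Y / sqrt(n)."""
    n = len(sample)
    s_y = statistics.stdev(sample)  # sample standard deviation (n - 1 divisor)
    return t_stat(statistics.mean(sample), mu_0, s_y / math.sqrt(n))


y = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical sample; its mean is exactly 5
print(t_stat_mean(y, 5.0))    # 0.0
print(t_stat_mean(y, 4.0))
```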
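Formula for $SE(\hat\beta_1)$ (1 of 2)

Recall the expression for the variance of $\hat\beta_1$ (large n):

$$\operatorname{var}(\hat\beta_1) = \frac{\operatorname{var}[(X_i - \mu_X)u_i]}{n(\sigma_X^2)^2} = \frac{\sigma_v^2}{n(\sigma_X^2)^2}, \quad \text{where } v_i = (X_i - \mu_X)u_i.$$

The estimator of the variance of $\hat\beta_1$ replaces the unknown population values of $\sigma_v^2$ and $\sigma_X^2$ by estimators constructed from the data:

$$\hat\sigma^2_{\hat\beta_1} = \frac{1}{n} \times \frac{\text{estimator of } \sigma_v^2}{(\text{estimator of } \sigma_X^2)^2} = \frac{1}{n} \times \frac{\dfrac{1}{n-2}\sum_{i=1}^{n}\hat v_i^2}{\left[\dfrac{1}{n}\sum_{i=1}^{n}(X_i - \bar X)^2\right]^2},$$

where $\hat v_i = (X_i - \bar X)\hat u_i$.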
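Formula for $SE(\hat\beta_1)$ (2 of 2)

$$\hat\sigma^2_{\hat\beta_1} = \frac{1}{n} \times \frac{\dfrac{1}{n-2}\sum_{i=1}^{n}\hat v_i^2}{\left[\dfrac{1}{n}\sum_{i=1}^{n}(X_i - \bar X)^2\right]^2}, \quad \text{where } \hat v_i = (X_i - \bar X)\hat u_i.$$

$$SE(\hat\beta_1) = \sqrt{\hat\sigma^2_{\hat\beta_1}} = \text{the standard error of } \hat\beta_1$$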
This is a bit nasty, but:
• It is less complicated than it seems. The numerator estimates
var(v), the denominator estimates [var(X)]².
• Why the degrees-of-freedom adjustment n – 2? Because two
coefficients have been estimated (β0 and β1).
• $SE(\hat\beta_1)$ is computed by regression software.
• Your regression software has memorized this formula so you
don’t need to.
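To make the formula less "nasty", here is a sketch that follows it term by term: residuals, $\hat v_i$, the $1/(n-2)$-adjusted numerator, and the squared variance of X in the denominator. The name `robust_se_slope` and the data are hypothetical.

```python
import math


def robust_se_slope(x, y):
    """Return (beta1_hat, SE(beta1_hat)) using the heteroskedasticity-robust
    variance formula above, including the 1/(n - 2) adjustment."""
    n = len(x)
    if n != len(y) or n <= 2:
        raise ValueError("need equal-length x and y with n > 2")
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    beta1_hat = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    beta0_hat = y_bar - beta1_hat * x_bar
    # OLS residuals u_hat_i, then v_hat_i = (X_i - X_bar) * u_hat_i
    u_hat = [yi - beta0_hat - beta1_hat * xi for xi, yi in zip(x, y)]
    v_hat = [(xi - x_bar) * ui for xi, ui in zip(x, u_hat)]
    numerator = sum(v ** 2 for v in v_hat) / (n - 2)  # estimates var(v)
    denominator = (sxx / n) ** 2                      # estimates [var(X)]^2
    var_beta1 = numerator / denominator / n           # sigma-hat^2 of beta1_hat
    return beta1_hat, math.sqrt(var_beta1)


b1, se_b1 = robust_se_slope([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])  # hypothetical data
print(f"beta1_hat = {b1:.2f}, SE = {se_b1:.3f}")  # beta1_hat = 0.60, SE = 0.239
```

Algebraically this simplifies to $\frac{n}{n-2}\sum(X_i-\bar X)^2\hat u_i^2 / [\sum(X_i-\bar X)^2]^2$. To our understanding, this is the variant often labeled "HC1", and it is what Stata's `regress ..., robust` reports for the slope, but treat that mapping as an assumption and check it against your software's documentation.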
SUMMARY:
TO TEST H0: β1 = β1,0 V. H1: β1 ≠ β1,0
• Construct the t-statistic

$$t = \frac{\hat\beta_1 - \beta_{1,0}}{SE(\hat\beta_1)} = \frac{\hat\beta_1 - \beta_{1,0}}{\sqrt{\hat\sigma^2_{\hat\beta_1}}}$$

• Reject at the 5% significance level if $|t| > 1.96$.
• The p-value is $p = \Pr[|t| > |t^{act}|]$ = the probability in the tails of the normal outside $\pm|t^{act}|$; you reject at the 5% significance level if the p-value is < 5%.
• This procedure relies on the large-n approximation that $\hat\beta_1$ is normally distributed; typically n = 50 is large enough for the approximation to be excellent.
EXAMPLE: TEST SCORES AND STR,
CALIFORNIA DATA (1 OF 2)
Regression software reports the standard errors:
$$SE(\hat\beta_0) = 10.4, \qquad SE(\hat\beta_1) = 0.52$$

t-statistic testing $\beta_{1,0} = 0$:

$$t = \frac{\hat\beta_1 - \beta_{1,0}}{SE(\hat\beta_1)} = \frac{-2.28 - 0}{0.52} = -4.38$$

• The 1% two-sided critical value is 2.58; since |t| = 4.38 > 2.58, we reject the null at the 1% significance level.
• Alternatively, we can compute the p-value…
Estimated regression line: $\widehat{TestScore} = 698.9 - 2.28 \times STR$
EXAMPLE: TEST SCORES AND STR,
CALIFORNIA DATA (2 OF 2)
The p-value based on the large-n standard normal
approximation to the t-statistic is 0.00001 ($10^{-5}$).
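The example's t-statistic and large-n p-value can be reproduced from the two reported numbers, $\hat\beta_1 = -2.28$ and $SE(\hat\beta_1) = 0.52$. This sketch uses the normal approximation only; the variable names are illustrative.

```python
import math

beta1_hat, se_beta1 = -2.28, 0.52  # estimate and SE reported on the slide
beta1_0 = 0.0                      # H0: beta1 = 0

t_act = (beta1_hat - beta1_0) / se_beta1
p_value = math.erfc(abs(t_act) / math.sqrt(2))  # 2 * Phi(-|t|), large-n normal
reject_1pct = abs(t_act) > 2.58                 # 1% two-sided critical value
print(f"t = {t_act:.2f}, p = {p_value:.1e}, reject at 1%: {reject_1pct}")
```

The printed p-value is on the order of $10^{-5}$, matching the slide.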
CONFIDENCE INTERVALS FOR β1
(SECTION 5.2)
Recall that a 95% confidence interval is, equivalently:
• The set of points that cannot be rejected at the 5% significance
level;
• A set-valued function of the data (an interval that is a function
of the data) that contains the true parameter value 95% of the
time in repeated samples.
Because the t-statistic for β1 is N(0,1) in large samples, construction of a 95%
confidence interval for β1 is just like the case of the sample mean:

$$95\% \text{ confidence interval for } \beta_1 = \{\hat\beta_1 \pm 1.96 \times SE(\hat\beta_1)\}$$
CONFIDENCE INTERVAL EXAMPLE

$$SE(\hat\beta_0) = 10.4, \qquad SE(\hat\beta_1) = 0.52$$

95% confidence interval for $\beta_1$:

$$\{\hat\beta_1 \pm 1.96 \times SE(\hat\beta_1)\} = \{-2.28 \pm 1.96 \times 0.52\} = (-3.30,\ -1.26)$$
The following two statements are equivalent (why?)
• The 95% confidence interval does not include zero;
• The hypothesis β1 = 0 is rejected at the 5% level
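One way to see why the two statements are equivalent is to invert the test numerically: scan many hypothesized values $\beta_{1,0}$, keep those not rejected at 5%, and compare the surviving range with $\hat\beta_1 \pm 1.96\,SE(\hat\beta_1)$. The grid range and step are arbitrary choices for this sketch.

```python
beta1_hat, se_beta1 = -2.28, 0.52  # values from the example
z = 1.96
ci = (beta1_hat - z * se_beta1, beta1_hat + z * se_beta1)


def not_rejected(beta1_0):
    """True if H0: beta1 = beta1_0 is NOT rejected at the 5% level."""
    return abs((beta1_hat - beta1_0) / se_beta1) <= z


# Scan hypothesized values from -5.000 to 1.000 in steps of 0.001
grid = [-5 + k / 1000 for k in range(6001)]
accepted = [b for b in grid if not_rejected(b)]
print(f"CI: ({ci[0]:.2f}, {ci[1]:.2f}); "
      f"non-rejected values span ({min(accepted):.3f}, {max(accepted):.3f})")
print("H0: beta1 = 0 rejected:", not not_rejected(0.0))
```

The non-rejected values fill the confidence interval (up to the grid spacing), and 0 is outside it, so β1 = 0 is rejected.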
OLS REGRESSION: READING STATA
OUTPUT
regress testscr str, robust
Regression with robust standard errors Number of obs = 420
F( 1, 418) = 19.26
Prob > F = 0.0000
R-squared = 0.0512
Root MSE = 18.581
-------------------------------------------------------------------------
| Robust
testscr | Coef. Std. Err. t P>|t| [95% Conf. Interval]
--------+----------------------------------------------------------------
str | -2.279808 .5194892 -4.38 0.000 -3.300945 -1.258671
_cons | 698.933 10.36436 67.44 0.000 678.5602 719.3057
-------------------------------------------------------------------------
so:

$$\widehat{TestScore} = \underset{(10.4)}{698.9} - \underset{(0.52)}{2.28} \times STR, \qquad R^2 = .05, \quad SER = 18.6$$

$$t\,(\beta_1 = 0) = -4.38, \qquad p\text{-value} = 0.000 \text{ (2-sided)}$$

95% 2-sided confidence interval for $\beta_1$ is (−3.30, −1.26)
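A quick consistency check of the `str` row of the Stata output, using only the printed numbers. With a single regressor, the reported F(1, 418) should equal $t^2$. The critical value implied by Stata's interval comes out slightly above 1.96. Our reading, which the slide does not state, is that `regress` uses a Student t distribution with n − 2 = 418 degrees of freedom. That would explain why its interval is a touch wider than the normal-based one, even though both round to (−3.30, −1.26).

```python
coef, se = -2.279808, 0.5194892           # str row: Coef. and Robust Std. Err.
stata_lo, stata_hi = -3.300945, -1.258671  # str row: reported 95% interval

t = coef / se
f_stat = t ** 2                        # one regressor: F(1, n - 2) = t^2
implied_crit = (coef - stata_lo) / se  # critical value behind Stata's interval

normal_lo, normal_hi = coef - 1.96 * se, coef + 1.96 * se
print(f"t = {t:.3f}, t^2 = {f_stat:.2f}, implied critical value = {implied_crit:.3f}")
print(f"normal-based 95% CI: ({normal_lo:.3f}, {normal_hi:.3f})")
```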
SUMMARY OF STATISTICAL
INFERENCE ABOUT β0 AND β1
Estimation:
• OLS estimators $\hat\beta_0$ and $\hat\beta_1$
• $\hat\beta_0$ and $\hat\beta_1$ have approximately normal sampling distributions in large samples
Testing:
• H0: β1 = β1,0 v. H1: β1 ≠ β1,0 (β1,0 is the value of β1 under H0)
• $t = (\hat\beta_1 - \beta_{1,0})/SE(\hat\beta_1)$
• p-value = area under the standard normal outside $\pm|t^{act}|$ (large n)
Confidence Intervals:
• 95% confidence interval for β1 is $\{\hat\beta_1 \pm 1.96 \times SE(\hat\beta_1)\}$
• This is the set of β1 that is not rejected at the 5% level
• The 95% CI contains the true β1 in 95% of all samples.

Editor's Notes

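  • #12 (slide “1. ESTIMATION”): create the small-class indicator and summarize test scores by group (Stata):
      * less_20 = 1 for districts with STR < 20, 0 otherwise
      gen less_20 = 0
      replace less_20 = 1 if str < 20
      bysort less_20: sum testscr
  • #13 (slide “2. HYPOTHESIS TESTING”): store each group's mean as a scalar (Stata):
      sum testscr if less_20 == 1
      scalar mean_less20 = r(mean)
      sum testscr if less_20 == 0
      scalar mean_more20 = r(mean)
      * difference in group means (the estimate of Δ)
      display mean_less20 - mean_more20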