Chapter 9 Linear Regression
How does the value of one variable depend on that of another?
How does a son’s height depend on his father’s height?
How does the death rate of animals depend on the drug dosage?
How does an infant’s weight depend on age in months?
How does body surface area depend on height?
Aim: to explore quantitatively the linear dependence between two continuous variables.
8.1 Statistical Description of Linear Regression
8.1.1 Linear regression equation
Initial meaning of “regression”: Galton noted that if the father is tall, his son will be relatively tall; if the father is short, his son will be relatively short. But if the father is very tall, his son is usually not taller than his father; if the father is very short, his son is usually not shorter than his father. Otherwise, ……?! Galton called this phenomenon “regression to the mean”.
Independent variable (explanatory variable), X: varies randomly or is fixed by the researcher.
Dependent variable (response variable), Y: varies randomly around a linear function of X.
What is regression in statistics? To find the track of the means.
[Figure: scatter plot of son’s height (cm) against father’s height (cm); both axes run from 100 to 220.]
Given the value of $X$, $Y$ varies around a center, $\mu_{y|x}$. All the centers lie on a line, the regression line. The relationship between the center $\mu_{y|x}$ and $X$ is described by a linear equation:
$\mu_{y|x} = \alpha + \beta X$
Linear regression: try to estimate $\alpha$ and $\beta$, getting
$\hat{Y} = a + bX$
where
$a$: estimate of $\alpha$, the intercept
$b$: estimate of $\beta$, the slope
$\hat{Y}$: estimate of $\mu_{y|x}$
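8.1.2 Regression coefficient and its calculation
To find the straight line that best fits the points.
Residual: $e_i = Y_i - \hat{Y}_i$
Fitness of the regression line: $\sum e_i^2 = \sum (Y_i - \hat{Y}_i)^2$
Principle of least squares: find the straight line that minimizes the sum of squared residuals.
Under this principle, the formulas for $a$ and $b$ follow easily by calculus:
$b = \dfrac{l_{xy}}{l_{xx}} = \dfrac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{\sum (X_i - \bar{X})^2}$ (8.3)
$a = \bar{Y} - b\bar{X}$ (8.4)
Such a line must go through the point $(\bar{X}, \bar{Y})$ and cross the vertical axis at $a$. Why?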
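The least-squares calculation can be sketched in a few lines of Python. This is a minimal illustration, not the computation from the slides: the function name `fit_line` and the five pairs of heights are made up here, and the code uses the standard formulas $b = l_{xy}/l_{xx}$ and $a = \bar{Y} - b\bar{X}$, where $l_{xx} = \sum (X_i - \bar{X})^2$ and $l_{xy} = \sum (X_i - \bar{X})(Y_i - \bar{Y})$.

```python
from statistics import mean


def fit_line(xs, ys):
    """Return (a, b) for the least-squares line y_hat = a + b * x."""
    x_bar, y_bar = mean(xs), mean(ys)
    # Sum of squares of X and sum of cross-products of X and Y.
    l_xx = sum((x - x_bar) ** 2 for x in xs)
    l_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = l_xy / l_xx          # slope
    a = y_bar - b * x_bar    # intercept
    return a, b


# Hypothetical heights in cm, for illustration only (NOT the data of Example 8.1).
fathers = [165.0, 170.0, 175.0, 180.0, 185.0]
sons = [168.0, 171.0, 176.0, 178.0, 182.0]

a, b = fit_line(fathers, sons)
x_bar, y_bar = mean(fathers), mean(sons)
print(f"y_hat = {a:.2f} + {b:.2f} x")
# Because a = y_bar - b * x_bar, the fitted value at x_bar is exactly y_bar.
print(a + b * x_bar, y_bar)
```

With these made-up numbers the slope is below 1, which is the pattern Galton described: the sons of very tall fathers are tall, but on average less extreme than their fathers.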
Example 8.1 Calculate the regression equation of the son’s height $Y$ on the father’s height $X$.
 
8.2 Statistical Inference on Regression
8.2.1 Hypothesis tests
8.2.1.1 The t-test for the regression coefficient
$b$ is the sample regression coefficient; it changes from sample to sample. There is a population regression coefficient, denoted by $\beta$.
Question: is $\beta = 0$ or not?
$H_0$: $\beta = 0$, $H_1$: $\beta \neq 0$, $\alpha = 0.05$
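Statistic: $t_b = \dfrac{b - 0}{s_b}$, with $\nu = n - 2$ degrees of freedom
Standard deviation of regression coefficient: $s_b = \dfrac{s_{y \cdot x}}{\sqrt{l_{xx}}}$, where $l_{xx} = \sum (X_i - \bar{X})^2$
Standard deviation of residual: $s_{y \cdot x} = \sqrt{\dfrac{\sum (Y_i - \hat{Y}_i)^2}{n - 2}}$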
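As a rough sketch of how the test statistic is assembled, the following Python computes the residual standard deviation, the standard error of the slope, and $t_b = b / s_b$ with $n - 2$ degrees of freedom. The function name `slope_t_test` and the data are hypothetical (they are not the data of Example 8.1), and the p-value is left to a $t$ table or a statistics package.

```python
import math
from statistics import mean


def slope_t_test(xs, ys):
    """Return (b, s_b, t, df) for testing H0: beta = 0 in simple linear regression."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    l_xx = sum((x - x_bar) ** 2 for x in xs)
    l_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = l_xy / l_xx
    a = y_bar - b * x_bar
    # Residual sum of squares and residual standard deviation.
    ss_res = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    s_yx = math.sqrt(ss_res / (n - 2))
    # Standard error of the slope.
    s_b = s_yx / math.sqrt(l_xx)
    return b, s_b, b / s_b, n - 2


# Hypothetical heights in cm, for illustration only.
fathers = [165.0, 170.0, 175.0, 180.0, 185.0]
sons = [168.0, 171.0, 176.0, 178.0, 182.0]

b, s_b, t, df = slope_t_test(fathers, sons)
print(f"b = {b:.3f}, s_b = {s_b:.4f}, t = {t:.2f}, df = {df}")
```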
For Example 8.1
$H_0$: $\beta = 0$, $H_1$: $\beta \neq 0$
$p < 0.001$. Reject $H_0$: the regression of the son’s height on the father’s height is statistically significant.
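8.2.1.2 Analysis of variance
$H_0$: The contribution of the linear regression is 0
$H_1$: The contribution of the linear regression is not 0
(1) Before regression, we can only use $\bar{Y}$ to estimate $Y$: $SS_{total} = \sum (Y_i - \bar{Y})^2$
(2) After regression, we can use $\hat{Y}$ to estimate $Y$: $SS_{residual} = \sum (Y_i - \hat{Y}_i)^2$
(3) The regression makes the sum of squared deviations decline by $SS_{regression} = SS_{total} - SS_{residual}$
(4) To test whether the contribution of the regression is 0, the $F$-statistic is used: $F = \dfrac{SS_{regression}/1}{SS_{residual}/(n - 2)}$, with degrees of freedom 1 and $n - 2$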
For Example 8.1
Conclusion: the regression of the son’s height on the father’s height is statistically significant.
The slight difference between the two approaches: the $t$ test can be used for both one-sided and two-sided problems; ANOVA is for two-sided problems only. However, the idea of ANOVA extends easily to nonlinear regression and multiple regression.
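The decomposition behind the $F$ test can be checked numerically. The sketch below uses made-up data (not Example 8.1) and illustrative variable names; it splits the total sum of squared deviations into a regression part and a residual part, then forms $F$ with 1 and $n - 2$ degrees of freedom. In simple linear regression this $F$ equals the square of the $t$ statistic for the slope, which is why the two tests agree for two-sided problems.

```python
from statistics import mean

# Hypothetical heights in cm, for illustration only.
xs = [165.0, 170.0, 175.0, 180.0, 185.0]
ys = [168.0, 171.0, 176.0, 178.0, 182.0]
n = len(xs)

x_bar, y_bar = mean(xs), mean(ys)
l_xx = sum((x - x_bar) ** 2 for x in xs)
l_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
b = l_xy / l_xx
a = y_bar - b * x_bar
y_hat = [a + b * x for x in xs]

# Before regression: deviations of Y from its mean.
ss_total = sum((y - y_bar) ** 2 for y in ys)
# After regression: deviations of Y from the fitted line.
ss_res = sum((y - yh) ** 2 for y, yh in zip(ys, y_hat))
# The decline achieved by the regression.
ss_reg = sum((yh - y_bar) ** 2 for yh in y_hat)

f_stat = (ss_reg / 1) / (ss_res / (n - 2))
print(f"SS_total = {ss_total:.2f}, SS_reg = {ss_reg:.2f}, SS_res = {ss_res:.2f}")
print(f"F = {f_stat:.2f} with df (1, {n - 2})")
```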
8.2.2 Determination coefficient
Determination coefficient: $R^2 = \dfrac{SS_{regression}}{SS_{total}}$
For Example 8.1
Contribution of the regression, in %: it reflects the percentage of the total sum of squared deviations that can be explained by the regression.
If both $X$ and $Y$ are random variables, $R^2 = r^2$, where $r$ is the correlation coefficient.
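A short numerical check of the two statements above, under the assumption that $R^2$ is computed as the regression sum of squares over the total sum of squares and $r$ is the Pearson correlation coefficient. The data and variable names are hypothetical.

```python
import math
from statistics import mean

# Hypothetical heights in cm, for illustration only.
xs = [165.0, 170.0, 175.0, 180.0, 185.0]
ys = [168.0, 171.0, 176.0, 178.0, 182.0]

x_bar, y_bar = mean(xs), mean(ys)
l_xx = sum((x - x_bar) ** 2 for x in xs)
l_yy = sum((y - y_bar) ** 2 for y in ys)
l_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))

b = l_xy / l_xx
a = y_bar - b * x_bar

ss_total = l_yy
ss_reg = sum(((a + b * x) - y_bar) ** 2 for x in xs)
r_squared = ss_reg / ss_total          # determination coefficient
r = l_xy / math.sqrt(l_xx * l_yy)      # Pearson correlation coefficient

print(f"R^2 = {r_squared:.4f}, r^2 = {r ** 2:.4f}")
```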
In practice, it is suggested to report the determination coefficient after a regression analysis, to describe how good the regression is.
Here is a story:
$Y$: an index of liver function
$X$: a score for psychological status
The regression is statistically significant.
Claimed: “the index for liver function can be improved by psychological consultation”
Is this wrong? Why?
8.3 The Application of Linear Regression
8.3.1 Two interval estimations
8.3.1.1 Confidence interval for $\mu_{y|x}$
8.3.1.2 Prediction interval for $Y$
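The formulas on this slide did not survive extraction. For orientation, the standard textbook forms of the two intervals at a given value $X_0$ are shown below; the notation ($X_0$, $\hat{Y}_0$, $s_{y \cdot x}$, $l_{xx}$) is an assumption chosen to match the rest of the chapter and may differ from the original slide.

```latex
% Fitted value at X_0
\hat{Y}_0 = a + b X_0

% (1 - \alpha) confidence interval for the mean \mu_{y|x_0}
\hat{Y}_0 \pm t_{\alpha/2,\, n-2}\; s_{y \cdot x}
  \sqrt{\frac{1}{n} + \frac{(X_0 - \bar{X})^2}{l_{xx}}}

% (1 - \alpha) prediction interval for an individual Y at X_0
\hat{Y}_0 \pm t_{\alpha/2,\, n-2}\; s_{y \cdot x}
  \sqrt{1 + \frac{1}{n} + \frac{(X_0 - \bar{X})^2}{l_{xx}}}
```

The prediction interval is always wider than the confidence interval because it adds the variation of an individual $Y$ around its mean, and both widen as $X_0$ moves away from $\bar{X}$.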
8.3.3 On the basic assumptions: LINE
(1) Linear: there is a linear tendency between the dependent variable and the independent variable.
(2) Independent: the individual observations are independent of each other.
(3) Normal: given the value of $X$, the corresponding $Y$ follows a normal distribution.
(4) Equal variances: the variances of $Y$ for different values of $X$ are all equal, denoted by $\sigma^2$.
In practice, one may use a scatter diagram to check whether the basic assumptions are met.
The assumption of linearity is essential: using a linear model to describe a curvilinear relationship is obviously inappropriate.
The assumption of independence is also essential.
Violating the assumptions of normality and equal variance might not seriously affect the least-squares estimates, though the formulas introduced for statistical inference might no longer be valid.
When assumptions (1), (3) or (4) are violated, some transformations are worth trying.
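As one possible illustration of the last point, the sketch below fits a line to deliberately curvilinear made-up data ($Y = e^{0.5X}$) and then to the log-transformed response. On the raw scale the residuals show a systematic pattern (positive, negative, positive), a sign that linearity fails; after the log transformation they vanish up to rounding. The helper `residuals` and the data are hypothetical, and a logarithm is only one of several transformations that may be tried.

```python
import math
from statistics import mean


def residuals(xs, ys):
    """Fit y_hat = a + b * x by least squares and return the residuals y - y_hat."""
    x_bar, y_bar = mean(xs), mean(ys)
    l_xx = sum((x - x_bar) ** 2 for x in xs)
    l_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = l_xy / l_xx
    a = y_bar - b * x_bar
    return [y - (a + b * x) for x, y in zip(xs, ys)]


# Made-up curvilinear data: Y grows exponentially with X.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [math.exp(0.5 * x) for x in xs]

raw = residuals(xs, ys)                            # linear model on the raw scale
logged = residuals(xs, [math.log(y) for y in ys])  # linear model on the log scale

print("raw residuals:   ", [round(e, 3) for e in raw])
print("logged residuals:", [round(e, 3) for e in logged])
```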
Summary: Regression and Correlation
1. Distinction and connection
Distinction:
Correlation: both $X$ and $Y$ are random.
Regression: $Y$ must be random; $X$ may be random or not random.
Connection: when both $X$ and $Y$ are random,
1) The correlation coefficient $r$ and the regression coefficient $b$ have the same sign.
2) The $t$ tests are equivalent: $t_r = t_b$.
3) Determination coefficient: $R^2 = r^2$.
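The first two connections can be verified numerically. The sketch below assumes the usual test statistic for a correlation coefficient, $t_r = r\sqrt{n-2}/\sqrt{1-r^2}$, and the slope statistic $t_b = b/s_b$; the data are made up for illustration.

```python
import math
from statistics import mean

# Hypothetical heights in cm, for illustration only.
xs = [165.0, 170.0, 175.0, 180.0, 185.0]
ys = [168.0, 171.0, 176.0, 178.0, 182.0]
n = len(xs)

x_bar, y_bar = mean(xs), mean(ys)
l_xx = sum((x - x_bar) ** 2 for x in xs)
l_yy = sum((y - y_bar) ** 2 for y in ys)
l_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))

# Correlation coefficient and its t statistic.
r = l_xy / math.sqrt(l_xx * l_yy)
t_r = r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

# Regression coefficient and its t statistic.
b = l_xy / l_xx
ss_res = l_yy - l_xy ** 2 / l_xx
s_b = math.sqrt(ss_res / (n - 2)) / math.sqrt(l_xx)
t_b = b / s_b

print(f"r = {r:.4f}, b = {b:.4f}")      # same sign, since both share l_xy
print(f"t_r = {t_r:.4f}, t_b = {t_b:.4f}")
```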
2. Cautions for regression and correlation
1) Don’t put just any two variables together for correlation and regression; they must have some relation in subject matter.
2) Correlation and regression do not necessarily mean causality; sometimes the relation is indirect, or there is no real relation at all.
3) A big value of $r$ does not necessarily mean a big regression coefficient $b$.
4) Rejecting $H_0$ does not necessarily mean that the correlation is strong, only that $\rho \neq 0$.
5) A statistically significant regression equation does not necessarily mean that one can predict $Y$ well from $X$, only that $\beta \neq 0$; how well one can predict depends on the coefficient of determination.
6) A scatter diagram is useful before working with linear correlation and linear regression.
7) The regression equation should not be applied beyond the range of the data set.
 
