MULTIPLE REGRESSION
ECON 355 – Regression Analysis
SOME LOGISTICS
• Verify the current directory path
• In Stata, type pwd to show the current directory.
• Use cd "path" to change the directory (use straight quotes, not curly ones)
NEW STATA FUNCTIONS AND OPTIONS
• preserve/restore – preserve saves a copy of the data currently in memory; restore returns to that copy, undoing any changes made to the data after preserve
• drop/keep – drop removes the observations or variables you specify; keep retains only those and drops everything else
• Example:
• Work with the realestate dataset
• preserve
• drop age
• restore
NEW STATA FUNCTIONS AND OPTIONS CONT’D
• Another example:
• preserve
• hist price
• keep if age > 100
• hist price
• restore
HEDONIC PRICING
• We are going to discuss how the size of the house affects the relationship between its
price and its age
• What is the relationship between the price of the house and its age in general?
• Are all the houses in our sample the same size? Let’s look at the descriptive statistics and histogram of house size (sqft).
• We are going to divide our data sample into 5 groups depending on the size of the
house (under 1000 sqft, 1000-2000 sqft, 2000-3000 sqft, 3000-4000 sqft, 4000-5000
sqft) and see if the relationship between price and age changes for any of these groups.
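A minimal Python/NumPy sketch of the size-group idea, using simulated data in place of the realestate dataset (the data-generating process and every number in it are assumptions chosen for illustration). Larger houses are built to be newer, so the pooled price-on-age slope mixes the age effect with the size effect, while slopes estimated within 1000-sqft groups come much closer to the true age effect of -0.5.

```python
import numpy as np

# Hypothetical simulated data (NOT the course's realestate dataset):
# larger houses tend to be newer, and size raises price.
rng = np.random.default_rng(0)
n = 2000
sqft = rng.uniform(500, 5000, n)
age = np.clip(80 - 0.012 * sqft + rng.normal(0, 10, n), 0, None)
price = 50 + 0.1 * sqft - 0.5 * age + rng.normal(0, 20, n)  # price in $1000s

def slope(y, x):
    """OLS slope from regressing y on x with an intercept."""
    X = np.column_stack([np.ones_like(x), x])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef[1]

# Pooled regression: the age coefficient also picks up the size effect.
pooled = slope(price, age)

# Five size groups, mirroring the 1000-sqft bins on the slide.
edges = [0, 1000, 2000, 3000, 4000, 5000]
by_group = {}
for lo, hi in zip(edges[:-1], edges[1:]):
    mask = (sqft >= lo) & (sqft < hi)
    by_group[f"{lo}-{hi}"] = slope(price[mask], age[mask])

print(f"pooled age slope: {pooled:.2f}")
for label, b in by_group.items():
    print(f"{label:>10} sqft: {b:.2f}")
```

In Stata the same loop is simply one reg price age per group, each restricted with an if condition on sqft.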
THE SIZE OF THE HOUSE MATTERS!
• The size of a house is not only related to its price but also, most likely, to its age. Intuitively, why do you think the size and the age of a house might be related?
• In general, we want to include in the regression every variable that plausibly affects Y and is correlated with X
• Do you think the number of bedrooms and bathrooms can also be related to the age of
the house and potentially affect its price?
• If so, we should probably include them in the regression too. What is the relationship
between the price of the house and its age now?
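Adding size as a control does the same job as splitting the sample, but in a single regression. The Python/NumPy sketch below uses the same kind of simulated data (again an assumption, not the course data) and compares the age coefficient with and without sqft in the regression.

```python
import numpy as np

# Hypothetical simulated data: true age effect is -0.5, and age is
# negatively correlated with sqft, which also raises price.
rng = np.random.default_rng(1)
n = 5000
sqft = rng.uniform(500, 5000, n)
age = 80 - 0.012 * sqft + rng.normal(0, 10, n)
price = 50 + 0.1 * sqft - 0.5 * age + rng.normal(0, 20, n)

def ols(y, *regressors):
    """OLS coefficients: [constant, then one per regressor]."""
    X = np.column_stack([np.ones_like(y), *regressors])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

b_short = ols(price, age)        # price on age only
b_long = ols(price, age, sqft)   # price on age, controlling for sqft
print(f"age coefficient, no control:   {b_short[1]:.2f}")
print(f"age coefficient, sqft control: {b_long[1]:.2f}")
```

Without the control, the age coefficient absorbs part of the size effect; with sqft held constant it lands near the true -0.5.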
IN GENERAL
• The population regression now looks like this:
• $Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \beta_3 X_{3i} + \dots + \beta_{k-1} X_{k-1,i} + \beta_k X_{ki} + u_i$
• The interpretation of the betas changes slightly. Since more than one independent variable is included, the beta on any one of them is interpreted holding the others constant.
• i.e. $\beta_1 = \frac{\Delta Y}{\Delta X_1}$, holding everything else constant (ceteris paribus).
• A one-unit change in $X_1$ changes $Y$ by $\beta_1$, holding everything else constant
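The ceteris paribus reading of β1 can be checked numerically: in a fitted model, raising X1 by one unit while holding X2 fixed changes the prediction by exactly β̂1. A Python/NumPy sketch on simulated data (the true coefficients 2 and -3 and the correlation between the regressors are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 500
x1 = rng.normal(size=n)
x2 = 0.6 * x1 + rng.normal(size=n)   # the regressors are correlated
y = 1.0 + 2.0 * x1 - 3.0 * x2 + rng.normal(size=n)

X = np.column_stack([np.ones(n), x1, x2])
b, *_ = np.linalg.lstsq(X, y, rcond=None)   # b = [b0, b1, b2]

def predict(x1_val, x2_val):
    """Fitted value from the estimated regression."""
    return b[0] + b[1] * x1_val + b[2] * x2_val

# Raise X1 by one unit while holding X2 fixed at its sample mean.
x2_fixed = x2.mean()
effect = predict(1.0, x2_fixed) - predict(0.0, x2_fixed)
print(f"beta1_hat = {b[1]:.3f}, predicted change = {effect:.3f}")
```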
BACK TO THE HEDONIC PRICING EXAMPLE
• Before we interpret the betas in our multiple regression, let’s figure out the measurement units of each variable
• Price, beds, baths, age, sqft
• What does the population regression look like when we regress the price of a house on its age, square feet, number of bedrooms, and number of bathrooms?
• What does the fitted regression look like?
• Please interpret each of the betas except the constant.
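For reference, one way to write this model, with Price, Age, Sqft, Beds and Baths standing for the variables listed above (the names are placeholders, not necessarily the dataset's variable names):

```latex
\text{Price}_i = \beta_0 + \beta_1\,\text{Age}_i + \beta_2\,\text{Sqft}_i + \beta_3\,\text{Beds}_i + \beta_4\,\text{Baths}_i + u_i

\widehat{\text{Price}}_i = \hat\beta_0 + \hat\beta_1\,\text{Age}_i + \hat\beta_2\,\text{Sqft}_i + \hat\beta_3\,\text{Beds}_i + \hat\beta_4\,\text{Baths}_i

\hat\beta_1 = \frac{\Delta \widehat{\text{Price}}}{\Delta \text{Age}} \quad \text{holding Sqft, Beds and Baths constant}
```

Assuming the units given in the editor's notes (price in thousands of dollars, age in years), β̂1 reads as the change in predicted price, in thousands of dollars, associated with one additional year of age, holding square feet, bedrooms and bathrooms constant.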
STATA – CREATING TABLES
• To be able to compare the results of different regressions with ease we usually create tables.
• You can see an example of a table on Blackboard. We are going to try to replicate the table in Stata
• Install the outreg2 command (once): ssc install outreg2
• Each regression has to be added to the table separately: run outreg2 right after the regress command whose results it should report
• Stata command: outreg2 using tablename.doc
• Every new column has to be added to the already existing table
• Stata: outreg2 using tablename.doc, append
• To start over with a new document of the same name (overwriting the old one):
• Stata: outreg2 using tablename.doc, replace
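outreg2 does the bookkeeping of lining up coefficients from nested models in columns. The Python/NumPy sketch below imitates that layout on simulated data; it is not outreg2, the variable names and data are hypothetical, and it reports each coefficient with its homoskedasticity-only standard error in parentheses, leaving a blank where a model excludes a variable.

```python
import numpy as np

def ols_with_se(y, X):
    """OLS coefficients and homoskedasticity-only standard errors."""
    n, k = X.shape
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    sigma2 = resid @ resid / (n - k)
    se = np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))
    return coef, se

def regression_table(y, data, models):
    """One column per model: 'coef (se)', blank where a variable is excluded."""
    names = []
    for model in models:
        for v in model:
            if v not in names:
                names.append(v)
    names.append("_cons")  # constant listed last, as in Stata output
    table = {v: [""] * len(models) for v in names}
    for j, model in enumerate(models):
        X = np.column_stack([data[v] for v in model] + [np.ones(len(y))])
        coef, se = ols_with_se(y, X)
        for v, b, s in zip(list(model) + ["_cons"], coef, se):
            table[v][j] = f"{b:.3f} ({s:.3f})"
    return table

# Hypothetical simulated housing data (price in $1000s).
rng = np.random.default_rng(4)
n = 1000
data = {
    "sqft": rng.uniform(500, 5000, n),
    "beds": rng.integers(1, 6, n).astype(float),
    "baths": rng.integers(1, 4, n).astype(float),
}
data["age"] = 80 - 0.012 * data["sqft"] + rng.normal(0, 10, n)
price = (50 + 0.1 * data["sqft"] - 0.5 * data["age"]
         + 5 * data["beds"] + 8 * data["baths"] + rng.normal(0, 20, n))

models = [["age"], ["age", "sqft"], ["age", "sqft", "beds", "baths"]]
table = regression_table(price, data, models)
print(f"{'':>6}" + "".join(f"{f'({j + 1})':>20}" for j in range(len(models))))
for v, cells in table.items():
    print(f"{v:>6}" + "".join(f"{c:>20}" for c in cells))
```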
DIY TIME
• Please run the following regressions and create a table with their results
• Regress price of a house on its age
• Regress price of a house on its age and size
• Regress price of a house on its age, size, number of bedrooms and number of
bathrooms
• Please make sure the table looks clean and professional.
T-TEST IN A MULTIPLE REGRESSION
• The significance tests are the same in single and multiple regressions: for H0: β = 0, t = β̂ / SE(β̂)
• In large samples, coefficients are still significant at the
• 1% level if |t| > 2.58 (equivalently, p-value < 0.01)
• 5% level if |t| > 1.96 (p-value < 0.05)
• 10% level if |t| > 1.645 (p-value < 0.10)
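Large-sample critical values and p-values come from the standard normal distribution; the two-sided 10% value is about 1.645. A Python sketch using the standard library's statistics.NormalDist; the coefficient and standard error in the example are made up for illustration:

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1

def two_sided_p(t_stat):
    """Large-sample two-sided p-value for H0: beta = 0."""
    return 2 * (1 - std_normal.cdf(abs(t_stat)))

def critical_value(alpha):
    """Value that |t| must exceed to reject H0 at level alpha (two-sided)."""
    return std_normal.inv_cdf(1 - alpha / 2)

for alpha in (0.01, 0.05, 0.10):
    print(f"{alpha:.0%}: reject if |t| > {critical_value(alpha):.3f}")
# 1%: 2.576, 5%: 1.960, 10%: 1.645

# Hypothetical coefficient and standard error.
beta_hat, se = -0.84, 0.35
t = beta_hat / se
print(f"t = {t:.2f}, p = {two_sided_p(t):.3f}")
```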
MORE DIY TIME
• Please use the caschool dataset
• Please run four regressions of test scores on class size: (1) with no controls; (2) controlling for total enrollment; (3) controlling for expenditure per student and average income; (4) controlling for average income and computers per student
• We will not build the table in a Word file this time; instead, we will look at the regression results directly in Stata
• Please interpret one of the betas in your regressions
IMPERFECT MULTICOLLINEARITY
• If we include regressors that are closely related to one another, the standard errors of their betas increase, so the betas may become statistically insignificant
• Example:
• regress test scores on calworks percentage
• then regress test scores on percent qualifying for reduced-price lunch,
• then regress test scores on percent qualifying for reduced-price lunch and percent qualifying for
calworks.
• What happens to the significance of the betas?
• Sometimes a combination of several variables is highly correlated with a variable already included, and this can be hard to detect.
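A simulated illustration of why the standard errors grow. The variables lunch and calworks below are hypothetical stand-ins built to have a correlation of about 0.95 (not the caschool data), and the standard error on lunch is computed with and without calworks in the regression, using the homoskedasticity-only formula in Python/NumPy.

```python
import numpy as np

def ols_se(y, X):
    """OLS coefficients and homoskedasticity-only standard errors."""
    n, k = X.shape
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    sigma2 = resid @ resid / (n - k)
    return coef, np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))

rng = np.random.default_rng(3)
n = 400
calworks = rng.normal(size=n)                       # hypothetical regressor
lunch = 0.95 * calworks + 0.3 * rng.normal(size=n)  # closely related regressor
score = 650 - 5 * lunch + rng.normal(0, 10, size=n)

ones = np.ones(n)
b1, se1 = ols_se(score, np.column_stack([ones, lunch]))
b2, se2 = ols_se(score, np.column_stack([ones, lunch, calworks]))
print(f"SE on lunch alone:         {se1[1]:.2f}")
print(f"SE on lunch with calworks: {se2[1]:.2f}")
print(f"corr(lunch, calworks) = {np.corrcoef(lunch, calworks)[0, 1]:.2f}")
```

With the two regressors this correlated, the standard error on lunch roughly triples even though the data-generating process is unchanged.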
IMPERFECT MULTICOLLINEARITY – WHAT TO DO AND WHAT NOT TO DO
• Do not run kitchen-sink regressions
• Concentrate on a variable of interest; the other variables are there to be “controlled for”. Be deliberate about the variables you add to the regression. Start with a baseline regression and then add more, one by one or in groups.
• If multicollinearity exists in your results (and it most likely does), you are erring on the conservative side: larger standard errors make you less likely to claim a relationship that does not exist, and more likely to miss one that does.
• Example: if we want to test the relationship between test scores and average income
what other variables should we control for?
PERFECT MULTICOLLINEARITY
• Happens when one regressor is an exact linear function of the others (for example, two regressors that are perfectly correlated)
• Use the teaching ratings dataset
• Create a variable equal to 1 if the professor is male and 0 otherwise
• Stata:
• generate male=0
• replace male=1 if female==0
• Regress course evaluations on the male and female dummy variables in the same
regression
• What happens? Why do you think it happens?
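• This is called the dummy variable trap: we have included a dummy for every category along with the constant, so the dummies add up to the constant term. Stata detects this and automatically drops one of the collinear variables; other software may not. Always omit one category and interpret the betas on the included categories relative to the omitted (base) category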
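The trap can be seen directly in the regressor matrix: with a constant, female, and male = 1 - female, the constant column equals female + male, so the matrix loses full column rank and OLS has no unique solution. A small Python/NumPy check on made-up data:

```python
import numpy as np

female = np.array([1, 0, 0, 1, 1, 0, 1, 0])  # hypothetical 0/1 data
male = 1 - female                             # same as the Stata commands above
ones = np.ones_like(female)

X_trap = np.column_stack([ones, female, male])  # constant + both dummies
X_ok = np.column_stack([ones, female])          # male omitted (base category)

print(np.linalg.matrix_rank(X_trap), X_trap.shape[1])  # 2 3: rank-deficient
print(np.linalg.matrix_rank(X_ok), X_ok.shape[1])      # 2 2: full column rank
```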
PERFECT MULTICOLLINEARITY EXAMPLE
• Use the binarydata dataset
• There are a couple of ways to create dummy variables in Stata and include them in a regression
• The variable “ethnicity” takes three values in this dataset: “Black”, “Hispanic”, and “white”. We can create a dummy variable for each category; for example, the Hispanic dummy equals 1 if a person is Hispanic and 0 otherwise.
• Stata:
• tabulate ethnicity, generate(e)
• Let’s look at the three variables Stata created
• Now regress earnings on the three variables that control for ethnicity. What happens?
Why?
• Please interpret the coefficients in the above regression.
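With a constant and all but one category dummy, the constant is the base category's mean and each dummy coefficient is the difference between that category's mean and the base mean. A Python/NumPy sketch on made-up numbers (not the binarydata dataset), building one 0/1 column per category much as tabulate ethnicity, generate(e) does, then omitting "white" as the base:

```python
import numpy as np

# Hypothetical toy data: category labels and earnings.
ethnicity = np.array(["Black", "Hispanic", "white", "white", "Black",
                      "Hispanic", "white", "Black", "white", "Hispanic"])
earnings = np.array([18.0, 20.0, 25.0, 27.0, 16.0, 22.0, 24.0, 20.0, 28.0, 21.0])

# One 0/1 dummy per category, in sorted order: Black, Hispanic, white.
categories = sorted(set(ethnicity))
dummies = {c: (ethnicity == c).astype(float) for c in categories}

# Omit the base category to avoid the dummy variable trap.
base = "white"
included = [c for c in categories if c != base]
X = np.column_stack([np.ones(len(earnings))] + [dummies[c] for c in included])
coef, *_ = np.linalg.lstsq(X, earnings, rcond=None)

print(f"constant (mean for {base}): {coef[0]:.2f}")
for c, b in zip(included, coef[1:]):
    print(f"{c}: {b:.2f} relative to {base}")
```

Here the white mean is 26, so the constant is 26.00, and the Black and Hispanic coefficients are -8.00 and -5.00 (their group means are 18 and 21).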
PERFECT MULTICOLLINEARITY EXAMPLE, DIY.
1. Please create a set of dummy variables for the variable hsdropout, i.e. a variable equal to 1 for those who dropped out of high school and 0 otherwise, and a variable equal to 1 for those who did not drop out of high school and 0 otherwise
• Now regress EARNINGS on one of the dummy variables. Please interpret the results of the regression.
2. Please create a set of dummy variables for the relationship status variable
• Now regress EARNINGS on the group of the dummy variables omitting one of them.
• Please interpret the results of the regression
MULTIPLE REGRESSION, DIY TIME.
• Please use the EAEF22 dataset to show the relationship between one’s earnings and amount of schooling, while controlling for other variables.
• Please run a few regressions to determine which empirical model best explains the relationship between earnings and schooling.
• Use what you have learned in this topic to decide which variables to include in your regressions.
• Please interpret the relationship that you found.
REVIEW
• Why do we need to include more than one regressor in a regression?
• How is a t-test conducted in a multiple regression?
• How do you create a table with the results of a regression in Stata?
• What is imperfect multicollinearity? Is it a problem? Should we try to avoid it?
• What is perfect multicollinearity? Is it a problem? Should we try to avoid it?
• How do you interpret results of a regression with a set of dummy variables?

Topic 5 (multiple regression), uploaded by Ryan Herzog


Editor's Notes

  • #6 (Hedonic Pricing): In general, if we regress price on age, we find no relationship. Within the size groups, the coefficient on age is:
    • Under 1000 sqft: 2.06
    • 1000-2000 sqft: 2.3***
    • 2000-3000 sqft: 3.6***
    • 3000-4000 sqft: 2.18
    • 4000-5000 sqft: 1.52
    Show how to use reg y x for the first two groups, then divide the class into three groups and ask them to do it for the last three size groups.
  • #8 (In General): Why do you think there are no subscripts “i” on the betas?
  • #9 (Back to the Hedonic Pricing Example): Units: price – thousands of dollars; beds – number of bedrooms; baths – number of bathrooms; age – years; sqft – square feet