EC2017: Introductory
Econometrics
Week 8
Functional Forms
Functional Forms of Regression Models
• Up to now, we have considered models that
are linear in parameters and linear in
variables, like the linear regression model:
• Example:
Example
• Suppose you want to understand whether and
how age, gender, and education predict
people’s earnings
• Formal specification of your econometric
model: $wage_i = \beta_0 + \beta_1\,age_i + \beta_2\,female_i + \beta_3\,educ_i + u_i$
• Estimated form?
Example
$\widehat{wage}_i = -2.12 + 0.06\,age_i - 2.16\,female_i + 0.54\,educ_i$
Example
t-test:
H0: $\beta_j = 0$
HA: $\beta_j \neq 0$
F-test:
H0: $\beta_1 = \beta_2 = \beta_3 = 0$
HA: at least one $\beta_j \neq 0$
Linear regression model
• So far we have assumed that the population
regression function is linear
• What does this mean?
• The slope of the population regression
function is constant and does not depend on
the values of x
Ex: the relationship between test score and
student-teacher ratio (may be linear)
Ex: the relationship between test score and
income, however, does not look linear
Today
Two different approaches:
1) Polynomial regression models in X:
The population regression function is approximated by a
quadratic, cubic, or higher-degree polynomial
2) Logarithmic transformation:
Y and/or X is transformed by taking its logarithm, which provides
a “percentages” interpretation of the coefficients that makes
sense in many applications
Polynomial regression model
• One way to specify a nonlinear regression function is to use a polynomial of X.
• Let r be the highest power of X. The polynomial regression model of degree r is:
$Y_i = \beta_0 + \beta_1 X_i + \beta_2 X_i^2 + \cdots + \beta_r X_i^r + u_i$
• When r=2, we have a quadratic regression model (second-degree polynomial):
$Y_i = \beta_0 + \beta_1 X_i + \beta_2 X_i^2 + u_i$
• When r=3, we have a cubic regression model (third-degree polynomial):
$Y_i = \beta_0 + \beta_1 X_i + \beta_2 X_i^2 + \beta_3 X_i^3 + u_i$
• It is a multiple regression model
• Does it suffer from the problem of collinearity, since the same variable enters
the model squared, cubed, …?
Polynomial regression model (cont.)
• $X^2$ and $X^3$ are not linear functions of $X$
• So the model does not violate the assumption of no
perfect collinearity
Example
• You estimate the following population
regression model:
• Your estimated model is:
• How does an increase in X affect Y?
• Now the marginal effect of X on Y depends on the
value of X: each additional unit of X increases Y,
but at a diminishing rate
When does the impact of X on Y become zero?
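The question can be worked out for a generic quadratic model. The symbols $\beta_1$, $\beta_2$ and $X^*$ below are notation assumed for this sketch rather than taken from the slide:

```latex
\begin{align*}
Y &= \beta_0 + \beta_1 X + \beta_2 X^2 + u \\
\frac{\partial Y}{\partial X} &= \beta_1 + 2\beta_2 X \\
\beta_1 + 2\beta_2 X^* &= 0 \quad\Longrightarrow\quad X^* = -\frac{\beta_1}{2\beta_2}
\end{align*}
```

With $\beta_1 > 0$ and $\beta_2 < 0$, $X^*$ is the value of X at which the fitted curve peaks.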
Example
• Exper has a diminishing effect on wage
• Every additional year of experience brings
an increase in wage, but the increase gets
smaller every year
Example (cont.)
• It is interesting to see when the return to
experience becomes zero.
• This is the turning point
Solving for exper gives exper = 31 years (i.e.,
after 31 years, additional experience no longer
raises your wage)
• Is this realistic?
Example (cont.)
• It is not very realistic, but it is one of the consequences
of using a quadratic form.
• At some point the function reaches its maximum
and then curves downward
• That point usually lies well above most of the sample. For example,
mean of experience = 17 (on average, people in our
sample have 17 years of experience)
Example: The test score, income relation
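• $Income_i$ = average district income in the ith district (thousands of
dollars per capita)
• Quadratic specification: $TestScore_i = \beta_0 + \beta_1 Income_i + \beta_2 Income_i^2 + u_i$
• Cubic specification: $TestScore_i = \beta_0 + \beta_1 Income_i + \beta_2 Income_i^2 + \beta_3 Income_i^3 + u_i$
In these specifications the marginal effect of income on test score depends on the value of
income. To show this for the quadratic specification:
$\partial TestScore / \partial Income = \beta_1 + 2\beta_2\,Income$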
Estimation of the quadratic specification in
STATA
Test the null hypothesis of linearity against the alternative that the regression function is a quadratic.…
Interpreting the estimated regression
function (1 of 3)
(a) Plot the predicted values
$\widehat{TestScore}_i = 607.3 + 3.85\,Income_i - 0.0423\,(Income_i)^2$
Standard errors: (2.9) (0.27) (0.0048)
Interpreting the estimated regression
function (2 of 3)
(b) Compute the slope, evaluated at various values of X
Q: What is the predicted change in TestScore for a change
in income from $5,000 per capita to $6,000 per capita?
$\widehat{TestScore}_i = 607.3 + 3.85\,Income_i - 0.0423\,(Income_i)^2$
Predicted change in TestScore for a change in income from
$5,000 per capita to $6,000 per capita:
$\Delta\widehat{TestScore} = (607.3 + 3.85 \times 6 - 0.0423 \times 6^2) - (607.3 + 3.85 \times 5 - 0.0423 \times 5^2) = 3.4$
Q: What is the predicted change in TestScore for a change
in income from $25,000 per capita to $26,000 per capita?
Predicted change in TestScore for a change in income from
$25,000 per capita to $26,000 per capita:
$\Delta\widehat{TestScore} = (607.3 + 3.85 \times 26 - 0.0423 \times 26^2) - (607.3 + 3.85 \times 25 - 0.0423 \times 25^2) = 1.7$
Interpreting the estimated regression
function (3 of 3)
$\widehat{TestScore}_i = 607.3 + 3.85\,Income_i - 0.0423\,(Income_i)^2$
Predicted “effects” for different values of X:
Change in Income ($1000 per capita)    ΔTestScore
from 5 to 6                            3.4
from 25 to 26                          1.7
from 45 to 46                          0.0
The “effect” of a change in income is greater at low than at high
income levels
What is the effect of a change from 65 to 66?
Caution! Don’t extrapolate outside the range of the data!
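The predicted changes in the table can be reproduced with a few lines of Python. This is a minimal sketch that plugs in the estimated coefficients quoted in the slides; the function names are made up for illustration.

```python
# Coefficients of the estimated quadratic regression quoted in the slides.
B0, B1, B2 = 607.3, 3.85, -0.0423


def predicted_test_score(income):
    """Predicted TestScore for income measured in $1000 per capita."""
    return B0 + B1 * income + B2 * income ** 2


def predicted_change(income_from, income_to):
    """Difference in predicted TestScore between two income levels."""
    return predicted_test_score(income_to) - predicted_test_score(income_from)


def turning_point():
    """Income level at which the estimated slope B1 + 2*B2*income is zero."""
    return -B1 / (2 * B2)


if __name__ == "__main__":
    for start, end in [(5, 6), (25, 26), (45, 46), (65, 66)]:
        print(f"from {start} to {end}: {predicted_change(start, end):+.2f}")
    print(f"turning point: {turning_point():.1f}")
```

Run as a script, it prints the four changes and the turning point. The change from 65 to 66 comes out negative, which the quadratic form forces once income passes the turning point near 45.5.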
Estimation of a cubic specification in STATA
(1 of 2)
Estimation of a cubic specification in STATA
(2 of 2)
Testing the null hypothesis of linearity, against the alternative that the
population regression is quadratic and/or cubic, that is, it is a polynomial of
degree up to 3:
$H_0$: population coefficients on $Income^2$ and $Income^3$ = 0
$H_1$: at least one of these coefficients is nonzero.
test avginc2 avginc3
(1) avginc2 = 0.0
(2) avginc3 = 0.0
F(2, 416) = 37.69
Prob > F = 0.0000
The hypothesis that the population regression is linear is rejected at the 1%
significance level against the alternative that it is a polynomial of degree up to
3.
Which degree polynomial should I use?
• Plot the data and follow a sequential hypothesis-testing procedure:
1) Pick a maximum value of r and estimate the polynomial regression for that r
2) Use the t-statistic to test whether the coefficient on $X^r$ is zero. If you reject this hypothesis,
then keep $X^r$ in the regression
3) If you do not reject, then eliminate $X^r$ and use a polynomial regression of degree r-1; test
whether the coefficient on $X^{r-1}$ is zero. If you reject, use the polynomial of degree r-1.
4) If you do not reject, continue until you find the coefficient on the highest power to be
significant.
If you don’t see sharp jumps in the data (which is usually the case in economic data), then start
with polynomials of degree 2 to 4.
Summary: polynomial regression functions
$Y_i = \beta_0 + \beta_1 X_i + \beta_2 X_i^2 + \cdots + \beta_r X_i^r + u_i$
• Estimation: by OLS after defining new regressors
• To interpret the estimated regression function:
– plot predicted values as a function of x
– compute predicted ΔY/ΔX for different values of x
• Hypotheses concerning degree r can be tested by t- and F-tests on
the appropriate (blocks of) variable(s).
• Choice of degree r
– plot the data; t- and F-tests, check sensitivity of estimated
effects; judgment.
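"Estimation by OLS after defining new regressors" can be sketched in plain Python: build the columns 1, X, X^2, ..., X^r and solve the normal equations. The helper names and the noise-free data below are made up for illustration; in practice a statistics package would be used instead.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix [a | b], copied so the inputs are not modified.
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_polynomial_ols(x, y, degree):
    """OLS coefficients [b0, b1, ..., b_r] for a polynomial of the given degree."""
    # Define the new regressors: 1, X, X^2, ..., X^r.
    design = [[xi ** p for p in range(degree + 1)] for xi in x]
    k = degree + 1
    # Normal equations (X'X) b = X'y.
    xtx = [[sum(row[i] * row[j] for row in design) for j in range(k)]
           for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(design, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)


if __name__ == "__main__":
    # Hypothetical noise-free data generated from Y = 2 + 3X - 0.5X^2.
    xs = [0, 1, 2, 3, 4, 5, 6]
    ys = [2 + 3 * v - 0.5 * v ** 2 for v in xs]
    print([round(b, 6) for b in fit_polynomial_ols(xs, ys, 2)])
```

Because the polynomial terms are just extra columns, the model stays linear in the parameters, which is why ordinary OLS applies.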
Logarithmic functions of Y and/or X
• Another way to specify a non-linear regression
function is to use the natural logarithm of Y
and/or X
• Logarithms convert changes in variables into
percentage changes
• Many relationships are naturally expressed in
terms of percentages
Examples where we are interested in
expressing relationships in percentages
• In our previous example, we saw that the relationship
between income and test scores is nonlinear. But would this
relationship be linear if we changed income by 1% rather than
$1000?
• When we study consumer demand, we are interested in learning
how a 1% increase in price leads to a certain percentage
decrease in quantity demanded (price elasticity).
• Wage gap between male and female college graduates: we can
compare wage gaps in terms of dollars but it is easier to compare
wage gaps across professions in percentage terms
Logarithms and Percentages
• Logarithmic transformations allow us to model relations in “percentage”
terms (like elasticities), rather than linearly.
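• Why? The link between logarithms and percentages relies on this
approximation: $\ln(x + \Delta x) - \ln(x) \approx \Delta x / x$
• Numerically:
When X = 100 and ΔX = 1, then ΔX/X = 1/100 = 0.01, or 1%;
ln(101) − ln(100) = 0.00995, or 0.995%, so it is also approximately 1%.
• This approximation is accurate only for small changes in X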
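A quick numerical check shows how the approximation degrades as the change gets larger. This is a minimal sketch; the starting value of 100 follows the slide, while the larger changes are arbitrary choices for illustration.

```python
import math


def log_difference(x, dx):
    """Exact change in ln(x) when x moves to x + dx."""
    return math.log(x + dx) - math.log(x)


def proportional_change(x, dx):
    """Proportional change dx / x (multiply by 100 for a percentage)."""
    return dx / x


if __name__ == "__main__":
    for dx in (1, 5, 25, 50):
        exact = log_difference(100, dx)
        approx = proportional_change(100, dx)
        print(f"dx={dx:>2}: ln difference={exact:.5f}, dx/x={approx:.5f}")
```

For a change of 1 the two numbers agree to about three decimal places; for a change of 50 they differ by more than 0.09.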
The three logarithmic regression models
Case              Population regression function
I. linear-log     $Y_i = \beta_0 + \beta_1 \ln(X_i) + u_i$
II. log-linear    $\ln(Y_i) = \beta_0 + \beta_1 X_i + u_i$
III. log-log      $\ln(Y_i) = \beta_0 + \beta_1 \ln(X_i) + u_i$
• The interpretation of the slope coefficient
differs in each case.
• The interpretation is found by applying the
general “before and after” rule: “figure out the
change in Y for a given change in X.”
• Each case has a natural interpretation (for
small changes in X )
Linear-log Model
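• Compute Y before and after changing X:
• $Y = \beta_0 + \beta_1 \ln(X)$ (b)
• Now change X: $Y + \Delta Y = \beta_0 + \beta_1 \ln(X + \Delta X)$ (a)
• Subtract (a) – (b): $\Delta Y = \beta_1 [\ln(X + \Delta X) - \ln(X)]$
• Since $\ln(X + \Delta X) - \ln(X) \approx \Delta X / X$,
• $\Delta Y \approx \beta_1 \frac{\Delta X}{X}$
Linear-log population function
$\beta_1 \approx \frac{\Delta Y}{\Delta X / X}$
• Now if X changes by 1% ($\Delta X / X = 0.01$), then Y will
change by $0.01\,\beta_1$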
Example: Test Score versus ln(income)
• First define the new regressor, ln(income)
• The model is now linear in ln(income), so the linear-log
model can be estimated by OLS; the estimated slope on ln(income) is 36.42
• Interpretation: a 1% increase in income is associated with
an increase in TestScore of 36.42 × 0.01 = 0.36 points on the test
• Standard errors, confidence intervals, all the usual tools
of regression apply here
The linear-log and cubic regression
functions
Log-linear Model
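• $\ln(Y) = \beta_0 + \beta_1 X$ (b)
• Now change X: $\ln(Y + \Delta Y) = \beta_0 + \beta_1 (X + \Delta X)$ (a)
• Subtract (a) – (b): $\ln(Y + \Delta Y) - \ln(Y) = \beta_1 \Delta X$
• Because $\ln(Y + \Delta Y) - \ln(Y) \approx \Delta Y / Y$
• So $\Delta Y / Y \approx \beta_1 \Delta X$
Log-linear population regression function
• For small $\Delta X$, $\beta_1 \approx \frac{\Delta Y / Y}{\Delta X}$
• Now if X changes by 1 unit ($\Delta X = 1$), then $\Delta Y / Y$ changes by $\beta_1$.
• Translate it into percentages: when X changes by 1 unit, Y changes by
$100\,\beta_1$%
• This quantity is called the semi-elasticity of Y with respect to X: it shows
the percentage change in Y when X increases by one unit.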
Example
For each additional year of education, we expect
wage to increase by about 8.27%.
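The 100 × β1 reading is an approximation; the exact percentage change implied by a log-linear model is 100 × (exp(β1) − 1). The sketch below compares the two. It assumes a slope of 0.0827, inferred from the 8.27% quoted in the slide, and holds the error term fixed; the function names are made up for illustration.

```python
import math


def approx_pct_change(beta, dx=1.0):
    """Approximate % change in Y for a change dx in X: 100 * beta * dx."""
    return 100 * beta * dx


def exact_pct_change(beta, dx=1.0):
    """Exact % change in Y implied by ln(Y) = b0 + beta * X, error held fixed."""
    return 100 * (math.exp(beta * dx) - 1)


if __name__ == "__main__":
    beta_educ = 0.0827  # assumed slope, inferred from the 8.27% on the slide
    print(f"approximate: {approx_pct_change(beta_educ):.2f}%")
    print(f"exact:       {exact_pct_change(beta_educ):.2f}%")
```

The gap between the two readings grows with the size of the coefficient, so the approximation is most comfortable for small slopes.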
Log-log population regression function
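• $\ln(Y) = \beta_0 + \beta_1 \ln(X)$ (b)
• Now change X: $\ln(Y + \Delta Y) = \beta_0 + \beta_1 \ln(X + \Delta X)$ (a)
• Subtract (a) – (b): $\ln(Y + \Delta Y) - \ln(Y) = \beta_1 [\ln(X + \Delta X) - \ln(X)]$
• Because $\ln(Y + \Delta Y) - \ln(Y) \approx \Delta Y / Y$ and $\ln(X + \Delta X) - \ln(X) \approx \Delta X / X$
• So $\Delta Y / Y \approx \beta_1 \, \Delta X / X$
Log-log population regression function
$\beta_1 \approx \frac{\Delta Y / Y}{\Delta X / X} = \frac{100 \times \Delta Y / Y}{100 \times \Delta X / X} = \frac{\text{percentage change in } Y}{\text{percentage change in } X}$
• Now if the percentage change in X is 1%, then $\beta_1$ is the
percentage change in Y associated with a 1% change in X.
• In other words, $\beta_1$ is the elasticity of Y with respect to X.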
Note: Elasticity and slope
• Elasticity and slope are two different
concepts: slope = ΔY/ΔX, whereas elasticity = (ΔY/Y)/(ΔX/X) = slope × (X/Y)
Example: ln(TestScore) vs. ln(Income)
• First define the new dependent variable, ln(TestScore)
and the new regressor, ln(income)
• The model is now a linear regression of ln(TestScore)
against ln(income), so it can be estimated by OLS;
the estimated slope on ln(income) is 0.0554
• Interpretation: a 1% increase in income is associated
with an increase of 0.0554% in TestScore (income up by
a factor of 1.01, TestScore up by a factor of 1.000554)
Example: ln(TestScore) vs. ln(Income)
continued
• For example, suppose income increases from $10,000 to
$11,000, or by 10%. Then test score increases by
approximately:
%ΔTestScore = 0.0554 × 10% = 0.554%.
• If TestScore = 650, this corresponds to an increase of
0.00554 × 650 = 3.6 points.
Example: ln(TestScore) vs. ln(Income)
continued
• Q: If there is a 2% increase in income, by what percentage will test
scores increase?
%ΔTestScore = 0.0554 × 2% = 0.1108%
• If TestScore = 650, this corresponds to an increase of 0.001108 × 650 =
0.72 points.
• How does this model compare to the log-linear model?
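The elasticity arithmetic above can be checked in a few lines. This is a minimal sketch using the slope of 0.0554 quoted in the slides. The exact formula, (1 + %ΔX/100)^β1 − 1, is a standard consequence of the log-log form that the slides do not state, and the function names are made up for illustration.

```python
ELASTICITY = 0.0554  # slope on ln(Income) quoted in the slides


def approx_pct_change_y(pct_change_x, elasticity=ELASTICITY):
    """Approximate % change in Y: elasticity times the % change in X."""
    return elasticity * pct_change_x


def exact_pct_change_y(pct_change_x, elasticity=ELASTICITY):
    """Exact % change in Y implied by ln(Y) = b0 + elasticity * ln(X)."""
    return 100 * ((1 + pct_change_x / 100) ** elasticity - 1)


def change_in_points(pct_change_y, level):
    """Convert a % change into points at a given TestScore level."""
    return pct_change_y / 100 * level


if __name__ == "__main__":
    for pct in (2, 10):
        approx = approx_pct_change_y(pct)
        exact = exact_pct_change_y(pct)
        points = change_in_points(approx, 650)
        print(f"income +{pct}%: approx {approx:.4f}%, exact {exact:.4f}%, "
              f"about {points:.2f} points at TestScore = 650")
```

For the 10% change the exact figure is slightly smaller than the approximate one, a reminder that the elasticity reading is meant for small changes.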
The log-linear and log-log specifications
• Note vertical axis
• The log-linear model doesn’t seem to fit as well as the log-log model, based
on visual inspection.
Summary of main functional forms
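Model                          Equation                                  Interpretation of $\beta_1$
Linear-linear or level-level   $Y = \beta_0 + \beta_1 X$                 $\Delta Y = \beta_1 \Delta X$
log-log                        $\ln(Y) = \beta_0 + \beta_1 \ln(X)$       $\%\Delta Y = \beta_1\,\%\Delta X$
Log-linear                     $\ln(Y) = \beta_0 + \beta_1 X$            $\%\Delta Y = 100\,\beta_1 \Delta X$
linear-log                     $Y = \beta_0 + \beta_1 \ln(X)$            $\Delta Y = (\beta_1 / 100)\,\%\Delta X$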
Logarithmic regression models with multiple
regressors and applications
Logarithmic Regression Models with multiple
regressors
• Model: $\ln(Y_i) = \beta_0 + \beta_1 \ln(X_{1i}) + \beta_2 \ln(X_{2i}) + u_i$
• $\beta_1$ and $\beta_2$ are partial slope coefficients or partial elasticities
• $\beta_1$: measures the elasticity of $Y$ with respect to $X_1$, holding the influence
of $X_2$ constant
– that is, it measures the percentage change in $Y$ for a 1%
change in $X_1$, holding the influence of $X_2$ constant
• $\beta_2$: measures the elasticity of $Y$ with respect to $X_2$, holding the influence
of $X_1$ constant
– that is, it measures the percentage change in $Y$ for a 1%
change in $X_2$, holding the influence of $X_1$ constant
Application for log-log model: Estimating a
Cobb-Douglas production function (1 of 3)
• Model: $\ln(Y_i) = \beta_0 + \beta_1 \ln(L_i) + \beta_2 \ln(K_i) + u_i$
• $Y$: output; $L$: labour; $K$: capital
• Using Mexican data over the years 1955-
1974, we obtain:
Application for log-log model: Estimating a
Cobb-Douglas production function (2 of 3)
• $\beta_1$: measures the elasticity of output with respect to labour,
holding capital constant
• $\hat\beta_1 \approx 0.34$
• Interpretation: holding capital constant, if labour (employment)
increases by 1%, average output increases by about 0.34%
• $\beta_2$: measures the elasticity of output with respect to capital,
holding labour constant
• $\hat\beta_2 \approx 0.85$
• Interpretation: holding labour constant, if capital increases by
1%, average output goes up by about 0.85%
Application: Estimating a Cobb-Douglas
production function (3 of 3)
• $\beta_1 + \beta_2$: returns to scale parameter
• It measures the response of output to a
proportional change in inputs
• constant returns to scale ($\beta_1 + \beta_2 = 1$)
• decreasing returns to scale ($\beta_1 + \beta_2 < 1$)
• increasing returns to scale ($\beta_1 + \beta_2 > 1$)
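Applying the classification to the elasticities quoted on the previous slide (about 0.34 for labour and 0.85 for capital) is a one-line sum. The sketch below classifies the point estimates only; it is not a statistical test of constant returns to scale, and the function name is made up for illustration.

```python
def returns_to_scale(beta_labour, beta_capital, tol=1e-9):
    """Return the sum of the two elasticities and a label for the point estimate."""
    total = beta_labour + beta_capital
    if abs(total - 1) <= tol:
        label = "constant"
    elif total < 1:
        label = "decreasing"
    else:
        label = "increasing"
    return total, label


if __name__ == "__main__":
    total, label = returns_to_scale(0.34, 0.85)
    print(f"sum of elasticities = {total:.2f} -> {label} returns to scale")
```

Deciding whether the sum differs significantly from 1 would require a test of the restriction $\beta_1 + \beta_2 = 1$, which needs the standard errors and covariance of the estimates.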
Application for log-linear model: Estimating a
wage equation (1 of 2)
• Using a sample of 1,801 City graduates, the following earnings
equation has been estimated:
• earnings: annual earnings
• educ: years of education
• experience: years of working experience
• female: variable equal to 1 if the individual is female; 0 otherwise
• Standard errors are reported in parentheses
• Q: Interpret the coefficient estimates on educ, experience and
female
Application for log-linear model:
Estimating a wage equation (2 of 2)
• Interpretation of the coefficients:
• Coefficient on educ (0.147): indicates that one additional year of education increases average
earnings by about 14.7%, holding gender and experience constant
• Coefficient on experience (0.049): indicates that one additional year of experience increases average
earnings by about 4.9%, holding gender and education constant
• Coefficient on female (−0.201): indicates that average earnings of females are about 20.1% lower than
those of males, holding experience and education constant
Summary
Two different approaches to incorporate
nonlinear relationships in regression models:
1) Polynomial regression models in X:
Interpretation, test
2) Logarithmic transformation:
Interpretation and their application to economics
Week 8 - Functional Forms.pptx this is presentation

  • 2. 2 Functional Forms of Regression Models • Up to now, we have considered models that are linear in parameters and linear in variables, like the linear regression model: • Example:
  • 3. Example • Suppose you want to understand whether and how age, gender, and education predict people’s earnings • Formal specification of your econometric model: • Estimated form?
  • 6. Linear regression model • So far we have assumed that the population regression function is linear • What does this mean? • The slope of the population regression function is constant and does not depend on the values of x
  • 7. Ex: relationship between test score and student teacher ratio (Maybe linear)
  • 8. Ex: relationship between test score and income however does not look linear
  • 9. Today Two different approaches: 1) Polynomial regression models in X: The population regression function is approximated by a quadratic, cubic, or higher-degree polynomial 2) Logarithmic transformation: Y and/or X is transformed by taking its logarithm, which provides a “percentages” interpretation of the coefficients that makes sense in many applications
  • 10. 10 Polynomial regression model • One way to specify a nonlinear regression function is to use a polynomial of X. • Let r be the highest power of X. The polynomial regression model of degree r is: • When r=2, we have a quadratic regression model (Two-degree polynomial): • When r=3, we have a cubic regression model (Three-degree polynomial): • It is a multiple regression model • Does it suffer from the problem of collinearity? Since it is the same variable entering the model as squared; to the power of 3; …
  • 11. 11 Polynomial regression model (cont.) • and are no linear functions of • So it does not violate the assumption of no perfect collinearity
  • 12. Example • You estimate the following population regression model: • Your estimated model is: • How does an increase in X affect Y?
  • 13. Example • You estimate the following population regression model: • Your estimated model is: • How does an increase in X affect Y? • Now the marginal effect of X on Y depends on the values of X: an additional increase in X brings an increase in Y but it increases in a diminishing rate
  • 14. When does the impact of X on Y becomes zero?
  • 15. Example • Exper has a diminishing effect on wage • Every additional year of experience brings an increase in wage but it increases in a smaller speed every year
  • 16. Example (cont.) • Interesting is to see when does the return to experience become zero. • This is the turning point  when you solve it for exper, exper=31 years (i.e., after 31 years there is no benefit of experience to your wage) • Is this realistic?
  • 17. Example (cont.) • It is not very realistic but is one of the consequences of using a quadratic form. • At some point the function reaches the maximum and then will curve downward • That point is usually large enough. For example, Mean of experience = 17 (on average people in our sample have 17 years of experience)
  • 18. Example: The test score, income relation • average district income in the ith district (thousands of dollars per capita) • Quadratic specification: • Cubic Specification: In these specification the marginal effect of income on test score depends on values of income. To show this: Income
  • 19. Estimation of the quadratic specification in STATA Test the null hypothesis of linearity against the alternative that the regression function is a quadratic.…
  • 20. Interpreting the estimated regression function (1 of 3) (a) Plot the predicted values  2 607.3 + 3.85 0 0423( ) (2.9) (0.27) (0 0048) i i TestScore = Income Income   
  • 21. Interpreting the estimated regression function (2 of 3) (b) Compute the slope, evaluated at various values of X Q: What is the predicted change in TestScore for a change in income from $5,000 per capita to $6,000 per capita? ^ TestScore=607.3+3.85 Incomei−0.0423¿
  • 22. Interpreting the estimated regression function (2 of 3) (b) Compute the slope, evaluated at various values of X Predicted change in TestScore for a change in income from $5,000 per capita to $6,000 per capita:  2 2 607 3 + 3.85 6 0 0423 6 (607.3 + 3 85 5 0 0423 5 ) = 3 4 ΔTestScore =             ^ TestScore=607.3+3.85 Incomei−0.0423¿
  • 23. Interpreting the estimated regression function (2 of 3) (b) Compute the slope, evaluated at various values of X Predicted change in TestScore for a change in income from $5,000 per capita to $6,000 per capita: ^ TestScore=607.3+3.85 Incomei−0.0423¿
  • 24. Interpreting the estimated regression function (2 of 3) ^ TestScore=607.3+3.85 Incomei−0.0423¿ Q: What is the predicted change in TestScore for a change in income from $25,000 per capita to $26,000 per capita?
  • 25. Interpreting the estimated regression function (2 of 3) ^ TestScore=607.3+3.85 Incomei−0.0423¿ Q: What is the predicted change in TestScore for a change in income from $25,000 per capita to $26,000 per capita?
  • 26. Interpreting the estimated regression function (3 of 3)  2 607 3 + 3 85 0 0423( )  i i TestScore = Income Income    Predicted “effects” for different values of X: Change in Income ($1000 per capita) Delta T est score. from 5 to 6 3.4 from 25 to 26 1.7 from 45 to 46 0.0  Δ TestScore The “effect” of a change in income is greater at low than high income levels What is the effect of a change from 65 to 66?
  • 27. Interpreting the estimated regression function (3 of 3)  2 607 3 + 3 85 0 0423( )  i i TestScore = Income Income    Predicted “effects” for different values of X: Change in Income ($1000 per capita) Delta T est score. from 5 to 6 3.4 from 25 to 26 1.7 from 45 to 46 0.0  Δ TestScore The “effect” of a change in income is greater at low than high income levels What is the effect of a change from 65 to 66? Caution! Don’t extrapolate outside the range of the data!
  • 28. Estimation of a cubic specification in STATA (1 of 2)
  • 29. Estimation of a cubic specification in STATA (2 of 2) Testing the null hypothesis of linearity, against the alternative that the population regression is quadratic and/or cubic, that is, it is a polynomial of degree up to 3: 0 : H population coefficients on 2 Income and 3 = 0 Income 1 : H at least one of these coefficients is nonzero. test avginc2 avginc3 (1) avginc2 = 0.0 (2) avginc3 = 0.0   F 2 416 = 37 69   Prob > F = 0 0000  The hypothesis that the population regression is linear is rejected at the 1% significance level against the alternative that it is a polynomial of degree up to 3.
  • 30. Which degree polynomial should I use? • Plot the data and follow a sequential hypothesis testing: 1) Pick a maximum value of r and estimate the polynomial regression for that r 2) Use the t-statistic to test whether the coefficient on is zero. If you reject this hypothesis, then keep in the regression 3) If you do not reject, then eliminate and use a polynomial regression of degree r-1, test whether the coefficient on is zero. If you reject use the polynomial of degree r-1. 4) If you do not reject, continue until you find the coefficient on the highest power to be significant. If you don’t see sharp jumps in the data (which is usually the case in economic data) then start with polynomials of degree 2 to 4.
  • 31. Summary: polynomial regression functions 2 0 1 2 r i i i r i i Y = β + β X + β X + + β X + u  • Estimation: by OLS after defining new regressors • To interpret the estimated regression function: – plot predicted values as a function of x – compute predicted Δ Δ Y / X for different values of x • Hypotheses concerning degree r can be tested by t- and F-tests on the appropriate (blocks of ) variable(s). • Choice of degree r – plot the data; t- and F-tests, check sensitivity of estimated effects; judgment.
  • 33. Logarithmic functions of Y and/or X • Another way to specify a non-linear regression function is to use the natural logarithm of Y and/or X • Logarithms convert changes in variables into percentage changes • Many relationships are naturally expressed in terms of percentages
  • 34. Examples where we are interested in expressing relationships in percentages • In our previous example, we studies that the relationship between income and test scores are nonlinear. BUT would this relationship be linear if we change income by 1% rather than $1000? • When we study consumer demand: we are interested in learning about how 1% increase in price leads to a certain percentage decrease in quantity demanded. (Price elasticity) • Wage gap between male and female college graduates: we can compare wage gaps in terms of dollars but it is easier to compare wage gaps across professions in percentage terms
  • 35. Logarithms and Percentages • Logarithmic transformations allow us to model relations in “percentage” terms (like elasticities), rather than linearly. • Why? The link between logarithms and percentages relies on this approximation: • Numerically: When X= 100 and , then or 1% or 0.995% so it is also approximately 1%. • This approximation is only true for small changes in x
  • 36. The three logarithmic regression models
  • 37. The three logarithmic regression models Case Population regression function Roman numeral one. linear-log Upper Y subscript i baselineequals betasubscript 0baselineplus betasubscript 1baselinethenatural logofleft parenthesis upper X subscripti rightparenthesis plus usubscript i. Roman numeral two. log-linear The natural logof left parenthesis upper Y subscript i baselineright parenthesis equals betasubscript0 baseline plus betasubscript 1upper X subscript i plus usubscript i. Roman numeral three. log-log Thenatural logof leftparenthesis upper Ysubscript i baseline right parenthesis equals betasubscript 0baselineplus betasubscript 1baselinethe natural logofleft parenthesis upper X subscripti right parenthesis plus usubscript i. I. II. III.   0 1ln i i i Y = β + β X + u   n      0 1 l i i i Y X u          0 1ln ln i i i Y X u
  • 38. The three logarithmic regression models • The interpretation of the slope coefficient differs in each case. • The interpretation is found by applying the general “before and after” rule: “figure out the change in Y for a given change in X.” • Each case has a natural interpretation (for small changes in X )
  • 39. Linear-log Model • Compute Y before and after changing X: • ) (b) • Now change X: ) (a) • Subtract a – b: • Since • =
  • 40. Linear-log population function = • Now if X changes by 1%, then then Y will change by
  • 41. Example: Test Score versus ln(income) • First define the new regressor, ln(income) • The model is now linear in ln(income), so the linear log model can be estimated by OLS: • Interpretation: a 1% increase in income is associated with an increase in TestScore of 36.42 0.36 points on the test • Standard errors, confidence intervals, all the usual tools of regression apply here
  • 42. The linear-log and cubic regression functions
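  • 43. Log-linear Model • $\ln(Y) = \beta_0 + \beta_1 X$ (b) • Now change X: $\ln(Y + \Delta Y) = \beta_0 + \beta_1 (X + \Delta X)$ (a) • Subtract (a) – (b): $\ln(Y + \Delta Y) - \ln(Y) = \beta_1 \Delta X$ • Because $\ln(Y + \Delta Y) - \ln(Y) \approx \Delta Y / Y$ • So $\Delta Y / Y \approx \beta_1 \Delta X$
  • 44. Log-linear population regression function • For small $\Delta X$, $\Delta Y / Y \approx \beta_1 \Delta X$ • Now if X changes by 1 unit ($\Delta X = 1$), then $\Delta Y / Y \approx \beta_1$, so Y changes by the proportion $\beta_1$. • Translate it into percentages: when X changes by 1 unit, Y changes by $100 \times \beta_1$ % • This quantity is called the semi-elasticity of Y with respect to X: it shows the percentage change in Y when X increases by one unit.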
  • 46. Example As the years of education increase by 1 year, we expect wage to increase by 8.27%.
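The 8.27% figure uses the approximation 100 × β1. A minimal sketch of how this compares with the exact percentage change, 100 × (exp(β1) − 1), follows. It assumes the underlying coefficient on education is 0.0827, which is inferred from the 8.27% on the slide rather than shown there; the function name is hypothetical.

```python
import math


def log_linear_pct_change(beta1, delta_x=1.0):
    """Return (approximate, exact) % change in Y when X rises by delta_x units."""
    approx = 100.0 * beta1 * delta_x
    exact = 100.0 * (math.exp(beta1 * delta_x) - 1.0)
    return approx, exact


# Assumed coefficient on years of education (inferred from the slide's 8.27%).
approx, exact = log_linear_pct_change(0.0827)

print(f"approximate: {approx:.2f}%  exact: {exact:.2f}%")
```

The gap between the two numbers grows with the size of the coefficient, which is why the approximation is usually reserved for small β1.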
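  • 47. Log-log population regression function • $\ln(Y) = \beta_0 + \beta_1 \ln(X)$ (b) • Now change X: $\ln(Y + \Delta Y) = \beta_0 + \beta_1 \ln(X + \Delta X)$ (a) • Subtract (a) – (b): $\ln(Y + \Delta Y) - \ln(Y) = \beta_1 [\ln(X + \Delta X) - \ln(X)]$ • Because $\ln(Y + \Delta Y) - \ln(Y) \approx \Delta Y / Y$ and $\ln(X + \Delta X) - \ln(X) \approx \Delta X / X$ • So $\Delta Y / Y \approx \beta_1 \, \Delta X / X$
  • 48. Log-log population regression function • $\beta_1 \approx \frac{\Delta Y / Y}{\Delta X / X} = \frac{100 \times \Delta Y / Y}{100 \times \Delta X / X} = \frac{\text{percentage change in } Y}{\text{percentage change in } X}$ • Now if the percentage change in X is 1%, then $\beta_1$ is the percentage change in Y associated with a 1% change in X. • In other words, $\beta_1$ is the elasticity of Y with respect to X.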
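To illustrate that the slope of a log-log regression is the elasticity, here is a small sketch that fits OLS by hand on synthetic, noise-free data. Everything in it is an assumption made for illustration: the data-generating process Y = 3 · X^0.5, the sample of 20 points, and the helper `ols_simple`; none of it comes from the slides' data.

```python
import math


def ols_simple(x, y):
    """Closed-form OLS for one regressor: returns (intercept, slope)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


# Hypothetical constant-elasticity relationship: Y = 3 * X^0.5 (no error term).
true_elasticity = 0.5
x = [float(v) for v in range(1, 21)]
y = [3.0 * xi ** true_elasticity for xi in x]

# Regress ln(Y) on ln(X): the slope recovers the elasticity,
# and the intercept recovers ln(3).
b0, b1 = ols_simple([math.log(v) for v in x], [math.log(v) for v in y])

print(f"intercept = {b0:.4f}, slope (elasticity) = {b1:.4f}")
```

With real data the error term means the slope is only an estimate of the elasticity, and the usual standard errors apply.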
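  • 49. Note: Elasticity and slope • Elasticity and slope are two different concepts: the slope is $\Delta Y / \Delta X$, whereas the elasticity is $(\Delta Y / Y) / (\Delta X / X) = \text{slope} \times X / Y$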
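The distinction can be made concrete with a short worked comparison, ignoring the error term. In the linear model the slope is constant but the elasticity varies with X; in the log-log model the reverse holds.

```latex
\[
\text{slope} = \frac{dY}{dX}, \qquad
\text{elasticity} = \frac{dY}{dX} \cdot \frac{X}{Y}.
\]
\[
Y = \beta_0 + \beta_1 X
\;\Rightarrow\;
\text{slope} = \beta_1, \quad
\text{elasticity} = \frac{\beta_1 X}{\beta_0 + \beta_1 X}.
\]
\[
\ln Y = \beta_0 + \beta_1 \ln X
\;\Rightarrow\;
\text{slope} = \beta_1 \frac{Y}{X}, \quad
\text{elasticity} = \beta_1 .
\]
```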
  • 50. Example: ln(TestScore) vs. ln(Income) • First define the new dependent variable, ln(TestScore), and the new regressor, ln(Income) • The model is now a linear regression of ln(TestScore) against ln(Income), so it can be estimated by OLS: • $\ln(\widehat{TestScore}_i) = \hat\beta_0 + 0.0554 \ln(Income_i)$ • Interpretation: a 1% increase in income is associated with an increase of 0.0554% in TestScore (income up by a factor of 1.01, TestScore up by a factor of 1.000554)
  • 51. Example: ln(TestScore) vs. ln(Income) continued • $\ln(\widehat{TestScore}_i) = \hat\beta_0 + 0.0554 \ln(Income_i)$ • For example, suppose income increases from $10,000 to $11,000, or by 10%. Then test score increases by approximately: %ΔTestScore = 0.0554 × 10% = 0.554%. • If TestScore = 650, this corresponds to an increase of 0.00554 × 650 = 3.6 points.
  • 52. Example: ln(TestScore) vs. ln(Income) continued • $\ln(\widehat{TestScore}_i) = \hat\beta_0 + 0.0554 \ln(Income_i)$ • Q: If there is a 2% increase in income, by what percentage will test scores increase? Calculate the increase in test score if the test score is 650.
  • 53. Example: ln(TestScore) vs. ln(Income) continued • $\ln(\widehat{TestScore}_i) = \hat\beta_0 + 0.0554 \ln(Income_i)$ • Q: If there is a 2% increase in income, by what percentage will test scores increase? %ΔTestScore = 0.0554 × 2% = 0.1108% • If TestScore = 650, this corresponds to an increase of 0.001108 × 650 = 0.72 points. • How does this model compare to the log-linear model?
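The two worked cases (a 10% and a 2% increase in income) can be reproduced with a few lines. This is a sketch: the elasticity 0.0554 and the test score of 650 come from the slides, while the function name `log_log_effect` and the "exact" column, computed as 650 × ((1 + p/100)^0.0554 − 1), are additions for comparison.

```python
beta1 = 0.0554  # estimated elasticity of TestScore with respect to income (from the slides)


def log_log_effect(beta1, pct_change_x, y_level):
    """Return (approx % change in Y, approx change in points, exact change in points)."""
    approx_pct = beta1 * pct_change_x
    approx_points = approx_pct / 100.0 * y_level
    # Exact change implied by the log-log form: Y scales by (1 + p/100) ** beta1.
    exact_points = ((1.0 + pct_change_x / 100.0) ** beta1 - 1.0) * y_level
    return approx_pct, approx_points, exact_points


pct_2, points_2, exact_2 = log_log_effect(beta1, 2.0, 650.0)
pct_10, points_10, exact_10 = log_log_effect(beta1, 10.0, 650.0)

print(f" 2% income rise: {pct_2:.4f}% -> {points_2:.2f} points (exact {exact_2:.2f})")
print(f"10% income rise: {pct_10:.4f}% -> {points_10:.2f} points (exact {exact_10:.2f})")
```

The approximation is close for the 2% change and somewhat looser for the 10% change, consistent with the "small changes" caveat.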
  • 54. The log-linear and log-log specifications • Note vertical axis • The log-linear model doesn’t seem to fit as well as the log-log model, based on visual inspection.
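  • 55. Summary of main functional forms • Linear-linear or level-level, $Y = \beta_0 + \beta_1 X$: a 1-unit change in X changes Y by $\beta_1$ units • Log-log, $\ln(Y) = \beta_0 + \beta_1 \ln(X)$: a 1% change in X changes Y by $\beta_1$ % • Log-linear, $\ln(Y) = \beta_0 + \beta_1 X$: a 1-unit change in X changes Y by $100 \times \beta_1$ % • Linear-log, $Y = \beta_0 + \beta_1 \ln(X)$: a 1% change in X changes Y by $0.01 \times \beta_1$ units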
  • 56. Logarithmic regression models with multiple regressors and applications
  • 57. Logarithmic Regression Models with multiple regressors • In the log-log model $\ln(Y_i) = \beta_0 + \beta_1 \ln(X_{1i}) + \beta_2 \ln(X_{2i}) + u_i$, $\beta_1$ and $\beta_2$ are partial slope coefficients or partial elasticities • $\beta_1$: measures the elasticity of $Y$ with respect to $X_1$, holding the influence of $X_2$ constant; that is, it measures the percentage change in $Y$ for a 1% change in $X_1$, holding the influence of $X_2$ constant • $\beta_2$: measures the elasticity of $Y$ with respect to $X_2$, holding the influence of $X_1$ constant; that is, it measures the percentage change in $Y$ for a 1% change in $X_2$, holding the influence of $X_1$ constant
  • 58. Application for log-log model: Estimating a Cobb-Douglas production function (1 of 3) • $Y$: output; $L$: labour; $K$: capital • Using Mexican data over the years 1955-1974, we obtain:
  • 59. Application for log-log model: Estimating a Cobb-Douglas production function (2 of 3) • $\beta_1$, the coefficient on ln(labour): measures the elasticity of output with respect to labour, holding capital constant • $\hat\beta_1 \approx 0.34$ • Interpretation: holding capital constant, if labour (employment) increases by 1%, average output increases by about 0.34% • $\beta_2$, the coefficient on ln(capital): measures the elasticity of output with respect to capital, holding labour constant • $\hat\beta_2 \approx 0.85$ • Interpretation: holding labour constant, if capital increases by 1%, average output goes up by about 0.85%
  • 60. Application: Estimating a Cobb-Douglas production function (3 of 3) • $\beta_1 + \beta_2$: returns to scale parameter • It measures the response of output to a proportional change in all inputs • constant returns to scale ($\beta_1 + \beta_2 = 1$) • decreasing returns to scale ($\beta_1 + \beta_2 < 1$) • increasing returns to scale ($\beta_1 + \beta_2 > 1$)
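The classification above can be applied to the rounded point estimates from the previous slide (about 0.34 for labour and 0.85 for capital). The sketch below does only that arithmetic; the function name `returns_to_scale` and the tolerance are illustrative choices. It compares point estimates only; a formal conclusion would require testing the hypothesis that the two elasticities sum to one.

```python
def returns_to_scale(beta_labour, beta_capital, tol=1e-8):
    """Sum the two output elasticities and classify returns to scale."""
    total = beta_labour + beta_capital
    if abs(total - 1.0) <= tol:
        kind = "constant"
    elif total < 1.0:
        kind = "decreasing"
    else:
        kind = "increasing"
    return total, kind


# Rounded elasticities reported on the slides for the Mexican data.
total, kind = returns_to_scale(0.34, 0.85)

print(f"sum of elasticities = {total:.2f} -> {kind} returns to scale (point estimate)")
```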
  • 61. Application for log-linear model: Estimating a wage equation (1 of 2) • Using a sample of 1,801 City graduates, the following earnings equation has been estimated: • earnings: annual earnings • educ: years of education • experience: years of working experience • female: variable equal to 1 if the individual is female; 0 otherwise • Standard errors are reported in parentheses • Q: Interpret the coefficient estimates on educ, experience and female
  • 62. Application for log-linear model: Estimating a wage equation (2 of 2) • Interpretation of the coefficients: • Coefficient on educ (0.147): indicates that one additional year of education increases average earnings by approximately 14.7%, holding gender and experience constant • Coefficient on experience (0.049): indicates that one additional year of experience increases average earnings by approximately 4.9%, holding gender and education constant • Coefficient on female (-0.201): indicates that average earnings of females are approximately 20.1% lower than those of males, holding experience and education constant
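The percentages above use the approximation 100 × coefficient, which is least accurate for larger coefficients such as the one on the female dummy. A minimal sketch of the exact percentage effects, 100 × (exp(coefficient) − 1), follows. The coefficient values 0.147, 0.049 and −0.201 are inferred from the percentages on the slide; the dictionary and function names are hypothetical.

```python
import math


def exact_pct_effect(coef):
    """Exact % change in Y implied by a log-linear coefficient for a 1-unit change in X."""
    return 100.0 * (math.exp(coef) - 1.0)


# Coefficients implied by the slide's interpretations (14.7%, 4.9%, -20.1%).
coefs = {"educ": 0.147, "experience": 0.049, "female": -0.201}

exact = {name: exact_pct_effect(b) for name, b in coefs.items()}

for name, b in coefs.items():
    print(f"{name:>10}: approx {100.0 * b:6.2f}%   exact {exact[name]:6.2f}%")
```

Under this calculation the exact gap for females is closer to 18% than 20%, which shows why the approximation is usually described as valid for small coefficients.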
  • 63. Summary Two different approaches to incorporate nonlinear relationships in regression models: 1) Polynomial regression models in X: Interpretation, test 2) Logarithmic transformation: Interpretation and their application to economics

Editor's Notes

  • #19: Long Description: Stata commands: generate avginc2 = avginc*avginc; reg testscr avginc avginc2, r. Regression with robust standard errors. Number of obs = 420, F(2, 417) = 428.52, Prob > F = 0.0000, R-squared = 0.5562, Root MSE = 12.724. Dependent variable: testscr. avginc: coef. 3.850995; robust std. err. 0.2680941; t = 14.36; P>|t| = 0.000; 95% conf. interval [3.32401, 4.377979]. avginc2: coef. -0.0423085; robust std. err. 0.0047803; t = -8.85; P>|t| = 0.000; 95% conf. interval [-0.051705, -0.0329119]. _cons: coef. 607.3017; robust std. err. 2.901754; t = 209.29; P>|t| = 0.000; 95% conf. interval [601.5978, 613.0056].
  • #20: Long Description 1: The equation is as follows (standard errors in parentheses): $\widehat{TestScore} = 607.3\,(2.9) + 3.85\,(0.27)\,Income_i - 0.0423\,(0.0048)\,Income_i^2$. Long Description 2: In the scatterplot, the horizontal axis represents district income in thousands of dollars and ranges from 0 to 60 in increments of 10 units. The vertical axis represents test score and ranges from 600 to 740 in increments of 20 units. The graph plots a rising line labeled Linear regression from (4, 638) to (55, 736) and a concave curve labeled Quadratic regression through (4, 624), (26, 680) and (55, 690). The dots are densely and unevenly scattered above and below the line and the curve. All values are estimated.
  • #21 to #25: Long Description 1: The equation is as follows (standard errors in parentheses): $\widehat{TestScore} = 607.3\,(2.9) + 3.85\,(0.27)\,Income_i - 0.0423\,(0.0048)\,Income_i^2$. Long Description 2: $\Delta\widehat{TestScore} = 607.3 + 3.85 \times 6 - 0.0423 \times 6^2 - (607.3 + 3.85 \times 5 - 0.0423 \times 5^2) = 3.4$.
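The predicted change described in these notes can be reproduced directly from the estimated quadratic. The sketch below uses the rounded coefficients from the notes (607.3, 3.85, −0.0423); the function name and the second evaluation point (income rising from 25 to 26) are illustrative additions showing that the effect of a one-unit change depends on the starting level.

```python
def predicted_test_score(income):
    """Predicted test score from the estimated quadratic regression (income in $1000s)."""
    return 607.3 + 3.85 * income - 0.0423 * income ** 2


# Change in the predicted test score when income rises from 5 to 6 (as in the notes).
change_5_to_6 = predicted_test_score(6) - predicted_test_score(5)

# Same one-unit change at a higher starting income (illustrative comparison).
change_25_to_26 = predicted_test_score(26) - predicted_test_score(25)

print(f"income  5 ->  6: {change_5_to_6:.2f} points")
print(f"income 25 -> 26: {change_25_to_26:.2f} points")
```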
  • #28: Long Description: Stata commands: gen avginc3 = avginc*avginc2; reg testscr avginc avginc2 avginc3, r. Regression with robust standard errors. Number of obs = 420, F(3, 416) = 270.18, Prob > F = 0.0000, R-squared = 0.5584, Root MSE = 12.707. Dependent variable: testscr. avginc: coef. 5.018677; robust std. err. 0.7073505; t = 7.10; P>|t| = 0.000; 95% conf. interval [3.628251, 6.409104]. avginc2: coef. -0.0958052; robust std. err. 0.0289537; t = -3.31; P>|t| = 0.001; 95% conf. interval [-0.1527191, -0.0388913]. avginc3: coef. 0.0006855; robust std. err. 0.0003471; t = 1.98; P>|t| = 0.049; 95% conf. interval [3.27e-06, 0.0013677]. _cons: coef. 600.079; robust std. err. 5.102062; t = 117.61; P>|t| = 0.000; 95% conf. interval [590.0499, 610.108].
  • #31: Long Description: $Y_i = \beta_0 + \beta_1 X_i + \beta_2 X_i^2 + \dots + \beta_r X_i^r + u_i$.
  • #42: Long Description: In the scatterplot, the horizontal axis represents district income in thousands of dollars and ranges from 0 to 60 in increments of 10 units. The vertical axis represents test score and ranges from 600 to 740 in increments of 20 units. The graph plots two concave curves: one labeled Linear-log regression through (5, 619), (30, 680) and (55, 704), and another labeled Cubic regression through (5, 621), (30, 682) and (55, 698). The dots are densely and unevenly scattered above and below the curves. All values are estimated.
  • #49: The elasticity clearly depends on the values of x and is not constant along a demand/supply curve
  • #54: Long Description: In the scatterplot, the horizontal axis represents district income in thousands of dollars and ranges from 0 to 60 in increments of 10 units. The vertical axis represents ln(Test score) and ranges from 6.40 to 6.60 in increments of 0.05 units. The graph plots a rising line labeled Log-linear regression from (4, 6.45) to (55, 6.58) and a concave curve labeled Log-log regression through (5, 6.25), (30, 6.52) and (55, 6.55). The dots are densely and unevenly scattered above and below the line and the curve. All values are estimated.