405: ECONOMETRICS
Chapter # 1: THE NATURE OF REGRESSION ANALYSIS
By: Damodar N. Gujarati
Prof. M. El-Sakka
Dept of Economics: Kuwait University
THE MODERN INTERPRETATION OF REGRESSION
• Regression analysis is concerned with the study of the dependence of
one variable, the dependent variable, on one or more other variables,
the explanatory variables, with a view to estimating and/or predicting
the (population) mean or average value of the former in terms of the
known or fixed (in repeated sampling) values of the latter.
Examples
1. Consider Galton’s law of universal regression. Our concern is finding
out how the average height of sons changes, given the fathers’ height.
To see how this can be done, consider Figure 1.1, which is a scatter
diagram, or scattergram.
2. Consider the scattergram in Figure 1.2, which gives the distribution in a
hypothetical population of heights of boys measured at fixed ages.
3. An economist may be interested in studying the dependence of personal
consumption expenditure on after-tax or disposable real personal income.
Such an analysis may be helpful in estimating the marginal propensity to
consume (MPC).
4. A monopolist who can fix the price or output (but not both) may want to
find out the response of the demand for a product to changes in price. Such
an experiment may enable the estimation of the price elasticity of the
demand for the product and may help determine the most profitable price.
5. We may want to study the rate of change of money wages in relation to the
unemployment rate. The curve in Figure 1.3 is an example of the Phillips
curve. Such a scattergram may enable the labor economist to predict the
average change in money wages given a certain unemployment rate.
6. The higher the rate of inflation π, the lower the proportion (k) of their
income that people would want to hold in the form of money. See Figure 1.4.
7. The marketing director of a company may want to know how the
demand for the company’s product is related to, say, advertising
expenditure. Such a study will be of considerable help in finding out the
elasticity of demand with respect to advertising expenditure. This
knowledge may be helpful in determining the “optimum” advertising
budget.
8. Finally, an agronomist may be interested in studying the dependence of
crop yield, say, of wheat, on temperature, rainfall, amount of sunshine, and
fertilizer. Such a dependence analysis may enable the prediction of the
average crop yield, given information about the explanatory variables.
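The idea common to these examples, estimating the average value of the dependent variable at fixed values of the explanatory variable, can be sketched in a few lines of Python. The height pairs below are made-up illustration values (not Galton’s data), and `conditional_means` is a hypothetical helper name.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical (father_height, son_height) pairs in inches.
# These are invented numbers for illustration, not Galton's data.
pairs = [
    (65, 66), (65, 67), (65, 68),
    (68, 67), (68, 68), (68, 69),
    (71, 68), (71, 70), (71, 72),
]


def conditional_means(data):
    """Average of the dependent variable at each fixed regressor value."""
    groups = defaultdict(list)
    for x, y in data:
        groups[x].append(y)
    return {x: mean(ys) for x, ys in sorted(groups.items())}


son_mean_by_father = conditional_means(pairs)
for father, avg_son in son_mean_by_father.items():
    print(f"fathers at {father} in: sons average {avg_son} in")
```

Each printed line is one point on the line of conditional means; regression analysis is about estimating how that average moves as the explanatory variable changes.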
1.3 STATISTICAL VERSUS DETERMINISTIC RELATIONSHIPS
• In statistical relationships among variables we essentially deal with random
or stochastic variables, that is, variables that have probability distributions.
In functional or deterministic dependency, on the other hand, we also deal
with variables, but these variables are not random or stochastic.
• The dependence of crop yield on temperature, rainfall, sunshine, and
fertilizer, for example, is statistical in nature.
• In deterministic phenomena, we deal with relationships of the type, say,
exhibited by Newton’s law of gravity, which states: Every particle in the
universe attracts every other particle with a force directly proportional to
the product of their masses and inversely proportional to the square of the
distance between them. Symbolically, $F = k\,(m_1 m_2 / r^2)$, where
F = force, $m_1$ and $m_2$ are the masses of the two particles,
r = distance, and k = constant of proportionality. In this text we are not
concerned with such deterministic relationships.
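A minimal Python sketch of the contrast, under stated assumptions: `gravity_force` implements the formula above with a placeholder k = 1.0 (not the physical constant), while `crop_yield` is a hypothetical statistical relation whose intercept, slope, and disturbance spread are invented for illustration.

```python
import random


def gravity_force(m1, m2, r, k=1.0):
    """Deterministic: F = k * m1 * m2 / r**2. Same inputs, same F."""
    return k * m1 * m2 / r**2


def crop_yield(rainfall, rng, base=2.0, slope=0.5, noise_sd=0.3):
    """Statistical: a systematic part plus a random disturbance.

    base, slope and noise_sd are hypothetical illustration values.
    """
    return base + slope * rainfall + rng.gauss(0.0, noise_sd)


rng = random.Random(0)  # seeded so the run is reproducible

# Repeating the deterministic calculation never changes the answer.
forces = [gravity_force(2.0, 3.0, 2.0) for _ in range(3)]

# Repeating the statistical one at the same rainfall gives varying yields.
yields = [crop_yield(10.0, rng) for _ in range(3)]

print("forces:", forces)
print("yields:", yields)
```

With the same rainfall the yields differ from draw to draw, which is why only their average can be estimated or predicted.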
1.4 REGRESSION VERSUS CAUSATION
• Although regression analysis deals with the dependence of one variable on
other variables, it does not necessarily imply causation. In the crop-yield
example cited previously, there is no statistical reason to assume that rainfall
does not depend on crop yield. The fact that we treat crop yield as dependent
on rainfall (among other things) is due to non-statistical considerations:
Common sense suggests that the relationship cannot be reversed, for we
cannot control rainfall by varying crop yield. A statistical relationship in
itself cannot logically imply causation. To ascribe causality, one must appeal
to a priori or theoretical considerations.
1.5 REGRESSION VERSUS CORRELATION
• In correlation analysis, the primary objective is to measure the strength or
degree of linear association between two variables. For example, we may be
interested in the correlation between smoking and lung cancer, between
scores on statistics and mathematics examinations, and so on. In regression
analysis, we try to estimate or predict the average value of one variable on
the basis of the fixed values of other variables.
• Regression and correlation have some fundamental differences. In
regression analysis there is an asymmetry in the way the dependent and
explanatory variables are treated.
• In correlation analysis, we treat any (two) variables symmetrically; there is
no distinction between the dependent and explanatory variables. The
correlation between scores on mathematics and statistics examinations is
the same as that between scores on statistics and mathematics
examinations. Moreover, both variables are assumed to be random.
In contrast, most of the regression theory to be dealt with here is conditional
upon the assumption that the dependent variable is stochastic but the
explanatory variables are fixed or nonstochastic.
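The symmetry of correlation and the asymmetry of regression can be checked numerically. The sketch below uses invented examination scores and two hand-written helpers, `correlation` and `ols_slope`, introduced here only for illustration.

```python
from statistics import mean


def correlation(x, y):
    """Sample correlation coefficient between x and y."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def ols_slope(x, y):
    """Least-squares slope from regressing y on x."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


# Hypothetical examination scores, invented for illustration.
math_scores = [55, 60, 65, 70, 80]
stat_scores = [50, 65, 60, 75, 85]

r_ms = correlation(math_scores, stat_scores)
r_sm = correlation(stat_scores, math_scores)
slope_stat_on_math = ols_slope(math_scores, stat_scores)
slope_math_on_stat = ols_slope(stat_scores, math_scores)

print("correlation either way:", r_ms, r_sm)
print("slope of statistics on mathematics:", slope_stat_on_math)
print("slope of mathematics on statistics:", slope_math_on_stat)
```

Swapping the arguments leaves the correlation unchanged, but the two regression slopes differ, and one is not simply the reciprocal of the other; their product equals the squared correlation.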
1.6 TERMINOLOGY AND NOTATION
• In the literature the terms dependent variable and explanatory variable are
described variously. A representative list is:
• We will use the dependent variable/explanatory variable or the more
neutral, regressand and regressor terminology.
• The term random is a synonym for the term stochastic. A random or
stochastic variable is a variable that can take on any set of values, positive
or negative, with a given probability.
1.7 THE NATURE AND SOURCES OF DATA FOR ECONOMIC ANALYSIS
• Types of Data
• There are three types of data: time series, cross-section, and pooled (i.e.,
combination of time series and cross-section) data.
• A time series is a set of observations on the values that a variable takes at
different times. It is collected at regular time intervals, such as daily,
weekly, monthly, quarterly, annually, quinquennially, that is, every 5 years
(e.g., the census of manufactures), or decennially (e.g., the census of
population).
• Most empirical work based on time series data assumes that the underlying
time series is stationary. Loosely speaking, a time series is stationary if its
mean and variance do not vary systematically over time.
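The loose definition of stationarity suggests a crude first look: compare the mean and variance of the first and second halves of a series. The sketch below is only that informal comparison, not a formal stationarity test; `half_summary` is a hypothetical helper and both series are constructed for illustration.

```python
from statistics import mean, pvariance


def half_summary(series):
    """Compare mean and variance between the two halves of a series."""
    mid = len(series) // 2
    first, second = series[:mid], series[mid:]
    return {
        "mean_shift": mean(second) - mean(first),
        "variance_ratio": pvariance(second) / pvariance(first),
    }


# A series that fluctuates around a constant level of 10.
steady = [10 + (1 if t % 2 == 0 else -1) for t in range(40)]

# The same fluctuation around an upward trend of 0.5 per period.
trending = [10 + 0.5 * t + (1 if t % 2 == 0 else -1) for t in range(40)]

steady_check = half_summary(steady)
trending_check = half_summary(trending)

print("steady:  ", steady_check)
print("trending:", trending_check)
```

The mean of the trending series shifts between the two halves while that of the steady series does not, which is the kind of systematic variation over time that the stationarity assumption rules out.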
• Cross-Section Data. Cross-section data are data on one or more variables
collected at the same point in time, such as the census of population
conducted by the Census Bureau every 10 years. An example of cross-sectional
data is given in Table 1.1. For each year the data on the 50 states are
cross-sectional data. Just as time series data create their own problems
because of the stationarity issue, cross-sectional data too
have their own problems, specifically the problem of heterogeneity.
• From Table 1.1 we see that we have some states that produce huge amounts
of eggs (e.g., Pennsylvania) and some that produce very little (e.g., Alaska).
When we include such heterogeneous units in a statistical analysis, the size
or scale effect must be taken into account. To see this clearly, we plot in
Figure 1.6 the data on eggs produced and their prices in 50 states for the
year 1990. This figure shows how widely scattered the observations are.
[Figure 1.6: eggs produced and their prices in 50 states, 1990; labelled points include Alaska and California]
• Pooled Data. Pooled, or combined, data contain elements of both time series
and cross-section data. The data in Table 1.1 are an example of pooled data.
For each year we have 50 cross-sectional observations and for each state we
have two time series observations on prices and output of eggs, a total of
100 pooled (or combined) observations.
• Panel, Longitudinal, or Micropanel Data. This is a special type of pooled
data in which the same cross-sectional unit (say, a family or a firm) is
surveyed over time.
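The layout of the pooled egg data can be sketched as a list of records, one per state and year. The state labels and year labels below are placeholders, and the price and output fields are left empty because the values in Table 1.1 are not reproduced here.

```python
from itertools import product

# Placeholder labels: 50 cross-sectional units and 2 time periods.
states = [f"state_{i:02d}" for i in range(1, 51)]
years = ["year_1", "year_2"]

# One record per (state, year); price and output are left unfilled.
pooled = [
    {"state": s, "year": y, "price": None, "output": None}
    for s, y in product(states, years)
]

# A cross-section: all states observed in one year.
cross_section = [row for row in pooled if row["year"] == "year_1"]

# A time series: one state observed across the years.
time_series = [row for row in pooled if row["state"] == "state_01"]

print(len(pooled), len(cross_section), len(time_series))
```

Because every state appears in both years, the same records also have the panel shape described above: the same cross-sectional unit followed over time.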
• The Sources of Data
• The data used in empirical analysis may be collected by a governmental
agency (e.g., the Department of Commerce), an international agency (e.g.,
the International Monetary Fund (IMF) or the World Bank), a private
organization (e.g., the Standard & Poor’s Corporation), or an individual.
Literally, there are thousands of such agencies collecting data for one
purpose or another.
• The Accuracy of Data
• The quality of the data is often not that good. Some reasons for that are:
• First, as noted, most social science data are nonexperimental in nature.
Therefore, there is the possibility of observational errors, either of omission
or commission.
• Second, even in experimentally collected data, errors of measurement arise
from approximations and roundoffs.
• Third, in questionnaire-type surveys, the problem of nonresponse can be
serious; a researcher is lucky to get a 40% response to a questionnaire.
• Fourth, the sampling methods used in obtaining the data may vary so
widely that it is often difficult to compare the results obtained from the
various samples.
• Fifth, economic data are generally available at a highly aggregate level. For
example, most macrodata (e.g., GNP, inflation, unemployment) are
available only for the economy as a whole.
• The researcher should always keep in mind that the results of research are
only as good as the quality of the data.
A Note on the Measurement Scales of Variables.
• The variables that we will generally encounter fall into four broad
categories: ratio scale, interval scale, ordinal scale, and nominal scale. It is
important that we understand each.
• Ratio Scale. For a variable X taking two values, X1 and X2, the ratio X1/X2
is meaningful, as are comparisons such as X2 ≤ X1 or X2 ≥ X1.
• Interval Scale. The distance between two time periods, say (2000–1995), is
meaningful, but not the ratio of two time periods (2000/1995).
• Ordinal Scale. Examples are grading systems (A, B, C grades) or income
class (upper, middle, lower). For these variables the ordering exists but the
distances between the categories cannot be quantified.
• Nominal Scale. Variables such as gender and marital status simply denote
categories. Such variables cannot be expressed on the ratio, interval, or
ordinal scales.
• Econometric techniques that may be suitable for ratio scale variables may
not be suitable for nominal scale variables. Therefore, it is important to
bear in mind the distinctions among the four types.
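One way to keep the four scales apart is to record which operations are meaningful on each. The lookup below is a small sketch of that bookkeeping; the names `Scale`, `MEANINGFUL_OPERATIONS`, and `is_meaningful` are hypothetical, and personal income is added as an assumed example of a ratio-scale variable.

```python
from enum import Enum


class Scale(Enum):
    RATIO = "ratio"
    INTERVAL = "interval"
    ORDINAL = "ordinal"
    NOMINAL = "nominal"


# Operations that carry meaning on each measurement scale.
MEANINGFUL_OPERATIONS = {
    Scale.RATIO: {"ratio", "difference", "ordering"},
    Scale.INTERVAL: {"difference", "ordering"},
    Scale.ORDINAL: {"ordering"},
    Scale.NOMINAL: set(),
}


def is_meaningful(scale, operation):
    """Return True if the operation is meaningful on the given scale."""
    return operation in MEANINGFUL_OPERATIONS[scale]


# Example variables; personal_income is an assumed illustration.
variables = {
    "personal_income": Scale.RATIO,
    "calendar_year": Scale.INTERVAL,
    "letter_grade": Scale.ORDINAL,
    "marital_status": Scale.NOMINAL,
}

for name, scale in variables.items():
    allowed = sorted(MEANINGFUL_OPERATIONS[scale]) or ["categories only"]
    print(f"{name} ({scale.value}): {', '.join(allowed)}")
```

A check such as `is_meaningful(variables["calendar_year"], "ratio")` returns False, matching the point that 2000/1995 has no interpretation while 2000–1995 does.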
  • 1. 405: ECONOMETRICS Chapter # 1: THE NATURE OF REGRESSION ANALYSIS By: Domodar N. Gujarati Prof. M. El-SakkaProf. M. El-Sakka Dept of Economics: Kuwait UniversityDept of Economics: Kuwait University
  • 2. THE MODERN INTERPRETATION OF REGRESSION • Regression analysis is concerned with the study of theRegression analysis is concerned with the study of the dependencedependence ofof one variable, theone variable, the dependentdependent variable, on one or more other variables,variable, on one or more other variables, thethe explanatoryexplanatory variables,variables, with a view to estimating and/or predictingwith a view to estimating and/or predicting the (population)the (population) mean or averagemean or average value of the former in terms of thevalue of the former in terms of the known or fixed (known or fixed (in repeated samplingin repeated sampling) values of the latter.) values of the latter. ExamplesExamples 1.1. Consider Galton’s law of universal regression. Our concern is findingConsider Galton’s law of universal regression. Our concern is finding out how theout how the average height of sons changesaverage height of sons changes,, given the fathers’given the fathers’ heightheight.. To see how this can be done, consider Figure 1.1, which is a scatterTo see how this can be done, consider Figure 1.1, which is a scatter diagram, or scattergram.diagram, or scattergram.
  • 4. 2.2. Consider the scattergram in Figure 1.2, which gives the distribution in aConsider the scattergram in Figure 1.2, which gives the distribution in a hypothetical populationhypothetical population of heights of boysof heights of boys measuredmeasured atat fixed agesfixed ages..
  • 5. 3.3. studying the dependence ofstudying the dependence of personal consumption expenditurepersonal consumption expenditure on after taxon after tax or disposable real personalor disposable real personal incomeincome. Such an analysis may be helpful in. Such an analysis may be helpful in estimating the marginal propensity to consume (MPC.estimating the marginal propensity to consume (MPC. 4.4. A monopolist who canA monopolist who can fix the price or outputfix the price or output (but not both) may want to(but not both) may want to find out thefind out the responseresponse of the demand for a product to changes in price. Suchof the demand for a product to changes in price. Such an experiment may enable the estimation of the pricean experiment may enable the estimation of the price elasticityelasticity of theof the demand for the product and may help determine the mostdemand for the product and may help determine the most profitable price.profitable price. 5.5. We may want to study the rate of change ofWe may want to study the rate of change of money wagesmoney wages in relation to thein relation to the unemployment rateunemployment rate. The curve in Figure 1.3 is an example of the. The curve in Figure 1.3 is an example of the PhillipsPhillips curvecurve. Such a scattergram may enable the labor economist to predict the. Such a scattergram may enable the labor economist to predict the average change in money wagesaverage change in money wages given a certaingiven a certain unemployment rateunemployment rate..
  • 7. 6.6. The higher the rate ofThe higher the rate of inflationinflation ππ, the lower the proportion (k) of, the lower the proportion (k) of theirtheir income that people would want to hold in the form ofincome that people would want to hold in the form of moneymoney. Figure 1.4.. Figure 1.4.
  • 8. • 7. The marketing director of a company may want to know how the7. The marketing director of a company may want to know how the demand for the company’s productdemand for the company’s product is related to, say,is related to, say, advertisingadvertising expenditureexpenditure. Such a study will be of considerable help in finding out the. Such a study will be of considerable help in finding out the elasticityelasticity of demand with respect to advertising expenditure. Thisof demand with respect to advertising expenditure. This knowledge may be helpful in determining theknowledge may be helpful in determining the “optimum” advertising“optimum” advertising budget.budget. • 8. Finally, an agronomist may be interested in studying the dependence of8. Finally, an agronomist may be interested in studying the dependence of crop yieldcrop yield, say, of wheat,, say, of wheat, on temperature, rainfall, amount of sunshine, andon temperature, rainfall, amount of sunshine, and fertilizerfertilizer. Such a dependence analysis may enable the prediction of the. Such a dependence analysis may enable the prediction of the average crop yield, given information about the explanatory variables.average crop yield, given information about the explanatory variables.
  • 9. 1.3 STATISTICAL VERSUS DETERMINISTIC RELATIONSHIPS • In statistical relationshipsIn statistical relationships among variables we essentially deal withamong variables we essentially deal with randomrandom or stochastic variablesor stochastic variables, that is, variables that have, that is, variables that have probability distributionsprobability distributions.. In functional or deterministicIn functional or deterministic dependency, on the other hand, we also dealdependency, on the other hand, we also deal with variables, but these variableswith variables, but these variables are not random or stochasticare not random or stochastic.. • The dependence ofThe dependence of crop yieldcrop yield on temperature, rainfall, sunshine, andon temperature, rainfall, sunshine, and fertilizer, for example, isfertilizer, for example, is statisticalstatistical in naturein nature • InIn deterministicdeterministic phenomena, we deal with relationships of the type, say,phenomena, we deal with relationships of the type, say, exhibited byexhibited by Newton’s law of gravityNewton’s law of gravity, which states: Every particle in the, which states: Every particle in the universe attracts every other particle with a force directly proportional touniverse attracts every other particle with a force directly proportional to the product of their masses and inversely proportional to the square of thethe product of their masses and inversely proportional to the square of the distance between them. Symbolically,distance between them. Symbolically, F = k(mF = k(m11mm22/r/r22 ), where F = force, m), where F = force, m11 andand mm22 are the masses of the two particles, r = distance, and k = constant ofare the masses of the two particles, r = distance, and k = constant of proportionality.proportionality. we are not concerned with such deterministic relationships.we are not concerned with such deterministic relationships.
  • 10. 1.4 REGRESSION VERSUS CAUSATION
    • Although regression analysis deals with the dependence of one variable on other variables, it does not necessarily imply causation. In the crop-yield example cited previously, there is no statistical reason to assume that rainfall does not depend on crop yield. The fact that we treat crop yield as dependent on rainfall (among other things) is due to non-statistical considerations: Common sense suggests that the relationship cannot be reversed, for we cannot control rainfall by varying crop yield. A statistical relationship in itself cannot logically imply causation. To ascribe causality, one must appeal to a priori or theoretical considerations.
  • 11. 1.5 REGRESSION VERSUS CORRELATION
    • In correlation analysis, the primary objective is to measure the strength or degree of linear association between two variables. For example, we may be interested in the correlation between smoking and lung cancer, between scores on statistics and mathematics examinations, and so on. In regression analysis, we try to estimate or predict the average value of one variable on the basis of the fixed values of other variables.
    • Regression and correlation have some fundamental differences. In regression analysis there is an asymmetry in the way the dependent and explanatory variables are treated.
    • In correlation analysis, we treat any (two) variables symmetrically; there is no distinction between the dependent and explanatory variables. The correlation between scores on mathematics and statistics examinations is the same as that between scores on statistics and mathematics examinations. Moreover, both variables are assumed to be random, whereas most of the regression theory to be dealt with here is conditional upon the assumption that the dependent variable is stochastic but the explanatory variables are fixed or nonstochastic.
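The symmetry of correlation and the asymmetry of regression can be checked numerically. The sketch below uses made-up examination scores (hypothetical data, not from the text) and hand-written helper functions: swapping the two variables leaves the correlation coefficient unchanged, but the least-squares slope of statistics on mathematics is not the same as the slope of mathematics on statistics.

```python
def mean(values):
    return sum(values) / len(values)


def correlation(x, y):
    """Pearson correlation coefficient between x and y."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def ols_slope(regressor, regressand):
    """Least-squares slope from regressing `regressand` on `regressor`."""
    mx, my = mean(regressor), mean(regressand)
    sxy = sum((a - mx) * (b - my) for a, b in zip(regressor, regressand))
    sxx = sum((a - mx) ** 2 for a in regressor)
    return sxy / sxx


# Hypothetical examination scores for six students.
math_scores = [55, 62, 70, 78, 85, 91]
stat_scores = [58, 60, 75, 74, 88, 86]

r_ms = correlation(math_scores, stat_scores)
r_sm = correlation(stat_scores, math_scores)
slope_stat_on_math = ols_slope(math_scores, stat_scores)
slope_math_on_stat = ols_slope(stat_scores, math_scores)

print("corr(math, stat):", r_ms)
print("corr(stat, math):", r_sm)
print("slope, stat on math:", slope_stat_on_math)
print("slope, math on stat:", slope_math_on_stat)
```

The two slopes are linked by the identity that their product equals the squared correlation coefficient, so they coincide only in special cases (for example, when the two variables have equal sample variances).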
  • 12. 1.6 TERMINOLOGY AND NOTATION
    • In the literature the terms dependent variable and explanatory variable are described variously. A representative list is:
  • 13.
    • We will use the dependent variable/explanatory variable or the more neutral regressand and regressor terminology.
    • The term random is a synonym for the term stochastic. A random or stochastic variable is a variable that can take on any set of values, positive or negative, with a given probability.
  • 14. 1.7 THE NATURE AND SOURCES OF DATA FOR ECONOMIC ANALYSIS
    • Types of Data
    • There are three types of data: time series, cross-section, and pooled (i.e., combination of time series and cross-section) data.
    • A time series is a set of observations on the values that a variable takes at different times. It is collected at regular time intervals, such as daily, weekly, monthly, quarterly, annually, quinquennially, that is, every 5 years (e.g., the census of manufactures), or decennially (e.g., the census of population).
    • Most empirical work based on time series data assumes that the underlying time series is stationary. Loosely speaking, a time series is stationary if its mean and variance do not vary systematically over time.
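The loose definition of stationarity above suggests a crude eyeball check: split a series in two and compare the mean and variance of the halves. The sketch below does this for two simulated series (both hypothetical): pure noise, and the same noise with a linear trend added. It is an informal illustration under those assumptions, not a formal stationarity test.

```python
import random
from statistics import mean, pvariance


def half_summaries(series):
    """Return (mean, variance) for the first and second half of a series."""
    mid = len(series) // 2
    first, second = series[:mid], series[mid:]
    return (mean(first), pvariance(first)), (mean(second), pvariance(second))


rng = random.Random(42)  # seeded so the run is reproducible

# Hypothetical series: white noise, and the same noise around a rising trend.
noise = [rng.gauss(0.0, 1.0) for _ in range(400)]
trending = [0.05 * t + e for t, e in enumerate(noise)]

noise_first, noise_second = half_summaries(noise)
trend_first, trend_second = half_summaries(trending)

print("noise    (mean, var) by half:", noise_first, noise_second)
print("trending (mean, var) by half:", trend_first, trend_second)
```

For the noise series the two halves have similar means, while for the trending series the mean of the second half is much larger, which is the kind of systematic variation over time that the definition rules out.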
  • 16.
    • Cross-Section Data. Cross-section data are data on one or more variables collected at the same point in time, such as the census of population conducted by the Census Bureau every 10 years. An example of cross-sectional data is given in Table 1.1. For each year the data on the 50 states are cross-sectional data. Just as time series data create their own special problems because of the stationarity issue, cross-sectional data too have their own problems, specifically the problem of heterogeneity.
    • From Table 1.1 we see that we have some states that produce huge amounts of eggs (e.g., Pennsylvania) and some that produce very little (e.g., Alaska). When we include such heterogeneous units in a statistical analysis, the size or scale effect must be taken into account. To see this clearly, we plot in Figure 1.6 the data on eggs produced and their prices in 50 states for the year 1990. This figure shows how widely scattered the observations are.
  • 19.
    • Pooled Data. Pooled, or combined, data contain elements of both time series and cross-section data. The data in Table 1.1 are an example of pooled data. For each year we have 50 cross-sectional observations and for each state we have two time series observations on prices and output of eggs, a total of 100 pooled (or combined) observations.
    • Panel, Longitudinal, or Micropanel Data. This is a special type of pooled data in which the same cross-sectional unit (say, a family or a firm) is surveyed over time.
    • The Sources of Data
    • The data used in empirical analysis may be collected by a governmental agency (e.g., the Department of Commerce), an international agency (e.g., the International Monetary Fund (IMF) or the World Bank), a private organization (e.g., the Standard & Poor’s Corporation), or an individual. Literally, there are thousands of such agencies collecting data for one purpose or another.
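The pooled-data count on this slide (50 states observed in each of two years, 100 observations in all) can be made concrete with a small sketch. The state labels are placeholders, and the second year (1991) is an assumption made for illustration; the slides name only 1990.

```python
# Placeholder labels for the 50 cross-sectional units (states).
states = [f"state_{i:02d}" for i in range(1, 51)]
# Two time periods; 1991 is assumed here, only 1990 is named in the slides.
years = [1990, 1991]

# Pooled data: one observation per (state, year) pair.
pooled = [(state, year) for state in states for year in years]

# A cross-section fixes the year and varies the state.
cross_section_1990 = [obs for obs in pooled if obs[1] == 1990]
# A time series fixes the state and varies the year.
series_state_01 = [obs for obs in pooled if obs[0] == "state_01"]

print(len(pooled), len(cross_section_1990), len(series_state_01))
```

Panel data have the same layout, with the added requirement that the same units are followed in every period.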
  • 20.
    • The Accuracy of Data
    • The quality of the data is often not that good. Some reasons for that are:
    • First, as noted, most social science data are nonexperimental in nature. Therefore, there is the possibility of observational errors, either of omission or commission.
    • Second, even in experimentally collected data, errors of measurement arise from approximations and roundoffs.
    • Third, in questionnaire-type surveys, the problem of nonresponse can be serious; a researcher is lucky to get a 40% response to a questionnaire.
    • Fourth, the sampling methods used in obtaining the data may vary so widely that it is often difficult to compare the results obtained from the various samples.
    • Fifth, economic data are generally available at a highly aggregate level. For example, most macrodata (e.g., GNP, inflation, unemployment) are available only for the economy as a whole or, at most, for some broad geographical regions.
    • The researcher should always keep in mind that the results of research are only as good as the quality of the data.
  • 21. A Note on the Measurement Scales of Variables.
    • The variables that we will generally encounter fall into four broad categories: ratio scale, interval scale, ordinal scale, and nominal scale. It is important that we understand each.
    • Ratio Scale. For a variable X taking two values, X1 and X2, the ratio X1/X2 and the distance (X2 − X1) are meaningful quantities. Comparisons such as X2 ≤ X1 or X2 ≥ X1 are also meaningful.
    • Interval Scale. The distance between two time periods, say (2000 − 1995), is meaningful, but not the ratio of two time periods (2000/1995).
    • Ordinal Scale. Examples are grading systems (A, B, C grades) or income class (upper, middle, lower). For these variables the ordering exists but the distances between the categories cannot be quantified.
    • Nominal Scale. Variables such as gender and marital status simply denote categories. Such variables cannot be expressed on the ratio, interval, or ordinal scales.
    • Econometric techniques that may be suitable for ratio scale variables may not be suitable for nominal scale variables. Therefore, it is important to bear in mind the distinctions among the four types.
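One way to keep the four scales apart is to record which comparisons are meaningful on each. The lookup table below is a hypothetical teaching aid built from the descriptions on this slide (the names `Scale`, `MEANINGFUL`, and `is_meaningful` are illustrative, not from the text).

```python
from enum import Enum


class Scale(Enum):
    RATIO = "ratio"
    INTERVAL = "interval"
    ORDINAL = "ordinal"
    NOMINAL = "nominal"


# Operations that are meaningful on each measurement scale.
MEANINGFUL = {
    Scale.RATIO: {"ratio", "difference", "ordering"},
    Scale.INTERVAL: {"difference", "ordering"},
    Scale.ORDINAL: {"ordering"},
    Scale.NOMINAL: set(),
}


def is_meaningful(scale, operation):
    """Return True if `operation` is meaningful for a variable on `scale`."""
    return operation in MEANINGFUL[scale]


# Calendar years are interval scale: 2000 - 1995 is meaningful, 2000/1995 is not.
print(is_meaningful(Scale.INTERVAL, "difference"))
print(is_meaningful(Scale.INTERVAL, "ratio"))
# Grades (A, B, C) are ordinal: they can be ranked, but gaps cannot be quantified.
print(is_meaningful(Scale.ORDINAL, "ordering"))
print(is_meaningful(Scale.ORDINAL, "difference"))
```

A check like this could sit in front of an estimation routine to flag, for example, an attempt to take ratios of a nominal variable.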