Unit-5
Correlation :-
Suppose we have a set of 30 students in a class and we want to measure the heights and weights of all
the students. We observe that each individual (unit) of the set assumes two values: one relating to the
height and the other to the weight. Such a distribution, in which each individual or unit of the set is made
up of two values, is called a bivariate distribution. Some examples of bivariate distributions are:
(i) In a class of 60 students, the series of marks obtained in two subjects by all of them.
(ii) The series of sales revenue and advertising expenditure of two companies in a particular
year.
(iii) The series of ages of husbands and wives in a sample of selected married couples.
Thus in a bivariate distribution, we are given a set of pairs of observations, wherein each pair represents
the values of two variables.
In a bivariate distribution, we are interested in finding a relationship (if it exists) between the two
variables under study. The concept of 'correlation' is a statistical tool which studies the relationship
between two variables, and Correlation Analysis involves various methods and techniques used for
studying and measuring the extent of the relationship between the two variables.
Definition:- Two variables are said to be in correlation if a change in one of the variables results in a
change in the other variable.
Types of Correlation:-
The various types of correlation are positive, negative, no correlation, perfect, strong and weak correlation.
Positive Correlation
Positive correlation occurs when an increase in one variable increases the value of another.
The line corresponding to the scatter plot is an increasing line.
Negative Correlation
Negative correlation occurs when an increase in one variable decreases the value of another.
The line corresponding to the scatter plot is a decreasing line.
No Correlation
No correlation occurs when there is no linear dependency between the variables.
Perfect Correlation
Perfect correlation occurs when there is an exact linear (functional) dependency between the variables.
In this case all the points lie on a straight line.
Strong Correlation
A correlation is stronger the closer the points lie to the line.
Weak Correlation
A correlation is weaker the farther the points lie from the line.
Some examples of series of positive correlation are:
(i) Heights and weights;
(ii) Household income and expenditure;
(iii) Price and supply of commodities;
(iv) Amount of rainfall and yield of crops.
Correlation between two variables is said to be negative or inverse if the variables deviate in opposite
directions. That is, an increase (or decrease) in the values of one variable results, on average, in a
corresponding decrease (or increase) in the values of the other variable.
Some examples of series of negative correlation are:
(i) Volume and pressure of a perfect gas;
(ii) Current and resistance [keeping the voltage constant] ($R = V/I$);
(iii) Price and demand of goods.
Note:
(i) If the points are very close to each other, a fairly good amount of correlation can be
expected between the two variables. On the other hand, if they are widely scattered, a poor
correlation can be expected between them.
(ii) If the points are scattered and they reveal no upward or downward trend, as in the case of
(d), then we say the variables are uncorrelated.
(iv) If there is an upward trend rising from the lower left-hand corner and going upward to the
upper right-hand corner, the correlation obtained from the graph is said to be positive. Also,
if there is a downward trend from the upper left-hand corner, the correlation obtained is said
to be negative.
(v) The graphs shown above are generally termed scatter diagrams.
The Coefficient of Correlation (Karl Pearson's method)
Karl Pearson's method is popularly known as Pearson's coefficient of correlation.
One of the most widely used statistics is the coefficient of correlation '$r$', which measures the degree of
association between the two values of related variables given in the data set. The coefficient of
correlation '$r$' is given by the formula
$$r = \frac{\sum XY}{n\,\sigma_x \sigma_y} = \frac{\sum XY}{\sqrt{\sum X^2 \sum Y^2}} \qquad \left[\because\ \sigma_x^2 = \frac{\sum X^2}{n};\ \sigma_y^2 = \frac{\sum Y^2}{n}\right]$$
Here $X = (x - \bar{x})$; $Y = (y - \bar{y})$
$\sigma_x$ = Standard deviation of series $x$
$\sigma_y$ = Standard deviation of series $y$
$n$ = Number of pairs of observations
$r$ = The (product moment) correlation coefficient
This method is to be applied only where deviations of items are taken from the actual mean and not from
the assumed mean.
The value of the coefficient of correlation '$r$' obtained from the above formula always lies between $-1$ and $+1$.
When $r = +1$, there is a perfect positive correlation between the variables. When $r = -1$,
there is a perfect negative correlation between the variables. However, if $r = 0$, there is no linear relationship
between the variables.
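To make the three special values of $r$ concrete, the following minimal Python sketch evaluates the deviation formula above on three small data sets. The function name `pearson_r` and all the data are made up for illustration; they are not part of the original notes.

```python
import math

def pearson_r(xs, ys):
    """Pearson's r from deviations about the actual means:
    sum(XY) / sqrt(sum(X^2) * sum(Y^2))."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    X = [x - x_bar for x in xs]   # deviations of x from its mean
    Y = [y - y_bar for y in ys]   # deviations of y from its mean
    sum_xy = sum(a * b for a, b in zip(X, Y))
    sum_x2 = sum(a * a for a in X)
    sum_y2 = sum(b * b for b in Y)
    return sum_xy / math.sqrt(sum_x2 * sum_y2)

# Hypothetical data, chosen only to illustrate the three cases.
x = [1, 2, 3, 4, 5]
y_rising = [2, 4, 6, 8, 10]    # y = 2x, an exact increasing line
y_falling = [10, 8, 6, 4, 2]   # y = 12 - 2x, an exact decreasing line
y_curved = [4, 1, 0, 1, 4]     # y = (x - 3)^2, a symmetric curve

r_rising = pearson_r(x, y_rising)
r_falling = pearson_r(x, y_falling)
r_curved = pearson_r(x, y_curved)
print(r_rising, r_falling, r_curved)
```

The third data set is an exact function of x and yet gives $r = 0$, which is why $r = 0$ is best read as the absence of a linear relationship rather than of any relationship at all.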
Direct method:-
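Substituting the values of $\sigma_x$ and $\sigma_y$ in the above formula, we get
$$r = \frac{\sum XY}{\sqrt{\sum X^2 \sum Y^2}},$$
or, in terms of the original values of $x$ and $y$,
$$r = \frac{n\sum xy - \sum x \sum y}{\sqrt{\left[n\sum x^2 - \left(\sum x\right)^2\right]\left[n\sum y^2 - \left(\sum y\right)^2\right]}}$$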
Example:- Making use of the data summarized below, calculate the coefficient of correlation.
Case   A    B    C    D    E    F    G    H
x      10   6    9    10   12   13   11   9
y      9    4    6    9    11   13   8    4
Solution:-
Case   $x$    $X = x - 10$   $X^2$   $y$    $Y = y - 8$   $Y^2$   $XY$
A      10      0             0      9     +1            1       0
B      6      -4            16      4     -4           16      16
C      9      -1             1      6     -2            4       2
D      10      0             0      9     +1            1       0
E      12     +2             4     11     +3            9       6
F      13     +3             9     13     +5           25      15
G      11     +1             1      8      0            0       0
H      9      -1             1      4     -4           16       4
$n = 8$, $\sum x = 80$, $\sum X = 0$, $\sum X^2 = 32$, $\sum y = 64$, $\sum Y = 0$, $\sum Y^2 = 72$, $\sum XY = 43$
$$\bar{x} = \frac{\sum x}{n} = \frac{80}{8} = 10, \qquad \bar{y} = \frac{\sum y}{n} = \frac{64}{8} = 8$$
$$r = \frac{\sum XY}{\sqrt{\sum X^2 \sum Y^2}} = \frac{43}{\sqrt{32 \times 72}} = \frac{43}{\sqrt{2304}} = \frac{43}{48} = +0.896$$
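The arithmetic of this example can be checked with a short Python sketch that computes $r$ both from deviations about the means and from the raw sums. The function names are illustrative. The x values for cases B and C are taken as 6 and 9, which is an assumption based on what the deviation column of the solution table ($-4$ and $-1$) implies.

```python
import math

def r_from_deviations(xs, ys):
    """r = sum(XY) / sqrt(sum(X^2) * sum(Y^2)), with X, Y deviations from the means."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sum_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sum_x2 = sum((x - x_bar) ** 2 for x in xs)
    sum_y2 = sum((y - y_bar) ** 2 for y in ys)
    return sum_xy / math.sqrt(sum_x2 * sum_y2)

def r_from_raw_sums(xs, ys):
    """Same coefficient computed from the raw sums of x, y, xy, x^2 and y^2."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    syy = sum(y * y for y in ys)
    return (n * sxy - sx * sy) / math.sqrt((n * sxx - sx ** 2) * (n * syy - sy ** 2))

# Cases A..H; B and C use x = 6 and x = 9 (assumption, see the note above).
xs = [10, 6, 9, 10, 12, 13, 11, 9]
ys = [9, 4, 6, 9, 11, 13, 8, 4]

r_dev = r_from_deviations(xs, ys)
r_raw = r_from_raw_sums(xs, ys)
print(round(r_dev, 3), round(r_raw, 3))
```

Both routes reduce to 43/48, so the two printed values agree; the raw-sum form avoids computing the means first, at the cost of larger intermediate numbers.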
Regression
If two variables are significantly correlated, and if there is some theoretical basis for doing so, it is
possible to predict (estimate) values of one variable from the other. This observation leads to a very
important concept known as 'Regression Analysis'.
For example, if we know that advertising and sales are correlated, we may find out the expected amount of
sales for a given advertising expenditure, or the expenditure required for attaining a given amount of sales.
Similarly, if we know that the yield of rice and rainfall are closely related, we may find out the amount of
rain required to achieve a certain production figure.
In general, regression analysis means the estimation or prediction of the unknown value of one variable
from the known value of the other variable. It is one of the most important statistical tools and is
extensively used in almost all sciences: natural, social and physical. It is especially used in business and
economics to study the relationship between two or more variables that are related causally, and for the
estimation of demand and supply graphs, cost functions, production and consumption functions and so
on.
Prediction or estimation is one of the major problems in almost all spheres of human activity. The
estimation or prediction of future production, consumption, prices, investments, sales, profits, income,
etc. is of very great importance to business professionals. Similarly, population estimates and
population projections, GNP, revenue and expenditure, etc. are indispensable for economists and
efficient planning of an economy.
The dictionary meaning of 'regression' is returning or going back. The term 'regression' was first used by
Sir Francis Galton (1822-1911) in 1877 while studying the relationship between the heights of fathers and
sons. The term was introduced by him in the paper "Regression towards Mediocrity in Hereditary
Stature". Regression analysis was explained by M.M. Blair as follows:
"Regression analysis is a mathematical measure of the average relationship between two or more
variables in terms of the original units of the data".
Line of Regression
If the dots of the scatter diagram generally tend to cluster along a well-defined direction which
suggests a linear relationship between the variables x and y, the line of best fit for the given distribution of
dots is called the 'line of regression'.
There are two such lines, one giving the best possible mean values of y for each specified value of x and
the other giving the best possible mean values of x for given values of y. The former is called the line of
regression of y on x and the latter is called the line of regression of x on y.
First consider the line of regression of y on x.
Let the straight line satisfying the general trend of n dots in a scatter diagram be
$$y = a + bx \qquad \cdots (i)$$
We have to determine the constants a and b so that equation (i) gives, for each value of x, the best
estimate for the average value of y. By the principle of least squares, the normal equations for a and b are
$$\sum y = na + b\sum x \qquad \cdots (ii)$$
$$\sum xy = a\sum x + b\sum x^2 \qquad \cdots (iii)$$
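The notes state the normal equations without showing where they come from. A brief sketch of the usual argument follows, assuming that "best estimate" means minimizing the sum of squared vertical deviations of the dots from the line; the symbol $S$ is introduced here only for illustration.

```latex
\begin{aligned}
S(a, b) &= \sum (y - a - bx)^2 \\[4pt]
\frac{\partial S}{\partial a} = -2\sum (y - a - bx) = 0
  &\;\Longrightarrow\; \sum y = na + b\sum x \\[4pt]
\frac{\partial S}{\partial b} = -2\sum x\,(y - a - bx) = 0
  &\;\Longrightarrow\; \sum xy = a\sum x + b\sum x^2
\end{aligned}
```

Setting the two partial derivatives to zero gives exactly equations (ii) and (iii), with all sums running over the n observed pairs.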
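Equation (ii) gives
$$\frac{1}{n}\sum y = a + b \cdot \frac{1}{n}\sum x, \qquad \text{i.e.} \quad \bar{y} = a + b\bar{x}$$
This shows that $(\bar{x}, \bar{y})$, i.e. the point of means of x and y, lies on (i).
Shifting the origin to $(\bar{x}, \bar{y})$, equation (iii) takes the form
$$\sum (x - \bar{x})(y - \bar{y}) = a\sum (x - \bar{x}) + b\sum (x - \bar{x})^2$$
But $\sum (x - \bar{x}) = 0$
$$\therefore\ b = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sum (x - \bar{x})^2} = \frac{\sum XY}{\sum X^2} = \frac{\sum XY}{n\sigma_x^2} = r\frac{\sigma_y}{\sigma_x} \qquad \left(\because\ r = \frac{\sum XY}{n\sigma_x \sigma_y}\right)$$
Thus the line of best fit becomes
$$(y - \bar{y}) = r\frac{\sigma_y}{\sigma_x}(x - \bar{x})$$
which is the equation of the line of regression of y on x. Its slope is called the regression coefficient of y on x.
Interchanging x and y, the line of regression of x on y is
$$(x - \bar{x}) = r\frac{\sigma_x}{\sigma_y}(y - \bar{y})$$
Thus the regression coefficient of y on x $= r\dfrac{\sigma_y}{\sigma_x}$
and the regression coefficient of x on y $= r\dfrac{\sigma_x}{\sigma_y}$.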
Corollary:-
The correlation coefficient is the geometric mean between the two regression coefficients, since
$$r\frac{\sigma_y}{\sigma_x} \times r\frac{\sigma_x}{\sigma_y} = r^2.$$
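The corollary can be checked numerically with a small Python sketch. The function name is illustrative, and the five (x, y) pairs are borrowed from the worked regression example in these notes.

```python
import math

def regression_summary(xs, ys):
    """Return (b_yx, b_xy, r) computed from deviations about the means."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sum_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sum_x2 = sum((x - x_bar) ** 2 for x in xs)
    sum_y2 = sum((y - y_bar) ** 2 for y in ys)
    b_yx = sum_xy / sum_x2   # regression coefficient of y on x
    b_xy = sum_xy / sum_y2   # regression coefficient of x on y
    r = sum_xy / math.sqrt(sum_x2 * sum_y2)
    return b_yx, b_xy, r

xs = [6, 2, 10, 4, 8]
ys = [9, 11, 5, 8, 7]
b_yx, b_xy, r = regression_summary(xs, ys)

# Geometric mean of the two coefficients, carrying their common sign.
r_from_coefficients = math.copysign(math.sqrt(b_yx * b_xy), b_yx)
print(b_yx, b_xy, r, r_from_coefficients)
```

The square root alone gives only the magnitude of r; its sign is the common sign of the two regression coefficients, which is why the sketch uses `math.copysign`.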
Example:-
From the following data obtain the two regression equations, and also calculate the regression equations taking
deviations of items from the means of the x and y series.
x    6    2    10   4    8
y    9    11   5    8    7
Solution:-
OBTAINING REGRESSION EQUATIONS
$x$     $y$     $xy$    $x^2$   $y^2$
6      9      54     36     81
2      11     22     4      121
10     5      50     100    25
4      8      32     16     64
8      7      56     64     49
$\sum x = 30$, $\sum y = 40$, $\sum xy = 214$, $\sum x^2 = 220$, $\sum y^2 = 340$
Regression equation of y on x: $y = a + bx$
$$\sum y = na + b\sum x$$
$$\sum xy = a\sum x + b\sum x^2$$
Substituting the values,
$$40 = 5a + 30b \qquad \cdots (i)$$
$$214 = 30a + 220b \qquad \cdots (ii)$$
Multiplying equation (i) by 6,
$$240 = 30a + 180b \qquad \cdots (iii)$$
$$214 = 30a + 220b \qquad \cdots (iv)$$
Subtracting equation (iv) from (iii): $-40b = 26$ or $b = -0.65$
Substituting the value of b in equation (i):
$$40 = 5a + 30(-0.65) \quad \text{or} \quad 5a = 40 + 19.5 = 59.5 \quad \text{or} \quad a = 11.9$$
Putting the values of a and b in the equation, the regression of y on x is $y = 11.9 - 0.65x$.
Regression equation of x on y: $x = a + by$
$$\sum x = na + b\sum y$$
$$\sum xy = a\sum y + b\sum y^2$$
Substituting the values,
$$30 = 5a + 40b \qquad \cdots (i)$$
$$214 = 40a + 340b \qquad \cdots (ii)$$
Multiplying equation (i) by 8,
$$240 = 40a + 320b \qquad \cdots (iii)$$
$$214 = 40a + 340b \qquad \cdots (iv)$$
Subtracting equation (iv) from (iii): $-20b = 26$ or $b = -1.3$
Substituting the value of b in equation (i):
$$30 = 5a + 40(-1.3) \quad \text{or} \quad 5a = 30 + 52 = 82 \quad \text{or} \quad a = 16.4$$
Putting the values of a and b in the equation, the regression line of x on y is $x = 16.4 - 1.3y$.
CALCULATION OF REGRESSION EQUATIONS
$x$     $X = x - \bar{x}$   $X^2$   $y$     $Y = y - \bar{y}$   $Y^2$   $XY$
6       0                  0      9      +1                 1      0
2      -4                 16      11     +3                 9     -12
10     +4                 16      5      -3                 9     -12
4      -2                  4      8       0                 0      0
8      +2                  4      7      -1                 1     -2
$\sum x = 30$, $\sum X = 0$, $\sum X^2 = 40$, $\sum y = 40$, $\sum Y = 0$, $\sum Y^2 = 20$, $\sum XY = -26$
$$\bar{x} = \frac{30}{5} = 6; \qquad \bar{y} = \frac{40}{5} = 8$$
The line of regression of x on y is
$$(x - \bar{x}) = r\frac{\sigma_x}{\sigma_y}(y - \bar{y})$$
$$r\frac{\sigma_x}{\sigma_y} = \frac{\sum XY}{\sum Y^2} = \frac{-26}{20} = -1.3$$
$$x - 6 = -1.3(y - 8) = -1.3y + 10.4$$
$$x = -1.3y + 10.4 + 6 = 16.4 - 1.3y$$
The line of regression of y on x is
$$(y - \bar{y}) = r\frac{\sigma_y}{\sigma_x}(x - \bar{x})$$
$$r\frac{\sigma_y}{\sigma_x} = \frac{\sum XY}{\sum X^2} = \frac{-26}{40} = -0.65$$
$$y - 8 = -0.65(x - 6) = -0.65x + 3.9$$
$$y = -0.65x + 3.9 + 8 = 11.9 - 0.65x$$
Thus we find the same answer as obtained earlier. However, the calculations are very much
simplified without the use of the normal equations.
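The agreement between the two routes can be confirmed with a minimal Python sketch. The function names are illustrative; each function fits a line v = a + b·u, so swapping the arguments gives the regression of x on y.

```python
def fit_by_normal_equations(us, vs):
    """Solve sum(v) = n*a + b*sum(u) and sum(uv) = a*sum(u) + b*sum(u^2) for (a, b)."""
    n = len(us)
    su, sv = sum(us), sum(vs)
    suv = sum(u * v for u, v in zip(us, vs))
    suu = sum(u * u for u in us)
    b = (n * suv - su * sv) / (n * suu - su ** 2)  # eliminate a between the two equations
    a = (sv - b * su) / n                          # back-substitute into the first equation
    return a, b

def fit_by_deviations(us, vs):
    """Slope = sum(UV) / sum(U^2); the line passes through the point of means."""
    n = len(us)
    u_bar, v_bar = sum(us) / n, sum(vs) / n
    sum_uv = sum((u - u_bar) * (v - v_bar) for u, v in zip(us, vs))
    sum_uu = sum((u - u_bar) ** 2 for u in us)
    b = sum_uv / sum_uu
    a = v_bar - b * u_bar
    return a, b

x = [6, 2, 10, 4, 8]
y = [9, 11, 5, 8, 7]

a_yx, b_yx = fit_by_normal_equations(x, y)   # y on x: y = a + b*x
a_xy, b_xy = fit_by_normal_equations(y, x)   # x on y: x = a + b*y
a_yx_dev, b_yx_dev = fit_by_deviations(x, y)
a_xy_dev, b_xy_dev = fit_by_deviations(y, x)

print(f"y on x: y = {a_yx:.2f} + ({b_yx:.2f})x")
print(f"x on y: x = {a_xy:.2f} + ({b_xy:.2f})y")
```

Note that the two regression lines are different lines: the x-on-y line is not obtained by solving the y-on-x equation for x, unless r = ±1.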
Experiment:-
An experiment is a treatment on a group of objects or subjects in the interest of observing the response.
Treatment:-
In experiments, a treatment is something that researchers administer to experimental units.
For example, a corn field is divided into four, and each part is 'treated' with a different fertilizer to see which
produces the most corn; a teacher practices different teaching methods on different groups in her class
to see which yields the best results; a doctor treats a patient with a skin condition with different creams
to see which is most effective. Treatments are administered to experimental units by 'level', where level
implies amount or magnitude. For example, if the experimental units were given 5mg, 10mg, 15mg of a
medication, those amounts would be three levels of the treatment.
(Definition taken from Valerie J. Easton and John H. McColl's Statistics Glossary v1.1)
Factor:-
A factor of an experiment is a controlled independent variable; a variable whose levels are set by the
experimenter.
A factor is a general type or category of treatments. Different treatments constitute different levels of a
factor.
For example, three different groups of runners are subjected to different training methods. The runners
are the experimental units and the training methods are the treatments, where the three types of training
methods constitute three levels of the factor 'type of training'.
(Definition taken from Valerie J. Easton and John H. McColl's Statistics Glossary v1.1)
Experimental Design
We are concerned with the analysis of data generated from an experiment. It takes time to organize the
experiment properly to ensure that the right type of data, and enough of it, is available to answer the
questions of interest as clearly and efficiently as possible. This process is called experimental design.
There are six concepts of experimental design:
(i) Independent Variable
(ii) Dependent Variable
(iii) Constant
(iv) Control group
(v) Experimental Group
(vi) Repeated trials
Variable:- A variable is anything that changes during the experiment.
Independent Variable:- The independent variable is the one that is changed on purpose by the experimenter. It is also
known as the cause, stimulus, reason or manipulated variable. It is the "if" part of the hypothesis.
Dependent Variable:- The variable that responds to the independent variable is called the dependent
variable. It is also known as the effect, result or responding variable. It is the "then" part of the hypothesis.
Constant:- All factors which are not allowed to change during the experiment are called constants.
Control Group:- The control group is the group or the standard to which everything is compared.
Experimental Group:- The experimental group is the group which is tested with the independent
variable. Each test group has only one factor different from the other groups, namely the independent
variable.
Repeated trials:- Repeated trials is the number of times the experiment is repeated. The more times we
repeat the experiment, the more valid the result we will get.
The IVCDV (Independent Variable, Constant, Dependent Variable) chart is used to design the experiment.
IV                 Constant             DV
Fertilizer:        Amounts of water     Plant growth
  0 drop           Types of soil
  2 drop           Amount of soil
  4 drop           Type of plant
  6 drop           Type of planter
                   Size of planter
                   Type of light
                   Location
The variable is what changes during the experiment. Here the number of drops of fertilizer (0, 2, 4 or 6) is
varied by the experimenter, so it is the independent variable. Plant growth depends on the drops of
fertilizer, so it is the dependent variable. The others are constants.
Amounts of water, types of soil, amount of soil, type of plant, type of planter, size of planter, type of
light and location are constants.
If we want to test the soil instead of the fertilizer, then fertilizer becomes a constant and type of soil
becomes the independent variable.
The plant growth that we can observe here is called
(i) the result (of adding fertilizer)
(ii) the response (of adding fertilizer)
(iii) the effect (of adding fertilizer)
Completely Randomized Designs:-
Completely randomized designs are the simplest in which the treatments are assigned to the
experimental units completely at random. This allows every experimental unit, i.e., plot, animal,
soil sample, etc., to have an equal probability of receiving a treatment.
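A completely randomized assignment can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name, the plot labels, the fixed seed and the choice of equal replication (the same number of units per treatment) are all hypothetical and not taken from the notes.

```python
import random
from collections import Counter

def completely_randomized_design(units, treatments, seed=None):
    """Assign treatments to units completely at random, with equal replication."""
    if len(units) % len(treatments) != 0:
        raise ValueError("number of units must be a multiple of the number of treatments")
    replications = len(units) // len(treatments)
    # One label per unit: each treatment repeated the same number of times.
    labels = [t for t in treatments for _ in range(replications)]
    rng = random.Random(seed)   # seeded only so the layout can be reproduced
    rng.shuffle(labels)         # every ordering of the labels is equally likely
    return dict(zip(units, labels))

# Hypothetical experiment: 12 plots, 4 fertilizer levels, 3 plots per level.
plots = [f"plot_{i}" for i in range(1, 13)]
fertilizer_levels = ["0 drop", "2 drop", "4 drop", "6 drop"]

layout = completely_randomized_design(plots, fertilizer_levels, seed=42)
counts = Counter(layout.values())
for plot, level in layout.items():
    print(plot, "->", level)
```

Shuffling the full list of labels, rather than drawing a treatment independently for each unit, keeps the replication balanced while still giving every unit the same chance of receiving any given treatment.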
REFERENCES
1. Statistical Method by S.C. Gupta
2. en.wikipedia.org/wiki/Statistics
3. www.mathsisfun.com/data/probability.html
4. www.stats.gla.ac.uk/steps/glossary/sampling.html
5. A First Course in Statistics with Application by A K P C Swain
6. A Text Book of Agricultural Statistics by R. Rangaswamy
7. Fundamentals of Statistics, Vol. I and II by A.M. Goon, M.K. Gupta and B. Dasgupta
8. https://www.youtube.com/watch?feature=player_detailpage&v=UN206cSaF0k#t=7
9. Statistics Glossary v1.1