Applied Techniques for Economists
Josphat Omanga.
Copenhagen Business College.
Quantitative Methods, 4th edition, Swift & Piff
Recap
Describing Data Toolkit
Displays for numerical data:
• Frequency charts
• Relative frequency
• Histogram
Displays for categorical data:
• Bar charts
• Pie charts
• Contingency tables
Summarising Data Toolkit
Quantitative (numerical) data, measures of location and spread:
• Mean
• Median
• Mode
• Range
• Variance
• Standard deviation
• Quartiles
• Index
Main Objectives
Scatter Plot
Measures of Association
a) Covariance
b) Correlation
Contingency Table & Dependency/Independency
Why analyse two (or more) variables together?
The main aim is to examine whether there is any pattern between the responses of two or more variables:
- To understand the relationship better;
- To influence one variable by changing the other;
- To forecast.
The tools depend on the type of variable:
• Quantitative (numerical): scatter plot, covariance, correlation coefficient
• Qualitative (categorical): contingency table and dependency
Scatter plot
As briefly mentioned before in Data and Graphing Data, a scatter plot can
show the relationship between two variables.
Figure: returns of two shares in 9 consecutive months.
Scatter plot examples
Scatter plot
The owner of an ice cream store wants to examine the relationship between daily sales and air temperature.
A sample of 25 consecutive days is selected, and ice cream consumption per head (in pints) and average temperature (in degrees Fahrenheit) are recorded for each day.
Scatter plot
• Does a relationship between daily sales and temperature exist?
• What kind of relationship exists?
• How can this relationship be measured (if it exists)?
The shape of the “cloud” of data points indicates a positive relationship.
How can this relationship be measured more precisely?
Measures of Association
Covariance
• Covariance is a measure of the linear relationship between two variables.
• A positive value indicates a direct or increasing linear relationship, while a
negative value indicates a decreasing linear relationship.
The sample covariance is calculated as:
$$\mathrm{Cov}(x, y) = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{n - 1}$$
Note that this is the formula for the sample covariance; for the population covariance, the sum is divided by n instead of n − 1.
Measures of Association
Covariance
Example: data with n = 5 observations
(smoking years) x: 0 5 10 15 20
(lung capacity) y: 45 42 33 31 29
$$\bar{x} = \frac{0 + 5 + 10 + 15 + 20}{5} = 10, \qquad \bar{y} = \frac{45 + 42 + 33 + 31 + 29}{5} = 36$$
$$\begin{aligned}
\mathrm{Cov}(x, y) &= \frac{(0 - 10)(45 - 36) + (5 - 10)(42 - 36) + (10 - 10)(33 - 36) + (15 - 10)(31 - 36) + (20 - 10)(29 - 36)}{n - 1} \\
&= \frac{(-10)(9) + (-5)(6) + (0)(-3) + (5)(-5) + (10)(-7)}{5 - 1} \\
&= \frac{-90 - 30 + 0 - 25 - 70}{4} = \frac{-215}{4} = -53.75
\end{aligned}$$
The negative covariance, Cov(x, y) = −53.75, indicates a decreasing linear relationship: lung capacity tends to fall as smoking years rise.
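The covariance calculation for the smoking example can be sketched in a few lines of Python. This is a minimal illustration, not part of the original slides: the function name `covariance` and its `sample` flag are assumptions chosen here to show the n − 1 (sample) and n (population) divisors side by side.

```python
def covariance(x, y, sample=True):
    """Covariance of two equal-length sequences.

    sample=True divides by n - 1 (sample covariance);
    sample=False divides by n (population covariance).
    """
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    n = len(x)
    if n < 2:
        raise ValueError("at least two observations are required")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of the products of deviations from the two means
    sum_products = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sum_products / (n - 1 if sample else n)


smoking_years = [0, 5, 10, 15, 20]
lung_capacity = [45, 42, 33, 31, 29]

sample_cov = covariance(smoking_years, lung_capacity)
population_cov = covariance(smoking_years, lung_capacity, sample=False)
print(sample_cov)      # -53.75
print(population_cov)  # -43.0
```

The population value is smaller in magnitude only because the same sum of products, −215, is divided by 5 rather than 4.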
Practice question 1:
The table below shows sickness records:
Person   Age   Days off work last year
A        19    21
B        21    18
C        32    15
D        27    17
E        45    8
(a) Calculate the covariance of the two variables in the table above.
- Compute it manually using the formula
- Compute it in Excel using a spreadsheet:
  - Step by step
  - Using an Excel formula
Measures of Association
Covariance Properties
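Note that cov(x, y) = 0 indicates that there is no linear relationship between the two variables. It does not by itself show that they are independent, because a non-linear relationship can also give a covariance of zero. Independent variables do have zero covariance, so a positive or negative covariance indicates that the variables are dependent.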
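A small numerical check can make the covariance property concrete. The sketch below uses made-up data (not from the slides) in which y is completely determined by x through y = x², yet the sample covariance is zero because the relationship is not linear. The helper name `sample_covariance` is an assumption for illustration.

```python
def sample_covariance(x, y):
    """Sample covariance (divisor n - 1) of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    return sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / (n - 1)


# Illustrative data: x is symmetric around 0 and y depends on x exactly
x_values = [-2, -1, 0, 1, 2]
y_values = [xi ** 2 for xi in x_values]  # [4, 1, 0, 1, 4]

cov_xy = sample_covariance(x_values, y_values)
print(cov_xy)  # 0.0
```

The positive and negative products of deviations cancel exactly, so a zero covariance here says nothing about whether y depends on x.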
Measures of Association
Correlation Coefficient
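Covariance is a useful index of whether two variables are linearly related and in which direction.
However, it does not indicate how strong the relationship is.
The correlation coefficient is a normalized index of the association.
The correlation coefficient is calculated as:
$$r = \frac{\mathrm{Cov}(x, y)}{s_x \, s_y}$$
where $s_x$ and $s_y$ are the sample standard deviations of x and y, which is the same as
$$r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2 \, \sum_{i=1}^{n} (y_i - \bar{y})^2}}$$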
Measures of Association
Correlation Coefficient VS Covariance
The Correlation Coefficient has advantages over covariance
for determining strengths of relationships:
• Covariance can take any value and carries the units of the two variables, while the correlation coefficient is unit-free and always lies between -1 and 1.
• Correlation is therefore comparable across data sets and is useful for determining how strong the relationship is.
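The unit dependence of covariance can be illustrated with the smoking example by expressing smoking time in months instead of years. This is a sketch under assumptions: the rescaled variable `smoking_months` and the helper names are introduced here for illustration and do not appear in the slides.

```python
import math


def sample_covariance(x, y):
    """Sample covariance (divisor n - 1) of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    return sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / (n - 1)


def correlation(x, y):
    """Correlation coefficient as covariance divided by both standard deviations."""
    s_x = math.sqrt(sample_covariance(x, x))  # covariance of x with itself is its variance
    s_y = math.sqrt(sample_covariance(y, y))
    return sample_covariance(x, y) / (s_x * s_y)


smoking_years = [0, 5, 10, 15, 20]
smoking_months = [12 * years for years in smoking_years]  # same data, different unit
lung_capacity = [45, 42, 33, 31, 29]

cov_years = sample_covariance(smoking_years, lung_capacity)
cov_months = sample_covariance(smoking_months, lung_capacity)
r_years = correlation(smoking_years, lung_capacity)
r_months = correlation(smoking_months, lung_capacity)

print(cov_years, cov_months)  # -53.75 -645.0
print(math.isclose(r_years, r_months))  # True
```

Changing the unit multiplies the covariance by 12, while the correlation coefficient stays the same, which is why only the latter can be compared across data sets.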
Measures of Association
Correlation Coefficient
Example: data with n = 5 observations
(smoking years) x: 0 5 10 15 20
(lung capacity) y: 45 42 33 31 29
$$\bar{x} = \frac{0 + 5 + 10 + 15 + 20}{5} = 10, \qquad \bar{y} = \frac{45 + 42 + 33 + 31 + 29}{5} = 36$$
$$\begin{aligned}
r &= \frac{(0 - 10)(45 - 36) + (5 - 10)(42 - 36) + (10 - 10)(33 - 36) + (15 - 10)(31 - 36) + (20 - 10)(29 - 36)}{\sqrt{\left[(0 - 10)^2 + (5 - 10)^2 + (10 - 10)^2 + (15 - 10)^2 + (20 - 10)^2\right] \left[(45 - 36)^2 + (42 - 36)^2 + (33 - 36)^2 + (31 - 36)^2 + (29 - 36)^2\right]}} \\
&= \frac{(-10)(9) + (-5)(6) + (0)(-3) + (5)(-5) + (10)(-7)}{\sqrt{(100 + 25 + 0 + 25 + 100)(81 + 36 + 9 + 25 + 49)}} \\
&= \frac{-90 - 30 + 0 - 25 - 70}{\sqrt{250 \times 200}} = \frac{-215}{\sqrt{50000}} \approx -0.9615
\end{aligned}$$
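The worked correlation example can be reproduced with a short Python sketch. It computes r in two ways, from the sums of squared deviations and from the covariance divided by the two sample standard deviations, to show that both routes agree. All variable names are assumptions made for this illustration.

```python
import math

smoking_years = [0, 5, 10, 15, 20]
lung_capacity = [45, 42, 33, 31, 29]

n = len(smoking_years)
mean_x = sum(smoking_years) / n  # 10.0
mean_y = sum(lung_capacity) / n  # 36.0

# Sums of products and of squared deviations from the means
sum_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(smoking_years, lung_capacity))
sum_xx = sum((x - mean_x) ** 2 for x in smoking_years)
sum_yy = sum((y - mean_y) ** 2 for y in lung_capacity)

# Route 1: sum of products over the square root of the product of the two sums
r_from_sums = sum_xy / math.sqrt(sum_xx * sum_yy)

# Route 2: sample covariance over the product of the sample standard deviations
cov_xy = sum_xy / (n - 1)
s_x = math.sqrt(sum_xx / (n - 1))
s_y = math.sqrt(sum_yy / (n - 1))
r_from_cov = cov_xy / (s_x * s_y)

print(sum_xy, sum_xx, sum_yy)  # -215.0 250.0 200.0
print(round(r_from_sums, 4))   # -0.9615
print(round(r_from_cov, 4))    # -0.9615
```

The n − 1 divisors cancel between numerator and denominator, which is why the two routes give the same value.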
Measures of Association
Correlation Examples
• The closer to -1, the stronger the negative linear relationship
• The closer to 1, the stronger the positive linear relationship
• The closer to 0, the weaker the linear relationship
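The three cases can be illustrated with small made-up data sets (not taken from the slides): one exactly increasing line, one exactly decreasing line, and one with no linear trend. The helper `correlation` is an assumed name for this sketch.

```python
import math


def correlation(x, y):
    """Correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sum_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sum_xx = sum((a - mean_x) ** 2 for a in x)
    sum_yy = sum((b - mean_y) ** 2 for b in y)
    return sum_xy / math.sqrt(sum_xx * sum_yy)


x = [1, 2, 3, 4, 5]
r_positive = correlation(x, [5, 7, 9, 11, 13])  # y = 2x + 3, exact increasing line
r_negative = correlation(x, [9, 7, 5, 3, 1])    # y = 11 - 2x, exact decreasing line
r_none = correlation(x, [2, 1, 4, 1, 2])        # values rise and fall symmetrically

print(r_positive, r_negative, r_none)
```

Real data rarely reach the extremes of exactly 1 or -1; the smoking example, at about -0.96, is close to the negative end.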
Practice question 1 (cont.):
The table below shows sickness records:
Person   Age   Days off work last year
A        19    21
B        21    18
C        32    15
D        27    17
E        45    8
(a) Calculate the correlation of the two variables in the table above.
- Compute it manually using the formula
- Compute it in Excel using a spreadsheet:
  - Step by step
  - Using an Excel formula
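One way to check a manual or spreadsheet answer is to mirror the step-by-step layout in Python, with one column per intermediate quantity. This is only a sketch: the column layout and variable names are assumptions, and in Excel the built-in functions COVARIANCE.S and CORREL play the role of the final two lines.

```python
import math

people = ["A", "B", "C", "D", "E"]
age = [19, 21, 32, 27, 45]
days_off = [21, 18, 15, 17, 8]

n = len(age)
mean_age = sum(age) / n
mean_days = sum(days_off) / n

# One row per person: deviations, their product, and their squares
rows = []
for person, a, d in zip(people, age, days_off):
    dev_a = a - mean_age
    dev_d = d - mean_days
    rows.append((person, dev_a, dev_d, dev_a * dev_d, dev_a ** 2, dev_d ** 2))

# Column totals, as in the bottom row of a spreadsheet
sum_products = sum(row[3] for row in rows)
sum_sq_age = sum(row[4] for row in rows)
sum_sq_days = sum(row[5] for row in rows)

cov_age_days = sum_products / (n - 1)                        # sample covariance
r_age_days = sum_products / math.sqrt(sum_sq_age * sum_sq_days)  # correlation

for row in rows:
    print("{} {:7.2f} {:7.2f} {:9.2f} {:9.2f} {:9.2f}".format(*row))
print(round(cov_age_days, 2), round(r_age_days, 4))
```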
Contingency Table & Dependency/Independency
The contingency table can be used to study the relationships that may exist between
two qualitative variables.
This is the contingency table that was briefly introduced in Data and Graphing Data.
Contingency Table & Dependency/Independency
The case of Titanic
Many well-known facts are reflected in the survival rates for the various classes of passenger. The British Board of Trade originally collected the passenger data in its investigation of the sinking, which makes it possible to check, for example, whether the “women and children first” or the “first-class passengers first” policies were fully followed during the rescue operations.
Here we will look into the “first-class passengers first” policy.
Contingency Table & Dependency/Independency
The case of Titanic
The data contain two qualitative variables: the class in which each passenger was travelling and whether or not the passenger survived the disaster. (Note that, unfortunately, the primary sources do not agree completely on the exact numbers on board, rescued, or lost.)
Figure: part of the long list of survivors.
Contingency Table & Dependency/Independency
The case of Titanic: univariate description by table and graphs
Contingency Table & Dependency/Independency
The case of Titanic: contingency table, by frequency and by relative frequency
Contingency Table & Dependency/Independency
The case of Titanic: contingency table and conditional relative distributions
Three conditional relative distributions of “survived (X)” given “class (Y)”, normally written as X|Y
Two conditional relative distributions of “class (Y)” given “survived (X)”, normally written as Y|X
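The conditional relative distributions can be sketched in Python from a table of counts. The counts below are hypothetical numbers invented purely for illustration; they are not the Titanic figures, which are not reproduced in this text. The final check relies on a standard property assumed here: if the two variables were independent, every distribution of X given a class would equal the marginal distribution of X.

```python
# HYPOTHETICAL counts, invented for illustration only (not the Titanic data)
counts = {
    "First": {"survived": 60, "died": 40},
    "Second": {"survived": 40, "died": 60},
    "Third": {"survived": 25, "died": 75},
}
outcomes = ["survived", "died"]
classes = list(counts)

n = sum(sum(row.values()) for row in counts.values())

# Marginal relative distribution of X (survival), ignoring class
marginal_x = {o: sum(counts[c][o] for c in classes) / n for o in outcomes}

# X|Y: one distribution of survival per class (three in total)
x_given_y = {
    c: {o: counts[c][o] / sum(counts[c].values()) for o in outcomes}
    for c in classes
}

# Y|X: one distribution of class per survival outcome (two in total)
y_given_x = {
    o: {c: counts[c][o] / sum(counts[k][o] for k in classes) for c in classes}
    for o in outcomes
}

# Under independence, each X|Y distribution would match the marginal of X
looks_independent = all(
    abs(x_given_y[c][o] - marginal_x[o]) < 1e-9
    for c in classes
    for o in outcomes
)

print(x_given_y)
print(y_given_x)
print(marginal_x, looks_independent)
```

In this made-up table the survival share differs by class (0.60, 0.40, 0.25 against an overall share of about 0.42), so the two variables appear dependent.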
Contingency Table & Dependency/Independency
Why not just the frequencies?
Assume that
The Maths and Notations – Boring but Necessary
Make sure that you can:
• Calculate the covariance and the correlation coefficient;
• Understand their differences and properties;
• Conduct a comprehensive association analysis of quantitative variables using a scatter plot, the covariance, and the correlation coefficient, with corresponding interpretations;
• Understand the use of contingency tables in dependency/independency analysis for qualitative variables.
