Data Science
Instructor: Maham Naeem
September 11, 2024
National University of Computer and Emerging Sciences,
Lahore
Covariance and Correlation
Lecture No. 3c
Covariance
Covariance measures the direction of the relationship
between two variables.
A positive covariance means that both variables tend to be
high or low at the same time.
A negative covariance means that when one variable is
high, the other tends to be low
Covariance vs Correlation Coefficient
Covariance measures the direction of a
relationship between two variables
Correlation measures the strength of that
relationship.
Both correlation and covariance are positive when
the variables move in the same direction, and
negative when they move in opposite directions.
However, a correlation coefficient must always be
between −1 and +1, with the extreme values
indicating a strong relationship.
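One way to see the difference is to change the units of one variable. The sketch below uses hypothetical height and weight values (not from the lecture) and population formulas that divide by n; the helper names are illustrative only.

```python
from math import sqrt


def mean(values):
    return sum(values) / len(values)


def covariance(xs, ys):
    """Population covariance: average product of deviations from the means."""
    mean_x, mean_y = mean(xs), mean(ys)
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / len(xs)


def correlation(xs, ys):
    """Covariance divided by the product of the two standard deviations."""
    return covariance(xs, ys) / sqrt(covariance(xs, xs) * covariance(ys, ys))


# Hypothetical data: heights in metres and weights in kilograms.
heights_m = [1.50, 1.60, 1.70, 1.80, 1.90]
weights_kg = [50.0, 58.0, 69.0, 77.0, 90.0]

# The same heights expressed in centimetres.
heights_cm = [h * 100 for h in heights_m]

cov_m = covariance(heights_m, weights_kg)
cov_cm = covariance(heights_cm, weights_kg)
r_m = correlation(heights_m, weights_kg)
r_cm = correlation(heights_cm, weights_kg)

print(cov_m, cov_cm)  # the covariance grows by a factor of 100
print(r_m, r_cm)      # the correlation is unchanged and stays within [-1, 1]
```

Because the covariance depends on the measurement units, its magnitude alone says little about how strong the relationship is; the correlation removes that dependence.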
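Computing Covariance
$$\mathrm{Cov}(A, B) = \frac{\sum_{i=1}^{n} (a_i - \bar{A})(b_i - \bar{B})}{n}$$
where $\bar{A}$ is the average of variable A and $\bar{B}$ is the average of variable B.
Equivalently:
$$\mathrm{Cov}(A, B) = \frac{\sum_{i=1}^{n} a_i b_i}{n} - \bar{A}\,\bar{B}$$
Recall from Lecture 4:
$$\text{Variance } \sigma^2 = \frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})^2$$
$$\text{Variance } \sigma^2 = \frac{1}{n}\sum_{i=1}^{n} x_i^2 - \bar{x}^2$$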
Note: Variance is a special case of covariance in which the two attributes are identical (the covariance of a variable with
itself).
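The two covariance formulas on this slide are equivalent. A short derivation, writing a_i and b_i for the i-th values of A and B and using bars for the averages (this notation is an assumption made for the sketch):

```latex
\begin{aligned}
\frac{1}{n}\sum_{i=1}^{n}(a_i-\bar{A})(b_i-\bar{B})
  &= \frac{1}{n}\sum_{i=1}^{n} a_i b_i
     - \bar{A}\cdot\frac{1}{n}\sum_{i=1}^{n} b_i
     - \bar{B}\cdot\frac{1}{n}\sum_{i=1}^{n} a_i
     + \bar{A}\,\bar{B} \\
  &= \frac{1}{n}\sum_{i=1}^{n} a_i b_i
     - \bar{A}\,\bar{B} - \bar{A}\,\bar{B} + \bar{A}\,\bar{B} \\
  &= \frac{1}{n}\sum_{i=1}^{n} a_i b_i - \bar{A}\,\bar{B}
\end{aligned}
```

Setting B = A in the last line gives the shortcut form of the variance, which is why variance is the covariance of a variable with itself.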
Computing Covariance
$$\mathrm{Cov}(A, B) = \frac{\sum_{i=1}^{n} a_i b_i}{n} - \bar{A}\,\bar{B}$$
$$\bar{A} = \frac{6 + 5 + 4 + 3 + 2}{5} = \frac{20}{5} = 4$$
$$\bar{B} = \frac{20 + 10 + 14 + 5 + 5}{5} = \frac{54}{5} = 10.80$$
$$\mathrm{Cov}(A, B) = \frac{6 \times 20 + 5 \times 10 + 4 \times 14 + 3 \times 5 + 2 \times 5}{5} - 4 \times 10.80$$
$$\mathrm{Cov}(A, B) = 50.2 - 43.2$$
$$\mathrm{Cov}(A, B) = 7$$
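The arithmetic above can be checked with a few lines of Python. This is a minimal sketch that computes the population covariance (dividing by n) two ways; the names `a`, `b`, and `population_covariance` are chosen here for illustration.

```python
def population_covariance(xs, ys):
    """Return the covariance computed by definition and by the shortcut formula."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Definition: average product of deviations from the means.
    by_definition = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n
    # Shortcut: average of the products minus the product of the averages.
    shortcut = sum(x * y for x, y in zip(xs, ys)) / n - mean_x * mean_y
    return by_definition, shortcut


# The five value pairs used in the worked example.
a = [6, 5, 4, 3, 2]
b = [20, 10, 14, 5, 5]

by_definition, shortcut = population_covariance(a, b)
print(round(by_definition, 6), round(shortcut, 6))  # 7.0 7.0
```

Library routines such as NumPy's `cov` divide by n − 1 by default (the sample covariance), which would give 35 / 4 = 8.75 for the same data.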
What if the data is
nominal?
Correlation
Correlation is a statistical term describing the degree to
which two variables move in coordination with one
another.
If the two variables move in the same direction, then those
variables are said to have a positive correlation.
If they move in opposite directions, then they have a negative
correlation.
The strength of the correlation is determined by the
correlation coefficient, which varies between −1 and +1.
What if correlation is 0?
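A correlation of 0 means there is no linear relationship, which is not the same as no relationship at all. The sketch below uses a hypothetical data set in which Y is fully determined by X (Y = X²) yet the Pearson correlation is zero; the function name is illustrative.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation from sums of deviations about the means."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Hypothetical data: X is symmetric around zero and Y = X squared.
xs = [-3, -2, -1, 0, 1, 2, 3]
ys = [x * x for x in xs]

r = pearson_r(xs, ys)
print(r)  # 0.0
```

The positive and negative halves of the curve cancel out, so the coefficient reports no linear trend even though Y can be predicted exactly from X.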
Correlation Between Two Variables
Weak relationships have
correlation values
close to 0…
Strong relationships have
correlation values
close to −1 or +1…
Correlation is +1 if a
straight line with positive
slope can be drawn
through all of the data
points
Correlation Between Two Variables
But … How to
Compute Correlation?
Computing Correlation
X      Y      XY     X²     Y²
1      2      2      1      4
2      4      8      4      16
3      7      21     9      49
4      9      36     16     81
5      12     60     25     144
6      14     84     36     196
Sums:
ΣX = 21, ΣY = 48, ΣXY = 211, ΣX² = 91, ΣY² = 490
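Computing Correlation
With n = 6, ΣX = 21, ΣY = 48, ΣXY = 211, ΣX² = 91, ΣY² = 490:
$$r = \frac{n \sum XY - \sum X \sum Y}{\sqrt{\left[n \sum X^2 - \left(\sum X\right)^2\right]\left[n \sum Y^2 - \left(\sum Y\right)^2\right]}}$$
$$r = \frac{1266 - 1008}{\sqrt{(546 - 441) \times (2940 - 2304)}} = \frac{258}{\sqrt{105 \times 636}} = \frac{258}{\sqrt{66780}} = 0.998$$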
More Details: https://www.investopedia.com/ask/answers/032515/what-does-it-mean-if-correlation-coefficient-positive-negative-or-zero.asp
This is “Pearson’s
Correlation
Coefficient”
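The same computation can be written as a short Python function. This is a minimal sketch of the sums-based formula applied to the six (X, Y) pairs from the table; the function name is an illustrative choice.

```python
from math import sqrt


def pearson_r_from_sums(xs, ys):
    """Pearson's r computed from the column sums of X, Y, XY, X^2 and Y^2."""
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator


xs = [1, 2, 3, 4, 5, 6]
ys = [2, 4, 7, 9, 12, 14]

r = pearson_r_from_sums(xs, ys)
print(round(r, 3))  # 0.998
```

The sums-based form needs only one pass over the data, which is why it suits hand calculation with a table of column totals.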
Correlation of Multiple Variables
IRIS Dataset: Again…
Heatmap
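A correlation heatmap is a colored rendering of the matrix of pairwise correlations between attributes. The sketch below builds such a matrix in plain Python and prints it as a text grid; the column names and values are hypothetical and are not taken from the Iris dataset.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation from sums of deviations about the means."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def correlation_matrix(columns):
    """Map every ordered pair of column names to its correlation."""
    names = list(columns)
    return {(a, b): pearson_r(columns[a], columns[b]) for a in names for b in names}


# Hypothetical measurements for three numeric attributes.
columns = {
    "length": [5.0, 5.5, 6.1, 6.8, 7.2],
    "width": [3.4, 3.1, 3.0, 2.9, 2.6],
    "weight": [1.2, 1.9, 2.4, 3.3, 3.8],
}

matrix = correlation_matrix(columns)
for a in columns:
    # One row per attribute; each cell is the correlation with another attribute.
    print(a.ljust(8), " ".join(f"{matrix[(a, b)]:+.2f}" for b in columns))
```

A plotting library would then map each cell to a color; the matrix is symmetric and has 1 on the diagonal, since every attribute is perfectly correlated with itself.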
Correlation Heatmaps: Complex Examples
We can also create heatmaps for Similarity and
Dissimilarity (Distance) between two
objects/records…
But … What is similarity and dissimilarity
between two objects and how to measure it?
Correlation Interpretation
More Details: https://statistics.laerd.com/statistical-guides/pearson-correlation-coefficient-statistical-guide.php
Correlation does NOT imply causality…
If A and B are correlated, this does not necessarily imply
that A causes B or that B causes A.
For example, in analyzing a demographic database, we may
find that attributes representing the number of hospitals and
the number of car thefts in a region are correlated.
This does not mean that one causes the other.
Both are actually causally linked to a third attribute, namely,
population.
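The hospital and car-theft example can be imitated with a small simulation. In this sketch the generating process is entirely hypothetical: population drives both attributes, neither attribute appears in the other's formula, and all coefficients and noise levels are assumptions chosen for illustration.

```python
import random
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation from sums of deviations about the means."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


rng = random.Random(0)  # fixed seed so the sketch is reproducible

# Hypothetical regions: population is the only shared cause.
population = [rng.uniform(10_000, 1_000_000) for _ in range(200)]
hospitals = [p / 50_000 + rng.gauss(0, 1) for p in population]
car_thefts = [p / 1_000 + rng.gauss(0, 50) for p in population]

r = pearson_r(hospitals, car_thefts)
print(round(r, 2))  # strongly positive, although neither attribute causes the other
```

The two attributes end up strongly correlated only because both scale with population, which is the third attribute in the slide's example.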
What Does Correlation Tell Us?
Some attributes are redundant
Redundancy here means that an attribute (such as
annual_revenue) may be redundant if it can be “derived”
from another attribute or set of attributes.
Inconsistencies in attribute (or dimension) naming can also
cause redundancies in the resulting dataset.
Some redundancies can be
detected by correlation
analysis.
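A minimal sketch of this kind of redundancy check is shown below. The table, the `monthly_revenue` and `employees` attributes, and the 0.95 cut-off are all hypothetical; here `annual_revenue` is deliberately derived as 12 times `monthly_revenue`.

```python
from itertools import combinations
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation from sums of deviations about the means."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Hypothetical attributes; annual_revenue equals 12 x monthly_revenue.
table = {
    "monthly_revenue": [10.0, 12.5, 9.0, 15.0, 11.0],
    "annual_revenue": [120.0, 150.0, 108.0, 180.0, 132.0],
    "employees": [12, 40, 8, 22, 35],
}

THRESHOLD = 0.95  # assumed cut-off for "nearly redundant"

# Flag every pair of attributes whose correlation is very strong in magnitude.
redundant_pairs = [
    (a, b)
    for a, b in combinations(table, 2)
    if abs(pearson_r(table[a], table[b])) >= THRESHOLD
]
print(redundant_pairs)  # [('monthly_revenue', 'annual_revenue')]
```

A flagged pair is only a candidate for removal: a high correlation shows that one attribute carries little extra linear information, not which of the two should be kept.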
Covariance vs Correlation
Covariance : Direction (any value)
Correlation: Direction and Strength (-1 to +1)
Correlation Analysis
Correlation Analysis:
Given two attributes, how strongly one attribute implies the other,
based on the available data
For Numerical Data:
Correlation coefficient (or Pearson’s Correlation Coefficient)
Covariance
For Nominal Data:
χ² (Chi-Square) test
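For nominal attributes, the chi-square statistic compares the observed counts in a contingency table with the counts expected if the two attributes were independent. The sketch below assumes a hypothetical 2 × 2 table of counts (not from the slides) and computes only the statistic, not the p-value.

```python
def chi_square_statistic(observed):
    """Sum of (observed - expected)^2 / expected over all cells of the table."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(observed):
        for j, count in enumerate(row):
            # Expected count under independence of the two attributes.
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (count - expected) ** 2 / expected
    return chi2


# Hypothetical counts: rows are categories of one nominal attribute,
# columns are categories of the other.
observed = [
    [250, 200],
    [50, 1000],
]

chi2 = chi_square_statistic(observed)
print(round(chi2, 2))  # 507.94
```

A 2 × 2 table has (2 − 1) × (2 − 1) = 1 degree of freedom. The statistic here is far above the critical value of 10.828 at the 0.001 significance level, so the hypothesis that the two attributes are independent would be rejected for these counts.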
Reading
Chapters 2 and 3 (Han and Kamber)
Activity
Calculate Correlation between these two variables X, Y
X Y
2 10
4 9
6 8
8 7
10 6
Activity
Calculate Covariance of the following two variables X, Y
X Y
2.1 8
2.5 10
3.6 12
4 14
