STAT 3615: BIOLOGICAL STATISTICS Hamdy F. F. Mahmoud, PhD
Collegiate Assistant Professor
Statistics Department @ VT
Chapter 3: Scatterplots and correlation
PART I: EXPLORING DATA
VARIABLES AND DISTRIBUTIONS
Chapter 1: Picturing distributions with graphs
Chapter 2: Describing distributions with numbers
RELATIONSHIPS
Chapter 3: Scatterplots and correlation
Chapter 4: Regression
Chapter 5: Two-way tables
CHAPTER 3 TOPICS
✓ Bivariate data – response and explanatory variables
✓ Scatterplots – two quantitative variables
✓ Interpreting scatterplots
✓ Adding categorical variables to scatterplots – three variables,
two quantitative and one categorical
✓ Measuring the association by the correlation coefficient, r
✓ Facts about correlation
In Chapters 1 and 2, we worked with only one variable at a time and summarized it
graphically and numerically. In this chapter, we examine the relationship between two
quantitative variables: we describe it graphically and measure the association
numerically.
For each individual studied, we record data on two
variables. We then examine whether there is a
relationship between these two variables.
❖ Do changes in one variable tend to be associated with
specific changes in the other variable?
Bivariate Data – response and explanatory variable
In the table below, we have two quantitative
variables recorded for each of 12 students:
• How many beers they drank.
• Their resulting blood alcohol content (BAC).

Number of Beers   Blood Alcohol Content
2                 0.03
7                 0.09
3                 0.07
4                 0.07
5                 0.08
8                 0.12
3                 0.04
5                 0.06
6                 0.10
7                 0.09
1                 0.01
4                 0.05
– A response variable measures an outcome of a study.
An explanatory variable may explain or influence
changes in a response variable.
– Examples:
• Number of beers (explanatory) and blood alcohol content (response)
• Age (explanatory) and an animal's weight (response)
• Amount of rain (explanatory) and corn yield (response)
Explanatory variable → (may affect) → Response variable
Bivariate Data – response and explanatory variables
In each of the following situations, is it more reasonable to
simply explore the association between the two variables or
to view one of the variables as an explanatory variable and
the other as a response variable?
a) The typical amount of calories a person consumes per
day and that person’s percentage of body fat.
b) The weight in kilograms and height in centimeters of a
person.
c) Inches of rain in the growing season and the yield of corn
in bushels per acre.
d) A person’s leg length and arm length, in centimeters.
Practice on response and explanatory variables
• A scatterplot shows the relationship between two
quantitative variables measured on the same individuals.
• The values of one variable (the explanatory variable) appear on the
horizontal axis, and the values of the other variable (the response
variable) appear on the vertical axis.
✓ Scatterplots – two quantitative variables
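As a rough illustration of how a scatterplot is built, the sketch below draws a text-only scatterplot of the 12-student beer data shown earlier, with the explanatory variable (beers) on the horizontal axis and the response (BAC) on the vertical axis. The function name `text_scatterplot` and the character-grid layout are assumptions made for this example; a plotting package would normally be used instead.

```python
# Data from the slides: number of beers and blood alcohol content for 12 students.
beers = [2, 7, 3, 4, 5, 8, 3, 5, 6, 7, 1, 4]
bac = [0.03, 0.09, 0.07, 0.07, 0.08, 0.12, 0.04, 0.06, 0.10, 0.09, 0.01, 0.05]

def text_scatterplot(x, y_hundredths):
    """Return a character grid: x on the horizontal axis, y (in 0.01 steps) on the vertical axis."""
    points = set(zip(x, y_hundredths))  # identical (x, y) pairs share one plotted mark
    columns = range(1, max(x) + 1)
    lines = []
    for level in range(max(y_hundredths), 0, -1):  # top row is the largest y value
        row = "".join(" *" if (col, level) in points else " ." for col in columns)
        lines.append(f"{level / 100:.2f} |{row}")
    lines.append("     +" + "--" * max(x))
    lines.append("      " + "".join(f"{col:2d}" for col in columns))
    return "\n".join(lines)

# Work in whole hundredths of BAC so grid rows can be matched exactly.
bac_hundredths = [round(value * 100) for value in bac]
plot = text_scatterplot(beers, bac_hundredths)
print(plot)
```

Two students share the pair (7 beers, 0.09 BAC), so the grid shows 11 marks for 12 individuals. A real scatterplot would typically jitter or enlarge such overlapping points.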
Example: An endangered species
Manatees are large, herbivorous,
aquatic, endangered mammals found
primarily in the rivers and estuaries of
Florida.
Research question:
• Do you think powerboats are responsible for manatee
deaths?
Powerboat registrations (in thousands) and manatee deaths from
powerboat collisions in Florida
Example: An endangered species (cont.)
✓ A study examined the relationship between the number of manatee
deaths from powerboat collisions and the number of powerboats
registered in Florida each year from 1977 to 2012.
Scatterplot of the number of manatee deaths due to powerboat
collisions in Florida each year against the number of powerboats
registered (in thousands) that same year. The dotted lines intersect at the
point (755, 54), the data for year 1997.
• Do you think powerboats are
responsible for manatee deaths?
Do you think there is an association between brain size
and brain performance as measured by IQ?
[Figure: Scatterplot of Performance IQ (vertical axis, 70 to 160) vs. Brain Size in pixels (horizontal axis, 800,000 to 1,100,000)]
Comments:
To interpret a scatterplot, look at:
✓ Overall pattern (form): linear or nonlinear
✓ Direction: if the form is linear, positive or negative
✓ Strength of the relationship, which is determined
by how closely the points in the scatterplot follow a
specific pattern or form
✓ Outliers: individual values that fall outside the
overall pattern of the relationship
✓ Interpreting the Scatterplots
The form (pattern) of the
relationship between two
quantitative variables is the overall
shape the points trace out, such as a straight line or a curve.
[Figure: Scatterplot of Mortality (vertical axis, 60 to 140) vs. Temperature (horizontal axis, 50 to 100)]
Positive association: High values
of one variable tend to occur
together with high values of the
other variable.
Negative association: High values
of one variable tend to occur
together with low values of the
other variable.
• If the relationship is linear, look at the direction
The strength of the relationship between 2 quantitative variables
refers to how much variation, or scatter, there is around the main
form.
An outlier is a data value
that has a very low
probability of occurrence
(i.e., it is unusual or
unexpected). In a scatterplot,
outliers are points that fall
outside of the overall
pattern of the relationship.
In many cases, a clear relationship appears not between the
two original variables but between transformed versions
of them. In other words, the relationship may be not between
x and y but between x and the square root of
y, or between log(x) and log(y), and so on.
✓ Transformed variables and association
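To see why a transformation can reveal a relationship, here is a minimal sketch using made-up data that follow a power law (the values and the exponent 0.75 are assumptions for illustration, not data from the slides). On the original scale the slope between neighbouring points keeps changing, so the pattern is curved. After taking base-10 logarithms of both variables the slope is constant, so the points fall on a straight line.

```python
import math

# Hypothetical data (not from the slides): y = 10 * x^0.75, a power-law curve.
x = [0.01, 0.1, 1.0, 10.0, 100.0, 1000.0]
y = [10.0 * value ** 0.75 for value in x]

def consecutive_slopes(xs, ys):
    """Slope of the segment joining each pair of neighbouring points."""
    return [(ys[i + 1] - ys[i]) / (xs[i + 1] - xs[i]) for i in range(len(xs) - 1)]

raw_slopes = consecutive_slopes(x, y)
log_slopes = consecutive_slopes([math.log10(value) for value in x],
                                [math.log10(value) for value in y])

print("slopes on the original scale:", [round(s, 3) for s in raw_slopes])
print("slopes on the log-log scale: ", [round(s, 3) for s in log_slopes])
```

The constant log-log slope equals the exponent of the assumed power law. Real data such as the animal brain and body weights will scatter around a line rather than fall exactly on it.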
Is there a relationship between brain size and body
weight among animals?
To answer this research
question, a researcher collected
data for 62 different types of
animals. The Excel file below shows
part of the data.
Example: body weight and brain size
Is there a relationship between brain size and body weight
among animals?
[Figure: Scatterplot of brain weight (vertical axis, 0 to 6000) vs. body weight (horizontal axis, 0 to 7000)]
It seems there are some outliers or unusual values!
Example: body weight and brain size (cont.)
Is there a relationship between brain size and body weight
among animals?
Example: body weight and brain size (cont.)
[Figure: Scatterplot of brain weight (vertical axis, 0 to 800) vs. body weight (horizontal axis, 0 to 600), after removing the outliers]
Is there a relationship between brain size and body weight
among animals?
Example: body weight and brain size (cont.)
[Figure: Scatterplot of log brain weight (vertical axis, −1.5 to 3.5) vs. log body weight (horizontal axis, −3 to 3), after taking the logarithm of both variables]
✓ Adding a categorical variable to a scatterplot
➢ Two or more relationships can be compared on a single scatterplot
when we use different symbols for groups of points on the graph.
The graph compares the association
between thorax length and longevity of
male fruit flies that are allowed to
reproduce (green) or not (purple).
The pattern is similar in both groups (linear,
positive association), but male fruit flies not
allowed to reproduce tend to live longer
than reproducing male fruit flies of the same
size.
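The comparison described above can be sketched in code by tagging each point with its category, assigning one plotting symbol per category, and summarizing each group separately. The records below are invented for illustration (they are not the fruit fly study's data), and the group labels and symbols are assumptions.

```python
from collections import defaultdict

# Hypothetical records (not the study's data): (thorax length, longevity in days, group).
records = [
    (0.70, 30, "reproducing"), (0.75, 36, "reproducing"), (0.80, 41, "reproducing"),
    (0.85, 47, "reproducing"), (0.90, 52, "reproducing"),
    (0.70, 50, "not reproducing"), (0.75, 57, "not reproducing"),
    (0.80, 61, "not reproducing"), (0.85, 68, "not reproducing"),
    (0.90, 73, "not reproducing"),
]

# One plotting symbol per category, as on a scatterplot with a categorical variable.
symbols = {"reproducing": "o", "not reproducing": "x"}

groups = defaultdict(list)
for thorax, longevity, group in records:
    groups[group].append((thorax, longevity))

summary = {}
for group, points in groups.items():
    points.sort()  # order each group's points by thorax length
    mean_longevity = sum(days for _, days in points) / len(points)
    # Crude direction check: does longevity rise from the smallest to the largest thorax?
    direction = "positive" if points[-1][1] > points[0][1] else "negative or flat"
    summary[group] = (symbols[group], mean_longevity, direction)
    print(f"{symbols[group]}  {group:16s} mean longevity = {mean_longevity:.1f} days, "
          f"direction: {direction}")
```

In this made-up example both groups show a positive direction, while the non-reproducing group has the higher mean longevity at the same thorax lengths, which mirrors the pattern the slide describes.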
• Energy expended as a function of running speed for various
treadmill inclines
However, for each incline, there is a very
strong, positive, linear relationship
between energy expenditure and speed.
In addition, we find that the relationship
between energy expenditure and speed is
noticeably different for different
inclines: More energy tends to be
expended for a given running speed if
the incline is steeper (uphill).
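The correlation measures the direction and strength of the linear
relationship between two quantitative variables. Suppose we have
data on variables x and y for n individuals. The correlation r
between x and y is
$$ r = \frac{1}{n-1} \sum_{i=1}^{n} \left( \frac{x_i - \bar{x}}{s_x} \right) \left( \frac{y_i - \bar{y}}{s_y} \right) $$
Or you can use the algebraically equivalent form
$$ r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{(n-1)\, s_x \, s_y} $$
where
$\bar{x}$ is the average of x,
$\bar{y}$ is the average of y,
$s_x$ is the standard deviation of x,
$s_y$ is the standard deviation of y.
✓ Measuring the Association by Coefficient of Correlation, r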
• r ranges from −1 to +1
• Strength is indicated by
the absolute value of r
• Direction is indicated by
the sign of r (+ or –)
−1 ≤ r ≤ 1
✓ Blood alcohol content (BAC)
a) Which is explanatory and which is response?
b) Draw a scatterplot and interpret.
c) Calculate the coefficient of correlation, r.
To examine the association between the number
of beers a person drank and the resulting
blood alcohol content (BAC), 12 students
were surveyed and the data were recorded.

Number of Beers   Blood Alcohol Content
2                 0.03
7                 0.09
3                 0.07
4                 0.07
5                 0.08
8                 0.12
3                 0.04
5                 0.06
6                 0.10
7                 0.09
1                 0.01
4                 0.05
[Figure: Scatterplot axes for the exercise, Blood Alcohol Content (vertical axis, 0.02 to 0.20) vs. Number of beers (horizontal axis, 1 to 9), shown beside the same data table]
Calculate the coefficient of correlation, r.
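One way to carry out this calculation is sketched below, assuming the usual definition of r as the sum of the products of standardized x and y values divided by n − 1. The data are the 12 pairs from the table. The helper names `mean`, `sample_sd`, and `correlation` are chosen for this example only.

```python
import math

# Data from the slide: 12 students, number of beers and resulting BAC.
beers = [2, 7, 3, 4, 5, 8, 3, 5, 6, 7, 1, 4]
bac = [0.03, 0.09, 0.07, 0.07, 0.08, 0.12, 0.04, 0.06, 0.10, 0.09, 0.01, 0.05]

def mean(values):
    return sum(values) / len(values)

def sample_sd(values):
    """Sample standard deviation (divides by n - 1)."""
    m = mean(values)
    return math.sqrt(sum((v - m) ** 2 for v in values) / (len(values) - 1))

def correlation(xs, ys):
    """Sum of products of standardized x and y values, divided by n - 1."""
    n = len(xs)
    mean_x, mean_y = mean(xs), mean(ys)
    s_x, s_y = sample_sd(xs), sample_sd(ys)
    products = [((x - mean_x) / s_x) * ((y - mean_y) / s_y) for x, y in zip(xs, ys)]
    return sum(products) / (n - 1)

r = correlation(beers, bac)
print(f"r = {r:.3f}")
```

For these 12 pairs r comes out at roughly 0.93, a strong positive linear association between the number of beers and BAC.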
FACTS ABOUT CORRELATION
❑ Correlation makes no distinction between explanatory
and response variables.
❑ Because r uses standardized values of the
observations, it does not change when we change units
of measurement of x, y, or both.
❑ Positive r indicates positive association and negative r
indicates negative association between variables.
❑ The correlation r is always a number between -1 and 1.
Values near 0 indicate a very weak linear
relationship.
−1 ≤ r ≤ 1
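The first two facts can be checked numerically. The sketch below reuses the beer data from the slides, swaps the roles of the two variables, and then changes the units. The conversion factors (12 fluid ounces per beer, BAC multiplied by 1000) are assumptions chosen only to illustrate a change of units.

```python
import math

def correlation(xs, ys):
    """Sum of products of standardized x and y values, divided by n - 1."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    s_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / (n - 1))
    s_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / (n - 1))
    return sum(((x - mean_x) / s_x) * ((y - mean_y) / s_y)
               for x, y in zip(xs, ys)) / (n - 1)

# Data from the slides.
beers = [2, 7, 3, 4, 5, 8, 3, 5, 6, 7, 1, 4]
bac = [0.03, 0.09, 0.07, 0.07, 0.08, 0.12, 0.04, 0.06, 0.10, 0.09, 0.01, 0.05]

r_xy = correlation(beers, bac)   # beers treated as x, BAC as y
r_yx = correlation(bac, beers)   # roles swapped

# Assumed unit changes, for illustration only.
ounces = [12 * b for b in beers]           # hypothetical 12 fl oz per beer
bac_per_mille = [1000 * v for v in bac]    # BAC rescaled by a factor of 1000
r_rescaled = correlation(ounces, bac_per_mille)

print(f"r(x, y) = {r_xy:.6f}")
print(f"r(y, x) = {r_yx:.6f}")
print(f"r after changing units = {r_rescaled:.6f}")
```

All three values agree up to floating-point rounding, because standardizing removes both the units and any distinction between the two variables.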
❑ Correlation requires that both
variables be quantitative.
❑ Correlation measures only
linear relationships; it does not
describe curved relationships.
• Correlations are calculated using
means and standard deviations,
and thus are NOT resistant to
outliers.
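A small experiment illustrates the lack of resistance. The sketch below computes r for the 12-student beer data from the slides, then adds a single made-up outlying student (9 beers with a BAC of 0.01, an assumption for illustration only) and recomputes r.

```python
import math

def correlation(xs, ys):
    """Sum of products of standardized x and y values, divided by n - 1."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    s_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / (n - 1))
    s_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / (n - 1))
    return sum(((x - mean_x) / s_x) * ((y - mean_y) / s_y)
               for x, y in zip(xs, ys)) / (n - 1)

# Data from the slides.
beers = [2, 7, 3, 4, 5, 8, 3, 5, 6, 7, 1, 4]
bac = [0.03, 0.09, 0.07, 0.07, 0.08, 0.12, 0.04, 0.06, 0.10, 0.09, 0.01, 0.05]

r_original = correlation(beers, bac)

# Hypothetical outlier (not in the slides): many beers but a very low BAC.
r_with_outlier = correlation(beers + [9], bac + [0.01])

print(f"r without the outlier: {r_original:.3f}")
print(f"r with the outlier:    {r_with_outlier:.3f}")
```

One unusual point out of 13 is enough to pull r from roughly 0.93 down to below 0.5, which is why a scatterplot should always be inspected alongside the numerical value of r.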
End of Chapter 3
