Copyright © 2014, 2012, 2009 Pearson Education, Inc.
Chapter 3: Displaying and Summarizing Quantitative Data
3.1 Displaying Quantitative Variables
Histograms
•Histogram: A chart that displays quantitative data
• Great for seeing the distribution of the data
• Most tsunami-generating earthquakes have magnitudes between 6.5 and 8.
• Japan and Sumatra quakes (9.0 and 9.1) are rare.
• Quakes under 5 rarely cause tsunamis.
Figure: A histogram of tsunami-generating earthquakes
Choosing the Bin Width
•Different bin widths tell different
stories.
•Choose the width that best shows
the important features.
•Presentations can feature two
histograms that present the same
data in different ways.
•A gap in the histogram means that there were no occurrences in that bin.
Relative Frequency Histograms
•Relative Frequency Histogram
•The vertical axis represents
the relative frequency, the
frequency divided by the total.
•The horizontal axis is the same
as the horizontal axis for the frequency histogram.
•The shape of the relative frequency histogram is the
same as the frequency histogram.
•Only the scale of the y-axis is different.
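As a rough illustration of the point above, the following Python sketch bins a few values and computes both the frequencies and the relative frequencies. The data values, the bin width, and the helper name `bin_counts` are hypothetical and chosen only for illustration; they are not taken from the slides.

```python
from collections import Counter

def bin_counts(values, width, start):
    """Count values per bin [start + k*width, start + (k+1)*width).

    Assumes every value is at least `start`.
    """
    counts = Counter(int((v - start) // width) for v in values)
    n_bins = max(counts) + 1
    return [counts.get(k, 0) for k in range(n_bins)]

# Hypothetical magnitudes, used only to show the arithmetic.
values = [5.2, 6.1, 6.6, 6.8, 7.0, 7.1, 7.3, 7.4, 7.9, 9.0]
freq = bin_counts(values, width=1.0, start=5.0)   # bins 5-6, 6-7, 7-8, 8-9, 9-10
rel_freq = [f / len(values) for f in freq]        # frequency divided by the total

for k, (f, r) in enumerate(zip(freq, rel_freq)):
    print(f"{5 + k}-{6 + k}: frequency={f}, relative frequency={r:.1f}")
```

Both lists have the same shape, including the empty 8-9 bin; only the scale differs, and the relative frequencies add up to 1.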
Stem-and-Leaf Displays
•Stem-and-Leaf: Shows both the
shape of the distribution and all
of the individual values
•Not as visually pleasing as a
histogram; more technical looking
•Can only be used for small collections of data
•The first column (stems) represents the leftmost digit.
•The second column (leaves) shows the remaining digits.
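A stem-and-leaf display can be sketched in a few lines of Python. This is a minimal sketch, assuming non-negative whole numbers with the tens digit as the stem and the ones digit as the leaf. The function name `stem_and_leaf` is made up for illustration, and the sample values are the dinner-party ages used elsewhere in this chapter.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Return display lines with tens digits as stems and ones digits as leaves."""
    groups = defaultdict(list)
    for v in sorted(values):
        groups[v // 10].append(v % 10)
    lines = []
    # Include empty stems so that gaps in the data stay visible.
    for stem in range(min(groups), max(groups) + 1):
        leaves = "".join(str(d) for d in groups.get(stem, []))
        lines.append(f"{stem} | {leaves}")
    return lines

ages = [16, 18, 23, 23, 27, 35, 74]
display = stem_and_leaf(ages)
print("\n".join(display))
```

Every individual value can be read back from the display (stem 2 with leaves 3, 3, 7 means 23, 23, 27), while the row lengths show the shape of the distribution.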
Dotplots
•Dotplot: Displays dots to describe
the shape of the distribution
•There were 30 races with a winning
time of 122 seconds.
•Good for smaller data sets
•Visually more appealing than
stem-and-leaf
•In StatCrunch:
Graphics → Dotplot
Think Before You Draw
•Is the variable quantitative? Is the answer to the survey question or result of the experiment a number whose units are known?
•Histograms, stem-and-leaf diagrams, and dotplots
can only display quantitative data.
•Bar and pie charts display categorical data.
3.2 Shape
Modes
•A Mode of a histogram is a hump or high-frequency bin.
•One mode → Unimodal
•Two modes → Bimodal
•3 or more → Multimodal
Figure panels: Unimodal, Bimodal, Multimodal
Uniform Distributions
•Uniform Distribution: All the bins have the same
frequency, or at least close to the same frequency.
•The histogram for a uniform distribution will be flat.
Symmetry
•The histogram for a symmetric distribution will look the
same on the left and the right of its center.
Figure panels: Symmetric, Not Symmetric, Symmetric
Skew
•A histogram is skewed right if the longer tail is on the
right side of the mode.
•A histogram is skewed left if the longer tail is on the left
side of the mode.
Figure panels: Skewed Left, Skewed Right
Outliers
•An Outlier is a data value that is far above or far below
the rest of the data values.
•An outlier is sometimes just
an error in the data collection.
•An outlier can also be the
most important data value.
•Income of a CEO
•Temperature of a person with
a high fever
Example
•The histogram shows the amount
of money spent by a credit card
company’s customers. Describe
and interpret the distribution.
•The distribution is unimodal. Customers most
commonly spent a small amount of money.
•The distribution is skewed right. Many customers
spent only a small amount and a few were spread out
at the high end.
•There is an outlier at around $7000. One customer
3.3 Center
The Median
•Median: The center of the
data values
•Half of the data values are to
the left of the median and half
are to the right of the median.
•For symmetric distributions, the median is directly
in the middle.
Calculating the Median: Odd Sample Size
•First order the numbers.
•If there are an odd number of numbers, n, the median is at position $\frac{n+1}{2}$.
•Find the median of the numbers: 2, 4, 5, 6, 7, 9, 9.
•$\frac{n+1}{2} = \frac{7+1}{2} = 4$
•The median is the fourth number: 6
•Note that there are 3 numbers to the left of 6 and 3 to the right.
Calculating the Median: Even Sample Size
•First order the numbers.
•If there are an even number of numbers, n, the median is the average of the two middle numbers, at positions $\frac{n}{2}$ and $\frac{n}{2} + 1$.
•Find the median of the numbers: 2, 2, 4, 6, 7, 8.
•$\frac{n}{2} = \frac{6}{2} = 3$
•The median is the average of the third and the fourth numbers: $\text{Median} = \frac{4 + 6}{2} = 5$
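The two position rules for odd and even sample sizes can be sketched as one Python function. This is a minimal illustration, not a replacement for StatCrunch; the function name `median` is chosen here, and the two test lists are the ones from the slides.

```python
def median(values):
    """Median by the position rule: sort, then take the middle value(s)."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("median of empty data")
    mid = n // 2
    if n % 2 == 1:
        # Odd n: 1-based position (n + 1) / 2 is 0-based index n // 2.
        return ordered[mid]
    # Even n: average the values at 1-based positions n/2 and n/2 + 1.
    return (ordered[mid - 1] + ordered[mid]) / 2

print(median([2, 4, 5, 6, 7, 9, 9]))   # 6
print(median([2, 2, 4, 6, 7, 8]))      # 5.0
```

Sorting inside the function means the caller cannot forget the "first order the numbers" step.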
3.4 Spread
Spread
•Locating the center is only part of the story
•Are the data all near the center or are they spread out?
•Is the highest value much higher than the lowest value?
•To describe data, we must discuss both the center and
the spread.
Range
•The range is the difference between the maximum and
minimum values.
Range = Maximum – Minimum
•The ages of the guests at your dinner party are:
16, 18, 23, 23, 27, 35, 74
•The range is: 74 – 16 = 58
•The range is sensitive to outliers. A single high or low
value will affect the range significantly.
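A short Python sketch of the range, using the dinner-party ages from this slide, shows how strongly a single extreme value affects it. The function name `data_range` is chosen here for illustration.

```python
def data_range(values):
    """Range = maximum - minimum."""
    return max(values) - min(values)

ages = [16, 18, 23, 23, 27, 35, 74]
print(data_range(ages))        # 58
print(data_range(ages[:-1]))   # 19, once the 74-year-old guest is left out
```

Dropping one guest cuts the range from 58 to 19, which is why the range alone is a fragile summary of spread.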
Percentiles and Quartiles
•Percentiles divide the data into one hundred groups.
•The nth percentile is the data value such that n percent of the data lies below that value.
•For large data sets, the median is the 50th percentile.
•The median of the lower half of the data is the 25th percentile and is called the first quartile (Q1).
•The median of the upper half of the data is the 75th percentile and is called the third quartile (Q3).
The Interquartile Range
•The Interquartile Range (IQR) is the difference between
the upper quartile and the lower quartile
IQR = Q3 – Q1
•The IQR measures the range of the middle half of the
data.
•Example: If Q1 = 23 and Q3 = 44 then
IQR = 44 – 23 = 21
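The "median of each half" rule for quartiles can be sketched in Python as follows. One assumption is made here that the slides do not state: when n is odd, the middle value is left out of both halves. Other conventions exist and give slightly different quartiles. The function names are chosen for illustration, and the data are the dinner-party ages from the Range slide.

```python
def median(values):
    """Middle value of the sorted data (average of the two middle values if n is even)."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    return ordered[mid] if n % 2 else (ordered[mid - 1] + ordered[mid]) / 2

def quartiles(values):
    """Return (Q1, Q3) as the medians of the lower and upper halves.

    Assumption: for odd n, the middle value belongs to neither half.
    """
    ordered = sorted(values)
    if len(ordered) < 2:
        raise ValueError("need at least two values")
    half = len(ordered) // 2
    return median(ordered[:half]), median(ordered[-half:])

ages = [16, 18, 23, 23, 27, 35, 74]
q1, q3 = quartiles(ages)
print(q1, q3, q3 - q1)   # 18 35 17
```

Here the IQR is 35 – 18 = 17, far smaller than the range of 58, because the 74-year-old guest lies outside the middle half of the data.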
The Interquartile Range
•The Interquartile Range for tsunami-causing earthquakes is 0.9.
•The picture below shows the meaning of the IQR.
Benefits and Drawbacks of the IQR
•The Interquartile Range is not sensitive to outliers.
•The IQR provides a reasonable summary of the spread
of the distribution.
•The IQR shows where typical values are, except for the
case of a bimodal distribution.
•The IQR is not great for a general audience since most
people do not know what it is.
3.5 Boxplots and 5-Number Summaries
5-Number Summary
•The 5-Number Summary provides a numerical
description of the data. It consists of
•Minimum
•First Quartile (Q1)
•Median
•Third Quartile (Q3)
•Maximum
•The list to the right shows the
5-Number Summary for the
tsunami data.
Interpreting the 5-Number Summary
•The smallest tsunami-causing earthquake
had magnitude 3.7.
•The largest tsunami-causing earthquake
had magnitude 9.1.
•The middle half of tsunami-causing
earthquakes is between 6.7 and 7.6.
•Half of tsunami-causing earthquakes have
magnitudes below 7.2 and half are above 7.2.
•A tsunami-causing earthquake with magnitude below 6.7 is relatively small: it falls in the lowest quarter.
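A 5-Number Summary can also be computed with Python's standard library. The tsunami data are not reproduced in these slides, so this sketch uses the dinner-party ages from the Range slide instead. It relies on `statistics.quantiles` with its default method, which may differ slightly from the quartile rule used by the textbook or by StatCrunch; the function name `five_number_summary` is chosen here for illustration.

```python
import statistics

def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum)."""
    # n=4 asks for the three cut points that split the data into quarters.
    q1, med, q3 = statistics.quantiles(values, n=4)
    return min(values), q1, med, q3, max(values)

ages = [16, 18, 23, 23, 27, 35, 74]
summary = five_number_summary(ages)
print(summary)   # (16, 18.0, 23.0, 35.0, 74)
```

Read the result the same way as the tsunami summary: the middle half of the ages lies between 18 and 35, and half of the guests are younger than 23.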
Boxplots
•A Boxplot is a chart that displays the 5-Number Summary and the outliers.
•The Box shows the Interquartile Range.
•The dashed lines are called fences; outliers lie outside the fences.
•Above and below the box are the whiskers
that display the most extreme data values
within the fences.
Finding the Fences
•The lower fence is defined by
Lower Fence = Q1 – 1.5 × IQR
•The upper fence is defined by
Upper Fence = Q3 + 1.5 × IQR
•Tsunami Example: Q1 = 6.7, Q3 = 7.6
IQR = 7.6 – 6.7 = 0.9
•Lower Fence = 6.7 – 1.5 × 0.9 = 5.35
•Upper Fence = 7.6 + 1.5 × 0.9 = 8.95
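The fence calculation above can be sketched in Python. The function names `fences` and `outliers` are chosen here for illustration. The quartiles are the tsunami values from this slide, and the short test list contains only magnitudes mentioned in the slides (3.7, 7.2, 9.0, 9.1), not the full data set.

```python
def fences(q1, q3):
    """Return (lower fence, upper fence) using the 1.5 x IQR rule."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

def outliers(values, q1, q3):
    """Return the values that fall outside the fences."""
    low, high = fences(q1, q3)
    return [v for v in values if v < low or v > high]

low, high = fences(6.7, 7.6)
print(round(low, 2), round(high, 2))                # 5.35 8.95
print(outliers([3.7, 7.2, 9.0, 9.1], 6.7, 7.6))     # [3.7, 9.0, 9.1]
```

The results are rounded for display because floating-point subtraction leaves tiny errors in the last digits.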
Step-by-Step Example of Shape, Center,
Spread: Flight Cancellations
•Question: How often are flights cancelled?
•Who? Months
•What? Percentage of Flights Cancelled at U.S.
Airports
•When? 1995 – 2011
•Where? United States
•How? Bureau of Transportation Statistics Data
Flight Cancellations: Think
•Identify the Variable
•Percent of flight cancellations at U.S. airports
•Quantitative: Units are percentages.
•How will the data be summarized?
•Histogram
•Numerical Summary
•Boxplot
Flight Cancellations: Show
•Use StatCrunch to create the histogram,
boxplot, and numerical summary.
Flight Cancellations: Tell
•Describe the shape, center, and spread of the
distribution. Report on the symmetry, number of modes,
and any gaps or outliers. You should also mention any
concerns you may have about the data.
•Skewed to the Right: Can’t be a negative percent.
Bad weather and other airport troubles can cause
extreme cancellations.
•IQR is small: 1.23%. Consistency among cancellation percents
3.6 The Center of Symmetric Distributions: The Mean
The Mean
•The Mean is what most people think of as the average.
•Add up all the numbers and divide by the number of
numbers.
•Recall that Σ means “Add them all.”
•In StatCrunch, the mean is listed in the
Summary Statistics.
$\bar{y} = \frac{\sum y}{n}$
The Mean is the “Balancing Point”
•If you put your finger
on the mean, the
histogram will
balance perfectly.
Mean Vs. Median
•For symmetric distributions, the mean and the median
are equal.
•The balancing point is at the center.
•A tail “pulls” the mean toward it more than it pulls the median.
•The mean is more sensitive to outliers than the median.
The Mean Is Attracted to the Outlier
•The mean is larger
than the median
since it is “pulled”
to the right by the
outlier.
•The median is a better
measure of the center
for data that is skewed.
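The pull of an outlier on the mean can be checked numerically. This Python sketch reuses the dinner-party ages from the Range slide, treating the 74-year-old guest as the outlier; the function name `mean` is chosen here for illustration.

```python
import statistics

def mean(values):
    """y-bar = (sum of y) / n."""
    return sum(values) / len(values)

ages = [16, 18, 23, 23, 27, 35, 74]        # 74 is the outlier
without = [a for a in ages if a != 74]

for label, data in [("with outlier", ages), ("without outlier", without)]:
    print(f"{label}: mean = {mean(data):.1f}, median = {statistics.median(data)}")
```

Removing the outlier lowers the mean from about 30.9 to about 23.7, while the median stays at 23.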
Why Use the Mean?
•Although the median is a better measure of the center for skewed data, the mean takes the size of every value, large or small, into account.
•The mean is easier to work with.
•For symmetric data, statisticians would rather use the
mean.
•It is always OK to report both the mean and the median.
3.7 The Spread of Symmetric Distributions: The Standard Deviation
The Variance
•The variance is a measure of how far the data is spread out from the mean.
•The difference from the mean is: $y - \bar{y}$.
•To make it positive, square it.
•Then find the average of all of these squared differences, except instead of dividing by n, divide by n – 1.
•Use $s^2$ to represent the variance.
$s^2 = \frac{\sum (y - \bar{y})^2}{n - 1}$
Standard Deviation
•The variance’s units are the square of the original units.
•Taking the square root of the variance gives the
standard deviation, which will have the same units as y.
•The standard deviation is a number that is close to the average distance of the y values from the mean.
•If data values are close to the mean (less spread out), then the standard deviation will be small.
•If data values are far from the mean (more spread out), then the standard deviation will be large.
$s = \sqrt{\frac{\sum (y - \bar{y})^2}{n - 1}}$
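The variance and standard deviation formulas can be written out step by step in Python. This is a minimal sketch with function names chosen for illustration. The data are the seven numbers from the odd-sample-size median example, reused here only because their mean is a whole number.

```python
import math

def variance(values):
    """Sample variance: sum of squared deviations from the mean, divided by n - 1."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values")
    y_bar = sum(values) / n
    return sum((y - y_bar) ** 2 for y in values) / (n - 1)

def standard_deviation(values):
    """Square root of the variance, in the same units as the data."""
    return math.sqrt(variance(values))

data = [2, 4, 5, 6, 7, 9, 9]     # mean is 6; squared deviations sum to 40
print(round(variance(data), 3))              # 6.667
print(round(standard_deviation(data), 3))    # 2.582
```

By hand: the deviations from 6 are –4, –2, –1, 0, 1, 3, 3, their squares sum to 40, and 40 / (7 – 1) ≈ 6.667.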
The Standard Deviation and Histograms
Order the histograms below from smallest standard deviation to largest standard deviation.
Figure panels: A, B, C
Answer: C, A, B
3.8 Summary: What to Tell About a Quantitative Variable
What to Tell
•Histogram, Stem-and-Leaf, Boxplot
•Describe modality, symmetry, outliers
•Center and Spread
•Median and IQR if not symmetric
•Mean and Standard Deviation if symmetric.
•For unimodal symmetric data, the IQR is usually a bit larger than s. If it is not, check for errors.
•Unusual Features
•For multiple modes, possibly split the data into groups.
•When there are outliers, report the mean and standard
deviation with and without the outliers.
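Reporting the mean and standard deviation with and without an outlier might look like the following Python sketch. It uses the dinner-party ages from the Range slide with 74 treated as the outlier; the helper name `summarize` is hypothetical.

```python
import statistics

def summarize(values):
    """Return (mean, standard deviation), using the n - 1 formula for s."""
    return statistics.mean(values), statistics.stdev(values)

ages = [16, 18, 23, 23, 27, 35, 74]
trimmed = [a for a in ages if a != 74]     # same data with the outlier set aside

for label, data in [("all values", ages), ("without 74", trimmed)]:
    m, s = summarize(data)
    print(f"{label}: mean = {m:.1f}, s = {s:.1f}")
```

With the outlier the standard deviation is about 20.0; without it, about 6.8. Showing both lets the reader see how much one value drives the summary.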
Example: Fuel Efficiency
•The car owner has checked the fuel efficiency each time he filled the tank. How would you describe the fuel efficiency?
•Plan: Summarize the distribution of the car’s fuel
efficiency.
•Variable: mpg for 100 fill ups, Quantitative
•Mechanics: show a histogram
•Fairly symmetric
Fuel Efficiency Continued
•Which to report?
•The mean and median are close.
•Report the mean and standard deviation.
•Conclusion
•Distribution is unimodal and symmetric.
•Mean is 22.4 mpg.
•A low outlier may be worth investigating, but it has limited effect on the mean.
•s = 2.45 mpg; from one filling to the next, fuel efficiency differs from the mean by an average of about 2.45 mpg.
What Can Go Wrong?
•Don’t make a histogram for categorical data.
•Don’t look for shape, center,
and spread for a bar chart.
•Choose a bin width appropriate
for the data.
What Can Go Wrong? Continued
•Do a reality check
•Don’t blindly trust your calculator. For example, a
mean student age of 193 years old is nonsense.
•Sort before finding the median and percentiles.
•The median of 315, 8, 2, 49, 97 is not 2. Sorted (2, 8, 49, 97, 315), the median is 49.
•Don’t worry about small differences in the quartile
calculation.
•Don’t compute numerical summaries for a categorical
variable.
•The mean Social Security number is meaningless.
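Two of the warnings above can be checked quickly in Python. The first part takes the middle of the unsorted list from the slide and compares it with the middle of the sorted list. The second part compares two quartile conventions offered by the standard library's `statistics.quantiles`, using the dinner-party ages from the Range slide; neither convention is claimed to match the textbook's or StatCrunch's rule.

```python
import statistics

raw = [315, 8, 2, 49, 97]
wrong = raw[len(raw) // 2]            # middle of the UNSORTED list
right = sorted(raw)[len(raw) // 2]    # middle of the sorted list
print(wrong, right)                   # 2 49

# Two quartile conventions give slightly different Q1 and Q3 for the same data.
ages = [16, 18, 23, 23, 27, 35, 74]
exclusive = statistics.quantiles(ages, n=4, method="exclusive")
inclusive = statistics.quantiles(ages, n=4, method="inclusive")
print(exclusive)   # [18.0, 23.0, 35.0]
print(inclusive)   # [20.5, 23.0, 31.0]
```

Both methods agree on the median; the quartiles differ only because the data set is so small.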
What Can Go Wrong? Continued
•Don’t report too many decimal places.
•Citing the mean fuel efficiency as 22.417822453 is
going overboard.
•Don’t round in the middle of a calculation.
•For multiple modes, think about separating groups.
•Heights of people → Separate men and women
•Beware of outliers: the mean and standard deviation are sensitive to them.

