Chapter 2
Descriptive Statistics
Larson/Farber 4th ed.
Chapter Outline
• 2.1 Frequency Distributions and Their Graphs
• 2.2 More Graphs and Displays
• 2.3 Measures of Central Tendency
• 2.4 Measures of Variation
• 2.5 Measures of Position
Overview
Descriptive Statistics
• Describes the important characteristics of a set of
data.
• Organize, present, and summarize data:
1. Graphically
2. Numerically
Important Characteristics of
Quantitative Data
“Shape, Center, and Spread”
• Center: A representative or average value that
indicates where the middle of the data set is located.
• Variation: A measure of the amount that the
values vary among themselves.
• Distribution: The nature or shape of the distribution
of data (such as bell-shaped, uniform, or skewed).
Section 2.1
Frequency Distributions
and Their Graphs
Frequency Distributions
Frequency Distribution
• A table that organizes data values into classes or intervals
along with the number of values that fall in each class
(frequency, f).
1. Ungrouped Frequency Distribution: for data sets with
few different values. Each value is in its own class.
2. Grouped Frequency Distribution: for data sets with
many different values, which are grouped together in
classes.
Grouped and Ungrouped
Frequency Distributions
Ungrouped:
Courses Taken    Frequency, f
1                25
2                38
3                217
4                1462
5                932
6                15
Grouped:
Age of Voters    Frequency, f
18-30            202
31-42            508
43-54            620
55-66            413
67-78            158
79-90            32
Ungrouped Frequency Distributions
Number of Peas in a Pea Pod
Sample Size: 50
5 5 4 6 4
3 7 6 3 5
6 5 4 5 5
6 2 3 5 5
5 5 7 4 3
4 5 4 5 6
5 1 6 2 6
6 6 6 6 4
4 5 4 5 3
5 5 7 6 5
Peas per pod    Freq, f
1               1
2               2
3               5
4               9
5               18
6               12
7               3
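The tally above can be reproduced with a short script. This is a minimal sketch in Python; the variable names (`peas`, `counts`, `freq_table`) are illustrative and not part of the original slides.

```python
from collections import Counter

# The 50 pea-pod counts listed on the slide, row by row.
peas = [
    5, 5, 4, 6, 4,
    3, 7, 6, 3, 5,
    6, 5, 4, 5, 5,
    6, 2, 3, 5, 5,
    5, 5, 7, 4, 3,
    4, 5, 4, 5, 6,
    5, 1, 6, 2, 6,
    6, 6, 6, 6, 4,
    4, 5, 4, 5, 3,
    5, 5, 7, 6, 5,
]

# Each distinct value is its own class (ungrouped distribution).
counts = Counter(peas)
freq_table = sorted(counts.items())

print("Peas per pod  Freq, f")
for value, f in freq_table:
    print(f"{value:>12}  {f:>7}")
```

The frequencies should sum to the sample size, which is a quick check that no value was missed.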
Graphs of Frequency Distributions:
Frequency Histograms
Frequency Histogram
• A bar graph that represents the frequency distribution.
• The horizontal scale is quantitative and measures the
data values.
• The vertical scale measures the frequencies of the
classes.
• Consecutive bars must touch.
[Figure: histogram axes; horizontal axis: data values; vertical axis: frequency]
Frequency Histogram
Ex. Peas per Pod
[Figure: frequency histogram "Number of Peas in a Pod"; horizontal axis: Number of Peas (1 to 7); vertical axis: Frequency, f (0 to 20)]
Relative Frequency Distributions and
Relative Frequency Histograms
Relative Frequency Distribution
• Shows the portion or percentage of the data that falls
in a particular class.
• $\text{relative frequency} = \dfrac{\text{class frequency}}{\text{sample size}} = \dfrac{f}{n}$
Relative Frequency Histogram
• Has the same shape and the same horizontal scale as
the corresponding frequency histogram.
• The vertical scale measures the relative frequencies,
not frequencies.
Relative Frequency Histogram
Has the same shape and horizontal scale as a
histogram, but the vertical scale is marked with
relative frequencies.
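As an illustration of the relative frequency formula f/n, the sketch below converts the peas-per-pod frequencies into relative frequencies. The dictionary names are illustrative only.

```python
# Frequencies from the peas-per-pod table (value -> frequency f).
freq = {1: 1, 2: 2, 3: 5, 4: 9, 5: 18, 6: 12, 7: 3}

n = sum(freq.values())  # sample size

# Relative frequency of each class = f / n.
rel_freq = {value: f / n for value, f in freq.items()}

for value, rf in rel_freq.items():
    print(f"{value}: {rf:.2f} ({rf:.0%})")
```

Because only the vertical scale changes, a histogram drawn from `rel_freq` has the same shape as one drawn from `freq`.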
Grouped Frequency Distributions
Grouped Frequency Distribution
• For data sets with many different values.
• Groups data into 5-20 classes of equal width.
Exam Scores    Freq, f
30-39          1
40-49          0
50-59          4
60-69          9
70-79          13
80-89          10
90-99          3
Grouped Frequency Distribution Terms
• Lower class limits: the smallest numbers that can
actually belong to the different classes.
• Upper class limits: the largest numbers that can
actually belong to the different classes.
• Class width: the difference between two
consecutive lower class limits.
Labeling Grouped Frequency
Distributions
• Class midpoints: the value halfway between the LCL
and UCL of a class:
$\text{midpoint} = \dfrac{(\text{Lower class limit}) + (\text{Upper class limit})}{2}$
• Class boundaries: the value halfway between a
UCL and the next LCL:
$\text{boundary} = \dfrac{(\text{Upper class limit}) + (\text{next Lower class limit})}{2}$
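A small sketch of these definitions, applied to the exam-score classes (30-39 through 90-99) from the grouped distribution. The variable names are assumptions made for the example.

```python
# (lower class limit, upper class limit) for each exam-score class.
classes = [(30, 39), (40, 49), (50, 59), (60, 69), (70, 79), (80, 89), (90, 99)]

# Class width: difference between two consecutive lower class limits.
class_width = classes[1][0] - classes[0][0]

# Midpoint of each class: halfway between its LCL and UCL.
midpoints = [(lcl + ucl) / 2 for lcl, ucl in classes]

# Boundary between adjacent classes: halfway between a UCL and the next LCL.
boundaries = [
    (classes[i][1] + classes[i + 1][0]) / 2
    for i in range(len(classes) - 1)
]

print("width:", class_width)
print("midpoints:", midpoints)
print("boundaries:", boundaries)
```

Note that the width is 10, not 9: it is measured between lower limits, not from a class's lower limit to its own upper limit.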
Constructing a Grouped Frequency
Distribution
1. Determine the range of the data:
   - Range = highest data value – lowest data value
   - May round up to the next convenient number
2. Decide on the number of classes.
   - Usually between 5 and 20; otherwise, it may be difficult to detect any patterns.
3. Find the class width:
   - $\text{class width} = \dfrac{\text{range}}{\text{number of classes}}$
   - Round up to the next convenient number.
Constructing a Frequency Distribution
4. Find the class limits.
   - Choose the first LCL: use the minimum data entry
or something smaller that is convenient.
   - Find the remaining LCLs: add the class width to the
lower limit of the preceding class.
   - Find the UCLs: Remember that classes must cover
all data values and cannot overlap.
5. Find the frequencies for each class. (You may add a
tally column first and make a tally mark for each data
value in the class).
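The five construction steps can be sketched in code. The raw exam scores are not given in the slides, so the `scores` list below is hypothetical, and the function name `grouped_frequency` is illustrative. The sketch assumes whole-number data and always rounds the class width up to the next integer so that the maximum value is covered.

```python
def grouped_frequency(data, num_classes):
    """Build (LCL, UCL, frequency) rows for whole-number data."""
    # Step 1: range of the data.
    data_range = max(data) - min(data)
    # Steps 2-3: class width = range / number of classes, rounded UP
    # (integer division plus one rounds up even when the quotient is exact).
    width = data_range // num_classes + 1
    # Step 4: first LCL is the minimum; each next LCL adds the width.
    first_lcl = min(data)
    table = []
    for i in range(num_classes):
        lcl = first_lcl + i * width
        ucl = lcl + width - 1  # classes cover all values without overlap
        # Step 5: count the entries that fall in this class.
        f = sum(1 for x in data if lcl <= x <= ucl)
        table.append((lcl, ucl, f))
    return table


# Hypothetical scores, used only to exercise the function.
scores = [30, 52, 55, 58, 59, 61, 64, 66, 68, 70,
          72, 73, 75, 77, 79, 81, 84, 88, 92, 99]

table = grouped_frequency(scores, num_classes=7)
for lcl, ucl, f in table:
    print(f"{lcl}-{ucl}: {f}")
```

With 7 classes and a range of 69, the width rounds up to 10, which gives the familiar classes 30-39 through 90-99.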
“Shape” of Distributions
Symmetric
• Data is symmetric if the left half of its histogram is
roughly a mirror image of its right half.
Skewed
• Data is skewed if it is not symmetric and if it extends
more to one side than the other.
Uniform
• Data is uniform if it is equally distributed (on a
histogram, all the bars are the same height or
approximately the same height).
The Shape of Distributions
[Figure: four histograms illustrating symmetric, skewed right, skewed left, and uniform distributions]
Outliers
• Unusual data values as compared to the rest of the set.
They may be distinguished by gaps in a histogram.
Section 2.2
More Graphs and Displays
Other Graphs
Besides Histograms, there are other methods of
graphing quantitative data:
• Stem and Leaf Plots
• Dot Plots
• Time Series
Stem and Leaf Plots
A stem-and-leaf plot represents data by separating each data value into
two parts: the stem (such as the leftmost digit) and
the leaf (such as the rightmost digit).
Constructing Stem and Leaf Plots
• Split each data value at the same place value to form a stem and a leaf. (Aim for 5-20 stems.)
• Arrange all possible stems vertically so there are no missing stems.
• Write each leaf to the right of its stem, in order.
• Create a key to recreate the data.
• Variations of stem plots:
1. Split stems
2. Back-to-back stem plots
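A minimal sketch of the construction steps for two-digit data, splitting each value into a tens-digit stem and a ones-digit leaf. The data values and the function name `stem_and_leaf` are hypothetical, chosen only to show that a stem with no leaves is still listed.

```python
def stem_and_leaf(data):
    """Return {stem: [leaves in order]} with no missing stems."""
    stems = {}
    for x in sorted(data):
        # Split at the tens place: stem = tens digit(s), leaf = ones digit.
        stems.setdefault(x // 10, []).append(x % 10)
    lo, hi = min(stems), max(stems)
    # List every possible stem between the smallest and largest.
    return {s: stems.get(s, []) for s in range(lo, hi + 1)}


values = [12, 15, 21, 23, 23, 41, 47, 48]  # hypothetical data
plot = stem_and_leaf(values)

for stem, leaves in plot.items():
    print(f"{stem} | {' '.join(str(leaf) for leaf in leaves)}")
print("Key: 1 | 2 = 12")
```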
Constructing a Stem-and-Leaf Plot
[Figure: example stem-and-leaf plot. Include a key to identify the values of the data.]
Dot Plots
Dot plot
• Consists of a graph in which each data value is plotted as
a point along a scale of values
Figure 2-5
Time Series
(Paired data)
Time Series
• Data set is composed of quantitative entries taken at
regular intervals over a period of time.
   - e.g., the amount of precipitation measured each
day for one month.
• Use a time series chart to graph.
[Figure: time series chart; horizontal axis: time; vertical axis: quantitative data]
Time-Series Graph
[Figure 2-8: Number of Screens at Drive-In Movie Theaters]
Ex. www.eia.doe.gov/oil_gas/petroleum/
Graphing Qualitative Data Sets
Pie Chart
• A circle is divided into sectors
that represent categories.
Pareto Chart
• A vertical bar graph in which the
height of each bar represents
frequency or relative frequency.
[Figure: Pareto chart; horizontal axis: Categories; vertical axis: Frequency]
Constructing a Pie Chart
• Find the total sample size.
• Convert the frequencies to relative frequencies (percent).
Marital Status    Frequency, f (in millions)    Relative frequency (%)
Never Married     55.3                          55.3 / 219.7 ≈ 0.25 or 25%
Married           127.7                         127.7 / 219.7 =
Widowed           13.9                          13.9 / 219.7 =
Divorced          22.8                          22.8 / 219.7 =
Total:            219.7
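The remaining relative frequencies in the table can be worked out the same way as the first row. A minimal sketch, with illustrative variable names:

```python
# Frequencies in millions, from the marital status table.
freq = {
    "Never Married": 55.3,
    "Married": 127.7,
    "Widowed": 13.9,
    "Divorced": 22.8,
}

total = sum(freq.values())  # total sample size (in millions)

# Relative frequency of each category, expressed as a percent.
percent = {status: 100 * f / total for status, f in freq.items()}

for status, p in percent.items():
    print(f"{status:<14} {p:5.1f}%")
```

Each sector of the pie chart is then drawn in proportion to its percentage.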
Constructing Pareto Charts
• Create a bar for each category, where the height of the
bar can represent frequency or relative frequency.
• The bars are often positioned in order of decreasing
height, with the tallest bar positioned at the left.
Figure 2-6
Section 2.3
Measures of Central Tendency
Measures of Central Tendency
Measure of central tendency
• A value that represents a typical, or central, entry of a
data set.
• Most common measures of central tendency:
   - Mean
   - Median
   - Mode
Measure of Central Tendency: Mean
Mean: The sum of all the data entries divided by the number
of entries.
• Population mean: $\mu = \dfrac{\sum x}{N}$
• Sample mean: $\bar{x} = \dfrac{\sum x}{n}$
• Round-off rule for measures of center: Carry
one more decimal place than is in the original values. Do
not round until the last step.
Measure of Central Tendency: Median
Median
• The value that lies in the middle of the data when the data
set is arranged in order from lowest to highest.
• Measures the center of an ordered data set by dividing it
into two equal parts.
• A sample median is often denoted $\tilde{x}$.
• If the data set has an
   - odd number of entries: median is the middle data entry.
   - even number of entries: median is the mean of the two
middle data entries.
Computing the Median
If the data set has an:
• odd number of entries: the median is the middle data entry:
   2 5 6 11 13
   The median is the exact middle value: $\tilde{x} = 6$
• even number of entries: the median is the mean of the two
middle data entries:
   2 5 6 7 11 13
   The median is the mean of the two middle numbers: $\tilde{x} = \dfrac{6 + 7}{2} = 6.5$
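Both cases can be checked with Python's standard `statistics` module, which sorts the data and applies the same odd/even rule. The list names are illustrative.

```python
import statistics

odd_set = [2, 5, 6, 11, 13]       # 5 entries: the middle entry is the median
even_set = [2, 5, 6, 7, 11, 13]   # 6 entries: average the two middle entries

median_odd = statistics.median(odd_set)
median_even = statistics.median(even_set)

print(median_odd)   # 6
print(median_even)  # 6.5
```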
Measure of Central Tendency: Mode
Mode
• The data entry that occurs with the greatest frequency.
• If no entry is repeated the data set has no mode.
• If two entries occur with the same greatest frequency,
each entry is a mode (bimodal).
a) 5.40 1.10 0.42 0.73 0.48 1.10 (Mode is 1.10)
b) 27 27 27 55 55 55 88 88 99 (Bimodal: 27 and 55)
c) 1 2 3 6 7 8 9 10 (No mode)
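The three mode rules can be sketched as a small function. The name `modes` is hypothetical; it follows the slide's convention that a data set with no repeated entry has no mode, which differs from library routines that would report every value as a mode in that case.

```python
from collections import Counter


def modes(data):
    """Return the list of modes; empty if no entry is repeated."""
    counts = Counter(data)
    top = max(counts.values())
    if top == 1:
        return []  # no entry is repeated: no mode
    # Every entry that reaches the greatest frequency is a mode.
    return sorted(value for value, c in counts.items() if c == top)


set_a = [5.40, 1.10, 0.42, 0.73, 0.48, 1.10]
set_b = [27, 27, 27, 55, 55, 55, 88, 88, 99]
set_c = [1, 2, 3, 6, 7, 8, 9, 10]

print(modes(set_a))  # one mode
print(modes(set_b))  # two modes (bimodal)
print(modes(set_c))  # no mode
```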
Comparing the Mean, Median, and Mode
• All three measures describe an “average”. Choose the one that best
represents a “typical” value in the set.
• Mean:
   - The most familiar average.
   - A reliable measure because it takes into account every entry of a
data set.
   - May be greatly affected by outliers or skew.
• Median:
   - A common average.
   - Not as affected by skew or outliers.
• Mode: May be used if one value occurs overwhelmingly often.
Choosing the “Best Average”
• The shape of your data and the existence of any
outliers may help you choose the best average:
Section 2.4
Measures of Variation
Measures of Variation (“Spread”)
Another important characteristic of quantitative data is how
much the data varies, or is spread out.
The 2 most common methods of measuring spread are:
1. Range
2. Standard deviation and variance
Range
Range
• The difference between the maximum and minimum
data entries in the set.
• The data must be quantitative.
• Range = (Max. data entry) – (Min. data entry)
Example: Finding the Range
The wait time to see a bank teller is studied at 2 banks.
Bank A has multiple lines, one for each teller.
Bank B has a single wait line for the 1st available teller.
5 wait times (in minutes) are sampled from each bank:
Bank A: 5.2 6.2 7.5 8.4 9.2
Bank B: 6.6 6.8 7.5 7.7 7.9
Find the mean, median, and range for each bank.
Solution: Finding the Range
• Bank A: Range = ?
• Bank B: Range = ?
• Note: The range is easy to compute, but only uses 2
values. Do the following 2 sets vary the same?
   - Set A: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10
   - Set B: 1, 10, 10, 10, 10, 10, 10, 10, 10, 10
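One way to check the bank example, and to see why the range alone can mislead, is the sketch below. The helper `data_range` is an illustrative name; results are rounded for display because the wait times are decimal values.

```python
import statistics


def data_range(data):
    """Range = maximum entry minus minimum entry."""
    return max(data) - min(data)


bank_a = [5.2, 6.2, 7.5, 8.4, 9.2]
bank_b = [6.6, 6.8, 7.5, 7.7, 7.9]

for name, times in (("Bank A", bank_a), ("Bank B", bank_b)):
    print(
        name,
        "mean:", round(statistics.mean(times), 2),
        "median:", statistics.median(times),
        "range:", round(data_range(times), 2),
    )

# Two sets with the same range but clearly different variation.
set_a = list(range(1, 11))
set_b = [1] + [10] * 9
print(data_range(set_a), data_range(set_b))
```

Both banks share the same mean and median, so only a measure of variation distinguishes them. Sets A and B share the same range even though most of Set B's values are identical, which is the limitation the note points to.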
Standard Deviation and Variance
Measures the typical amount data deviates from the
mean.
Sample Variance, $s^2$:
• $s^2 = \dfrac{\sum (x - \bar{x})^2}{n - 1}$
Sample Standard Deviation, $s$:
• $s = \sqrt{s^2} = \sqrt{\dfrac{\sum (x - \bar{x})^2}{n - 1}}$
Finding Sample Variance & Standard Deviation
1. Find the mean of the sample data set: $\bar{x} = \dfrac{\sum x}{n}$
2. Find the deviation of each entry: $x - \bar{x}$
3. Square each deviation: $(x - \bar{x})^2$
4. Add to get the sum of the deviations squared: $\sum (x - \bar{x})^2$
5. Divide by n – 1 to get the sample variance: $s^2 = \dfrac{\sum (x - \bar{x})^2}{n - 1}$
6. Find the square root to get the sample standard deviation: $s = \sqrt{\dfrac{\sum (x - \bar{x})^2}{n - 1}}$
Find the Standard Deviation and Variance
for Bank A (multi-line)
Wait time, x (in min)    Deviation: $x - \bar{x}$    Squares: $(x - \bar{x})^2$
5.2                      5.2 – 7.3 = –2.1            (–2.1)² = 4.41
6.2                      6.2 – 7.3 =                 ( )² =
7.5                      7.5 – 7.3 =                 ( )² =
8.4                      8.4 – 7.3 =                 ( )² =
9.2                      9.2 – 7.3 =                 ( )² =
                                                     $\sum (x - \bar{x})^2 =$
$\sum x = 36.5$
$\bar{x} = \dfrac{\sum x}{n} = \dfrac{36.5}{5} = 7.3$ min
$s^2 = \dfrac{\sum (x - \bar{x})^2}{n - 1} =$
$s = \sqrt{s^2} =$
• Round to one more decimal than the data.
• Don’t round until the end.
• Include the appropriate units.
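The six steps can be traced in code for the Bank A wait times. This is a sketch under the sample formulas given earlier (dividing by n – 1); the function name `sample_variance_and_sd` is illustrative.

```python
import math


def sample_variance_and_sd(data):
    """Follow the six steps: mean, deviations, squares, sum, variance, sd."""
    n = len(data)
    mean = sum(data) / n                      # step 1
    deviations = [x - mean for x in data]     # step 2
    squares = [d ** 2 for d in deviations]    # step 3
    sum_of_squares = sum(squares)             # step 4
    variance = sum_of_squares / (n - 1)       # step 5 (sample: n - 1)
    sd = math.sqrt(variance)                  # step 6
    return mean, variance, sd


bank_a = [5.2, 6.2, 7.5, 8.4, 9.2]  # wait times in minutes
mean_a, variance_a, sd_a = sample_variance_and_sd(bank_a)

# Round only at the end, to one more decimal place than the data.
print(f"mean = {mean_a:.2f} min")
print(f"s^2  = {variance_a:.2f} min^2")
print(f"s    = {sd_a:.2f} min")
```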
Find the Standard Deviation and Variance
for Bank B (1 wait line)
Wait time, x (in min)    Deviation: $x - \bar{x}$    Squares: $(x - \bar{x})^2$
6.6
6.8
7.5
7.7
7.9
                                                     $\sum (x - \bar{x})^2 =$
$\sum x = 36.5$
$\bar{x} = \dfrac{\sum x}{n} = \dfrac{36.5}{5} = 7.3$ min
$s^2 = \dfrac{\sum (x - \bar{x})^2}{n - 1} =$
$s = \sqrt{s^2} =$
• Round to one more decimal than the data.
• Don’t round until the end.
• Include the appropriate units.
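A hand-completed table for Bank B can be checked against Python's standard `statistics` module, whose `variance` and `stdev` functions use the sample (n – 1) formulas. The variable names are illustrative.

```python
import statistics

bank_b = [6.6, 6.8, 7.5, 7.7, 7.9]  # wait times in minutes

mean_b = statistics.mean(bank_b)
variance_b = statistics.variance(bank_b)  # sample variance (divides by n - 1)
sd_b = statistics.stdev(bank_b)           # sample standard deviation

print(f"mean = {mean_b:.2f} min")
print(f"s^2  = {variance_b:.3f} min^2")
print(f"s    = {sd_b:.2f} min")
```

Bank B has the same mean wait time as Bank A but a much smaller standard deviation, so its wait times are more consistent.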
Sample versus Population
Standard Deviation and Variance
                      Sample (Statistics)    Population (Parameters)
Mean                  $\bar{x}$              $\mu$
Standard Deviation    $s$                    $\sigma$
Variance              $s^2$                  $\sigma^2$
Sample versus Population
Standard Deviation
Sample Standard Deviation
• $s = \sqrt{s^2} = \sqrt{\dfrac{\sum (x - \bar{x})^2}{n - 1}}$
Population Standard Deviation
• $\sigma = \sqrt{\sigma^2} = \sqrt{\dfrac{\sum (x - \mu)^2}{N}}$
Note: Unlike $\bar{x}$ and µ, the formulas for s and σ
are not mathematically the same: s divides by n – 1, while σ divides by N.
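To see the effect of the different denominators, the sketch below computes both versions for the same five values. Treating the Bank A wait times as an entire population is an assumption made purely for illustration; in the example they are a sample.

```python
import statistics

data = [5.2, 6.2, 7.5, 8.4, 9.2]

s = statistics.stdev(data)       # sample formula: divides by n - 1
sigma = statistics.pstdev(data)  # population formula: divides by N

print(f"s     = {s:.4f}")
print(f"sigma = {sigma:.4f}")
```

For the same data, s is always at least as large as σ, because dividing by n – 1 gives a larger quotient than dividing by N.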
Standard Deviation: Key Points
• The standard deviation is a measure of variation of all
values from the mean. The larger s is, the more the
data varies.
• $s \geq 0$ (When would s = 0?)
• The value of the standard deviation s can increase
dramatically with the inclusion of one or more
outliers (data values far away from all others).
• The units of the standard deviation s are the same as
the units of the original data values. (The variance
has units².)
Interpreting Standard Deviation
• Standard deviation is a measure of the typical amount
an entry deviates from the mean.
• The more the entries are spread out, the greater the
standard deviation.
Solution: Using Technology to Find the
Standard Deviation
[Figure: technology output with the Sample Mean and Sample Standard Deviation labeled]
Using Technology
The gas mileage of 2 cars is sampled over various
conditions:
Car A: 21.1 21.2 20.8 19.8 23.8 (mpg)
Car B: 25.2 19.1 18.0 24.4 20.3 (mpg)
Which car do you think gets “better” mpg?
Use a calculator to find the mean and standard deviation
for each to justify your choice.
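In place of a calculator, the same summary can be produced with Python's standard `statistics` module. This is one possible sketch; the dictionary layout is an illustrative choice.

```python
import statistics

mileage = {
    "Car A": [21.1, 21.2, 20.8, 19.8, 23.8],  # mpg
    "Car B": [25.2, 19.1, 18.0, 24.4, 20.3],  # mpg
}

# (mean, sample standard deviation) for each car.
summary = {
    car: (statistics.mean(mpg), statistics.stdev(mpg))
    for car, mpg in mileage.items()
}

for car, (mean, sd) in summary.items():
    print(f"{car}: mean = {mean:.2f} mpg, s = {sd:.2f} mpg")
```

The two means are nearly equal, so the standard deviations carry most of the information: the car with the smaller s has the more consistent mileage.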
Standard Deviation and “Spread”
How does “s” show how much the data varies?
Three methods:
1. Range Rule of Thumb
2. Chebyshev’s Theorem
3. The Empirical Rule
The Range Rule of Thumb
Range Rule: For most data sets, the majority of the
data lies within 2 standard deviations of the mean.
Recall: Range = High – Low
Estimate: Range ≈ 4s
Alternatively, if the range is known, you can use the range
rule to estimate the standard deviation:
$s \approx \dfrac{\text{Range}}{4}$
Using the Range Rule of Thumb
A sample of women’s heights has a mean of 64
inches and a standard deviation of 2.5 inches.
Using the range rule, “most” women fall within
what heights?
What would be an “unusual” height?
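A minimal sketch of the range rule applied to the heights example. The function names `usual_range` and `estimate_sd` are hypothetical.

```python
def usual_range(mean, sd):
    """Interval within 2 standard deviations of the mean."""
    return mean - 2 * sd, mean + 2 * sd


def estimate_sd(high, low):
    """Range rule estimate: s is roughly Range / 4."""
    return (high - low) / 4


low, high = usual_range(mean=64, sd=2.5)
print(f"Most heights fall between {low} and {high} inches")

# Going the other way: recover s from the interval's endpoints.
print(estimate_sd(high, low))
```

Under this rule, a height below the lower value or above the upper value would be considered unusual.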
Using the Range Rule of Thumb
The sample of Exam Scores used in the class
handout had a mean of 73.6. Which of the
following is most likely the standard deviation of
the sample?
s = 3.6 s = 12.8 s = 74.5
Use the range rule to help justify your choice.
Chebyshev’s Theorem
Chebyshev’s Theorem
For data with any distribution, the proportion (or
fraction) of any set of data lying within K standard
deviations of the mean is always at least $1 - 1/K^2$, where
K is any positive number greater than 1.
• For K = 2, at least 3/4 (or 75%) of all values lie
within 2 standard deviations of the mean.
• For K = 3, at least 8/9 (or about 89%) of all values lie
within 3 standard deviations of the mean.
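The bound is simple to evaluate for any K. A short sketch, with `chebyshev_bound` as an illustrative function name:

```python
def chebyshev_bound(k):
    """Minimum proportion of data within k standard deviations of the mean."""
    if k <= 1:
        raise ValueError("Chebyshev's Theorem requires k > 1")
    return 1 - 1 / k ** 2


for k in (2, 3):
    print(f"K = {k}: at least {chebyshev_bound(k):.1%}")
```

The bound is a guaranteed minimum for any distribution, so actual data sets often have a larger share of values inside the interval.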
Using Chebyshev’s Theorem
A sample of salaries at an elementary school has a
mean of $32,000 and a standard deviation of $3000.
Use Chebyshev’s Theorem to describe how the salaries
are spread out.
Would a salary of $28,000 be “unusual?”
Would a salary of $45,000 be “unusual”?
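One way to approach the salary example is to compute the intervals for K = 2 and K = 3 and to measure how many standard deviations each salary lies from the mean. Treating a value more than 2 standard deviations from the mean as "unusual" is an assumption borrowed from the range rule of thumb; the names below are illustrative.

```python
mean, sd = 32_000, 3_000

# Interval within K standard deviations, for K = 2 and K = 3.
intervals = {k: (mean - k * sd, mean + k * sd) for k in (2, 3)}

for k, (low, high) in intervals.items():
    bound = 1 - 1 / k ** 2  # Chebyshev's minimum proportion
    print(f"At least {bound:.0%} of salaries lie between ${low:,} and ${high:,}")


def sd_from_mean(x):
    """Signed number of standard deviations between x and the mean."""
    return (x - mean) / sd


for salary in (28_000, 45_000):
    print(f"${salary:,}: {sd_from_mean(salary):+.2f} standard deviations")
```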
The Empirical Rule
Empirical (68-95-99.7) Rule
For data sets having a symmetric, bell-shaped distribution:
• About 68% of all values fall within 1 standard
deviation of the mean
• About 95% of all values fall within 2 standard
deviations of the mean
• About 99.7% of all values fall within 3 standard
deviations of the mean
Example: Using the Empirical Rule
A sample of IQs has a symmetric distribution with a mean
of 100 and a standard deviation of 15.
1. Sketch the distribution.
2. 68% of people have an IQ between what 2 values?
3. What percent of people have an IQ between 70 and 130?
4. What percent of people have an IQ between 100 and 115?
5. What percent of people have an IQ above 145?
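The numeric parts of this example can be sketched by combining the 68-95-99.7 percentages with the symmetry of the distribution. The variable names are illustrative, and the percentages are the approximate values stated by the rule, not exact probabilities.

```python
mean, sd = 100, 15

# Approximate share of values within k standard deviations (Empirical Rule).
within = {1: 0.68, 2: 0.95, 3: 0.997}

# Interval covered by each k: mean - k*sd to mean + k*sd.
intervals = {k: (mean - k * sd, mean + k * sd) for k in within}

# Question 2: the middle 68% lies within 1 standard deviation.
print("68% between", intervals[1])

# Question 3: 70 to 130 is exactly the 2-standard-deviation interval.
print("70 to 130:", within[2])

# Question 4: 100 to 115 is the upper half of the 1-sd interval (symmetry).
pct_100_to_115 = within[1] / 2
print("100 to 115:", pct_100_to_115)

# Question 5: 145 is 3 sd above the mean; half of what lies outside 3 sd.
pct_above_145 = (1 - within[3]) / 2
print("above 145:", round(pct_above_145, 4))
```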
66

More Related Content

PPTX
2.2 Histograms
PPTX
Descriptive statistics
PPTX
collectionandrepresentationofdata1-200904192336.pptx
PPTX
2.1 frequency distributions, histograms, and related topics
PDF
Lesson2 - chapter 2 Measures of Tendency.pptx.pdf
PDF
Lesson2 - chapter two Measures of Tendency.pptx.pdf
PDF
Lessontwo - Measures of Tendency.pptx.pdf
PPTX
Classes
2.2 Histograms
Descriptive statistics
collectionandrepresentationofdata1-200904192336.pptx
2.1 frequency distributions, histograms, and related topics
Lesson2 - chapter 2 Measures of Tendency.pptx.pdf
Lesson2 - chapter two Measures of Tendency.pptx.pdf
Lessontwo - Measures of Tendency.pptx.pdf
Classes

Similar to Ch.2 ppt - descriptive stat - Larson-fabers.ppt (20)

PPTX
UNIT II DESCRIPTIVE STATISTICS TABLE GRAPH.pptx
PPTX
3 Frequency Distribution biostatistics wildlife
PPTX
collectionandrepresentationofdata1-200904192336 (1).pptx
PPTX
Data presentation.pptx
PPTX
3.1 Measures of center
PPTX
Frequency Distributions
PPTX
Tabular and Graphical Representation of Data
PPTX
collectionandrepresentationofdata1-200904192336.pptx
PDF
Chapter 1 - Displaying Descriptive Statistics.pdf
PPTX
2.3 Graphs that enlighten and graphs that deceive
PPTX
Chapter 2 Descriptive statistics for pedatric.pptx
PPTX
Data presentation Lecture
PPTX
Frequency Distribution
PPTX
LECTURE 3 - inferential statistics bmaths
PPTX
Frequency-Distribution..m.,m.........pptx
PPT
20- Tabular & Graphical Presentation of data(UG2017-18).ppt
PPT
20- Tabular & Graphical Presentation of data(UG2017-18).ppt
PPT
graphic representations in statistics
PPTX
day two.pptx
PPTX
2. AAdata presentation edited edited tutor srudents(1).pptx
UNIT II DESCRIPTIVE STATISTICS TABLE GRAPH.pptx
3 Frequency Distribution biostatistics wildlife
collectionandrepresentationofdata1-200904192336 (1).pptx
Data presentation.pptx
3.1 Measures of center
Frequency Distributions
Tabular and Graphical Representation of Data
collectionandrepresentationofdata1-200904192336.pptx
Chapter 1 - Displaying Descriptive Statistics.pdf
2.3 Graphs that enlighten and graphs that deceive
Chapter 2 Descriptive statistics for pedatric.pptx
Data presentation Lecture
Frequency Distribution
LECTURE 3 - inferential statistics bmaths
Frequency-Distribution..m.,m.........pptx
20- Tabular & Graphical Presentation of data(UG2017-18).ppt
20- Tabular & Graphical Presentation of data(UG2017-18).ppt
graphic representations in statistics
day two.pptx
2. AAdata presentation edited edited tutor srudents(1).pptx
Ad

More from simonkahinga (20)

PPTX
Social_Mediaretd_Parents_Education_Updated_With_Stats.pptx
PPTX
Social_Medias_Parents_Education_PPT.pptx
PPTX
Hasel__F__Ellen_G._White_and_Creationism(1).pptx
PPT
EGWHermeneuticsffgggggggggggggggggggggggggggggggg.ppt
PPT
EGWRelationshiptoBiblegytytytytytyuhtrtty.ppt
PPTX
Hasel__F__Ellen_G._White_and_Creationism(1).pptx
PPTX
The-Great-Controversy-Theme-the Life-blood-of-Adventist-Theology (1).pptx
PPTX
22 - Prophets1234567890123456677775554444.pptx
PPT
THE 2520 Wrong prophecy and teaching.ppt
PPTX
UTLZ ftgggvggggfghhhfddhbhhvfhhh L.1.pptx
PPT
MY DRESS my dress my choice MY CHOICE.ppt
PDF
GEOGRAPHY REVISION NOTES PAPER 223345.pdf
PDF
Philosophies Issues adn Their Relevance for Education.pdf
PPTX
frequencydistribution-161207172034666667767587 (11)i.pptx
PPT
Devotion 2 - Faith - The Power of Prayer.ppt
PPT
1-151016093422122333455456-lva1-app6891.ppt
PPT
FREQUENCY_DISTRIBUTIONS gfrtfyftyfyyfy.ppt
PDF
VII_GEO_L06_M01_NATURAI_VEGETATION_AND_WILDLIFE_PPT.pdf
PDF
VII_GEO_L06_M01_NATURAL_VEGETATION_AND_WILDLIFE_PPT.pdf
PPT
14. NATURAL VEGETATION in kenya and the world.ppt
Social_Mediaretd_Parents_Education_Updated_With_Stats.pptx
Social_Medias_Parents_Education_PPT.pptx
Hasel__F__Ellen_G._White_and_Creationism(1).pptx
EGWHermeneuticsffgggggggggggggggggggggggggggggggg.ppt
EGWRelationshiptoBiblegytytytytytyuhtrtty.ppt
Hasel__F__Ellen_G._White_and_Creationism(1).pptx
The-Great-Controversy-Theme-the Life-blood-of-Adventist-Theology (1).pptx
22 - Prophets1234567890123456677775554444.pptx
THE 2520 Wrong prophecy and teaching.ppt
UTLZ ftgggvggggfghhhfddhbhhvfhhh L.1.pptx
MY DRESS my dress my choice MY CHOICE.ppt
GEOGRAPHY REVISION NOTES PAPER 223345.pdf
Philosophies Issues adn Their Relevance for Education.pdf
frequencydistribution-161207172034666667767587 (11)i.pptx
Devotion 2 - Faith - The Power of Prayer.ppt
1-151016093422122333455456-lva1-app6891.ppt
FREQUENCY_DISTRIBUTIONS gfrtfyftyfyyfy.ppt
VII_GEO_L06_M01_NATURAI_VEGETATION_AND_WILDLIFE_PPT.pdf
VII_GEO_L06_M01_NATURAL_VEGETATION_AND_WILDLIFE_PPT.pdf
14. NATURAL VEGETATION in kenya and the world.ppt
Ad

Recently uploaded (20)

PDF
Business Ethics Teaching Materials for college
PDF
3rd Neelam Sanjeevareddy Memorial Lecture.pdf
PDF
Chapter 2 Heredity, Prenatal Development, and Birth.pdf
PDF
O5-L3 Freight Transport Ops (International) V1.pdf
PPTX
The Healthy Child – Unit II | Child Health Nursing I | B.Sc Nursing 5th Semester
PDF
Saundersa Comprehensive Review for the NCLEX-RN Examination.pdf
PDF
Anesthesia in Laparoscopic Surgery in India
PPTX
school management -TNTEU- B.Ed., Semester II Unit 1.pptx
PPTX
Introduction to Child Health Nursing – Unit I | Child Health Nursing I | B.Sc...
PPTX
Pharmacology of Heart Failure /Pharmacotherapy of CHF
PDF
Physiotherapy_for_Respiratory_and_Cardiac_Problems WEBBER.pdf
PPTX
Microbial diseases, their pathogenesis and prophylaxis
PPTX
master seminar digital applications in india
PDF
The Lost Whites of Pakistan by Jahanzaib Mughal.pdf
PPTX
Week 4 Term 3 Study Techniques revisited.pptx
PDF
Abdominal Access Techniques with Prof. Dr. R K Mishra
PDF
102 student loan defaulters named and shamed – Is someone you know on the list?
PDF
Pre independence Education in Inndia.pdf
PDF
STATICS OF THE RIGID BODIES Hibbelers.pdf
PDF
Basic Mud Logging Guide for educational purpose
Business Ethics Teaching Materials for college
3rd Neelam Sanjeevareddy Memorial Lecture.pdf
Chapter 2 Heredity, Prenatal Development, and Birth.pdf
O5-L3 Freight Transport Ops (International) V1.pdf
The Healthy Child – Unit II | Child Health Nursing I | B.Sc Nursing 5th Semester
Saundersa Comprehensive Review for the NCLEX-RN Examination.pdf
Anesthesia in Laparoscopic Surgery in India
school management -TNTEU- B.Ed., Semester II Unit 1.pptx
Introduction to Child Health Nursing – Unit I | Child Health Nursing I | B.Sc...
Pharmacology of Heart Failure /Pharmacotherapy of CHF
Physiotherapy_for_Respiratory_and_Cardiac_Problems WEBBER.pdf
Microbial diseases, their pathogenesis and prophylaxis
master seminar digital applications in india
The Lost Whites of Pakistan by Jahanzaib Mughal.pdf
Week 4 Term 3 Study Techniques revisited.pptx
Abdominal Access Techniques with Prof. Dr. R K Mishra
102 student loan defaulters named and shamed – Is someone you know on the list?
Pre independence Education in Inndia.pdf
STATICS OF THE RIGID BODIES Hibbelers.pdf
Basic Mud Logging Guide for educational purpose

Ch.2 ppt - descriptive stat - Larson-fabers.ppt

  • 2. Chapter Outline • 2.1 Frequency Distributions and Their Graphs • 2.2 More Graphs and Displays • 2.3 Measures of Central Tendency • 2.4 Measures of Variation • 2.5 Measures of Position 2 Larson/Farber 4th ed.
  • 3. Overview Descriptive Statistics • Describes the important characteristics of a set of data. • Organize, present, and summarize data: 1. Graphically 2. Numerically Larson/Farber 4th ed. 3
  • 4. Important Characteristics of Quantitative Data “Shape, Center, and Spread” • Center: A representative or average value that indicates where the middle of the data set is located. • Variation: A measure of the amount that the values vary among themselves. • Distribution: The nature or shape of the distribution of data (such as bell-shaped, uniform, or skewed).
  • 5. Overview • 2.1 Frequency Distributions and Their Graphs • 2.2 More Graphs and Displays • 2.3 Measures of Central Tendency • 2.4 Measures of Variation • 2.5 Measures of Position 5 Larson/Farber 4th ed.
  • 6. Section 2.1 Frequency Distributions and Their Graphs 6 Larson/Farber 4th ed.
  • 7. Frequency Distributions Frequency Distribution • A table that organizes data values into classes or intervals along with number of values that fall in each class (frequency, f ). 1. Ungrouped Frequency Distribution – for data sets with few different values. Each value is in its own class. 2. Grouped Frequency Distribution: for data sets with many different values, which are grouped together in the classes.
  • 8. Grouped and Ungrouped Frequency Distributions Courses Taken Frequency, f 1 25 2 38 3 217 4 1462 5 932 6 15 Ungrouped Age of Voters Frequency, f 18-30 202 31-42 508 43-54 620 55-66 413 67-78 158 78-90 32 Grouped
  • 9. Ungrouped Frequency Distributions Number of Peas in a Pea Pod Sample Size: 50 5 5 4 6 4 3 7 6 3 5 6 5 4 5 5 6 2 3 5 5 5 5 7 4 3 4 5 4 5 6 5 1 6 2 6 6 6 6 6 4 4 5 4 5 3 5 5 7 6 5 Peas per pod Freq, f Peas per pod Freq, f 1 1 2 2 3 5 4 9 5 18 6 12 7 3
  • 10. Graphs of Frequency Distributions: Frequency Histograms Frequency Histogram • A bar graph that represents the frequency distribution. • The horizontal scale is quantitative and measures the data values. • The vertical scale measures the frequencies of the classes. • Consecutive bars must touch. Larson/Farber 4th ed. 10 data values frequency
  • 11. Frequency Histogram Ex. Peas per Pod Peas per pod Freq, f 1 1 2 2 3 5 4 9 5 18 6 12 7 3 Number of Peas in a Pod 0 5 10 15 20 1 2 3 4 5 6 7 Number of Peas Frequency, f
  • 12. Relative Frequency Distributions and Relative Frequency Histograms Relative Frequency Distribution • Shows the portion or percentage of the data that falls in a particular class. 12 n f   size Sample frequency class frequency relative • Relative Frequency Histogram • Has the same shape and the same horizontal scale as the corresponding frequency histogram. • The vertical scale measures the relative frequencies, not frequencies.
  • 13. Relative Frequency Histogram Has the same shape and horizontal scale as a histogram, but the vertical scale is marked with relative frequencies.
  • 14. Grouped Frequency Distributions Grouped Frequency Distribution • For data sets with many different values. • Groups data into 5-20 classes of equal width. Exam Scores Freq, f Exam Scores Freq, f 30-39 40-49 50-59 60-69 70-79 80-89 90-99 Exam Scores Freq, f 30-39 1 40-49 0 50-59 4 60-69 9 70-79 13 80-89 10 90-99 3
  • 15. Grouped Frequency Distribution Terms • Lower class limits: are the smallest numbers that can actually belong to different classes • Upper class limits: are the largest numbers that can actually belong to different classes • Class width: is the difference between two consecutive lower class limits 15
  • 16. Labeling Grouped Frequency Distributions • Class midpoints: the value halfway between LCL and UCL: • Class boundaries: the value halfway between an UCL and the next LCL (Lower class limit) (Upper class limit) 2  (Upper class limit) (next Lower class limit) 2 
  • 17. Constructing a Grouped Frequency Distribution 17 1. Determine the range of the data:  Range = highest data value – lowest data value  May round up to the next convenient number 2. Decide on the number of classes.  Usually between 5 and 20; otherwise, it may be difficult to detect any patterns. 3. Find the class width:  .  Round up to the next convenient number. range class width = number of classes
  • 18. Constructing a Frequency Distribution 4. Find the class limits.  Choose the first LCL: use the minimum data entry or something smaller that is convenient.  Find the remaining LCLs: add the class width to the lower limit of the preceding class.  Find the UCLs: Remember that classes must cover all data values and cannot overlap. 5. Find the frequencies for each class. (You may add a tally column first and make a tally mark for each data value in the class). Larson/Farber 4th ed. 18
  • 19. “Shape” of Distributions Symmetric • Data is symmetric if the left half of its histogram is roughly a mirror image of its right half. Skewed • Data is skewed if it is not symmetric and if it extends more to one side than the other. Uniform • Data is uniform if it is equally distributed (on a histogram, all the bars are the same height or approximately the same height).
  • 20. The Shape of Distributions Symmetric Skewed Right Skewed left Uniform
  • 21. Outliers • Unusual data values as compared to the rest of the set. They may be distinguished by gaps in a histogram. Outliers
  • 22. Section 2.2 More Graphs and Displays Larson/Farber 4th ed. 22
  • 23. Other Graphs Besides Histograms, there are other methods of graphing quantitative data: • Stem and Leaf Plots • Dot Plots • Time Series
  • 24. Stem and Leaf Plots Represents data by separating each data value into two parts: the stem (such as the leftmost digit) and the leaf (such as the rightmost digit) Larson/Farber 4th ed. 24
  • 25. Constructing Stem and Leaf Plots • Split each data value at the same place value to form the stem and a leaf. (Want 5-20 stems). • Arrange all possible stems vertically so there are no missing stems. • Write each leaf to the right of its stem, in order. • Create a key to recreate the data. • Variations of stem plots: 1. Split stems 2. Back to back stem plots. Larson/Farber 4th ed. 25
  • 26. Constructing a Stem-and-Leaf Plot Larson/Farber 4th ed. 26 Include a key to identify the values of the data.
  • 27. Dot Plots Dot plot • Consists of a graph in which each data value is plotted as a point along a scale of values Figure 2-5
  • 28. Time Series (Paired data) Time Series • Data set is composed of quantitative entries taken at regular intervals over a period of time.  e.g., The amount of precipitation measured each day for one month. • Use a time series chart to graph. Larson/Farber 4th ed. 28 time Quantitative data
  • 29. Time-Series Graph Number of Screens at Drive-In Movies Theaters Figure 2-8 Ex. www.eia.doe.gov/oil_gas/petroleum/
  • 30. Graphing Qualitative Data Sets Pie Chart • A circle is divided into sectors that represent categories. Larson/Farber 4th ed. 30 Pareto Chart • A vertical bar graph in which the height of each bar represents frequency or relative frequency. Categories Frequency
  • 31. Constructing a Pie Chart • Find the total sample size. • Convert the frequencies to relative frequencies (percent). 31 Marital Status Frequency,f (in millions) Relative frequency (%) Never Married 55.3 Married 127.7 Widowed 13.9 Divorced 22.8 Total: 219.7 55.3 0.25 or 25% 219.7  127.7 219.7  13.9 219.7  22.8 219.7 
  • 32. Constructing Pareto Charts • Create a bar for each category, where the height of the bar can represent frequency or relative frequency. • The bars are often positioned in order of decreasing height, with the tallest bar positioned at the left. Figure 2-6
  • 33. Section 2.3 Measures of Central Tendency Larson/Farber 4th ed. 33
  • 34. Measures of Central Tendency Measure of central tendency • A value that represents a typical, or central, entry of a data set. • Most common measures of central tendency:  Mean  Median  Mode Larson/Farber 4th ed. 34
  • 35. Measure of Central Tendency: Mean Mean : The sum of all the data entries divided by the number of entries. • Population mean: • Sample mean: • Round-off rule for measures of center: Carry one more decimal place than is in the original values. Do not round until the last step. 35 x N    x x n  
  • 36. Measure of Central Tendency: Median Median • The value that lies in the middle of the data when the data set is arranged in order from lowest to highest. . • Measures the center of an ordered data set by dividing it into two equal parts. • A sample mean is often referred to as x. • If the data set has an  odd number of entries: median is the middle data entry.  even number of entries: median is the mean of the two middle data entries. Larson/Farber 4th ed. 36 ~
  • 37. Computing the Median. If the data set has an: • odd number of entries, the median is the middle data entry. For 2 5 6 11 13, the median is the exact middle value: $\tilde{x} = 6$. • even number of entries, the median is the mean of the two middle data entries. For 2 5 6 7 11 13, the median is the mean of the two middle numbers: $\tilde{x} = \frac{6 + 7}{2} = 6.5$.
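The odd/even rule can be written as a short function. This is a sketch; the function name `median` is an assumption, and Python's standard `statistics.median` applies the same rule.

```python
def median(values):
    """Return the median using the odd/even rule from the slide."""
    ordered = sorted(values)          # the data must be in order first
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]           # odd: the middle entry
    # even: the mean of the two middle entries
    return (ordered[mid - 1] + ordered[mid]) / 2

print(median([2, 5, 6, 11, 13]))      # odd number of entries
print(median([2, 5, 6, 7, 11, 13]))   # even number of entries
```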
  • 38. Measure of Central Tendency: Mode. Mode • The data entry that occurs with the greatest frequency. • If no entry is repeated, the data set has no mode. • If two entries occur with the same greatest frequency, each entry is a mode (bimodal). Examples: a) 5.40 1.10 0.42 0.73 0.48 1.10: mode is 1.10. b) 27 27 27 55 55 55 88 88 99: bimodal, 27 and 55. c) 1 2 3 6 7 8 9 10: no mode.
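The three cases (one mode, bimodal, no mode) can be checked with a small helper. This is a sketch; the function name `modes` and the choice to return an empty list for "no mode" are assumptions.

```python
from collections import Counter

def modes(values):
    """Return every entry tied for the greatest frequency, or [] if nothing repeats."""
    counts = Counter(values)
    highest = max(counts.values())
    if highest == 1:
        return []                     # no entry is repeated: no mode
    return sorted(v for v, c in counts.items() if c == highest)

print(modes([5.40, 1.10, 0.42, 0.73, 0.48, 1.10]))
print(modes([27, 27, 27, 55, 55, 55, 88, 88, 99]))
print(modes([1, 2, 3, 6, 7, 8, 9, 10]))
```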
  • 39. Comparing the Mean, Median, and Mode • All three measures describe an “average”. Choose the one that best represents a “typical” value in the set. • Mean: The most familiar average. A reliable measure because it takes into account every entry of a data set. May be greatly affected by outliers or skew. • Median: A common average. Not as affected by skew or outliers. • Mode: May be used if one value repeats far more often than any other.
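A small illustration of why the median resists outliers while the mean does not. The numbers are hypothetical and not taken from the slides.

```python
import statistics

# Hypothetical data set (not from the slides).
typical = [30, 32, 35, 36, 38]
with_outlier = typical + [250]        # add one extreme value

mean_before = statistics.mean(typical)
median_before = statistics.median(typical)
mean_after = statistics.mean(with_outlier)
median_after = statistics.median(with_outlier)

# The mean moves a long way; the median barely changes.
print(mean_before, mean_after)
print(median_before, median_after)
```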
  • 40. Choosing the “Best Average” • The shape of your data and the existence of any outliers may help you choose the best average:
  • 41. Section 2.4 Measures of Variation
  • 42. Measures of Variation (“Spread”). Another important characteristic of quantitative data is how much the data varies, or is spread out. The two most common methods of measuring spread are: 1. Range 2. Standard deviation and variance
  • 43. Range Range • The difference between the maximum and minimum data entries in the set. • The data must be quantitative. • Range = (Max. data entry) – (Min. data entry)
  • 44. Example: Finding the Range. The wait time to see a bank teller is studied at two banks. Bank A has multiple lines, one for each teller. Bank B has a single wait line for the first available teller. Five wait times (in minutes) are sampled from each bank: Bank A: 5.2 6.2 7.5 8.4 9.2 Bank B: 6.6 6.8 7.5 7.7 7.9 Find the mean, median, and range for each bank.
  • 45. Solution: Finding the Range • Bank A: Range = ? • Bank B: Range = ? • Note: The range is easy to compute, but it uses only 2 values. Do the following 2 sets vary the same? Set A: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10. Set B: 1, 10, 10, 10, 10, 10, 10, 10, 10, 10.
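One way to check the bank exercise and the Set A / Set B question is sketched below. The helper name `data_range` is an assumption; the standard deviation comparison at the end anticipates the next slides and uses Python's `statistics.stdev`.

```python
import statistics

def data_range(values):
    """Range = (max data entry) - (min data entry)."""
    return max(values) - min(values)

bank_a = [5.2, 6.2, 7.5, 8.4, 9.2]
bank_b = [6.6, 6.8, 7.5, 7.7, 7.9]

for name, times in (("Bank A", bank_a), ("Bank B", bank_b)):
    print(name,
          "mean:", round(statistics.mean(times), 2),
          "median:", statistics.median(times),
          "range:", round(data_range(times), 2))

# The two sets from the note share a range but are spread very differently.
set_a = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
set_b = [1, 10, 10, 10, 10, 10, 10, 10, 10, 10]
print(data_range(set_a), data_range(set_b))
print(round(statistics.stdev(set_a), 2), round(statistics.stdev(set_b), 2))
```

Both banks have the same mean and median, so only a measure of spread tells them apart.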
  • 46. Standard Deviation and Variance. Measures the typical amount data deviates from the mean. • Sample variance, $s^2$: $s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$ • Sample standard deviation, $s$: $s = \sqrt{s^2} = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$
  • 47. Finding Sample Variance & Standard Deviation 1. Find the mean of the sample data set: $\bar{x} = \frac{\sum x}{n}$ 2. Find the deviation of each entry: $x - \bar{x}$ 3. Square each deviation: $(x - \bar{x})^2$ 4. Add to get the sum of the deviations squared: $\sum (x - \bar{x})^2$ 5. Divide by n – 1 to get the sample variance: $s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$ 6. Find the square root to get the sample standard deviation: $s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$
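The six steps translate almost line for line into code. The sample below is hypothetical (chosen so the arithmetic is easy to follow by hand), and the variable names are assumptions.

```python
import math

# Hypothetical sample, not from the slides.
sample = [2, 4, 4, 6, 9]
n = len(sample)

mean = sum(sample) / n                          # 1. mean of the sample
deviations = [x - mean for x in sample]         # 2. deviation of each entry
squares = [d ** 2 for d in deviations]          # 3. square each deviation
sum_of_squares = sum(squares)                   # 4. sum of squared deviations
variance = sum_of_squares / (n - 1)             # 5. divide by n - 1
std_dev = math.sqrt(variance)                   # 6. square root

print(mean, sum_of_squares, variance, round(std_dev, 1))
```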
  • 48. Find the Standard Deviation and Variance for Bank A (multi-line). Columns: Wait time, x (in min) | Deviation: x – x̄ | Squares: (x – x̄)². Rows: 5.2 | 5.2 – 7.3 = –2.1 | (–2.1)² = 4.41; 6.2 | 6.2 – 7.3 = ___ | ( )² = ___; 7.5 | 7.5 – 7.3 = ___ | ( )² = ___; 8.4 | 8.4 – 7.3 = ___ | ( )² = ___; 9.2 | 9.2 – 7.3 = ___ | ( )² = ___. $\sum x = 36.5$; $\bar{x} = \frac{\sum x}{n} = \frac{36.5}{5} = 7.3$ min; $\sum (x - \bar{x})^2 =$ ___; $s^2 = \frac{\sum (x - \bar{x})^2}{n - 1} =$ ___; $s = \sqrt{s^2} =$ ___. • Round to one more decimal than the data. • Don’t round until the end. • Include the appropriate units.
  • 49. Find the Standard Deviation and Variance for Bank B (1 wait line). Columns: Wait time, x (in min) | Deviation: x – x̄ | Squares: (x – x̄)². Rows (to be filled in): 6.6; 6.8; 7.5; 7.7; 7.9. $\sum x = 36.5$; $\bar{x} = \frac{\sum x}{n} = \frac{36.5}{5} = 7.3$ min; $\sum (x - \bar{x})^2 =$ ___; $s^2 = \frac{\sum (x - \bar{x})^2}{n - 1} =$ ___; $s = \sqrt{s^2} =$ ___. • Round to one more decimal than the data. • Don’t round until the end. • Include the appropriate units.
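To check the hand-filled tables for both banks, a sketch like the following prints each row (wait time, deviation, squared deviation) and then the sample variance and standard deviation. The helper name `deviation_table` is an assumption.

```python
import math

def deviation_table(times):
    """Return (mean, rows, sample variance, sample standard deviation)."""
    n = len(times)
    mean = sum(times) / n
    # Each row: wait time, deviation from the mean, squared deviation.
    rows = [(x, x - mean, (x - mean) ** 2) for x in times]
    sum_of_squares = sum(sq for _, _, sq in rows)
    variance = sum_of_squares / (n - 1)
    return mean, rows, variance, math.sqrt(variance)

banks = {
    "Bank A": [5.2, 6.2, 7.5, 8.4, 9.2],
    "Bank B": [6.6, 6.8, 7.5, 7.7, 7.9],
}

results = {}
for name, times in banks.items():
    mean, rows, variance, s = deviation_table(times)
    results[name] = (variance, s)
    print(name, "mean =", round(mean, 2), "min")
    for x, dev, sq in rows:
        print(f"  {x:4.1f} {dev:5.1f} {sq:6.2f}")
    # Round only at the end, to one more decimal than the data.
    print(f"  s^2 = {variance:.3f} min^2, s = {s:.2f} min")
```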
  • 50. Sample versus Population Standard Deviation and Variance. Sample statistics vs. population parameters: Mean: $\bar{x}$ (sample), µ (population). Standard deviation: s (sample), σ (population). Variance: s² (sample), σ² (population).
  • 51. Sample versus Population Standard Deviation • Sample standard deviation: $s = \sqrt{s^2} = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$ • Population standard deviation: $\sigma = \sqrt{\sigma^2} = \sqrt{\frac{\sum (x - \mu)^2}{N}}$ Note: Unlike $\bar{x}$ and µ, the formulas for s and σ are not mathematically the same: s divides by n – 1, while σ divides by N.
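Python's standard `statistics` module exposes both versions: `stdev` divides by n – 1 and `pstdev` divides by N. The sketch below applies both to the Bank A wait times only to show that the results differ; treating those five values as a whole population is an assumption made for the comparison.

```python
import statistics

wait_times = [5.2, 6.2, 7.5, 8.4, 9.2]   # Bank A wait times (minutes)

s = statistics.stdev(wait_times)          # sample formula: divides by n - 1
sigma = statistics.pstdev(wait_times)     # population formula: divides by N

print(round(s, 2), round(sigma, 2))
```

For the same data the sample value is always the larger of the two, because it divides the same sum of squares by a smaller number.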
  • 52. Standard Deviation: Key Points • The standard deviation is a measure of variation of all values from the mean. The larger s is, the more the data varies. • $s \ge 0$. (When would s = 0?) • The value of the standard deviation s can increase dramatically with the inclusion of one or more outliers (data values far away from all others). • The units of the standard deviation s are the same as the units of the original data values. (The variance has units².)
  • 53. Interpreting Standard Deviation • Standard deviation is a measure of the typical amount an entry deviates from the mean. • The more the entries are spread out, the greater the standard deviation.
  • 54. Solution: Using Technology to Find the Standard Deviation (calculator output showing the sample mean and the sample standard deviation)
  • 55. Using Technology The gas mileage of 2 cars is sampled over various conditions: Car A: 21.1 21.2 20.8 19.8 23.8 (mpg) Car B: 25.2 19.1 18.0 24.4 20.3 (mpg) Which car do you think gets “better” mpg? Use a calculator to find the mean and standard deviation for each to justify your choice.
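The slide asks for a calculator; as an alternative sketch, Python's `statistics` module gives the same two statistics. The dictionary name `summary` is an assumption.

```python
import statistics

mileage = {
    "Car A": [21.1, 21.2, 20.8, 19.8, 23.8],   # mpg
    "Car B": [25.2, 19.1, 18.0, 24.4, 20.3],   # mpg
}

summary = {}
for car, mpg in mileage.items():
    mean = statistics.mean(mpg)
    s = statistics.stdev(mpg)                   # sample standard deviation
    summary[car] = (mean, s)
    print(f"{car}: mean = {mean:.2f} mpg, s = {s:.2f} mpg")
```

The means are close, so the standard deviation (how consistent each car is) is what separates the two.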
  • 56. Standard Deviation and “Spread” How does “s” show how much the data varies? Three methods: 1. Range Rule of Thumb 2. Chebyshev’s Theorem 3. The Empirical Rule
  • 57. The Range Rule of Thumb. Range Rule: For most data sets, the majority of the data lies within 2 standard deviations of the mean. Recall: Range = High – Low. Estimate: Range ≈ 4s. Alternatively, if the range is known, you can use the range rule to estimate the standard deviation: $s \approx \frac{\text{Range}}{4}$
  • 58. Using the Range Rule of Thumb A sample of women’s heights has a mean of 64 inches and a standard deviation of 2.5 inches. Using the range rule, “most” women fall within what heights? What would be an “unusual” height?
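A sketch of the range-rule calculation for this example: "most" values fall within 2 standard deviations of the mean, and anything outside that interval is treated as unusual. The helper name `is_unusual` is an assumption.

```python
mean_height = 64.0   # inches
s = 2.5              # inches

# Range rule of thumb: most values lie within mean +/- 2s.
low = mean_height - 2 * s
high = mean_height + 2 * s
print(f"Most heights fall between {low} and {high} inches")

def is_unusual(height):
    """A height is 'unusual' if it lies more than 2 standard deviations from the mean."""
    return height < low or height > high

print(is_unusual(66), is_unusual(71))
```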
  • 59. Using the Range Rule of Thumb The sample of Exam Scores used in the class handout had a mean of 73.6. Which of the following is most likely the standard deviation of the sample? s = 3.6 s = 12.8 s = 74.5 Use the range rule to help justify your choice.
  • 60. Chebyshev’s Theorem. For data with any distribution, the proportion (or fraction) of any set of data lying within K standard deviations of the mean is always at least $1 - \frac{1}{K^2}$, where K is any number greater than 1. • For K = 2, at least 3/4 (or 75%) of all values lie within 2 standard deviations of the mean. • For K = 3, at least 8/9 (or 89%) of all values lie within 3 standard deviations of the mean.
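The bound is a one-line formula, sketched here for a few values of K. The function name `chebyshev_bound` is an assumption.

```python
def chebyshev_bound(k):
    """Minimum proportion of data within k standard deviations of the mean (k > 1)."""
    if k <= 1:
        raise ValueError("Chebyshev's theorem requires k > 1")
    return 1 - 1 / k ** 2

for k in (1.5, 2, 3):
    print(f"K = {k}: at least {chebyshev_bound(k):.1%}")
```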
  • 61. Using Chebyshev’s Theorem A sample of salaries at an elementary school has a mean of $32,000 and a standard deviation of $3000. Use Chebyshev’s Theorem to describe how the salaries are spread out. Would a salary of $28,000 be “unusual?” Would a salary of $45,000 be “unusual”?
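One way to approach the salary question is to compute the intervals for K = 2 and K = 3 and then see how many standard deviations each salary lies from the mean. This is a sketch; the helper names are assumptions, and whether a value counts as "unusual" is left to the reader's chosen cutoff.

```python
mean_salary = 32_000
s = 3_000

def interval(k):
    """Salaries within k standard deviations of the mean."""
    return mean_salary - k * s, mean_salary + k * s

def deviations_from_mean(salary):
    """How many standard deviations a salary lies from the mean."""
    return (salary - mean_salary) / s

for k in (2, 3):
    low, high = interval(k)
    print(f"At least {1 - 1 / k ** 2:.0%} of salaries lie between ${low:,} and ${high:,}")

for salary in (28_000, 45_000):
    print(f"${salary:,} is {deviations_from_mean(salary):+.2f} standard deviations from the mean")
```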
  • 62. The Empirical Rule. Empirical (68-95-99.7) Rule: For data sets having a symmetric, bell-shaped distribution: • About 68% of all values fall within 1 standard deviation of the mean. • About 95% of all values fall within 2 standard deviations of the mean. • About 99.7% of all values fall within 3 standard deviations of the mean.
  • 66. Example: Using the Empirical Rule A sample of IQs has a symmetric distribution with a mean of 100 and a standard deviation of 15. 1. Sketch the distribution. 2. 68% of people have an IQ between what 2 values? 3. What percent of people have an IQ between 70 and 130? 4. What percent of people have an IQ between 100 and 115? 5. What percent of people have an IQ above 145?
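Questions 2 through 5 can be worked from the 68-95-99.7 percentages plus symmetry. The sketch below assumes the distribution is bell-shaped as well as symmetric, so the Empirical Rule applies; the variable names are assumptions.

```python
mean_iq = 100
s = 15

# Approximate proportion within 1, 2, and 3 standard deviations (Empirical Rule).
within = {1: 0.68, 2: 0.95, 3: 0.997}

# Q2: 68% of values lie within 1 standard deviation of the mean.
q2 = (mean_iq - s, mean_iq + s)

# Q3: 70 and 130 are 2 standard deviations below and above the mean.
q3 = within[2]

# Q4: 100 to 115 is the upper half of the 1-standard-deviation band.
q4 = within[1] / 2

# Q5: 145 is 3 standard deviations above the mean; take half of what lies outside.
q5 = (1 - within[3]) / 2

print(q2, f"{q3:.0%}", f"{q4:.0%}", f"{q5:.2%}")
```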