Chapter 1 & Section 2.1
Exploring Data
Introduction
 Statistics:
 the science of data. We begin our study of statistics by
mastering the art of examining data. Any set of data contains
information about some group of individuals. The
information is organized in variables.
 Individuals:
 The objects described by a set of data. Individuals may be
people, but they may also be other things.
 Variable:
 Any characteristic of an individual.
 Can take different values for different individuals.
Variable Types
• Categorical variable:
  • Places an individual into one of several groups or categories.
• Quantitative variable:
  • Takes numerical values for which arithmetic operations such as adding and averaging make sense.
• Distribution:
  • The pattern of variation of a variable.
  • Tells what values the variable takes and how often it takes these values.
 A. The individuals are the BMW 318I, the Buick
Century, and the Chevrolet Blazer.
 B. The variables given are
 Vehicle type (categorical)
 Transmission type (categorical)
 Number of cylinders (quantitative)
 City MPG (quantitative)
 Highway MPG (quantitative)
1.1 & 1.2: Displaying Distributions with graphs
• Graphs used to display data:
• bar graphs, pie charts, dot plots, stem plots, histograms, and
time plots
• Purpose of a graph:
• Helps to understand the data.
• Allows overall patterns and striking deviations from that pattern
to be seen.
• Describing the overall pattern:
• Three biggest descriptors:
• shape, center and spread.
• Next look for outliers and clusters.
Shape
 Concentrate on main features.
 Major peaks, outliers (not just the smallest and largest
observations), rough symmetry or clear skewness.
 Types of Shapes:
Symmetric Skewed right
Skewed left
1.5 How to make a bar graph.
[Bar graph: Percent of females among people earning doctorates in 1994. Vertical axis: Percent, 10 to 70.]
Field | Percent female
Computer science | 15.4%
Education | 60.8%
Engineering | 11.1%
Life sciences | 40.7%
Physical sciences | 21.7%
Psychology | 62.2%
No, a pie chart is not appropriate here. A pie chart is used to display one variable with all of its categories totaling 100%, and these six percents do not total 100%.
How to make a dotplot
[Dotplot: Highway mpg for some 2000 midsize cars. Horizontal axis: MPG, 21 to 32. Vertical axis: Frequency or Count, 2 to 10.]
How to make and read a stemplot
 A stemplot is similar to a dotplot but there are some format
differences. Instead of dots actual numbers are used.
Instead of a horizontal axis, a vertical one is used.
Stems | Leaves (leaves are single digits only)
   52 | 3 6
This arrangement would be read as the numbers 523 and 526.
How to make and read a stemplot
 With the following data, make a stemplot.
Stems | Leaves
    1 | 4 9
    2 | 2 5 6 6
    3 | 3 3 4 5 5 5 5 9
    4 | 0 2 7 7 8
    5 | 2
How to make and read a stemplot
• Let's use the same stemplot, but now split the stems.
Stems | Leaves
    1 | 4 9
    2 | 2 5 6 6
    3 | 3 3 4 5 5 5 5 9
    4 | 0 2 7 7 8
    5 | 2
Split stems | Leaves
          1 | 4
          1 | 9
          2 | 2
          2 | 5 6 6
          3 | 3 3 4
          3 | 5 5 5 5 9
          4 | 0 2
          4 | 7 7 8
          5 | 2
With split stems, the first stem of each pair takes leaves 0-4 and the second takes leaves 5-9.
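The splitting rule can be sketched in a few lines of Python. This is only an illustration: the data list is read back off the stemplot on this slide (stem = tens digit, leaf = ones digit), and the function name `stemplot` and its `split` flag are made up for this example.

```python
from collections import defaultdict

# Values read back off the stemplot (stem = tens digit, leaf = ones digit).
data = [14, 19, 22, 25, 26, 26, 33, 33, 34, 35, 35, 35, 35, 39,
        40, 42, 47, 47, 48, 52]

def stemplot(values, split=False):
    """Return stemplot rows as strings. With split=True each stem gets
    one row for leaves 0-4 and a second row for leaves 5-9."""
    rows = defaultdict(list)
    for v in sorted(values):
        stem, leaf = divmod(v, 10)
        half = 1 if (split and leaf >= 5) else 0
        rows[(stem, half)].append(leaf)
    return [f"{stem} | {' '.join(map(str, leaves))}"
            for (stem, half), leaves in sorted(rows.items())]

plain_rows = stemplot(data)
split_rows = stemplot(data, split=True)
print("\n".join(split_rows))
```

This sketch skips any stem that has no leaves. That does not matter for this data set, but a hand-drawn stemplot would normally still list an empty stem so that gaps stay visible.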
How to construct a histogram
 The most common graph of the distribution of one
quantitative variable is a histogram.
 To make a histogram:
1. Divide the range into classes of equal width. Then count the number of observations that fall in each class.
2. Label and scale your axes and title your graph.
3. Draw a bar for each count, with no space between bars.
Divide range into equal widths and count
Scale (CEO Salary, thousands of dollars) | Counts
0 ≤ CEO Salary < 100 | 1
100 ≤ CEO Salary < 200 | 3
200 ≤ CEO Salary < 300 | 11
300 ≤ CEO Salary < 400 | 10
400 ≤ CEO Salary < 500 | 1
500 ≤ CEO Salary < 600 | 1
600 ≤ CEO Salary < 700 | 2
700 ≤ CEO Salary < 800 | 1
800 ≤ CEO Salary < 900 | 1
Draw and label axes, then make bars
[Histogram: CEO Salary in thousands of dollars. Horizontal axis: Thousand dollars, 100 to 900. Vertical axis: Count, 1 to 11.]
Shape – the graph is skewed right.
Center – the median is the first value in the $300,000 to $400,000 range.
Spread – the salaries range from $21,000 to $862,000.
Outliers – there do not appear to be any outliers, but I would have to calculate to make sure.
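The statement about the center can be checked from the class counts alone. The sketch below assumes the nine counts listed for the CEO salary classes (1, 3, 11, 10, 1, 1, 2, 1, 1); the variable names are only illustrative.

```python
from itertools import accumulate

# Class lower limits (thousands of dollars) and the count in each class.
lower_limits = [0, 100, 200, 300, 400, 500, 600, 700, 800]
counts = [1, 3, 11, 10, 1, 1, 2, 1, 1]

n = sum(counts)                     # total number of salaries
median_position = (n + 1) // 2      # n is odd here, so the median is a single value
cumulative = list(accumulate(counts))

# The median class is the first class whose running total reaches that position.
median_class = next(i for i, c in enumerate(cumulative) if c >= median_position)
before = cumulative[median_class - 1] if median_class else 0
position_in_class = median_position - before

print(n, median_position, lower_limits[median_class], position_in_class)
```

With 31 salaries the median is the 16th value. The first three classes hold 15 values, so the 16th is the first value in the 300 to 400 class.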
New terms used when graphing data.
• Relative frequency:
  • Category count divided by the total count.
  • Gives a percentage.
• Cumulative frequency:
  • Sum of category counts up to and including the current category.
• Ogives (pronounced O-Jive):
  • Cumulative frequencies divided by the total count.
  • A relative cumulative frequency graph.
• Percentile:
  • The pth percentile of a distribution is the value such that p percent of the observations fall at or below it.
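As a rough illustration of these terms, the sketch below reuses the nine CEO salary class counts from the histogram example. The heights of an ogive are the cumulative relative frequencies, plotted at the upper limit of each class. Variable names are assumptions made for this example.

```python
from itertools import accumulate

# Upper class limits (thousands of dollars) and counts for the CEO salary classes.
upper_limits = [100, 200, 300, 400, 500, 600, 700, 800, 900]
counts = [1, 3, 11, 10, 1, 1, 2, 1, 1]
total = sum(counts)

relative = [c / total for c in counts]                  # share of observations per class
cumulative = list(accumulate(counts))                   # running total of counts
cumulative_relative = [c / total for c in cumulative]   # heights of the ogive

for upper, rel, cum_rel in zip(upper_limits, relative, cumulative_relative):
    print(f"below {upper}: {rel:6.1%}   cumulative {cum_rel:6.1%}")
```

Reading a percentile from the ogive means finding the value on the horizontal axis where the cumulative relative frequency reaches p percent.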
Let's look at a table to see what an ogive would refer to.
The graph of an ogive for this data would look like this.
Find the age at the 10th percentile, the median, and the 85th percentile.
10th percentile: 47
Median: 55.5
85th percentile: 62.5
Last graph of this section
 Time plots :
 Graph of each observation against the time at which it was
measured.
 Time is always on the x-axis.
 Use time plots to analyze what is occurring over time.
[Time plot: Deaths from cancer per 100,000. Horizontal axis: Year, 45 to 95. Vertical axis: Deaths, 134 to 204.]
Section 1.1 & 1.2
 Homework: #’s Section 1-1: 2, 4, 6, 9, 38a, 48a&b,
Section 1-2: 52, 56 (use a scale starting at 7 with a width of 0.5, make an ogive, and use it to estimate the value of the center and also the 90th percentile), 58; Section 2-1: 9, R1.1 & 6, R2.2
 Complete additional notes packet pg. 1-4.
Section 1.3: Describing Quantitative
Data with Numbers.
 Center:
 Mean
 Median
 Mode – (only a measure of center for categorical data)
 Spread:
 Range
 Interquartile Range (IQR)
 Variance
 Standard Deviation
Measuring center:
• Mean:
  • Most common measure of center.
  • Is the arithmetic average.
  • Formula:
    $$\bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n}$$
    or
    $$\bar{x} = \frac{1}{n}\sum x_i$$
  • Not resistant to the influence of extreme observations.
Measuring center:
 Median
 The midpoint of a distribution
 The number such that half the observations are smaller and
the other half are larger.
 If the number of observations n is odd, the median is the
center of the ordered list.
 If the number of observations n is even, the median M is
the mean of the two center observations in the ordered
list.
 Is resistant to the influence of extreme observations.
Quick summary of measures of center.
Measure | Definition | Example using 1,2,3,3,4,5,5,9
Mean | (sum of the data values) / (number of data values) | $\frac{1+2+3+3+4+5+5+9}{8} = 4$
Median | Middle value for an odd # of data values; mean of the 2 middle values for an even # of data values | For 1,2,3,3,4,5,5,9, the middle values are 3 and 4. The median is $\frac{3+4}{2} = 3.5$
Mode | The most frequently occurring value (categorical data only) | Two modes: 3 and 5. Set is bimodal.
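The three examples in this summary can be reproduced with Python's standard `statistics` module. This is just a quick check of the arithmetic, not part of the original notes.

```python
import statistics

data = [1, 2, 3, 3, 4, 5, 5, 9]

mean = statistics.mean(data)         # sum of the values divided by how many there are
median = statistics.median(data)     # n is even, so the mean of the two middle values
modes = statistics.multimode(data)   # every value tied for most frequent

print(mean, median, modes)
```

`multimode` (Python 3.8 or later) is used instead of `mode` so that both tied values are reported.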
Comparing the Mean and Median.
• The locations of the mean and median of a distribution are affected by the distribution's shape.
Symmetric: the median and mean are close together.
Skewed right: the mean is pulled to the right of the median.
Skewed left: the mean is pulled to the left of the median.
$$\bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n} = \frac{86 + 84 + \cdots + 93}{14} = \frac{1190}{14} = 85$$
$\bar{x}_{\text{old}} = 85$ and $\bar{x}_{\text{new}} = 79.3$
Since zero is an outlier, it affects the mean: the mean is not a resistant measure of center.
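The drop from 85 to 79.3 can be reproduced from the totals in the worked example (14 scores summing to 1190). The slide that sets up the new data is not reproduced in these notes, so the sketch below assumes the new mean comes from adding a single score of 0.

```python
# Totals from the worked example: 14 scores that sum to 1190.
old_total, old_n = 1190, 14
old_mean = old_total / old_n

# Assumption: one extra score of 0 is added to the list.
new_mean = (old_total + 0) / (old_n + 1)

print(old_mean, round(new_mean, 1))
```

One extreme value moves the mean by more than five points, which is what "not resistant" means in practice.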
$$\bar{x} = \frac{1}{n}\sum x_i$$
$$\$1{,}200{,}000 = \frac{1}{25}\,\text{SUM}$$
$$\$1{,}200{,}000 \times 25 = \text{SUM}$$
$$\$30\ \text{million} = \text{SUM}$$
Measuring spread or variability:
 Range
 Difference between largest and smallest points.
 Not resistant to the influence of extreme observations.
 Interquartile Range (IQR)
 Measures the spread of the middle half of the data.
 Is resistant to the influence of extreme observations.
 Quartile 3 minus Quartile 1.
To calculate quartiles:
1. Arrange the observations in increasing order and locate
the median M.
2. The first quartile Q1 is the median of the observations
whose position in the ordered list is to the left of the
overall median.
3. The third quartile Q3 is the median of the observations
whose position in the ordered list is to the right of the
overall median.
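The three steps can be sketched as a short function. The name `quartiles` is made up for this example, and the data set 5, 6, 7, 8, 10, 12 is borrowed from the standard deviation example later in these notes. Software packages often use slightly different quartile rules, so results from a calculator or library may not always match this hand method.

```python
import statistics

def quartiles(values):
    """Return (Q1, median, Q3) using the median-of-each-half rule."""
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]              # observations to the left of the median
    upper = ordered[half + (n % 2):]    # observations to the right of the median
    return (statistics.median(lower),
            statistics.median(ordered),
            statistics.median(upper))

q1, m, q3 = quartiles([5, 6, 7, 8, 10, 12])
iqr = q3 - q1
print(q1, m, q3, iqr)
```

When n is odd, the overall median is left out of both halves, which is what `half + (n % 2)` takes care of.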
The five number summary and box plots.
 The five number summary
 Consists of the
 min, Q1, median, Q3, max
 Offers a reasonably complete description of center and spread.
 Used to create a boxplot.
 Boxplot
 Shows less detail than histograms or stemplots.
 Best used for side-by-side comparison of more than one
distribution.
 Gives a good indication of symmetry or skewness of a
distribution.
 Regular boxplots conceal outliers.
 Modified boxplots put outliers as isolated points.
• Start by finding the 5 number summary for each of the groups.
• Use your calculator and put the two lists into their own column,
then use the 1-var Stats function.
Min Q1 M Q3 Max
Women: 101 126 138.5 154 200
Men: 70 98 114.5 143 187
How to construct a side-by-side boxplot
[Side-by-side boxplots: SSHA Scores for first year college students, Women and Men. Horizontal axis: Scores, 70 to 200.]
Calculating outliers
• Outlier
  • An observation that falls outside the overall pattern of the data.
  • Calculated by using the IQR.
  • Anything smaller than $Q_1 - 1.5 \times IQR$ or larger than $Q_3 + 1.5 \times IQR$ is an outlier.
Constructing a modified boxplot
Min Q1 M Q3 Max
Women: 101 126 138.5 154 200
$$IQR = 154 - 126 = 28$$
$$Q_1 - 1.5 \times IQR = 126 - 1.5 \times 28 = 84$$
$$Q_3 + 1.5 \times IQR = 154 + 1.5 \times 28 = 196$$
Constructing a modified boxplot
[Modified boxplot: SSHA Scores for first year college students, Women. Horizontal axis: Scores, 70 to 200.]
Lower bound for outliers: $Q_1 - 1.5 \times IQR = 84$
Upper bound for outliers: $Q_3 + 1.5 \times IQR = 196$
Min Q1 M Q3 Max
Women: 101 126 138.5 154 200
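The outlier check for the women's scores can be written out as a short calculation. The five-number summary comes from the table; the helper name `is_outlier` is only for illustration.

```python
# Five-number summary for the women's SSHA scores.
minimum, q1, median, q3, maximum = 101, 126, 138.5, 154, 200

iqr = q3 - q1
lower_bound = q1 - 1.5 * iqr
upper_bound = q3 + 1.5 * iqr

def is_outlier(x):
    """True if x falls outside the 1.5 x IQR bounds."""
    return x < lower_bound or x > upper_bound

print(iqr, lower_bound, upper_bound, is_outlier(minimum), is_outlier(maximum))
```

The minimum of 101 is inside the bounds, but the maximum of 200 is above 196, so a modified boxplot would draw it as an isolated point.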
Section 1.3 Day 1
 Homework: #’s 84, 86, 88, 91, 92
 Complete additional notes packet pg. 5-12.
Measuring Spread:
• Variance ($s^2$)
  • The average of the squares of the deviations of the observations from their mean.
  • In symbols, the variance of n observations $x_1, x_2, \ldots, x_n$ is
    $$s^2 = \frac{(x_1 - \bar{x})^2 + (x_2 - \bar{x})^2 + \cdots + (x_n - \bar{x})^2}{n - 1}$$
    or
    $$s^2 = \frac{1}{n - 1}\sum (x_i - \bar{x})^2$$
• Standard deviation ($s$)
  • The square root of the variance.
    $$s = \sqrt{\frac{1}{n - 1}\sum (x_i - \bar{x})^2}$$
How to find the mean and standard deviation from their definitions.
• With the list of numbers below, calculate the standard deviation.
  o 5, 6, 7, 8, 10, 12
$$\bar{x} = \frac{5 + 6 + 7 + 8 + 10 + 12}{6} = 8$$
$$s = \sqrt{\frac{1}{n - 1}\sum (x_i - \bar{x})^2}$$
$$s = \sqrt{\frac{(5-8)^2 + (6-8)^2 + (7-8)^2 + (8-8)^2 + (10-8)^2 + (12-8)^2}{6 - 1}}$$
$$s = \sqrt{\frac{(-3)^2 + (-2)^2 + (-1)^2 + 0^2 + 2^2 + 4^2}{5}}$$
$$s = \sqrt{\frac{9 + 4 + 1 + 0 + 4 + 16}{5}} = \sqrt{\frac{34}{5}} = \sqrt{6.8} \approx 2.61$$
Properties of Variance:
 Uses squared deviations from the mean because the sum
of all the deviations not squared is always zero.
 Has square units.
 Found by taking an average but dividing by n-1.
 The sum of the deviations is always zero, so the last
deviation can be found once the other n-1 deviations are
known.
 Means only n-1 of the squared deviations can vary freely, so
the average is found by dividing by n-1.
 n-1 is called the degrees of freedom.
Properties of Standard Deviation
 Measures the spread about the mean and should be used
only when the mean is chosen as the measure of center.
 Equals zero when there is no spread, happens when all
observations are the same value. Otherwise it is always
positive.
 Not resistant to the influence of extreme observations
or strong skewness.
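These properties can be checked on the data set 5, 6, 7, 8, 10, 12 from the worked example. The sketch computes the variance from its definition and compares it with the standard library, which also divides by n - 1 for sample statistics.

```python
import math
import statistics

data = [5, 6, 7, 8, 10, 12]
n = len(data)
mean = sum(data) / n

deviations = [x - mean for x in data]
variance = sum(d ** 2 for d in deviations) / (n - 1)   # divide by n - 1, not n
std_dev = math.sqrt(variance)

# The unsquared deviations always cancel out, so they are squared first.
assert sum(deviations) == 0
# statistics.variance and statistics.stdev use the same n - 1 divisor.
assert math.isclose(variance, statistics.variance(data))
assert math.isclose(std_dev, statistics.stdev(data))

print(variance, round(std_dev, 2))
```

The variance is in squared units, and taking the square root brings the standard deviation back to the units of the data.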
Mean & Standard Deviation
Vs.
Median & the 5-Number Summary
 Mean & Standard Deviation
 Most common numerical description of a distribution.
 Used for reasonably symmetric distributions that are free from
outliers.
 Five-Number Summary
 Offer a reasonably complete description of center and spread.
 Used for describing skewed distributions or a distribution with
strong outliers.
Always plot your data.
 Graphs
 Give the best overall picture of a distribution.
 Numerical measures of center and spread
 Only give specific facts about a distribution.
 Do not describe its entire shape.
 Can give a misleading picture of a distribution or the
comparison of two or more distributions.
Changing the unit of measurement.
• Linear Transformations
  • Change the original variable x into the new variable x_new.
  • x_new = a + bx
  • Do not change the shape of a distribution.
  • Can change one or both of the center and spread.
  • The effects of the changes follow a simple pattern.
    • Adding the constant (a) shifts all values of x upward or downward by the same amount.
      • Adds (a) to the measures of center and to the quartiles but does not change measures of spread.
    • Multiplying by the positive constant (b) changes the size of the unit of measurement.
      • Multiplies both the measures of center (mean and median) and the measures of spread (standard deviation and IQR) by (b).
The table shows an original data set and two different
linear transformations for that set.
Original (x) x + 12 3(x) - 7
5 17 8
6 18 11
7 19 14
8 20 17
10 22 23
12 24 29
What are the original and transformed mean, median,
range, quartiles, IQR, variance and standard deviation?
Statistic | Original (x) | x + 12 | 3(x) - 7
Mean | 8 | 20 | 17
Median | 7.5 | 19.5 | 15.5
Q1 | 6 | 18 | 11
Q3 | 10 | 22 | 23
IQR | 4 | 4 | 12
Range | 7 | 7 | 21
Variance | 6.8 | 6.8 | 61.2
St Dev | 2.61 | 2.61 | 7.82
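The pattern for linear transformations can be verified directly on this data set. The sketch below is one way to do it with the standard `statistics` module; the helper name `summary` is an assumption made for this example.

```python
import statistics

x = [5, 6, 7, 8, 10, 12]
shifted = [v + 12 for v in x]        # a = 12, b = 1
rescaled = [3 * v - 7 for v in x]    # a = -7, b = 3

def summary(values):
    """Collect a few measures of center and spread in one place."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "range": max(values) - min(values),
        "variance": statistics.variance(values),
        "st_dev": statistics.stdev(values),
    }

for name, values in [("x", x), ("x + 12", shifted), ("3(x) - 7", rescaled)]:
    print(name, summary(values))
```

Adding 12 moves the mean and median by 12 and leaves every measure of spread alone. Multiplying by 3 multiplies the range and standard deviation by 3, while the variance is multiplied by 3 squared, which is why it goes from 6.8 to 61.2.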
Section 1.3 & 2.1 Day 2
 Homework: #’s Section 1-3: 97, 98, 103; Section 2-1:
19, 20, 22, R.2.3
 Complete additional notes packet pg. 13-16.
Chapter review
