INTRODUCTION TO STATISTICS & PROBABILITY
Chapter 1:
Looking at Data—Distributions (Part 2)
1.2 Describing Distributions with Numbers

Dr. Nahid Sultana

1.2 Describing Distributions with Numbers

Objectives

• Measures of center: mean, median
• Measures of spread: quartiles, standard deviation
• Five-number summary and boxplot
• IQR and outliers
• Choosing among summary statistics
• Changing the unit of measurement

Measures of center: The Mean

• The most common measure of center is the arithmetic average, or mean, or sample mean.
• To calculate the mean, add all the values, then divide by the number of observations.
• It is the “center of mass” of the distribution: the point at which a histogram of the data would balance.
• If the n observations are x1, x2, x3, …, xn, their mean is:

$$\bar{x} = \frac{\text{sum of observations}}{n} = \frac{x_1 + x_2 + \cdots + x_n}{n}$$

or, in more compact notation,

$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$$

Measures of center: The Mean (cont…)

Find the mean:
Here are the scores on the first exam in an introductory statistics course for 10 students:

80 73 92 85 75 98 93 55 80 90

Find the mean first-exam score for these students.
Solution:

$$\bar{x} = \frac{80 + 73 + 92 + 85 + 75 + 98 + 93 + 55 + 80 + 90}{10} = \frac{821}{10} = 82.1$$
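The mean formula can be checked with a minimal Python sketch applied to the ten first-exam scores. The function name `sample_mean` is our own (hypothetical), not from any library; Python's standard `statistics.mean` is shown alongside for comparison.

```python
from statistics import mean

def sample_mean(xs):
    """Arithmetic mean: sum of the observations divided by n."""
    if not xs:
        raise ValueError("mean requires at least one observation")
    return sum(xs) / len(xs)

# First-exam scores for the 10 students in the example
scores = [80, 73, 92, 85, 75, 98, 93, 55, 80, 90]

print(sample_mean(scores))  # 82.1
print(mean(scores))         # standard-library equivalent
```

Both lines compute 821 / 10; the hand-written version simply mirrors the formula term by term.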

Measuring Center: The Median

• Another common measure of center is the median.
• The median M is the midpoint of a distribution, the number such that half of the observations are smaller and the other half are larger.

To find the median of a distribution:
1. Arrange all observations from smallest to largest.
2. If the number of observations n is odd, the median M is the center observation in the ordered list.
3. If the number of observations n is even, the median M is the average of the two center observations in the ordered list.
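The three steps above translate directly into code. This is a sketch; the function name `median` is our own, though Python's standard `statistics.median` follows the same odd/even rule.

```python
def median(xs):
    """Median following the three steps: sort, then take the center value(s)."""
    if not xs:
        raise ValueError("median requires at least one observation")
    ordered = sorted(xs)                            # Step 1: smallest to largest
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:                                  # Step 2: n odd -> center observation
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2    # Step 3: n even -> average the two centers

scores = [80, 73, 92, 85, 75, 98, 93, 55, 80, 90]
print(median(scores))     # 82.5 (average of the 5th and 6th ordered scores, 80 and 85)
print(median([3, 1, 2]))  # 2 (n odd: the center observation)
```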
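
Measuring Center: The Median (cont...)

Find the median:
Here are the scores on the first exam in an introductory statistics course for 10 students:

80 73 92 85 75 98 93 55 80 90

Find the median first-exam score for these students.
Solution: In order, the scores are 55, 73, 75, 80, 80, 85, 90, 92, 93, 98. Since n = 10 is even, the median is the average of the 5th and 6th scores:

M = (80 + 85)/2 = 82.5

Note: The location of the median is (n + 1)/2 in the sorted list. Here (10 + 1)/2 = 5.5, that is, halfway between the 5th and 6th observations.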

Comparing Mean and Median

• The mean and the median are exactly equal when the distribution is exactly symmetric, and close together when it is roughly symmetric.
• In a skewed distribution, the mean is usually farther out in the long tail than is the median.
• The median is a measure of center that is resistant to skew and outliers. The mean is not.
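To see the resistance claim numerically, the sketch below changes one exam score to an extreme value and recomputes both measures with Python's `statistics` module. The altered value (5 in place of 55, as if from a data-entry slip) is purely hypothetical, chosen only for illustration.

```python
from statistics import mean, median

scores = [80, 73, 92, 85, 75, 98, 93, 55, 80, 90]
# Hypothetical data-entry error: the score 55 recorded as 5
altered = [5 if x == 55 else x for x in scores]

print(mean(scores), median(scores))    # 82.1 82.5
print(mean(altered), median(altered))  # 77.1 82.5
```

The mean drops by 5 points while the median does not move, because the median depends only on the middle of the ordered list.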

Measuring Spread: The Quartiles

A measure of center alone can be misleading. A useful numerical description of a distribution requires both a measure of center and a measure of spread.
• We describe the spread or variability of a distribution by giving several percentiles.
• The median divides the data into two parts: half of the observations are above the median and half are below it. We could call the median the 50th percentile.
• The lower quartile (first quartile, Q1) is the median of the lower half of the data; the upper quartile (third quartile, Q3) is the median of the upper half of the data.
• Together with the median, the quartiles divide the data into four equal parts, with 25% of the data in each part. Q1 is therefore the 25th percentile and Q3 the 75th percentile.

Measuring Spread: The Quartiles (Cont.)
Calculating the quartiles and the interquartile range:

1. Arrange the observations in increasing order and locate the median M.
2. The first quartile Q1 is the median of the lower half of the data, excluding M (when n is odd, the center observation belongs to neither half).
3. The third quartile Q3 is the median of the upper half of the data, excluding M.
4. The interquartile range is IQR = Q3 − Q1.
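A sketch of the quartile procedure above, assuming the median-of-halves convention used on these slides. Statistical software uses several different quartile conventions (for example, NumPy's `percentile` interpolates linearly by default), so library results can differ slightly from this hand method. The function names are our own.

```python
def median_sorted(ordered):
    """Median of an already-sorted, non-empty list."""
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

def quartiles(xs):
    """Return (Q1, M, Q3) using the median-of-halves method.

    When n is odd, the center observation (M itself) is left out of both halves.
    """
    if len(xs) < 2:
        raise ValueError("need at least two observations")
    ordered = sorted(xs)                  # Step 1: sort and locate M
    n = len(ordered)
    m = median_sorted(ordered)
    lower = ordered[: n // 2]             # Step 2: lower half, excluding M
    upper = ordered[(n + 1) // 2 :]       # Step 3: upper half, excluding M
    return median_sorted(lower), m, median_sorted(upper)

scores = [80, 73, 92, 85, 75, 98, 93, 55, 80, 90]
q1, m, q3 = quartiles(scores)
print(q1, m, q3)          # 75 82.5 92
print("IQR =", q3 - q1)   # IQR = 17
```

For the ten scores, the halves are simply the first five and last five ordered values, which reproduces the worked example.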

Measuring Spread: The Quartiles (Cont.)

Example: Here are the scores on the first exam in an introductory statistics course for 10 students:

80 73 92 85 75 98 93 55 80 90

Find the quartiles for these first-exam scores.
Solution: In order, the scores are:

55 73 75 80 80 85 90 92 93 98

The median is M = (80 + 85)/2 = 82.5.
Q1 = 75, the median of the first five numbers: 55, 73, 75, 80, 80.
Q3 = 92, the median of the last five numbers: 85, 90, 92, 93, 98.

The Five-Number Summary

The five-number summary of a distribution consists of
• The smallest observation (Min)
• The first quartile (Q1)
• The median (M)
• The third quartile (Q3)
• The largest observation (Max)
written in order from smallest to largest:

Minimum   Q1   M   Q3   Maximum
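The five-number summary can be assembled from the same median-of-halves quartiles. This is a self-contained sketch; `five_number_summary` is a hypothetical helper name, not a library function.

```python
def _median_sorted(ordered):
    """Median of an already-sorted, non-empty list."""
    n = len(ordered)
    mid = n // 2
    return ordered[mid] if n % 2 else (ordered[mid - 1] + ordered[mid]) / 2

def five_number_summary(xs):
    """Return (Min, Q1, M, Q3, Max); quartiles are medians of the lower/upper halves."""
    ordered = sorted(xs)
    n = len(ordered)
    if n < 2:
        raise ValueError("need at least two observations")
    q1 = _median_sorted(ordered[: n // 2])        # lower half (center value excluded if n is odd)
    m = _median_sorted(ordered)
    q3 = _median_sorted(ordered[(n + 1) // 2 :])  # upper half
    return ordered[0], q1, m, q3, ordered[-1]

scores = [80, 73, 92, 85, 75, 98, 93, 55, 80, 90]
print(five_number_summary(scores))  # (55, 75, 82.5, 92, 98)
```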

Boxplots

A boxplot is a graph of the five-number summary.
• Draw a central box from Q1 to Q3.
• Draw a line inside the box to mark the median M.
• Extend lines (whiskers) from the box out to the minimum and maximum values that are not outliers.
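
Boxplots (Cont…)

Example: Here are the scores on the first exam in an introductory statistics course for 10 students:

80 73 92 85 75 98 93 55 80 90

Make a boxplot for these first-exam scores.
Solution: In order, the scores are:
55, 73, 75, 80, 80, 85, 90, 92, 93, 98
Min = 55
Q1 = 75
M = 82.5
Q3 = 92
Max = 98
None of these scores is a suspected outlier under the 1.5 × IQR rule: IQR = 92 − 75 = 17, so the cutoffs are 75 − 25.5 = 49.5 and 92 + 25.5 = 117.5, and the whiskers extend all the way to Min and Max.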

Comparing Boxplots to Histograms

Boxplots and skewed data

[Figure: boxplots for a symmetric and a right-skewed distribution; vertical axis: years until death (0 to 15); groups: Disease X and Multiple Myeloma.]

Boxplots show symmetry or skew.

Suspected Outliers: 1.5 × IQR Rule

• Outliers are troublesome data points, and it is important to be able to identify them.
The interquartile range IQR is the distance between the first and third quartiles,
IQR = Q3 − Q1
• IQR is used as part of a rule of thumb for identifying outliers.

The 1.5 × IQR Rule for Outliers
Call an observation a suspected outlier if it falls more than 1.5 × IQR above the third quartile or below the first quartile.
• Suspected low outlier: any value < Q1 − 1.5 × IQR
• Suspected high outlier: any value > Q3 + 1.5 × IQR
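The rule translates directly into two cutoffs ("fences"). In the sketch below, the function names `iqr_fences` and `suspected_outliers` are our own, and `k = 1.5` is the multiplier from the rule. It uses the median-of-halves quartiles, so it reproduces Q1 = 75 and Q3 = 92 for the exam scores. The extra score of 20 in the last line is hypothetical, added only to show a value being flagged.

```python
def _median_sorted(ordered):
    """Median of an already-sorted, non-empty list."""
    n = len(ordered)
    mid = n // 2
    return ordered[mid] if n % 2 else (ordered[mid - 1] + ordered[mid]) / 2

def iqr_fences(xs, k=1.5):
    """Return (Q1 - k*IQR, Q3 + k*IQR)."""
    ordered = sorted(xs)
    n = len(ordered)
    if n < 2:
        raise ValueError("need at least two observations")
    q1 = _median_sorted(ordered[: n // 2])
    q3 = _median_sorted(ordered[(n + 1) // 2 :])
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

def suspected_outliers(xs, k=1.5):
    """Values below the low fence or above the high fence."""
    low, high = iqr_fences(xs, k)
    return [x for x in xs if x < low or x > high]

scores = [80, 73, 92, 85, 75, 98, 93, 55, 80, 90]
print(iqr_fences(scores))                # (49.5, 117.5)
print(suspected_outliers(scores))        # []
print(suspected_outliers(scores + [20])) # [20]  (hypothetical extra score)
```

Adding the value 20 also shifts Q1 (to 73 here), which is why the fences are recomputed from the full data each time.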

Suspected Outliers: 1.5 × IQR Rule (Cont..)

Individual #25 has a value of 7.9 years, which is 3.55 years above the third quartile. This is more than 1.5 × IQR = 3.225 years. Thus, individual #25 is a suspected outlier.

Suspected Outliers: 1.5 × IQR Rule (Cont..)

• Modified boxplots plot suspected outliers individually.
• The 8 largest call lengths are 438, 465, 479, 700, 700, 951, 1148, 2631.
• They are plotted as individual points, though 2 of them (the two values of 700) are identical and so do not appear separately.
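
Measuring Spread: The Standard Deviation

The most common measure of spread looks at how far each observation is from the mean. This measure is called the standard deviation.

• The standard deviation s measures, roughly, the average distance of the observations from their mean.
• It is calculated by

$$s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2}$$

• The quantity under the square root,

$$s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2,$$

is an average of the squared deviations (dividing by n − 1 rather than n) and is called the variance.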

Calculating The Standard Deviation
1. Calculate the mean.
2. Calculate each deviation: deviation = observation − mean.
3. Square each deviation.
4. Calculate the sum of the squared deviations.
5. Divide this sum by the degrees of freedom, df = n − 1; the result is the variance.
6. Calculate the square root of the variance; this is the standard deviation.

xi    xi − mean     (xi − mean)²
1     1 − 5 = −4    (−4)² = 16
3     3 − 5 = −2    (−2)² = 4
4     4 − 5 = −1    (−1)² = 1
4     4 − 5 = −1    (−1)² = 1
4     4 − 5 = −1    (−1)² = 1
5     5 − 5 = 0     (0)² = 0
7     7 − 5 = 2     (2)² = 4
8     8 − 5 = 3     (3)² = 9
9     9 − 5 = 4     (4)² = 16
Mean = 5            Sum = 0       Sum = 52

With n = 9 observations:
The variance = 52/(9 − 1) = 6.5
Standard deviation = √6.5 ≈ 2.55
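The six steps can be followed literally in code. This is a sketch on the nine observations from the table; the names `sample_variance` and `sample_sd` are our own. Python's `statistics.variance` and `statistics.stdev` also divide by n − 1 and should agree.

```python
import math

def sample_variance(xs):
    """Steps 1-5: sum of squared deviations divided by df = n - 1."""
    n = len(xs)
    if n < 2:
        raise ValueError("variance requires at least two observations")
    xbar = sum(xs) / n                      # Step 1: mean
    deviations = [x - xbar for x in xs]     # Step 2: deviation = observation - mean
    squared = [d * d for d in deviations]   # Step 3: square each deviation
    return sum(squared) / (n - 1)           # Steps 4-5: sum, then divide by df

def sample_sd(xs):
    """Step 6: square root of the variance."""
    return math.sqrt(sample_variance(xs))

data = [1, 3, 4, 4, 4, 5, 7, 8, 9]
print(sample_variance(data))      # 6.5
print(round(sample_sd(data), 2))  # 2.55
```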

Properties of The Standard Deviation
• s measures spread about the mean and should be used only when the mean is the measure of center.
• s = 0 only when all observations have the same value and there is no spread. Otherwise, s > 0.
• s is not resistant to outliers.
• s has the same units of measurement as the original observations.
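A quick numerical check of two of these properties: s = 0 for identical observations, and s is not resistant while the IQR is. The sketch uses `statistics.stdev` (which divides by n − 1) and a small hypothetical `iqr` helper using the median-of-halves quartiles. The altered value 90 in place of 9 is a hypothetical extreme observation.

```python
from statistics import stdev

def _median_sorted(ordered):
    """Median of an already-sorted, non-empty list."""
    n = len(ordered)
    mid = n // 2
    return ordered[mid] if n % 2 else (ordered[mid - 1] + ordered[mid]) / 2

def iqr(xs):
    """IQR = Q3 - Q1, quartiles taken as medians of the lower and upper halves."""
    ordered = sorted(xs)
    n = len(ordered)
    return _median_sorted(ordered[(n + 1) // 2 :]) - _median_sorted(ordered[: n // 2])

# s = 0 exactly when every observation is the same
same = [7, 7, 7, 7]
print(stdev(same))  # 0.0

# The nine observations from the standard deviation example
data = [1, 3, 4, 4, 4, 5, 7, 8, 9]
# Hypothetical outlier: the largest value 9 replaced by 90
altered = data[:-1] + [90]

print(round(stdev(data), 2), iqr(data))        # 2.55 4.0
print(round(stdev(altered), 2), iqr(altered))  # 28.57 4.0
```

One extreme value inflates s more than tenfold, while the IQR, which depends only on the middle half of the ordered data, does not change.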

Choosing Measures of Center and Spread

We now have a choice between two descriptions for center and spread:
• Mean and standard deviation
• Median and interquartile range
• The median and IQR are usually better than the mean and standard deviation for describing a skewed distribution or a distribution with outliers.
• Use the mean and standard deviation only for reasonably symmetric distributions that don’t have outliers.
NOTE: Numerical summaries do not fully describe the shape of a distribution. ALWAYS PLOT YOUR DATA FIRST!

Changing the Unit of Measurement
• Variables can be recorded in different units of measurement.
• Most often, one measurement unit is a linear transformation of another measurement unit: xnew = a + bx.
Example 1: If a distance x is measured in kilometers, the same distance in miles is xnew = 0.62x.
This transformation changes the units without changing the origin: a distance of 0 kilometers is the same as a distance of 0 miles.
Example 2: Temperatures can be expressed in degrees Fahrenheit or degrees Celsius; a Celsius temperature x becomes xnew = 32 + 1.8x in degrees Fahrenheit.
This transformation changes both the unit size and the origin of the measurements: the origin of the Celsius scale (0°C, the temperature at which water freezes) is 32° on the Fahrenheit scale.

Changing the Unit of Measurement (Cont…)

• Linear transformations do not change the basic shape of a distribution (skew, symmetry).
• But they do change the measures of center and spread:
  • Multiplying each observation by a positive number b multiplies both measures of center (mean, median) and spread (IQR, s) by b.
  • Adding the same number a (positive or negative) to each observation adds a to measures of center and to quartiles, but it does not change measures of spread (IQR, s).
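These rules can be checked numerically with the Celsius-to-Fahrenheit transformation xnew = 32 + 1.8x. The temperature list below is hypothetical, made up only for illustration, and `linear_transform` and `iqr` are our own helper names. Comparisons use `math.isclose` because floating-point arithmetic can introduce tiny rounding differences.

```python
import math
from statistics import mean, median, stdev

def _median_sorted(ordered):
    """Median of an already-sorted, non-empty list."""
    n = len(ordered)
    mid = n // 2
    return ordered[mid] if n % 2 else (ordered[mid - 1] + ordered[mid]) / 2

def iqr(xs):
    """IQR = Q3 - Q1, quartiles taken as medians of the lower and upper halves."""
    ordered = sorted(xs)
    n = len(ordered)
    return _median_sorted(ordered[(n + 1) // 2 :]) - _median_sorted(ordered[: n // 2])

def linear_transform(xs, a, b):
    """Apply x_new = a + b*x to every observation."""
    return [a + b * x for x in xs]

# Hypothetical daily temperatures in degrees Celsius (illustration only)
celsius = [12.0, 15.5, 9.0, 20.0, 17.5, 14.0, 11.0]
a, b = 32.0, 1.8                     # Fahrenheit = 32 + 1.8 * Celsius (b > 0 keeps the order)
fahrenheit = linear_transform(celsius, a, b)

# Center: both the shift a and the scale b act on the mean and the median
print(math.isclose(mean(fahrenheit), a + b * mean(celsius)))      # True
print(math.isclose(median(fahrenheit), a + b * median(celsius)))  # True
# Spread: only b matters; the shift a cancels out
print(math.isclose(stdev(fahrenheit), b * stdev(celsius)))        # True
print(math.isclose(iqr(fahrenheit), b * iqr(celsius)))            # True
```

The shape of the list (its ordering and relative gaps) is unchanged by the transformation, which is why only the summary values move.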
