Lesson 5.pdf ....probability and statistics

Measures of Dispersion
Measures of dispersion are also called measures of variation or measures of variability. To
see the need for such measures of dispersion, consider the two frequency curves shown in
the figure below. They are both unimodal and symmetric with the same means and
medians, but while one rises sharply on both sides of the mean the other shows less
concentration of the data near the mean with more dispersion outward.
Although the two curves differ markedly in dispersion, they have the same range, Xl – Xs.
Certainly, the range is important in identifying the outer limits of a data distribution, but it
gives no information on what is occurring between these limits. Also, the range is unreliable,
being highly sensitive to extreme values that tend to vary from sample to sample.
Range
The range of a given set of data is the difference between the largest and smallest values
in the data set.
Range = Largest value (Xl) – Smallest value (Xs)
Example: If the weights (in kg) of four students are 56, 63, 65 and 58, then the range of the
weights among the 4 students is Range = 65 – 56 = 9 kg.
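As a quick check of the example above, the range can be computed directly. This is only a minimal sketch; the function name data_range and the variable weights_kg are illustrative and not part of the lesson.

```python
# Weights (kg) of the four students from the example.
weights_kg = [56, 63, 65, 58]

def data_range(values):
    """Return the largest value minus the smallest value."""
    return max(values) - min(values)

print(data_range(weights_kg))  # 9
```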
The Mean Deviation
Of all the measures of central tendency presented so far, the arithmetic mean is by far the
most important and commonly used. Because of this it is necessary to have a measure of
dispersion around the mean. In measures of dispersion, the term mean will refer to the
arithmetic mean unless otherwise specified. A measure of dispersion around the mean should: (1) be
calculated from all the data, (2) show with a single number the typical or average dispersion
from the mean, and (3) increase, from data set to data set, with increasing dispersion. The
most direct candidate is the arithmetic mean of the deviations from the mean:

\[ \frac{\sum_{i=1}^{n} (x_i - \bar{x})}{n} \]
Whatever the dispersion of the data, this formula always gives zero, because the positive and
negative deviations from the mean cancel exactly. There are two accepted ways to solve this
problem, both of which eliminate the negative signs from the calculations. The first way is
shown in this formula:

\[ \text{Mean deviation} = \frac{\sum_{i=1}^{n} |x_i - \bar{x}|}{n} \]

where the numerator is now the sum of the absolute values of the deviations, and absolute
values are never negative. This computation is called the mean deviation (or the
average deviation or the mean absolute deviation). It shows the average size of the
deviations from the mean without regard to direction of deviation. It is zero when all values
in a sample are the same and increases across samples with increasing dispersion. While the
mean deviation is a legitimate measure of dispersion from the mean, it is rarely used
because it has limited value in theoretical statistics.
The second way to solve this problem, which we consider when we deal with the variance
and the standard deviation, is to square each deviation and use the sum of the squared
deviations in the calculations.
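EXAMPLE: Calculate the range and the mean deviation for the samples (a) x1 = 1 g, x2 = 3 g,
x3 = 2 g, x4 = 7 g, x5 = 5 g, x6 = 4 g, x7 = 2 g, (b) 1 g, 3 g, 2 g, 7 g, 5 g, 4 g, 200 g.
Solution:
(a) Range = 7 g – 1 g = 6 g. The mean is 24 g / 7 = 3.43 g and the absolute deviations from
it sum to 11.43 g, so the mean deviation is 11.43 g / 7 = 1.63 g.
(b) Range = 200 g – 1 g = 199 g. The mean is 222 g / 7 = 31.71 g and the absolute deviations
from it sum to 336.57 g, so the mean deviation is 336.57 g / 7 = 48.08 g.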
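The calculation for the two samples can be sketched in Python as follows. The helper name mean_deviation is an illustrative choice, not something defined in the lesson. The sketch also shows that the signed deviations cancel to zero (up to floating-point rounding), which is why absolute values are needed.

```python
from statistics import mean

def mean_deviation(values):
    """Mean of the absolute deviations from the arithmetic mean."""
    centre = mean(values)
    return sum(abs(x - centre) for x in values) / len(values)

sample_a = [1, 3, 2, 7, 5, 4, 2]      # grams
sample_b = [1, 3, 2, 7, 5, 4, 200]    # grams, with one extreme value

for label, sample in (("a", sample_a), ("b", sample_b)):
    centre = mean(sample)
    signed_sum = sum(x - centre for x in sample)  # cancels to (almost) zero
    value_range = max(sample) - min(sample)
    print(label, value_range, abs(signed_sum) < 1e-9,
          round(mean_deviation(sample), 2))
```

The single extreme value in sample (b) inflates both the range and the mean deviation, which illustrates how sensitive these measures are to outliers.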
Frequency Distribution Formula for Mean Deviation
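Just as there is a frequency-distribution formula for the sample mean, there is a
frequency-distribution formula for the sample mean deviation:

\[ \text{Mean deviation} = \frac{\sum_{i=1}^{k} f_i \, |x_i - \bar{x}|}{\sum_{i=1}^{k} f_i} \]

where f_i is the frequency of the value x_i and k is the number of distinct values.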
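A minimal sketch of the frequency-distribution calculation is given here. The values and frequencies are hypothetical and chosen only for illustration; they are not the data of the example that follows. The result is checked against the calculation on the expanded raw data.

```python
# Hypothetical frequency distribution: value -> frequency.
values = [10, 20, 30]
freqs = [2, 5, 3]

n = sum(freqs)
x_bar = sum(f * x for f, x in zip(freqs, values)) / n

# Each absolute deviation is weighted by how often the value occurs.
md_freq = sum(f * abs(x - x_bar) for f, x in zip(freqs, values)) / n

# Same calculation on the expanded raw data, as a cross-check.
raw = [x for f, x in zip(freqs, values) for _ in range(f)]
md_raw = sum(abs(x - x_bar) for x in raw) / len(raw)

print(x_bar, md_freq, md_raw)
```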
EXAMPLE: Calculate the range and the mean deviation for the sample data given below:
Solution:
Range = 1.8 cm – 1.2 cm = 0.6 cm.
Variance
The quantity called the sum of squares (and denoted by SS) for a set of sample data is the
sum of the squared deviations from the mean:

\[ SS = \sum_{i=1}^{n} (x_i - \bar{x})^2 \]
The variance (or mean squared deviation, or mean sum of squares) of a set of data is the
arithmetic mean of its squared deviations from the arithmetic mean. It is therefore defined
by the definitional formula for the variance
The numerator is the sample sum of squares (SS).
EXAMPLE: Calculate the variance for this sample of lengths (in cm): 3, 4, 5, 6, 7.
Solution:
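The following sketch works through this sample in Python. Because the divisor used in the lesson's formula is not visible here, both common conventions are shown: dividing SS by n and dividing SS by n – 1. Which one the lesson intends is an assumption left open. The variable names are illustrative.

```python
lengths_cm = [3, 4, 5, 6, 7]
n = len(lengths_cm)
x_bar = sum(lengths_cm) / n

# Sum of squares from the definition: squared deviations from the mean.
ss_definitional = sum((x - x_bar) ** 2 for x in lengths_cm)

# Algebraically equivalent form: sum of x^2 minus (sum of x)^2 / n.
ss_computational = sum(x * x for x in lengths_cm) - sum(lengths_cm) ** 2 / n

variance_div_n = ss_definitional / n               # divisor n
variance_div_n_minus_1 = ss_definitional / (n - 1)  # divisor n - 1

print(ss_definitional, ss_computational)
print(variance_div_n, variance_div_n_minus_1)  # in cm squared
```

Here SS = 10, so the variance is 2.0 cm² with divisor n and 2.5 cm² with divisor n – 1.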
The algebraically equivalent derived computational formulae for the sample variance are:

Standard Deviation
The sample standard deviation is defined by these definitional formulae:

And it has these computational formulae:
The standard deviation is the most important and commonly used measure of dispersion from
the mean in both descriptive and inferential statistics.
Example: Compute the standard deviation of this sample of lengths (in cm): 3, 4, 5, 6, 7.
Solution:
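As a hedged check, Python's standard statistics module offers both conventions for the standard deviation: stdev divides the sum of squares by n – 1, and pstdev divides it by n. Which divisor the lesson's formula uses is not visible here, so both are printed.

```python
from statistics import pstdev, stdev

lengths_cm = [3, 4, 5, 6, 7]

sd_n_minus_1 = stdev(lengths_cm)  # square root of SS / (n - 1)
sd_n = pstdev(lengths_cm)         # square root of SS / n

print(round(sd_n_minus_1, 2), round(sd_n, 2))  # in cm
```

The standard deviation is the square root of the variance, so it is in the original units (cm) rather than squared units.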
Calculating Standard Deviations from Non-grouped Frequency Distributions
The computational frequency-distribution formulas for sample standard deviations are:

Example: Calculate the standard deviation of the following data set
Solution:

Therefore, √
Calculating Approximate Standard Deviations from Grouped Frequency Distributions
A standard deviation calculated from a grouped frequency distribution will only
approximate the exact value calculated directly from the data, and it is therefore called an
approximate standard deviation. To make this calculation from grouped data requires the
assumption that all values in a class are equal to the class mark mi. The computational
formula for the standard deviation of sample data is given by:
Example: Calculate the approximate standard deviation from the grouped frequency
distribution given by:

Solution:
Therefore: √
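To illustrate why a grouped calculation is only approximate, the sketch below compares the exact standard deviation of some raw data with the value obtained after replacing every observation by its class mark. The raw data and class limits are hypothetical, and the divisor n – 1 is an assumption, since the lesson's formula is not reproduced here.

```python
from math import sqrt
from statistics import stdev

# Hypothetical raw data and class limits (inclusive).
raw = [12, 15, 17, 21, 22, 24, 28, 31, 33, 38]
class_limits = [(10, 19), (20, 29), (30, 39)]

# Class mark = midpoint of the class; frequency = count of values in it.
marks = [(lo + hi) / 2 for lo, hi in class_limits]
freqs = [sum(lo <= x <= hi for x in raw) for lo, hi in class_limits]

n = sum(freqs)
sum_fm = sum(f * m for f, m in zip(freqs, marks))
sum_fm2 = sum(f * m * m for f, m in zip(freqs, marks))

# Computational form: SS = sum(f m^2) - (sum(f m))^2 / n, divisor n - 1 assumed.
approx_sd = sqrt((sum_fm2 - sum_fm ** 2 / n) / (n - 1))
exact_sd = stdev(raw)

print(freqs, round(approx_sd, 2), round(exact_sd, 2))
```

The two results are close but not equal, because the grouping discards the position of each value within its class.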
Variance from an Assumed Mean
Let A be the assumed mean, and let di be the difference between observation Xi of the data
set and A, so that di = Xi – A. The variance S^2 of the data set is then given by

where d̄, the mean of the differences di, is given by
In the case where the observations have corresponding frequencies, the formulae change to
and
Example
Consider the data set given by 2, 3, 5, 10. Let the assumed mean A be 4. Use this information
to compute the variance of the data.
Solution
The data can be arranged in tabular form as below, and the computations made as shown.
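The computation can be sketched in Python as follows. This assumes the divisor-n form of the variance, S^2 = (sum of di^2)/n – (mean of di)^2, which may differ from the formula used in the lesson; the variable names are illustrative. The result is compared with statistics.pvariance, which also divides by n.

```python
from statistics import pvariance

data = [2, 3, 5, 10]
A = 4                      # assumed mean

d = [x - A for x in data]  # differences from the assumed mean: -2, -1, 1, 6
n = len(data)
d_bar = sum(d) / n         # mean of the differences

# Shifting every value by A does not change the spread,
# so the variance of d equals the variance of the data.
variance_from_d = sum(di ** 2 for di in d) / n - d_bar ** 2
true_mean = A + d_bar

print(d, d_bar, true_mean, variance_from_d, pvariance(data))
```

With divisor n the variance is 9.5. Under the n – 1 convention the same sum of squares (38) would give 38 / 3, about 12.67.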
Example
Consider the set of observations in the table below.

Take the assumed mean A = 40 and compute the variance of the data.
Solution
One can arrange the data in table form and perform computations as shown below.
The Coefficient of Variation
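The coefficient of variation (also called the coefficient of variability, the coefficient of
dispersion, or the relative standard deviation) is defined for sample data either as a
proportion or as a percentage:

\[ \text{Coefficient of variation} = \frac{s}{\bar{x}} \qquad \text{or} \qquad \frac{s}{\bar{x}} \times 100\% \]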
The measures of dispersion we have dealt with previously (range, mean deviation, variance,
standard deviation) are called measures of absolute dispersion because they are calculated
directly from the data and have the units of the original measurements or those units
squared. The coefficient of variation, on the other hand, is called a measure of relative
dispersion because it expresses a measure of absolute dispersion as a proportion (or
percentage) of some measure of average value that is in the same units as the measure of
dispersion. Because the numerator and denominator of the ratios in the measure have the
same units, the resulting measure of relative dispersion has no units.
Example:
You are a biologist studying genetic variation within different species of rodents. One
measure you take for each rodent is body weight in grams. For a sample of 10 males of the
white-footed mouse, you get these results: mean = 12.9 g, s = 1.6 g; and for 8 males of the
plains pocket gopher you get these: mean = 545.0 g, s = 32.8 g. Compare the relative
dispersions of these two species.
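Solution:
For the mice, the coefficient of variation is 1.6 g / 12.9 g = 0.124, or 12.4%; for the
pocket gophers it is 32.8 g / 545.0 g = 0.060, or 6.0%.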
These results show that there is twice as much relative dispersion of body weight among the
mice as there is among the pocket gophers. This greater variation relative to the mean is not
apparent from the standard deviations, which show twenty times more absolute variation
among the pocket gophers.
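The comparison can be reproduced with a few lines of Python. The function name coefficient_of_variation is an illustrative choice; the inputs are the summary statistics quoted in the example.

```python
def coefficient_of_variation(sd, mean):
    """Standard deviation expressed as a proportion of the mean (unitless)."""
    return sd / mean

cv_mice = coefficient_of_variation(1.6, 12.9)
cv_gophers = coefficient_of_variation(32.8, 545.0)

print(round(cv_mice * 100, 1), round(cv_gophers * 100, 1))  # percentages
print(round(cv_mice / cv_gophers, 1))   # relative dispersion ratio
print(round(32.8 / 1.6, 1))             # absolute dispersion ratio
```

The ratio of the two coefficients is about 2.1, while the ratio of the standard deviations is 20.5 in the opposite direction, matching the conclusion stated in the solution.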
The Standard Score and The Standardized Variable
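For a sample, the standard score (also called the normal deviate, or z score) is defined as

\[ z_i = \frac{X_i - \bar{X}}{s} \]

where \bar{X} is the sample mean and s is the sample standard deviation.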
For any data distribution, the standard score shows how far any given data value Xi is from
the mean of the distribution in standard deviation units, that is, how many standard
deviations the value is from the mean. A positive z value indicates that Xi is larger than
the mean (to its
right in a histogram or polygon) and a negative z value indicates that Xi is smaller than the
mean (to its left). Like the coefficient of variation, the standard score is a relative measure;
while the coefficient shows absolute dispersions relative to their means, the standard score
shows deviations from the mean relative to the standard deviation. Because its units are
numbers of standard deviations, the standard score allows comparisons of relative positions
within distributions that have very different means or different measurement units. When
for any variable X each measurement value in a sample or population is transformed into a z
value, this process is known as standardizing (or normalizing) the variable, and the
resulting variable Z is called a standardized variable.
Example:
Standardize the sample: 3, 5, 7, 9, 11.
Solution:
To standardize the sample is to calculate a standard score Zi for each Xi. These scores are
typically reported, as shown below, rounded to the nearest hundredth.
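A minimal sketch of the standardization in Python is shown here. It assumes the sample standard deviation with divisor n – 1 (statistics.stdev); with divisor n the scores would be larger in magnitude. The variable names are illustrative.

```python
from statistics import mean, stdev

sample = [3, 5, 7, 9, 11]

x_bar = mean(sample)  # 7
s = stdev(sample)     # divisor n - 1 assumed; equals sqrt(10) here

# One standard score per observation, rounded to the nearest hundredth.
z_scores = [round((x - x_bar) / s, 2) for x in sample]

print(z_scores)  # [-1.26, -0.63, 0.0, 0.63, 1.26]
```

The scores are symmetric about zero because the sample is symmetric about its mean.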
