Principles of statistics

SECTION 1
Statistics is the science of collecting, organizing, summarizing, presenting, analyzing, and interpreting data, and of drawing conclusions from the data in order to reach the best decision.
Statistics is divided into two distinct parts:
1- Descriptive Statistics: concerned only with the collection, organization, summarization, analysis, and presentation of a set of qualitative or quantitative data. Descriptive statistics include the mean, median, mode, standard deviation, range, etc.
2- Inferential Statistics: consists of methods for drawing conclusions about a population from sample data in order to reach the best decision. It is also divided into two parts:
A- Estimation
B- Hypothesis Testing
Population: the complete collection of all elements to be studied.
Finite (countable) Population: a population is called finite if it is possible to count its individuals, for example the number of students in Shaqlawa Technical Institute or the number of computers in a laboratory.
Infinite (uncountable) Population: a population is called infinite if it is impossible to count its individuals, for example the number of bacteria in a garden or the number of fish in a sea.
Census: the collection of data from every element of the population.
Sample: a sub-collection of elements drawn from a population.
Sampling: the process of selecting a subset of elements from the population.

Sources of collecting the data:
1- Historical Sources
2- Field Sources
Probability (random) samples are drawn from populations through several different sampling methods:
1- Simple Random Sampling
Every member of the population (N) has an equal chance of being selected for the sample (n). This is arguably the best sampling method, as the sample is very likely to be representative of the population. However, it is often impractical for large populations, since it requires a complete list of all members.
2- Systematic Sampling
In this method, every kth individual from the population (N) is placed in the sample (n). For example, if you add every 7th individual who walks out of a supermarket to your sample, you are performing systematic sampling.
3- Stratified Sampling
A general problem with random sampling is that you could, by chance, miss a particular group in the sample. However, if you divide the population into groups and sample from each group, you can make sure the sample is representative. In stratified sampling, the population is divided into groups called strata, and a sample is then drawn from within each stratum. Some examples of strata commonly used by the ABS are state, age, and sex. Other strata may be religion, academic ability, or marital status.
[Figure: methods of collecting the data. Census; Samples: Probability (random) and Non-probability]
4- Multi-Stage Sampling
Multi-stage sampling is like cluster sampling, but involves selecting a sample
within each chosen cluster, rather than including all units in the cluster. Thus,
multi-stage sampling involves selecting a sample in at least two stages. In the first
stage, large groups or clusters are selected. These clusters are designed to contain
more population units than are required for the final sample. In the second stage,
population units are chosen from selected clusters to derive a final sample. If
more than two stages are used, the process of choosing population units within
clusters continues until the final sample is achieved.
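The sampling methods above can be illustrated with a minimal Python sketch. The population of 100 numbered units, the two strata, the seed values, and the function names are all hypothetical and chosen only for illustration; this is one possible way to express the ideas, not a prescribed procedure.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw n distinct elements; every element has the same chance of selection."""
    rng = random.Random(seed)
    return rng.sample(population, n)

def systematic_sample(population, k, start=0):
    """Take every k-th element, beginning at index `start`."""
    return population[start::k]

def stratified_sample(strata, n_per_stratum, seed=None):
    """Draw a simple random sample of fixed size from every stratum."""
    rng = random.Random(seed)
    return {name: rng.sample(members, n_per_stratum)
            for name, members in strata.items()}

# Hypothetical population: 100 units labelled 1..100
population = list(range(1, 101))
# Hypothetical strata: the population split into two groups
strata = {"first_half": population[:50], "second_half": population[50:]}

srs = simple_random_sample(population, 10, seed=1)
sys_sample = systematic_sample(population, 7, start=6)  # the 7th, 14th, 21st, ... unit
strat = stratified_sample(strata, 5, seed=1)
```

Sampling the same number of units from every stratum is only one design choice; sampling in proportion to stratum size is another common option.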
Variable: a characteristic or property of the elements in the population. The name is derived from the fact that any particular characteristic may vary among the elements in a population.
Variables are classified as follows:
1- Quantitative variables
A- Discrete variables (e.g. number of students)
B- Continuous variables (e.g. height, weight)
2- Qualitative (descriptive) variables

SECTION 2
Frequency Distribution (Table):
After a researcher has obtained raw data from any source, the raw (ungrouped) data need to be arranged and organized in a meaningful way so that they can be described and useful inferences can be drawn. The method used for this organization is called a frequency distribution. Frequency means the number of times something happens. A frequency distribution organizes raw data in table form using classes and frequencies.
1- Frequency Distribution for Qualitative Variables:
A frequency distribution for a qualitative variable lists all classes and the number of elements that belong to each class.
Example 1: The following list gives the rank of each member of a sample of 25 employees at Soran Institute:
Researcher, Assistant Researcher, Assistant Researcher, Lecturer, Assistant Researcher,
Assistant lecturer, Assistant lecturer, Researcher, Lecturer, Researcher, Assistant
Researcher, Researcher, Assistant Researcher, Assistant lecturer, Assistant Researcher,
Lecturer, Assistant Researcher, Assistant lecturer, Assistant lecturer, Researcher,
Lecturer, Assistant Researcher, Assistant Researcher, Assistant Researcher, Researcher.
Create a frequency distribution for the above data.
Solution:
Classes (rank)          Frequency
Lecturer                4
Assistant lecturer      5
Researcher              6
Assistant Researcher    10
Total                   25

Relative Frequency of a Class:
The relative frequency of a class is obtained by dividing the frequency of the class by the sum of all frequencies.
Example 2: Using the data of Example 1, calculate the relative frequencies.
Solution:
Classes (rank)          Frequency   Relative Frequency
Lecturer                4           4/25 = 0.16
Assistant lecturer      5           5/25 = 0.2
Researcher              6           6/25 = 0.24
Assistant Researcher    10          10/25 = 0.4
Total                   25          1
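As a cross-check of Examples 1 and 2, the counting can be sketched in Python with `collections.Counter`. The short aliases `R`, `AR`, `L`, `AL` and the variable names are introduced here only to keep the list readable; the data are the 25 ranks listed in Example 1, in the same order.

```python
from collections import Counter

R, AR, L, AL = "Researcher", "Assistant Researcher", "Lecturer", "Assistant lecturer"

# The 25 ranks of Example 1, in the order given
ranks = [R, AR, AR, L, AR,
         AL, AL, R, L, R, AR,
         R, AR, AL, AR,
         L, AR, AL, AL, R,
         L, AR, AR, AR, R]

frequency = Counter(ranks)                      # class -> number of occurrences
total = sum(frequency.values())                 # sum of all frequencies
relative = {rank: f / total for rank, f in frequency.items()}

for rank in (L, AL, R, AR):
    print(f"{rank:22s} {frequency[rank]:3d} {relative[rank]:.2f}")
```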
2- Frequency Distribution for Quantitative Variables
Total Range (T.R): the highest value minus the lowest value in the data set.
Number of classes: the appropriate number of classes may be decided by Yule's formula, which is as follows:
Number of classes = $2.5 \sqrt[4]{n}$, where n is the total number of observations.
Class Width (Length): the difference between two consecutive lower class limits or two consecutive lower class boundaries. The class width can be found by the following formula:
Class Width = T.R / Number of classes
Frequency (F): the number of values in a specific class of the distribution.

A- Frequency Distribution for Discrete variables:
The lower and upper limits of the classes in a frequency distribution of a discrete variable are as follows:
Class   Lower limit    Upper limit    Frequency
1       Xs             Xs+W-1         f1
2       Xs+W           Xs+2W-1        f2
3       Xs+2W          Xs+3W-1        f3
...     ...            ...            ...
M       Xs+(M-1)W      Xs+M.W-1       fM
Where:
Xs: the lowest value
W: class width
M: number of classes
Example 3: Construct the frequency distribution for the following data:
60 76 80 120 132 82 90 65 68 142 157 164 88
90 98 101 103 110 119 116 120 126 109 114 120 122
111 116 90 78 93 95 98 104 120 113 121 119 125
126 130 131 136 118 120 142 150 154 122 123 139 125
106 154 136 137 110 137 72 150
Solution:
Total Range (T.R) = 164 - 60 = 104
Number of Classes (M) = $2.5 \sqrt[4]{60}$ = 2.5(2.783) = 6.958 ≈ 7
Length of Classes (L) = 104/7 = 14.86 ≈ 15

Class       Frequency   Midpoint            Relative Frequency
60 - 74     4           (60+74)/2 = 67      4/60 = 0.067
75 - 89     5           (75+89)/2 = 82      5/60 = 0.083
90 - 104    10          97                  10/60 = 0.167
105 - 119   12          112                 12/60 = 0.200
120 - 134   16          127                 16/60 = 0.267
135 - 149   7           142                 7/60 = 0.117
150 - 164   6           157                 6/60 = 0.100
Total       60                              1
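The construction in Example 3 can be sketched in Python as follows. The function name `frequency_table` is hypothetical, and two rounding choices are assumptions made to match the worked example: the number of classes from Yule's formula is rounded to the nearest integer, and the class width is rounded up.

```python
import math

data = [60, 76, 80, 120, 132, 82, 90, 65, 68, 142, 157, 164, 88,
        90, 98, 101, 103, 110, 119, 116, 120, 126, 109, 114, 120, 122,
        111, 116, 90, 78, 93, 95, 98, 104, 120, 113, 121, 119, 125,
        126, 130, 131, 136, 118, 120, 142, 150, 154, 122, 123, 139, 125,
        106, 154, 136, 137, 110, 137, 72, 150]

def frequency_table(values):
    """Return rows (lower, upper, frequency, midpoint, relative frequency)."""
    n = len(values)
    lowest = min(values)
    total_range = max(values) - lowest
    m = round(2.5 * n ** 0.25)            # Yule's formula (rounded: an assumption)
    width = math.ceil(total_range / m)    # class width (rounded up: an assumption)
    rows = []
    for j in range(m):
        lower = lowest + j * width
        upper = lower + width - 1         # discrete classes: Xs+(j+1)W-1
        f = sum(lower <= v <= upper for v in values)
        rows.append((lower, upper, f, (lower + upper) / 2, f / n))
    return rows

rows = frequency_table(data)
for lower, upper, f, mid, rel in rows:
    print(f"{lower:3d} - {upper:3d}  {f:2d}  {mid:5.1f}  {rel:.3f}")
```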
B- Frequency Distribution for continuous variables:
The lower and upper limits of the classes in a frequency distribution of a continuous variable are as follows:
Class   Lower limit    Upper limit    Frequency
1       Xs             Xs+W           f1
2       Xs+W           Xs+2W          f2
3       Xs+2W          Xs+3W          f3
...     ...            ...            ...
M       Xs+(M-1)W      Xs+M.W         fM
Example 4: Construct a frequency distribution for the following data:
1.3 4.1 5.7 6.5 7.9 10.4 2 4.2 5.7 6.5 8.2 8.3 6.8
5.7 4.3 10.4 2.1 2.8 4.3 10.8 5.8 6.9 8.3 8.4 7 11.3
5.8 4.7 3.3 3.3 4.8 5.9 7 8.9 9.1 7.3 6 5.1 3.5
3.7 5.1 6.2 7.6 9.2 9.7 7.8 6.4 5.3 6.4 7.9

Cumulative Frequency Distribution
A- Ascending Cumulative Frequency Distribution
The ascending cumulative frequency of a class is the total frequency of all values less than the upper class boundary of that class.
Example 5: Construct an ascending cumulative frequency distribution for the data in Example 3.
Classes    Frequency   Upper Limit of Class   Ascending Cumulative Frequency
60 - 74    4           74                     Less than or equal to 74 = 4
75 - 89    5           89                     Less than or equal to 89 = 9
...
Total      60
B- Descending Cumulative Frequency Distribution
The descending cumulative frequency of a class is the total frequency of all values greater than the lower class boundary of that class.
Example 6: Construct a descending cumulative frequency distribution for the data in Example 4.
Classes   Frequency   Lower Limit of Class   Descending Cumulative Frequency
0 - 2     1           0                      Greater than or equal to 0 = 50
...
Total     50
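Both kinds of cumulative frequency are running totals, which can be sketched in Python with `itertools.accumulate`. The frequencies used here are those of the Example 3 table; the variable names are illustrative only.

```python
from itertools import accumulate

# Class frequencies from Example 3 (classes 60-74, 75-89, ..., 150-164)
frequencies = [4, 5, 10, 12, 16, 7, 6]

# Ascending: running total from the first class downwards
ascending = list(accumulate(frequencies))

# Descending: running total from the last class upwards, listed in class order
descending = list(accumulate(reversed(frequencies)))[::-1]

print(ascending)
print(descending)
```

The last ascending value and the first descending value both equal the total number of observations, which is a quick consistency check on a hand-built table.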

Charts
Statistical data are presented graphically using statistical charts. There are several kinds of charts for representing a set of data, such as:
Bar Charts
A bar chart is composed of bars whose heights are the frequencies of the different classes. It is used for qualitative variables.
Example 7: Display the following data as a bar chart.
Red, Green, Green, Green, Blue, Blue, Red, Blue, Green, Green, Red, Red, Blue, Green,
Red, Red
Solution:
In the first step we create a frequency table for the data:
Color   Frequency
red     6
green   6
blue    4
Then we use this table to draw the bar chart.
[Figure: bar chart of Frequency (vertical axis, 0 to 7) against Color (red, green, blue)]

Histogram
A histogram is similar to a bar chart, but it is used to represent quantitative variables rather than qualitative variables.
Example 8: Draw a histogram for the following frequency distribution.
Classes Frequency
60 - 74 4
75 – 89 5
90 – 104 10
105 – 119 12
120 – 134 16
135 – 149 7
150 – 164 6
Total 60
Solution:
[Figure: histogram of Frequency (vertical axis, 0 to 18) against Classes (60 - 74, 75 - 89, 90 - 104, 105 - 119, 120 - 134, 135 - 149, 150 - 164)]

Pie Chart
A pie chart is a circle divided into sectors, where each sector represents a category of the data and its size is proportional to the relative frequency of that class.
The angle of each sector is calculated by the following rule:
Angle size of class = relative frequency of the class × 360°
Example 9: Draw a pie chart for the data in Example 1.
Class                   Frequency   Relative Frequency   Angle Size
Lecturer                4           4/25 = 0.16          0.16 × 360 = 57.6°
Assistant lecturer      5           5/25 = 0.2           0.2 × 360 = 72°
Researcher              6           6/25 = 0.24          0.24 × 360 = 86.4°
Assistant Researcher    10          10/25 = 0.4          0.4 × 360 = 144°
Total                   25          1                    360°
[Figure: pie chart with sectors Lecturer (57.6°), Assistant lecturer (72°), Researcher (86.4°), Assistant Researcher (144°)]

Frequency Polygon
A frequency polygon is a chart that displays the data by using lines that connect points plotted for the frequencies at the midpoints of the classes.
Example 10: Draw a frequency polygon for the frequency distribution in Example 3.
[Figure: frequency polygon of Frequency (vertical axis, 0 to 18) against Midpoints (67, 82, 97, 112, 127, 142, 157)]
Frequency Curve
A frequency curve is like a frequency polygon, with one difference: instead of straight lines, a smooth curve is used to connect the points at the midpoints.
Example 11: Draw a frequency curve for the data in Example 4.
[Figure: frequency curve of Frequency (1, 7, 15, 15, 8, 4) against Midpoints (1, 3, 5, 7, 9, 11)]

Cumulative Frequency Chart
It is a chart that represents the cumulative frequencies of the classes in a frequency distribution.
Example 12: Construct an ascending cumulative frequency chart for the data in Example 4.
[Figure: ascending cumulative frequency chart of Cumulative frequency (vertical axis, 0 to 60) against Upper Limit of Classes (2, 4, 6, 8, 10, 12)]
Example 13: Construct a descending cumulative frequency chart for the data in Example 4.
[Figure: descending cumulative frequency chart of Cumulative frequency (vertical axis, 0 to 60) against Lower Limit of Classes]

Exercise 1: complete the following frequency distributions if the widths of
classes are equal.
Class Midpoint Class Midpoint
3 8 6
18
Class Midpoint
14
26
Exercise 2: The heights of 35 students were recorded as follows:
170 180 175 165 160 155 180 190 185 170 174 178
165 169 186 179 161 171 159 168 177 164 191 140
173 181 177 173 166 162 168 184 168 158 155
Find the following:
1- Frequency distribution
2- Midpoints
3- Descending cumulative frequency
4- Relative frequency
And draw:
a) Histogram b) frequency polygon

SECTION 3
Notations
In this section we present some useful notation before explaining the measures of central tendency and the measures of dispersion (variation).
1- Summation Notation ($\sum$)
The symbol $\sum_{i=1}^{n} X_i$ is read as "the summation of X", where n is the number of observations and i is the subscript for the order of the values.
Let X be a variable that takes 4 values: 2, 3, 5, and 10. Then the sum of X is written as follows:
$$\sum_{i=1}^{4} X_i = X_1 + X_2 + X_3 + X_4 = 2 + 3 + 5 + 10 = 20$$
Symbol and operation:
$\sum_{i=1}^{n} X_i = X_1 + X_2 + \dots + X_n$ : Sum of observations
$\sum_{i=1}^{n} X_i^2 = X_1^2 + X_2^2 + \dots + X_n^2$ : Sum of squares of observations
$\left(\sum_{i=1}^{n} X_i\right)^2 = (X_1 + X_2 + \dots + X_n)^2$ : Square of sum of observations
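Let X and Y be variables and let a and b be constants; then:
$$\sum_{i=1}^{n} a = n \cdot a$$
$$\sum_{i=1}^{n} aX_i = a \sum_{i=1}^{n} X_i$$
$$\sum_{i=1}^{n} (X_i \pm Y_i) = \sum_{i=1}^{n} X_i \pm \sum_{i=1}^{n} Y_i$$
$$\sum_{i=1}^{n} (X_i \pm a) = \sum_{i=1}^{n} X_i \pm n \cdot a$$
$$\sum_{i=1}^{n} X_i Y_i = X_1Y_1 + X_2Y_2 + \dots + X_nY_n$$
$$\sum_{i=1}^{n} (aX_i \pm bY_i) = a \sum_{i=1}^{n} X_i \pm b \sum_{i=1}^{n} Y_i$$
$$\sum_{i=1}^{n} X_i^2 \neq \left(\sum_{i=1}^{n} X_i\right)^2$$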
Example 1: If Xi represents the values 4, 3, 5 and 1, find the following:
a- $\sum_{i=1}^{n} X_i$
b- $\sum_{i=1}^{n} X_i^2$
c- $\sum_{i=1}^{n} 2X_i$
d- $\sum_{i=1}^{n} (X_i - 3)$
Solution:
a) $\sum_{i=1}^{4} X_i = X_1 + X_2 + X_3 + X_4 = 4 + 3 + 5 + 1 = 13$
b) $\sum_{i=1}^{4} X_i^2 = X_1^2 + X_2^2 + X_3^2 + X_4^2 = 4^2 + 3^2 + 5^2 + 1^2 = 51$
c) $\sum_{i=1}^{4} 2X_i = 2 \sum_{i=1}^{4} X_i = 2(13) = 26$
d) $\sum_{i=1}^{4} (X_i - 3) = \sum_{i=1}^{4} X_i - \sum_{i=1}^{4} 3 = 13 - 3(4) = 13 - 12 = 1$
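The four sums of Example 1 can be checked with a few lines of Python. The variable names are illustrative; the last line is included to show numerically that the sum of squares and the square of the sum are different quantities.

```python
x = [4, 3, 5, 1]                        # the values of Example 1

sum_x = sum(x)                          # sum of the observations
sum_sq = sum(v ** 2 for v in x)         # sum of squares
sum_2x = sum(2 * v for v in x)          # constant factor inside the sum
sum_shift = sum(v - 3 for v in x)       # constant subtracted from every term
square_of_sum = sum(x) ** 2             # not the same as the sum of squares

print(sum_x, sum_sq, sum_2x, sum_shift, square_of_sum)
```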

2- Product (Pi) Notation ($\prod$)
The symbol $\prod_{i=1}^{n} X_i$ denotes the multiplication of all the values of Xi, or:
$$\prod_{i=1}^{n} X_i = X_1 \cdot X_2 \cdots X_n$$
If a is a constant, then:
$$\prod_{i=1}^{n} a = a^n$$
$$\prod_{i=1}^{n} aX_i = a^n \prod_{i=1}^{n} X_i$$
Example 2: If Xi represents the values 4, 2, 5 and 3, find the following:
a- $\prod_{i=1}^{n} X_i$
b- $\prod_{i=1}^{n} 5X_i$
Solution:
a) $\prod_{i=1}^{4} X_i = X_1 \cdot X_2 \cdot X_3 \cdot X_4 = 4 \times 2 \times 5 \times 3 = 120$
b) $\prod_{i=1}^{4} 5X_i = 5^4 \prod_{i=1}^{4} X_i = 5^4 (120) = 75000$
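Example 2 can be verified in Python with `math.prod` (available from Python 3.8). The variable names are illustrative only.

```python
import math

x = [4, 2, 5, 3]                                   # the values of Example 2

product_x = math.prod(x)                           # X1 * X2 * X3 * X4
product_5x = math.prod(5 * v for v in x)           # product of the scaled values
via_rule = 5 ** len(x) * product_x                 # a^n times the product of Xi

print(product_x, product_5x, via_rule)
```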
Exercise: If
Xi: 5, 3, 4, 2 and Yi: 3, 1, 4, 2 then find the following:
a- 
4
1
2
i
iX b- 
4
1
3
i
iY c- 2
4
1
. i
i
i YX
d-  

n
i
ii YX
1
e- 4
4
1
66 i
f- 
4
1i
iX g- 
n
i
iY
1
4 h- i
n
i
i YX .2
1

j- 
4
1i i
i
Y
X
k-   2.3
4
1

i
i
i YX

SECTION 4: MEASURES OF CENTRAL TENDENCY
In the previous sections we studied how to collect raw data and how to classify and tabulate it in a useful form, which helps in solving many statistical problems. Yet this is not sufficient: for practical purposes there is a need for further condensation, particularly when we want to compare two or more distributions. We may reduce the entire distribution to one number that represents it.
A single value that can be considered typical or representative of a set of observations, and around which the observations can be considered to be centered, is called an average (or average value) or a center of location. Since such typical values tend to lie centrally within a set of observations arranged according to magnitude, averages are called measures of central tendency.
So a measure of central tendency is a value at the center or middle of a data set that represents all the data of the group.
The fundamental measures of central tendency are:
(1) Arithmetic Mean
(2) Weighted Mean
(3) Harmonic Mean
(4) Quadratic Mean
(5) Mode
(6) Median
However, the most common measures of central tendency (location) are the arithmetic mean, the median, and the mode.

1) Arithmetic Mean
The arithmetic mean (generally called the mean) is obtained by adding all the observations (values of all items) together and dividing this sum by the number of observations (or items). The symbol $\bar{X}$ (pronounced "X bar") represents the sample mean and $\mu$ represents the population mean.
Arithmetic mean for ungrouped data
Suppose we have n observations (or measures) $X_1, X_2, X_3, \dots, X_n$; then the arithmetic mean is:
$$\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n} = \frac{X_1 + X_2 + X_3 + \dots + X_n}{n}$$
Where: $X_i$ = the ith observation.
n = the size of the data.
The mean for a population consisting of N observations is:
$$\mu = \frac{\sum_{i=1}^{N} X_i}{N} = \frac{X_1 + X_2 + X_3 + \dots + X_N}{N}$$
Example: Calculate the arithmetic mean of the given values:
98 96 95 98 100 92 96 69
Solution:
$$\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n} = \frac{98 + 96 + 95 + 98 + 100 + 92 + 96 + 69}{8} = \frac{744}{8} = 93$$
Arithmetic mean for grouped data:
The arithmetic mean of grouped data is found by multiplying each class midpoint ($x_i$) by its corresponding frequency ($f_i$), adding these products to get $\sum f_i x_i$, and then dividing this sum by $\sum f_i$:
$$\bar{X} = \frac{\sum f_i x_i}{\sum f_i}$$
The above formula is for sample data. Similar formulas are used for population data.
Example: Determine the mean for the following set of data.
Classes   Frequency
8 -       2
10 -      3
12 -      5
14 -      4
16 -      1
Solution:
Classes   Frequency (fi)   Midpoint (xi)   fi . xi
8 -       2                9               18
10 -      3                11              33
12 -      5                13              65
14 -      4                15              60
16 -      1                17              17
Total     15                               193
$$\bar{X} = \frac{\sum f_i x_i}{\sum f_i} = \frac{193}{15} = 12.87$$
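The grouped-data mean can be sketched in Python as below, using the midpoints and frequencies of the example. The function name `grouped_mean` is hypothetical.

```python
def grouped_mean(midpoints, frequencies):
    """Mean of grouped data: sum(f_i * x_i) / sum(f_i)."""
    weighted_total = sum(f * x for x, f in zip(midpoints, frequencies))
    return weighted_total / sum(frequencies)

midpoints = [9, 11, 13, 15, 17]      # class midpoints x_i
frequencies = [2, 3, 5, 4, 1]        # class frequencies f_i

mean_grouped = grouped_mean(midpoints, frequencies)
print(round(mean_grouped, 2))
```

Because every value in a class is replaced by the class midpoint, the result is an approximation of the mean of the underlying raw data.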

The Properties of the Arithmetic Mean:
1- The sum of the deviations of all the values of X from their mean is zero.
$$\sum_{i=1}^{n} (X_i - \bar{X}) = \sum_{i=1}^{n} X_i - n\bar{X}$$
We have $\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n}$, so $\sum_{i=1}^{n} X_i = n\bar{X}$, then:
$$\sum_{i=1}^{n} (X_i - \bar{X}) = n\bar{X} - n\bar{X} = 0$$
2- If $(\bar{X}_1, \bar{X}_2, \dots, \bar{X}_k)$ represent the means of k groups based on $(n_1, n_2, \dots, n_k)$ observations respectively, the mean of the groups combined is:
$$\bar{X} = \frac{\sum_{i=1}^{k} n_i \bar{X}_i}{\sum_{i=1}^{k} n_i}$$
3- The sum of squares of the deviations from the mean is smaller than from
any other value. (prove this property)
Advantage (merits) of Arithmetic mean
1- It is easy to calculate and simple to understand.
2- It is very popular (most widely used).
3- It is based on all the observations; so that it becomes a good representative.
Disadvantage (demerits) of Arithmetic mean
1- It is affected by outliers or extreme values.
2- It cannot be obtained if a single observation is missing or lost;
3- It cannot be calculated for frequency distributions with open-end classes.
4- It cannot be computed for qualitative data.

2) Weighted Arithmetic Mean:
One of the limitations of the arithmetic mean is that it gives equal importance to all the items. But there are cases where the relative importance of the different items is not the same. When this is so, we compute the weighted arithmetic mean.
The formula for computing the weighted arithmetic mean in the case of ungrouped data is:
$$\bar{X}_W = \frac{\sum_{i=1}^{n} W_i X_i}{\sum_{i=1}^{n} W_i} = \frac{W_1X_1 + W_2X_2 + \dots + W_nX_n}{W_1 + W_2 + \dots + W_n}$$
Where $W_i$ is the weight of the ith observation.
The formula for computing the weighted arithmetic mean in the case of grouped data is:
$$\bar{X}_W = \frac{\sum_{i=1}^{n} W_i f_i X_i}{\sum_{i=1}^{n} W_i f_i} = \frac{W_1f_1X_1 + W_2f_2X_2 + \dots + W_nf_nX_n}{W_1f_1 + W_2f_2 + \dots + W_nf_n}$$
Example: The marks of a student in the final examination of the Statistics department are as follows:
Marks (Xi): 98 96 95 98 100 92 96 69
Units (Wi): 2 3 3 1 3 3 2 2
Calculate the weighted mean.
Solution:
$$\bar{X}_W = \frac{\sum_{i=1}^{n} W_i X_i}{\sum_{i=1}^{n} W_i} = \frac{(98 \times 2) + (96 \times 3) + (95 \times 3) + (98 \times 1) + (100 \times 3) + (92 \times 3) + (96 \times 2) + (69 \times 2)}{2 + 3 + 3 + 1 + 3 + 3 + 2 + 2} = \frac{1773}{19} = 93.3158$$
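The weighted-mean example can be reproduced with a short Python sketch. The function name `weighted_mean` is hypothetical; the marks and units are those of the example.

```python
def weighted_mean(values, weights):
    """Weighted arithmetic mean: sum(W_i * X_i) / sum(W_i)."""
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)

marks = [98, 96, 95, 98, 100, 92, 96, 69]   # X_i
units = [2, 3, 3, 1, 3, 3, 2, 2]            # W_i

wm = weighted_mean(marks, units)
print(round(wm, 4))

# With equal weights the weighted mean reduces to the ordinary mean
equal = weighted_mean(marks, [1] * len(marks))
```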

Remark: If all the weights are equal, then the weighted mean is the same as the
arithmetic mean.
Exercise1: The average marks of three groups of students having 70, 50 and 30
students respectively are 50, 55 and 45. Find the average marks of all the 150
students, taken together.
Exercise 2: The following frequency distribution shows the marks obtained by 50 students in statistics at Soran Institute. Find the arithmetic mean.
Classes Frequency (fi)
20 - 29 1
30 - 39 5
40 - 49 12
50 - 59 15
60 - 69 9
70 - 79 6
80 - 89 2
Exercise3: The mean of a certain number of observations is 40. If two items with
values 50 and 64 are added to this data, the mean rises to 42. Find the number of
items in the original data.
Exercise 4: If $\sum_{i=1}^{n} (X_i - 4) = 72$ and $\sum_{i=1}^{n} (X_i - 7) = 3$, then find the number of observations (n).

3) Harmonic Mean
The harmonic mean is a measure of central tendency that is used less often than the other measures (mean, median and mode).
The formula for computing the harmonic mean in the case of ungrouped data is:
$$\bar{X}_h = \frac{n}{\sum_{i=1}^{n} \frac{1}{X_i}}$$
And for grouped data:
$$\bar{X}_h = \frac{\sum f_i}{\sum_{i=1}^{n} \frac{f_i}{X_i}}$$
Example: Calculate the harmonic mean for the following data:
Xi: 8 2 5 3 4 7 8
Solution:
Xi:     8      2     5     3      4      7      8
1/Xi:   0.13   0.5   0.2   0.33   0.25   0.14   0.13
With the reciprocals rounded to two decimals, $\sum \frac{1}{X_i} = 1.68$, so:
$$\bar{X}_h = \frac{n}{\sum_{i=1}^{n} \frac{1}{X_i}} = \frac{7}{1.68} = 4.167$$
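For comparison, the harmonic mean of the same data can be computed in Python without rounding the reciprocals, either directly from the formula or with the standard-library function `statistics.harmonic_mean`. The unrounded result differs slightly in the third decimal from a hand calculation that rounds each reciprocal to two places.

```python
from statistics import harmonic_mean

x = [8, 2, 5, 3, 4, 7, 8]

# Direct application of the formula n / sum(1 / X_i)
hm_formula = len(x) / sum(1 / v for v in x)

# Same quantity from the standard library
hm_library = harmonic_mean(x)

print(round(hm_formula, 3), round(hm_library, 3))
```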
4) Quadratic Mean
$$\bar{X}_q = \sqrt{\frac{\sum_{i=1}^{n} X_i^2}{n}}$$ for ungrouped data
$$\bar{X}_q = \sqrt{\frac{\sum_{i=1}^{n} f_i X_i^2}{\sum_{i=1}^{n} f_i}}$$ for grouped data

5) MODE
The mode (Mo) is the value that occurs most often in a data set.
Mode for ungrouped data:
The mode of the following data set: 5, 6, 7, 5, 5, 10, 4, 5, 4, 7, 5, 5 is the number 5
because it is repeated more than other numbers (6 times).
Remark: When 2 numbers occur with the same greatest frequency, each one is a mode and the data set is bimodal. When more than 2 numbers occur with the same greatest frequency, each is a mode and the data set is said to be multimodal. When no number is repeated, we say that there is no mode.
Example: Find the mode of the following data set: 5, 7, 6, 7, 5, 7, 5, 10, 4, 4, 7, 5.
Solution: Number 5 and 7 are both modes. The data set is bimodal.
Mode for grouped data:
Let (X1, X2, ..., Xn) represent the class marks of the class intervals and (f1, f2, ..., fn) the corresponding frequencies. The modal class is the class that has the highest frequency. The formula for obtaining the mode is as follows:
$$M_o = L_k + \frac{(f_k - f_{k-1})}{(f_k - f_{k-1}) + (f_k - f_{k+1})} \times W_k$$
Where:
Lk: lower limit of the modal class.
fk: frequency of the modal class.
fk-1: frequency of the previous class.
fk+1: frequency of the next class.
Wk: size of the modal class interval (class width).

Example: Find the mode for the following frequency distribution:
Class     Frequency
10 - 19   5
20 - 29   7
30 - 39   10
40 - 49   8
50 - 59   4
60 - 69   3
70 - 79   1
Solution:
The modal class is 30 - 39 because it has the highest frequency (10).
Lk=30, fk=10, fk-1=7, fk+1=8, Wk=10
$$M_o = L_k + \frac{(f_k - f_{k-1})}{(f_k - f_{k-1}) + (f_k - f_{k+1})} \times W_k = 30 + \frac{(10 - 7)}{(10 - 7) + (10 - 8)} \times 10 = 30 + \frac{3}{5} \times 10 = 36$$
Remark 1: If two or more classes share the highest frequency, the assembly (grouping) method must be used to find the modal class.
Remark 2: When the assembly method is used, the mode is computed for the modal class that the method identifies:
$$M_o = L_k + \frac{f_k - f_{k-1}}{f_k - f_{k-1} + f_k - f_{k+1}} \times W_k$$
Remark 3: If the class widths are not equal, the adjusted frequency must be used instead of the actual frequency, where the adjusted frequency of each class is equal to $\frac{f_i}{W_i}$.
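The grouped-data mode formula can be sketched as a small Python function. The function name `grouped_mode` and its argument layout are hypothetical, and the sketch assumes equal class widths and a single class with the highest frequency (it simply takes the first such class, so it does not implement the assembly method).

```python
def grouped_mode(lower_limits, frequencies, width):
    """Mode of grouped data with equal class widths and a unique modal class."""
    k = frequencies.index(max(frequencies))            # position of the modal class
    f_k = frequencies[k]
    f_prev = frequencies[k - 1] if k > 0 else 0        # frequency of previous class
    f_next = frequencies[k + 1] if k + 1 < len(frequencies) else 0
    d1 = f_k - f_prev
    d2 = f_k - f_next
    return lower_limits[k] + (d1 * width) / (d1 + d2)

lower_limits = [10, 20, 30, 40, 50, 60, 70]
frequencies = [5, 7, 10, 8, 4, 3, 1]

mode_value = grouped_mode(lower_limits, frequencies, width=10)
print(mode_value)
```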

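Example: Find the mode for the following frequency distribution:
Class     Frequency
10 - 19   4
20 - 29   4
30 - 39   3
40 - 49   2
50 - 59   3
60 - 69   3
70 - 79   1
Solution:
There are 2 classes with the highest frequency; therefore, to find the modal class we must use the assembly method, as follows.
The 1st and 2nd assemblies add the frequencies in pairs, starting from the first and from the second class respectively. The 3rd and 4th assemblies add the frequencies in threes, starting from the first and from the second class respectively. Each sum is written on the row of the first class of its group.
Class     Frequency   1st assembly   2nd assembly   3rd assembly   4th assembly
10 - 19   4           8                             11
20 - 29   4                          7                             9
30 - 39   3           5
40 - 49   2                          5              8
50 - 59   3           6                                            7
60 - 69   3                          4
70 - 79   1
From the previous table we can extract the following table:
Serial no. of column   Greatest frequency in the column   Contributing classes
1                      4                                  1, 2
2                      8                                  1, 2
3                      7                                  2, 3
4                      11                                 1, 2, 3
5                      9                                  2, 3, 4
The 2nd class contributes to every column, so the 2nd class is the modal class.
Lk=20, fk=4, fk-1=4, fk+1=3, Wk=10
$$M_o = L_k + \frac{f_k - f_{k-1}}{f_k - f_{k-1} + f_k - f_{k+1}} \times W_k = 20 + \frac{(4 - 4)}{4 - 4 + 4 - 3} \times 10 = 20$$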
Advantage of Mode
1- It is easy to calculate.
2- It is not affected by extreme values.
3- It can be used for qualitative data.
4- It can be located graphically (Histogram).
5- It can be calculated for distributions with open end classes.
Disadvantage of Mode
1- It is not based upon all the observations.
2- It is not always possible to find a clearly defined mode (2 modes or 3
modes).
3- It is not capable of further mathematical treatment.
Exercise: Find the mode for the following frequency distributions:
First distribution:
Class     Frequency
5 -       2
10 -      6
15 -      10
25 -      22
35 -      27
50 - 60   11
Second distribution:
Class     Frequency
10 -      30
20 -      12
30 -      16
40 -      28
50 -      26
60 -      14

6) Median
The median (Me) is the value of the middle item in an ordered data set. It divides the data set into two equal parts, one comprising all values greater than the median and the other all values smaller than it.
Median for ungrouped data:
In the first step we arrange the data in ascending (increasing) order.
If the number of observations (n) is odd, the median is the observation of order $\left(\frac{n+1}{2}\right)$.
If the number of observations (n) is even, the median is the average of the observations of order $\left(\frac{n}{2}\right)$ and $\left(\frac{n}{2} + 1\right)$.
Example: Find the median of the following data set:
55, 62, 53, 70, 68, 65, 63, 79, and 80.
Solution:
Arrange the data in increasing order: 53, 55, 62, 63, 65, 68, 70, 79, 80.
Since n=9 is odd, the order of the median is:
$$\left(\frac{n+1}{2}\right) = \left(\frac{9+1}{2}\right) = 5$$
Then the 5th observation is the value of the median, or Me=65.
Example: Find the median of the following data set:
20, 22, 19, 26, 30, 27, 28, 29, 18, 20, 23, 25.
Solution:
Arranging the data in increasing order:
18, 19, 20, 20, 22, 23, 25, 26, 27, 28, 29, 30
$\left(\frac{n}{2}\right) = \left(\frac{12}{2}\right) = 6$, and the 6th value is 23.
$\left(\frac{n}{2} + 1\right) = \left(\frac{12}{2} + 1\right) = 7$, and the 7th value is 25.
Then:
$$M_e = \frac{23 + 25}{2} = 24$$
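The odd and even rules for the median of ungrouped data can be sketched in Python. The function name `median_ungrouped` is hypothetical; the standard-library function `statistics.median` is used only as a cross-check.

```python
from statistics import median

def median_ungrouped(values):
    """Median by the (n+1)/2 rule for odd n and the n/2, n/2+1 rule for even n."""
    ordered = sorted(values)
    n = len(ordered)
    if n % 2 == 1:
        return ordered[(n + 1) // 2 - 1]          # order (n+1)/2, converted to 0-based
    return (ordered[n // 2 - 1] + ordered[n // 2]) / 2

odd_data = [55, 62, 53, 70, 68, 65, 63, 79, 80]
even_data = [20, 22, 19, 26, 30, 27, 28, 29, 18, 20, 23, 25]

me_odd = median_ungrouped(odd_data)
me_even = median_ungrouped(even_data)
print(me_odd, me_even)
```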
Median for grouped data:
To find the median of a frequency distribution, follow these steps:
Step 1: Find the cumulative frequency (ascending or descending).
Step 2: Compute the median order, which is equal to $\frac{\sum f_i}{2}$.
Step 3: If $F_{k-1} < \frac{\sum f_i}{2} \le F_k$, then the median class is the class whose order is k.
Step 4: Compute the value of the median:
$$M_e = L_k + \left(\frac{\sum f_i}{2} - F_{k-1}\right) \cdot \frac{W_k}{f_k}$$ for ascending cumulative frequency.
$$M_e = L_k + \left(F_k^{*} - \frac{\sum f_i}{2}\right) \cdot \frac{W_k}{f_k}$$ for descending cumulative frequency.
Where:
$L_k$: lower limit of the median class.
$f_k$: frequency of the median class.
$W_k$: width of the median class.
$\sum f_i$: sum of the frequencies.
$F_{k-1}$: ascending cumulative frequency of the class preceding the median class.
$F_k^{*}$: descending cumulative frequency of the median class.

Example: Find the median of the following frequency distribution:
Classes           100 -   120 -   140 -   160 -   180 -   200 -   220 -
No. of families   3       7       14      20      18      12      6
Solution:
In the first step we find the ascending cumulative frequency:
Class   Frequency   Ascending Cumulative Frequency
100 -   3           3
120 -   7           10
140 -   14          24
160 -   20          44
180 -   18          62
200 -   12          74
220 -   6           80
Total   80
Then we find the median order, which is equal to:
$$\frac{\sum f_i}{2} = \frac{80}{2} = 40$$
Comparing the median order with the ascending cumulative frequencies:
$$F_{k-1} = 24 < \frac{\sum f_i}{2} = 40 < F_k = 44$$
Then the median class is the 4th class.
Then:
Lk=160, Wk=20, fk=20
$$M_e = L_4 + \left(\frac{\sum f_i}{2} - F_3\right) \cdot \frac{W_4}{f_4} = 160 + \left(\frac{80}{2} - 24\right) \cdot \frac{20}{20} = 176$$
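The four steps for the grouped-data median can be sketched in Python using the ascending cumulative frequency. The function name `grouped_median` is hypothetical, and the sketch assumes equal class widths; the data are the class lower limits and family counts of the example.

```python
def grouped_median(lower_limits, frequencies, width):
    """Median of grouped data via the ascending cumulative frequency."""
    half = sum(frequencies) / 2                  # median order: sum(f_i) / 2
    cumulative = 0                               # F_{k-1}: total before the current class
    for lower, f in zip(lower_limits, frequencies):
        if cumulative + f >= half:               # first class whose F_k reaches the order
            return lower + (half - cumulative) * width / f
        cumulative += f
    raise ValueError("empty frequency distribution")

lower_limits = [100, 120, 140, 160, 180, 200, 220]
frequencies = [3, 7, 14, 20, 18, 12, 6]

me_grouped = grouped_median(lower_limits, frequencies, width=20)
print(me_grouped)
```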

Merits of Median
1. It is easy to calculate and understand.
2. Unlike the arithmetic mean, it is not affected by extreme values.
3. It can be found by mere inspection.
4. It can be used for qualitative studies.
5. It can be calculated for distributions with open-end classes.
6. It can be obtained graphically.
Demerits of Median
1. It is not capable of further algebraic treatment.
2. It is not based on all observations.
Exercise: Find the median for the following frequency distribution by using the ascending and descending cumulative frequencies.
Class       Frequency
18 -        10
28 -        15
36 -        18
50 -        22
70 -        20
100 -       18
130 - 150   13
Total
The relationship between Arithmetic Mean, Median and Mode
If the frequency distribution is symmetric, the mean, median, and mode coincide. For a moderately skewed distribution, the following approximate (empirical) relationship between these measures holds:
$$\bar{X} - M_o = 3(\bar{X} - M_e)$$

SECTION 5: MEASURES OF DISPERSION (VARIATION)
Measures that describe the spread of a data set are called measures of dispersion. Their main purpose is to assess the homogeneity of the values in a data set, or to compare the variability of two or more data sets.
1- Range
The simplest measure of absolute variation is the range, which is calculated by subtracting the smallest value from the largest value of a data set.
R=Largest value – Smallest value
Example: find the range for the following data: 2, 5, 3, 8, 7, 10, 9, 12, 15.
Solution:
R= Largest value – Smallest value=15-2=13
Remark: in case of grouped data we calculate the value of Range by subtracting
the lower limit of first class from the upper limit of last class.
2- Mean Deviation
It is the sum of the absolute deviations of the observations from a point (A) divided by the number of observations.
$$M.D = \frac{\sum_{i=1}^{n} |X_i - A|}{n}$$ for ungrouped data
$$M.D = \frac{\sum_{i=1}^{n} f_i |X_i - A|}{n}$$ for grouped data
Where A may be the arithmetic mean ($\bar{X}$), the median ($M_e$), or the mode ($M_o$).
Example: Find the value of the mean deviation for the following data by using the mean, median and mode.
Xi: 2, 3, 4, 5, 5, 6, 7, 10, 13, 14, 19
Solution:
First we find the values of $\bar{X}$, $M_e$ and $M_o$:
$\bar{X}$ = 8, $M_e$ = 6, $M_o$ = 5
Xi      |Xi - X̄|   |Xi - Mo|   |Xi - Me|
2       6          3           4
3       5          2           3
4       4          1           2
5       3          0           1
5       3          0           1
6       2          1           0
7       1          2           1
10      2          5           4
13      5          8           7
14      6          9           8
19      11         14          13
Total   48         45          44
$$M.D(\bar{X}) = \frac{\sum_{i=1}^{n} |X_i - \bar{X}|}{n} = \frac{48}{11} = 4.364$$
$$M.D(M_e) = \frac{\sum_{i=1}^{n} |X_i - M_e|}{n} = \frac{44}{11} = 4$$
$$M.D(M_o) = \frac{\sum_{i=1}^{n} |X_i - M_o|}{n} = \frac{45}{11} = 4.0909$$
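The three mean deviations of this example can be recomputed with a short Python sketch. The function name `mean_deviation` is hypothetical; the center values are obtained from the standard `statistics` module.

```python
from statistics import mean, median, mode

def mean_deviation(values, a):
    """Mean absolute deviation of the values from the point a."""
    return sum(abs(v - a) for v in values) / len(values)

x = [2, 3, 4, 5, 5, 6, 7, 10, 13, 14, 19]

x_bar, me, mo = mean(x), median(x), mode(x)

md_mean = mean_deviation(x, x_bar)
md_median = mean_deviation(x, me)
md_mode = mean_deviation(x, mo)

print(round(md_mean, 4), round(md_median, 4), round(md_mode, 4))
```

The deviation taken about the median is never larger than the one taken about any other point, which gives a simple check on the three results.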

3- Variance
It is one of the most important measures of absolute variation. The variance is calculated by taking the average of the squared distances (deviations) of the observations from the mean of the data set.
The formula for the population variance ($\sigma^2$) for raw data is:
$$\sigma^2 = \frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}$$
Where:
$X_i$: individual value
$\mu$: population mean
N: population size (number of observations).
The formula for the sample variance ($S^2$) for raw data is as follows:
$$S^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}$$
On the other hand, the formula for the sample variance for grouped data is:
$$S^2 = \frac{\sum_{i=1}^{n} f_i (X_i - \bar{X})^2}{n - 1}$$
Where $n = \sum f_i$
Example: Find the variance for the following data set:
56, 68, 72, 63, 65, 68, 71, 69, 62, 56.
Solution:
$$\bar{X} = \frac{\sum_{i=1}^{10} X_i}{10} = \frac{650}{10} = 65$$
Xi      (Xi - X̄)   (Xi - X̄)²
56      -9         81
68      3          9
72      7          49
63      -2         4
65      0          0
68      3          9
71      6          36
69      4          16
62      -3         9
56      -9         81
Total              294
Then:
$$S^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1} = \frac{294}{10 - 1} = 32.667$$
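The sample variance of this data set can be checked in Python, both from the definition and with the standard-library function `statistics.variance`, which uses the n - 1 divisor. The variable names are illustrative only.

```python
from statistics import variance, pvariance

x = [56, 68, 72, 63, 65, 68, 71, 69, 62, 56]

n = len(x)
x_bar = sum(x) / n
sum_sq_dev = sum((v - x_bar) ** 2 for v in x)   # sum of squared deviations

s2_formula = sum_sq_dev / (n - 1)               # sample variance (divisor n - 1)
s2_library = variance(x)                        # same quantity from the library
sigma2 = pvariance(x)                           # population form (divisor N)

print(x_bar, sum_sq_dev, round(s2_formula, 3), sigma2)
```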
Properties of variance:
1) $S^2 \ge 0$
2) If $Y_i = aX_i$, then $S_Y^2 = a^2 S_X^2$, where a is a constant. (Prove that)
3) If $Y_i = X_i \pm b$, then $S_Y^2 = S_X^2$, where b is a constant. (Prove that)
4) If X and Y are independent variables and $Z_i = X_i \pm Y_i$, then the variance of Z is:
$$S_Z^2 = S_X^2 + S_Y^2$$
5) If $(S_1^2, S_2^2, \dots, S_k^2)$ represent the variances of k groups based on $(n_1, n_2, \dots, n_k)$ observations respectively, then the pooled variance of the groups is as follows:
$$S_p^2 = \frac{\sum_{i=1}^{k} (n_i - 1) S_i^2}{\sum_{i=1}^{k} (n_i - 1)}$$ where $n_i < 30$
$$S_p^2 = \frac{\sum_{i=1}^{k} n_i S_i^2}{\sum_{i=1}^{k} n_i}$$ where $n_i \ge 30$

4- Standard Deviation (S)
The standard deviation is the most important and most widely used measure of absolute variation. The standard deviation is the square root of the variance.
$$S = \sqrt{S^2} = \sqrt{\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}}$$
Example: Find the standard deviation of the following frequency distribution.
Solution:
Class   fi   Xi    fi.Xi   (Xi - X̄)   (Xi - X̄)²   fi.(Xi - X̄)²
100 -   3    110   330     -65.75     4323.063    12969.19
120 -   7    130   910     -45.75     2093.063    14651.44
140 -   14   150   2100    -25.75     663.0625    9282.875
160 -   20   170   3400    -5.75      33.0625     661.25
180 -   18   190   3420    14.25      203.0625    3655.125
200 -   12   210   2520    34.25      1173.063    14076.75
220 -   6    230   1380    54.25      2943.063    17658.38
Total   80         14060                          72955
$$\bar{X} = \frac{\sum_{i=1}^{n} f_i X_i}{\sum_{i=1}^{n} f_i} = \frac{14060}{80} = 175.75$$
$$S = \sqrt{\frac{\sum_{i=1}^{n} f_i (X_i - \bar{X})^2}{n}} = \sqrt{\frac{72955}{80}} = 30.198$$
Note that this computation divides by n = 80. Dividing by n - 1 = 79, as in the sample variance formula for grouped data, gives S = 30.389.
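The grouped-data standard deviation can be recomputed with the Python sketch below, using the midpoints and frequencies of the table. Both divisors are computed, n and n - 1, since the choice between them changes the third significant figure here; the variable names are illustrative only.

```python
import math

midpoints = [110, 130, 150, 170, 190, 210, 230]   # class midpoints X_i
frequencies = [3, 7, 14, 20, 18, 12, 6]           # class frequencies f_i

n = sum(frequencies)
x_bar = sum(f * x for x, f in zip(midpoints, frequencies)) / n

# Sum of f_i * (X_i - mean)^2 over all classes
ss = sum(f * (x - x_bar) ** 2 for x, f in zip(midpoints, frequencies))

s_divisor_n = math.sqrt(ss / n)            # divisor n
s_divisor_n1 = math.sqrt(ss / (n - 1))     # divisor n - 1 (sample formula)

print(x_bar, ss, round(s_divisor_n, 3), round(s_divisor_n1, 3))
```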

Coefficient of Variation
A disadvantage of the standard deviation as a comparative measure of variation is
that it depends on the units of measurement. This means that it is difficult to use
the standard deviation to compare measurements from different populations. For
this reason, statisticians have defined the coefficient of variation, which expresses
the standard deviation as a percentage of the sample or population mean.
If $\bar{X}$ and S represent the sample mean and the sample standard deviation, then the coefficient of variation (C.V.) is defined to be:
$$C.V. = \frac{S}{\bar{X}} \times 100$$
If μ and σ represent the population mean and standard deviation, then the coefficient of variation (C.V.) is defined to be:
$$C.V. = \frac{\sigma}{\mu} \times 100$$
Notice that the numerator and denominator in the definition of CV have the same
units, so CV itself has no units of measurement. This gives us the advantage of
being able to directly compare the variability of two different populations using
the coefficient of variation.
Example1: A company has two sections (A and B) with 40 and 65 employees
respectively. Their average weekly wages are $450 and $350. The standard
deviations are 7 and 9. Which section has larger variability in wages?
Solution:
$$C.V._{(A)} = \frac{S}{\bar{X}} \times 100 = \frac{7}{450} \times 100 = 1.56$$
$$C.V._{(B)} = \frac{S}{\bar{X}} \times 100 = \frac{9}{350} \times 100 = 2.57$$
Because the C.V. for section A is smaller than the C.V. for section B, section B has the larger variability. So section A is more homogeneous than section B.

Example 2: The means and standard deviations of the heights and weights of 40 students are as follows:
          Mean     Standard Deviation
Weights   68.34    3.02
Heights   172.55   26.33
Find the coefficient of variation of the heights and of the weights and compare the results.
Solution:
$$C.V._{(Weights)} = \frac{S}{\bar{X}} \times 100 = \frac{3.02}{68.34} \times 100 = 4.42$$
$$C.V._{(Heights)} = \frac{S}{\bar{X}} \times 100 = \frac{26.33}{172.55} \times 100 = 15.26$$
So the weights (with C.V. = 4.42) have less variation than the heights (with C.V. = 15.26).
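The coefficient of variation is a one-line computation, sketched here in Python for the figures of Examples 1 and 2. The function name `coefficient_of_variation` is hypothetical.

```python
def coefficient_of_variation(std_dev, mean_value):
    """Standard deviation expressed as a percentage of the mean."""
    return std_dev / mean_value * 100

# Example 1: weekly wages in sections A and B
cv_a = coefficient_of_variation(7, 450)
cv_b = coefficient_of_variation(9, 350)

# Example 2: weights and heights of 40 students
cv_weights = coefficient_of_variation(3.02, 68.34)
cv_heights = coefficient_of_variation(26.33, 172.55)

more_variable_section = "B" if cv_b > cv_a else "A"
print(round(cv_a, 2), round(cv_b, 2), round(cv_weights, 2), round(cv_heights, 2))
```

Because the result is unit-free, the same function compares wages in dollars with heights or weights in any unit.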
