SECTION 1
Statistics: the science of collecting, organizing, summarizing, presenting,
analyzing and interpreting data, and of drawing conclusions from the data in
order to make the best decision.
Statistics is divided into two distinct parts:
1- Descriptive Statistics: concerned only with the collection, organization,
summarizing, analysis and presentation of a set of qualitative or quantitative
data. Descriptive statistics include the mean, median, mode, standard
deviation, range, etc.
2- Inferential Statistics: consists of methods for drawing conclusions based on
the data in order to make the best decision. It is also divided into two parts:
A- Estimation
B- Testing Hypotheses
Population: the complete collection of all elements to be studied.
Finite (countable) Population: A population is called finite if it is possible to
count its individuals, for example the number of students in Shaqlawa Technical
Institute or the number of computers in a laboratory.
Infinite (uncountable) Population: A population is called infinite if it is
impossible in practice to count its individuals, for example the number of
bacteria in a garden or the number of fish in a sea.
Census: the collection of data from every element of the population.
Sample: a sub-collection of elements drawn from a population.
Sampling: the process of selecting a subset of elements from the population.
Sources of collecting the data:
1- Historical Sources
2- Field Sources
Probability (Random) Samples are drawn from populations through several
different sampling methods:
1- Simple Random Sampling
Every member of the population (N) has an equal chance of being selected for
your sample (n). This is arguably the best sampling method, as your sample is
very likely to be representative of your population. However, it is rarely
used in practice because it is often impractical.
2- Systematic Sampling
In this method, every k-th individual from the population (N) is placed in the
sample (n). For example, if you add every 7th individual to walk out of a
supermarket to your sample, you are performing systematic sampling.
3- Stratified Sampling
A general problem with random sampling is that you could, by chance, miss out a
particular group in the sample. However, if you divide the population into groups
and sample from each group, you can make sure the sample is representative. In
stratified sampling, the population is divided into groups called strata. A sample is
then drawn from within each of these strata. Some examples of strata commonly used by
the Australian Bureau of Statistics (ABS) are State, Age and Sex. Other strata may be
religion, academic ability or marital status.
[Figure: methods of collecting the data: census, or samples, which are either probability (random) or non-probability samples]
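The first three sampling methods above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the population of 100 numbered individuals, the two strata names, and the function names are all hypothetical and do not come from the notes.

```python
import random

def simple_random_sample(population, n, seed=0):
    """Every member has the same chance of being selected (without replacement)."""
    rng = random.Random(seed)          # fixed seed only to make the sketch repeatable
    return rng.sample(population, n)

def systematic_sample(population, k, start=0):
    """Take every k-th individual, beginning at index `start`."""
    return population[start::k]

def stratified_sample(strata, n_per_stratum, seed=0):
    """Draw a simple random sample from within each stratum."""
    rng = random.Random(seed)
    return {name: rng.sample(members, n_per_stratum)
            for name, members in strata.items()}

# Hypothetical population: individuals numbered 1..100
population = list(range(1, 101))

srs = simple_random_sample(population, 10)
every_7th = systematic_sample(population, 7, start=6)   # 7, 14, 21, ...

# Hypothetical strata (e.g. two year groups of students)
strata = {"first_year": list(range(1, 61)),
          "second_year": list(range(61, 101))}
stratified = stratified_sample(strata, 5)
```

Stratified sampling guarantees that each group is represented, which a single simple random sample cannot promise.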
4- Multi-Stage Sampling
Multi-stage sampling is like cluster sampling, but involves selecting a sample
within each chosen cluster, rather than including all units in the cluster. Thus,
multi-stage sampling involves selecting a sample in at least two stages. In the first
stage, large groups or clusters are selected. These clusters are designed to contain
more population units than are required for the final sample. In the second stage,
population units are chosen from selected clusters to derive a final sample. If
more than two stages are used, the process of choosing population units within
clusters continues until the final sample is achieved.
Variable: a characteristic or property of the elements in the population. The
name "variable" is derived from the fact that any particular characteristic may
vary among the elements in a population.
Variables
  Quantitative variables
    Discrete variables (e.g. number of students)
    Continuous variables (e.g. height, weight)
  Qualitative (descriptive) variables
SECTION 2
Frequency Distribution (Table):
After a researcher has obtained raw data from any source, the raw (ungrouped)
data need to be arranged and organized in a meaningful way so that they can be
described and used to draw useful inferences. The method used for such
organization and arrangement is called a frequency distribution. Frequency means
the number of times something happens.
A frequency distribution simply means organizing the raw data in tabular form using
classes and frequencies.
1- Frequency Distribution for Qualitative variables:
A frequency distribution for a qualitative variable lists all classes and the number of
elements that belong to each of the classes.
Example1: The following list gives the rank of a sample that consists of 25 staff
members in Soran institute:
Researcher, Assistant Researcher, Assistant Researcher, Lecturer, Assistant Researcher,
Assistant lecturer, Assistant lecturer, Researcher, Lecturer, Researcher, Assistant
Researcher, Researcher, Assistant Researcher, Assistant lecturer, Assistant Researcher,
Lecturer, Assistant Researcher, Assistant lecturer, Assistant lecturer, Researcher,
Lecturer, Assistant Researcher, Assistant Researcher, Assistant Researcher, Researcher.
Create a frequency distribution for the above data.
Solution:
Classes (rank)          Frequency
Lecturer                4
Assistant lecturer      5
Researcher              6
Assistant Researcher    10
Total                   25
Relative Frequency of a Class:
The relative frequency of a class is obtained by dividing the frequency of the class by
the sum of all the frequencies.
Example 2: Using the previous example, calculate the relative frequencies.
Solution:
Classes (rank)          Frequency   Relative Frequency
Lecturer                4           4/25 = 0.16
Assistant lecturer      5           5/25 = 0.2
Researcher              6           6/25 = 0.24
Assistant Researcher    10          10/25 = 0.4
Total                   25          1
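The two tables above can be reproduced with a short Python sketch. The list `ranks` below is rebuilt from the class counts of Example 1 rather than typed in the original order, and the variable names are illustrative only.

```python
from collections import Counter

# The 25 ranks of Example 1, rebuilt from the counts in the frequency table
ranks = (["Lecturer"] * 4 + ["Assistant lecturer"] * 5
         + ["Researcher"] * 6 + ["Assistant Researcher"] * 10)

freq = Counter(ranks)                     # class -> frequency
total = sum(freq.values())                # sum of all frequencies
rel_freq = {rank: count / total for rank, count in freq.items()}

# Print the frequency and relative-frequency table
for rank, count in freq.items():
    print(f"{rank:22s}{count:4d}{rel_freq[rank]:8.2f}")
print(f"{'Total':22s}{total:4d}{sum(rel_freq.values()):8.2f}")
```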
2- Frequency Distribution for Quantitative variables
Total Range (T.R): the highest value minus the lowest value in the data set.
Number of classes: the appropriate number of classes may be decided by Yule's
formula, which is as follows:
$$\text{Number of classes} = 2.5\,\sqrt[4]{n}$$
where n is the total number of observations.
Class Width (Length): the difference between two consecutive lower class limits
or two consecutive lower class boundaries. The class width can be found by the
following formula:
$$\text{Class Width} = \frac{\text{T.R}}{\text{Number of classes}}$$
Frequency (F): the number of values in a specific class of the distribution.
A- Frequency Distribution for Discrete variables:
The lower and upper limits of the frequency distribution of discrete variables are
as below:
Lower limit     Upper limit     Frequency
Xs              Xs+W-1          f1
Xs+W            Xs+2W-1         f2
Xs+2W           Xs+3W-1         f3
...             ...             ...
Xs+(M-1)W       Xs+M.W-1        fM
Where:
Xs: the lowest value
W: class width
M: number of classes
Example3: Construct the frequency distribution for the following data:
60 76 80 120 132 82 90 65 68 142 157 164 88
90 98 101 103 110 119 116 120 126 109 114 120 122
111 116 90 78 93 95 98 104 120 113 121 119 125
126 130 131 136 118 120 142 150 154 122 123 139 125
106 154 136 137 110 137 72 150
Solution:
Number of observations (n) = 60
Total Range (T.R) = 164 - 60 = 104
Number of Classes (M) = 2.5 (60)^(1/4) = 2.5(2.783) = 6.958 ≈ 7
Class Width (W) = 104/7 = 14.86 ≈ 15
Class        Frequency   Midpoint             Relative Frequency
60 - 74      4           (60+74)/2 = 67       4/60 = 0.067
75 - 89      5           (75+89)/2 = 82       5/60 = 0.083
90 - 104     10          97                   10/60 = 0.167
105 - 119    12          112                  12/60 = 0.200
120 - 134    16          127                  16/60 = 0.267
135 - 149    7           142                  7/60 = 0.117
150 - 164    6           157                  6/60 = 0.100
Total        60                               1
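The construction of this table can be sketched in Python as follows. The sketch follows the steps of Example 3 (Yule's formula, class width rounded up, discrete class limits Xs+kW to Xs+(k+1)W-1); rounding the number of classes to the nearest integer is an assumption that matches the value 7 used in the example.

```python
import math

data = [60, 76, 80, 120, 132, 82, 90, 65, 68, 142, 157, 164, 88,
        90, 98, 101, 103, 110, 119, 116, 120, 126, 109, 114, 120, 122,
        111, 116, 90, 78, 93, 95, 98, 104, 120, 113, 121, 119, 125,
        126, 130, 131, 136, 118, 120, 142, 150, 154, 122, 123, 139, 125,
        106, 154, 136, 137, 110, 137, 72, 150]

n = len(data)
total_range = max(data) - min(data)            # T.R = highest - lowest
num_classes = round(2.5 * n ** 0.25)           # Yule's formula: 2.5 * n^(1/4)
width = math.ceil(total_range / num_classes)   # class width, rounded up

lowest = min(data)
table = []
for k in range(num_classes):
    lower = lowest + k * width                 # Xs + kW
    upper = lower + width - 1                  # Xs + (k+1)W - 1 (discrete limits)
    f = sum(lower <= x <= upper for x in data) # frequency of the class
    midpoint = (lower + upper) / 2
    table.append((lower, upper, f, midpoint, f / n))

for lower, upper, f, midpoint, rel in table:
    print(f"{lower:3d} - {upper:3d}  {f:3d}  {midpoint:6.1f}  {rel:.3f}")
```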
B- Frequency Distribution for continuous variables:
The lower and upper limits of the frequency distribution of continuous variables
are as below:
Lower limit     Upper limit     Frequency
Xs              Xs+W            f1
Xs+W            Xs+2W           f2
Xs+2W           Xs+3W           f3
...             ...             ...
Xs+(M-1)W       Xs+M.W          fM
Example4: Construct a frequency distribution for the data below:
1.3 4.1 5.7 6.5 7.9 10.4 2 4.2 5.7 6.5 8.2 8.3 6.8
5.7 4.3 10.4 2.1 2.8 4.3 10.8 5.8 6.9 8.3 8.4 7 11.3
5.8 4.7 3.3 3.3 4.8 5.9 7 8.9 9.1 7.3 6 5.1 3.5
3.7 5.1 6.2 7.6 9.2 9.7 7.8 6.4 5.3 6.4 7.9
Cumulative Frequency Distribution
A- Ascending Cumulative Frequency Distribution
The ascending cumulative frequency of a class is the total frequency of all values
less than or equal to the upper limit of that class interval.
Example5: Construct an ascending cumulative frequency distribution
based on Example 3.
Classes      Frequency   Upper Limit of Class   Ascending Cumulative Frequency
60 - 74      4           74                     Less than or equal to 74 = 4
75 - 89      5           89                     Less than or equal to 89 = 9
90 - 104     10          104                    Less than or equal to 104 = 19
105 - 119    12          119                    Less than or equal to 119 = 31
120 - 134    16          134                    Less than or equal to 134 = 47
135 - 149    7           149                    Less than or equal to 149 = 54
150 - 164    6           164                    Less than or equal to 164 = 60
Total        60
B- Descending Cumulative Frequency Distribution
The descending cumulative frequency of a class is the total frequency of all values
greater than or equal to the lower limit of that class interval.
Example6: Construct a descending cumulative frequency distribution based
on Example 4.
Classes    Frequency   Lower Limit of Class   Descending Cumulative Freq.
0 - 2      1           0                      Greater than or equal to 0 = 50
2 - 4      7           2                      Greater than or equal to 2 = 49
4 - 6      15          4                      Greater than or equal to 4 = 42
6 - 8      15          6                      Greater than or equal to 6 = 27
8 - 10     8           8                      Greater than or equal to 8 = 12
10 - 12    4           10                     Greater than or equal to 10 = 4
Total      50
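Both cumulative distributions are running totals of the class frequencies, taken from the first class downward (ascending) or from the last class upward (descending). A minimal Python sketch, with hypothetical function names, applied to the frequencies of Examples 5 and 6:

```python
from itertools import accumulate

def ascending_cumulative(freqs):
    """Running total from the first class: values <= each upper limit."""
    return list(accumulate(freqs))

def descending_cumulative(freqs):
    """Running total from the last class: values >= each lower limit."""
    return list(accumulate(reversed(freqs)))[::-1]

freqs_example3 = [4, 5, 10, 12, 16, 7, 6]    # classes 60-74 ... 150-164
freqs_example4 = [1, 7, 15, 15, 8, 4]        # classes 0-2 ... 10-12

asc = ascending_cumulative(freqs_example3)
desc = descending_cumulative(freqs_example4)
print(asc)
print(desc)
```

For any class, the ascending value of the previous class plus the descending value of the class itself equals the total number of observations.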
Charts
Statistical data can be presented graphically by using statistical charts. There are
several kinds of charts for representing a set of data, such as:
Bar Charts
A bar chart is a chart composed of bars whose heights are the frequencies of the
different classes. It is used for qualitative variables.
Example7: Display the data below as a bar chart.
Red, Green, Green, Green, Blue, Blue, Red, Blue, Green, Green, Red, Red, Blue, Green,
Red, Red
Solution:
In the first step we create a frequency table for this data:
Color    Frequency
red      6
green    6
blue     4
Then we use this table to create the bar chart.
[Figure: bar chart of the color frequencies; x-axis: Color (red, green, blue), y-axis: Frequency]
Histogram
A histogram is similar to a bar chart, but it is used for representing a quantitative
variable rather than a qualitative variable.
Example8: Draw a histogram for the following frequency distribution.
Classes Frequency
60 - 74 4
75 – 89 5
90 – 104 10
105 – 119 12
120 – 134 16
135 – 149 7
150 – 164 6
Total 60
Solution:
[Figure: histogram of the frequency distribution; x-axis: Classes (60 - 74 to 150 - 164), y-axis: Frequency]
Pie Chart
A pie chart is a circle divided into sectors, where each sector represents a category
(class) of the data and its size is proportional to the relative frequency of that class.
We can calculate the angle size of each class by the following rule:
Angle size of class = relative frequency of the class × 360°
Example9: Draw a pie chart for the data in Example 1.
Class                   Frequency   Relative Frequency   Angle Size
Lecturer                4           4/25 = 0.16          0.16 × 360 = 57.6°
Assistant lecturer      5           5/25 = 0.2           0.2 × 360 = 72°
Researcher              6           6/25 = 0.24          0.24 × 360 = 86.4°
Assistant Researcher    10          10/25 = 0.4          0.4 × 360 = 144°
Total                   25          1                    360°
[Figure: pie chart with sectors Lecturer (57.6°), Assistant lecturer (72°), Researcher (86.4°), Assistant Researcher (144°)]
Frequency Polygon
It is a chart that displays the data by using lines that connect points plotted for the
frequencies at the midpoints of the classes.
Example10: Draw a frequency polygon for the frequency distribution in
Example 3.
[Figure: frequency polygon; x-axis: Midpoints (67, 82, 97, 112, 127, 142, 157), y-axis: Frequency]
Frequency Curve
A frequency curve is like a frequency polygon, with one difference: instead of
using straight lines to connect the midpoints, a smooth curve is used.
Example 11: Draw a frequency curve for the data in Example 4.
[Figure: frequency curve; x-axis: Midpoints (1, 3, 5, 7, 9, 11), y-axis: Frequency (1, 7, 15, 15, 8, 4)]
Cumulative Frequency Chart
It is a chart that represents the cumulative frequencies of the classes in a frequency
distribution.
Example 12: Construct an ascending cumulative frequency chart for the data in
Example 4.
[Figure: ascending cumulative frequency chart; x-axis: Upper Limit of Classes, y-axis: Cumulative frequency]
Example13: Construct a descending cumulative frequency chart for the data in
Example 4.
[Figure: descending cumulative frequency chart; x-axis: Lower Limit of Classes, y-axis: Cumulative frequency]
Exercise 1: complete the following frequency distributions if the widths of
classes are equal.
Class Midpoint Class Midpoint
3 8 6
18
Class Midpoint
14
26
Exercise2: The heights of 35 students were recorded as follows:
170 180 175 165 160 155 180 190 185 170 174 178
165 169 186 179 161 171 159 168 177 164 191 140
173 181 177 173 166 162 168 184 168 158 155
Find the following:
1- Frequency distribution
2- Midpoints
3- Descending cumulative frequency
4- Relative frequency
And draw:
a) Histogram b) frequency polygon
SECTION 3
Notations
In this section we present some useful notation before explaining the
subjects related to measures of central tendency and measures of dispersion
(variation).
1- Summation Notation ($\sum$)
The symbol $\sum_{i=1}^{n} X_i$ is read as "the summation of X", where n is the number of
observations and i is the subscript for the order of the values.
Let X be a variable that takes 4 values: 2, 3, 5 and 10. Then the sum of the variable X
is represented as follows:
$$\sum_{i=1}^{n} X_i = \sum_{i=1}^{4} X_i = X_1 + X_2 + X_3 + X_4 = 2 + 3 + 5 + 10 = 20$$
Symbol and Operation:
$\sum_{i=1}^{n} X_i = X_1 + X_2 + \dots + X_n$ : Sum of observations
$\sum_{i=1}^{n} X_i^2 = X_1^2 + X_2^2 + \dots + X_n^2$ : Sum of squares of observations
$\left(\sum_{i=1}^{n} X_i\right)^2 = (X_1 + X_2 + \dots + X_n)^2$ : Square of sum of observations
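Let X and Y be variables and let a and b be constants; then:
$$\sum_{i=1}^{n} a = n \cdot a$$
$$\sum_{i=1}^{n} a X_i = a \sum_{i=1}^{n} X_i$$
$$\sum_{i=1}^{n} (X_i \pm a) = \sum_{i=1}^{n} X_i \pm n \cdot a$$
$$\sum_{i=1}^{n} (X_i \pm Y_i) = \sum_{i=1}^{n} X_i \pm \sum_{i=1}^{n} Y_i$$
$$\sum_{i=1}^{n} (a X_i \pm b Y_i) = a \sum_{i=1}^{n} X_i \pm b \sum_{i=1}^{n} Y_i$$
$$\sum_{i=1}^{n} X_i Y_i = X_1 Y_1 + X_2 Y_2 + \dots + X_n Y_n$$
$$\sum_{i=1}^{n} X_i^2 \neq \left(\sum_{i=1}^{n} X_i\right)^2$$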
Example 1: If Xi represents the values 4, 3, 5 and 1, find the following:
a- $\sum_{i=1}^{n} X_i$   b- $\sum_{i=1}^{n} X_i^2$   c- $\sum_{i=1}^{n} 2X_i$   d- $\sum_{i=1}^{n} (X_i - 3)$
Solution:
a) $\sum_{i=1}^{n} X_i = \sum_{i=1}^{4} X_i = X_1 + X_2 + X_3 + X_4 = 4 + 3 + 5 + 1 = 13$
b) $\sum_{i=1}^{n} X_i^2 = \sum_{i=1}^{4} X_i^2 = X_1^2 + X_2^2 + X_3^2 + X_4^2 = 4^2 + 3^2 + 5^2 + 1^2 = 51$
c) $\sum_{i=1}^{n} 2X_i = 2 \sum_{i=1}^{4} X_i = 2(13) = 26$
d) $\sum_{i=1}^{n} (X_i - 3) = \sum_{i=1}^{4} X_i - 3(4) = 13 - 12 = 1$
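The four sums of Example 1 can be checked numerically. The sketch below uses only Python's built-in `sum`; the variable names are illustrative. It also shows that the sum of squares and the square of the sum are different quantities.

```python
x = [4, 3, 5, 1]
n = len(x)

sum_x = sum(x)                          # a) sum of observations
sum_sq = sum(v ** 2 for v in x)         # b) sum of squares of observations
sum_2x = sum(2 * v for v in x)          # c) equals 2 * sum_x
sum_x_minus_3 = sum(v - 3 for v in x)   # d) equals sum_x - 3 * n
square_of_sum = sum(x) ** 2             # square of sum, not the same as b)

print(sum_x, sum_sq, sum_2x, sum_x_minus_3, square_of_sum)
```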
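2- Pi (Product) Notation ($\prod$)
The symbol $\prod_{i=1}^{n} X_i$ is used for the multiplication of all values of Xi, or:
$$\prod_{i=1}^{n} X_i = X_1 \cdot X_2 \cdot \dots \cdot X_n$$
If a is a constant, then:
$$\prod_{i=1}^{n} a = a^n$$
$$\prod_{i=1}^{n} a X_i = a^n \prod_{i=1}^{n} X_i$$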
Example 2: If Xi represents the values 4, 2, 5 and 3, find the following:
a- $\prod_{i=1}^{n} X_i$   b- $\prod_{i=1}^{n} 5X_i$
Solution:
a) $\prod_{i=1}^{n} X_i = \prod_{i=1}^{4} X_i = X_1 \cdot X_2 \cdot X_3 \cdot X_4 = 4 \times 2 \times 5 \times 3 = 120$
b) $\prod_{i=1}^{n} 5X_i = 5^n \prod_{i=1}^{n} X_i = 5^4 (120) = 75000$
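As a quick numerical check of Example 2, the product can be computed with `math.prod` from the Python standard library (available from Python 3.8). The variable names are illustrative.

```python
import math

x = [4, 2, 5, 3]
n = len(x)

prod_x = math.prod(x)                      # a) X1 * X2 * X3 * X4
prod_5x = math.prod(5 * v for v in x)      # b) product of 5*Xi
via_property = 5 ** n * prod_x             # property: prod(a*Xi) = a^n * prod(Xi)

print(prod_x, prod_5x, via_property)
```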
Exercise: If
Xi: 5, 3, 4, 2 and Yi: 3, 1, 4, 2 then find the following:
a- 
4
1
2
i
iX b- 
4
1
3
i
iY c- 2
4
1
. i
i
i YX
d-  

n
i
ii YX
1
e- 4
4
1
66 i
f- 
4
1i
iX g- 
n
i
iY
1
4 h- i
n
i
i YX .2
1

j- 
4
1i i
i
Y
X
k-   2.3
4
1

i
i
i YX
SECTION 4: MEASURES OF CENTRAL TENDENCY
In the previous sections, we studied how to collect raw data and how to classify
and tabulate it in a useful form, which helps in solving many statistical
problems. Yet this is not sufficient: for practical purposes there is a
need for further condensation, particularly when we want to compare two or more
different distributions. We may reduce the entire distribution to one number
which represents the distribution.
A single value which can be considered typical or representative of a set of
observations, and around which the observations can be considered to be centered, is
called an average (or average value) or a center of location. Since such typical
values tend to lie centrally within a set of observations when arranged according
to magnitude, averages are called measures of central tendency.
So a measure of central tendency is a value at the center or middle of a data set.
This value represents all the data of the group.
The fundamental measures of central tendency are:
(1) Arithmetic Mean
(2) Weighted Mean
(3) Harmonic Mean
(4) Quadratic Mean
(5) Mode
(6) Median
However, the most common measures of central tendency or location are the
arithmetic mean, the median and the mode.
1) Arithmetic Mean
The arithmetic mean (generally called the mean) is obtained by adding all the observations
(values of all items) together and dividing this sum by the number of observations
(or items). The symbol $\bar{X}$ (pronounced "X bar") represents the sample mean and
$\mu$ represents the population mean.
Arithmetic mean for ungrouped data
Suppose we have n observations (or measurements) X1, X2, X3, ..., Xn; then the
arithmetic mean is:
$$\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n} = \frac{X_1 + X_2 + X_3 + \dots + X_n}{n}$$
Where: Xi = the ith observation.
n = the size of the data.
The mean for a population consisting of N observations is:
$$\mu = \frac{\sum_{i=1}^{N} X_i}{N} = \frac{X_1 + X_2 + X_3 + \dots + X_N}{N}$$
Example: Calculate the arithmetic mean of the given values:
98 96 95 98 100 92 96 69
Solution:
$$\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n} = \frac{98 + 96 + 95 + 98 + 100 + 92 + 96 + 69}{8} = \frac{744}{8} = 93$$
Arithmetic mean for grouped data:
The arithmetic mean of grouped data is found by multiplying every class midpoint
(xi) by its corresponding frequency (fi), summing these products to obtain
$\sum f_i x_i$, and then dividing this sum by $\sum f_i$:
$$\bar{X} = \frac{\sum f_i x_i}{\sum f_i}$$
The formula above is for sample data. A similar formula is used for population data.
Example: Determine the mean for the following set of data.
Classes   Frequency
8 -       2
10 -      3
12 -      5
14 -      4
16 -      1
Solution:
$$\bar{X} = \frac{\sum f_i x_i}{\sum f_i}$$
Classes   Frequency (fi)   Midpoint (xi)   fi . xi
8 -       2                9               18
10 -      3                11              33
12 -      5                13              65
14 -      4                15              60
16 -      1                17              17
Total     15                               193
$$\bar{X} = \frac{193}{15} = 12.87$$
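Both mean formulas translate directly into code. The following Python sketch uses hypothetical helper names and reproduces the ungrouped example (mean 93) and the grouped example (mean 193/15) from this section.

```python
def arithmetic_mean(values):
    """Ungrouped data: sum of the observations divided by their number."""
    return sum(values) / len(values)

def grouped_mean(midpoints, freqs):
    """Grouped data: sum(f_i * x_i) / sum(f_i), with x_i the class midpoints."""
    return sum(f * x for f, x in zip(freqs, midpoints)) / sum(freqs)

marks = [98, 96, 95, 98, 100, 92, 96, 69]
mean_marks = arithmetic_mean(marks)

midpoints = [9, 11, 13, 15, 17]      # classes 8-, 10-, 12-, 14-, 16-
freqs = [2, 3, 5, 4, 1]
mean_grouped = grouped_mean(midpoints, freqs)

print(mean_marks, round(mean_grouped, 2))
```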
The Properties of the Arithmetic Mean:
1- The sum of the deviations of all the values of X from their mean is zero.
$$\sum_{i=1}^{n} (X_i - \bar{X}) = \sum_{i=1}^{n} X_i - n\bar{X}$$
We have:
$$\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n} \;\Rightarrow\; \sum_{i=1}^{n} X_i = n\bar{X}$$
Then:
$$\sum_{i=1}^{n} (X_i - \bar{X}) = n\bar{X} - n\bar{X} = 0$$
2- If $(\bar{X}_1, \bar{X}_2, \dots, \bar{X}_k)$ represent the means of k groups based on $(n_1, n_2, \dots, n_k)$
observations respectively, the mean of the groups combined is:
$$\bar{X} = \frac{\sum_{i=1}^{k} n_i \bar{X}_i}{\sum_{i=1}^{k} n_i}$$
3- The sum of squares of the deviations from the mean is smaller than the sum of
squares of the deviations from any other value. (Prove this property.)
Advantages (merits) of the Arithmetic mean
1- It is easy to calculate and simple to understand.
2- It is very popular (most widely used).
3- It is based on all the observations, so it is a good representative.
Disadvantages (demerits) of the Arithmetic mean
1- It is affected by outliers or extreme values.
2- It cannot be obtained if a single observation is missing or lost.
3- It cannot be calculated for frequency distributions with open-end classes.
4- It cannot be computed for qualitative data.
2) Weighted Arithmetic Mean:
One of the limitations of the arithmetic mean is that it gives equal importance
to all the items. But there are cases where the relative importance of the different
items is not the same. When this is so, we compute the weighted arithmetic mean.
The formula for computing the weighted arithmetic mean in the case of ungrouped data
is:
$$\bar{X}_W = \frac{\sum_{i=1}^{n} W_i X_i}{\sum_{i=1}^{n} W_i} = \frac{W_1 X_1 + W_2 X_2 + \dots + W_n X_n}{W_1 + W_2 + \dots + W_n}$$
Where Wi is the weight of the ith observation.
The formula for computing the weighted arithmetic mean in the case of grouped data is:
$$\bar{X}_W = \frac{\sum_{i=1}^{n} W_i f_i X_i}{\sum_{i=1}^{n} W_i f_i} = \frac{W_1 f_1 X_1 + W_2 f_2 X_2 + \dots + W_n f_n X_n}{W_1 f_1 + W_2 f_2 + \dots + W_n f_n}$$
Example: The marks of a student in the final examination of the Statistics department
are as follows:
Subjects (Xi): 98 96 95 98 100 92 96 69
Units (Wi): 2 3 3 1 3 3 2 2
Calculate the weighted mean.
Solution:
$$\bar{X}_W = \frac{\sum_{i=1}^{n} W_i X_i}{\sum_{i=1}^{n} W_i} = \frac{(98 \times 2) + (96 \times 3) + (95 \times 3) + (98 \times 1) + (100 \times 3) + (92 \times 3) + (96 \times 2) + (69 \times 2)}{2 + 3 + 3 + 1 + 3 + 3 + 2 + 2}$$
$$\bar{X}_W = \frac{1773}{19} = 93.3158$$
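The weighted mean of this example can be verified with a small Python sketch. The function name is hypothetical; the marks and units are those given in the example. The last line illustrates the remark that equal weights reduce the weighted mean to the ordinary arithmetic mean.

```python
def weighted_mean(values, weights):
    """sum(W_i * X_i) / sum(W_i)"""
    return sum(w * x for w, x in zip(weights, values)) / sum(weights)

marks = [98, 96, 95, 98, 100, 92, 96, 69]
units = [2, 3, 3, 1, 3, 3, 2, 2]

wm = weighted_mean(marks, units)
equal_weights = weighted_mean(marks, [1] * len(marks))  # same as the arithmetic mean

print(round(wm, 4), equal_weights)
```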
Remark: If all the weights are equal, then the weighted mean is the same as the
arithmetic mean.
Exercise1: The average marks of three groups of students having 70, 50 and 30
students respectively are 50, 55 and 45. Find the average marks of all the 150
students, taken together.
Exercise2: The following frequency distribution shows the marks obtained by 50
students in statistics at Soran institute. Find the arithmetic mean.
Classes Frequency (fi)
20 - 29 1
30 - 39 5
40 - 49 12
50 - 59 15
60 - 69 9
70 - 79 6
80 - 89 2
Exercise3: The mean of a certain number of observations is 40. If two items with
values 50 and 64 are added to this data, the mean rises to 42. Find the number of
items in the original data.
Exercise4: If $\sum_{i=1}^{n} (X_i - 4) = 72$ and $\sum_{i=1}^{n} (X_i - 7) = 3$, then find the number of
observations (n).
3) Harmonic Mean
The harmonic mean is one of the measures of central tendency that is used less
often than the other measures (mean, median and mode).
The formula for computing the harmonic mean in the case of ungrouped data
is:
$$\bar{X}_h = \frac{n}{\sum_{i=1}^{n} \frac{1}{X_i}}$$
And for grouped data it is:
$$\bar{X}_h = \frac{\sum_{i=1}^{n} f_i}{\sum_{i=1}^{n} \frac{f_i}{X_i}}$$
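Example: Calculate the harmonic mean for the following data:
Xi: 8 2 5 3 4 7 8
Solution:
$$\bar{X}_h = \frac{n}{\sum_{i=1}^{n} \frac{1}{X_i}}$$
1/Xi: 0.13 0.5 0.2 0.33 0.25 0.14 0.13
$$\sum_{i=1}^{n} \frac{1}{X_i} = 1.68, \qquad \bar{X}_h = \frac{7}{1.68} = 4.167$$
(With unrounded reciprocals the sum is 1.676 and the harmonic mean is 4.176.)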
4) Quadratic Mean
$$\bar{X}_q = \sqrt{\frac{\sum_{i=1}^{n} X_i^2}{n}}$$ for ungrouped data
$$\bar{X}_q = \sqrt{\frac{\sum_{i=1}^{n} f_i X_i^2}{\sum_{i=1}^{n} f_i}}$$ for grouped data
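A short Python sketch of the harmonic and quadratic means for ungrouped data is given below. The function names are hypothetical, and applying the quadratic mean to the harmonic-mean example data is only an illustration, since the notes give no worked example for it. Because no intermediate rounding is done, the harmonic mean comes out as about 4.176 rather than the 4.167 obtained from the two-decimal reciprocals.

```python
import math

def harmonic_mean(values):
    """n / sum(1 / X_i); all values must be non-zero."""
    return len(values) / sum(1 / x for x in values)

def quadratic_mean(values):
    """sqrt(sum(X_i^2) / n)"""
    return math.sqrt(sum(x ** 2 for x in values) / len(values))

x = [8, 2, 5, 3, 4, 7, 8]
h = harmonic_mean(x)
q = quadratic_mean(x)      # sum of squares is 231, so q = sqrt(231 / 7) = sqrt(33)

print(round(h, 3), round(q, 3))
```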
5) Mode
The mode (Mo) is the value that occurs most often in a data set.
Mode for ungrouped data:
The mode of the data set 5, 6, 7, 5, 5, 10, 4, 5, 4, 7, 5, 5 is the number 5
because it is repeated more often than the other numbers (6 times).
Remark: When 2 numbers occur with the same greatest frequency, each one is a
mode and the data set is bimodal. When more than 2 numbers occur with the same
greatest frequency, each is a mode and the data set is said to be multimodal. When
no number is repeated, we say that there is no mode.
Example: Find the mode of the following data set: 5, 7, 6, 7, 5, 7, 5, 10, 4, 4, 7, 5.
Solution: The numbers 5 and 7 each occur 4 times, so both are modes. The data set is bimodal.
Mode for grouped data:
Let (X1, X2, ..., Xn) represent the class marks of the class intervals and (f1, f2, ...,
fn) represent the frequencies. The modal class is the class which has the highest
frequency. The formula for obtaining the mode is as follows:
$$Mo = L_k + \frac{(f_k - f_{k-1})}{(f_k - f_{k-1}) + (f_k - f_{k+1})} \times W_k$$
Where:
Lk: lower limit of the modal class.
fk: frequency of the modal class.
fk-1: frequency of the previous class.
fk+1: frequency of the next class.
Wk: size of the modal class interval (class width).
Example: Find the mode for the following frequency distribution:
Class      Frequency
10 - 19    5
20 - 29    7
30 - 39    10
40 - 49    8
50 - 59    4
60 - 69    3
70 - 79    1
Solution:
The modal class is 30 - 39 because it has the highest frequency (10).
Lk=30, fk=10, fk-1=7, fk+1=8, Wk=10
$$Mo = L_k + \frac{(f_k - f_{k-1})}{(f_k - f_{k-1}) + (f_k - f_{k+1})} \times W_k$$
$$Mo = 30 + \frac{(10 - 7)}{(10 - 7) + (10 - 8)} \times 10 = 30 + \frac{3}{5} \times 10 = 36$$
Remark1: If 2 or more classes share the highest frequency, the modal class must be
found by the assembly method.
Remark2: When we use the assembly method, the formula of the mode will be:
$$Mo = L_k + \frac{(f_k - f_{k-1})}{f_k - f_{k-1} + f_k - f_{k+1}} \times W_k$$
Remark3: If the widths of the classes are not equal, the adjusted
frequency must be used instead of the real frequency, where the adjusted frequency for
each class is equal to $\frac{f_i}{W_i}$.
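The grouped-mode formula can be sketched in Python as follows, using the distribution whose modal class is 30 - 39. The function name is hypothetical, and the sketch assumes equal class widths, a single class with the highest frequency, and a modal class that is neither the first nor the last class (so that both neighbouring frequencies exist).

```python
def grouped_mode(lower, f_k, f_prev, f_next, width):
    """Mo = L_k + (f_k - f_{k-1}) / ((f_k - f_{k-1}) + (f_k - f_{k+1})) * W_k"""
    d1 = f_k - f_prev
    d2 = f_k - f_next
    return lower + d1 / (d1 + d2) * width

# (lower limit, frequency) for classes 10-19, 20-29, ..., 70-79
classes = [(10, 5), (20, 7), (30, 10), (40, 8), (50, 4), (60, 3), (70, 1)]
width = 10

# Index of the class with the highest frequency (the modal class)
k = max(range(len(classes)), key=lambda i: classes[i][1])

mo = grouped_mode(classes[k][0], classes[k][1],
                  classes[k - 1][1], classes[k + 1][1], width)
print(mo)
```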
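Example: Find the mode for the following frequency distribution:
Class      Frequency
10 - 19    4
20 - 29    4
30 - 39    3
40 - 49    2
50 - 59    3
60 - 69    3
70 - 79    1
Solution:
There are 2 classes with the highest frequency; therefore, to find the modal class we
must use the assembly method, as follows:
Class      Frequency   1st assembly   2nd assembly   3rd assembly   4th assembly
10 - 19    4           8                             11
20 - 29    4                          7                             9
30 - 39    3           5
40 - 49    2                          5              8
50 - 59    3           6                                            7
60 - 69    3                          4
70 - 79    1
(Each assembly value is written on the row of the first class of the group it sums: the 1st and 2nd assemblies sum pairs of classes starting from the 1st and 2nd class, and the 3rd and 4th assemblies sum triples of classes starting from the 1st and 2nd class.)
From the previous table we can extract the following table:
Serial No. of column   Greatest frequency appearing in the column   Contributing classes
1                      4                                            1, 2
2                      8                                            1, 2
3                      7                                            2, 3
4                      11                                           1, 2, 3
5                      9                                            2, 3, 4
The 2nd class contributes to the greatest frequency in every column, more often than any other class, so the 2nd class is the modal class.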
Lk = 20, fk = 4, fk-1 = 4, fk+1 = 3, Wk = 10
$$Mo = L_k + \frac{(f_k - f_{k-1})}{f_k - f_{k-1} + f_k - f_{k+1}} \times W_k$$
$$Mo = 20 + \frac{(4 - 4)}{4 - 4 + 4 - 3} \times 10 = 20$$
Advantages of the Mode
1- It is easy to calculate.
2- It is not affected by extreme values.
3- It can be used for qualitative data.
4- It can be located graphically (histogram).
5- It can be calculated for distributions with open-end classes.
Disadvantages of the Mode
1- It is not based upon all the observations.
2- It is not always possible to find a clearly defined mode (there may be 2 or 3
modes).
3- It is not capable of further mathematical treatment.
Exercise: Find the mode for the following frequency distributions:
Class      Frequency        Class    Frequency
5 -        2                10 -     30
10 -       6                20 -     12
15 -       10               30 -     16
25 -       22               40 -     28
35 -       27               50 -     26
50 - 60    11               60 -     14
6) Median
The median (Me) is the value of the middle item in a data set. It divides the
data set into two equal parts, one part comprising all values greater than the median
and the other all values smaller than the median.
Median for ungrouped data:
In the first step we arrange the data in ascending (increasing) order.
If the number of observations (n) is odd, the median is the observation whose
order is $\left(\frac{n+1}{2}\right)$.
If the number of observations (n) is even, the median is the average of the
observations whose orders are $\left(\frac{n}{2}\right)$ and $\left(\frac{n}{2} + 1\right)$.
Example: Find the median of the following data set:
55, 62, 53, 70, 68, 65, 63, 79, and 80.
Solution:
Arrange the data in increasing order: 53, 55, 62, 63, 65, 68, 70, 79, 80.
Since n=9 is odd, the order of the median is:
$$\left(\frac{n+1}{2}\right) = \left(\frac{9+1}{2}\right) = 5$$
Then the 5th observation is the value of the median, or Me=65.
Example: Find the median of the following data set:
20, 22, 19, 26, 30, 27, 28, 29, 18, 20, 23, 25.
Solution:
Arranging the data in increasing order:
18, 19, 20, 20, 22, 23, 25, 26, 27, 28, 29, 30
Since n=12 is even:
$$\left(\frac{n}{2}\right) = \left(\frac{12}{2}\right) = 6, \text{ and the 6th value is } 23$$
$$\left(\frac{n}{2} + 1\right) = \left(\frac{12}{2} + 1\right) = 7, \text{ and the 7th value is } 25$$
Then:
$$Me = \frac{23 + 25}{2} = 24$$
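The rule for ungrouped data (sort, then take the middle value or the average of the two middle values) can be written as a short Python sketch. The function name is hypothetical; the standard library's `statistics.median` gives the same results.

```python
def median(values):
    """Middle value of the sorted data (average of the two middle values if n is even)."""
    s = sorted(values)
    n = len(s)
    mid = n // 2
    if n % 2 == 1:
        return s[mid]                  # the ((n+1)/2)-th value in 1-based order
    return (s[mid - 1] + s[mid]) / 2   # average of the (n/2)-th and (n/2 + 1)-th values

odd_data = [55, 62, 53, 70, 68, 65, 63, 79, 80]
even_data = [20, 22, 19, 26, 30, 27, 28, 29, 18, 20, 23, 25]

me_odd = median(odd_data)
me_even = median(even_data)
print(me_odd, me_even)
```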
Median for grouped data:
To find the median of a frequency distribution, follow these steps:
Step1: Find the cumulative frequency (ascending or descending).
Step2: Compute the median order, which is equal to $\frac{\sum f_i}{2}$.
Step3: If $F_{k-1} < \frac{\sum f_i}{2} \le F_k$, then the median class is the class whose order is k.
Step4: Compute the value of the median:
$$Me = L_k + \left(\frac{\sum f_i}{2} - F_{k-1}\right) \cdot \frac{W_k}{f_k}$$ for ascending cumulative frequency.
$$Me = L_k + \left(F_k^{*} - \frac{\sum f_i}{2}\right) \cdot \frac{W_k}{f_k}$$ for descending cumulative frequency.
Where:
Lk: lower limit of the median class.
fk: frequency of the median class.
Wk: width of the median class.
$\sum f_i$: sum of the frequencies.
Fk-1: ascending cumulative frequency of the class preceding the median class.
$F_k^{*}$: descending cumulative frequency of the median class.
Example: Find the median for the following frequency distribution:
Classes            100 -   120 -   140 -   160 -   180 -   200 -   220 -
No. of families    3       7       14      20      18      12      6
Solution:
In the first step we find the ascending cumulative frequency:
Class    Frequency   Ascending Cumulative Frequency
100 -    3           3
120 -    7           10
140 -    14          24
160 -    20          44
180 -    18          62
200 -    12          74
220 -    6           80
Total    80
Then we find the median order, which is equal to:
$$\frac{\sum f_i}{2} = \frac{80}{2} = 40$$
Comparing the median order with the ascending cumulative frequency:
$$F_{k-1} < \frac{\sum f_i}{2} \le F_k \;\Rightarrow\; 24 < 40 \le 44$$
Then the median class is the 4th class, and:
Lk=160, Wk=20, fk=20
$$Me = L_4 + \left(\frac{\sum f_i}{2} - F_3\right) \cdot \frac{W_4}{f_4}$$
$$Me = 160 + \left(\frac{80}{2} - 24\right) \cdot \frac{20}{20} = 176$$
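The four steps for the grouped median can be sketched in Python. The function names are hypothetical and the sketch assumes equal class widths. It implements both the ascending and the descending version of the formula, which should agree with each other and with the value 176 obtained above.

```python
from itertools import accumulate

def grouped_median_ascending(lowers, freqs, width):
    """Me = L_k + (sum(f)/2 - F_{k-1}) * W / f_k, using ascending cumulative frequencies."""
    half = sum(freqs) / 2
    asc = list(accumulate(freqs))
    k = next(i for i, F in enumerate(asc) if F >= half)   # first class with F_k >= half
    f_prev = asc[k - 1] if k > 0 else 0
    return lowers[k] + (half - f_prev) * width / freqs[k]

def grouped_median_descending(lowers, freqs, width):
    """Me = L_k + (F*_k - sum(f)/2) * W / f_k, using descending cumulative frequencies."""
    half = sum(freqs) / 2
    desc = list(accumulate(reversed(freqs)))[::-1]
    k = max(i for i, F in enumerate(desc) if F >= half)   # last class with F*_k >= half
    return lowers[k] + (desc[k] - half) * width / freqs[k]

lowers = [100, 120, 140, 160, 180, 200, 220]
freqs = [3, 7, 14, 20, 18, 12, 6]

me_asc = grouped_median_ascending(lowers, freqs, 20)
me_desc = grouped_median_descending(lowers, freqs, 20)
print(me_asc, me_desc)
```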
Merits of the Median
1. It is easy to calculate and understand.
2. It is not affected by extreme values, unlike the arithmetic mean.
3. It can be found by mere inspection.
4. It can be used for qualitative studies.
5. It can be calculated for distributions with open-end classes.
6. It can be obtained graphically.
Demerits of the Median
1. It is not capable of further algebraic treatment.
2. It is not based on all observations.
Exercise: Find the median for the following frequency distribution by using
the ascending and descending cumulative frequency.
Class        Frequency
18 -         10
28 -         15
36 -         18
50 -         22
70 -         20
100 -        18
130 - 150    13
Total
The relationship between Arithmetic Mean, Median and Mode
If the frequency distribution is symmetric, the mean, the median and the mode coincide.
If the distribution is moderately skewed, the following empirical relationship between
these measures holds approximately:
$$\bar{X} - M_o = 3(\bar{X} - M_e)$$
SECTION 5: MEASURES OF DISPERSION (VARIATION)
Measures that describe the spread of a data set are called measures of dispersion.
Their main objective is to assess the homogeneity of the values in a data set, or to
compare the values of two or more data sets.
1- Range
The simplest measure of absolute variation is the range, which is calculated by
subtracting the smallest value from the largest value of a data set.
R = Largest value - Smallest value
Example: Find the range for the following data: 2, 5, 3, 8, 7, 10, 9, 12, 15.
Solution:
R = Largest value - Smallest value = 15 - 2 = 13
Remark: In the case of grouped data we calculate the range by subtracting
the lower limit of the first class from the upper limit of the last class.
2- Mean Deviation
It is the sum of the absolute deviations of the observations from a point (A) divided by
the number of observations.
$$M.D = \frac{\sum_{i=1}^{n} |X_i - A|}{n}$$ for ungrouped data
$$M.D = \frac{\sum_{i=1}^{n} f_i |X_i - A|}{n}$$ for grouped data
Where A may be the arithmetic mean ($\bar{X}$), the median ($M_e$) or the mode ($M_o$).
Example: Find the mean deviation of the following data about the mean,
the median and the mode.
Xi: 2, 3, 4, 5, 5, 6, 7, 10, 13, 14, 19
Solution:
First we find the values of $\bar{X}$, $M_e$ and $M_o$:
$\bar{X}$ = 8, $M_e$ = 6, $M_o$ = 5
Xi      |Xi - X̄|   |Xi - Mo|   |Xi - Me|
2       6          3           4
3       5          2           3
4       4          1           2
5       3          0           1
5       3          0           1
6       2          1           0
7       1          2           1
10      2          5           4
13      5          8           7
14      6          9           8
19      11         14          13
Total   48         45          44
$$M.D(\bar{X}) = \frac{\sum_{i=1}^{n} |X_i - \bar{X}|}{n} = \frac{48}{11} = 4.364$$
$$M.D(M_e) = \frac{\sum_{i=1}^{n} |X_i - M_e|}{n} = \frac{44}{11} = 4$$
$$M.D(M_o) = \frac{\sum_{i=1}^{n} |X_i - M_o|}{n} = \frac{45}{11} = 4.091$$
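The three mean deviations of this example can be checked with a short Python sketch. The helper name is hypothetical; the mean, median and mode come from the standard `statistics` module. Summing each column of absolute deviations directly gives 48 about the mean, 44 about the median and 45 about the mode.

```python
from statistics import mean, median, mode

def mean_deviation(values, a):
    """sum(|X_i - A|) / n for a chosen point A."""
    return sum(abs(x - a) for x in values) / len(values)

x = [2, 3, 4, 5, 5, 6, 7, 10, 13, 14, 19]

md_mean = mean_deviation(x, mean(x))       # A = 8
md_median = mean_deviation(x, median(x))   # A = 6
md_mode = mean_deviation(x, mode(x))       # A = 5

print(round(md_mean, 3), round(md_median, 3), round(md_mode, 3))
```

The mean deviation is smallest when it is taken about the median, which is what the three results show.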
3- Variance
It is one of the most important measures of absolute variation. The variance is
calculated by taking the average of the squared distances (deviations) of
each observation from the mean of the data set.
The formula for the population variance ($\sigma^2$) for raw data is:
$$\sigma^2 = \frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}$$
Where:
Xi: individual value
µ: population mean
N: population size (number of observations).
The formula for the sample variance ($S^2$) for raw data is as follows:
$$S^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}$$
On the other hand, the formula for the sample variance for grouped data is:
$$S^2 = \frac{\sum_{i=1}^{n} f_i (X_i - \bar{X})^2}{n - 1}$$
Where $n = \sum f_i$.
Example: Find the variance for the following data set:
56, 68, 72, 63, 65, 68, 71, 69, 62, 56.
Solution:
$$S^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}$$
$$\bar{X} = \frac{\sum_{i=1}^{10} X_i}{10} = \frac{650}{10} = 65$$
Xi      (Xi - X̄)   (Xi - X̄)^2
56      -9         81
68      3          9
72      7          49
63      -2         4
65      0          0
68      3          9
71      6          36
69      4          16
62      -3         9
56      -9         81
Total              294
Then:
$$S^2 = \frac{294}{10 - 1} = 32.667$$
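The sample variance of this example can be reproduced with a few lines of Python. The function name is hypothetical; `statistics.variance` from the standard library uses the same n - 1 divisor and is included only as a cross-check.

```python
import statistics

def sample_variance(values):
    """S^2 = sum((X_i - mean)^2) / (n - 1)"""
    n = len(values)
    m = sum(values) / n
    return sum((x - m) ** 2 for x in values) / (n - 1)

data = [56, 68, 72, 63, 65, 68, 71, 69, 62, 56]

s2 = sample_variance(data)
print(round(s2, 3))                          # sample variance (divisor n - 1)
print(round(statistics.variance(data), 3))   # library cross-check
```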
Properties of variance:
1) $S^2 \ge 0$
2) If $Y_i = aX_i$, where a is a constant, then $S_Y^2 = a^2 S_X^2$. (Prove that)
3) If $Y_i = X_i \pm b$, where b is a constant, then $S_Y^2 = S_X^2$. (Prove that)
4) If X and Y are independent variables and $Z_i = X_i \pm Y_i$, then the variance of Z
is:
$$S_Z^2 = S_X^2 + S_Y^2$$
5) If $(S_1^2, S_2^2, \dots, S_k^2)$ represent the variances of k groups based on $(n_1, n_2, \dots, n_k)$
observations respectively, then the pooled variance of the groups is as follows:
$$S_p^2 = \frac{\sum_{i=1}^{k} (n_i - 1) S_i^2}{\sum_{i=1}^{k} (n_i - 1)} \quad \text{where } n_i < 30$$
$$S_p^2 = \frac{\sum_{i=1}^{k} n_i S_i^2}{\sum_{i=1}^{k} n_i} \quad \text{where } n_i \ge 30$$
4- Standard deviation (S)
The standard deviation is the most important and most widely used measure of
absolute variation. The standard deviation is the square root of the variance.
$$S = \sqrt{S^2} = \sqrt{\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}}$$
Example: Find the standard deviation of the following frequency distribution.
Solution:
Class    fi    Xi     fi.Xi    (Xi - X̄)   (Xi - X̄)^2   fi.(Xi - X̄)^2
100 -    3     110    330      -65.75     4323.063     12969.19
120 -    7     130    910      -45.75     2093.063     14651.44
140 -    14    150    2100     -25.75     663.0625     9282.875
160 -    20    170    3400     -5.75      33.0625      661.25
180 -    18    190    3420     14.25      203.0625     3655.125
200 -    12    210    2520     34.25      1173.063     14076.75
220 -    6     230    1380     54.25      2943.063     17658.38
Total    80           14060                            72955
$$\bar{X} = \frac{\sum_{i=1}^{n} f_i X_i}{\sum_{i=1}^{n} f_i} = \frac{14060}{80} = 175.75$$
$$S = \sqrt{\frac{\sum_{i=1}^{n} f_i (X_i - \bar{X})^2}{n}} = \sqrt{\frac{72955}{80}} = 30.198$$
Note that this calculation divides by n = 80. With the sample formula (divisor n - 1 = 79) the result is 30.389.
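The grouped standard deviation can be sketched in Python as below. The function name and the `ddof` parameter are illustrative assumptions: `ddof=0` divides by n as in the worked calculation, while `ddof=1` divides by n - 1 as in the sample-variance formula given earlier.

```python
import math

def grouped_std(midpoints, freqs, ddof=0):
    """sqrt(sum(f_i * (X_i - mean)^2) / (n - ddof)), with n = sum(f_i)."""
    n = sum(freqs)
    m = sum(f * x for f, x in zip(freqs, midpoints)) / n
    ss = sum(f * (x - m) ** 2 for f, x in zip(freqs, midpoints))
    return math.sqrt(ss / (n - ddof))

midpoints = [110, 130, 150, 170, 190, 210, 230]
freqs = [3, 7, 14, 20, 18, 12, 6]

s_n = grouped_std(midpoints, freqs)             # divisor n = 80
s_n1 = grouped_std(midpoints, freqs, ddof=1)    # divisor n - 1 = 79
print(round(s_n, 3), round(s_n1, 3))
```

For a sample of this size the two divisors change the result only slightly.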
Coefficient of Variation
A disadvantage of the standard deviation as a comparative measure of variation is
that it depends on the units of measurement. This means that it is difficult to use
the standard deviation to compare measurements from different populations. For
this reason, statisticians have defined the coefficient of variation, which expresses
the standard deviation as a percentage of the sample or population mean.
If $\bar{X}$ and S represent the sample mean and the sample standard deviation, then
the coefficient of variation (C.V.) is defined to be:
$$C.V. = \frac{S}{\bar{X}} \times 100$$
If μ and σ represent the population mean and standard deviation, then the
coefficient of variation (C.V.) is defined to be:
$$C.V. = \frac{\sigma}{\mu} \times 100$$
Notice that the numerator and denominator in the definition of C.V. have the same
units, so C.V. itself has no units of measurement. This gives us the advantage of
being able to directly compare the variability of two different populations using
the coefficient of variation.
Example1: A company has two sections (A and B) with 40 and 65 employees
respectively. Their average weekly wages are $450 and $350. The standard
deviations are 7 and 9. Which section has larger variability in wages?
Solution:
$$C.V._{(A)} = \frac{S}{\bar{X}} \times 100 = \frac{7}{450} \times 100 = 1.56$$
$$C.V._{(B)} = \frac{S}{\bar{X}} \times 100 = \frac{9}{350} \times 100 = 2.57$$
Because the C.V. for section A is smaller than the C.V. for section B, section B
has the larger variability in wages. Equivalently, section A is more homogeneous than section B.
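The comparison in this example can be sketched in code. The helper name `coefficient_of_variation` is an assumption introduced for illustration; the inputs are the means and standard deviations given for sections A and B.

```python
def coefficient_of_variation(std_dev, mean):
    """Standard deviation expressed as a percentage of the mean."""
    if mean == 0:
        raise ValueError("the coefficient of variation is undefined for a zero mean")
    return std_dev / mean * 100


cv_a = coefficient_of_variation(7, 450)   # section A
cv_b = coefficient_of_variation(9, 350)   # section B

# The section with the larger C.V. has the larger relative variability.
more_variable = "A" if cv_a > cv_b else "B"
print(f"C.V.(A) = {cv_a:.2f}, C.V.(B) = {cv_b:.2f}, more variable: {more_variable}")
```

The number of employees in each section (40 and 65) does not enter the calculation; only the ratio of standard deviation to mean matters.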
Example 2: The mean and standard deviation of the heights and
weights of 40 students are as below:
            Mean      Standard Deviation
Weights     68.34     3.02
Heights     172.55    26.33
Find the coefficient of variation of height and weight and compare the
results.
Solution:
$$C.V._{(\text{Weights})} = \frac{S}{\bar{X}} \times 100 = \frac{3.02}{68.34} \times 100 = 4.42$$
$$C.V._{(\text{Heights})} = \frac{S}{\bar{X}} \times 100 = \frac{26.33}{172.55} \times 100 = 15.26$$
So the weights (C.V. = 4.42) have less relative variation than the heights
(C.V. = 15.26).

Principlles of statistics

  • 1. 1 SECTION 1 Statistics: is the science of obtaining data, organizing, summarizing, and presenting, analyzing, interpreting and drawing conclusions based on the data to give the best decision. Statistics divided in to two distinct parts: 1- Descriptive Statistics: It is concerned only with the collection, organization, summarizing, analysis and presentation of an array of numerical qualitative or quantitative data. Descriptive statistics include the mean, median, mode, standard deviation, range, etc. 2- Inferential Statistics: it is consist of methods for drawing conclusion based on the data to give the best decision. Its divide in to two parts also: A- Estimation B- Testing Hypothesis Population: Is the complete collection of all elements to be studied. Finite (countable) Population: A population is called finite if it is possible to count its individuals. For example, the number of students in Shaqlawa technical institute or number of computers in a libratory. Infinite (uncountable) Population: A population is called infinite if it is impossible to count its individuals, for example the number of bacteria's in a garden, number of fishes in a sea. Census: is the collection of data from every elements of population. Sample: is a sub- collection of elements drawn from a population. Sampling: the process of selecting a subset of data from the population is called Sampling.
  • 2. 2 Sources of collecting the data: 1- Historical Sources 2- Field Sources Probability (Random) Samples are drawn from populations through several different sampling methods: 1- Simple Random Sampling Every member of the population (N) has an equal chance of being selected for your sample (n). This is arguably the best sampling method, as your samples almost guaranteed to be representative of your population. However, it is rarely ever used due to being too impractical. 2- Systematic Sampling In this method, every nth individual from the population (N) is placed in the sample (n). For example, if you add every 7th individual to walk out of a supermarket to your sample, you are performing systematic sampling. 3- Stratified Sampling A general problem with random sampling is that you could, by chance, miss out a particular group in the sample. However, if you form the population into groups, and sample from each group, you can make sure the sample is representative. In METHODS OF COLLECTING THE DATA SAMPLES PROBABILITY (RANDOM) NON PROBABLITY CENSUS
  • 3. 3 stratified sampling, the population is divided into groups called strata. A sample is then drawn from within these strata. Some examples of strata commonly used by the ABS are States, Age and Sex. Other strata may be religion, academic ability or marital status. 4- MULTI-STAGE SAMPLING Multi-stage sampling is like cluster sampling, but involves selecting a sample within each chosen cluster, rather than including all units in the cluster. Thus, multi-stage sampling involves selecting a sample in at least two stages. In the first stage, large groups or clusters are selected. These clusters are designed to contain more population units than are required for the final sample. In the second stage, population units are chosen from selected clusters to derive a final sample. If more than two stages are used, the process of choosing population units within clusters continues until the final sample is achieved. Variable: is a characteristic or property of the elements in the population. The name of variable is derived from the fact that any particular characteristic may vary among the elements in a population. Variables Quantitative variables Descrete variables (Number of students) Continuous variables (Hieght, Weight) Qualitative (descriptive) variables
  • 4. 4 Section 2 Frequency Distribution (Table): After a researcher might have gotten a raw data from any source, there is a need for the raw data (ungrouped) to be arranged and organized in a meaningful way in order to be able to describe and come up with a useful inference. The method that is being used for such organization and arrangement is called frequency distribution. Frequency means the number of times something happens. Frequency distribution simply means organizing of raw data in table from using classes and frequencies. 1- Frequency Distribution for Qualitative variables: Frequency Distribution for Qualitative variables lists all classes and the number of elements that belong to each of the classes. Example1: the following list gives the rank of a sample that consists of 25 clerks in Soran institute: Researcher, Assistant Researcher, Assistant Researcher, Lecturer, Assistant Researcher, Assistant lecturer, Assistant lecturer, Researcher, Lecturer, Researcher, Assistant Researcher, Researcher, Assistant Researcher, Assistant lecturer, Assistant Researcher, Lecturer, Assistant Researcher, Assistant lecturer, Assistant lecturer, Researcher, Lecturer, Assistant Researcher, Assistant Researcher, Assistant Researcher, Researcher. Create a frequency distribution for the above data. Solution: FrequencyClasses (rank) 4Lecturer 5Assistant lecturer 6Researcher 10Assistant Researcher 25Total
  • 5. 5 Relative Frequency of a Class: The relative frequency of a class is obtained by dividing the frequency of class by the sum of the all frequencies. Example 2: depending on the previous example, calculate the relative frequency. Solution: Relative FrequencyFrequencyClasses (rank) 4/25=0.164Lecturer 5/25=0.25Assistant lecturer 6/25=0.246Researcher 10/25=0.410Assistant Researcher 125Total 2- Frequency Distribution for Quantitative variables Total Range (T.R): is equal to highest value minus lowest value in the data set. Number of classes: the appropriate number of classes may be decided by Yules formula which is as follows: Number of classes= where n is the total number of observation. Class Width= T.R/ No. of classes Class Width (Length) is the difference between two consecutive lower class limit or two consecutive lower class boundaries. The class width can be found by the following formula: Frequency (F): is the number of values in a specific class of the distribution. 4 n2.5
  • 6. 6 A- Frequency Distribution for Discrete variables: The lower and upper limits of the frequency distribution of discrete variables are as below: frequency Class Upper limitLower limit f1Xs+W-1Xs f2Xs+2W-1Xs+W f3Xs+3W-1Xs+2W . . . . . . fmXs+M.W-1Xs+(M-1)W Where: Xs: the lowest value W: class width M: number of classes Example3: Construct the frequency distribution for the following data: 60 76 80 120 132 82 90 65 68 142 157 164 88 90 98 101 103 110 119 116 120 126 109 114 120 122 111 116 90 78 93 95 98 104 120 113 121 119 125 126 130 131 136 118 120 142 150 154 122 123 139 125 106 154 136 137 110 137 72 150 Total Range (T.R) = 164-60=104 Number of Classes (M) = 2.5(2.783) = 6.958 = 7 Length of Classes (L) = 104/7=14.86 = 15
  • 7. 7 Class Frequency Midpoint Relative Frequency 60 - 74 4 =(60+74)/2= 67 =4/60 = 0.067 75 – 89 5 =(75+89)/2= 82 =5/60 = 0.083 90 – 104 10 97 =10/60 = 0.167 105 – 119 12 112 =12/60 = 0.200 120 – 134 16 127 =16/60 = 0.267 135 – 149 7 142 =7/60 = 0.117 150 – 164 6 157 =6/60 = 0.100 Total 60 1 B- Frequency Distribution for continuous variables: The lower and upper limits of the frequency distribution of continuous variables are as below: frequency Class Upper limitLower limit f1Xs+WXs f2Xs+2WXs+W f3Xs+3WXs+2W . . . . . . fmXs+M.WXs+(M-1)W Example4: construct a frequency distribution for below data: 1.3 4.1 5.7 6.5 7.9 10.4 2 4.2 5.7 6.5 8.2 8.3 6.8 5.7 4.3 10.4 2.1 2.8 4.3 10.8 5.8 6.9 8.3 8.4 7 11.3 5.8 4.7 3.3 3.3 4.8 5.9 7 8.9 9.1 7.3 6 5.1 3.5 3.7 5.1 6.2 7.6 9.2 9.7 7.8 6.4 5.3 6.4 7.9
  • 8. 8 Cumulative Frequency Distribution A- Ascending Cumulative Frequency Distribution Ascending Cumulative Frequency Distribution is the total frequency of all values less than the upper class boundary of a given class interval. Example5: Construct an Ascending Cumulative Frequency Distribution depending on the example 3. Classes Frequency Upper Limit of Class Ascending Cumulative Frequency 60 - 74 4 74 Less than or equal to 74= 4 75 – 89 5 89 Less than or equal to 89= 9 90 – 104 10 104 Less than or equal to 104= 19 105 – 119 12 119 Less than or equal to 119= 31 120 – 134 16 134 Less than or equal to 134= 47 135 – 149 7 149 Less than or equal to 149= 54 150 – 164 6 164 Less than or equal to 164= 60 Total 60 B- Descending Cumulative Frequency Distribution Descending Cumulative Frequency Distribution is the total frequency of all values Greater than the lower class boundary of a given class interval. Example6: Construct a descending cumulative frequency distribution depending on the example 4. Classes Frequency Lower Limit of Class Descending Cumulative Freq. 0 - 2 1 0 Greater than or equal to 0= 50 2 - 4 7 2 Greater than or equal to 2= 49 4 - 6 15 4 Greater than or equal to 4= 42 6 - 8 15 6 Greater than or equal to 6= 27 8 - 10 8 8 Greater than or equal to 8= 12 10 - 12 4 10 Greater than or equal to 10= 4 Total 50
  • 9. 9 Charts The graphical presentation of statistical data is using statistical charts. There are several kinds of charts for representing set of data, such as: Bar- Charts A bar chart is a chart composed of bars whose heights are the frequencies of the different classes. (Qualitative Variables) Example7: Display the below data as a bar chart. Red, Green, Green, Green, Blue, Blue, Red, Blue, Green, Green, Red, Red, Blue, Green, Red, Red Solution: In the first step we will create a frequency table for this data: Color Frequency red 6 green 6 blue 4 Then we use this table for creating a bar chart 0 1 2 3 4 5 6 7 red green blue Frequency Color
  • 10. 11 Histogram A histogram is similar to bar charts, but it is used for representing the quantitative variable rather than qualitative variables. Example8: Draw a histogram for the following frequency distribution. Classes Frequency 60 - 74 4 75 – 89 5 90 – 104 10 105 – 119 12 120 – 134 16 135 – 149 7 150 – 164 6 Total 60 Solution: 0 2 4 6 8 10 12 14 16 18 60 - 74 75 – 89 90 – 104 105 – 119 120 – 134 135 – 149 150 – 164 Frequency Classes
  • 11. 11 Pie Chart A pie chart is a circle divided into sectors, where each sector represents a category (relative frequency of each class) of data that is proportional to the total amount of data collected. We can calculate the angle size of each class by the following rule: Angle size of class= relative of the class X 360o Example9: Draw a pie chart for the data in example 1. Angle SizeRelative FrequencyFrequencyClass 0.16*360=57.64/25=0.164Lecturer 0.2*360=725/25=0.25Assistant lecturer 0.24*360=86.46/25=0.246Researcher 0.4*360=14410/25=0.411Assistant Researcher 360125Total Lecturer 57.6o Assistant lecturer 72o Researcher 86.4 Assistant Researcher 1440
  • 12. 12 Frequency Polygon It is a chart that displays the data by using lines that connect points plotted for the frequencies at the midpoints of the classes. Example10: draw a frequency distribution for the frequency distribution in example3. Frequency Curve Frequency curve is like a frequency polygon, but there is one difference between them, instead of using lines to connect midpoints a smooth curve will be used. Example 11: draw a frequency curve for the data in example 4. 0 2 4 6 8 10 12 14 16 18 67 82 97 112 127 142 157 Frequency Midpoints 1 7 15 15 8 4 0 2 4 6 8 10 12 14 16 18 0 2 4 6 8 10 12 Frequency Midpoints 1 3 5 7 9 11
  • 13. 13 Cumulative Frequency Chart It is a chart that represents the cumulative frequencies of classes in frequency distribution. Example 12: Construct an ascending cumulative frequency chart for the data in example 4. Example13: Construct a descending cumulative frequency chart for the data in example 4. 0 10 20 30 40 50 60 1 2 3 4 5 6 Cumulativefrequency Upper Limit of classes 2 4 6 8 10 12 0 10 20 30 40 50 60 1 2 3 4 5 6 Cumulativefrequency Lower Limit of Classes 2 4 6 8 10 12
  • 14. 14 Exercise 1: complete the following frequency distributions if the widths of classes are equal. Class Midpoint Class Midpoint 3 8 6 18 Class Midpoint 14 26 Exercise2: the height of 35 students were noted and shown as follows: 170 180 175 165 160 155 180 190 185 170 174 178 165 169 186 179 161 171 159 168 177 164 191 140 173 181 177 173 166 162 168 184 168 158 155 Find the following: 1- Frequency distribution 2- Midpoints 3- Descending cumulative frequency 4- Relative frequency And draw: a) Histogram b) frequency polygon
  • 15. 15 SECTION 3 Notations In this section we will represent some useful notations before explaining the subjects that related to measures of central tendency and measures of dispersion (variation). 1- Summation Notation (  ) The symbol n i iX 1 , read as (the summation of X), where n is the number of observations and (i) is the subscript for the order of values. Let X is a variable represent 4 values: 2, 3, 5, and 10. Then the sum of variable X is represent as follow: 2010532 432 1 4 1 1     XXXXXX n i i ii Symbol Operation n n i i XXXX  21 1 Sum of observations 22 2 2 1 1 2 n n i i XXXX   Sum of Square of observations  2 21 2 1 n n i i XXXX         Square of Sum of observations Let X and Y are random variables and a is a constant then ana XaaX n i n i i n i i . 1 11                 n i i n i i n i ii n i i n i i YXYX anXaX 111 11 .  
  • 16. 16   nn n i ii n i i n i i n i ii YXYXYXYX YbXabYaX .... 2211 1 111                 n i i n i i XX 1 2 2 1 Example 1: If Xi represents the following 4, 3, 5 and 1. Find the following: a-  n i iX 1 b-  n i iX 1 2 c-  n i iX 1 2 d-    n i iX 1 3 Solution:     11213)4(333) 26)13.(222) 511534 ) 131534 ) 4 1 4 11 4 11 2222 2 4 2 3 2 2 2 1 4 1 2 1 2 4321 4 11               i i i i n i i i i n i i i i n i i i i n i i XXXd XXc XXXXXXb XXXXXXa
  • 17. 17 2- Pie Notation  )( The symbol  n i iX 1 is used to multiplication of all values of Xi’s, or: n n i i XXXX .. 21 1         n i i n n i i n i n XaaX aa 11 1 . Example 2: If Xi represents the following 4, 2, 5 and 3. Find the following: a-  n i iX 1 b-  n i iX 1 5 Solution: 1203*5*2*4 ...) 4321 4 11     XXXXXXa i i n i i b) 75000)120.(5 .55 4 11     n i i n n i i XX Exercise: If Xi: 5, 3, 4, 2 and Yi: 3, 1, 4, 2 then find the following: a-  4 1 2 i iX b-  4 1 3 i iY c- 2 4 1 . i i i YX d-    n i ii YX 1 e- 4 4 1 66 i f-  4 1i iX g-  n i iY 1 4 h- i n i i YX .2 1  j-  4 1i i i Y X k-   2.3 4 1  i i i YX
  • 18. 18 SECTION 4: MEASURES OF CENTRAL TENDENCY In the previous sections, we have studied how to collect raw data, its classification and tabulation in a useful form, which contributes in solving many problems of statistical concern. Yet, this is not sufficient, for in practical purposes, there is need for further condensation, particularly when we want to compare two or more different distributions. We may reduce the entire distribution to one number which represents the distribution. A single value which can be considered as typical or representative of a set of observations and around which the observations can be considered as Centered is called an ’Average’ (or average value) or a Center of location. Since such typical values tend to lie centrally within a set of observations when arranged according to magnitudes, averages are called measures of central tendency. So the measure of central tendency is a value at the center or middle of a data set. This value represents all data of the group. The fundamental measures of tendencies are: (1) Arithmetic Mean (2) Weighted Mean (3) Harmonic Mean (4) Quadratic Mean (5) Mode (6) Median However the most common measures of central tendencies or locations are: Arithmetic mean, median and mode.
  • 19. 19 1)Arithmetic Mean The arithmetic mean (generally called mean) is the sum of all observations (values of all items) together and divides this sum by the number of observations (or items). The symbol X (pronounced as X bar) represents the sample mean and  represents the population mean. Arithmetic mean for ungrouped data Suppose, we have (n) observations (or measures) X1, X2, X3... Xn then the Arithmetic mean is obviously: n XXXX n X X n n i i    3211 Where: Xi = the ith observation. n = the size of the data. The mean for a population consisting N observations is: N XXXX N X N N i i    3211  Example: Calculate the arithmetic mean of the given values: 98 96 95 98 100 92 96 69 Solution: 93 8 699692100989596981     n X X n i i
  • 20. 21 Arithmetic mean for grouped data: The arithmetic mean of grouped data is found by multiplying every midpoints (i.e. value of x) by its corresponding frequency (fi) then their total (sum) is found  ii xf . , and then dividing this sum by the  if .   i ii f xf X . The above formula will be sample data. Similar formulas are used for population data. Example: Determine the mean for the following set of data. Classes Frequency 8 - 2 10 3 12 5 14 4 16 1 Solution:   i ii f xf X . Classes Frequency (fi) Midpoint (xi) fi . xi 8- 2 9 18 10- 3 11 33 12- 5 13 65 14- 4 15 60 16- 1 17 17 Total 15 193 87.12 15 193 X
  • 21. 21 The Properties of the Arithmetic Mean: 1- The sum of the deviations, of all the values of x, from their mean, is zero. 0)( : )( 1 1 1 1 1              XnXnXX then XXn n X Xhavewe XnXXX n i i n i i n i i n i n i ii 2- If ),...,,( 21 kXXX represent the means for k groups based on ),...,,( 21 knnn observations respectively, the mean of the groups combined is:      k i i k i ii n Xn X 1 1 . 3- The sum of squares of the deviations from the mean is smaller than from any other value. (prove this property) Advantage (merits) of Arithmetic mean 1- It is easy to calculate and simple to understand. 2- It is very popular (most widely used). 3- It is based on all the observations; so that it becomes a good representative. Disadvantage (demerits) of Arithmetic mean 1- It is affected by outliers or extreme values. 2- It cannot be obtained if a single observation is missing or lost; 3- It cannot be calculated in case open-frequency distributions. 4- It cannot be computed for qualitative data.
  • 22. 22 2) Weighted Arithmetic Mean: One of the limitations of the arithmetic mean is that it gives equal importance to all the items. But there are cases where the relative importance of the different items is not the same. When this is so, we compute weighted arithmetic mean. The formula for computing weighted arithmetic mean in case of ungrouped data is: WWW XWXWXW W XW n nn n i i i n i i WX          21 2211 1 1 Where, Wi is the weight of ith observation. The formula for computing weighted arithmetic mean in case of grouped data is: nn nnn n i ii i n i i W fff ff f i X WWW xWXWXfW W XfW          2211 22211 1 1 1 Example: The marks of a student in the final examination of Statistics department are as follows: Subjects (Xi): 98 96 95 98 100 92 96 69 Units (Wi): 2 3 3 1 3 3 2 2 Calculate the weighted mean. Solution: 3158.93 19 1773 22331332 )2*69()2*96()3*92()3*100()1*98()3*95()3*96()2*98( 1 1          W n i i i n i i W X X W XW
  • 23. 23 Remark: If all the weights are equal, then the weighted mean is the same as the arithmetic mean. Exercise1: The average marks of three groups of students having 70, 50 and 30 students respectively are 50, 55 and 45. Find the average marks of all the 150 students, taken together. Exercise2: following frequency distribution showing the marks obtained by 50 students in statistics at Soran institute. Find the arithmetic mean. Classes Frequency (fi) 20 - 29 1 30 - 39 5 40 - 49 12 50 - 59 15 60 - 69 9 70 - 79 6 80 - 89 2 Exercise3: The mean of a certain number of observations is 40. If two items with values 50 and 64 are added to this data, the mean rises to 42. Find the number of items in the original data. Exercise4: If   n i iX 1 72)4( and   n i iX 1 3)7( , then find the number of observation (n).
  • 24. 24 3) Harmonic Mean Harmonic mean is one of the measures of central tendency, which are used less than other measures (mean, median and mode). The formula for computing weighted arithmetic mean in case of ungrouped data is:   n i i h X n X 1 1 And for grouped data is:     n i i i i h X f f X 1 Example: calculate the harmonic mean for the following data: Xi: 8 2 5 3 4 7 8 Solution:   n i i h X n X 1 1 : 1 iX 0.13 0.5 0.2 0.33 0.25 0.14 0.13 167.4 68.1 7 68.1 1  h i X X 4) Quadratic mean n X X n i i q   1 2 for Ungrouped data      n i i n i ii q f Xf X 1 1 2 for grouped data
  • 25. 25 5) MODE The mode (Mo) is the value that occurs most often in a data set. Mode for ungrouped data: The mode of the following data set: 5, 6, 7, 5, 5, 10, 4, 5, 4, 7, 5, 5 is the number 5 because it is repeated more than other numbers (6 times). Remark: When 2 numbers occur with the same greatest frequency, each one is mode and the data set is bimodal. When more than 2 numbers occur with the same greatest frequency, each is a mode and the data set is said to be multimodal. When no number is repeated, we say that there is no mode. Example: Find the mode of the following data set: 5, 7, 6, 7, 5, 7, 5, 10, 4, 4, 7, 5. Solution: Number 5 and 7 are both modes. The data set is bimodal. Mode for grouped data: Let (X1, X2, … Xn) represent the class marks of the class intervals with ( f1, f2, …, fn) represent the frequencies. The modal class is that class which has the highest frequency. The formula of obtaining the mode is as follows: k kkkk kk k W ffff ff LMo       )()( )( 11 1 Where: Lk: lower limit of modal class. fk: modal class frequency fk-1: frequency of previous class fk+1: frequency of next class Wk: Size of modal class interval (class width).
  • 26. 26 Example: Find the mode for the following frequency distribution: Solution: Modal class is 30 – 39 because it has a highest frequency (10). Lk=30, fk=10, fk-1=7, fk+1=8, Wk=10 k kkkk kk k W ffff ff LMo       )()( )( 11 1 3610 5 3 30 10 )810()710( )710( 30     Mo Remark1: If there are 2 or more modal classes; therefore, to find the model class we must use assembly method. Remark2: When we use assembly method, the formula of mode will be: k kkkk kk k W ffff ff LMo       11 1)( Remark3: If the widths of the classes are not equal, in this case adjusted frequency must be used instead of real frequency. Where adjusted frequency for each class is equal to i i W f . Class frequency 10 – 19 5 20 – 29 7 30 – 39 10 40 – 49 8 50 – 59 4 60 – 69 3 70 – 79 1
  • 27. 27 Example: Find the mode for the following frequency distribution: Solution: There are 2 modal classes, therefore, to find the model class we must use assembly method and it is as follows: From the previous table we can abstract the following table: Serial No. Of column Greatest frequency appears in the column Contributor Class 1 4 1, 2 2 8 1, 2 3 7 2, 3 4 11 1, 2, 3 5 9 2, 3, 4 Then the 2nd class is the modal class Class frequency 10 – 19 4 20 – 29 4 30 – 39 3 40 – 49 2 50 – 59 3 60 – 69 3 70 – 79 1 Class frequency 1st assembly 2nd assembly 3rd assembly 4th assembly 10 – 19 4 8 1120 – 29 4 7 930 – 39 3 5 40 – 49 2 5 850 – 59 3 6 760 – 69 3 4 70 – 79 1
  • 28. 28 Lk =20, fk =4, fk-1 =4, fk+1 =3, Wk =10 k kkkk kk k W ffff ff LMo       11 1)( 2010 3444 )44( 20    Mo Advantage of Mode 1- It is easy to calculate. 2- It is not affected by extreme values. 3- It can be used for qualitative data. 4- It can be located graphically (Histogram). 5- It can be calculated for distributions with open end classes. Disadvantage of Mode 1- It is not based upon all the observations. 2- It is not always possible to find a clearly defined mode (2 modes or 3 modes). 3- It is not capable of further mathematical treatment. Exercise: Find the mode for the following frequency distributions: Class frequency Class frequency 5 – 2 10 – 30 10 – 6 20 – 12 15 – 10 30 – 16 25 – 22 40 – 28 35 – 27 50 – 26 50 – 60 11 60 – 14
  • 29. 29 6) MEDIAN The Median (Me) is the value of the middle item in a data set and divides the dataset in to two equal parts, one part comprising all values greater and the other all values smaller than the median Median for ungrouped data: In the first step we will arrange the data in ascending (increasing) order. If number of observations (n) is odd, the median is the observation that has        2 1n order. If number of observations (n) is even, then the median is the average of observations that have order       2 n and       1 2 n . Example: Find the median of the following data set: 55, 62, 53, 70, 68, 65, 63, 79, and 80. Solution: Arrange the data increasingly: 53, 55, 62, 63, 65, 68, 70, 79, 80. Since n=9 is odd, then the order of median is        2 1n 5 2 19 2 1              n Then the 5th observation is the value of median or Me=65. Example: Find the median of the following data set: 20, 22, 19, 26, 30, 27, 28, 29, 18, 20, 23, 25. Solution: Arranging the data in increasing order 18, 19, 20, 20, 22, 23, 25, 26, 27, 28, 29, 30 2366 2 12 2 isvalueththe n            
  • 30. $$\left(\frac{n}{2}+1\right) = \left(\frac{12}{2}+1\right) = 7, \quad \text{the 7th value is } 25$$

    Then:

    $$Me = \frac{23 + 25}{2} = 24$$

    Median for grouped data:
    To find the median of a frequency distribution, follow these steps:
    Step1: Find the cumulative frequency (ascending or descending).
    Step2: Compute the median order, which is equal to $\frac{\sum f_i}{2}$.
    Step3: If $F_{k-1} \le \frac{\sum f_i}{2} \le F_k$, then the median class is the class whose order is k.
    Step4: Compute the value of the median:

    $$Me = L_k + \left(\frac{\frac{\sum f_i}{2} - F_{k-1}}{f_k}\right) \cdot W_k$$ for ascending cumulative frequency.

    $$Me = L_k + \left(\frac{F_k^{*} - \frac{\sum f_i}{2}}{f_k}\right) \cdot W_k$$ for descending cumulative frequency.

    Where:
    $L_k$: Lower limit of the median class.
    $f_k$: Frequency of the median class.
    $W_k$: Width of the median class.
    $\sum f_i$: Sum of the frequencies.
    $F_{k-1}$: Ascending cumulative frequency of the class preceding the median class.
    $F_k^{*}$: Descending cumulative frequency of the median class.
  • 31. Example: Find the median for the following frequency distribution:

    Classes            100 -   120 -   140 -   160 -   180 -   200 -   220 -
    No. of families    3       7       14      20      18      12      6

    Solution: In the first step we find the ascending cumulative frequency:

    Class    Frequency   Ascending cumulative frequency
    100 -    3           3
    120 -    7           10
    140 -    14          24
    160 -    20          44
    180 -    18          62
    200 -    12          74
    220 -    6           80
    Total    80

    Then we find the median order, which is equal to:

    $$\frac{\sum f_i}{2} = \frac{80}{2} = 40$$

    Compare the median order with the ascending cumulative frequency:

    $$F_{k-1} \le \frac{\sum f_i}{2} \le F_k \;\Rightarrow\; 24 \le 40 \le 44$$

    Then the median class is the 4th class, so $L_k = 160$, $W_k = 20$, $f_k = 20$ and $F_{k-1} = F_3 = 24$.

    $$Me = L_4 + \left(\frac{\frac{\sum f_i}{2} - F_3}{f_4}\right) \cdot W_4$$

    $$Me = 160 + \left(\frac{\frac{80}{2} - 24}{20}\right) \cdot 20 = 176$$
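The median rules for ungrouped and grouped data can be sketched in Python as follows. The function names are illustrative assumptions, and the grouped versions assume all classes have the same width, as in the example.

```python
def median_ungrouped(values):
    """Median by position: (n+1)/2 for odd n, average of n/2 and n/2 + 1 for even n."""
    data = sorted(values)
    n = len(data)
    if n % 2 == 1:
        return data[(n + 1) // 2 - 1]          # positions are 1-based in the text
    return (data[n // 2 - 1] + data[n // 2]) / 2

def median_grouped_ascending(lower_limits, freqs, width):
    """Median from the ascending cumulative frequency (equal class widths assumed)."""
    half = sum(freqs) / 2
    cumulative = 0                              # F_{k-1} for the current class
    for lower, f in zip(lower_limits, freqs):
        if cumulative + f >= half:              # F_{k-1} <= half <= F_k
            return lower + (half - cumulative) / f * width
        cumulative += f
    raise ValueError("empty distribution")

def median_grouped_descending(lower_limits, freqs, width):
    """Median from the descending cumulative frequency (equal class widths assumed)."""
    half = sum(freqs) / 2
    remaining = sum(freqs)                      # F*_k for the current class
    for lower, f in zip(lower_limits, freqs):
        if remaining - f <= half:               # F*_{k+1} <= half <= F*_k
            return lower + (remaining - half) / f * width
        remaining -= f
    raise ValueError("empty distribution")

limits = [100, 120, 140, 160, 180, 200, 220]
families = [3, 7, 14, 20, 18, 12, 6]
print(median_ungrouped([55, 62, 53, 70, 68, 65, 63, 79, 80]))
print(median_grouped_ascending(limits, families, 20))
print(median_grouped_descending(limits, families, 20))
```

Both grouped functions locate the same median class, so they return the same value; the descending form only measures the position from the other end of the distribution.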
  • 32. Merits of Median
    1. It is easy to calculate and understand.
    2. Unlike the arithmetic mean, it is not affected by extreme values.
    3. It can be found by mere inspection.
    4. It can be used for qualitative studies.
    5. It can be calculated for distributions with open-end classes.
    6. It can be obtained graphically.

    Demerits of Median
    1. It is not capable of further algebraic treatment.
    2. It is not based on all observations.

    Exercise: Find the median for the following frequency distribution by using ascending and descending cumulative frequency.

    Class        Frequency
    18 -         10
    28 -         15
    36 -         18
    50 -         22
    70 -         20
    100 -        18
    130 - 150    13
    Total

    The relationship between Arithmetic Mean, Median and Mode
    If the frequency distribution is symmetric, the three measures coincide. If it is moderately skewed, the following empirical relationship between these measures holds approximately:

    $$\bar{X} - M_e = \frac{\bar{X} - M_o}{3}$$
  • 33. SECTION 5) Measures of Dispersion (Variation)
    Measures that describe the spread of a data set are called measures of dispersion. Their main objective is to show how homogeneous the values of a data set are, or to compare the spread of two or more data sets.

    1- Range
    The simplest measure of absolute variation is the range, which is calculated by subtracting the smallest value from the largest value of a data set.
    R = Largest value – Smallest value
    Example: Find the range for the following data: 2, 5, 3, 8, 7, 10, 9, 12, 15.
    Solution: R = Largest value – Smallest value = 15 – 2 = 13
    Remark: In the case of grouped data we calculate the range by subtracting the lower limit of the first class from the upper limit of the last class.

    2- Mean Deviation
    It is the sum of the absolute deviations of the observations from a point (A) divided by the number of observations.

    $$M.D = \frac{\sum_{i=1}^{n} |X_i - A|}{n}$$ for ungrouped data

    $$M.D = \frac{\sum_{i=1}^{n} f_i |X_i - A|}{n}$$ for grouped data

    Where A may be the arithmetic mean ($\bar{X}$), the median ($M_e$) or the mode ($M_o$).

    Example: Find the value of the mean deviation for the following data by using the mean, median and mode.
    Xi: 2, 3, 4, 5, 5, 6, 7, 10, 13, 14, 19
    Solution:
  • 34. First we find the values of ($\bar{X}$), ($M_e$) and ($M_o$):
    $\bar{X} = 8$, $M_e = 6$, $M_o = 5$

    $X_i$    $|X_i - \bar{X}|$    $|X_i - M_o|$    $|X_i - M_e|$
    2        6                    3                4
    3        5                    2                3
    4        4                    1                2
    5        3                    0                1
    5        3                    0                1
    6        2                    1                0
    7        1                    2                1
    10       2                    5                4
    13       5                    8                7
    14       6                    9                8
    19       11                   14               13
    Total    48                   45               44

    $$M.D(\bar{X}) = \frac{\sum_{i=1}^{n} |X_i - \bar{X}|}{n} = \frac{48}{11} = 4.364$$

    $$M.D(M_e) = \frac{\sum_{i=1}^{n} |X_i - M_e|}{n} = \frac{44}{11} = 4$$

    $$M.D(M_o) = \frac{\sum_{i=1}^{n} |X_i - M_o|}{n} = \frac{45}{11} = 4.0909$$
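The mean deviation about any point A can be checked with a short Python sketch. The helper name `mean_deviation` is an assumption made for illustration; `mean`, `median` and `mode` come from Python's standard `statistics` module.

```python
from statistics import mean, median, mode

def mean_deviation(values, about):
    """Average absolute deviation of the values from the point `about` (A)."""
    return sum(abs(x - about) for x in values) / len(values)

data = [2, 3, 4, 5, 5, 6, 7, 10, 13, 14, 19]

# Mean deviation about each of the three measures of central tendency.
for name, centre in [("mean", mean(data)),
                     ("median", median(data)),
                     ("mode", mode(data))]:
    print(name, centre, round(mean_deviation(data, centre), 4))
```

Comparing the three printed values shows which centre gives the smallest mean deviation for this data set.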
  • 35. 3- Variance
    It is one of the most important measures of absolute variation. The variance is calculated by taking the average of the squared distance (deviation) of each observation from the mean of the data set.
    The formula for the population variance ($\sigma^2$) for raw data is:

    $$\sigma^2 = \frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}$$

    Where:
    X: individual value
    µ: population mean
    N: population size (number of observations).

    The formula for the sample variance ($S^2$) for raw data is as follows:

    $$S^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}$$

    On the other hand, the formula for the sample variance for grouped data is:

    $$S^2 = \frac{\sum_{i=1}^{n} f_i (X_i - \bar{X})^2}{n - 1}$$ where $n = \sum f_i$

    Example: Find the variance for the following data set: 56, 68, 72, 63, 65, 68, 71, 69, 62, 56.
    Solution:

    $$S^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}$$

    $$\bar{X} = \frac{\sum_{i=1}^{10} X_i}{10} = \frac{650}{10} = 65$$
  • 36.
    $X_i$    $(X_i - \bar{X})$    $(X_i - \bar{X})^2$
    56       -9                   81
    68       3                    9
    72       7                    49
    63       -2                   4
    65       0                    0
    68       3                    9
    71       6                    36
    69       4                    16
    62       -3                   9
    56       -9                   81
    Total                         294

    then

    $$S^2 = \frac{294}{10 - 1} = 32.667$$

    Properties of variance:
    1) $S^2 \ge 0$
    2) If $Y_i = aX_i$, where a is a constant, then $S_Y^2 = a^2 S_X^2$. (Prove that)
    3) If $Y_i = X_i \pm b$, where b is a constant, then $S_Y^2 = S_X^2$. (Prove that)
    4) If X and Y are independent variables and $Z_i = X_i \pm Y_i$, then the variance of Z is:

    $$S_Z^2 = S_X^2 + S_Y^2$$

    5) If $(S_1^2, S_2^2, \dots, S_k^2)$ represent the variances of k groups based on $(n_1, n_2, \dots, n_k)$ observations respectively, then the pooled variance of the groups is as follows:

    $$S_p^2 = \frac{\sum_{i=1}^{k} (n_i - 1) S_i^2}{\sum_{i=1}^{k} (n_i - 1)}$$ where $n_i < 30$

    $$S_p^2 = \frac{\sum_{i=1}^{k} n_i S_i^2}{\sum_{i=1}^{k} n_i}$$ where $n_i \ge 30$
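The sample variance of the example, together with a numerical check of properties 2 and 3, can be sketched in Python. The function name `sample_variance` and the constants `a = 3` and `b = 10` are arbitrary choices for illustration; a numerical check is not a proof of the properties.

```python
from statistics import mean

def sample_variance(values):
    """Sample variance: sum of squared deviations from the mean divided by n - 1."""
    m = mean(values)
    return sum((x - m) ** 2 for x in values) / (len(values) - 1)

data = [56, 68, 72, 63, 65, 68, 71, 69, 62, 56]
s2 = sample_variance(data)
print(round(s2, 3))

# Property 2: multiplying every value by a multiplies the variance by a squared.
a = 3
print(sample_variance([a * x for x in data]), a ** 2 * s2)

# Property 3: adding a constant b to every value leaves the variance unchanged.
b = 10
print(sample_variance([x + b for x in data]), s2)
```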
  • 37. 4- Standard deviation (S)
    Standard deviation is the most important and most widely used measure of absolute variation. Standard deviation is the square root of the variance.

    $$S = \sqrt{S^2} = \sqrt{\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}}$$

    Example: Find the standard deviation of the following frequency distribution.
    Solution:

    Class    $f_i$   $X_i$   $f_i \cdot X_i$   $(X_i - \bar{X})$   $(X_i - \bar{X})^2$   $f_i \cdot (X_i - \bar{X})^2$
    100 -    3       110     330               -65.75              4323.063              12969.19
    120 -    7       130     910               -45.75              2093.063              14651.44
    140 -    14      150     2100              -25.75              663.0625              9282.875
    160 -    20      170     3400              -5.75               33.0625               661.25
    180 -    18      190     3420              14.25               203.0625              3655.125
    200 -    12      210     2520              34.25               1173.063              14076.75
    220 -    6       230     1380              54.25               2943.063              17658.38
    Total    80              14060                                                       72955

    $$\bar{X} = \frac{\sum_{i=1}^{n} f_i \cdot X_i}{\sum_{i=1}^{n} f_i} = \frac{14060}{80} = 175.75$$

    $$S = \sqrt{\frac{\sum_{i=1}^{n} f_i \cdot (X_i - \bar{X})^2}{n}} = \sqrt{\frac{72955}{80}} = 30.198$$

    Note that this calculation divides by $n = \sum f_i = 80$. Dividing by $n - 1 = 79$, as in the sample variance formula for grouped data, gives $S \approx 30.389$.
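The standard deviation of a frequency distribution can be sketched in Python from the class midpoints. The function name `grouped_std` and the `ddof` parameter are assumptions for illustration: `ddof=0` divides by n, as the worked example does, and `ddof=1` divides by n - 1.

```python
from math import sqrt

def grouped_std(midpoints, freqs, ddof=0):
    """Standard deviation of grouped data using class midpoints.

    ddof=0 divides the weighted sum of squares by n (as in the example);
    ddof=1 divides by n - 1.
    """
    n = sum(freqs)
    xbar = sum(f * x for x, f in zip(midpoints, freqs)) / n
    sum_sq = sum(f * (x - xbar) ** 2 for x, f in zip(midpoints, freqs))
    return sqrt(sum_sq / (n - ddof))

midpoints = [110, 130, 150, 170, 190, 210, 230]
freqs = [3, 7, 14, 20, 18, 12, 6]
print(round(grouped_std(midpoints, freqs), 3))           # divisor n
print(round(grouped_std(midpoints, freqs, ddof=1), 3))   # divisor n - 1
```

With 80 observations the two divisors give nearly the same result, which is why the choice matters mostly for small samples.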
  • 38. Coefficient of Variation
    A disadvantage of the standard deviation as a comparative measure of variation is that it depends on the units of measurement. This makes it difficult to use the standard deviation to compare measurements from different populations. For this reason, statisticians have defined the coefficient of variation, which expresses the standard deviation as a percentage of the sample or population mean.
    If $\bar{X}$ and S represent the sample mean and the sample standard deviation, then the coefficient of variation (C.V.) is defined to be:

    $$C.V. = \frac{S}{\bar{X}} \times 100$$

    If μ and σ represent the population mean and standard deviation, then the coefficient of variation is defined to be:

    $$C.V. = \frac{\sigma}{\mu} \times 100$$

    Notice that the numerator and denominator in the definition of C.V. have the same units, so C.V. itself has no units of measurement. This gives us the advantage of being able to directly compare the variability of two different populations using the coefficient of variation.

    Example1: A company has two sections (A and B) with 40 and 65 employees respectively. Their average weekly wages are $450 and $350. The standard deviations are 7 and 9. Which section has larger variability in wages?
    Solution:

    $$C.V_{(A)} = \frac{S}{\bar{X}} \times 100 = \frac{7}{450} \times 100 = 1.56$$

    $$C.V_{(B)} = \frac{S}{\bar{X}} \times 100 = \frac{9}{350} \times 100 = 2.57$$

    Because the C.V. for section A is smaller than the C.V. for section B, section B has larger variability. So section A is more homogeneous than section B.
  • 39. Example2: The mean and standard deviation of the heights and weights of 40 students are as below:

               Mean      Standard Deviation
    Weights    68.34     3.02
    Heights    172.55    26.33

    Find the coefficient of variation of height and weight and compare the results.
    Solution:

    $$C.V_{(Weights)} = \frac{S}{\bar{X}} \times 100 = \frac{3.02}{68.34} \times 100 = 4.42$$

    $$C.V_{(Heights)} = \frac{S}{\bar{X}} \times 100 = \frac{26.33}{172.55} \times 100 = 15.26$$

    So, the Weights (with C.V. = 4.42) have less variation than the Heights (with C.V. = 15.26).
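Both coefficient of variation examples can be reproduced with a small Python sketch. The function name `coefficient_of_variation` and the dictionary layout are illustrative assumptions.

```python
def coefficient_of_variation(std, mean):
    """Standard deviation expressed as a percentage of the mean."""
    if mean == 0:
        raise ValueError("coefficient of variation is undefined for a zero mean")
    return std / mean * 100

# (standard deviation, mean) pairs taken from the two examples.
cases = {
    "Section A wages": (7, 450),
    "Section B wages": (9, 350),
    "Weights": (3.02, 68.34),
    "Heights": (26.33, 172.55),
}
for label, (std, mean) in cases.items():
    print(label, round(coefficient_of_variation(std, mean), 2))
```

Because the result has no units, the wage figures and the height and weight figures can be compared on the same scale.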