Statistics and probability Lecture Notes 5 (Chapters 1–8).pdf

Addis Ababa Science and Technology University
School of Interdisciplinary program directorate
Department of Statistics
Probability and statistics
By Ashebir Feyisa. (BSc, MSc in Biostatistics)
Email: asheber.feyisa@gmail.com
2015/2016
Ashebir Feyisa 2/22/2023

Introduction to Statistics
Objectives:
At the end of this session, students should be able to:
 understand statistics and basic terminologies
 understand scales of measurement in statistics
 understand the basic methods of data collection

Definition of Statistics
2/22/2023
Ashebir Feyisa
 The word statistics has several meanings.
 We can define statistics either in plural or singular
sense.
 In plural sense: statistics is defined as the collection of
numerical facts or figures (or the raw data themselves).
 In this sense the word 'statistics' is usually understood by
a layman.

2/22/2023
Ashebir Feyisa
 Eg. 1. Vital statistics (numerical data on marriage,
births, deaths, etc).
 2. The average mark of statistics course for
students is 70% would be considered as a statistics
whereas Abebe has got 90% in statistics course is
not statistics.
 Remark: statistics are aggregate of facts. Single
and isolated figures are not statistics as they cannot
be compared and are unrelated.

Definition of Statistics
2/22/2023
Ashebir Feyisa
 In its singular sense:- Statistics is the science
that deals with the methods of collecting,
organizing, presenting, analyzing and
interpreting statistical data.

Classification of Statistics
2/22/2023
Ashebir Feyisa
 Statistics may be divided into two main branches:
I. Descriptive Statistics
II. Inferential Statistics

Classification of Statistics cont…
2/22/2023
Ashebir Feyisa
Descriptive statistics:
 Includes statistical methods involving the collection,
presentation, and characterization of a set of data
in order to describe the various features of the
data.
 Methods of descriptive statistics include graphic
methods (bar chart, pie chart, e t c) and numeric
measures (mean, median, variance e t c).
 Descriptive statistics do not allow us to make
conclusions beyond the data we have analyzed.

2/22/2023
Ashebir Feyisa
 Meaningful and pertinent information cannot
be realized from raw data unless summarized
by the tools of descriptive statistics.
 Descriptive statistics, therefore, allow us to
present the data in a more meaningful way
which allows interpretation of the data easily.

2/22/2023
Ashebir Feyisa
Inferential statistics:
 Includes statistical methods which facilitate estimation
the characteristics of a population or making decisions
concerning a population on the basis of sample results.
 In this regard, methods like estimation and hypothesis
testing are examples of inferential statistics.

2/22/2023
Ashebir Feyisa
 For example, a biologist collected blood samples of 10 students
from biology department to study blood types. Accordingly, the
following data is obtained:
 O A O AB A A O O B O
 Summary measures, for example, the proportion of students with
blood type O in the sample is 50% is an example of descriptive
statistics. We can also describe the data using bar or pie charts.
 However, if he/she wants to get information on the proportion of
students with blood type O in the entire class, he/she may use the
sample proportion (50%) as an estimate of the corresponding value
of the entire class. This is an example of inferential statistics.

Stages in statistical investigation
2/22/2023
Ashebir Feyisa
 A statistical study might involve the following stages:
collection of data, organizing and presenting the
collected data, analyzing and interpreting the result.
 Stage 1: Data collection: this stage involves acquiring
data related with the problem at hand.
 Stage 2: Organizing: this stage involves the
classification or sorting the collected data based on
some characteristics or attributes such as age, sex,
marital status e t c.
 Stage 3: presenting data: Further we may use tables,
graphs, charts so on to present the data.

Stages in statistical investigation
2/22/2023
Ashebir Feyisa
 Stage 4: Data analysis: a thorough scrutiny or analysis of the data is
necessary in order to reach conclusions or provide answers to a
problem. The analysis might require simple or sophisticated statistical
tools depending on the type of answers that may have to be
provided.
 Stage 5: Interpretation of the result: logically a statistical analysis has
to be followed by conclusions in order to be able to make a decision.
The technical terminology used to describe this last process of a
statistical study is referred to as interpretation.

Definition of some terms
2/22/2023
Ashebir Feyisa
 A population: Consists of all elements, individuals, items or
objects whose characteristics are being studied. The population
that is being studied is called target population.
 Sample: A portion of the population selected for study.
 Sample survey: The technique of collecting information from a
portion of the population.
 Census survey: A survey that includes every member of the
population.
 Variable: is a characteristic under study that assumes different
values for different element.
 Quantitative variable: A variable that can be measured
numerically. The data collected on quantitative variable are
called quantitative data. Examples include weight, height,
number of students in a class, number of car accidents, e t c.

Definition of some terms cont…
2/22/2023
Ashebir Feyisa
 Qualitative variable: A variable that cannot assume a numerical
value but can be classified into two or more non numerical
categories. The data collected on such a variable are called
qualitative or categorical data. Examples include sex, blood type,
marital status, religion e t c.
 Discrete variable: a variable whose values are countable.
Examples include number patients in a hospital, number of white
blood cells in a droplet of blood sample, number of rodents per
plot of farmland e t c.
 Continuous variable: a variable that can assume any numerical
value over a certain interval or intervals. Examples include
weight of new born babies, height of seedlings, temperature
measurements e t c.

Definition of some terms cont.….
2/22/2023
Ashebir Feyisa
 Parameter: A statistical measure obtained from a
population data. Examples include population
mean, proportion, variance and so on.
 Statistic: A statistical measure obtained from a
sample data. Examples include sample mean,
proportion, variance and so on.
 Unit of analysis: The type of thing being measured
in the data, such as persons, families, households,
states, nations, etc.

Limitation of statistics
2/22/2023
Ashebir Feyisa
 Statistics deals with only those subjects of inquiry
which are capable of being quantitatively
measured and numerically expressed.
 Statistics deals only with aggregates of facts and
no importance is attached to individual items
 Statistical data is only approximately and not
mathematically correct
 Statistics is liable to be misused. Hence expertise in
the subject is very essential. Besides, honesty is very
important in the use of statistics.

Scales of measurement
2/22/2023
Ashebir Feyisa
 Formally, we distinguish among four levels of
measurement scales.

Scales of measurements cont…
2/22/2023
Ashebir Feyisa
Nominal scale:
❖ It is the simplest measurement scale.
❖ There is no natural ordering of the levels or values of
the scale in nominal scale.
❖ For example, sex of an individual may be male or
female. There is no natural ordering of the two sexes.
Others examples include religion, blood type, eye
colour, marital status e t c.
❖ The values of nominal scale can be coded using
numerical values;
❖ However, we cannot perform any mathematical
operations on the numbers used to code.

Scales of measurements cont..
2/22/2023
Ashebir Feyisa
Ordinal scale:
 This measurement scale is similar to the nominal scale but the
levels or categories can be ranked or order.
 That is, we can compare levels or categories of the scale.
 Therefore, this scale of measurement gives better information
on the quantities being measured as compared to nominal
scale. For example, living standard of a family can be poor,
medium or higher.
 These categories can be ordered as poor is less than medium
and medium is less than higher class.
 However, the distance or magnitude between the levels, say
between poor and medium, is not clearly known.

2/22/2023
Ashebir Feyisa
Interval scale:
 This measurement scale shares the ordering or ranking and
labeling properties of ordinal scale of measurement. Besides,
the distance or magnitude between two values is clearly known
(meaningful).
 However, it lacks a true zero point (i.e., zero point is not
meaningful). For example, temperature in degree centigrade
or Fahrenheit of an object. If the temperature of an object is
zero degree centigrade, it doesn’t mean that the object lacks
heat. Hence zero is arbitrary point in the scale. It doesn’t make
sense to say that 80° F is twice as hot as 40° F.
 We can do subtraction and addition on interval level data but
division and multiplication are impossible.

2/22/2023
Ashebir Feyisa
Ratio scale:
 It is the highest level of measurement scale.
 It shares the ordering, labeling and meaningful distance
properties of interval scale.
 In addition, it has a true or meaningful zero point. The
existence of a true zero makes the ratio of two measures
meaningful. example includes, weight, height e t c.
 We can do subtraction, addition, multiplication and
division on ratio level data.

2/22/2023
Ashebir Feyisa
 The more precise variable is ratio variable and the
least precise is the nominal variable. Ratio and
interval level data are classified under quantitative
variable and, nominal and ordinal level data are
classified under qualitative variable.

Addis Ababa Science and Technology University
School of Interdisciplinary program directorate
Department of Statistics
Probability and statistics for engineers
By Mulugeta G. (BSc, MPH)
Email: mullergaro@gmail.com
2015/2016

2/22/2023
Ashebir Feyisa
After completing this unit you should be able to:
 organize data using frequency distribution.
 present data using suitable graphs or diagrams.

Methods of data collection
2/22/2023
Ashebir Feyisa
❑ Depending on the source, data can be classified in to two:
1. Primary data &
2. Secondary data
 Primary data refers to the statistical data which the
investigator originates for the purpose of inquiry.
 Secondary data refers to data which is not originated by
the investigator himself, but which he/she obtains from
someone else records. Secondary data can be obtained
from published or unpublished documents: reports,
journals, magazines, articles e t c.

Methods of data collection cont…
2/22/2023
Ashebir Feyisa
 Primary methods of data collection: It includes
data collection using observation, personal interview,
self administered questionnaire, mailed questionnaire
etc.

Classification and tabulation of data
2/22/2023
Ashebir Feyisa
 The uses of classifying and tabulating data
are:
 to display the points of similarity and dissimilarity;
 to save mental strain by systematic condensation and
suppression of irrelevant detail;
 to enable one to form a mental picture of objects of
perception; and
 to prepare the ground for comparison and inference.

2/22/2023
Ashebir Feyisa
 Types of classification
 Geographical- in terms of cities, districts, countries etc.
 Chronological - on the basis of time
 Qualitative - according to some qualitative
characteristics.
 Quantitative – in terms of magnitude.

Tabulation
2/22/2023
Ashebir Feyisa
 Tabulation: tables may be classified according to
the number of characteristics used for tabulation.
 Simple or one way table: it uses only one
characteristic or variable for classification.
 Example 2.1: Students who took introduction to
statistics in 2014 G.C.by gender.
Gender Number
Male 2000
Female 700

Tabulation cont…
2/22/2023
Ashebir Feyisa
 Two-way tables: it uses two variables for
classification.
 Example 2.2: Students who took introduction to
statistics in 2007 E.C.by age and gender.
Age Gender
Number of male Number of female
19 and below 200 180
20-25 1415 385
26 and above 385 135

Frequency distributions
2/22/2023
Ashebir Feyisa
Frequency distribution is the easiest method
of organizing data, which converts raw data
into a meaningful pattern for statistical
analysis.

The main uses of a frequency distribution are:
2/22/2023
Ashebir Feyisa
 to organize data in a meaningful way.
 to enable one to determine the nature or shape of the
distribution; how the observations cluster around a
central value; and how the values spread around the
center of the data.
 to facilitate computational procedures for measures of
average and spread.
 to enable one to draw charts and graphs for the
presentation of data.
 to enable one to make comparisons between data sets.

Terminologies
2/22/2023
Ashebir Feyisa
 Frequency distribution: a grouping of data into
categories showing the number of observations in each
mutually exclusive category.
 Array: data put in an ascending or descending order of
magnitude.
 Grouped data: data presented in the form of a frequency
distribution.
 Frequency: the number of observations corresponding to a
fixed value or to a class of values.
 Relative frequency: the number obtained when the
frequency of a class is divided by total number of
observations.

Components of a frequency distribution
2/22/2023
Ashebir Feyisa
 Class limits: the values of a variable which typically
serve to identify the classes of a frequency distribution.
 Class boundaries: the precise points which separate
various classes rather than the values included in any one
of the classes.
 Class mark: the point which divides the class into two
equal parts. This is also known as class mid-point. This
can be determined by dividing the sum of the two limits or
the sum of the two boundaries by 2.
 Class width: the length of a class

2/22/2023
Ashebir Feyisa
 Example 2.3: The following data are the weights in kg of
40 individuals participated in a diet program for weight
loss:
 70 64 99 55 64 89 87 65 62 3867 70 60 69 78 39 75 56 71 51
99 68 95 86 57 53 47 50 55 8180 98 51 36 63 66 85 79 83 70
 By grouping data into classes we can make the data much
easier to read and understand. Considering 10 as a class
width. The smallest weight is 36 kg, thus the first class of
weights is 31 kg.

2/22/2023
Ashebir Feyisa
Class Class boundary Count (Frequency)
31 – 40 30.5-40.5 3
41 – 50 40.5-50.5 2
51 – 60 50.5-60.5 8
61 – 70 60.5-70.5 12
71 – 80 70.5-80.5 5
81 – 90 80.5-90.5 6
91 – 100 90.5-100.5 4
Total 40

Steps of constructing frequency
distribution
2/22/2023
Ashebir Feyisa
1) Find the highest and the smallest value,
2) Compute the range; R = H – L,
3) Determine the number of classes using sturgges
formula
K= 1 + 3.322Log n; n= Total frequency
4) Find the class width (W) by dividing the range by
the number of classes and round to the nearest
integer. W = R/K

2/22/2023
Ashebir Feyisa
5) Identify the unit of measure usually as 1, 0.1,
0.01,…..
6) Pick a minimum value as starting point. Your
starting point is lower limit of the first class, then
continue to add the class width to get the rest
lower class limits.
7) Find the upper class limits UCLi = LCLi +w-U. then
continue to add width to get the rest upper class
limit
8) Finally find the class frequencies.

2/22/2023
Ashebir Feyisa
 Example 2.4: The following data are on the number of
minutes to travel from home to work for a group of
automobile workers:
 28 25 48 37 41 19 32 26 16 23 23 29
36 31 26 21 32 25 31 43 35 42 38 33 28.
Construct a frequency distribution for this data.
Solution:
 Range = 48 – 16 =32
 K=1+3.322log 25 =5.64≈6
 W=32/6=5.33 rounding up to the nearest integer i.e
W=6.

2/22/2023
Ashebir Feyisa
 Let the lower limit of the first class be 16 then the
frequency distribution is as follows:
Class limit Class boundaries Tally Frequency
16-21 15.5-21.5 3
22-27 21.5-27.5 6
28-33 27.5-33.5 8
34-39 33.5-39.5 4
40-45 39.5-45.5 3
46-51 45.5-51.5 1
Total 25

Types of frequency distributions
2/22/2023
Ashebir Feyisa
 Based on the type of frequency assigned to the classes
we have three types of frequency distributions:
 Absolute frequency distribution
 Relative frequency distribution
 Cumulative frequency distribution
❖ The frequency distributions that we have seen in the
previous examples are absolute frequency distributions
because the frequencies assigned are absolute frequencies.

Relative frequency distribution
2/22/2023
Ashebir Feyisa
 Definition 2.1: A relative frequency distribution is a
distribution which specifies the frequency of a class
relative to the total frequency.
 By dividing the absolute frequency to total frequency in
example 2.4 we can get relative frequency distribution.
Time (in minute) Relative frequency
16-21 0.12
22-27 0.24
28-33 0.32
34-39 0.16
40-45 0.12
46-51 0.04
Total 1

Cumulative frequency distribution
2/22/2023
Ashebir Feyisa
 Definition 2.2: Cumulative frequency refers to the
number of observations that are below/above a
specified value.
 Note: Class boundaries are mostly used to obtain
cumulative frequencies. Based on whether the
observations are bounded from above or from
below, we can have a cumulative less than or a
cumulative more than frequency distributions,
respectively.

2/22/2023
Ashebir Feyisa
 Example 2.6: Convert the absolute frequency distribution in
example 2.4 into:
 a cumulative less than frequency distribution.
 a cumulative more than frequency distribution.
Table: Less than cumulative frequency distribution of times
Time (in minute) Less than cumulative frequency
15.5- 21.5 3
21.5-27.5 9
27.5-33.5 17
33.5-39.5 21
39.5-45.5 24
45.5-51.5 25

More than cumulative frequency distribution
2/22/2023
Ashebir Feyisa
 Table: More than cumulative frequency distribution
Time (in minute) More than cumulative
frequency
15.5-21.5 25
21.5-27.5 22
27.5-33.5 16
33.5-39.5 8
39.5-45.5 4
45.5-51.5 1

Ungrouped frequency distributions
(Single-value grouping)
2/22/2023
Ashebir Feyisa
 Example 2.7: A demographer is interested in the
number of children a family may have. He took a
random sample of 30 families. The following data is
the number of children in a sample of 30 families.
 4 2 4 3 2 8 3 4 4 2 2 8 5 3 4
4 5 4 3 5 2 7 3 3 6 7 3 8 4 5
 To group these data, we will use classes based on
the single numerical value.

Ungrouped frequency distributions
2/22/2023
Ashebir Feyisa
 Table: Distribution of the number of children.
Number of Children Frequency Relative frequency
2 5 .17
3 7 .23
4 8 .27
5 4 .13
6 1 .03
7 2 .07
8 3 .1
Total 30 1

Categorical frequency distributions
2/22/2023
Ashebir Feyisa
 Note: Up to now we have seen frequency
distributions for quantitative data; we can have also
frequency distributions for qualitative (categorical)
data.
 The categorical frequency distribution is used for data
which can be placed in specific categories such as
nominal or ordinal level data.
 For example, data on political affiliation, religious
affiliation, blood type, marital status, or major field of
study would use categorical frequency distributions

Categorical frequency distributions cont...
2/22/2023
Ashebir Feyisa
 Example 2.8: The following data are on the
political party affiliations of sample of 40
engineering students. D, R, and O stand for
Democratic, Republican and Other, respectively.
 D D D D O R O R O R O R O D D R D D D R
R O R D R R O R R R R R O O R R D R D D
 The classes for grouping are ‘Democratic’,
‘Republican’ and ‘Other’

Categorical frequency distributions cont...
2/22/2023
Ashebir Feyisa
 Table: Number of students by political party
affiliations.
Class frequency Relative
frequency
Democratic 13 0.325
Republican 18 0.45
Other 9 0.225
Total 40 1

Diagrammatic and graphical presentation of data
2/22/2023
Ashebir Feyisa
 Graphs for quantitative data
 Histogram: it consists of a set of adjacent rectangles
whose bases are marked off by class boundaries (not
class limits) along the horizontal axis and whose heights
are proportional to the frequencies associated with the
respective classes.
To construct a histogram from a data set:
◼ Construct a frequency table.
◼ Draw adjacent bars having heights determined by the
frequencies in step1.

2/22/2023
Ashebir Feyisa
 Histogram can often indicate how symmetric the
data are; how spread out the data are; whether
there are intervals having high levels of data
concentration; whether there are gaps in the data;
and whether some data values are far apart from
others.

2/22/2023
Ashebir Feyisa
 Example 2.9: The following is a histogram for the
frequency distribution in example 2.4.
Figure: Distribution of number of minutes spent by the
automobile workers

2/22/2023
Ashebir Feyisa
 Frequency polygon: is a graphic form of a frequency
distribution. It can be constructed by plotting the class
frequencies against class marks and joining them by a
set of line segments.
 Note: we should add two classes with zero frequencies
at the two ends of the frequency distribution to
complete the polygon.

2/22/2023
Ashebir Feyisa
 Example 2.10: Construct a frequency polygon for the frequency
distribution of the time spent by the automobile workers that we have
seen in example 2.4
 Figure: Distribution of number of minutes spent by the automobile workers

Graphs useful for presenting qualitative data
2/22/2023
Ashebir Feyisa
 Bar charts are diagrammatic representation of
data in which the data are represented by series of
vertical or horizontal bars, the height (or length) of
each bar indicating the size of the figure
represented.
 Example 2.11: Draw a bar chart for the following
coffee production data.

2/22/2023
Ashebir Feyisa
 Table: Coffee productions from 1990 to 1995.
Production year 1990 1991 1992 1993 1994 1995
Amounts of coffee (in 1000
tons)
50 75 92 64 100 120

2/22/2023
Ashebir Feyisa
 Pie-chart: it is a circle divided by radial lines into
sections or sectors so that the area of each sector is
proportional to the size of the figure represented.
 Pie-chart construction:
 Calculate the percentage frequency of each
component. It is given by
 Calculate the degree measures of each sector. It is
given by
 Then draw the circle.

2/22/2023
Ashebir Feyisa
 Example 2.13: Draw a pie-chart to represent the
following data on a certain family expenditure.
 Table: Family expenditure.
Item Food Clothing House
rent
Fuel & light Miscella
neous
Total
Expenditure(in
birr)
50 30 20 15 35 150
Percentage
frequencies
33.33 20 13.33 10 23.33
Angles of the
sector
1200 720 480 360 840 3600

2/22/2023
Ashebir Feyisa
Figure: Family expenditure

 MEASURES OF CENTRAL TENDENCY
2/22/2023
Ashebir Feyisa

Introduction and objectives of
measuring central tendency
 In the pervious section, we have discussed how raw data
can be organized in terms of tables, charts and frequency
distributions in order to be easily understood and
analyzed.
 Frequency distributions and their corresponding graphical
displays roughly tell us some of the features of a data
set.
 However, they don’t condense the mass of data in a way
that we can easily understand and interpret.
 In this section, we will see how to summarize data using a
descriptive measure called average. This will help us in
condensing a mass of data into a single value which is in
some sense representative of the whole data set.
2/22/2023
Ashebir Feyisa

 An average is a single value intended to represent
a distribution as a whole.
 Note that the individual values of the distribution
must have a tendency to cluster around an average.
In view of this requirement an average is also
referred to as a measure of central tendency.
2/22/2023
Ashebir Feyisa

 An average (a measure of central tendency) is
considered satisfactory if it possesses all or most of
the following properties. An average should be:
 Rigidly defined (unique),
 Based on all observation under investigation
 Easily understood,
 Simple to compute
 Suitable for further mathematical treatment
 Little affected by fluctuations of sampling
 Not highly affected by extreme values.
2/22/2023
Ashebir Feyisa

The summation notation
 Suppose a variable is represented by X. The
successive values of this variable may be
represented by using subscripts or indexes as x1, x2,
x3,…, xn. If the sum of these values or terms is
required, we write x1+x2+x3+…+xn. The Greek
letter ∑ (read as sigma) can be used to write the
above sum in a compact form as
where 1= lower limit and n = upper limit.
2/22/2023
Ashebir Feyisa

Types of measures of central tendency
 Arithmetic mean
 Note that if the data refers to a population data the mean is denoted by
the Greek letter µ (read as mu).
2/22/2023
Ashebir Feyisa

Arithmetic mean for raw data (ungrouped data)
 Example 3.1: The following data is the weight (in
Kg) of eight youths: 32,37,41,39,36,43,48 and 36.
Calculate the arithmetic mean of their weight.
2/22/2023
Ashebir Feyisa

 Example 3.2: The ages of a random sample of
patients in a given hospital in Ethiopia is given
below:
 Calculate the average age of these patients.
 Solution:
Age 10 12 14 16 18 20 22
Number of patients 3 6 10 14 11 5 4
2/22/2023
Ashebir Feyisa

Age (xi) Number of patients (fi) fixi
10 3 30
12 6 72
14 10 140
16 14 224
18 11 198
20 5 100
22 4 88
Total 53 852
2/22/2023
Ashebir Feyisa

The weighted arithmetic mean
 In some cases the data in the sample or population
should not be weighted equally, and each value
weighted according to its importance.
 There is a measure of average for such problems known
as weighted Arithmetic mean.
 Weighted arithmetic mean is used to calculate the
average when the relative importance of the
observations differs.
 This relative importance is technically known as weight.
 Weight could be a frequency or numerical coefficient
associated with observations.
2/22/2023
Ashebir Feyisa

 Example 3.3: The GPA or CGPA of a student is a
good example of a weighted arithmetic mean.
Suppose that Solomon obtained the following
grades in the first semester of the freshman
program at AASTU in 2006.
Course Credit hour (wi) Grade
Math101 4 A=4
Stat2091 3 C=2
Chem101 3 B=3
Phys101 4 B=3
Flen101 3 C=2
2/22/2023
Ashebir Feyisa

 Find the GPA of Solomon.
2/22/2023
Ashebir Feyisa

 Properties of arithmetic mean
 It can be computed for any set of numerical data, it
always exists, and unique.
 It depends on all observations.
 The sum of deviations of the observations about the
mean is zero i.e.
2/22/2023
Ashebir Feyisa

 It is greatly affected by extreme values.
 It lends itself to further statistical treatment, for
instance, combinations of means.
 It is relatively reliable, i.e. it is not greatly affected
by fluctuations in sampling.
 The sum of squares of deviations of all observations
about the mean is the minimum

2/22/2023
Ashebir Feyisa

 Example 3.6: During the beginning of an epidemic in a
region 12 cases were reported in the first day, 18 on
second day and 48 on the third day.
 Find the average growth rate of the epidemic disease.
 Assuming that the growth pattern continues, forecast the
number of cases that would be reported on the 4th and
8th days.
 Solution:
 Find the 2 growth rates first.
 From first day to second day the rate is 18/12=1.5.
 From second day to third day the rate is 48/18=2.67.
2/22/2023
Ashebir Feyisa

 Therefore, the average rate
 .
2/22/2023
Ashebir Feyisa

 Properties of median
 It is an average of position.
 It is affected by the number of observations than by
extreme values.
 The sum of the deviations about the median, signs
ignored, is less than the sum of deviations taken from
any other value or specific average.
2/22/2023
Ashebir Feyisa

Definition 3.6: The mode (modal value) of an observed set of
data is the value that occurs the largest number of times.
 The mode for raw data
 Example 3.10: Find the modal value for the following sets of
data.
 5 6 5 8 7 4 . In this data set, 5 is the most frequent value.
Therefore, the mode is 5. Since the modal value is only one
number, we call the distribution unimodal.
 1 2 3 4 8 2 5 4 6. In this data the modal values are 2
and 4 since both 2 and 4 appear most frequently and they
occur equal number of times. These kind distributions are
called bimodal distribution.
 1 2 4 3 5 6 8 7 In this data set, all values appear
equal number of times so there is no modal value
2/22/2023
Ashebir Feyisa

 Note:
 If a distribution has more than two modal values then
we call the distribution multimodal.
 If in a set of observed values, all values occur once or
equal number of times, there is no mode.
2/22/2023
Ashebir Feyisa

 Properties of modal value
 It is easy to calculate and understand.
 It is not affected by extreme values.
 It is not based on all observations.
 Is not used in further analysis of data.
2/22/2023
Ashebir Feyisa

 The mean, median, and mode of grouped data
 The mean for grouped data can be found by
considering the values in the interval are centered at
the mid-point of the interval.
 Example 3.12: Consider the frequency distribution
of the time spent by the automobile workers. Find
the mean time spent by these workers from this
frequency distribution.
2/22/2023
Ashebir Feyisa

Note:
 We approximate the median by assuming that the
values in the median class are evenly distributed.
 We can compute the median for open-ended frequency
distribution as long as the middle value does not occur
in the open-ended class.
2/22/2023
Ashebir Feyisa

The mode for grouped data can be estimated by the
following formula.
2/22/2023
Ashebir Feyisa

 Example 3.15: The following data relate to sizes of
shoes sold at a stock during a week. Find the
quartiles, the seventh decile and the 90th percentile.
 Solution: The total number of observations is 191.
Size of shoes 5 5.5 6 6.5 7 7.5 8 8.5 9 9.5
Number of pairs 2 5 15 30 60 40 23 11 4 1
2/22/2023
Ashebir Feyisa

 Note: Relationships between fractile points
 Q1=P25
 Q2=P50=D5
 Q3=P75
 D1=P10; D2=P20 …D9=P90.
2/22/2023
Ashebir Feyisa

Points of discussions
2/22/2023
Ashebir Feyisa
 State the types of data and discuss the difference between
them
 write at least three sources of secondary data
 List methods of data collection for primary data
 What are the advantages of frequency distribution
 State types of frequency distribution based up on the
frequency assigned for the class
 Differentiate grouped and ungrouped frequency
distribution.
 What types of graphs do we use for quantitative and
qualitative data.


2/22/2023
Ashebir Feyisa
Objectives: Having studied this portion, you should be
able to
 understand the importance of measuring the
variability (dispersion) in a data set.
 measure the scatter or dispersion in a data set.
 understand ‘moments’ as a convenient and unifying
method for summarizing several descriptive
statistical measures.
 measure the extent to which the distribution of
values in a data set deviate from symmetry.

Introduction and objectives of measuring variation
2/22/2023
Ashebir Feyisa
 We have seen that averages are representatives of
a frequency distribution. But they fail to give a
complete picture of the distribution. They do not tell
anything about the spread or dispersion of
observations within the distribution. Suppose that we
have the distribution of yield (kg per plot) of two
rice varieties from 5 plots each.
Variety 1: 45 42 42 41 40
Variety 2: 54 48 42 33 30

2/22/2023
Ashebir Feyisa
 The mean yield of both varieties is 42 kg. The mean
yield of variety 1 is close to the values in this
variety.
 On the other hand, the mean yield of variety 2 is
not close to the values in variety 2.
 The mean doesn’t tell us how the observations are
close to each other

Objectives of measuring variation
2/22/2023
Ashebir Feyisa
 To describe dispersion (variability) in a data.
 To compare the spread in two or more distributions.
 To determine the reliability of an average.
 Note: The desirable properties of good measures of
variation are almost identical with that of a good
measure of central tendency.

2/22/2023
Ashebir Feyisa
Absolute and relative measures
 Measures of variation may be either absolute or
relative.
 Absolute measures of variation are expressed in the
same unit of measurement in which the original data
are given. These values may be used to compare the
variation in two distributions provided that the
variables are in the same units and of the same
average size.

2/22/2023
Ashebir Feyisa
 In case the two sets of data are expressed in different
units, however, such as quintals of sugar versus tones of
sugarcane or if the average sizes are very different
such as manager’s salary versus worker’s salary, the
absolute measures of dispersion are not comparable.
 In such cases measures of relative dispersion should be
used.
 A measure of relative dispersion is the ratio of a
measure of absolute dispersion to an appropriate
measure of central tendency.
 It is a unit less measure.

Types of measures of variation
2/22/2023
Ashebir Feyisa
 The range and relative range
Definition 4.1: Range is defined as the difference
between the maximum and minimum observations in a
set of data. 𝑅𝑎𝑛𝑔𝑒 = 𝑀𝑎𝑥𝑖𝑚𝑢𝑚 𝑣𝑎𝑙𝑢𝑒 −
𝑀𝑖𝑛𝑖𝑚𝑢𝑚 𝑣𝑎𝑙𝑢𝑒

2/22/2023
Ashebir Feyisa
 Range is the crudest absolute measures of variation.
It is widely used in the construction of quality control
charts.
Definition 4.2: Relative range (RR) is defined as
𝑹𝑹 =
𝑹𝒂𝒏𝒈𝒆
𝒎𝒂𝒙𝒊𝒎𝒖𝒎 𝒗𝒂𝒍𝒖𝒆 + 𝒎𝒊𝒏𝒊𝒎𝒖𝒎 𝒗𝒂𝒍𝒖𝒆

Variance, standard deviation and
coefficient of variation
2/22/2023
Ashebir Feyisa
 Definition 4.3: The variance is the average of the
squares of the distance each value is from the mean.
 The symbol for the population variance is σ2 (σ is the
Greek lower case letter sigma). Let x1,x2,…,xN be the
measurements on N population units then, the
population variance is given by the formula:
 𝜎2 =
σ𝑖=1
𝑁
(𝑥𝑖−µ)2
𝑁
=
{σ𝑖=1
𝑁
𝑥𝑖
2−
(σ 𝑥𝑖)2
𝑁
}
𝑁
where µ =
𝑃𝑜𝑝𝑢𝑙𝑎𝑡𝑖𝑜𝑛 𝑚𝑒𝑎𝑛 =
σ𝑖=1
𝑁
𝑥𝑖
𝑁
and N=Population size.

2/22/2023
Ashebir Feyisa
 Definition 4.4: The standard deviation is the square
root of the variance. The symbol for the population
standard deviation is 𝜎. The corresponding formula
for the standard deviation is
𝜎 = 𝜎2 =
σ𝑖=1
𝑁 (𝑥𝑖−µ)2
𝑁
.

2/22/2023
Ashebir Feyisa
 Example 4.1: The height of members of a certain committee was measured in
inches and the data is presented below.
 Height(x): 69 66 67 69 64 63 65 68 72
µ = 𝑃𝑜𝑝𝑢𝑙𝑎𝑡𝑖𝑜𝑛 𝑚𝑒𝑎𝑛 =
σ𝑖=1
𝑁
𝑥𝑖
𝑁
=
69 + 66 + ⋯ + 72
9
=
603
9
= 67 𝑖𝑛𝑐ℎ𝑒𝑠
 𝜎2
=
σ𝑖=1
𝑁
(𝑥𝑖−µ)2
𝑁
=
4+1+0+4+9+16+4+1+25
9
=
64
9
= 7.11𝑖𝑛𝑐ℎ2
(𝐱 − µ) 2 -1 0 2 -3 -4 -2 1 5
(𝐱 − µ)𝟐 4 1 0 4 9 16 4 1 25
66
.
2
11
.
7
2
=
=
= 


2/22/2023
Ashebir Feyisa
 Definition 4.5: The sample variance is denoted by
S2, and its formula is
𝑆2 =
σ𝑖=1
𝑛
(𝑥𝑖− ҧ
𝑥)2
𝑛−1
=
σ 𝑓(𝑥− ҧ
𝑥)2
𝑛−1
= {
σ 𝑓𝑥2−
(σ 𝑓𝑥)
2
𝑛
𝑛−1
} .
 Definition 4.6: The sample standard deviation,
denoted by S, is the square root of the sample
variance
𝑆 = 𝑆2 =
σ𝑖=1
𝑛 (𝑥𝑖− ҧ
𝑥)2
𝑛−1
=
σ 𝑓(𝑥− ҧ
𝑥)2
𝑛−1
.

2/22/2023
Ashebir Feyisa
 Example 4.2: For a newly created position, a manager
interviewed the following numbers of applicants each
day over a five-day period: 16, 19, 15, 15, and 14.
Find the variance and standard deviation.
 Solution:
ҧ
𝑥 =
79
5
= 15.8
𝑆2 =
σ 𝑓 𝑥 − ҧ
𝑥 2
𝑛 − 1
=
14.8
4
= 3.7
𝑆2 =
σ 𝑓𝑥2 −
(σ 𝑓𝑥)
2
𝑛
𝑛 − 1
=
1263 −
(79)2
5
4
=
14.8
4
= 3.7

2/22/2023
Ashebir Feyisa
 Note that the procedure for finding the variance
and standard deviation for grouped data is similar
to that for finding the mean for grouped data, and
it uses the mid-points of each class.

Properties of variance
2/22/2023
Ashebir Feyisa
 The unit of measurement of the variance is the
square of the unit of measurement of the observed
values. It is one of its limitations.
 The variance gives more weight to extreme values
as compared to those which are near to mean
value, because the difference is squared in
variance.
 It is based on all observations in the data set.

Properties of standard deviation
2/22/2023
Ashebir Feyisa
 Standard deviation is considered to be the best
measure of dispersion and is used widely.
 There is, however, one difficulty with it. If the unit of
measurement of variables of two series is not the
same, then their variability cannot be compared by
comparing the values of standard deviation.

Uses of the variance and standard deviation
2/22/2023
Ashebir Feyisa
 The variance and standard deviations can be used to
determine the spread of data, consistency of a
variable and the proportion of data values that fall
within a specified interval in a distribution.
 If the variance or standard deviation is large, the data
is more dispersed.
 This information is useful in comparing two or more data
sets to determine which is more (most) variable.
 Finally, the variance and standard deviation are used
quite often in inferential statistics.

Coefficient of variation (CV)
2/22/2023
Ashebir Feyisa
 The standard deviation is an absolute measure of
dispersion. The corresponding relative measure is known
as the coefficient of variation (CV).
 Coefficient of variation is used in such problems where
we want to compare the variability of two or more
different series. Coefficient of variation is the ratio of
the standard deviation to the arithmetic mean, usually
expressed in percent:
 A distribution having less coefficient of variation is said
to be less variable or more consistent or more uniform
or more homogeneous.

2/22/2023
Ashebir Feyisa
Example 4.3: Last semester, the students of Biology and Chemistry Departments took Stat 273
course. At the end of the semester, the following information was recorded.
Department Biology Chemistry
Mean score 79 64
Standard deviation 23 11
Compare the relative dispersions of the two departments’ scores using the appropriate way.
Solution:
Biology Department Chemistry Department
23
100 29.11%
79
CV =  =
11
100 17.19%
64
CV =  =
Since the CV of Biology Department students is greater than that of Chemistry Department
students, we can say that there is more dispersion in the distribution of Biology students’ scores
compared with that of Chemistry students.

2/22/2023
Ashebir Feyisa
 Example 4.4: The mean weight of 20 children was
found to be 30 kg with variance of 16kg2 and their
mean height was 150 cm with variance of 25cm2.
Compare the variability of weight and height of these
children.
 𝐶𝑉
𝑚 =
𝑆𝑚
ҧ
𝑥𝑚
× 100 =
4 𝑘𝑔
30 𝑘𝑔
× 100% = 13.33%
𝐶𝑉ℎ =
𝑆ℎ
ҧ
𝑥ℎ
× 100 =
5𝑐𝑚
150𝑐𝑚
× 100 = 3.33%
 The weight of the children is more variable than their
height.

Standard score
2/22/2023
Ashebir Feyisa
A standard score is a measure that describes the relative position of a single score in the entire
distribution of scores in terms of the mean and standard deviation. It also gives us the number of
standard deviations a particular observation lie above or below the mean.
Population standard score:


−
=
x
Z wherexis the value of the observation,  and are the
mean and standard deviation of the population respectively.
Sample standard score:
S
x
x
Z
−
= wherexis the value of the observation, x andS are the mean
and standard deviation of the sample respectively.

2/22/2023
Ashebir Feyisa
 Interpretation:
𝐼𝑓 𝑍 𝑖𝑠 ቐ
𝑝𝑜𝑠𝑖𝑡𝑖𝑣𝑒, 𝑡ℎ𝑒 𝑜𝑏𝑠𝑒𝑟𝑣𝑎𝑡𝑖𝑜𝑛 𝑙𝑖𝑒𝑠 𝑎𝑏𝑜𝑣𝑒 𝑡ℎ𝑒 𝑚𝑒𝑎𝑛
𝑛𝑒𝑔𝑎𝑡𝑖𝑣𝑒, 𝑡ℎ𝑒 𝑜𝑏𝑠𝑒𝑟𝑣𝑎𝑡𝑖𝑜𝑛 𝑙𝑖𝑒𝑠 𝑏𝑒𝑙𝑜𝑤 𝑡ℎ𝑒 𝑚𝑒𝑎𝑛
𝑧𝑒𝑟𝑜, 𝑡ℎ𝑒 𝑜𝑏𝑠𝑒𝑟𝑣𝑎𝑡𝑖𝑜𝑛 𝑒𝑞𝑢𝑎𝑙𝑠 𝑡𝑜 𝑡ℎ𝑒 𝑚𝑒𝑎𝑛

2/22/2023
Ashebir Feyisa
Example 4.5: Two sections were given an exam in a course. The average score was 72 with
standard deviation of 6 for section 1 and 85 with standard deviation of 5 for section 2. Student A
from section 1 scored 84 and student B from section 2 scored 90. Who performed better relative
to his/her group?
Solution: Section 1: x = 72, S = 6 and score of student A from Section 1; A
x = 84
Section 2: x = 85, S = 5 and score of student B from Section 2; B
x = 90
Z-score of student A: 00
.
2
6
72
84
1
1
=
−
=
−
=
S
x
x
Z A
Z-score of student B: 00
.
1
5
85
90
2
2
=
−
=
−
=
S
x
x
Z B
From these two standard scores, we can conclude that student A has performed better relative to
his/her section students because his/her score is two standard deviations above the mean score of
selection 1 while the score of student B is only one standard deviation above the mean score of
section 2 students.

2/22/2023
Ashebir Feyisa
 Example 4.6: A student scored 65 on a calculus test that
had a mean of 50 and a standard deviation of 10; she
scored 30 on a history test with a mean of 25 and a
standard deviation of 5. Compare her relative positions on
each test.
Solution: First, find the z-scores.
For calculus the z-score is
𝒛 =
𝒙 − µ
𝝈
=
𝟔𝟓 − 50
𝟏𝟎
= 𝟏. 𝟓
For history the z-score is
𝒛 =
𝒙 − µ
𝝈
=
𝟑𝟎 − 25
𝟓
= 𝟏. 𝟎
Since the z-score for calculus is larger, her relative position in the
calculus class is higher than her relative position in the history class.

Moments
2/22/2023
Ashebir Feyisa
 Definition 4.7: The average of deviations from an
arbitrary origin raised to an integral power of the
observations of a distribution is defined as a
moment. Let x1,x2,…,xn be observations, we define
the r-th moment about A as:
σ(𝑥𝑖 − 𝐴)𝑟
𝑛
.

2/22/2023
Ashebir Feyisa
 The most known moments are moments about the mean
also known as the central moments and the moments
about zero (also known as moments about the origin.)
 The rth moment about the mean, µr, is given by:
 µ𝑟 =
σ(𝑥𝑖− ҧ
𝑥)𝑟
𝑛
.
 Special ceases: µ0=1, µ1=0, µ2=s2.
 The rthmoment about the origin,µ𝑟
,
, is given by:
 µ𝑟
, =
σ 𝑥𝑖
𝑟
𝑛
.
 Special cases: µ0
, = 1,µ1
, = ҧ
𝑥,µ2
, =
σ 𝑥𝑖
2
𝑛
.

2/22/2023
Ashebir Feyisa
 Skewness: it refers to lack of symmetry in a distribution.
Note: for a symmetrical and unimodal distribution:
 Mean =median =mode
 The lower and upper quartiles are equidistant from the
median, so also are corresponding pairs of deciles and
percentiles.
 Sum of positive deviations from the median is equal to the
sum of negative deviations (signs ignored).
 The two tails of the frequency curve are equal in length
from the central value.
 If a distribution is not symmetrical we call it skewed
distribution.

2/22/2023
Ashebir Feyisa
 Measures of skewness
 Pearsonian coefficient of skewness (Pcsk) defined as:
 𝑃𝑐𝑠𝑘 =
𝑚𝑒𝑎𝑛−𝑚𝑜𝑑𝑒
𝑠.𝑑
 In moderately skewed distributions: Mode = mean-
3(mean-median)
3(𝑚𝑒𝑎𝑛−𝑚𝑒𝑑𝑖𝑎𝑛)
𝑠.𝑑

2/22/2023
Ashebir Feyisa
Interpretation:
 𝑖𝑓 𝑃𝑐𝑠𝑘 ቐ
< 0, 𝑡ℎ𝑒 𝑑𝑖𝑠𝑡𝑟𝑖𝑏𝑢𝑡𝑖𝑜𝑛 𝑖𝑠 𝑠𝑎𝑖𝑑 𝑡𝑜 𝑏𝑒 𝑛𝑒𝑔𝑎𝑡𝑖𝑣𝑒𝑙𝑦 𝑠𝑘𝑒𝑤𝑒𝑑.
= 0, 𝑠𝑦𝑚𝑚𝑒𝑡𝑟𝑖𝑐𝑎𝑙
> 0 , 𝑝𝑜𝑠𝑡𝑖𝑣𝑒𝑙𝑦 𝑠𝑘𝑒𝑤𝑒𝑑.


 Note: in a negatively skewed distribution larger values are more frequent than
smaller values. In a positively skewed distribution smaller values are more
frequent than larger values.

2/22/2023
Ashebir Feyisa
 Example 4.7: If the mean, mode and s.d of a
frequency distribution are 70.2, 73.6, and 6.4,
respectively. What can one state about its
skeweness
𝑚𝑒𝑎𝑛−𝑚𝑜𝑑𝑒
𝑠.𝑑
=
70.2−73.6
6.4
= −0.53.
 This figure suggests that there is some negative
skewness

2/22/2023
Ashebir Feyisa
 Kurtosis: it refers to the degree of peakedness of
a distribution.

2/22/2023
Ashebir Feyisa
 When the values of a distribution are closely
bunched around the mode in such a way that the
peak of the distribution becomes relatively high, the
distribution is said to be leptokurtic. If it is flat
topped we call it platykurtic. A distribution which is
neither highly peaked nor flat topped is known as a
meso-kurtic distribution (normal).

Measures of kurtosis
2/22/2023
Ashebir Feyisa
i. Moment coefficient of kurtosis (Mck) is given by
𝑀𝑐𝑘 =
µ4
µ2
2 =
µ4
𝑠4
where
µ4 =
σ(𝑥𝑖 −𝑥ҧ)4
𝑛
, µ2 =
σ(𝑥𝑖 −𝑥ҧ)2
𝑛
= 𝑆2
.
Interpretation:
𝑖𝑓 𝑀𝑐𝑘
< 3,𝑡ℎ𝑒 𝑑𝑖𝑠𝑡𝑟𝑖𝑏𝑢𝑡𝑖𝑜𝑛 𝑖𝑠 𝑠𝑎𝑖𝑑 𝑡𝑜 𝑏𝑒 𝑝𝑙𝑎𝑡𝑦𝑘𝑢𝑟𝑡𝑖𝑐 𝑑𝑖𝑠𝑡𝑟𝑖𝑏𝑢𝑡𝑖𝑜𝑛.
= 3,𝑡ℎ𝑒 𝑑𝑖𝑠𝑡𝑟𝑖𝑏𝑢𝑡𝑖𝑜𝑛 𝑖𝑠 𝑠𝑎𝑖𝑑 𝑡𝑜 𝑏𝑒 𝑚𝑒𝑠𝑜𝑘𝑢𝑟𝑡𝑖𝑐 𝑑𝑖𝑠𝑡𝑟𝑖𝑏𝑢𝑡𝑖𝑜𝑛.
> 3,𝑡ℎ𝑒 𝑑𝑖𝑠𝑡𝑟𝑖𝑏𝑢𝑡𝑖𝑜𝑛 𝑖𝑠 𝑠𝑎𝑖𝑑 𝑡𝑜 𝑏𝑒 𝑙𝑒𝑝𝑡𝑜𝑘𝑢𝑟𝑡𝑖𝑐 𝑑𝑖𝑠𝑡𝑟𝑖𝑏𝑢𝑡𝑖𝑜𝑛.

2/22/2023
Ashebir Feyisa
 ELEMENTARY PROBABILITY

Definition of some probability terms
2/22/2023
Ashebir Feyisa
 Random experiment: is an experiment in which the
outcome cannot be determined or predicted exactly in
advance, i.e. it is the process of observing or measuring
the outcome of a chance event.
 Some of the characteristics of a random experiment
are:
 all the possible outcomes of the experiment can be specified
in advance.
 the experiment can be repeated indefinitely.
 there is a sort of regularity in the outcomes observed in
large repetitions of the experiment.

2/22/2023
Ashebir Feyisa
 Sample point (outcome): The individual result of a
random experiment.
 Sample space: The set containing all possible
sample points (out comes) of the random
experiment.
 The sample space is often called the universe and
denoted by S.
 Event: The collection of outcomes or simply a subset
of the sample space. We denote events with capital
letters, A, B, C, etc.

2/22/2023
Ashebir Feyisa
 Example 5.1: If an experiment consists of flipping of a coin
once, then
 S = {H, T} where H means that the outcome of the toss is a
head and T that it is a tail. A= {H} represents the event of
head occurring.
 Example 5.2: If an experiment consists of rolling a die once
and observing the number on top, then the sample space is
S = {1, 2, 3, 4, 5, 6} where the outcome i means that i
appeared on the die, i = 1, 2, 3, 4, 5, 6. {1}, {2},{3},{4},{5}
and {6}are elementary events i.e. events consisting of a
single outcome. Let A represents the event of an odd number
will occur, then A is simply the set containing 1, 3 and 5 i.e.
A= {1, 3, 5}.

Counting rules
2/22/2023
Ashebir Feyisa
 to assign probabilities for an event, we might need to
enumerate the possible outcomes of a random experiment
and need to know the number of possible outcomes
favoring the event. The following principles will help us in
determining the number of possible outcomes favoring a
given event.

Addition principle
2/22/2023
Ashebir Feyisa
 If a task can be accomplished by k distinct procedures
where the ith procedure has ni alternatives, then the total
number of ways of accomplishing the task equals n1 +
n2+…+nk.
 Example 5.3: Suppose one wants to purchase a certain
commodity and that this commodity is on sale in 5
government owned shops, 6 public shops and 10
private shops. How many alternatives are there for the
person to purchase this commodity?
 Solution: Total number of ways =5+6+10=21 ways

Multiplication principle
2/22/2023
Ashebir Feyisa
 If a choice consists of k steps of which the first can be
made in n1 ways, for each of these the second can be
made in n2 ways,…, and for each of these the kth can
be made in nk ways, then the whole choice can be made
in n1.n2….nk ways.
 Example 5.4: If we can go from Addis Ababa to Rome
in 2 ways and from Rome to Washington D.C. in 3 ways
then the number of ways in which we can go from Addis
Ababa to Rome to Washington D.C. is 2x3 ways or 6
ways. We may illustrate the situation by using a tree
diagram below:

2/22/2023
Ashebir Feyisa
R
A
R
W
W
W
W
W
W

2/22/2023
Ashebir Feyisa
 Example 5.5: If a test consists of 10 multiple choice questions, with
each permitting 4 possible answers, how many ways are there in which
a student gives his/her answers?
Solution: There are 10 steps required to complete the test.
First step: To give answer to question number one. He/she has 4
alternatives.
Second step: To give answer to question number two, he/she has 4
alternatives……
Last step: To give answer to last question, he/she has 4 alternatives.
Therefore, he/she has 4x4x4x…x4=410 ways or1, 048, 576 ways of
completing the exam. Note that there is only one way in which he/she can
give correct answers to all questions and that there are 310 ways in which
all the answers will be incorrect.

2/22/2023
Ashebir Feyisa
 Example 5.6: A manufactured item must pass through
three control stations. At each station the item is
inspected for a particular characteristic and marked
accordingly. At the first station, three ratings are
possible while at the last two stations four ratings are
possible. Hence there are 48 ways in which the item
may be marked.
 Example 5.7: Suppose that car plate has three letters
followed by three digits. How many possible car plates
are there, if each plate begins with a H or an F?
 2x 26x 26x 10x 10x 10 or 1, 352, 000 different
plates.

2/22/2023
Ashebir Feyisa
 If n is a positive integer, we define n!= n(n-1)(n-
2)…1 and call it n-factorial and 0!=1.
 Permutations
 Suppose that we have n different objects. In how
many ways, say nPn, may these objects be arranged
(permuted)? For example, if we have objects a, b
and c we can consider the following arrangements:
abc, acb, bac, bca, cab, and cba. Thus the answer is
6. The following theorem gives general result on the
number of such arrangements.

2/22/2023
Ashebir Feyisa
 Theorem 5.4: Permutation
 The number of permutations of n different objects is
given by nPn= n!
 A permutation of n objects, arranged in groups of
size r, without repetition, and order being important
is:
)!
(
!
r
n
n
Pr
n
−
=

2/22/2023
Ashebir Feyisa
 Example 5.8: Suppose that we have five letters a,
b, c, d.
What is the number of possible arrangements of these
letters taken all at a time?
What is the number of possible arrangements of these
letters if we use only three of the letters at a time?

2/22/2023
Ashebir Feyisa
 Solution:
 Using (i) of theorem 5.4, we have 4! ways of
arranging the 4 letters, i.e. we have 24 possible
arrangements.
 Using (ii) of theorem 5.4, we have 4P3 ways of
arranging 3 letters taken from the four letters, i.e.
we have 24 possible arrangements.

2/22/2023
Ashebir Feyisa
 Example 5.9: In a class with 8 boys and 8 girls
 In how many ways can the children line up if they alternate
girl-boy-girl-boy-... ?
 In how many ways can the children line up so that no two of
the same sex are next to each other?
 Solution:
 The 8 girls can line-up in 8! ways, and likewise the 8 boys can
line-up in 8! ways. For any single arrangement of the girls, all
possible arrangements of the boys are possible, thus by
multiplication principle we have 8!x 8! ways to arrange the
children in girl-boy lines.
 Now we must include the case of boy-girl. So we have 2x8!x
8! ways of arranging.

2/22/2023
Ashebir Feyisa
 Example 5.10: If I have 5 different books on my shelf, in
how many ways can I arrange these books? Solution: We
can arrange the books in 5! different ways or 5x4x3x2x1
ways or 120 ways.
 Remarks
 i) The number of permutations of n distinct objects arranged
in a circle is (n-1)!.
 This is because we consider two permutations the same if
one is a rotation of the other. For n objects arranged around
a circle, there a n rotations that give the same permutation.
Dividing n! by n gives (n - 1)!. The two circular permutations
below are considered the same; their order is a, b, c, d, e.

2/22/2023
Ashebir Feyisa
 ii) Permutations when not all objects are different
Given n objects of which n1 are one kind, n2 are
another kind, …, nk of another kind, then the total
number of distinct permutations that can be made
from these objects is

2/22/2023
Ashebir Feyisa
 Example 5.11
 How many "words" (text strings or distinct
arrangements) can be made from the letters b,k,o,o?
 How many permutations are there for the letters in
the word banana?

2/22/2023
Ashebir Feyisa
 If we label the two o’s as o1 and o2, and think of
them as distinct, then the number of permutations is
4!. For each permutation there will be a matching
permutation that switches the o’s, that is for o1o2bk
there is the matching o2o1bk permutation. We can
see then that if we divide the number of distinct
permutations by two, we have a count of the
number of permutations of the 4 letters where we
do not distinguish between the two o’s. Therefore,
there are distinct4!/2 text strings or 12 text strings.

2/22/2023
Ashebir Feyisa
 If we think of all 6 letters as distinct, then we would
have 6! permutations. As in the preceding example
for the two n’s, we would need to divide 6! by 2.
For the 3 a’s, we would have 6 counts for a single
permutation. For instance, each of the following
would be a single word if the a’s were not distinct.
a1a2a3bnn, a1a3a2bnn, a2a1a3bnn, a2a3a1bnn,
a3a1a2bnn, and a3a2a1bnn. Hence the number of
distinct permutations of the word banana is
.

Combinations
2/22/2023
Ashebir Feyisa
 Consider n different objects. This time we are
concerned with counting the number of ways we
may choose r out of these n objects without regard
to order. For example, we have the objects a, b, c
and d, and r=2; we wish to count ab, ac, ad, bc,
bd, and cd. In other words, we do not count ab and
ba since the same objects are involved and only the
order differs.

2/22/2023
Ashebir Feyisa
 There are many problems in which we are
interested in determining the number of ways in
which r objects can be selected from n distinct
objects without regard to the order in which they
are selected. Such selections are called
combinations or r-sets. It may help to think of
combinations as committees. The key here is without
regard for order.

2/22/2023
Ashebir Feyisa
 Example 5.12: How many different committees of 3
can be formed from Hawa, Segenet, Nigisty and
Lensa?
 Solution: The question can restated in terms of subsets
from a set of 4 objects, how many subsets of 3 elements
are there? In terms of combinations the question
becomes, what is the number of combinations of 4
distinct objects taken 3 at a time? The list of
committees:{H,S,N}, {H,S,L}, {H,N,L}, {S,N,L}.Therefore,
we have 4C3 or 4 possible number of committees.

2/22/2023
Ashebir Feyisa
 Example 5.13:
 (i) A committee of 3 is to be formed from a group
of 20 people. How many different committees are
possible?
 (ii) From a group of 5 men and 7 women, how many
different committees consisting of 2 men and 3
women can be formed?

2/22/2023
Ashebir Feyisa
 The Axioms of Probability
 Probabilities are real numbers assigned to events
(or subsets) of a sample space. We can think of the
assignment of probabilities to events, or probability
measure, as a function between the collection of
subsets of the sample space and the real numbers.

2/22/2023
Ashebir Feyisa
 Mathematically, a probability measure P for a random
experiment is a real-valued function defined on the
collection of events that satisfies the following axioms:
 Axiom 1: The probability of an event is a nonnegative
real number; that is, P(A) ≥ 0 for any subset A of S.
 Axoim 2: P(S) = 1
 Axiom 3: If A1, A2, A3 ... is a finite or infinite sequence
of mutually exclusive
 events of S, then P(A1 u A2 u A3 u ...) = P( A1) + P( A2)
+ P( A3) + ...=

2/22/2023
Ashebir Feyisa
 Suppose that we have a random experiment with
sample space S and probability function P and A
and B are events. Then we have the following
results:
 P( ) = 0
 P(Ac) = 1 − P(A)
 P(B n Ac) = P(B) − P(A n B)
 If A subset of B then P(A) ≤ P(B).

2/22/2023
Ashebir Feyisa
 The classical definition of probability
If an experiment can result in any one of N equally
likely and mutually exclusive outcomes, and if n of
these outcomes constitute the event A, then the
probability of event A is

2/22/2023
Ashebir Feyisa
 Consider the experiment of tossing a fair die. A fair
die means that all six numbers are equally likely to
appear. Calculate the probabilities of the following
events:
 A=One will occur ={1}
 B=Even number will occur ={2, 4, 6}
 C=Odd number will occur ={1, 3, 5}
 D=A number less than 3 will occur ={1,2}

2/22/2023
Ashebir Feyisa
 Example 5.15: Suppose that we toss two coins, and
assume that each of the four outcomes in the sample
space S = {(H,H),(H, T ), (T ,H), (T , T )} are
equally likely and hence has probability ¼. Let A =
{(H, H),(H, T )} and B = {(H,H), (T ,H)} that is, A is
the event that the first coin falls heads, and B is the
event that the second coin falls heads. Then,
calculate the probabilities of A, B, Ac, Bc, and Sc.
The event that none of the outcomes will occur is the
same as Sc.

2/22/2023
Ashebir Feyisa
 Example 5.16: From a group of 5 men and 7
women, it is required to form a committee of 5
persons. If the selection is made randomly, then
I. What is the probability that 2 men and 3 women
will be in the committee?
II. What is the probability that all members of the
committee will be men?
III. What is the probability that at least three
members will be women?

2/22/2023
Ashebir Feyisa
 Relative Frequency Definition of probability
 If an experiment is repeated a large number, n, of
times and the event A is observed nA times, the
probability of A is P(A) ≈ nA/n.
 The above definition of probability is based on
empirical data accumulated through time or based
on observations made from repeated experiments
for a large number of times.

Some probability rules
2/22/2023
Ashebir Feyisa
 If A and B , then P(A u B) = P(A) + P(B) − P(A n B).
 Example 5.17: Consider the experiment of tossing a
fair die. Let
 A = Even number occurring = {2,4,6}
 B = A number greater than 2 occurring ={3, 4, 5, 6}
 C = Odd number occurring ={1, 3, 5}
i. What is the probability that A and B will occur?
ii. What is the probability that A or B will occur?

2/22/2023
Ashebir Feyisa
 Solution: We use the concept of set theory to help
us solve probability questions very easily and vein
diagrams are useful tools to depict the relations
between events within the sample space. The
shaded region on Fig 1. shows the event that both A
and B will occur.
 A and B ≡ AnB ={4,6} Thus P(AnB)=2/6.
 A or B ≡ AUB ={2,3,4,5,6} AnB={4,6} Hence,

2/22/2023
Ashebir Feyisa
 Example 5.18: Sixty percent of the families in a certain
community own their own car, thirty percent own their own
home, and twenty percent own both their own car and their
own home. If a family is randomly chosen,
a) what is the probability that this family do not have a car?
b) what is the probability that this family owns a car or a
house?
c) what is the probability that this family owns a car or a
house but not both?
d) what is the probability that this family owns only a house?
e) what is the probability that this family neither owns a car
nor a house?

2/22/2023
Ashebir Feyisa
 Solution: Let A represents that the family owns a
car and B represents that the family owns a house.
Given information: P(A)=0.6,P(B)=0.3, and
P(AnB)=0.2.
a) Required: P(Ac) = ? P(Ac)=1-P(A) = 1-0.6 = 0.4
b) Required: P(AUB) = ? P(AUB) = P(A)+P(B)-P(AnB)
= 0.6+0.3-0.2 = 0.7
c) Required: P((AnBc)U(AcnB)) = ? P((AnBc)U(AcnB)) =
P(AnBc)+P(AcnB) = [P(A)-P(AnB)]+[P(B)-P(AnB)] =
[0.6-0.2]+[0.3-0.2]=0.5

2/22/2023
Ashebir Feyisa
d) Required: P(AcnB) =? P(AcnB) = P(B)-P(AnB) = 0.3-
0.2 = 0.1
e) Required: P(AcnBc) = ? P(AcnBc) = P((AUB)c) = 1-
P(AUB) = 1-0.7 = 0.3

2/22/2023
Ashebir Feyisa
 We can represent various events by an informative
diagram called vein diagram. If properly and
correctly drawn, a vein diagram helps to calculate
probabilities of events easily. The figure below
shows various events represented by shaded
regions. Note that the rectangle in each figure
represents the sample space.

Conditional probability and independence
2/22/2023
Ashebir Feyisa
 Conditional Probability
 Conditional probability provides us with a way to
reason about the outcome of an experiment, based
on partial information. Here are some examples of
situations we may have in our mind:
(a) What is the probability that a person will be HIV-
Positive given he has tuberculosis?
(d) A spot shows up on a radar screen. How likely is it
that it corresponds to an aircraft?

2/22/2023
Ashebir Feyisa
 If P(B) > 0, the conditional probability of A given B,
denoted by P(A|B), is

2/22/2023
Ashebir Feyisa
 Example 5.19: Suppose cards numbered one
through ten are placed in a hat, mixed up, and then
one of the cards is drawn at random. If we are told
that the number on the drawn card is at least five,
then what is the conditional probability that it is ten?
 Solution: Let A denote the event that the number on
the drawn card is ten, and B be the event that it is
at least five. The desired probability is P(A|B).

2/22/2023
Ashebir Feyisa
 Example 5.20: A family has two children. What is
the conditional probability that both are boys given
that at least one of them is a boy? Assume that the
sample space S is given by S = {(b, b), (b, g), (g,
b), (g, g)}, and all outcomes are equally likely. (b,
g) means, for instance, that the older child is a boy
and the younger child is a girl.

2/22/2023
Ashebir Feyisa
 Solution: Letting A denote the event that both
children are boys, and B the event that at least one
of them is a boy, then the desired probability is
given by

2/22/2023
Ashebir Feyisa
 Law of Multiplication
 The defining equation for conditional probability may
also be written as:
 P(AnB) = P(B) P(A|B)
 This formula is useful when the information given to us in
a problem is P(B) and P(A|B) and we are asked to find
P(AnB). An example illustrates the use of this formula.
Suppose that 5 good fuses and two defective ones
have been mixed up. To find the defective fuses, we test
them one-by-one, at random and without replacement.
What is the probability that we are lucky and find both
of the defective fuses in the first two tests?

2/22/2023
Ashebir Feyisa
 Example 5.21: Suppose an urn contains seven black
balls and five white balls. We draw two balls from the
urn without replacement. Assuming that each ball in the
urn is equally likely to be drawn, what is the
probability that both drawn balls are black?
 Solution: Let A and B denote, respectively, the events
that the first and second balls drawn are black. Now,
given that the first ball selected is black, there are six
remaining black balls and five white balls, and so
P(B|A) = 6/11. As P(A) is clearly 7/12 , our desired
probability is

Independence
2/22/2023
Ashebir Feyisa
We have introduced the conditional probability
P(A|B) to capture the partial information that event B
provides about event A. An interesting and important
special case arises when the occurrence of B provides
no information and does not alter the probability that
A has occurred, i.e., P(A|B) = P(A). When the above
equality holds, we say that A is independent of B.
Note that by the definition P(A|B) = P(A ∩ B)/P(B),
this is equivalent to P(A ∩ B) = P(A)P(B).

2/22/2023
Ashebir Feyisa
 Independence
Two events A and B are said to independent if P(A ∩
B) = P(A)P(B). If in addition, P(B) > 0, independence is
equivalent to the condition P(A|B) = P(A).

2/22/2023
Ashebir Feyisa
PROBABILITY DISTRIBUTIONS

Definition of random variables and probability distributions
2/22/2023
Ashebir Feyisa
 Given an experiment and the corresponding set of
possible outcomes (the sample space), a random
variable associates a particular number with each
outcome. Mathematically, a random variable is a
real-valued function of the experimental outcome.
The following are some examples of random
variables:

2/22/2023
Ashebir Feyisa
 (a) In an experiment involving a sequence of 5 tosses of
a coin, the number of heads in the sequence is a
random variable.
 (b) In an experiment involving two rolls of a die, the
following are examples of random variables: (1) The
sum of the two rolls, (2) The number of sixes in the two
rolls.
 (c) In an experiment involving the transmission of a
message, the time needed to transmit the message, the
number of symbols received in error, and the delay with
which the message is received are all random variables.

2/22/2023
Ashebir Feyisa
 Notation: We will use capital letters to denote random
variables, and lower case characters to denote real numbers
such as the numerical values of a random variable.
 Types of random variables: Generally, two types of
random variables exist: discrete and continuous. A random
variable is called discrete if its range (the set of values that
it can take) is finite or at most countably infinite. For
instance, the number of children in a family, number of car
accidents within given period of time in a certain locality,
the number of bacteria in a cubic mm of agar, etc.
 If random variable assumes any numerical value in an
interval or collection of intervals, then it is called a
continuous random variable.

2/22/2023
Ashebir Feyisa
 Examples include body weight of new born baby,
life time of a human being, height of a person, etc.
 The most important way to characterize a random
variable is through the probabilities of the values
that it can take. For a discrete random variable X,
these are captured by the probability mass function
(p.m.f. for short) of X, denoted PX(x). For a
continuous random variable X it is done by the
probability density function (p.d.f.), denoted fX(x).

2/22/2023
Ashebir Feyisa
 Example 6.1: Consider an experiment of tossing two
fair coins. Letting X denote the number of heads
appearing on the top face, then X is a random variable
taking on one of the values 0, 1, 2 . The random
variable X assigns a 0 value for the outcome (T,T), 1 for
outcomes (T ,H) and (H, T ), and 2 for the outcome (H,H).
Thus, we can calculate the probability that X can take
specific value/s as follows:
 P(X = 0) = P({(T , T )}) = ¼
 P(X = 1) = P({(T ,H),(H, T )}) = 2/4,
 P(X = 2) = P({(H,H)}) = ¼

2/22/2023
Ashebir Feyisa
 Figure: P (a≤ X ≤ b) is the shaded region

Introduction to expectation: mean and variance
2/22/2023
Ashebir Feyisa
 We can associate with each random variable
certain “averages” of interest, such as mean
and variance which give useful summary of a
probability distribution.

Variance
2/22/2023
Ashebir Feyisa

2/22/2023
Ashebir Feyisa
 The variance provides a measure of dispersion of X
around its mean. Another measure of dispersion is
the standard deviation of X, which is defined as the
square root of the variance and is denoted by σ.

Common discrete probability distributions –
binomial and Poisson
2/22/2023
Ashebir Feyisa
 The Binomial distribution
 Many real problems (experiments) have two
possible outcomes, for instance, a person may be
HIV-Positive or HIV-Negative, a seed may
germinate or not, the sex of a new born bay may
be a girl or a boy, etc. Technically, the two
outcomes are called Success and Failure.
 Experiments or trials whose outcomes can be
classified as either a “success” or as a “failure” are
called Bernoulli trails.

2/22/2023
Ashebir Feyisa
 Suppose that n independent trials, each of which
results in a “success” with probability p and in a
“failure” with probability 1 − p, are to be
performed. If X represents the number of successes
that occur in the n trials, then X is said to have
binomial distribution with parameters n and p. The
probability mass function of a binomial distribution
with parameters n and p is given by

2/22/2023
Ashebir Feyisa
 The mean and variance of the binomial distribution
are np and np(1-p), respectively. Note that the
binomial distributions are used to model situations
where there are just two possible outcomes, success
and failure. The following conditions also have to be
satisfied.
I. There must be a fixed number of trials called n
II. The probability of success (called p) must be the
same for each trial.
III. The trials must be independent

2/22/2023
Ashebir Feyisa
 Example 6.3: A fair coin is flipped 4 times. Let X
be the number of heads appearing out of the four
trials. Calculate the following probabilities:
I. 2 heads will appear
II. No head will appear
III. At least two heads will appear
IV. Less than two heads will appear
V. At most heads 2 will appear

2/22/2023
Ashebir Feyisa
 Solution: We can consider that the outcomes of
each trial are independent to each other. In
addition the probability that a head will appear in
each trial is the same. Thus, X has a binomial
distribution with number of trials 4 and probability
of success (the occurrence of head in a trial) is ½.
The probability mass function of X is given by

2/22/2023
Ashebir Feyisa
 Example 6.5: Suppose it is known that the
probability of recovery for a certain disease is 0.4.
If random sample of 10 people who are stricken
with the disease are selected, what is the
probability that:
(a) exactly 5 of them will recover?
(b) at most 9 of them will recover?

2/22/2023
Ashebir Feyisa
 Solution: Let X be the number of persons will
recover from the disease. We can assume that the
selection process will not affect the probability of
success (0.4) for each trial by assuming a large
diseased population size. Hence, X will have a
binomial distribution with number of trials equal to
10 and probability of success equal 0.4.

The Poisson Random Variable
2/22/2023
Ashebir Feyisa
 A random variable X, taking on one of the values 0,
1, 2, . . . , is said to have a Poisson distribution if its
probability mass function is given by

2/22/2023
Ashebir Feyisa
 λ is the parameter of this distribution. The mean and
variance of the poisson distribution are equal and
their values are equal to λ. Note that poisson
distributions is used to model situations where the
random variable X is the number of occurrences of
a particular event over a given period of time (or
space).

2/22/2023
Ashebir Feyisa
 Together with this , the following conditions must also be
fulfilled: events are independent of each other, events
occur singly, and events occur at a constant rate (in
other words for a given time interval the mean number
of occurrences is proportional to the length of the
interval).
 The poisson distribution is used as a distribution of rare
events such as telephone calls made to a switch board
in a given minute, number of misprints per page in a
book, road accidents on a particular motor way in one
day, etc.
 The process that give rise to such events are called
poisson processes.

2/22/2023
Ashebir Feyisa
 Example 6.6: Suppose that the number of
typographical errors on a single page of this
lecture note has a Poisson distribution with
parameter λ = 1. if we randomly select a
page in this lecture note, calculate the
probability that
a) no error will occur.
b) exactly three errors will occur.
c) less than 2 errors will occur.
d) there is at least one error.

2/22/2023
Ashebir Feyisa
 Example 6.7: If the number of accidents occurring
on a highway each day is a Poisson random
variable with parameter λ = 3, what is the
probability that no accidents will occur on a
randomly selected day in the future?

2/22/2023
Ashebir Feyisa
 Note: The Poisson random variable has a wide
range of applications in a diverse number of areas.
An important property of the Poisson random
variable is that it may be used to approximate a
binomial random variable when the binomial
parameter n is large and p is small. The probability
that X will be k can be approximated by
substituting λ by np in the poisson distribution, i.e.

Common examples of continous
probability distribution
2/22/2023
Ashebir Feyisa
 Normal distribution
 Student’s T distribution
 F distribution

Normal distribution
2/22/2023
Ashebir Feyisa
 The normal distribution plays an important role in
statistical inference because many real-life
distributions are approximately normal;
 many other distributions can be almost normalized
by appropriate data transformations (e.g., taking
the log) and as a sample size increases, the means
of samples drawn from a population of any
distribution will approach the normal distribution.

2/22/2023
Ashebir Feyisa
 A continuous random variable X is said to follow
normal distribution , if and only if , its probability
density function (p.d.f.) is:
2
)
(
2
1
2
1
)
( 



−
−
=
x
X e
x
f
wherex(-∞,∞),μ (-∞,∞) andσ (0,∞)

2/22/2023
Ashebir Feyisa
 There are infinitely many normal distributions since
different values of μ and σ define different
normal distributions. For instance, when μ= 0 and σ
=1 , the above density will have the following form
2
2
1
2
1
)
(
z
Z e
z
f
−
=


2/22/2023
Ashebir Feyisa
 This particular distribution is called the standard
normal distribution and sometimes known as Z-
distribution. The random variable corresponding to
this distribution is usually denoted by Z. If X has a
normal distribution with mean μ and variance σ2,
we denote it as ( )
2
,
~ 

N
X

Properties of normal distribution
2/22/2023
Ashebir Feyisa
 The normal distribution curve is a bell shaped,
symmetrical about μ and mesokurtic. The p.d.f.
attains its maximum value at x= μ.
 Since for x= μ divides the area under the normal
curve into two equal parts, μ is the mean, the
median and the mode of the distribution.
 The mean and variance of the normal distribution
are μ, and σ2, respectively.

Properties of normal distribution cont….
2/22/2023
Ashebir Feyisa
 The total area under the curve and bounded from
below by the horizontal axis is 1, i.e
Figure: The shaded area under the normal curve is one
1
)
( =



−
dx
x
fX

2/22/2023
Ashebir Feyisa
 Since a normal distribution is a continuous
probability distribution, the probability that X lies
between a and b is the area bounded under the
curve, from left to right by the vertical lines x = a
and x = b and below by the horizontal axis.
Figure: P(a<X<b) equals the shaded region

2/22/2023
Ashebir Feyisa
 However, evaluating
is very complicated.
 To facilitate this problem, we use the standard
normal table which gives area values bounded by
two points.
 Areas under the standard normal distribution curve
are tabulated in various ways. The most common
tables give areas bounded between Z=0 and a
positive value of Z.

=


b
a
X dx
x
f
b
X
a
P )
(
)
(

2/22/2023
Ashebir Feyisa
 In addition to the standard normal table, the
properties of normal distribution and the following
theorem are useful to make probability calculations
very easy for any normal distribution.

Standardization of a normal random variable
2/22/2023
Ashebir Feyisa
 If X has a normal distribution with mean, μ and
standard deviation ,σ , then
will have a standard normal distribution.


−
=
X
Z
)
(
)
(
)
(










−


−
=
−

−

−
=


b
Z
a
P
b
X
a
P
b
X
a
P

2/22/2023
Ashebir Feyisa
Let Z be the standard normal random variable. Calculate
the following probabilities using the standard normal
distribution table:
 a) P(0<Z<1.2)
 b) P(0<Z<1.43)
 c) P(Z≤0)
 d) P(-1.2<Z<0)
 e) P(Z≤-1.43)
 f) P(-1.43≤Z<1.2)
 g) P(Z≥1.52)
 h) P(Z≥-1.52)

2/22/2023
Ashebir Feyisa
 Solution:
The probability that Z lies between 0 and 1.2 can be
directly found from the standard normal table as follows:
look for the value 1.2 from z column ( first column) and
then move horizontally until you find the value of 0.00 in
the first row. The point of intersection made by the
horizontal and vertical movements will give the desired
area (probability). Hence P(0<Z<1.2)= 0.3849. Refer
the table below as a guide to find this probability.
standard normal table.docx T test and F test.docx

2/22/2023
Ashebir Feyisa
Figure: P(0<Z<1.2) is the shaded area

2/22/2023
Ashebir Feyisa
 In a similar way P(0<Z<1.43)= 0.4236.
 We know that the normal distribution is symmetric
about its mean. Hence the area to the left of 0 and
the to the right of zero are 0.5 each. Therefore
P(Z≤0)=P(Z≥0)=0.5

2/22/2023
Ashebir Feyisa
d) P(-1.2<Z<0)=P(0<Z<1.2)= 0.3849 due to
symmetry
e) P(Z<-1.43)= 1- P(Z ≥ -1.43) Using the probability
of the complement event.
= 1-[P(-1.43<Z<0)+P(Z≥0)] Since a region can be
broken down
=1-[P(0<Z<1.43)+P(Z ≥0)] into non overlapping
regions.
=1-[0.4236 + 0.5]
=1-0.9236=0.0764

2/22/2023
Ashebir Feyisa
F) P(-1.43≤Z<1.2) = P(-1.43≤Z<0) +
P(0≤Z<1.2)=P(0<Z≤1.43) + 0.3849= 0.4236 +
0.3849 =0.8085
Figure: P(-1.43≤Z<1.2) is the shaded region

2/22/2023
Ashebir Feyisa
G) P(Z≥1.52) = 0.5 – P(0≤ Z<1.52)=0.5 –
0.4357=0.0643
Figure: P(Z≥1.52) is the shaded region

2/22/2023
Ashebir Feyisa
h) P(Z≥-1.52) = P(-1.52≤Z<0) + P(Z ≥0 )= P(0 <
Z≤1.52) + 0.5 =0.4357 +0.5=0.9357

2/22/2023
Ashebir Feyisa
 Example: Find the following values of z* of a
standard normal random variable based on the given
probability values:
 P(Z > z*) =0.1446
 P(Z>z*) = 0.8554
 Solution: We need to find specific values of Z given
some probability values.
 If the probability that Z>z* is 0.1446 implies that z* is
to the right of zero because
P(Z>0) = 0.5 is greater than P(Z>z*).

2/22/2023
Ashebir Feyisa
 P(Z > z*) = 0.1446 implies that P(0<Z≤z*) = 0.5 -0.1446=0.3554.
Hence we can look for the value of z* satisfying the above condition
form the standard normal table. Thus z* =1.06

2/22/2023
Ashebir Feyisa
 If the probability that Z>z* is 0.8554 implies that
z* is to the left of zero because P(Z>0) = 0.5 is
less than P(Z>z*). It implies that z* is a negative
number.

2/22/2023
Ashebir Feyisa
 P(Z>z*) = 0.8554 = P(z*≤ Z <0) + P( Z ≥ 0) = P(0
≤ Z ≤ - z*) + 0.5
 Implies P(0 ≤ Z ≤ - z*) = 0.8554 – 0.5=0.3554.
Hence the value –z* form the table satisfying the
above condition is 1.06. Therefore z* = -1.06.

2/22/2023
Ashebir Feyisa
 Example: If the total cholesterol values for a
certain target population are approximately
normally distributed with a mean of 200 (mg/100
ml) and a standard deviation of 20 (mg/100 ml),
calculate the probability that a person picked at
random from this population will have a
cholesterol value
a. greater than 240 (mg/100 ml)
b. between 180 and 220(mg/100 ml)
c. less 200 (mg/100 ml)

2/22/2023
Ashebir Feyisa
 Assume that the test scores for a large class are
normally distributed with a mean of 74 and a
standard deviation of 10.
a. Suppose that you receive a score of 88. What
percent of the class received scores higher than
yours?
b. Suppose that the teacher wants to limit the number
of A grades in the class to no more than 20%.
What would be the lowest score for an A?

2/22/2023
Ashebir Feyisa
 SAMPLING AND SAMPLING DISTRIBUTION OF
SAMPLE MEAN

Objectives:
2/22/2023
Ashebir Feyisa
 After a successful completion of this unit, students
will be able to:
 Differentiate the two major sampling techniques:
probabilistic and non-probabilistic
 Apply simple random sampling technique to select
sample
 Define sampling distribution of the sample mean

Methods of sampling
2/22/2023
Ashebir Feyisa
 Definition of some basic terms
 Sampling: is the technique of selecting
representative sample from the whole.
 Population: is the totality of elements or units
under study.
 Sample: is the part of the population.
 Sampling Frame: A complete list of all the units of
the population is called the sampling frame.

2/22/2023
Ashebir Feyisa
 A unit of population is a relative term. If all the
workers in a factory make a population, then a
worker is a unit of the population. If all the factories
in a country are being studied for some purpose,
then a factory is a unit of the population of
factories. The frame provides a base for the
selection of a sample.

Major reasons to use sampling
2/22/2023
Ashebir Feyisa
 Saves Time and Cost: As the size of the sample is small as
compared to the population, the time and cost involved on
sample study are much less than the complete counts. Hence a
sample study requires less time and cost.
 To prevent destruction: The destructive nature of some
experiments (or inspection) do not allow to carryout complete
enumeration, for instance, to check quality of beers, to study
the efficacy of new drugs, testing the life length of a bulb, e t
c.
 Sample survey provides higher level of accuracy: This
accuracy can be achieved through more selective recruiting of
interviewers and supervisors, more extensive training
programs, a closer supervision of the personnel involved and
a more efficient monitoring of the field work.

2/22/2023
Ashebir Feyisa
 Types of sampling
 Generally, two types of sampling methods exist:
I. probability and
II. non-probability sampling.

Probability Sampling
2/22/2023
Ashebir Feyisa
 The term probability sampling (or random sampling) is
used when the selection of the sample is purely based
on chance.
 There is no subjective bias in the selection of units. Every
unit of the population has a known nonzero probability
to be in the sample.
 The following are some of the random sampling
methods: Simple random sampling, Stratified random
sampling, Cluster sampling, Systematic random
sampling.

Simple random sampling
2/22/2023
Ashebir Feyisa
 Simple random sampling is a method of selecting a
sample from a population in such a way that every
unit of the population is given an equal chance of
being selected.
 In practice, you can draw a simple random sample
of elements using either the 'lottery method' or
'tables of random numbers'.

Cont…
2/22/2023
Ashebir Feyisa
 For example, you may use the lottery method to
draw a random sample by using a set of 'N' tickets,
with numbers ' 1 to N' if there are 'N' units in the
population. After shuffling the tickets thoroughly, the
sample of a required size, say n, is selected by
picking the required n number of tickets.

Cont…
2/22/2023
Ashebir Feyisa
 The best method of drawing a simple random
sample is to use a table of random numbers. After
assigning consecutive numbers to the units of
population, the researcher starts at any point on the
table of random numbers and reads the consecutive
numbers in any direction horizontally, vertically or
diagonally. If the read out numbers corresponds
with the one written on a unit card, then that unit is
chosen for the sample.

2/22/2023
Ashebir Feyisa
 Suppose that a sample of 6 study centers is to be
selected at random from a serially numbered
population of 60 study centers.
 The following table is portion of a random numbers
table used to select a sample.

2/22/2023
Ashebir Feyisa
Row>
Column∀
1 2 3 4 5 …… N
1 2315 7548 5901 8372 5993 ….. 6744
2 0554 5550 4310 5374 3508 ….. 1343
3 1487 1603 5032 4043 6223 ….. 0834
4 3897 6749 5094 0517 5853 ….. 1695
5 9731 2617 1899 7553 0870 ….. 0510
6 1174 2693 8144 3393 0862 ….. 6850
7 4336 1288 5911 0164 5623 ….. 4036
8 9380 6204 7833 2680 4491 ….. 2571
9 4954 0131 8108 4298 4187 ….. 9527
10 3676 8726 3337 9482 1569 ….. 3880
11 ….. ….. ….. ….. ….. ….. …..
12 ….. ….. ….. ….. ….. ….. …..
13 ….. ….. ….. ….. ….. ….. …..
14 ….. ….. ….. ….. ….. ….. …..
15 ….. ….. ….. ….. ….. ….. …..
N 3914 5218 3587 4855 4888 ….. 8042

2/22/2023
Ashebir Feyisa
 If you start in the first row and first column, centers
numbered 23, 05, 14,…, will be selected. However,
centers numbered above the population size (60)
will not be included in the sample. In addition, if
any number is repeated in the table, it may be
substituted by the next number from the same
column.

2/22/2023
Ashebir Feyisa
 1, the number to start with is 83. In this way you can
select first 6 numbers from this column starting with
83.
 The sample, then, is as follows:
83 75
53 33
40 01
05 26
 Hence, the study centers numbered 53, 40, 05, 33,
01 and 26 will be in the sample.

2/22/2023
Ashebir Feyisa
 Simple random sampling ensures the best results.
However, from a practical point of view, a list of all
the units of a population is not possible to obtain.
 Even if it is possible, it may involve a very high cost
which a researcher or an organization may not be
able to afford. In addition, it may result an
unrepresentative sample by chance.

Stratified sampling
2/22/2023
Ashebir Feyisa
 Stratified random sampling takes into account the
stratification of the main population into a number of
sub-populations, each of which is homogeneous with
respect to one or more characteristic(s).
 Having ensured this stratification, it provides for
selecting randomly the required number of units from
each sub-population.
 The selection of a sample from each subpopulation may
be done using simple random sampling. It is useful in
providing more accurate results than simple random
sampling.

Systematic sampling
2/22/2023
Ashebir Feyisa
 In this method, samples are selected at equal
intervals from the listings of the elements.
 This method provides a sample as good as a simple
random sample and is comparatively easier to
draw a sample.
 For instance, to study the average monthly
expenditure of households in a city, you may
randomly select every fourth households from the
household listings

Cluster sampling
2/22/2023
Ashebir Feyisa
 Cluster sampling is used when sampling frame is difficult to
construct or using other sampling techniques (simple random
sampling) is not feasible or costly.
 For instance, when the geographic distribution of units is
scattered it is difficult to apply simple random sampling.
 It involves division of the population of elementary units
into groups or clusters that serve as primary sampling units.
 A selection of the clusters is then made to form the sample.
 The precision of estimates made based on samples taken
using this method is relatively low.

Non-probabilily sampling techniques
2/22/2023
Ashebir Feyisa
 In non-probability sampling, the sample is not
based on chance.
 It is rather determined by personal judgment. This
method is cost effective; however, we cannot make
objective statistical inferences.
 Depending on the technique used, non-probability
samples are classified into quota, judgment or
purposive and convenience samples.

Sampling and non-sampling errors
2/22/2023
Ashebir Feyisa
 Sampling error is the difference between the value of a
sample statistic and the value of the corresponding
population parameter.
 On the other hand, non-sampling error is an error that
occurs in the collection, recording and tabulation of data.
 Sampling error can be minimized by using appropriate
sampling methods and/or increasing the sample size.
 The non-sampling error is likely to increase with increase
in sample size.

Sampling distribution of the sample mean ഥ
𝒙
2/22/2023
Ashebir Feyisa
 The value of the sample mean for any sample
will depend on the elements included in that
sample.
 Consequently, the sample mean is a random
variable.
 Therefore, like other random variable, the
sample means possess a probability
distribution which is more commonly called the
sampling distribution of sample mean.

2/22/2023
Ashebir Feyisa
 In general, the probability distribution of a sample
statistic is called its sampling distribution.
 Sampling distribution is important in statistical
inference.
 The important characteristics of the sampling
distribution of the sample mean are its mean,
variance and the form of the distribution.

2/22/2023
Ashebir Feyisa
 Example: Suppose we have a hypothetical population of
size 3, consisting of three children namely: A is 3 years old, B
is 6 years old and C is 9 years old. Construct sampling
distribution of the sample mean of size 2 using sampling
without replacement and with replacement.
 Solution: The mean and variance of the population are
6 and 6, respectively.
 If sampling is without replacement we will have 3C2 = 3
possible samples: (A, B), (A, C) and (B, C) and their
corresponding sample means are (3+6)/2 = 4.5, 6 and 7.5,
respectively. Hence the probability distribution (sampling
distribution) of the sample mean is:

2/22/2023
Ashebir Feyisa
 Note:
 The mean of the sampling distribution of the sample
mean is the same as the population mean
irrespective of the sampling procedure.
 The variance of the sampling distribution of the
sample mean is:

2/22/2023
Ashebir Feyisa
 The problem with using sample mean to make
inferences about the population mean is that the sample
mean will probably differ from the population mean.
 This error is measured by the variance of the sampling
distribution of the sample mean and is known as the
standard error.
 The standard error is the average amount of sampling
error found because of taking a sample rather than the
whole population.
 As sample size increases, the standard error decreases.

2/22/2023
Ashebir Feyisa
 REGRESSION METHODS AND CORRELATION

Introduction
2/22/2023
Ashebir Feyisa
 The statistical methods discussed so far are used to
analyze the data involving only one variable.
 Often an analysis of data concerning two or more
variables is needed to look for any statistical
relationship or association between them.
 Thus, regression and correlation analysis are helpful
in ascertaining the probable form of the
relationship between variables and the strength of
the relationship.

Simple linear regression analysis
2/22/2023
Ashebir Feyisa
 Regression analysis is the statistical method that
helps to formulate a functional relationship between
two or more variables.
 It can be used for assessment of association,
estimation and prediction.
 For instance one might be interested to formulate a
statistical model to relate the height of fathers and
their sons, blood pressure and age, fertilizer amount
and yield, etc.

2/22/2023
Ashebir Feyisa
 A simple model to relate dependent (response)
variable Y and with only one predictor variable X is
to consider a linear relationship.
 The first step in regression analysis involving two
variables is to construct a scatter plot (diagram) of
the observed data.
 Scatter diagram is a plot of all ordered pairs
(Xi,Yi) on the coordinate plane which is helpful for
determining an apparent relationship between two
variables.

2/22/2023
Ashebir Feyisa
 The simple linear regression of Y on X can be
expressed with respect to the population parameters 
and  as

 where = y-intercept that represents the mean value
of the dependent variable Y when the independent
variable X is zero; = slope of the regression line that
represents the change in the mean of for a unit change
in the value of ; = error term


 +
+
= X
Y




2/22/2023
Ashebir Feyisa
 The population parameters  and  can be
estimated from sample data using the least square
technique. The estimators of  and  are usually
denoted by a and b, respectively.
 The resulting regression line is:
and the equation is known as the fitted regression
line.
The estimated values of y are denoted by . The
observed values of are denoted by y

Y

The covariance and the correlation coefficient
2/22/2023
Ashebir Feyisa
 Correlation coefficient measures the degree of
linear relationship between two variables. The
population correlation coefficient is represented by
 and its estimator is r.
 For a set of n pairs of sample values X and Y,
Pearson’s correlation coefficient is calculated as the
ratio of the covariance of the variables X and Y to
the product of the standard deviations of X and Y.

Properties of Pearson’s correlation coefficient r,
2/22/2023
Ashebir Feyisa

2/22/2023
Ashebir Feyisa
chapter 8
 ESTIMATION AND HYPOTHESIS TESTING

2/22/2023
Ashebir Feyisa
Objectives:
Having studied this unit, you should be able to
 construct and interpret confidence interval estimates
 formulate hypothesis about a population mean
 determine an appropriate sample size for
estimation

Introduction
2/22/2023
Ashebir Feyisa
 We now assume that we have collected, organized
and summarized a random sample of data and are
trying to use that sample to estimate a population
parameter.
 Statistical inference is a procedure whereby
inferences about a population are made on the
basis of the results obtained from a sample.

Introduction
2/22/2023
Ashebir Feyisa
 Statistical inference can be divided in to two main
areas: estimation and hypothesis testing.
 Estimation is concerned with estimating the values of
specific population parameters;
 hypothesis testing is concerned with testing whether
the value of a population parameter is equal to
some specific value.

Point and interval estimation of the mean
2/22/2023
Ashebir Feyisa

2/22/2023
Ashebir Feyisa
 From the standard normal distribution, we know that

Hypothesis Testing about the Mean
2/22/2023
Ashebir Feyisa
 In many circumstances we merely wish to know
whether a certain proposition is true or false.
 Different people can form different opinions
by looking at data, but a hypothesis test
provides a standardized decision-making
process that will be consistent for all people.

2/22/2023
Ashebir Feyisa
 Statistical hypothesis: is a claim (belief or assumption)
about an unknown population parameter values.
 Examples of hypothesis:
 There is association between lung cancer and number of
cigarettes an individual smokes.
 The proportion of female students in AASTU is 0.35.
 In sub-Saharan Africa 40% of individuals are leaving below
poverty line.
 Hypothesis testing: is the procedure that enables decision-
makers to draw inferences about population characteristics
by analyzing the difference between the value of sample
statistic and the corresponding hypothesized parameter
value.

2/22/2023
Ashebir Feyisa
General procedure for hypothesis testing
 To test the validity of the claim or assumption about
the population parameter, sample is drawn from the
population and analyzed.
 The result of the analysis are used to decide
whether the claim is valid or not.

2/22/2023
Ashebir Feyisa
Step 1: State the null hypothesis ( 0
H ) and alternative hypothesis ( 1
H )
Null hypothesis ( 0
H ): refers to a hypothesized numerical value of the population parameter which is initially
assumed to be true. The null hypothesis is always expressed in the form of an equation making a claim
regarding the specific value of the population parameter. That is, for example
0
0 : 
 =
H
where 0
 is hypothesized value of the population mean.
Alternative hypothesis ( 1
H ): is the logical opposite of the null hypothesis. The alternative hypothesis states
that specific population parameter value is not equal to the value stated in the null hypothesis. For example,
0
1 : 
 
H (Two-sided test)
0
1
0
1 :
: 


 
 H
or
H (One-sided test)

2/22/2023
Ashebir Feyisa
Step 2: State the level of significance (alpha) for
the test
 The level of significance is the probability to
wrongly reject the null hypothesis when it is actually
true. It is specified by the statistician or the
researcher before the sample is drawn. The most
commonly used values are 0.10, 0.50 or 0.01.

2/22/2023
Ashebir Feyisa
 Step 4: Establish a decision rule (critical or rejection
region)
 The cut-off point to reject or not reject depends on the
level of significance , the type of test statistic chosen
and the form of the alternative hypothesis. If the value
of the test statistic falls in the rejection region, the null
hypothesis is rejected, otherwise we do not reject (see
the next fig).
 The value of the sample statistic that separates the
regions of acceptance and rejection is called critical
value. For a specified , we read the critical values from
the Z or t tables, depending on the test statistic chosen.

2/22/2023
Ashebir Feyisa
 Based on the form of the alternative hypothesis and
the test statistic we can make the following
decisions:

2/22/2023
Ashebir Feyisa
 Step 5: Interpret the result.
 Errors in Hypothesis Testing
 Ideally the hypothesis testing procedure should lead
to the rejection of the null hypothesis when it is false
and non rejection of when it is true.
 However, the correct decision is not always
possible. Since the decision to reject or do not reject
a hypothesis is based on sample data, there is a
possibility of committing an incorrect decision or
error.

2/22/2023
Ashebir Feyisa
 Hence, a decision-maker may commit one of the two
types of errors while testing a null hypothesis. These
errors are summarized as follows:

2/22/2023
Ashebir Feyisa
 Type I error is committed if we reject the null hypothesis
when it is true. The probability of committing a type I error,
denoted by is called the level of significance.
 The probability level of this error is decided by the decision-
maker before the hypothesis test is performed. Type II error
is committed if we do not reject the null hypothesis when it is
false.
 The probability of committing a type II error is denoted by
(Greek letter beta). As type one error increases type two
errors will decrease (they are inversely proportional).
 Hence we cannot reduce both errors simultaneously.
 As the sample size increases both errors will decrease.

2/22/2023
Ashebir Feyisa
 Example 8.3: The life expectancy of people in the
year 1999 in a country is expected to be 50 years.
A survey was conducted in eleven regions of the
country and the data obtained, in years, are given
below:
 Life expectancy (years): 54.2, 50.4, 44.2, 49.7,
55.4, 47.0, 58.2, 56.6, 61.9, 57.5, and 53.4.
 Do the data confirm the expected view? (Assuming
normal population) Use 5% level of significance.

2/22/2023
Ashebir Feyisa
 Example 8.4: Suppose that we want to test the
hypothesis with a significance level of .05 that the
climate has changed since industrialization. Suppose
that the mean temperature throughout history is 50
degrees.
 During the last 40 years, the mean temperature has
been 51 degrees and the population standard
deviation is 2 degrees. What can we conclude?

2/22/2023
Ashebir Feyisa
 Example 8.5: A study was conducted to describe
the menopausal status, menopausal symptoms,
energy expenditure and aerobic fitness of healthy
midwife women and to determine relationship
among these factors. Among the variables
measured was maximum oxygen uptake (Vo2max).
The mean Vo2max score for a sample of 242 women
was 33.3 with a standard deviation of 12.14. On
the basis of these data, can we conclude that the
mean score for a population of such women is
greater than 30? Use 5% level of significance.

Statistics and probability Lecture Notes 5 (Chapters 1–8).pdf

More Related Content

Similar to Statistics and probability Lecture Notes 5 (Chapters 1–8).pdf (20)

More from ruhamadana111 (8)

Recently uploaded (20)

Statistics and probability Lecture Notes 5 (Chapters 1–8).pdf