Abdiweli Mohamed Abdi
Biostatistics
Table of contents
Section I
Chapter 1: Introduction to Biostatistics
Chapter 2: Measures of location
Chapter 3: Measures of dispersion
Chapter 4: Collection and organization of data
Chapter 5: Visualization and presentation of data
Chapter 6: Probability and normal distribution of data
Section II
Chapter 7: Hypothesis and significance testing
Chapter 8: Comparing two-sample and three-sample means (z-test, t-test and ANOVA)
Chapter 9: Association, correlation and regression
Chapter 10: Estimation
Sunday, September 29, 2024 3
Chapter One: Introduction to Biostatistics
Statistics is a branch of mathematics used for the collection, analysis and interpretation of data.
Biostatistics is a branch of statistics used for the collection, analysis and interpretation of biological data.
Statistics and its fields of use
• Health issues: biostatistics
• Agricultural sector: agri-statistics
• Business administration: business statistics
• Industrial sector: industrial statistics
• Insurance: actuarial statistics
• Economic sector: economic statistics
Types of biostatistics
• Descriptive
  - Measures of location: measures of central tendency and measures of other location
  - Measures of dispersion: range, variance, standard deviation, coefficient of variation
• Inferential
  - Estimation: point estimation and interval estimation
  - Hypothesis testing: z-test, t-test, ANOVA, $\chi^2$ test
  - Correlation and regression
Descriptive statistics
Describes the characteristics of data from a sample.
Ex: mean, standard deviation, frequency and percentage.
Inferential statistics
Uses sample data to draw conclusions about the population.
Ex: the prevalence of malaria among a sample of 150 pregnant women is 40%.
Can we estimate the prevalence of malaria in the population?
Why do medicine and health science students need to learn biostatistics?
To be able to:
• conduct research
• identify health problems
• monitor and evaluate health programs
Variables
• A variable is any characteristic that differs between individuals, times or places.
• Examples of variables:
• (1) Number of patients (2) Height
• (3) Sex (4) Educational level
Types of statistical variables
1. Quantitative/numerical variable: a characteristic that can be measured in numbers.
Examples:
(i) Family size (ii) Number of patients
(iii) Weight (iv) Height (v) Age
Types of quantitative variables
(a) Discrete variables: quantitative variables that take only whole-number values, with gaps between possible values.
Examples: family size, number of patients, number of students, parity, gravidity
(b) Continuous variables: quantitative variables that can take decimal values, with no gaps between possible values.
Examples: height, weight, income, blood sugar level, creatinine level.
2. Qualitative/categorical variable: a characteristic whose values are divided into categories rather than measured in numbers.
Examples:
Blood type, nationality, students' grades, educational level, etc.
Classification of variables by type and nature
• Type: qualitative or quantitative
• Nature of quantitative variables: discrete (whole number) or continuous (decimal)
Scales of measurement
• Nominal scale: names only; no order or rank is involved. E.g. sex, blood type, institutional departments, nationality.
• Ordinal scale: names with an order or rank. E.g. educational level, military rank, students' grades.
• Interval scale: zero does not imply absence of the characteristic, so differences between values are meaningful but ratios are not. E.g. temperature (°C) and pH.
• Ratio scale: zero implies absence of the characteristic, so both the differences and the ratios between two values are meaningful. E.g. height, weight, age, income.
Variable types and scales of measurement
• Qualitative variables: nominal or ordinal scale
• Quantitative variables: interval or ratio scale
Population
A population is the largest collection of objects (elements or individuals) about which we want to draw conclusions.
Populations may be finite or infinite.
Example: If we are interested in studying the socio-demographic characteristics of students in a class, then our population consists of all the students in that class.
• Population size (N): the number of elements in the population is called the population size and is denoted by N.
• Ex: 100 students in a class.
• Sample: a sample is a part of a population from which we collect the data.
• Ex: 30 students out of 100 students in a class.
A population is described by parameters; a sample is described by statistics.
Common statistical symbols
Title | Symbol
Sample mean | $\bar{x}$
Population mean | $\mu$
Sample standard deviation | $s$
Population standard deviation | $\sigma$
Sample variance | $s^2$
Population variance | $\sigma^2$
Summation | $\Sigma$
Correlation coefficient | $r$
Coefficient of determination | $r^2$
Degrees of freedom | df
Chi-square value | $\chi^2$
Sample proportion | $p$
Population proportion | $\pi$
Null hypothesis | $H_0$
Alternative hypothesis | $H_1$ or $H_A$
Sample size | $n$
Type I error | $\alpha$ error
Type II error | $\beta$ error
Power of the test | $1 - \beta$
Chapter Two: Measures of Central Tendency and Measures of Other Location
A measure of central tendency is a single value around the center of the data that is used to represent the entire data set.
In other words, a measure of central tendency conveys a single piece of information about the entire data set.
• The mean and median are not calculated from qualitative/categorical (nominal) data; the mode is the only measure of central tendency that applies to such data.
• Measures of central tendency include:
I. Mean (average)
II. Median
III. Mode
• Mean: the average
• Median: the middle number, or the average of the two middle numbers
• Mode: the number that occurs most often
Mean
The mean is the average of the data set.
There are four types of mean:
a. Arithmetic mean
b. Harmonic mean
c. Geometric mean
d. Weighted mean
Arithmetic mean
The arithmetic mean is the most familiar measure of central tendency and is commonly termed the average or mean.
The arithmetic mean uses the symbol $\bar{x}$ (read as "x-bar").
Arithmetic mean formula:
The sum of all observations divided by the total number of observations.
$\bar{x} = \frac{\sum x}{n}$
where $\sum x$ = sum of all observations and n = total number of observations
Example-1
Suppose the pulse rates of 10 individuals were recorded as:
69, 70, 71, 71, 72, 72, 72, 75, 76, 74
Find the mean.
Solution
$\bar{x} = \frac{\sum x}{n} = \frac{722}{10} = 72.2$
$\bar{x}$ = 72.2 beats/minute
Example-2
The ages of 12 selected school and university students were:
19, 18, 14, 13, 22, 25, 13, 22, 12, 18, 14, 16
What is the mean age of the selected students?
$\bar{x} = \frac{\sum x}{n} = \frac{206}{12} = 17.17$
$\bar{x}$ = 17.17 years
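Both worked examples can be checked with a few lines of Python. This is a minimal sketch using only built-ins; `arithmetic_mean` is an illustrative helper name, not part of any library, and it simply applies the formula $\bar{x} = \sum x / n$.

```python
def arithmetic_mean(values):
    """x-bar = (sum of all observations) / (number of observations)."""
    return sum(values) / len(values)

pulse = [69, 70, 71, 71, 72, 72, 72, 75, 76, 74]          # Example-1, beats/minute
ages = [19, 18, 14, 13, 22, 25, 13, 22, 12, 18, 14, 16]   # Example-2, years

print(round(arithmetic_mean(pulse), 2))  # 72.2
print(round(arithmetic_mean(ages), 2))   # 17.17
```

Python's standard `statistics.mean` gives the same results; the explicit version is shown only to mirror the formula.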
Advantages of mean
a) Easy to compute.
b) Takes all data values into account.
c) Reliable.
d) It can be calculated even if some values are zero or negative.
e) The data do not need to be arranged in order.
Disadvantages of mean
a) Highly affected by extreme values.
b) Cannot be calculated for qualitative/categorical data.
Median
In an ordered array, the median is the
“middle” number.
If n is odd, the median is the middle
number.
If n is even, the median is the average
of the 2 middle numbers
Not Affected by Extreme Values
Procedure to find the median for raw data
i. Arrange the data in order.
ii. Find the middle value:
• for odd n: the middle value is at position $(n+1)/2$
• for even n: the 1st middle value is at position $n/2$ and the 2nd middle value is at position $n/2 + 1$
Median = average of the 1st and 2nd middle values
Example:
Data: 4 3 7 4 6
1. Arranged in ascending order: 3 4 4 6 7
2. Since n = 5 is odd, the middle position is $(n+1)/2 = (5+1)/2 = 3$, i.e. the 3rd item.
The value of the 3rd item is 4.
Median = 4.
Example:
x: 4 3 7 4 6 9
Arranged in ascending order:
x: 3 4 4 6 7 9
1st middle item = n/2 = 6/2 = 3rd item
2nd middle item = n/2 + 1 = 3 + 1 = 4th item
The values of the 3rd and 4th items are 4 and 6.
Median = average of 4 and 6 = (4 + 6)/2 = 5.
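The median procedure above (sort, then take the middle value or average the two middle values) can be sketched in Python as follows. `median` here is an illustrative helper; the positions in the comments are the 1-based positions used in the slides, converted to Python's 0-based indexes.

```python
def median(values):
    """Median of raw data: sort, then take the middle value(s)."""
    x = sorted(values)                  # i. arrange in order
    n = len(x)
    if n % 2 == 1:                      # odd n: the (n+1)/2-th value
        return x[(n + 1) // 2 - 1]      # -1 turns the 1-based position into a 0-based index
    first = x[n // 2 - 1]               # even n: the (n/2)-th value ...
    second = x[n // 2]                  # ... and the (n/2 + 1)-th value
    return (first + second) / 2

print(median([4, 3, 7, 4, 6]))      # 4
print(median([4, 3, 7, 4, 6, 9]))   # 5.0
```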
Advantages of median
A) Easy to compute.
B) Not influenced by extreme values.
Disadvantages of median
• Difficult to rank a large number of data values.
Mode
• A Measure of Central Tendency
• Value that Occurs Most Often
• Not Affected by Extreme Values
• There May Not be a Mode
• There May be Several Modes
The mode is the value that occurs most often.
Example: calculate the mode of this data set:
2, 3, 4, 3, 4, 5, 4
Solution
4 occurs three times, more often than any other value, so the mode is 4.
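The sketch below finds the mode with `collections.Counter` and also shows the two special cases listed above (several modes, or no mode). It assumes the classroom convention that a data set in which every value occurs once has no mode; `modes` is an illustrative helper name.

```python
from collections import Counter

def modes(values):
    """All values that occur most often; [] if no value repeats."""
    if not values:
        return []
    counts = Counter(values)
    top = max(counts.values())
    if top == 1:
        return []  # every value occurs once: no mode (convention assumed here)
    return sorted(v for v, c in counts.items() if c == top)

print(modes([2, 3, 4, 3, 4, 5, 4]))  # [4]
print(modes([1, 1, 2, 2, 3]))        # [1, 2] (two modes)
print(modes([1, 2, 3]))              # [] (no mode)
```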
Advantages of mode
A) Easy to locate and understand.
B) Not influenced by extreme values.
C) It is an actual value of the data.
Disadvantages of mode
a) There is not always exactly one mode.
b) It does not depend on all observations of the data set.
Measures of Other Location
Percentiles
Percentiles are positional measures that indicate what percent of the data set has a value less than a specified value when the data are divided into one hundred parts.
Percentiles are not the same as percentages.
Position of the rth percentile ($P_r$) = $\frac{r(n+1)}{100}$
where r represents the given percentile and n the sample size.
Deciles
Deciles are other positional measures that indicate how much of the data set has a value less than a specified value when the data are divided into ten parts.
Position of the rth decile ($D_r$) = $\frac{r(n+1)}{10}$
where r represents the given decile and n the sample size.
Quartiles
Quartiles are other positional measures that indicate how much of the data set has a value less than a specified value when the data are divided into four parts.
Position of the rth quartile ($Q_r$) = $\frac{r(n+1)}{4}$
• where r represents the given quartile (r = 1 for Q1, r = 2 for Q2 and r = 3 for Q3) and n the sample size.
Example
Calculate the 70th percentile, 6th decile and Q3 of the following age data: 28, 17, 12, 25, 26, 19, 13, 27, 21, 16
Percentiles
n = 10
r = 70 (70th percentile)
1st, order the data in ascending order:
12, 13, 16, 17, 19, 21, 25, 26, 27, 28
Position of $P_{70}$ = $\frac{r(n+1)}{100} = \frac{70(10+1)}{100} = 7.7$, i.e. the 7.7th value
Position 7.7 lies between the 7th value (25) and the 8th value (26).
To find the exact value, we use this formula for fractional positions:
P70 = decimal part × (upper value − lower value) + lower value
= 0.7 × (26 − 25) + 25 = 25.7
P70 = 25.7: this means that 70% of the values lie below 25.7 and 30% of the data lie above 25.7.
Deciles
Data ordered: 12, 13, 16, 17, 19, 21, 25, 26, 27, 28
Question: Find the 6th decile.
Given
n = 10
r = 6
Solution
Position of $D_6$ = $\frac{r(n+1)}{10} = \frac{6(10+1)}{10} = 6.6$
So position 6.6 lies between the 6th value (21) and the 7th value (25).
To find the exact value, we use the same formula for fractional positions:
D6 = decimal part × (upper value − lower value) + lower value
= 0.6 × (25 − 21) + 21 = 23.4
Thus the 6th decile is 23.4.
This means that six tenths (60%) of the data lie below 23.4.
Quartiles
Data ordered: 12, 13, 16, 17, 19, 21, 25, 26, 27, 28
Question: Find the 3rd quartile.
Given
n = 10, position = $\frac{r(n+1)}{4}$
r = 3
Solution
• Position of $Q_3$ = $\frac{3(10+1)}{4} = 8.25$
So position 8.25 lies between the 8th value (26) and the 9th value (27).
To find the exact value, we use the formula for fractional positions:
Q3 = decimal part × (upper value − lower value) + lower value
= 0.25 × (27 − 26) + 26 = 26.25
Thus Q3 = 26.25.
This means that three quarters (75%) of the data lie below 26.25.
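Percentiles, deciles and quartiles all follow the same two steps above: find the position $r(n+1)/k$ (k = 100, 10 or 4), then interpolate between the neighbouring ordered values. A minimal Python sketch, assuming positions below 1 or above n are clamped to the smallest or largest value; `positional_measure` is an illustrative name. Note that statistical software offers several percentile definitions, so other programs may give slightly different answers; the $r(n+1)$ position used here is only one of them.

```python
import math

def positional_measure(data, r, parts):
    """Value at position r(n+1)/parts with linear interpolation.

    parts = 100 for percentiles, 10 for deciles, 4 for quartiles.
    """
    values = sorted(data)              # step 1: ascending order
    n = len(values)
    pos = r * (n + 1) / parts          # 1-based position, e.g. 7.7
    if pos <= 1:
        return values[0]               # clamp below the first value (assumption)
    if pos >= n:
        return values[-1]              # clamp above the last value (assumption)
    k = math.floor(pos)                # whole part: the "selected" value's position
    fraction = pos - k                 # decimal part
    lower = values[k - 1]              # k-th value (1-based)
    upper = values[k]                  # (k+1)-th value
    return lower + fraction * (upper - lower)

ages = [28, 17, 12, 25, 26, 19, 13, 27, 21, 16]
print(round(positional_measure(ages, 70, 100), 2))  # P70 -> 25.7
print(round(positional_measure(ages, 6, 10), 2))    # D6  -> 23.4
print(round(positional_measure(ages, 3, 4), 2))     # Q3  -> 26.25
```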
Chapter Three: Measures of Dispersion
Measures of dispersion (measures of variation) measure the variability that a set of observations exhibits.
They measure how much the values spread out from each other.
The variation is small when the values are close together.
There is no dispersion (variation) if all the values are the same.
There are several measures of dispersion, some of which are:
1. Range
2. Variance
3. Standard deviation
4. Coefficient of variation
The range
The range is the difference between the largest value (maximum) and the smallest value (minimum).
Range (R) = Max − Min
Example
Find the range for the sample values: 26, 25, 35, 27, 29
Solution
Max = 35
Min = 25
Range = 35 − 25 = 10
Notes:
I. The unit of the range is the same as the unit of the data.
II. The range is a poor measure because it takes into account only two values (Max and Min).
The variance
• The variance is one of the most important measures of dispersion.
• The variance is a measure that uses the mean as its point of reference.
• The sample variance is denoted by the symbol $S^2$:
$S^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$
• The population variance is denoted by the symbol $\sigma^2$:
$\sigma^2 = \frac{\sum (x - \mu)^2}{N}$
Example
We want to compute the sample variance of the following weekly incomes (USD) of sampled health care workers: 10, 21, 33, 53, 54
Solution
n = 5
$\bar{x} = \frac{\sum x}{n} = \frac{10 + 21 + 33 + 53 + 54}{5} = \frac{171}{5} = 34.2$
Thus $\bar{x}$ = 34.2 USD/week
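$S^2 = \frac{\sum (x - \bar{x})^2}{n - 1} = \frac{1506.8}{5 - 1} = 376.7$
x | $x - \bar{x}$ | $(x - \bar{x})^2$
10 | 10 − 34.2 = −24.2 | (−24.2)² = 585.64
21 | 21 − 34.2 = −13.2 | (−13.2)² = 174.24
33 | 33 − 34.2 = −1.2 | (−1.2)² = 1.44
53 | 53 − 34.2 = 18.8 | (18.8)² = 353.44
54 | 54 − 34.2 = 19.8 | (19.8)² = 392.04
$\sum x$ = 171 | $\sum (x - \bar{x})$ = 0 | $\sum (x - \bar{x})^2$ = 1506.8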
Standard deviation
• The standard deviation is another measure of dispersion.
• It is the square root of the variance.
• Population standard deviation: $\sigma = \sqrt{\sigma^2}$
• Sample standard deviation: $S = \sqrt{S^2}$
Example
We want to compute the sample standard deviation of the following weekly incomes (USD) of sampled health care workers: 10, 21, 33, 53, 54
Solution
n = 5
$S^2 = 376.7$
$S = \sqrt{S^2} = \sqrt{376.7} = 19.41$ USD/week
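The range, sample variance and sample standard deviation for the income example can be reproduced in Python. This sketch follows the table above step by step (mean, squared deviations, divide by n − 1) and then cross-checks against the standard library's `statistics.variance` and `statistics.stdev`, which also use the n − 1 divisor.

```python
import math
import statistics

incomes = [10, 21, 33, 53, 54]  # weekly income in USD (example above)

data_range = max(incomes) - min(incomes)        # Range = Max - Min
mean = sum(incomes) / len(incomes)              # x-bar = 34.2
ss = sum((x - mean) ** 2 for x in incomes)      # sum of squared deviations = 1506.8
s2 = ss / (len(incomes) - 1)                    # sample variance, divisor n - 1
s = math.sqrt(s2)                               # sample standard deviation

# The standard library's sample variance and SD agree with the manual steps.
assert math.isclose(s2, statistics.variance(incomes))
assert math.isclose(s, statistics.stdev(incomes))

print(data_range, round(mean, 1), round(s2, 1), round(s, 2))  # 44 34.2 376.7 19.41
```

For a whole population, `statistics.pvariance` and `statistics.pstdev` use the divisor N instead.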
Coefficient of variation
The variance and standard deviation are useful as measures of the variation of a single variable in a single population.
If we want to compare the variation of two variables, we cannot use the variance or the standard deviation because:
I. The variables might have different units.
II. The variables might have different means.
• We need a measure of relative variation that does not depend on either the units or how large the values are.
• This measure is the coefficient of variation (C.V.):
• $C.V. = \frac{S}{\bar{x}} \times 100$
Example
Compare the variability of the weights of two groups:
Groups | Mean | SD | C.V.
1st group | 66 kg | 4.5 kg | 6.8%
2nd group | 36 kg | 4.5 kg | 12.5%
$C.V._1 = \frac{4.5}{66} \times 100 = 6.8\%$
$C.V._2 = \frac{4.5}{36} \times 100 = 12.5\%$
Since C.V.2 > C.V.1, the relative variability of the 2nd group is larger than the relative variability of the 1st group.
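A short Python check of the two coefficients of variation in the weight example. `coefficient_of_variation` is an illustrative helper that applies $C.V. = (S / \bar{x}) \times 100$; both groups are assumed to be measured in kg, as in the table.

```python
def coefficient_of_variation(sd, mean):
    """C.V. = (SD / mean) x 100, a unit-free measure of relative variation."""
    return sd / mean * 100

cv1 = coefficient_of_variation(4.5, 66)  # 1st group, kg
cv2 = coefficient_of_variation(4.5, 36)  # 2nd group, kg
print(f"{cv1:.1f}% {cv2:.1f}%")          # 6.8% 12.5%
print("2nd group varies more" if cv2 > cv1 else "1st group varies more")
```

The same helper applies to Exercise 2 below by passing each group's standard deviation and mean.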
Exercise 1
A student was asked to report the results of the 5 subjects he/she covered last semester, and the data were: 80, 71, 63, 53, 54
- Now calculate:
1] Range
2] Variance
3] Standard deviation
Exercise 2
Let us compare the exam results of 2 groups.
The 1st group:
Mean exam result = 75
Standard deviation = 7.5
The 2nd group:
Mean exam result = 80
Standard deviation = 9
Calculate and compare the relative variability (coefficient of variation) of the results of the 2 groups.
Chapter Four: Collection and Organization of Data
Data: raw, unorganized facts that need to be processed.
When data are processed to make them useful, they are called information.
Types of data
Primary data:
• Definition: data collected firsthand by the researcher.
Primary data collection methods
• Interviews
• Observations
• Focus group discussions
• Samples of blood, body fluids, urine and feces
• Imaging (X-ray, ultrasound, CT, MRI)
Common primary data collection tools
1. Questionnaires 2. Google Forms 3. KoboToolbox
Secondary data:
• Definition: data that have been collected by someone else or by an institution.
Secondary data sources
• Journals
• Books
• Magazines
• Newspapers
• Libraries
• Websites
• Medical records
Organizing data in an array (ordered array)
• A first step in organizing data is the preparation of an ordered array.
• An ordered array is a listing of the data values in order of magnitude, from the smallest value to the largest value.
Ex: the following ages of 6 individuals are arranged in an array.
55 46 58 54 52 69
Ascending form: 46 52 54 55 58 69
Descending form: 69 58 55 54 52 46
Frequency Distribution
• The most convenient method of
organizing data is to construct a
frequency distribution.
• A frequency distribution is the
organization of raw data in a table
form, using classes and frequencies.
Grouped frequency distributions
When the range of the data is large, the data must be grouped into classes.
Class boundaries
Definition: class boundaries close the gaps between neighbouring classes. Half the gap is (lower limit of a class − upper limit of the previous class) / 2; subtract it from each lower limit to get the lower boundary, and add it to each upper limit to get the upper boundary.
The difference between the two boundaries of a class gives the class width. The class width is also called the class size.
Finding class width
Class width = upper boundary − lower boundary
Calculating the class midpoint or mark
Class midpoint (mark) = $\frac{\text{lower limit} + \text{upper limit}}{2}$
Example: The following table gives the weekly earnings of 100 employees of a large company.
The first column lists the classes, which represent the (quantitative) variable weekly income.
Weekly income (USD) | Number of employees (frequency)
801-1000 | 9
1001-1200 | 22
1201-1400 | 39
1401-1600 | 15
1601-1800 | 9
1801-2000 | 6
Calculate the class boundaries, class widths and class midpoints for the above data.
Solution:
Half the gap between classes = (lower limit of a class − upper limit of the previous class) / 2 = (1001 − 1000) / 2 = 1/2 = 0.5
Lower boundary of the first class = 801 − 0.5 = 800.5
Upper boundary of the first class = 1000 + 0.5 = 1000.5
Width of the first class = 1000.5 − 800.5 = 200
Midpoint of the first class = (801 + 1000) / 2 = 900.5
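The boundary, width and midpoint calculation above can be repeated for every class with a small Python function. This sketch assumes, as in the income table, that all neighbouring classes are separated by the same gap; `class_summary` is an illustrative name.

```python
def class_summary(classes):
    """Return (lower boundary, upper boundary, width, midpoint) for each class.

    Assumes every pair of neighbouring classes has the same gap.
    """
    half_gap = (classes[1][0] - classes[0][1]) / 2   # (1001 - 1000) / 2 = 0.5
    rows = []
    for lower, upper in classes:
        lower_b = lower - half_gap                   # e.g. 801 - 0.5 = 800.5
        upper_b = upper + half_gap                   # e.g. 1000 + 0.5 = 1000.5
        rows.append((lower_b, upper_b, upper_b - lower_b, (lower + upper) / 2))
    return rows

income_classes = [(801, 1000), (1001, 1200), (1201, 1400),
                  (1401, 1600), (1601, 1800), (1801, 2000)]
for (lo, hi), row in zip(income_classes, class_summary(income_classes)):
    print(f"{lo}-{hi}:", row)
```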
Constructing frequency distribution tables
Important steps in constructing a frequency distribution table for continuous data:
1. The number of classes depends on the range of the data.
Range = largest value − smallest value
2. Number of classes:
The number of classes should be neither too large nor too small.
As a general rule, the number of classes should be around $\sqrt{n}$, where n is the number of data values observed.
3. Class width: divide the range by the number of classes and round up to a convenient whole number.
4. Number of columns: usually there are two columns in a frequency table: class intervals and frequency.
Example: the following data represent the number of patients admitted to a hospital over 30 days.
Construct a frequency distribution table.
Solution:
In these data, the minimum value is 5 and the maximum value is 29.
Number of classes = $\sqrt{n} = \sqrt{30} \approx 5$
Range = largest value − smallest value = 29 − 5 = 24
Class width = range / number of classes = 24 / 5 = 4.8 ≈ 5
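The steps above (about $\sqrt{n}$ classes, width = range / number of classes rounded up) can be sketched in Python for integer data such as patient counts. The rounding choices (nearest whole number of classes, width rounded up) are assumptions that reproduce this example; `make_classes` is an illustrative name.

```python
import math

def make_classes(min_value, max_value, n):
    """Integer class limits using about sqrt(n) classes of equal width."""
    k = round(math.sqrt(n))                          # sqrt(30) = 5.48 -> 5 classes
    width = math.ceil((max_value - min_value) / k)   # 24 / 5 = 4.8 -> width 5
    classes = []
    lower = min_value
    while lower <= max_value:                        # keep adding classes until max is covered
        classes.append((lower, lower + width - 1))   # e.g. (5, 9)
        lower += width
    return k, width, classes

k, width, classes = make_classes(5, 29, 30)
print(k, width, classes)
# 5 5 [(5, 9), (10, 14), (15, 19), (20, 24), (25, 29)]
```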
Patients admitted | Frequency
5-9 | 3
10-14 | 6
15-19 | 8
20-24 | 8
25-29 | 5
Total frequency | 30
Example: Calculate the class boundaries, relative frequencies and percentages for the table in the previous example.
Patients admitted | Class boundaries | Frequency | Relative frequency | Percentage (%)
5-9 | 4.5-9.5 | 3 | 3/30 = 0.1 | 0.1 × 100 = 10
10-14 | 9.5-14.5 | 6 | 6/30 = 0.2 | 0.2 × 100 = 20
15-19 | 14.5-19.5 | 8 | 8/30 = 0.267 | 0.267 × 100 = 26.7
20-24 | 19.5-24.5 | 8 | 8/30 = 0.267 | 0.267 × 100 = 26.7
25-29 | 24.5-29.5 | 5 | 5/30 = 0.167 | 0.167 × 100 = 16.7
Total | | 30 | 1 | 100
Cumulative Frequency Distribution
A cumulative frequency distribution gives the
total number of values that fall below the upper
boundary of each class.
Example: Calculate the cumulative frequencies and cumulative percentages for the table in the previous example.
Patients admitted | Frequency | Cumulative relative frequency | Percentage (%) | Cumulative percentage
5-9 | 3 | 3/30 = 0.100 | 0.1 × 100 = 10 | 10
10-14 | 6 | 9/30 = 0.300 | 0.2 × 100 = 20 | 30
15-19 | 8 | 17/30 = 0.567 | 0.267 × 100 = 26.7 | 56.7
20-24 | 8 | 25/30 = 0.833 | 0.267 × 100 = 26.7 | 83.3
25-29 | 5 | 30/30 = 1 | 0.167 × 100 = 16.7 | 100
Total | 30 | | 100 |
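The relative, percentage and cumulative columns in the two tables above can all be generated from the frequency column. A minimal Python sketch; `itertools.accumulate` supplies the running totals (3, 9, 17, 25, 30), and `frequency_table` is an illustrative name.

```python
from itertools import accumulate

def frequency_table(freqs):
    """Relative, percentage, cumulative and cumulative-percentage columns."""
    total = sum(freqs)
    rows = []
    for f, cum in zip(freqs, accumulate(freqs)):   # running totals 3, 9, 17, ...
        rows.append({
            "freq": f,
            "relative": f / total,
            "percent": f / total * 100,
            "cumulative": cum,
            "cum_percent": cum / total * 100,
        })
    return rows

labels = ["5-9", "10-14", "15-19", "20-24", "25-29"]
for label, row in zip(labels, frequency_table([3, 6, 8, 8, 5])):
    print(label, row["freq"], round(row["relative"], 3), round(row["percent"], 1),
          row["cumulative"], round(row["cum_percent"], 1))
```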
Ungrouped frequency distribution of numerical data
Data that have not been organized into groups are called ungrouped data (also called raw data).
Ungrouped data can be either numerical or categorical.
Creating a Numerical Ungrouped
Frequency Distribution table
Step 1- arrange the data in an ascending
array.
Step 2- count the frequency of each value.
Step 3- create a table
Step 4- insert the data values in the table
Example: Blood pressure readings of 8 individuals:
120, 130, 130, 125, 140, 140, 140, 122.
Create a frequency distribution table for these data.
Step 1- arrange the data in an ascending
array.
120, 122, 125, 130, 130, 140, 140, 140.
Step 2- count the frequency of each
value.
120 (1), 122 (1), 125 (1), 130 (2), 140 (3).
Step 3- create a table
Step 4- insert the data values in the table
Blood pressure | Frequency
120 | 1
122 | 1
125 | 1
130 | 2
140 | 3
Total | 8
Creating a Categorical Frequency
Distribution table
Step 1-count the frequency of each value.
Step 2-create a table
Step 3-insert the data values in the table
Example of ungrouped categorical data: the blood types of 20 individuals.
• Blood types:
A, B, O, AB, O, A, B, A, O, B, AB, A, O, B, B, A, O, AB, B, A
Step 1- count the frequency of each category.
A = 6 individuals
B = 6 individuals
O = 5 individuals
AB = 3 individuals
Step 2- create a table
Step 3- insert the data in the table
Blood type | Frequency
A | 6
B | 6
O | 5
AB | 3
Total frequency | 20
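Counting the frequency of each value (Step 2 for numerical data, Step 1 for categorical data) is exactly what Python's `collections.Counter` does. This sketch tallies both examples above and adds the percentage column for the blood types.

```python
from collections import Counter

blood_types = ["A", "B", "O", "AB", "O", "A", "B", "A", "O", "B",
               "AB", "A", "O", "B", "B", "A", "O", "AB", "B", "A"]
bp = [120, 130, 130, 125, 140, 140, 140, 122]

bp_counts = Counter(bp)                 # numerical data: value -> frequency
for value in sorted(bp_counts):         # ascending array, as in Step 1
    print(value, bp_counts[value])

bt_counts = Counter(blood_types)        # categorical data: category -> frequency
total = sum(bt_counts.values())         # 20 individuals
for blood_type, f in bt_counts.most_common():
    print(blood_type, f, f"{f / total * 100:.0f}%")
```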
Relative frequency and percentage distributions
A relative frequency distribution shows what fractional part of the total frequency belongs to each category.
The relative frequency of a category is obtained by dividing the frequency of that category by the sum of all frequencies.
The percentage for a category is obtained by multiplying the relative frequency of that category by 100.
A percentage distribution lists the percentages for all categories.
Calculating percentage
• Percentage = (relative frequency) × 100
Example: Determine the relative frequency
and percentage distributions for this data.
Chapter Five: Visualization and Presentation of Data
Techniques of data presentation
Data can be presented in:
• tabular form
• graphical form
Tabular data presentation
A table contains data in rows and columns.
Types of tables
1. Univariate table
2. Bivariate table
3. Multivariate table
Age | Frequency | Percentage
21-26 | 6 | 30
27-32 | 6 | 30
33-38 | 2 | 10
39-44 | 3 | 15
45-50 | 3 | 15
Total | 20 | 100
Univariate Table-2: Age
Age | Male | Female | Total
21-26 | 1 | 5 | 6
27-32 | 3 | 3 | 6
33-38 | 0 | 2 | 2
39-44 | 3 | 0 | 3
45-50 | 1 | 2 | 3
Bivariate Table-1: Sex and age
Multivariate Table-3: Age, sex and residence
Age | Male, urban | Male, rural | Female, urban | Female, rural | Total
21-26 | 1 | 2 | 5 | 1 | 9
27-32 | 3 | 2 | 3 | 2 | 10
33-38 | 0 | 1 | 2 | 1 | 4
39-44 | 3 | 2 | 0 | 2 | 7
45-50 | 1 | 3 | 2 | 1 | 7
Total | 8 | 10 | 12 | 7 | 37
Graphical presentation of data
Tabulation is an important systematic way of presenting data, but patterns in the data are often revealed more easily by diagrams or graphs.
Types of graphical presentation
Data type | Type of table | Graph
Qualitative | Univariate | Simple bar chart, component bar chart, pie chart, multiple pie chart
Quantitative | Univariate | Histogram, line graph/chart
Simple bar chart
A simple bar chart is used for presenting univariate qualitative data.
• Bar charts have a horizontal axis called the X-axis and a vertical axis called the Y-axis.
• Categories are placed on the X-axis and the percentage or frequency on the Y-axis.
Figure: Simple bar chart of sex (Male and Female on the X-axis; percentage from 0 to 60 on the Y-axis)
Component bar chart
• To draw a component bar chart, divide a single bar representing 100% into components, one for each category of the variable you want to draw.
Figure: Component bar chart of sex (a single 100% bar divided into Male and Female components; Y-axis from 0% to 100%)
Pie chart
A pie chart is a circular statistical graph that divides the data into slices to illustrate the numerical proportion of each category.
Figure: Pie chart of sex (Male and Female slices of 40% and 60%)
Multiple bar chart
• A multiple bar chart is a type of bar chart that is used for bivariate qualitative data.
• Using these data, construct a multiple bar chart.
Sex | Diabetes | No diabetes
Male | 3 | 5
Female | 8 | 4
Total | 11 | 9
Figure: Multiple bar chart of diabetes status by sex (percentages within each column; diabetes: male 27.3%, female 72.7%; no diabetes: male 55.6%, female 44.4%)
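The bar heights in the multiple bar chart are column percentages: each count divided by its column total (11 with diabetes, 9 without). A minimal Python sketch of that calculation; `column_percentages` is an illustrative name and plotting is left out.

```python
table = {"Male": {"Diabetes": 3, "No diabetes": 5},
         "Female": {"Diabetes": 8, "No diabetes": 4}}

def column_percentages(table):
    """Percentage of each sex within each diabetes-status column."""
    columns = next(iter(table.values())).keys()
    result = {}
    for col in columns:
        col_total = sum(row[col] for row in table.values())   # 11 and 9
        result[col] = {sex: row[col] / col_total * 100 for sex, row in table.items()}
    return result

for col, pcts in column_percentages(table).items():
    print(col, {sex: round(p, 1) for sex, p in pcts.items()})
# Diabetes {'Male': 27.3, 'Female': 72.7}
# No diabetes {'Male': 55.6, 'Female': 44.4}
```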
Graphs for quantitative variables
Graphs used to present univariate quantitative variables include:
• Histogram
• Line graph/line chart
Histogram
• A histogram is the most common graph for quantitative variables.
• It is similar to a bar chart except that there are no gaps between its bars.
Figure: Histogram (X-axis marked 50, 100, 150 and 200; frequency from 0 to 9 on the Y-axis)
Chapter Six: Probability and Normal Distribution of Data
Probability is the likelihood of occurrence of an event and is measured by the proportion of times the event occurs. An event is denoted by "E", the number of times the event occurs by "n", and the total number of possible outcomes by "N":
$P(E) = \frac{n}{N}$
EXAMPLE: 1
A coin is tossed. What is the probability of getting a head?
A coin has two outcomes, head and tail, so the total number of outcomes (N) is 2.
There is only one head, so the number of times the event (head) occurs (n) is 1.
$P(\text{Head}) = \frac{n}{N} = \frac{1}{2} = 0.5$
The probability of getting a head when a coin is tossed is 0.5 or 1/2.
EXAMPLE: 2
The OPD attendance of a hospital is shown here.
What is the probability that a randomly selected individual has diabetes?
What is the probability that a randomly selected individual has hypertension?
Disease | Frequency
Diabetes | 80
Hypertension | 40
Total | 120
Solution
• $P(\text{Diabetes}) = \frac{n}{N} = \frac{80}{120} = 0.67$
• $P(\text{Hypertension}) = \frac{n}{N} = \frac{40}{120} = 0.33$
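The formula $P(E) = n/N$ can be evaluated exactly with Python's `fractions.Fraction`, which keeps results such as 2/3 unrounded until you choose to display them as decimals. `probability` is an illustrative helper name.

```python
from fractions import Fraction

def probability(n_event, n_total):
    """P(E) = n / N, kept as an exact fraction."""
    return Fraction(n_event, n_total)

p_head = probability(1, 2)            # Example 1: one head out of two outcomes
p_diabetes = probability(80, 120)     # Example 2
p_hypertension = probability(40, 120)

print(p_head, float(p_head))                             # 1/2 0.5
print(p_diabetes, round(float(p_diabetes), 2))           # 2/3 0.67
print(p_hypertension, round(float(p_hypertension), 2))   # 1/3 0.33
```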
Characteristics of events
Events possess certain characteristics, which are:
a. Mutually exclusive events
b. Mutually non-exclusive events
c. Independent events
d. Dependent events
Mutually exclusive events
• Events of a trial are called mutually exclusive if only one of them can occur in each single trial. This means that the events cannot occur simultaneously: if one event occurs, the other cannot.
• Example: when a coin is tossed, each toss (trial) gives only one event (either head or tail).
Mutually non-exclusive events
Events that can occur simultaneously are called mutually non-exclusive. For example, an individual can have only diabetes, only hypertension, or both diabetes and hypertension at the same time.
Example: Suppose OPD attendance has two categories: people with diabetes and people with hypertension.
However, some people have both diabetes and hypertension.
Thus events like diabetes and hypertension are considered mutually non-exclusive events.
Independent events
If A and B are two events of a particular trial, and the outcome of event A does not affect and is not affected by the outcome of event B, then A and B are called independent events.
For example, if you toss two coins, the outcome of the first toss (head or tail) will not affect, and is not affected by, the outcome of the second toss.
Dependent events:
• If the outcome of event A influences the outcome of event B, or B affects A, events A and B are considered dependent events.
Examples:
• Smoking and lung cancer
• Driving a car and getting into a traffic accident
• Robbing a bank and going to jail
Properties of probability
Probability is expressed as a proportion, so it takes any value from 0 to 1. It can also be shown as a percentage, taking values from 0 to 100%.
A probability of 1 means that the event is certain to occur (e.g. the probability of eventually dying).
A probability of 0 means that the event is certain not to occur (e.g. the probability of never dying).
A probability of 0.5 means that the event has an equal chance of occurring or not occurring.
The higher the probability value, the higher the chance of occurrence, and the smaller the probability value, the lower the chance of occurrence.
The sum of the probabilities of all possible outcomes of a trial must equal 1 (100%).
Types of probability
According to the time of occurrence of events, probability is categorized as:
A priori probability: calculated before the event occurs by logically examining existing knowledge.
It usually deals with independent events.
For example, the probability of getting a head (or a tail) is 1/2 or 0.5.
A posteriori probability: calculated after the event has occurred, i.e. it is based on the observed frequency of occurrence.
For example: the proportion of hypertensive patients in a sample of 100 patients.
Rules of probability
There are two basic rules of probability:
i. Addition rule
ii. Multiplication rule
Addition rule
This rule applies to both mutually exclusive and mutually non-exclusive events of a single random variable. It is characterized by the term "or" (sometimes written as the union symbol ∪) between the two events, e.g. P(A or B), also shown as $P(A \cup B)$.
For mutually exclusive events:
$P(A \cup B) = P(A) + P(B)$
For mutually non-exclusive events:
$P(A \cup B) = P(A) + P(B) - P(A \cap B)$
Example 1 (mutually exclusive events)
A single 6-sided die is rolled. What is the probability of rolling a 2 or a 5?
Solution
Since 2 and 5 are mutually exclusive, P(2 and 5) = 0.
P(2) = 1/6, P(5) = 1/6
P(2 or 5) = P(2) + P(5) = 1/6 + 1/6 = 2/6 = 1/3 = 0.333
Example 2 (mutually exclusive and mutually non-exclusive events)
Suppose patients attending a hospital OPD are categorized as in the following table.
Disease | No. of patients
Eye disease | 5
Respiratory disease | 15
Only diabetes | 90
Only heart disease | 30
Both diabetes and heart disease | 10
Total | 150
If a person is drawn at random:
a. What is the probability that he/she has eye disease or respiratory disease?
b. What is the probability that he/she has diabetes or heart disease?
Solution
a. Eye disease or respiratory disease (mutually exclusive here)
• Patients with eye disease = 5
• Patients with respiratory disease = 15
• Total patients = 150
P(eye disease or respiratory disease) = 5/150 + 15/150 = 20/150 = 0.13
b. Diabetes or heart disease (mutually non-exclusive here)
• Patients with diabetes = 90 + 10 = 100
• Patients with heart disease = 30 + 10 = 40
• Total patients = 150
P(diabetes or heart disease) = P(diabetes) + P(heart disease) − P(diabetes and heart disease)
P(diabetes or heart disease) = 100/150 + 40/150 − 10/150 = 130/150 = 0.87
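The addition rule covers both worked examples with one function: for mutually exclusive events the overlap term $P(A \cap B)$ is simply 0. This Python sketch uses exact fractions; `p_or` is an illustrative helper name.

```python
from fractions import Fraction

def p_or(p_a, p_b, p_a_and_b=Fraction(0)):
    """Addition rule: P(A or B) = P(A) + P(B) - P(A and B).

    For mutually exclusive events P(A and B) = 0, so the default drops the last term.
    """
    return p_a + p_b - p_a_and_b

total = 150
die = p_or(Fraction(1, 6), Fraction(1, 6))                     # rolling a 2 or a 5
eye_or_resp = p_or(Fraction(5, total), Fraction(15, total))    # mutually exclusive
diab_or_heart = p_or(Fraction(100, total), Fraction(40, total),
                     Fraction(10, total))                       # 10 patients have both

print(die, round(float(eye_or_resp), 2), round(float(diab_or_heart), 2))  # 1/3 0.13 0.87
```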
Normal distribution of data
In the normal distribution, observations cluster around the mean.
About half of the observations lie above the mean and half below it, and the observations are distributed symmetrically on each side of the mean.
Characteristics of the normal curve/distribution
a) The normal curve is symmetrical and bell-shaped.
b) It is highest at the centre and decreases towards zero symmetrically on each side.
c) The mean, median and mode are all equal.
• Mean ± 1 SD includes about 68% of all observations.
• Mean ± 2 SD includes about 95% of all observations (more precisely 95.4%; exactly 95% lies within mean ± 1.96 SD).
• Mean ± 3 SD includes about 99.7% of all observations.
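The 68-95-99.7 percentages can be checked with the standard library: for a standard normal variable Z, the proportion within mean ± k SD is $P(|Z| < k) = \operatorname{erf}(k/\sqrt{2})$, and Python provides `math.erf`. `within_k_sd` is an illustrative helper name.

```python
import math

def within_k_sd(k):
    """Proportion of a normal distribution lying within mean +/- k SD."""
    return math.erf(k / math.sqrt(2))   # P(|Z| < k) for a standard normal Z

for k in (1, 2, 3, 1.96):
    print(k, round(within_k_sd(k) * 100, 1))
# 1 68.3
# 2 95.4
# 3 99.7
# 1.96 95.0
```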
Figure: Normal curve
Skewed distributions
Distributions that are not symmetric and have a long tail in one direction are called skewed distributions.
In a skewed distribution, most values are close to one end, with relatively few values towards the other end.
Positively skewed distributions
If the tail of the distribution extends to the right (positive side), the distribution is called a positively skewed or right-skewed distribution.
In right-skewed distributions, the majority of the values lie in the left part of the distribution.
Negatively skewed distributions
If the tail of the distribution extends to the left (negative side), the distribution is called a negatively skewed or left-skewed distribution.
In left-skewed distributions, the majority of the values lie in the right part of the distribution.
Figure: Left- and right-skewed distribution examples
Section II
Inferential Biostatistics
Descriptive statistics remains local to the sample, describing its central tendency and variability, while inferential statistics focuses on making statements about the population.
Statistic vs. parameter
Statistics (sample values)
• Mean ($\bar{x}$)
• Variance ($s^2$)
• Standard deviation ($s$)
• Proportion ($p$)
Parameters (population values)
• Mean ($\mu$)
• Variance ($\sigma^2$)
• Standard deviation ($\sigma$)
• Proportion ($\pi$)
Chapter Seven: Hypothesis and Significance Testing
Test of significance
A test of significance determines whether a result is statistically significant or could have occurred by chance.
Hypothesis
A hypothesis is the researcher's assumed (provisional) answer about the relationship between two variables or about the significance of a test result.
There are two statistical hypotheses:
a. Null hypothesis
b. Alternative hypothesis
Null Hypothesis
The null hypothesis states that there is no real difference between the statistic and the parameter (e.g. sample mean = population mean).
Any observed difference is due to chance alone.
The null hypothesis is denoted by the symbol H0.
Alternative Hypothesis
The alternative hypothesis states that there is a real difference between the statistic and the parameter (e.g. sample mean ≠ population mean). It is denoted by the symbol H1 or Ha.
For two population means: H0: μ1 = μ2; Ha: μ1 ≠ μ2
• When the null hypothesis is rejected, the alternative hypothesis is accepted.
P-Value
The p-value is the probability, assuming H0 is true, of obtaining a result at least as extreme as the one observed. Informally, it indicates how much support the data give to the null hypothesis.
The p-value lies between 0 and 1 (0% to 100%). As it approaches 0, the support for H0 becomes weaker and weaker; as it approaches 1 (100%), the data become more and more consistent with H0.
Level of Significance
To decide whether the support is strong or weak, we need a cut-off value.
This cut-off is known as the level of significance, denoted by α. It is the probability of rejecting H0 when H0 is actually true (a type I error).
Conventional Levels of Significance
• 10% (or 0.1)
• 5% (or 0.05)
• 1% (or 0.01)
The most commonly used is 5% (or 0.05).
The Zone of Null Hypothesis Acceptance
1] If the calculated test statistic is less than the tabulated (critical) value, the null hypothesis is accepted (more precisely, not rejected) and the alternative hypothesis is rejected. (Calculation based)
2] If the p-value is ≥ 0.05 (the chosen α), the null hypothesis is accepted and the alternative hypothesis is rejected. (Computer based)
The Zone of Null Hypothesis Rejection
1] If the calculated test statistic is greater than the tabulated (critical) value, the null hypothesis is rejected and the alternative hypothesis is accepted. (Calculation based)
2] If the p-value is less than the chosen significance level, most commonly 0.05 (p < 0.05), the null hypothesis is rejected and the alternative hypothesis is accepted. (Computer based)
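For the same α, the calculation-based and computer-based rules always lead to the same decision. A minimal sketch for a two-tailed z test (the function name `z_test_decision` is illustrative; `statistics.NormalDist` is part of the Python standard library from version 3.8):

```python
from statistics import NormalDist

def z_test_decision(z: float, alpha: float = 0.05) -> dict:
    """Two-tailed z-test decision made both ways: critical value and p-value."""
    std_normal = NormalDist()                     # standard normal: mean 0, SD 1
    critical = std_normal.inv_cdf(1 - alpha / 2)  # tabulated value (1.96 when alpha = 0.05)
    p_value = 2 * (1 - std_normal.cdf(abs(z)))    # two-sided p-value
    return {
        "critical_value": critical,
        "p_value": p_value,
        "reject_by_critical_value": abs(z) > critical,
        "reject_by_p_value": p_value < alpha,
    }

for z in (2.47, 1.22):  # illustrative z statistics
    print(z, z_test_decision(z))
```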
One-Tailed and Two-Tailed Tests
The null hypothesis can be tested using either a one-tailed or a two-tailed test.
One-Tailed Test
A test in which the alternative hypothesis considers a deviation in only one direction is called a one-tailed test.
Example: suppose a study compares two drugs, Drug A and Drug B.
The null hypothesis (H0) is that Drug A is not more effective than Drug B, and the alternative hypothesis (Ha) is that Drug A is more effective than Drug B:
H0: Drug A = Drug B
Ha: Drug A > Drug B
Two-Tailed Test
In a two-tailed test, deviations in both directions are considered.
For example, when comparing the effectiveness of Drug A and Drug B as above, the two-tailed hypotheses are H0: Drug A and Drug B have the same effect, and Ha: Drug A and Drug B do not have the same effect. In short:
H0: Drug A = Drug B
Ha: Drug A ≠ Drug B
Steps for Hypothesis Testing
a) Describe the given data
b) State the assumptions (conditions taken as true without being examined by the test)
c) State the null and alternative hypotheses
d) State the level of significance
e) Choose the test statistic (z-test, t-test, ANOVA, χ²)
f) Compute the test statistic
g) Look up the tabulated (critical) value of the test statistic for the chosen significance level and degrees of freedom, and compare it with the calculated test statistic (or compare the p-value with α). If the calculated test statistic > the tabulated value, reject the null hypothesis; otherwise do not reject (accept) it.
h) Decision: reject or accept the null hypothesis.
i) Conclusion: state the conclusion in the language of the accepted hypothesis.
Chapter Eight
Testing the significance of the difference between two and three sample means
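Testing the significance of the difference between two sample means
When we want to determine whether the difference between two group means is significant (large enough to be real) or insignificant (due only to chance), we use a z-test or a t-test.
Here are the decision criteria for using the z-test or the t-test:
[Figure: Decision criteria for choosing between the z-test and the t-test]
In the examples below, the z-test is used when the population SD (σ) is known or n > 30, and the t-test when σ is unknown.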
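Z-test (normal test)
One sample: $z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}}$; two samples: $z = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{s_1^2/n_1 + s_2^2/n_2}}$
Tabulated z values
Significance level (α) | Two-tailed (1 − α/2) | One-tailed, > (1 − α) | One-tailed, < (α)
10% (0.1) | 1.64 | 1.28 | −1.28
5% (0.05) | 1.96 | 1.64 | −1.64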
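The tabulated z values above are quantiles of the standard normal distribution, so they can be reproduced with the standard library's `statistics.NormalDist.inv_cdf`. A minimal sketch (the helper name `critical_values` is our own):

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, SD 1

def critical_values(alpha: float) -> tuple:
    """Critical z values for two-tailed, upper-tailed and lower-tailed tests."""
    two_tailed = std_normal.inv_cdf(1 - alpha / 2)
    upper_tailed = std_normal.inv_cdf(1 - alpha)
    lower_tailed = std_normal.inv_cdf(alpha)
    return two_tailed, upper_tailed, lower_tailed

for alpha in (0.10, 0.05):
    print(alpha, [round(v, 2) for v in critical_values(alpha)])
```

Rounded to two decimals this gives 1.64, 1.28, −1.28 for α = 0.1 and 1.96, 1.64, −1.64 for α = 0.05, matching the table.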
Example
The mean birth weight of babies born in a large community over several years was 2470 g, with a standard deviation of 230 g.
Following implementation of an ANC (antenatal care) program, the mean birth weight in a sample of 40 babies was 2560 g, with a standard deviation of 250 g.
Does the ANC program have any impact on the birth weight of newborn babies?
Solution
Data: μ = 2470 g, $\bar{x}$ = 2560 g, σ = 230 g, s = 250 g, n = 40
Assumptions: a) birth weight in the baby population is normally distributed; b) the sample was selected at random
Hypotheses: H0: μ = 2470 g (the mean birth weight of the population does not change after ANC). Ha: μ ≠ 2470 g (the mean birth weight of the population changes after ANC).
Level of significance (α): 5% (0.05)
Choose test statistic: since σ is known, we use the z-test.
Compute the test statistic
$z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} = \frac{2560 - 2470}{230/\sqrt{40}} = \frac{90}{36.37} \approx 2.47$
Compare the calculated z with the tabulated z: the tabulated z at the 5% level of significance (two-tailed) is 1.96.
Decision: we reject the null hypothesis, since the calculated z (2.47) > the tabulated z (1.96).
Conclusion: the mean birth weight of babies has increased after implementation of the ANC program.
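A minimal Python sketch of this one-sample z test (the function name `one_sample_z` is illustrative), using the known population SD σ = 230 g as in the solution:

```python
import math

def one_sample_z(sample_mean: float, pop_mean: float, pop_sd: float, n: int) -> float:
    """z = (x_bar - mu) / (sigma / sqrt(n)) for a one-sample z test with known sigma."""
    standard_error = pop_sd / math.sqrt(n)
    return (sample_mean - pop_mean) / standard_error

z = one_sample_z(sample_mean=2560, pop_mean=2470, pop_sd=230, n=40)
print(f"z = {z:.2f}")                                         # z = 2.47
print("reject H0" if abs(z) > 1.96 else "do not reject H0")   # two-tailed, alpha = 0.05
```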
Example 2
The hemoglobin (Hb) level of children was measured in 143 girls and 127 boys, with known population SDs. Here are the results:
Group | Girls | Boys
Mean | 11.2 | 11.0
SD | 1.4 | 1.3
n | 143 | 127
On average, girls have a higher Hb level than boys, so the question is whether the observed difference is significant.
Solution
• Data: $\bar{x}_1$ = 11.2, $\bar{x}_2$ = 11.0, s1 = 1.4, s2 = 1.3, n1 = 143, n2 = 127
• Assumptions: a) Hb level in the population is normally distributed; b) the samples were selected at random
• Hypotheses: H0: μ1 = μ2 (any observed difference is due to chance alone). Ha: μ1 ≠ μ2 (the mean Hb levels of girls and boys differ significantly)
• Level of significance (α): 5% (0.05)
• Choose test statistic: since n > 30, we use the z-test
Compute the test statistic
$z = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{s_1^2/n_1 + s_2^2/n_2}} = \frac{11.2 - 11.0}{\sqrt{1.4^2/143 + 1.3^2/127}} = \frac{0.2}{0.1644} \approx 1.22$
Compare the calculated z with the tabulated z at the 5% level of significance: the tabulated z is 1.96.
Decision: we accept the null hypothesis, since the calculated z (1.22) < the tabulated z (1.96).
Conclusion: the mean Hb levels of girls and boys are not significantly different.
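A minimal sketch of the two-sample z test (the function name `two_sample_z` is illustrative). Note that the SDs must be squared inside the square root; doing so gives SE ≈ 0.164 and z ≈ 1.22, which is still below 1.96, so the decision is unchanged:

```python
import math

def two_sample_z(mean1, sd1, n1, mean2, sd2, n2):
    """z for the difference of two means: (x1_bar - x2_bar) / sqrt(s1^2/n1 + s2^2/n2)."""
    standard_error = math.sqrt(sd1**2 / n1 + sd2**2 / n2)  # SDs are squared here
    return (mean1 - mean2) / standard_error, standard_error

z, se = two_sample_z(11.2, 1.4, 143, 11.0, 1.3, 127)  # girls vs boys
print(f"SE = {se:.4f}, z = {z:.2f}")
```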
t Test
The t test is used to compare the mean of one sample with a hypothesized value, or to compare the means of two samples.
Types of t test
a) One-sample t test
b) Independent-samples t test
c) Paired-samples t test
One-sample t test
• The one-sample t test is used to test whether a population mean is significantly different from some hypothesized value.
• $t = \frac{\bar{x} - m}{s/\sqrt{n}}$
• where $\bar{x}$ is the sample mean, m is the hypothesized value, s is the sample SD and n is the sample size; df = n − 1.
Example: A professor of statistics wants to know whether his introductory statistics class has a good grasp of basic math. Six students were chosen at random from the class and given a math proficiency test.
The professor wants the class to be able to score above 70 on the test. The six students get scores of 63, 93, 75, 68, 83 and 92.
Can the professor be 95% confident that the mean score for the class on the test would be above 70?
Since the population standard deviation is not known, we use the t test.
Solution
H0: μ = 70; Ha: μ > 70
$\bar{x} = \frac{63 + 93 + 75 + 68 + 83 + 92}{6} = \frac{474}{6} = 79$
m = 70
$s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}} = \sqrt{\frac{774}{5}} \approx 12.44$
$t = \frac{\bar{x} - m}{s/\sqrt{n}} = \frac{79 - 70}{12.44/\sqrt{6}} \approx 1.77$
df = n − 1 = 6 − 1 = 5
Note that we are testing only whether the mean score of the students is greater than 70, so this is a one-tailed t test.
The tabulated t at the 5% significance level (one-tailed) with df = 5 is 2.015.
Thus the calculated t (1.77) is less than the tabulated t (2.015), i.e. calculated t < $t_{0.05,5}$, so the null hypothesis is accepted: the professor cannot be 95% confident that the class mean is above 70.
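A minimal Python sketch of this one-sample t test. It computes the sample SD directly from the six listed scores (about 12.44, with n − 1 in the denominator), giving t ≈ 1.77. Python's standard library has no t-distribution quantiles, so the critical value 2.015 is copied from the t table rather than computed:

```python
import math
from statistics import mean, stdev

scores = [63, 93, 75, 68, 83, 92]
hypothesized_mean = 70

x_bar = mean(scores)                 # 79
s = stdev(scores)                    # sample SD, n - 1 in the denominator
n = len(scores)
t = (x_bar - hypothesized_mean) / (s / math.sqrt(n))

t_critical = 2.015                   # one-tailed, alpha = 0.05, df = 5 (from the t table)
print(f"mean = {x_bar}, s = {s:.2f}, t = {t:.2f}, df = {n - 1}")
print("reject H0" if t > t_critical else "do not reject H0")
```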
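Independent-samples t test
The independent-samples t test is used to compare the means of two independent groups. Usually the dependent variable is quantitative (continuous) and the independent variable is qualitative with two categories (the two groups), such as the height of males and females, or the blood pressure of two groups; for example, to test whether male and female incomes differ.
$t = \frac{\bar{x}_1 - \bar{x}_2}{s_p\sqrt{1/n_1 + 1/n_2}}$, where $s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$ is the pooled variance
Ex: Here is the blood pressure of male and female patients. Does the blood pressure of the patients differ by sex?
Group | Male | Female
n | 25 | 25
Mean | 155 | 160
SD (s) | 10 | 8
Solution
H0: μ1 = μ2; Ha: μ1 ≠ μ2
$s_p^2 = \frac{24(10^2) + 24(8^2)}{48} = 82$, so $t = \frac{155 - 160}{\sqrt{82}\sqrt{1/25 + 1/25}} = \frac{-5}{2.56} \approx -1.95$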
df = n1 + n2 − 2 = 25 + 25 − 2 = 48. At the 5% significance level (two-tailed), the tabulated t is about 2.01 (2.021 if the nearest lower table row, df = 40, is used).
Thus, ignoring the sign, calculated t (1.95) < tabulated t, so the null hypothesis is accepted.
We can conclude that the two means (the mean male blood pressure and the mean female blood pressure) are not significantly different.
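A minimal sketch of the independent-samples t test, assuming equal population variances so that the pooled SD applies (this is what the df = n1 + n2 − 2 used above implies). The function name `pooled_t` is illustrative:

```python
import math

def pooled_t(mean1, sd1, n1, mean2, sd2, n2):
    """Independent-samples t test assuming equal variances (pooled SD)."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / df
    standard_error = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / standard_error, df

t, df = pooled_t(155, 10, 25, 160, 8, 25)   # male vs female blood pressure
print(f"t = {t:.2f}, df = {df}")
```

With equal group sizes, as here, the pooled and unpooled standard errors coincide, so the choice does not change the result.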
Paired-samples t test
The paired-samples t test is used to test the mean difference between two dependent observations, such as blood pressure before and after exercise for the same group of individuals. In the independent t test we were interested in differences between groups, but in the paired t test we are interested in differences within the group.
$t = \frac{\bar{d}}{s_d/\sqrt{n}}$, where $\bar{d} = \frac{\sum d}{n}$ is the mean difference between the two members of each pair (e.g. before − after), $s_d$ is the SD of the differences, and df = n − 1.
Example
Here is the temperature of 8 individuals before and after the treatment:
Patient | Before (X) | After (Y)
1 | 25.8 | 24.7
2 | 26.7 | 25.8
3 | 27.3 | 26.3
4 | 26.1 | 25.2
5 | 26.4 | 25.5
6 | 27.4 | 26.6
7 | 27.1 | 26.0
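Solution
Let us first calculate d and d²:
Patient | Before (X) | After (Y) | d = X − Y | d²
1 | 25.8 | 24.7 | 1.1 | 1.21
2 | 26.7 | 25.8 | 0.9 | 0.81
3 | 27.3 | 26.3 | 1.0 | 1.00
4 | 26.1 | 25.2 | 0.9 | 0.81
5 | 26.4 | 25.5 | 0.9 | 0.81
6 | 27.4 | 26.6 | 0.8 | 0.64
7 | 27.1 | 26.0 | 1.1 | 1.21
$\bar{d} = \frac{\sum d}{n} = \frac{7.9}{8} \approx 0.99$
$s_d = \sqrt{\frac{\sum d^2 - (\sum d)^2/n}{n - 1}}$ (the square root of the variance of d) ≈ 0.1
$t = \frac{\bar{d}}{s_d/\sqrt{n}} \approx \frac{0.99}{0.1/\sqrt{8}} \approx 28$ (using the rounded values above)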
The tabulated t value with df = 8 − 1 = 7 at the 5% significance level (two-tailed) is 2.365, so the calculated t > the tabulated t.
Decision: the null hypothesis is rejected and the alternative hypothesis is accepted. We conclude that the temperature of the individuals before and after treatment is not the same.
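A minimal sketch of the paired t test. Only seven of the eight patients appear in the table, so the sketch runs on those seven rows. Its figures (mean difference ≈ 0.96, t ≈ 22.3 with df = 6) therefore differ from the slide's eight-patient calculation, although the decision (reject H0) is the same:

```python
import math
from statistics import mean, stdev

# Only the seven rows reproduced in the table are used here
before = [25.8, 26.7, 27.3, 26.1, 26.4, 27.4, 27.1]
after = [24.7, 25.8, 26.3, 25.2, 25.5, 26.6, 26.0]

d = [x - y for x, y in zip(before, after)]   # within-pair differences
n = len(d)
d_bar = mean(d)
s_d = stdev(d)                                # SD of the differences
t = d_bar / (s_d / math.sqrt(n))
print(f"mean d = {d_bar:.3f}, s_d = {s_d:.3f}, t = {t:.2f}, df = {n - 1}")
```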
Analysis of Variance (ANOVA or F test)
Analysis of variance is a statistical method of analyzing data with the objective of comparing three or more group means.
It extends the t test, which compares only two group means, to three or more groups.
Analysis of variance is sometimes called the F test, after R. A. Fisher, the British statistician who developed it.
One-way ANOVA is used when we have one continuous dependent variable and one categorical independent variable with more than two categories, to compare the means of these groups.
Example: to find out whether people residing in three different areas (rural, urban and semi-urban) earn different incomes.
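How to calculate one-way ANOVA
1) $F = \frac{MSSBG}{MSSWG}$
2) $SST = \sum x^2 - \frac{(\sum x)^2}{n}$, or $SST = SSBG + SSWG$
3) $SSBG = \frac{(\sum x_1)^2}{n_1} + \frac{(\sum x_2)^2}{n_2} + \frac{(\sum x_3)^2}{n_3} - \frac{(\sum x)^2}{n}$
4) $SSWG = SST - SSBG$
5) $MSSBG = \frac{SSBG}{k - 1}$
6) $MSSWG = \frac{SSWG}{n - k}$
7) $F = \frac{MSSBG}{MSSWG}$
where k is the number of groups, n is the total number of observations, and $n_j$ and $\sum x_j$ are the size and the sum of group j.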
Example
Three different treatments are given to 3 groups of patients with anemia. The increase in Hb% level was noted after one month and is given in the table below (Table 2.0). We are interested in finding whether the difference in improvement between the 3 groups is significant.
Group A (x1) | Group B (x2) | Group C (x3)
3 | 3 | 3
1 | 2 | 4
2 | 2 | 5
0 | 3 | 4
1 | 1 | 2
2 | 3 | 2
2 | 2 | 4
Solution
Group A (x1) | Group B (x2) | Group C (x3) | x1² | x2² | x3²
3 | 3 | 3 | 9 | 9 | 9
1 | 2 | 4 | 1 | 4 | 16
2 | 2 | 5 | 4 | 4 | 25
0 | 3 | 4 | 0 | 9 | 16
1 | 1 | 2 | 1 | 1 | 4
2 | 3 | 2 | 4 | 9 | 4
2 | 2 | 4 | 4 | 4 | 16
Σx1 = 11 | Σx2 = 16 | Σx3 = 24 | Σx1² = 23 | Σx2² = 40 | Σx3² = 90
$\sum x^2 = 23 + 40 + 90 = 153$
$\sum x = 11 + 16 + 24 = 51$
2) $SST = \sum x^2 - \frac{(\sum x)^2}{n} = 153 - \frac{51^2}{21} = 153 - 123.857 = 29.143$
3) $SSBG = \frac{11^2}{7} + \frac{16^2}{7} + \frac{24^2}{7} - \frac{51^2}{21} = 136.143 - 123.857 = 12.286$
4) $SSWG = SST - SSBG = 29.143 - 12.286 = 16.857$
5) $MSSBG = \frac{SSBG}{k - 1} = \frac{12.286}{2} = 6.143$
6) $MSSWG = \frac{SSWG}{n - k} = \frac{16.857}{18} = 0.937$
7) $F = \frac{MSSBG}{MSSWG} = \frac{6.143}{0.937} \approx 6.56$
Source of variation | Degrees of freedom | Sum of squares | Mean square | F
Between groups | k − 1 = 3 − 1 = 2 | 12.29 | 6.14 | 6.56
Within groups | n − k = 21 − 3 = 18 | 16.86 | 0.94 |
Interpretation
The tabulated F value at df (2, 18) and the 5% level of significance is 3.55. Our calculated F value is 6.56, i.e. greater than the tabulated F value (F calculated 6.56 > F tabulated 3.55).
Thus the null hypothesis is rejected. Hence we conclude that the mean increase in Hb% differs significantly in at least one of the groups.
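The seven calculation steps can be sketched in plain Python (the function name `one_way_anova` is illustrative; it follows the sums-of-squares shortcut used above rather than any particular library):

```python
def one_way_anova(*groups):
    """Return (F, df_between, df_within) using the sums-of-squares steps above."""
    all_values = [x for g in groups for x in g]
    n, k = len(all_values), len(groups)
    correction = sum(all_values) ** 2 / n                           # (sum x)^2 / n
    sst = sum(x * x for x in all_values) - correction               # total SS
    ssbg = sum(sum(g) ** 2 / len(g) for g in groups) - correction   # between groups
    sswg = sst - ssbg                                               # within groups
    msbg = ssbg / (k - 1)
    mswg = sswg / (n - k)
    return msbg / mswg, k - 1, n - k

group_a = [3, 1, 2, 0, 1, 2, 2]
group_b = [3, 2, 2, 3, 1, 3, 2]
group_c = [3, 4, 5, 4, 2, 2, 4]
f, df1, df2 = one_way_anova(group_a, group_b, group_c)
print(f"F({df1}, {df2}) = {f:.2f}")   # compare with tabulated F(2, 18) = 3.55
```

Carrying full precision gives F ≈ 6.56, above the tabulated 3.55.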
Chapter Nine
Association, Correlation and Prediction
Chi-square Test
The chi-square (χ²) test is used to test for a statistical association between two categorical variables, each of which may have two or more categories (most often two).
$\chi^2 = \sum \frac{(O - E)^2}{E}$, where O is the observed frequency and E = (row total × column total) / grand total is the expected frequency in each cell.
df = (r − 1)(c − 1), where r = number of rows and c = number of columns.
Example
Suppose a researcher wants to test whether people's knowledge is associated with service utilization. He conducted a sample survey of 100 individuals, of whom 78 had a high level of knowledge.
Of these 78 with high knowledge, 50 were service users, whereas of the 22 with a low knowledge level, 10 used the service.
Do these data provide evidence of an association between knowledge level and service utilization?
1. Data:
Knowledge level | Service users | Non-users | Total
High | 50 | 28 | 78
Low | 10 | 12 | 22
Total | 60 | 40 | 100
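2. Assumptions: the sample was drawn randomly, the observations are independent, and the expected frequency in each cell is at least 5 (the χ² test does not require normally distributed data).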
3. Hypotheses:
H0: There is no association between “knowledge level” and “service utilization”.
Ha: There is an association between “knowledge level” and “service utilization”.
4. Level of significance: α = 5% (0.05)
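5. Test statistic: chi-square, $\chi^2 = \sum \frac{(O - E)^2}{E}$
6. Compute χ²: the expected frequencies (row total × column total / 100) are 46.8 and 31.2 for high knowledge, and 13.2 and 8.8 for low knowledge, so
$\chi^2 = \frac{(50 - 46.8)^2}{46.8} + \frac{(28 - 31.2)^2}{31.2} + \frac{(10 - 13.2)^2}{13.2} + \frac{(12 - 8.8)^2}{8.8} \approx 0.219 + 0.328 + 0.776 + 1.164 \approx 2.49$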
7. Compute the degrees of freedom (df): df = (r − 1)(c − 1) = (2 − 1)(2 − 1) = 1
8. Tabulated value of χ² with df = 1 at the 5% level of significance = 3.84
9. Compare the computed value with the tabulated value: calculated χ² (≈ 2.49) < tabulated χ² (3.84)
10. Decision: H0 is accepted.
11. Conclusion: the data do not provide evidence of an association between knowledge level and service utilization.
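A minimal sketch of the chi-square test of independence in plain Python (the function name `chi_square` is illustrative). It applies no Yates continuity correction, matching the hand calculation:

```python
def chi_square(observed):
    """Pearson chi-square statistic and df for a contingency table given as a list of rows."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(observed):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / grand_total  # expected frequency
            chi2 += (o - e) ** 2 / e
    df = (len(observed) - 1) * (len(col_totals) - 1)
    return chi2, df

# rows: high / low knowledge; columns: service users / non-users
table = [[50, 28],
         [10, 12]]
chi2, df = chi_square(table)
print(f"chi-square = {chi2:.2f}, df = {df}")   # compare with 3.84 (df = 1, alpha = 0.05)
```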
Correlation Analysis
When one quantitative variable changes along with a change in another quantitative variable, the two variables are said to be correlated.
The variable considered to change the other is called the independent variable (IV), and the variable that is changed is called the dependent variable (DV).
The DV is represented by Y and the IV by X.
Example: Income and age are both quantitative. They are correlated because when age changes, income changes as well. Therefore age is X (IV) while income is Y (DV).
When the change occurs at a constant rate, it is called linear correlation.
The correlation between one DV and one IV is called simple correlation, e.g. the correlation between income and age.
The correlation between one DV and more than one IV is called multiple correlation, e.g. the correlation of income with age and family size.
Correlation Coefficient (r)
To measure the correlation between variables, we use a measure called the correlation coefficient (r):
$r = \frac{n\sum XY - \sum X \sum Y}{\sqrt{[n\sum X^2 - (\sum X)^2][n\sum Y^2 - (\sum Y)^2]}}$
Characteristics of Relationship
The correlation coefficient (r) indicates both the strength and the direction of a relationship.
Strength (magnitude) of the relationship, judged from the absolute value of r:
When the correlation coefficient is zero, it indicates no correlation.
|r| ≤ 0.3: weak correlation
|r| = 0.4–0.6: moderate correlation
|r| = 0.7–1: strong correlation
When the correlation coefficient is 1 (either + or −), it indicates a perfect correlation. As r approaches 1 (either + or −), the strength of the relationship increases.
Direction of relationship: the relationship can be positive, negative, or absent (no correlation).
Positive correlation: the two variables move in the same direction (increase or decrease together), e.g. gestational period and birth weight. Here r is positive.
Negative correlation: the two variables move in opposite directions (when one increases, the other decreases), e.g. age and eyesight. Here r is negative.
No correlation: a change in one variable is not associated with a change in the other, e.g. age and sex. Here r = 0.
Example:
Suppose 4 persons were selected as a sample to determine the correlation between weight and height.
Weight in pounds (Y) | Height in inches (X) | Y² | X² | XY
240 | 73 | 57600 | 5329 | 17520
210 | 70 | 44100 | 4900 | 14700
180 | 69 | 32400 | 4761 | 12420
160 | 68 | 25600 | 4624 | 10880
ΣY = 790 | ΣX = 280 | ΣY² = 159700 | ΣX² = 19614 | ΣXY = 55520
$r = \frac{4(55520) - (280)(790)}{\sqrt{[4(19614) - 280^2][4(159700) - 790^2]}} = \frac{880}{\sqrt{56 \times 14700}} \approx 0.97$
Interpretation
There is a very strong positive
correlation between the weight and
height of the respondents.
Coefficient of Determination (r²)
The square of r is called the coefficient of determination.
The coefficient of determination (r²) measures the proportion of the variability in Y (DV) that is explained by X (IV).
It is usually expressed as a percentage.
Example: for the above example, the correlation coefficient (r) is 0.97, so the coefficient of determination is r² = 0.97 × 0.97 ≈ 0.94, i.e. 94%.
Interpretation
94% of the variability in weight (DV) is explained by height (IV).
The remaining 6% of the variability in weight is due to other variables, not height.
Correlation Significance Test
To test the significance of the correlation coefficient, we use the following formula to find the calculated t value:
$t = r\sqrt{\frac{n - 2}{1 - r^2}} = 0.97\sqrt{\frac{4 - 2}{1 - 0.94}} = 0.97 \times 5.77 \approx 5.6$ (calculated t value)
We then go to the t table. Assuming a two-tailed significance level of 0.05, the degrees of freedom for this test are n − 2 = 4 − 2 = 2; the value at the junction of the significance level and the degrees of freedom is the tabulated t value.
The tabulated t value for a two-tailed test at the 0.05 significance level with 2 degrees of freedom is 4.303.
Since the calculated t value of 5.6 is > the tabulated t value of 4.303, the null hypothesis (no correlation in the population) is rejected.
Therefore we can conclude that there is a significant, very strong positive correlation between the height and weight of our participants.
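A minimal sketch that reproduces r, r² and the significance t statistic from the raw data, using the same sum formula as the table (the function name `pearson_r` is illustrative):

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient computed from the raw sums."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_x2 = sum(v * v for v in x)
    sum_y2 = sum(v * v for v in y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x**2) * (n * sum_y2 - sum_y**2))
    return numerator / denominator

height = [73, 70, 69, 68]      # X, inches
weight = [240, 210, 180, 160]  # Y, pounds
r = pearson_r(height, weight)
n = len(height)
t = r * math.sqrt((n - 2) / (1 - r**2))   # significance test, df = n - 2
print(f"r = {r:.2f}, r^2 = {r**2:.2f}, t = {t:.2f}, df = {n - 2}")
```

Without intermediate rounding, t ≈ 5.63, which is still above the tabulated 4.303 for df = 2.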
Regression Analysis
Regression analysis is a statistical procedure used to find relationships among a set of variables.
In regression analysis, there is a dependent variable, which is the one you are trying to explain, and one or more independent variables that are related to it.
Regression Types
1) Linear regression: quantitative DV
A) Simple (1 DV and 1 IV)
B) Multiple (1 DV and multiple IVs)
2) Logistic regression: qualitative DV
A) Binary: DV with 2 categories; can be simple (1 DV and 1 IV) or multiple (1 DV and multiple IVs)
B) Multinomial: DV with more than 2 categories
C) Ordinal: DV that is ordinal
Linear Regression:
Linear regression is used when the dependent
variable is continuous and assumes a linear
relationship with the independent variables.
It aims to find the best-fitting line that
represents the relationship between the
dependent variable and one or more
independent variables.
For example, a study might use linear
regression to determine the relationship
between smoking behavior (independent
variable) and lung function (dependent
variable) among a sample of individuals.
Logistic Regression:
Logistic regression is used when the dependent
variable is categorical or binary. It models the
probability of an event occurring or the likelihood
of an outcome belonging to a particular category.
The dependent variable is usually binary (e.g.,
yes/no, success/failure), but it can also be
multinomial (more than two categories) or ordinal
(ordered categories).
Why is regression analysis superior to chi-square and correlation?
1. Prediction capability:
Regression analysis allows prediction: it can estimate the value of the dependent variable from the values of the independent variables.
2. Handling both categorical and numerical
variables
3. Control of confounding variables:
Regression analysis enables researchers to
control for the effects of confounding
variables by including them as independent
variables in the model.
Confounding variables: are factors that are
associated with both the independent variable(s)
and the dependent variable in a study. Age is
frequently a confounding variable in health
studies.
Ex: if studying the association between a specific medication and heart disease risk, age must be considered as a confounding variable, because older individuals are more likely to have both a higher heart disease risk and higher medication usage.
Regression equation: $Y = \beta_0 + \beta_1 X$
Y = dependent variable, X = independent variable
β0 (constant or intercept): the value of Y when X is zero.
It shows what the DV is when the IV is 0.
$\beta_0 = \bar{Y} - \beta_1 \bar{X}$
β1 (regression coefficient or slope)
It measures the amount of change in the DV (Y) for a one-unit change in the IV (X).
It represents the relationship between the IV and the DV.
$\beta_1 = \frac{\sum XY - \frac{\sum X \sum Y}{n}}{\sum X^2 - \frac{(\sum X)^2}{n}}$
Example 1. The height and weight of 4 individuals are given in the following table.
Let us predict the weight (DV) of an individual whose height (IV) is 80 inches.
Weight in pounds (Y) | Height in inches (X) | Y² | X² | XY
240 | 73 | 57600 | 5329 | 17520
210 | 70 | 44100 | 4900 | 14700
180 | 69 | 32400 | 4761 | 12420
160 | 68 | 25600 | 4624 | 10880
ΣY = 790 | ΣX = 280 | ΣY² = 159700 | ΣX² = 19614 | ΣXY = 55520
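$\beta_0 = \bar{Y} - \beta_1 \bar{X} = 197.5 - 15.7 \times 70 = -901.5$ (here $\bar{Y}$ = 790/4 = 197.5 and $\bar{X}$ = 280/4 = 70)
Interpretation of β0: if height were 0, the weight would be −901.5 pounds, a value that cannot exist; the intercept has no practical meaning here because a height of 0 lies far outside the observed data.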
$\beta_1 = \frac{55520 - \frac{280 \times 790}{4}}{19614 - \frac{280^2}{4}} = \frac{220}{14} \approx 15.7$
Interpretation of β1: for each one-unit (inch) increase in height, weight increases by 15.7 units (pounds).
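Regression equation: $Y = \beta_0 + \beta_1 X = -901.5 + 15.7 \times 80 = 354.5$
Interpretation of the regression result: based on the distribution of this data, if height is 80 inches, the predicted weight is about 354.5 pounds. Since 80 inches lies outside the observed heights (68 to 73 inches), this prediction is an extrapolation and should be interpreted with caution.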
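A minimal least-squares sketch of the same fit (the function name `fit_simple_regression` is illustrative). Carrying the unrounded slope (220/14 ≈ 15.714) gives β0 = −902.5 and a predicted weight of about 354.6 pounds, essentially the hand-calculated value obtained with the slope rounded to 15.7:

```python
def fit_simple_regression(x, y):
    """Least-squares slope (beta1) and intercept (beta0) from the raw sums."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sxy = sum(a * b for a, b in zip(x, y)) - sum_x * sum_y / n
    sxx = sum(a * a for a in x) - sum_x**2 / n
    beta1 = sxy / sxx
    beta0 = sum_y / n - beta1 * sum_x / n
    return beta0, beta1

height = [73, 70, 69, 68]       # X, inches
weight = [240, 210, 180, 160]   # Y, pounds
beta0, beta1 = fit_simple_regression(height, weight)
predicted = beta0 + beta1 * 80
print(f"beta0 = {beta0:.1f}, beta1 = {beta1:.3f}, predicted weight at 80 in = {predicted:.1f}")
```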
Chapter Ten
Estimation
Estimation is a procedure for finding the value of a parameter based on the value of a statistic.
There are various techniques available for different situations; we shall, however, limit our discussion to two types of estimation:
– Point estimation
– Interval estimation
Point Estimation
Point estimation occurs when we estimate that the unknown parameter is equal to the calculated statistic, e.g. μ is estimated by $\bar{x}$, σ² by s², or σ by s.
Remember that a statistic is a sample-based summary measure (e.g. $\bar{x}$, s) and a parameter is a population-based summary measure (e.g. μ, σ).
Interval Estimation
Interval estimation occurs when we estimate that the parameter lies within an interval. This interval is called a confidence interval.
The confidence level is the probability that the procedure used to build the interval produces an interval that includes the parameter.
For example, a 95% confidence level means that if we drew many samples and constructed an interval from each in the same way, about 95% of these intervals would include the true parameter.
Estimation of a single population mean (μ)
Example 1: The mean reading speed of a random sample of 81 Modern University students is 325 words per minute.
Estimate the mean reading speed of all Modern University students (μ) if the standard deviation for all Modern University students is known to be 45 words per minute.
Solution
Point estimation: μ is estimated by $\bar{x}$. Since the mean reading speed of the sample is 325 words per minute, the estimated mean reading speed of all Modern University students is also 325 words per minute.
Interval estimation for μ:
$\bar{x} \pm Z \cdot SE(\bar{x})$, with Z = 1.96 and $SE(\bar{x}) = \sigma/\sqrt{n} = 45/\sqrt{81} = 5$, so 1.96 × 5 = 9.8
325 ± 9.8 = 315.2 to 334.8 words/minute
This means that if 100 samples of students were selected and a 95% interval were computed from each, about 95 of these intervals would be expected to include the true mean μ.
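A minimal sketch of the z-based confidence interval for a mean when σ is known (the function name `mean_ci_known_sigma` is illustrative; z = 1.96 gives a 95% interval):

```python
import math

def mean_ci_known_sigma(x_bar, sigma, n, z=1.96):
    """Confidence interval for a mean when the population SD is known."""
    margin = z * sigma / math.sqrt(n)
    return x_bar - margin, x_bar + margin

low, high = mean_ci_known_sigma(x_bar=325, sigma=45, n=81)
print(f"{low:.1f} to {high:.1f} words per minute")
```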
Estimation of the difference between population means (μ1 − μ2)
Example 2: A random sample of 50 non-smokers has a mean life of 76 years with a standard deviation of 8 years, and a random sample of 65 smokers has a mean life of 68 years with a standard deviation of 9 years.
A) What is the point estimate for the difference of the population means?
B) Find a 95% CI for the difference in mean lifetime of non-smokers and smokers.
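Solution
Point estimation of μ1 − μ2: μ1 − μ2 is estimated by $\bar{x}_1 - \bar{x}_2$. As the mean difference of life in the samples is 76 − 68 = 8 years, the estimated mean difference in the population is also 8 years.
Interval estimation of μ1 − μ2:
$(\bar{x}_1 - \bar{x}_2) \pm 1.96 \cdot SE(\bar{x}_1 - \bar{x}_2)$, where
$SE(\bar{x}_1 - \bar{x}_2) = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}} = \sqrt{\frac{8^2}{50} + \frac{9^2}{65}} \approx 1.59$, so 1.96 × 1.59 ≈ 3.1
8 ± 3.1 = 4.9 to 11.1 years
So the population mean life difference between the two groups is estimated to lie between about 4.9 and 11.1 years.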
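A minimal sketch of the large-sample interval for a difference of two means (the function name `diff_means_ci` is illustrative). Without intermediate rounding, SE ≈ 1.59 and the 95% interval is about 4.9 to 11.1 years:

```python
import math

def diff_means_ci(mean1, sd1, n1, mean2, sd2, n2, z=1.96):
    """Large-sample CI for mu1 - mu2 using SE = sqrt(s1^2/n1 + s2^2/n2)."""
    diff = mean1 - mean2
    se = math.sqrt(sd1**2 / n1 + sd2**2 / n2)
    return diff - z * se, diff + z * se, se

low, high, se = diff_means_ci(76, 8, 50, 68, 9, 65)   # non-smokers vs smokers
print(f"SE = {se:.2f}; 95% CI: {low:.1f} to {high:.1f} years")
```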
Estimation of a single population proportion (P)
Example: An epidemiologist is worried about the ever-increasing trend of malaria in a certain locality and wants to estimate the proportion of persons infected during the peak malaria transmission period.
He takes a random sample of 150 persons in that locality during the peak transmission period and finds that 60 of them are positive for malaria. Find:
a) the point estimate of P
b) the 95% CI for P
Solution
Point estimation of P:
p = 60/150 = 0.40 = 40%. That is, the estimated proportion of malaria-positive people in the population is 40%.
Interval estimation of P:
$p \pm 1.96 \cdot SE(p)$, with $SE(p) = \sqrt{\frac{p(1 - p)}{n}} = \sqrt{\frac{0.4 \times 0.6}{150}} = 0.04$, so 1.96 × 0.04 = 0.078 = 7.8%
40% ± 7.8% = 32.2% to 47.8%
So the proportion of malaria-positive individuals in the population is estimated to lie between 32.2% and 47.8%.
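The formula above is the normal-approximation (Wald) interval for a proportion. A minimal sketch (the function name `proportion_ci` is illustrative):

```python
import math

def proportion_ci(successes, n, z=1.96):
    """Wald CI for a single proportion: p +/- z * sqrt(p(1 - p) / n)."""
    p = successes / n
    se = math.sqrt(p * (1 - p) / n)
    return p, p - z * se, p + z * se

p, low, high = proportion_ci(60, 150)
print(f"p = {p:.0%}, 95% CI: {low:.1%} to {high:.1%}")
```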
Estimation of the difference between population proportions (P1 − P2)
Example: Two groups each consist of 100 patients who have leukemia.
A new drug is given to the first group but not to the second (the control group). In the first group, 75 people have remission for 2 years, but only 60 do in the second group.
Find 95% confidence limits for the difference in the proportion of all patients with leukemia who have remission for 2 years.
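Solution
Point estimation of P1 − P2:
p1 − p2 = 75% − 60% = 15%. That is, the estimated difference in proportions between the two groups is 15%.
Interval estimation of P1 − P2:
$(p_1 - p_2) \pm 1.96 \cdot SE(p_1 - p_2)$, with
$SE(p_1 - p_2) = \sqrt{\frac{p_1(1 - p_1)}{n_1} + \frac{p_2(1 - p_2)}{n_2}} = \sqrt{\frac{0.75 \times 0.25}{100} + \frac{0.60 \times 0.40}{100}} \approx 0.0654 = 6.54\%$, so 1.96 × 6.54% ≈ 12.8%
15% ± 12.8% = 2.2% to 27.8%
So the population difference in proportions is estimated to lie between 2.2% and 27.8%.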
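A minimal sketch of the Wald interval for a difference of two proportions with the unpooled standard error (the function name `diff_proportions_ci` is illustrative). Without intermediate rounding it gives about 2.2% to 27.8%:

```python
import math

def diff_proportions_ci(x1, n1, x2, n2, z=1.96):
    """Wald CI for P1 - P2 using the unpooled standard error."""
    p1, p2 = x1 / n1, x2 / n2
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    diff = p1 - p2
    return diff, diff - z * se, diff + z * se

diff, low, high = diff_proportions_ci(75, 100, 60, 100)   # new drug vs control
print(f"difference = {diff:.0%}, 95% CI: {low:.1%} to {high:.1%}")
```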
Questions ?
Complete Biostatistics (Descriptive and Inferential analysis)

PDF
Deadly Stampede at Yaounde’s Olembe Stadium Forensic.pdf
PPTX
ca esophagus molecula biology detailaed molecular biology of tumors of esophagus
SKIN Anatomy and physiology and associated diseases
Slider: TOC sampling methods for cleaning validation
surgery guide for USMLE step 2-part 1.pptx
Khadir.pdf Acacia catechu drug Ayurvedic medicine
NEET PG 2025 Pharmacology Recall | Real Exam Questions from 3rd August with D...
Human Health And Disease hggyutgghg .pdf
Chapter-1-The-Human-Body-Orientation-Edited-55-slides.pptx
Important Obstetric Emergency that must be recognised
CHAPTER FIVE. '' Association in epidemiological studies and potential errors
History and examination of abdomen, & pelvis .pptx
Fundamentals of human energy transfer .pptx
NEET PG 2025 | 200 High-Yield Recall Topics Across All Subjects
1b - INTRODUCTION TO EPIDEMIOLOGY (comm med).ppt
post stroke aphasia rehabilitation physician
Gastroschisis- Clinical Overview 18112311
NEET PG 2025 | Pharmacology Recall: 20 High-Yield Questions Simplified
ASRH Presentation for students and teachers 2770633.ppt
ACID BASE management, base deficit correction
Deadly Stampede at Yaounde’s Olembe Stadium Forensic.pdf
ca esophagus molecula biology detailaed molecular biology of tumors of esophagus

Complete Biostatistics (Descriptive and Inferential analysis)

  • 2. Table of contents. Section I: Chapter 1: Introduction to Biostatistics; Chapter 2: Measures of location; Chapter 3: Measures of dispersion; Chapter 4: Collection and organization of data; Chapter 5: Visualization and presentation of data; Chapter 6: Probability and normal distribution of data. Section II: Chapter 7: Hypothesis and significance testing; Chapter 8: Comparing two-sample and three-sample means (z-test, t-test and ANOVA); Chapter 9: Association, correlation and regression; Chapter 10: Estimation.
  • 3. Sunday, September 29, 2024. Chapter One: Introduction to Biostatistics. Statistics is a branch of mathematics used for the collection, analysis and interpretation of data. Biostatistics is a branch of statistics used for the collection, analysis and interpretation of biological data.
  • 4. Branches of statistics by field of application: biostatistics (health issues), agri-statistics (agricultural sector), business statistics (business administration), industrial statistics (industrial sector), actuarial statistics (insurance) and economic statistics (economic sector).
  • 5. Types of biostatistics. Descriptive: measures of location (measures of central tendency and measures of other location); measures of dispersion (range, variance, standard deviation, coefficient of variation). Inferential: estimation (point estimation, interval estimation); hypothesis testing (z-test, t-test, ANOVA, $\chi^2$ test); correlation; regression.
  • 6. Descriptive statistics describes the characteristics of data from a sample. Examples: mean, standard deviation, frequency and percentage.
  • 7. Inferential statistics uses data from a sample to draw conclusions about the population the sample came from. Example: the prevalence of malaria among a sample of 150 pregnant women is 40%. Can we use this to estimate the prevalence of malaria in the population?
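To make the malaria question concrete, here is a minimal Python sketch of one common way to estimate a population prevalence from a sample: a 95% confidence interval for a proportion using the normal (Wald) approximation. The choice of this interval, the critical value z = 1.96 and the helper name `wald_interval` are assumptions of the sketch, not the method prescribed by these slides.

```python
import math

def wald_interval(p_hat, n, z=1.96):
    """Approximate confidence interval for a population proportion.

    p_hat: sample proportion; n: sample size;
    z: critical value (1.96 for about 95% confidence, normal approximation).
    """
    standard_error = math.sqrt(p_hat * (1 - p_hat) / n)
    margin = z * standard_error
    return p_hat - margin, p_hat + margin

# Malaria example: 40% prevalence in a sample of 150 pregnant women
low, high = wald_interval(0.40, 150)
print(f"{low:.3f} to {high:.3f}")
```

Under these assumptions the standard error is 0.04 and the interval runs from roughly 32% to 48%. The Wald interval is known to be unreliable for small samples or proportions close to 0 or 1.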
  • 8. Why do medicine and health science students need to learn biostatistics? To be able to: conduct research; identify health problems; monitor and evaluate health programs.
  • 9. Variables. • A variable is any characteristic that differs between individuals, times or places. • Examples of variables: (1) number of patients, (2) height, (3) sex, (4) educational level.
  • 10. Types of statistical variables. 1. Quantitative/numerical variable: a characteristic that can be measured in numbers. Examples: (i) family size, (ii) number of patients, (iii) weight, (iv) height, (v) age.
  • 11. Types of quantitative variable. (a) Discrete variables: quantitative variables that take only whole-number (countable) values, with gaps between possible values. Examples: family size, number of patients, number of students, parity, gravidity. (b) Continuous variables: quantitative variables that can take any value within a range, including decimals, with no gaps between possible values. Examples: height, weight, income, blood sugar level, creatinine level.
  • 12. 2. Qualitative/categorical variable: a characteristic whose values are divided into categories rather than measured in numbers. Examples: blood type, nationality, students' grades, educational level, etc.
  • 13. Summary diagram. Variable type: qualitative or quantitative. Nature of quantitative variables: discrete (whole numbers) or continuous (decimals).
  • 14. Scales of measurement. • Nominal scale: names only; no order or rank is involved. E.g. sex, blood type, institutional departments, nationality. • Ordinal scale: names with an order or rank. E.g. educational level, military rank, students' grades. • Interval scale: zero does not imply absence of the characteristic (the zero point is arbitrary); differences between values are meaningful, but ratios are not. E.g. temperature and pH. • Ratio scale: zero implies absence of the characteristic; both the interval (difference) and the ratio between two numbers are meaningful. E.g. height, weight, age, income.
  • 15. Summary diagram. Variable type: qualitative (measured on a nominal or ordinal scale) or quantitative (measured on an interval or ratio scale).
  • 16. Summary diagram: variables, their types and their scales of measurement.
  • 17. Population. A population is the entire collection of objects (elements or individuals) about which we want to draw conclusions. Populations may be finite or infinite.
  • 18. Example: if we are interested in studying the socio-demographic characteristics of the students in a class, then our population consists of all the students in that class.
  • 19. • Population size (N): the number of elements in the population is called the population size and is denoted by N. Ex: 100 students in a class. • Sample: a sample is a part of a population from which we collect the data. Ex: 30 students out of the 100 students in a class.
  • 20. A numerical characteristic of a population is called a parameter; the corresponding value calculated from a sample is called a statistic.
  • 21. Common statistical symbols (title: symbol). Sample mean: $\bar{x}$; population mean: $\mu$; sample standard deviation: $s$; population standard deviation: $\sigma$; sample variance: $s^2$; population variance: $\sigma^2$; summation: $\Sigma$; correlation coefficient: $r$; coefficient of determination: $r^2$; degrees of freedom: df.
  • 22. Common statistical symbols (continued). Chi-square value: $\chi^2$; sample proportion: $p$; population proportion: $\pi$; null hypothesis: $H_0$; alternative hypothesis: $H_1$ or $H_A$; sample size: $n$; Type I error: $\alpha$ error; Type II error: $\beta$ error; power of the test: $1-\beta$.
  • 23. Chapter Two: Measures of central tendency and measures of other location. A measure of central tendency is a single value around the centre of the data that is used to represent the entire data set. In short, it summarizes the whole data set in one value.
  • 24. • The mean and median are not calculated for nominal (qualitative/categorical) data; the mode is the only measure of central tendency that applies to such data. • Measures of central tendency include: I. mean (average), II. median, III. mode.
  • 25. Mean: the average. Median: the middle number, or the average of the two middle numbers. Mode: the number that occurs most often.
  • 26. Mean. The mean is the average of the data set. There are four types of mean: a. arithmetic mean, b. harmonic mean, c. geometric mean, d. weighted mean.
  • 27. Arithmetic mean. The arithmetic mean is the most familiar measure of central tendency and is usually just called the average or mean. It is denoted by the symbol $\bar{x}$ (read as "x-bar").
  • 28. Arithmetic mean formula: the sum of all observations divided by the total number of observations, $\bar{x} = \frac{\sum x}{n}$, where $\sum x$ = sum of all observations and $n$ = total number of observations.
  • 29. Example 1. Suppose the pulse rates of 10 individuals were recorded as: 69, 70, 71, 71, 72, 72, 72, 75, 76, 74. Find the mean. Solution: $\bar{x} = \frac{\sum x}{n} = \frac{722}{10} = 72.2$ beats/minute.
  • 30. Example 2. The ages of 12 selected school and university students were 19, 18, 14, 13, 22, 25, 13, 22, 12, 18, 14, 16. What is the mean age of the selected students? Solution: $\bar{x} = \frac{\sum x}{n} = \frac{206}{12} \approx 17.17$ years.
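The arithmetic mean formula translates directly into code. This is a minimal Python sketch (the helper name `arithmetic_mean` is hypothetical) that recomputes both examples; note that the 12 ages sum to 206, which gives a mean of about 17.17 years.

```python
def arithmetic_mean(values):
    """Sum of all observations divided by the number of observations."""
    if not values:
        raise ValueError("at least one observation is needed")
    return sum(values) / len(values)

pulse_rates = [69, 70, 71, 71, 72, 72, 72, 75, 76, 74]     # Example 1, beats/minute
ages = [19, 18, 14, 13, 22, 25, 13, 22, 12, 18, 14, 16]    # Example 2, years

print(round(arithmetic_mean(pulse_rates), 2))  # 72.2
print(round(arithmetic_mean(ages), 2))         # 17.17
```

Python's standard library offers the same calculation as `statistics.mean`.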
  • 31. Advantages of the mean: a) easy to compute; b) takes all data values into account; c) reliable; d) can be calculated even if some values are zero or negative; e) the data do not need to be arranged in order. Disadvantages of the mean: a) highly affected by extreme values; b) cannot be calculated for qualitative/categorical data.
  • 32. Median. In an ordered array, the median is the "middle" number. If n is odd, the median is the middle number. If n is even, the median is the average of the two middle numbers. The median is not affected by extreme values.
  • 33. Procedure to find the median for raw data: i. arrange the data in order; ii. find the middle value. For an odd number of observations, the middle item is at position $(n+1)/2$. For an even number, the first middle item is at position $n/2$ and the second at position $n/2 + 1$, and the median is the average of these two middle values.
  • 34. Example. Data: 4, 3, 7, 4, 6. 1. Arranged in ascending order: 3, 4, 4, 6, 7. 2. Since n = 5 is odd, the middle position is $(n+1)/2 = (5+1)/2 = 3$, i.e. the 3rd item. The value of the 3rd item is 4, so the median = 4.
  • 35. Example. x: 4, 3, 7, 4, 6, 9. Arranged in ascending order: 3, 4, 4, 6, 7, 9. Since n = 6 is even, the 1st middle item is at position 6/2 = 3 and the 2nd middle item at position 6/2 + 1 = 4. The values of the 3rd and 4th items are 4 and 6, so the median = (4 + 6)/2 = 5.
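The two-case procedure above can be sketched in Python as follows. The function name `median` is an illustrative choice; the positions $(n+1)/2$, $n/2$ and $n/2 + 1$ are 1-based as on the slides, so one is subtracted to index Python's 0-based lists.

```python
def median(values):
    """Middle value of the ordered data (average of the two middle values if n is even)."""
    ordered = sorted(values)                 # step i: arrange in order
    n = len(ordered)
    if n == 0:
        raise ValueError("at least one observation is needed")
    if n % 2 == 1:
        return ordered[(n + 1) // 2 - 1]     # item at position (n + 1)/2
    first_middle = ordered[n // 2 - 1]       # item at position n/2
    second_middle = ordered[n // 2]          # item at position n/2 + 1
    return (first_middle + second_middle) / 2

print(median([4, 3, 7, 4, 6]))      # 4
print(median([4, 3, 7, 4, 6, 9]))   # 5.0
```

`statistics.median` in the standard library follows the same odd/even rule.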
  • 36. Advantages of the median: A) easy to compute; B) not influenced by extreme values. Disadvantage of the median: ranking a large number of data values can be difficult.
  • 37. Mode. • A measure of central tendency • The value that occurs most often • Not affected by extreme values • There may be no mode • There may be several modes
  • 38. The mode is the value that occurs most often. Example: calculate the mode for the data set 2, 3, 4, 3, 4, 5, 4. Solution: 4 occurs three times, more often than any other value, so the mode is 4.
  • 39. Advantages of the mode: A) easy to locate and understand; B) not influenced by extreme values; C) it is an actual value of the data. Disadvantages of the mode: a) there is not always a single mode; b) it does not depend on all observations of the data set.
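A small Python sketch of finding the mode, which also shows the "no mode" and "several modes" cases. The helper name `modes` is hypothetical, and treating a data set in which every value occurs only once as having no mode is an assumed convention.

```python
from collections import Counter

def modes(values):
    """All values that occur most often; an empty list means no mode."""
    counts = Counter(values)
    if not counts:
        return []
    highest = max(counts.values())
    if highest == 1:
        return []        # assumed convention: every value occurs once, so no mode
    return sorted(value for value, count in counts.items() if count == highest)

print(modes([2, 3, 4, 3, 4, 5, 4]))   # [4]
print(modes([1, 1, 2, 2, 3]))         # [1, 2]  (two modes)
print(modes([1, 2, 3]))               # []      (no mode)
```

The standard library's `statistics.multimode` is similar, but it returns every value when none repeats instead of an empty list.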
  • 40. Measures of other location
  • 41. Percentiles. Percentiles are positional measures that divide ordered data into 100 equal parts; the r-th percentile is the value below which about r% of the data lie. Percentiles are not the same as percentages. Position of $P_r = \frac{r(n+1)}{100}$, where r represents the given percentile and n the sample size.
  • 42. Deciles. Deciles are another positional measure; they divide ordered data into ten equal parts, and the r-th decile is the value below which about r tenths of the data lie. Position of $D_r = \frac{r(n+1)}{10}$, where r represents the given decile and n the sample size.
  • 43. Quartiles. Quartiles are another positional measure; they divide ordered data into four equal parts. Position of $Q_r = \frac{r(n+1)}{4}$, where r represents the given quartile (r = 1 for $Q_1$, r = 2 for $Q_2$ and r = 3 for $Q_3$) and n the sample size.
  • 44. Example. Calculate the 70th percentile, the 6th decile and $Q_3$ of the following age data: 28, 17, 12, 25, 26, 19, 13, 27, 21, 16. Percentiles: n = 10, r = 70. First, order the data in ascending order: 12, 13, 16, 17, 19, 21, 25, 26, 27, 28. Position of $P_{70} = \frac{70(10+1)}{100} = 7.7$, i.e. the 7.7th item.
  • 45. The 7.7th item lies between the 7th item (25) and the 8th item (26). To find the exact value at a fractional position we use: value = decimal part × (upper item value − lower item value) + lower item value. $P_{70}$ = 0.7 × (26 − 25) + 25 = 0.7 + 25 = 25.7. This means that about 70% of the values lie below 25.7 and about 30% lie above it.
  • 46. Deciles. Data ordered: 12, 13, 16, 17, 19, 21, 25, 26, 27, 28. Question: find the 6th decile. Given n = 10, r = 6. Solution: position of $D_6 = \frac{6(10+1)}{10} = 6.6$.
  • 47. The 6.6th item lies between the 6th item (21) and the 7th item (25). Using the same formula for fractional positions: $D_6$ = 0.6 × (25 − 21) + 21 = 2.4 + 21 = 23.4. Thus the 6th decile is 23.4, which means that about 60% (six tenths) of the data lie below 23.4.
  • 48. Quartiles. Data ordered: 12, 13, 16, 17, 19, 21, 25, 26, 27, 28. Question: find the 3rd quartile. Given n = 10, r = 3. Solution: position of $Q_3 = \frac{3(10+1)}{4} = 8.25$.
  • 49. The 8.25th item lies between the 8th item (26) and the 9th item (27). Using the same formula for fractional positions: $Q_3$ = 0.25 × (27 − 26) + 26 = 26.25. Thus $Q_3$ = 26.25, which means that about three quarters (75%) of the data lie below 26.25.
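Percentiles, deciles and quartiles all follow the same recipe: compute the position $r(n+1)/k$ (k = 100, 10 or 4), then interpolate between the two neighbouring ordered values. A minimal Python sketch of that recipe (the function name `positional_measure` is hypothetical, and clamping positions outside 1..n to the smallest or largest value is an assumption of this sketch):

```python
import math

def positional_measure(values, r, parts):
    """r-th percentile (parts=100), decile (parts=10) or quartile (parts=4),
    using position r(n + 1)/parts and linear interpolation."""
    ordered = sorted(values)
    n = len(ordered)
    position = r * (n + 1) / parts       # 1-based position in the ordered array
    if position <= 1:
        return ordered[0]                # assumed: clamp below the first item
    if position >= n:
        return ordered[-1]               # assumed: clamp above the last item
    whole = math.floor(position)         # whole-number part of the position
    fraction = position - whole          # decimal part of the position
    lower = ordered[whole - 1]           # value at the whole-number position
    upper = ordered[whole]               # value at the next position
    return lower + fraction * (upper - lower)

ages = [28, 17, 12, 25, 26, 19, 13, 27, 21, 16]
print(round(positional_measure(ages, 70, 100), 2))  # P70 = 25.7
print(round(positional_measure(ages, 6, 10), 2))    # D6 = 23.4
print(round(positional_measure(ages, 3, 4), 2))     # Q3 = 26.25
```

Statistical software may use a different position rule by default, so its percentiles can differ slightly from values computed with $r(n+1)/k$.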
  • 50. Chapter Three: Measures of dispersion
  • 51. Measures of dispersion (measures of variation) measure the variability that a set of observations exhibits, i.e. how far the values spread out from each other. The variation is small when the values are close together, and there is no dispersion (variation) if all the values are the same.
  • 52. There are several measures of dispersion, including: 1. range, 2. variance, 3. standard deviation, 4. coefficient of variation.
  • 53. The range. The range is the difference between the largest value (maximum) and the smallest value (minimum): Range (R) = Max − Min. Example: find the range of the sample values 26, 25, 35, 27, 29.
  • 54. Solution: Max = 35, Min = 25, Range = 35 − 25 = 10. Notes: I. The unit of the range is the same as the unit of the data. II. The range is a poor measure of dispersion because it takes only two values (Max and Min) into account.
  • 55. The variance. • The variance is one of the most important measures of dispersion. • It uses the mean as its point of reference. • The sample variance is denoted by $s^2$: $s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$
  • 56. • The population variance is denoted by $\sigma^2$: $\sigma^2 = \frac{\sum (x - \mu)^2}{N}$
  • 57. Example. We want to compute the sample variance of the following weekly incomes of sampled health care workers: 10, 21, 33, 53, 54. Solution: n = 5, $\bar{x} = \frac{\sum x}{n} = \frac{10 + 21 + 33 + 53 + 54}{5} = \frac{171}{5} = 34.2$. Thus $\bar{x}$ = 34.2 USD/week.
  • 58. Deviations from the mean:
    x | $x - \bar{x}$ | $(x - \bar{x})^2$
    10 | 10 − 34.2 = −24.2 | 585.64
    21 | 21 − 34.2 = −13.2 | 174.24
    33 | 33 − 34.2 = −1.2 | 1.44
    53 | 53 − 34.2 = 18.8 | 353.44
    54 | 54 − 34.2 = 19.8 | 392.04
    $\sum x$ = 171 | $\sum (x - \bar{x})$ = 0 | $\sum (x - \bar{x})^2$ = 1506.8
    $s^2 = \frac{\sum (x - \bar{x})^2}{n - 1} = \frac{1506.8}{4} = 376.7$
  • 59. Standard deviation. • The standard deviation is another measure of dispersion. • It is the square root of the variance. • Population standard deviation: $\sigma = \sqrt{\sigma^2}$ • Sample standard deviation: $s = \sqrt{s^2}$
  • 60. Example. We want to compute the sample standard deviation of the same weekly incomes of sampled health care workers: 10, 21, 33, 53, 54. Solution: n = 5, $s^2$ = 376.7, so $s = \sqrt{s^2} = \sqrt{376.7} \approx 19.41$ USD/week.
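The range, sample variance and sample standard deviation can be checked with a short Python sketch. The helper names are hypothetical; the variance divides by $n - 1$, matching the sample formula $s^2$ above.

```python
import math

def data_range(values):
    """Range = maximum - minimum."""
    return max(values) - min(values)

def sample_variance(values):
    """s^2 = sum of squared deviations from the mean, divided by (n - 1)."""
    n = len(values)
    if n < 2:
        raise ValueError("a sample variance needs at least two observations")
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / (n - 1)

def sample_sd(values):
    """s = square root of the sample variance."""
    return math.sqrt(sample_variance(values))

weekly_income = [10, 21, 33, 53, 54]              # USD/week
print(data_range([26, 25, 35, 27, 29]))           # 10
print(round(sample_variance(weekly_income), 2))   # 376.7
print(round(sample_sd(weekly_income), 2))         # 19.41
```

For comparison, `statistics.variance` and `statistics.stdev` use the same $n - 1$ denominator, while `statistics.pvariance` and `statistics.pstdev` divide by N as in the population formula.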
  • 61. Coefficient of variation. The variance and standard deviation are useful measures of the variation of a single variable in a single population. To compare the variation of two variables, however, we cannot use the variance or the standard deviation, because: I. the variables might have different units; II. the variables might have different means.
  • 62. • We need a measure of relative variation that depends neither on the units nor on how large the values are. • This measure is the coefficient of variation (C.V.): $C.V. = \frac{s}{\bar{x}} \times 100$
  • 63. Example. Compare the variability of the weights of two groups:
    Group | Mean | SD | C.V.
    1st group | 66 kg | 4.5 kg | 6.8%
    2nd group | 36 kg | 4.5 kg | 12.5%
    $C.V._1 = \frac{4.5}{66} \times 100 \approx 6.8\%$; $C.V._2 = \frac{4.5}{36} \times 100 = 12.5\%$. Since $C.V._2 > C.V._1$, the relative variability of the 2nd group is larger than that of the 1st group.
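The coefficient of variation is a one-line formula; this Python sketch (the helper name `coefficient_of_variation` is hypothetical) reproduces the two-group comparison.

```python
def coefficient_of_variation(sd, mean):
    """C.V. = (standard deviation / mean) x 100, in percent."""
    if mean == 0:
        raise ValueError("C.V. is undefined when the mean is zero")
    return sd / mean * 100

cv_group1 = coefficient_of_variation(4.5, 66)     # 1st group, kg
cv_group2 = coefficient_of_variation(4.5, 36)     # 2nd group, kg
print(round(cv_group1, 1), round(cv_group2, 1))   # 6.8 12.5
```

Because the units cancel in the ratio, the same function works for variables measured in different units.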
  • 64. Exercise 1. A student was asked to report his/her results in the 5 subjects taken last semester, and the data were: 80, 71, 63, 53, 54. Calculate: 1] the range, 2] the variance, 3] the standard deviation.
  • 65. Exercise 2. Compare the exam results of 2 groups. 1st group: mean exam result = 75, standard deviation = 7.5. 2nd group: mean exam result = 80, standard deviation = 9. Calculate and compare the relative variability (coefficient of variation) of the results of the 2 groups.
  • 66. Chapter 4: Collection and organization of data. Data are raw, unorganized facts that need to be processed. When data are processed to make them useful, they are called information.
  • 68. Primary data: data collected firsthand by the researcher.
  • 69. Primary data collection methods: • interviews • observations • focus group discussions • samples of blood, body fluids, urine or feces • imaging (X-ray, ultrasound, CT, MRI)
  • 70. Common primary data collection tools: 1. questionnaires, 2. Google Forms, 3. KoboToolbox.
  • 71. Secondary data: data that have been collected by someone else or by an institution.
  • 73. Organizing data in an array (ordered array). • A first step in organizing data is the preparation of an ordered array. • An ordered array is a listing of the data values in order of magnitude, from the smallest value to the largest value.
  • 74. Example: the ages of 6 individuals, 55, 46, 58, 54, 52, 69, arranged in an array. Ascending form: 46, 52, 54, 55, 58, 69. Descending form: 69, 58, 55, 54, 52, 46.
  • 75. Frequency distribution. • The most convenient method of organizing data is to construct a frequency distribution. • A frequency distribution is the organization of raw data in table form, using classes and frequencies.
  • 76. Grouped frequency distributions. When the range of the data is large, the data must be grouped into classes. Class boundaries: first compute the correction = (lower limit of a class − upper limit of the previous class)/2; the lower boundary of a class is its lower limit minus this correction, and its upper boundary is its upper limit plus the correction. The difference between the two boundaries of a class gives the class width, also called the class size.
  • 77. Finding the class width: class width = upper boundary − lower boundary. Calculating the class midpoint (class mark): class midpoint = $\frac{\text{lower limit} + \text{upper limit}}{2}$.
  • 78. Example: the following table gives the weekly earnings of 100 employees of a large company. The first column lists the classes, which represent the (quantitative) variable weekly income.
  • 79. Weekly income (USD) | Number of employees (frequency)
    801-1000 | 9
    1001-1200 | 22
    1201-1400 | 39
    1401-1600 | 15
    1601-1800 | 9
    1801-2000 | 6
    Total | 100
  • 80. Calculate the class boundaries, class widths and class midpoints for the above data. Solution: correction = (lower limit of a class − upper limit of the previous class)/2 = (1001 − 1000)/2 = 1/2 = 0.5. Lower boundary of the first class = 801 − 0.5 = 800.5; upper boundary = 1000 + 0.5 = 1000.5. Width of the first class = 1000.5 − 800.5 = 200. Midpoint of the first class = (801 + 1000)/2 = 900.5.
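The same calculation can be extended to every class with a short Python sketch. The function name `class_table` is hypothetical, and the sketch assumes equally spaced classes, so that a single correction (half the gap between consecutive classes) applies to all of them.

```python
def class_table(limits):
    """(lower boundary, upper boundary, width, midpoint) for each class.

    limits: list of (lower limit, upper limit) pairs, at least two classes,
    assumed equally spaced so one correction applies to every class.
    """
    correction = (limits[1][0] - limits[0][1]) / 2
    rows = []
    for lower, upper in limits:
        lower_boundary = lower - correction
        upper_boundary = upper + correction
        width = upper_boundary - lower_boundary
        midpoint = (lower + upper) / 2
        rows.append((lower_boundary, upper_boundary, width, midpoint))
    return rows

weekly_income_classes = [(801, 1000), (1001, 1200), (1201, 1400),
                         (1401, 1600), (1601, 1800), (1801, 2000)]
for row in class_table(weekly_income_classes):
    print(row)   # first row: (800.5, 1000.5, 200.0, 900.5)
```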
  • 82. Constructing frequency distribution tables. Important steps for constructing a frequency distribution table for continuous data: 1. The number of classes depends on the range of the data. Range = largest value − smallest value.
  • 83. 2. Number of classes: the number of classes should be neither too large nor too small. As a general rule, the number of classes should be around $\sqrt{n}$, where n is the number of data values observed.
  • 84. 4. Number of columns: usually there are two columns in a frequency table: the class intervals and the frequencies.
  • 85. Example: the following data represent the number of patients admitted to a hospital on each of 30 days. Construct a frequency distribution table.
  • 87. Solution: in these data, the minimum value is 5 and the maximum value is 29. Number of classes = $\sqrt{30} \approx 5.5$, so we use 5 classes. Range = largest value − smallest value = 29 − 5 = 24, so the class width = 24/5 = 4.8 ≈ 5.
  • 88. Patients admitted | Frequency
    5-9 | 3
    10-14 | 6
    15-19 | 8
    20-24 | 8
    25-29 | 5
    Total frequency | 30
  • 89. Example: calculate the class boundaries, relative frequencies and percentages for the table in the previous example.
  • 90. Patients admitted | Frequency | Relative frequency | Percentage (%)
    5-9 | 3 | 3/30 = 0.1 | 0.1 × 100 = 10
    10-14 | 6 | 6/30 = 0.2 | 0.2 × 100 = 20
    15-19 | 8 | 8/30 = 0.267 | 0.267 × 100 = 26.7
    20-24 | 8 | 8/30 = 0.267 | 0.267 × 100 = 26.7
    25-29 | 5 | 5/30 = 0.167 | 0.167 × 100 = 16.7
    Total | 30 | 1 | 100
  • 91. Cumulative frequency distribution. A cumulative frequency distribution gives the total number of values that fall below the upper boundary of each class.
  • 92. Example: calculate the cumulative frequencies and cumulative percentages for the table in the previous example.
  • 93. Patients admitted | Frequency | Cumulative relative frequency | Percentage (%) | Cumulative percentage
    5-9 | 3 | 3/30 = 0.100 | 10 | 10
    10-14 | 6 | 9/30 = 0.300 | 20 | 30
    15-19 | 8 | 17/30 = 0.567 | 26.7 | 56.7
    20-24 | 8 | 25/30 = 0.833 | 26.7 | 83.3
    25-29 | 5 | 30/30 = 1 | 16.7 | 100
    Total | 30 | | 100 |
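The relative, percentage and cumulative columns are all derived from the frequencies, so they can be generated together. A minimal Python sketch (the function name `frequency_table` is hypothetical; rounding to 3 and 1 decimal places mirrors the tables above):

```python
from itertools import accumulate

def frequency_table(labels, frequencies):
    """Rows of (class, f, relative f, %, cumulative f, cumulative %)."""
    total = sum(frequencies)
    rows = []
    for label, f, cum_f in zip(labels, frequencies, accumulate(frequencies)):
        rows.append((
            label,
            f,
            round(f / total, 3),            # relative frequency
            round(100 * f / total, 1),      # percentage
            cum_f,                          # cumulative frequency
            round(100 * cum_f / total, 1),  # cumulative percentage
        ))
    return rows

classes = ["5-9", "10-14", "15-19", "20-24", "25-29"]
frequencies = [3, 6, 8, 8, 5]
for row in frequency_table(classes, frequencies):
    print(row)
```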
  • 94. Ungrouped frequency distribution of numerical data. Ungrouped data are data that have not been organized into groups; they are also called raw data. Ungrouped data can be either numerical or categorical.
  • 95. Creating a numerical ungrouped frequency distribution table: Step 1: arrange the data in an ascending array. Step 2: count the frequency of each value. Step 3: create a table. Step 4: insert the data values in the table.
  • 96. Example: blood pressure readings of 8 individuals: 120, 130, 130, 125, 140, 140, 140, 122. Create a frequency distribution table for these data.
  • 97. Step 1: arrange the data in an ascending array: 120, 122, 125, 130, 130, 140, 140, 140. Step 2: count the frequency of each value: 120 (1), 122 (1), 125 (1), 130 (2), 140 (3).
  • 98. Step 3: create a table. Step 4: insert the data values in the table.
    Blood pressure | Frequency
    120 | 1
    122 | 1
    125 | 1
    130 | 2
    140 | 3
    Total | 8
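Steps 1 and 2 (ordering the values and counting each one) can be sketched in Python with `collections.Counter`; the helper name `ungrouped_frequency_table` is hypothetical.

```python
from collections import Counter

def ungrouped_frequency_table(values):
    """(value, frequency) pairs in ascending order of value."""
    counts = Counter(values)          # Step 2: count the frequency of each value
    return sorted(counts.items())     # Step 1: ascending order of the values

blood_pressure = [120, 130, 130, 125, 140, 140, 140, 122]
table = ungrouped_frequency_table(blood_pressure)
for value, frequency in table:
    print(value, frequency)
print("Total", sum(frequency for _, frequency in table))
```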
  • 99. Creating a categorical frequency distribution table: Step 1: count the frequency of each value. Step 2: create a table. Step 3: insert the data values in the table.
  • 100. Example of ungrouped categorical data: the blood types of 20 individuals. Blood types: A, B, O, AB, O, A, B, A, O, B, AB, A, O, B, B, A, O, AB, B, A
  • 101. Step 1: count the frequency of each category: A = 6 individuals, B = 6 individuals, O = 5 individuals, AB = 3 individuals.
  • 102. Step 2: create a table. Step 3: insert the data in the table.
    Blood type | Frequency
    A | 6
    B | 6
    O | 5
    AB | 3
    Total frequency | 20
  • 103. Relative frequency and percentage distributions. A relative frequency distribution shows what fraction of the total frequency belongs to each category. The relative frequency of a category is obtained by dividing the frequency of that category by the sum of all frequencies.
  • 105. The percentage for a category is obtained by multiplying the relative frequency of that category by 100. A percentage distribution lists the percentages for all categories. Calculating the percentage: • Percentage = relative frequency × 100
  • 106. Example: determine the relative frequency and percentage distributions for these data.
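Assuming "these data" refers to the blood-type example of 20 individuals (the slide does not say so explicitly), a short Python sketch can count each category and derive its relative frequency and percentage. Counting the listed values gives A = 6, B = 6, O = 5 and AB = 3.

```python
from collections import Counter

blood_types = ["A", "B", "O", "AB", "O", "A", "B", "A", "O", "B",
               "AB", "A", "O", "B", "B", "A", "O", "AB", "B", "A"]

counts = Counter(blood_types)          # frequency of each category
total = sum(counts.values())           # 20 individuals

for category, frequency in counts.most_common():
    relative = frequency / total       # relative frequency
    percentage = relative * 100        # percentage = relative frequency x 100
    print(f"{category:>2}  {frequency:>2}  {relative:.2f}  {percentage:.0f}%")
```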
  • 107. Chapter Five: Visualization and presentation of data
  • 108. Techniques of data presentation. Data can be presented in: • tabular form • graphical form
  • 109. Tabular data presentation. A table presents data in rows and columns. Types of tables: 1. univariate table, 2. bivariate table, 3. multivariate table.
  • 110. Table 1 (univariate): Age
    Age | Frequency | Percentage
    21-26 | 6 | 30
    27-32 | 6 | 30
    33-38 | 2 | 10
    39-44 | 3 | 15
    45-50 | 3 | 15
    Total | 20 | 100
  • 111. Table 2 (bivariate): Sex and age
    Age | Male | Female | Total
    21-26 | 1 | 5 | 6
    27-32 | 3 | 3 | 6
    33-38 | 0 | 2 | 2
    39-44 | 3 | 0 | 3
    45-50 | 1 | 2 | 3
    Total | 8 | 12 | 20
  • 112. Table 3 (multivariate): Age, sex and residence
    Age | Male, urban | Male, rural | Female, urban | Female, rural | Total
    21-26 | 1 | 2 | 5 | 1 | 9
    27-32 | 3 | 2 | 3 | 2 | 10
    33-38 | 0 | 1 | 2 | 1 | 4
    39-44 | 3 | 2 | 0 | 2 | 7
    45-50 | 1 | 3 | 2 | 1 | 7
    Total | 8 | 10 | 12 | 7 | 37
  • 113. Graphical presentation of data. Tabulation is an important, systematic way of presenting data, but patterns in the data are often revealed more easily by diagrams or graphs.
  • 114. Types of graphical presentation by data type. Qualitative (univariate): simple bar chart, component bar chart, pie chart, multiple pie chart. Quantitative: histogram, line graph/chart.
  • 115. Simple bar chart. A simple bar chart is used for presenting univariate qualitative data. • Bar charts have a horizontal axis (X-axis) and a vertical axis (Y-axis). • Categories are placed on the X-axis and the percentage or frequency on the Y-axis.
  • 117. Component bar chart. • To draw a component bar chart, divide a bar representing 100% into as many components as there are categories of the variable, each component proportional to the percentage of its category.
  • 119. Pie chart. A pie chart is a circular statistical graph that divides the data into slices to illustrate the numerical proportion of each category.
  • 121. Multiple bar chart. • A multiple bar chart is a type of bar chart that is used for bivariate qualitative data. • Using the following data, construct a multiple bar chart.
    Sex | Diabetes | No diabetes
    Male | 3 | 5
    Female | 8 | 4
    Total | 11 | 9
  • 123. Graphs for quantitative variables. Graphs used to present univariate quantitative variables include: • histogram • line graph/line chart
  • 124. Histogram. • The histogram is the most common graph for quantitative variables. • It is similar to a bar chart except that there are no gaps between its bars.
  • 125. [Figure: example histogram]
  • 126. Chapter Six: Probability and normal distribution of data. Probability is the likelihood of occurrence of an event and is measured by the proportion of times the event occurs. An event is denoted by E, the number of times the event occurs by n, and the number of all possible outcomes by N: $P(E) = \frac{n}{N}$
  • 127. Example 1. A coin is tossed; what is the probability of getting a head? A coin has two outcomes, head and tail, so the total number of outcomes is N = 2. There is only one head, so the number of times the event (head) occurs is n = 1. $P(\text{Head}) = \frac{n}{N} = \frac{1}{2} = 0.5$. The probability of getting a head when a coin is tossed is 0.5, or 1/2.
  • 128. Example 2. The OPD attendance of a hospital is shown below. What is the probability that a randomly selected individual has diabetes? What is the probability that a randomly selected individual has hypertension?
    Disease | Frequency
    Diabetes | 80
    Hypertension | 40
    Total | 120
  • 129. Solution: • $P(\text{Diabetes}) = \frac{80}{120} \approx 0.67$ • $P(\text{Hypertension}) = \frac{40}{120} \approx 0.33$
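The definition $P(E) = n/N$ can be expressed with Python's `fractions.Fraction`, which keeps probabilities exact. The helper name `probability` is hypothetical; the sketch reproduces the coin and OPD examples and checks that the two OPD probabilities sum to 1.

```python
from fractions import Fraction

def probability(n, N):
    """P(E) = n / N: outcomes in which E occurs over all possible outcomes."""
    if N <= 0 or not 0 <= n <= N:
        raise ValueError("need N > 0 and 0 <= n <= N")
    return Fraction(n, N)

p_head = probability(1, 2)
p_diabetes = probability(80, 120)
p_hypertension = probability(40, 120)

print(p_head, float(p_head))                              # 1/2 0.5
print(p_diabetes, round(float(p_diabetes), 2))            # 2/3 0.67
print(p_hypertension, round(float(p_hypertension), 2))    # 1/3 0.33
print(p_diabetes + p_hypertension)                        # 1
```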
  • 130. Characteristics of events. Events can be: a. mutually exclusive, b. mutually non-exclusive, c. independent, d. dependent.
  • 131. Mutually exclusive events. • Events of a trial are called mutually exclusive if only one of them can occur in a single trial, i.e. they cannot occur simultaneously: if one event occurs, the other cannot. • Example: when a coin is tossed, each toss (trial) has only one outcome, either head or tail.
  • 132. Mutually non-exclusive events. Events that can occur simultaneously are called mutually non-exclusive. For example, an individual can have only diabetes, only hypertension, or both diabetes and hypertension at the same time.
  • 133. Example: suppose people attending the OPD fall into two categories, people with diabetes and people with hypertension, but some people have both diabetes and hypertension. Thus diabetes and hypertension are mutually non-exclusive events.
  • 134. Independent events. If A and B are two events of a particular trial, and the outcome of event A does not affect and is not affected by the outcome of event B, then A and B are called independent events. For example, if you toss two coins, the outcome of the first toss (head or tail) does not affect and is not affected by the outcome of the second toss.
  • 135. Dependent events. • If the outcome of event A influences the outcome of event B, or B influences A, then A and B are considered dependent events. Examples: • smoking and lung cancer • driving a car and getting into a traffic accident • robbing a bank and going to jail.
  • 136. Properties of probability. Probability is expressed as a proportion, so it takes any value from 0 to 1; it can also be shown as a percentage, from 0 to 100%. A probability of 1 means that the event is certain to occur (e.g. the probability of eventually dying). A probability of 0 means that the event is certain not to occur (e.g. the probability of never dying).
  • 137. A probability of 0.5 means that the event is equally likely to occur or not to occur. The higher the probability, the higher the chance of occurrence; the smaller the probability, the lower the chance of occurrence. The sum of the probabilities of all possible (mutually exclusive) outcomes must equal 1, or 100%.
  • 138. Types of probability. According to the time of occurrence of events, probability is categorized as follows. A priori probability: calculated before the event occurs, by logically examining existing knowledge. It usually deals with independent events. For example, the probability of getting a head (or a tail) is 1/2 or 0.5.
  • 139. A posteriori probability: calculated after the event has occurred, i.e. based on the observed frequency of occurrence. For example: the proportion of hypertensive patients in a sample of 100 patients.
  • 140. Rules of probability There are two basic rules in probability i. Addition Rule ii. Multiplication Rule Sunday, September 29, 2024 140
  • 141. Addition Rule This rule applies to both mutually exclusive and mutually non-exclusive events of a single random variable. This rule is characteristics by the term “or” (sometimes as means of union) in between the two ∪ events E.g. P(A or B) sometimes also shown as P(A ∪ B) For mutually exclusive Events P(A or B) = P(A) + P(B) For mutually non-exclusive Events P(A or B) = P(A) + P(B) - P(AB) 141
  • 142. Example 1 (mutually exclusive events): A single 6-sided die is rolled. What is the probability of rolling a 2 or a 5? Solution: Since rolling a 2 and rolling a 5 are mutually exclusive, P(2 and 5) = 0. P(2) = 1/6, P(5) = 1/6. P(2 or 5) = P(2) + P(5) = 1/6 + 1/6 = 2/6 = 1/3 = 0.333
  • 143. Example 2 (mutually exclusive and mutually non-exclusive events): Suppose patients attending a hospital OPD are categorized as in the following table.
Disease | No. of patients
Eye disease | 5
Respiratory disease | 15
Only diabetes | 90
Only heart disease | 30
Both diabetes and heart disease | 10
Total | 150
  • 144. If a person is drawn at random: a. What is the probability that he/she will have eye disease or respiratory disease? b. What is the probability that he/she will have diabetes or heart disease?
  • 145. Solution: a. Eye disease or respiratory disease (mutually exclusive here) • Patients with eye disease = 5 • Patients with respiratory disease = 15 • Total patients = 150. P(eye disease or respiratory disease) = 5/150 + 15/150 = 20/150 = 0.13
  • 146. b. Diabetes or heart disease (mutually non-exclusive here) • Patients with diabetes = 90 + 10 = 100 • Patients with heart disease = 30 + 10 = 40 • Total patients = 150. P(diabetes or heart disease) = P(diabetes) + P(heart disease) - P(diabetes and heart disease) = 100/150 + 40/150 - 10/150 = 130/150 = 0.87
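The addition rule can be checked with a few lines of Python. This is a minimal sketch using exact fractions from the standard library; the helper name `p_union` is hypothetical, and the counts are those of the OPD table in Example 2.

```python
from fractions import Fraction


def p_union(p_a, p_b, p_a_and_b=0):
    """Addition rule: P(A or B) = P(A) + P(B) - P(A and B).

    Illustrative helper (hypothetical name). For mutually exclusive events
    P(A and B) = 0, so the default value drops that term.
    """
    return p_a + p_b - p_a_and_b


# Example 1: one roll of a fair six-sided die; "2" and "5" are mutually exclusive.
die = p_union(Fraction(1, 6), Fraction(1, 6))

# Example 2: counts from the OPD table (150 patients in total).
total = 150
eye_or_resp = p_union(Fraction(5, total), Fraction(15, total))  # mutually exclusive
diabetes = Fraction(90 + 10, total)   # "only diabetes" + "both"
heart = Fraction(30 + 10, total)      # "only heart disease" + "both"
both = Fraction(10, total)            # counted in both groups, so subtract it once
diab_or_heart = p_union(diabetes, heart, both)

print(die)                                            # 1/3
print(round(float(eye_or_resp), 2))                   # 0.13
print(diab_or_heart, round(float(diab_or_heart), 2))  # 13/15 0.87
```

Keeping the probabilities as fractions avoids rounding until the final step, which makes it easy to see that the "both" patients are subtracted exactly once.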
  • 147. Normal Distribution of data: In the normal distribution, observations cluster around the mean. Half of the observations lie above the mean and half below it, and the observations are distributed symmetrically on each side of the mean.
  • 148. Characteristics of the Normal Curve/Distribution: a) The normal curve is symmetrical and bell-shaped. b) Frequencies are highest at the centre and decrease towards zero symmetrically on each side. c) Mean, median and mode are all equal. • Mean ± 1 SD includes about 68.3% of all observations • Mean ± 2 SD includes about 95% of all observations (95.4%; exactly 95% lies within mean ± 1.96 SD) • Mean ± 3 SD includes about 99.7% of all observations
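The percentages above can be verified by taking areas under the standard normal curve. The sketch below uses Python's standard-library `statistics.NormalDist`; the helper `share_within` is a hypothetical name chosen for illustration.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)


def share_within(k):
    """Proportion of a normal distribution lying within mean ± k SD (illustrative helper)."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)


for k in (1, 2, 3, 1.96):
    print(f"mean ± {k} SD: {share_within(k):.1%}")
# mean ± 1 SD: 68.3%
# mean ± 2 SD: 95.4%
# mean ± 3 SD: 99.7%
# mean ± 1.96 SD: 95.0%
```

Because the curve is standardized, the same proportions hold for any normal variable once its values are expressed in SD units from the mean.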
  • 149. Normal Curve
  • 150. Skewed Distributions: Distributions that are not symmetric and have a long tail in one direction are called skewed distributions. In a skewed distribution, most values are close to one end, and relatively few values lie toward the other end.
  • 151. Positively Skewed Distributions: If the tail of the distribution extends to the right (positive side), the distribution is called positively skewed or right-skewed. In right-skewed distributions, the majority of values lie in the left part of the distribution.
  • 152. Negatively Skewed Distributions: If the tail of the distribution extends to the left (negative side), the distribution is called negatively skewed or left-skewed. In left-skewed distributions, the majority of values lie in the right part of the distribution.
  • 153. Left and Right Skewed Examples
  • 156. Inferential Biostatistics: Descriptive statistics remains local to the sample, describing its central tendency and variability, while inferential statistics focuses on making statements about the population.
  • 157. Statistic vs. Parameter: Statistic (sample value): • Mean ($\bar{x}$) • Variance ($s^2$) • Standard deviation (s) • Proportion ($\hat{p}$) Parameter (population value): • Mean (μ) • Variance ($\sigma^2$) • Standard deviation (σ) • Proportion (p)
  • 158. Chapter Seven Hypothesis and significance testing
  • 159. A test of significance determines whether an observed result is statistically significant, that is, unlikely to have occurred by chance alone.
  • 160. Hypothesis: A hypothesis is the researcher's tentative answer about the relationship between two variables or about the significance of a test result. There are two statistical hypotheses: a. Null hypothesis b. Alternative hypothesis
  • 161. Null Hypothesis: It states that there is no real difference between the statistic and the parameter, e.g. sample mean = population mean; any observed difference is due to chance alone. The null hypothesis is denoted by the symbol H0.
  • 162. Alternative Hypothesis: It states that there is a real difference between the statistic and the parameter, e.g. sample mean ≠ population mean. The alternative hypothesis is denoted by the symbol H1 or Ha. For two population means: H0: µ1 = µ2; Ha: µ1 ≠ µ2. • When the null hypothesis is rejected, the alternative hypothesis is accepted.
  • 163. P-Value: The p-value is the probability of obtaining a result at least as extreme as the one observed if the null hypothesis were true, so it indicates how much support the data give to the null hypothesis. The p-value lies between 0 and 1 (0% to 100%); as it approaches 0, the support for H0 becomes weaker and weaker, and as it approaches 1 (100%), the support becomes stronger and stronger.
  • 164. Level of significance: To decide whether the support is strong or weak, we need a cut-off value. This cut-off value is known as the level of significance, denoted by α.
  • 165. Commonly used levels of significance: • 10% (0.1) • 5% (0.05) • 1% (0.01). The most commonly used is 5% (0.05).
  • 166. The zone of null hypothesis acceptance: 1] If the calculated test statistic (in absolute value) is less than the tabulated value, the null hypothesis is accepted and the alternative hypothesis is rejected (calculation based). 2] If the p-value is ≥ 0.05, the null hypothesis is accepted and the alternative hypothesis is rejected (computer based).
  • 167. The zone of null hypothesis rejection: 1] If the calculated test statistic (in absolute value) is greater than the tabulated value, the null hypothesis is rejected and the alternative hypothesis is accepted (calculation based). 2] If the p-value is less than the most commonly used significance level (p-value < 0.05), the null hypothesis is rejected and the alternative hypothesis is accepted (computer based).
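The two decision rules above agree for a two-tailed z-test at α = 0.05: |z| > 1.96 exactly when p < 0.05. The sketch below illustrates the computer-based rule with the standard-library `statistics.NormalDist`; the helper names `two_sided_p_value` and `decide` are hypothetical, and the two z values are illustrative inputs.

```python
from statistics import NormalDist


def two_sided_p_value(z):
    """Probability of a |Z| at least as large as the observed one if H0 is true.

    Illustrative helper (hypothetical name) for a two-tailed z-test.
    """
    return 2 * (1 - NormalDist().cdf(abs(z)))


def decide(p_value, alpha=0.05):
    """Computer-based rule: reject H0 when p < alpha, otherwise do not reject (accept)."""
    return "reject H0" if p_value < alpha else "do not reject (accept) H0"


# Two illustrative z statistics: one beyond the 1.96 cut-off, one inside it.
for z in (2.47, 1.22):
    p = two_sided_p_value(z)
    print(f"z = {z}: p = {p:.3f} -> {decide(p)}")
```

A z of exactly 1.96 gives p of about 0.05, which is why 1.96 appears as the tabulated two-tailed value at the 5% level.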
  • 168. One-Tailed and Two-Tailed Tests: The null hypothesis can be tested using either a one-tailed or a two-tailed test. One-Tailed Test: A test whose alternative hypothesis specifies only one direction is called a one-tailed test. Example: suppose a study compares two drugs, Drug A and Drug B.
  • 169. The null hypothesis (H0) is: Drug A is not more effective than Drug B, and the alternative hypothesis (Ha) is: Drug A is more effective than Drug B. H0: Drug A = Drug B; Ha: Drug A > Drug B
  • 170. Two-Tailed Test: In a two-tailed test, deviations in both directions are considered. For example, in the previous comparison of the effectiveness of Drug A and Drug B, the two-tailed hypotheses are: H0: Drug A and Drug B have the same effect; Ha: Drug A and Drug B do not have the same effect. In short: H0: Drug A = Drug B; Ha: Drug A ≠ Drug B
  • 172. Steps for Hypothesis Testing a) Describe the given data b) State the assumptions (the conditions the chosen test relies on, e.g. random sampling or normality) c) State the null and alternative hypotheses d) State the level of significance e) Choose the test statistic (z-test, t-test, ANOVA, χ²) f) Compute the test statistic
  • 173. g) Look up the tabulated (critical) value of the test statistic for the chosen significance level and degrees of freedom, and compare it with the calculated test statistic (or compare the p-value with α). If the calculated test statistic > the tabulated value, reject the null hypothesis; otherwise do not reject (accept) it. h) Decision: reject or accept the null hypothesis. i) Conclusion: state the conclusion in the language of the accepted hypothesis.
  • 174. Chapter Eight Testing the significance of the difference between two and three sample means
  • 175. Testing the significance of the difference between two sample means: When we want to determine whether the difference between two group means is significant (large enough) or insignificant (due to chance alone), we use a z-test or a t-test. Here are the decision criteria for using a z-test or a t-test:
  • 177. Z-test (normal test): $z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}}$
Tabulated z values:
Significance level (α) | Two-tailed (1 - α/2) | One-tailed, > (1 - α) | One-tailed, < (α)
10% (0.1) | 1.64 | 1.28 | -1.28
5% (0.05) | 1.96 | 1.64 | -1.64
  • 178. Example: The mean birth weight of babies born in a large community over several years was 2470 g, with a standard deviation of 230 g. Following implementation of an ANC program, the mean birth weight in a sample of 40 babies was 2560 g, with a standard deviation of 250 g. Does the ANC program have any impact on the birth weight of newborn babies?
  • 179. Solution: Data: μ = 2470 g, $\bar{x}$ = 2560 g, σ = 230 g, s = 250 g, n = 40. Assumptions: a) birth weight in the baby population is normally distributed; b) the sample was selected at random. Hypotheses: H0: μ = 2470 g (mean birth weight in the population does not change after ANC). Ha: μ ≠ 2470 g (mean birth weight in the population changes after ANC). Level of significance (α): 5% (0.05). Choice of test statistic: since σ is known, we use the z-test.
  • 180. Compute the test statistic: $z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} = \frac{2560 - 2470}{230/\sqrt{40}} = \frac{90}{36.37} = 2.47$. Compare the calculated z with the tabulated z: the tabulated z at the 5% level of significance (two-tailed) is 1.96. Decision: we reject the null hypothesis since the calculated z (2.47) > the tabulated z (1.96). Conclusion: the mean birth weight of babies has increased after implementation of the ANC program.
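A minimal Python sketch of the one-sample z-test on the birth-weight data. The helper name `one_sample_z` is hypothetical, and the critical value 1.96 is the two-tailed 5% value from the z table.

```python
import math


def one_sample_z(sample_mean, pop_mean, pop_sd, n):
    """z = (x̄ - μ) / (σ / √n), used when the population SD σ is known.

    Illustrative helper (hypothetical name).
    """
    standard_error = pop_sd / math.sqrt(n)
    return (sample_mean - pop_mean) / standard_error


z = one_sample_z(sample_mean=2560, pop_mean=2470, pop_sd=230, n=40)
critical = 1.96  # two-tailed, α = 0.05, read from the z table
decision = "reject H0" if abs(z) > critical else "accept H0"
print(f"z = {z:.2f} -> {decision}")  # z = 2.47 -> reject H0
```

Only σ (230 g) enters the standard error; the sample SD of 250 g is not needed once σ is known.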
  • 181. Example 2: The hemoglobin (Hb) level was measured in 143 girls and 127 boys, with known population SDs. Girls have a higher mean Hb level than boys; the question is whether the observed difference is significant.
Group | Mean | SD | n
Girls | 11.2 | 1.4 | 143
Boys | 11.0 | 1.3 | 127
  • 182. Solution • Data: $\bar{x}_1$ = 11.2, $\bar{x}_2$ = 11.0, s1 = 1.4, s2 = 1.3, n1 = 143, n2 = 127 • Assumptions: a) Hb level in the population is normally distributed; b) the samples were selected at random • Hypotheses: H0: μ1 = μ2 (any observed difference is due to chance alone); Ha: μ1 ≠ μ2 (mean Hb levels of girls and boys differ significantly) • Level of significance (α): 5% (0.05) • Choice of test statistic: since n > 30, we use the z-test
  • 183. Compute the test statistic: $z = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{s_1^2/n_1 + s_2^2/n_2}} = \frac{11.2 - 11.0}{\sqrt{1.4^2/143 + 1.3^2/127}} = \frac{0.2}{0.1644} = 1.22$. Compare the calculated z with the tabulated z at the 5% level of significance, which is 1.96. Decision: we accept the null hypothesis since the calculated z (1.22) < the tabulated z (1.96). Conclusion: the mean Hb levels of girls and boys are not significantly different.
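The two-sample z statistic can be sketched in a few lines of Python. Note that the SDs must be squared (turned into variances) inside the square root; doing so gives SE ≈ 0.164 and z ≈ 1.22. The helper name `two_sample_z` is hypothetical.

```python
import math


def two_sample_z(mean1, sd1, n1, mean2, sd2, n2):
    """z = (x̄1 - x̄2) / sqrt(s1²/n1 + s2²/n2); the SDs are squared (variances).

    Illustrative helper (hypothetical name); returns z and the standard error.
    """
    se = math.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)
    return (mean1 - mean2) / se, se


z, se = two_sample_z(11.2, 1.4, 143, 11.0, 1.3, 127)  # girls vs. boys
critical = 1.96  # two-tailed, α = 0.05
decision = "reject H0" if abs(z) > critical else "accept H0"
print(f"SE = {se:.4f}, z = {z:.2f} -> {decision}")  # SE = 0.1644, z = 1.22 -> accept H0
```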
  • 185. t Test: The t test compares the mean of one sample with a hypothesized value, or the means of two samples. Types of t test: a) One sample t test b) Independent sample t test c) Paired sample t test
  • 186. One sample t test • The one sample t test is used to test whether a population mean is significantly different from some hypothesized value. • $t = \frac{\bar{x} - m}{s/\sqrt{n}}$, where $\bar{x}$ is the sample mean, m is the hypothesized value, s is the sample SD and n is the sample size (df = n - 1).
  • 187. Example: A professor of statistics wants to know whether his introductory statistics class has a good grasp of basic math. Six students were chosen at random from the class and given a math proficiency test. The professor wants the class to be able to score above 70 on the test. The six students got scores of 63, 93, 75, 68, 83 and 92 (sample SD 12.44). Can the professor be 95% confident that the mean score for the class on the test would be above 70?
  • 188. Since the population standard deviation is not known, we use the t test. Solution: H0: μ = 70; Ha: μ > 70. $\bar{x} = \frac{63 + 93 + 75 + 68 + 83 + 92}{6} = \frac{474}{6} = 79$, m = 70, $s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}} = \sqrt{\frac{774}{5}} = 12.44$
  • 189. Solution: $t = \frac{\bar{x} - m}{s/\sqrt{n}} = \frac{79 - 70}{12.44/\sqrt{6}} = \frac{9}{5.08} = 1.77$, df = n - 1 = 6 - 1 = 5. Note that we are testing only whether the mean score is greater than 70, so this is a one-tailed t test.
  • 190. The tabulated t at the 5% significance level (one-tailed) with df = 5 is 2.015. The calculated t (1.77) is less than the tabulated t (t0.05,5 = 2.015), so the null hypothesis is accepted: the data do not give 95% confidence that the class mean is above 70.
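A minimal Python sketch of the one-sample t test using the standard-library `statistics` module. It computes the sample SD directly from the six scores (n - 1 denominator); the critical value 2.015 (one-tailed, α = 0.05, df = 5) is taken from the t table rather than computed.

```python
import math
import statistics

scores = [63, 93, 75, 68, 83, 92]
m = 70                                  # hypothesized value
n = len(scores)
mean = statistics.mean(scores)          # sample mean
s = statistics.stdev(scores)            # sample SD with n - 1 in the denominator
t = (mean - m) / (s / math.sqrt(n))
t_critical = 2.015                      # one-tailed, α = 0.05, df = 5 (t table)

print(f"mean = {mean:.1f}, s = {s:.2f}, t = {t:.2f}, df = {n - 1}")
# mean = 79.0, s = 12.44, t = 1.77, df = 5
print("reject H0" if t > t_critical else "accept H0")  # accept H0
```

Because the alternative is one-sided (μ > 70), the comparison uses t itself rather than its absolute value.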
  • 191. Independent sample t-test: The independent sample t-test is used to compare the means of two independent groups. It usually involves a quantitative (continuous) dependent variable and a qualitative independent variable with two categories, such as the height of males and females, or the blood pressure of two groups. Example: testing whether male and female incomes differ. $t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{s_p^2 (1/n_1 + 1/n_2)}}$, with pooled variance $s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$
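  • 192. Example: Here is the blood pressure of male and female patients. Does blood pressure differ between male and female patients?
Group | n | Mean | SD
Male | 25 | 155 | 10
Female | 25 | 160 | 8
Solution: H0: μ1 = μ2; Ha: μ1 ≠ μ2. $s_p^2 = \frac{24 \times 10^2 + 24 \times 8^2}{48} = 82$, $t = \frac{155 - 160}{\sqrt{82 \times (1/25 + 1/25)}} = \frac{-5}{2.56} = -1.95$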
  • 193. df = n1 + n2 - 2 = 25 + 25 - 2 = 48. At the 5% significance level (two-tailed), the tabulated t is about 2.011 (2.021 if a table listing only df = 40 is used). Ignoring the sign, t calculated (1.95) < t tabulated, so the null hypothesis is accepted. We conclude that the mean male blood pressure and the mean female blood pressure are not significantly different.
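The pooled-variance t test can be sketched from the summary statistics alone. This is a minimal illustration assuming equal population variances (which is what the df = n1 + n2 - 2 used above implies); the helper name `pooled_t` is hypothetical and the critical value 2.011 (two-tailed, α = 0.05, df = 48) is taken from a t table.

```python
import math


def pooled_t(mean1, sd1, n1, mean2, sd2, n2):
    """Independent-samples t test with pooled variance; df = n1 + n2 - 2.

    Illustrative helper (hypothetical name); returns t and df.
    """
    df = n1 + n2 - 2
    sp2 = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df  # pooled variance
    se = math.sqrt(sp2 * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / se, df


t, df = pooled_t(155, 10, 25, 160, 8, 25)  # male vs. female
t_critical = 2.011  # two-tailed, α = 0.05, df = 48 (t table)
decision = "reject H0" if abs(t) > t_critical else "accept H0"
print(f"t = {t:.2f}, df = {df} -> {decision}")  # t = -1.95, df = 48 -> accept H0
```

With equal group sizes the pooled and unpooled standard errors coincide, so the result does not hinge on the equal-variance assumption here.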
  • 194. Paired sample t test: The paired sample t test is used to test the mean difference between two dependent observations, such as blood pressure before and after exercise for the same group of individuals. In the independent t test we were interested in differences between groups, but in the paired t test we are interested in differences within the same individuals. $t = \frac{\bar{d}}{s_d/\sqrt{n}}$, where $\bar{d}$ is the mean difference between the pairs (e.g. before minus after), $s_d$ is the standard deviation of the differences and n is the number of pairs (df = n - 1).
  • 195. Example: Here is the temperature of 8 individuals before and after treatment.
Patient | Before (X) | After (Y)
1 | 25.8 | 24.7
2 | 26.7 | 25.8
3 | 27.3 | 26.3
4 | 26.1 | 25.2
5 | 26.4 | 25.5
6 | 27.4 | 26.6
7 | 27.1 | 26.0
  • 196. Solution: First calculate d and d².
Patient | Before (X) | After (Y) | d = X - Y | d²
1 | 25.8 | 24.7 | 1.1 | 1.21
2 | 26.7 | 25.8 | 0.9 | 0.81
3 | 27.3 | 26.3 | 1.0 | 1.00
4 | 26.1 | 25.2 | 0.9 | 0.81
5 | 26.4 | 25.5 | 0.9 | 0.81
6 | 27.4 | 26.6 | 0.8 | 0.64
7 | 27.1 | 26.0 | 1.1 | 1.21
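  • 197. • $\bar{d} = \frac{\sum d}{n} = \frac{7.9}{8} \approx 0.99$ • Variance of d: $s_d^2 = \frac{\sum d^2 - (\sum d)^2/n}{n - 1}$ • $s_d = \sqrt{s_d^2} \approx 0.1$ • $t = \frac{\bar{d}}{s_d/\sqrt{n}}$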
  • 198. The tabulated t value with df = 8 - 1 = 7 at the 5% significance level (two-tailed) is 2.365, and the calculated t is greater than this tabulated t. Decision: the null hypothesis is rejected and the alternative hypothesis is accepted. We conclude that the temperature of the individuals differs before and after treatment.
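Only seven of the eight before/after pairs appear in the table, so this minimal sketch runs the paired t test on those seven pairs only. Its numbers (mean difference ≈ 0.96, t ≈ 22.3, df = 6) therefore differ from the eight-patient figures, although the decision is the same. The critical value 2.447 (two-tailed, α = 0.05, df = 6) is taken from a t table.

```python
import math
import statistics

# The seven before/after pairs printed in the table (the eighth patient's row is not shown).
before = [25.8, 26.7, 27.3, 26.1, 26.4, 27.4, 27.1]
after = [24.7, 25.8, 26.3, 25.2, 25.5, 26.6, 26.0]

d = [round(x - y, 2) for x, y in zip(before, after)]  # d = X - Y, rounded to undo float noise
n = len(d)
d_bar = statistics.mean(d)      # mean difference
s_d = statistics.stdev(d)       # SD of the differences (n - 1 denominator)
t = d_bar / (s_d / math.sqrt(n))
t_critical = 2.447              # two-tailed, α = 0.05, df = 6 (t table)

print(d)  # [1.1, 0.9, 1.0, 0.9, 0.9, 0.8, 1.1]
print(f"mean d = {d_bar:.3f}, s_d = {s_d:.3f}, t = {t:.1f}, df = {n - 1}")
# mean d = 0.957, s_d = 0.113, t = 22.3, df = 6
print("reject H0" if abs(t) > t_critical else "accept H0")  # reject H0
```

Working with the differences d turns the paired design into a one-sample t test on d against 0.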
  • 199. Analysis of Variance (ANOVA or F test)
  • 200. Analysis of Variance (ANOVA or F test): Analysis of variance is a statistical method for analyzing data with the objective of comparing three or more group means. It extends the t-test, which compares only two group means. Analysis of variance is sometimes called the F test, after R. A. Fisher, the British statistician who developed it.
  • 201. One-way ANOVA: used when we have one continuous dependent variable and one categorical independent variable with more than two categories, to compare the means of these groups. Example: whether people residing in three different areas (rural, urban and semi-urban) earn different incomes.
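  • 202. How to calculate one-way ANOVA: 1) $F = \frac{\text{MSSBG}}{\text{MSSWG}}$ 2) $\text{SST} = \sum X^2 - \frac{(\sum X)^2}{N}$, or SSBG + SSWG 3) $\text{SSBG} = \sum_j \frac{T_j^2}{n_j} - \frac{(\sum X)^2}{N}$, where $T_j$ is the total and $n_j$ the size of group j 4) SSWG = SST - SSBG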
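  • 203. 5) $\text{MSSBG} = \frac{\text{SSBG}}{k - 1}$ 6) $\text{MSSWG} = \frac{\text{SSWG}}{N - k}$ 7) $F = \frac{\text{MSSBG}}{\text{MSSWG}}$, where k is the number of groups and N the total number of observations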
  • 205. Example: Three different treatments are given to 3 groups of patients with anemia. The increase in Hb% level was noted after one month and is given in Table 2.0. We are interested in finding whether the difference in improvement among the 3 groups is significant.
  • 207.
Group A (x1) | Group B (x2) | Group C (x3)
3 | 3 | 3
1 | 2 | 4
2 | 2 | 5
0 | 3 | 4
1 | 1 | 2
2 | 3 | 2
2 | 2 | 4
  • 208. Solution
x1 | x2 | x3 | x1² | x2² | x3²
3 | 3 | 3 | 9 | 9 | 9
1 | 2 | 4 | 1 | 4 | 16
2 | 2 | 5 | 4 | 4 | 25
0 | 3 | 4 | 0 | 9 | 16
1 | 1 | 2 | 1 | 1 | 4
2 | 3 | 2 | 4 | 9 | 4
2 | 2 | 4 | 4 | 4 | 16
Σx1 = 11, Σx2 = 16, Σx3 = 24; Σx1² = 23, Σx2² = 40, Σx3² = 90
ΣX² = 23 + 40 + 90 = 153; ΣX = 11 + 16 + 24 = 51
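  • 209. SST = $\sum X^2 - \frac{(\sum X)^2}{N} = 153 - \frac{51^2}{21} = 153 - 123.86 = 29.14$ SSBG = $\sum_j \frac{T_j^2}{n_j} - \frac{(\sum X)^2}{N} = \frac{11^2 + 16^2 + 24^2}{7} - 123.86 = 136.14 - 123.86 = 12.28$ 4) SSWG = SST - SSBG = 29.14 - 12.28 = 16.86 5) MSSBG = $\frac{\text{SSBG}}{k - 1} = \frac{12.28}{2} = 6.14$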
  • 210. 6) MSSWG = $\frac{\text{SSWG}}{N - k} = \frac{16.86}{18} = 0.94$ 7) F = $\frac{\text{MSSBG}}{\text{MSSWG}} = \frac{6.14}{0.94} = 6.53$
Source of variation | Degrees of freedom | Sum of squares | Mean square | F
Between groups | k - 1 = 3 - 1 = 2 | 12.28 | 6.14 | 6.53
Within groups | N - k = 21 - 3 = 18 | 16.86 | 0.94 |
  • 211. Interpretation: The tabulated F value at df (2, 18) at the 5% level of significance is 3.55. Our calculated F value is 6.53, which is greater than the tabulated F value (6.53 > 3.55). Thus the null hypothesis is rejected. Hence we conclude that the mean increase in Hb% differs significantly between at least two of the groups.
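The one-way ANOVA steps 1) to 7) can be sketched in plain Python on the Hb% data. Without intermediate rounding the exact F is about 6.56 (and SSBG about 12.29); the slide's 6.53 comes from rounding the mean squares to 6.14 and 0.94 before dividing. The critical value 3.55 for df (2, 18) at α = 0.05 is taken from the F table.

```python
# Increase in Hb% for the three treatment groups (7 patients each).
groups = {
    "A": [3, 1, 2, 0, 1, 2, 2],
    "B": [3, 2, 2, 3, 1, 3, 2],
    "C": [3, 4, 5, 4, 2, 2, 4],
}

all_values = [x for g in groups.values() for x in g]
N, k = len(all_values), len(groups)
correction = sum(all_values) ** 2 / N                     # (ΣX)² / N
sst = sum(x * x for x in all_values) - correction          # total sum of squares
ssbg = sum(sum(g) ** 2 / len(g) for g in groups.values()) - correction  # between groups
sswg = sst - ssbg                                          # within groups
msbg = ssbg / (k - 1)                                      # mean square between
mswg = sswg / (N - k)                                      # mean square within
f = msbg / mswg
F_CRITICAL = 3.55  # F table, df = (2, 18), α = 0.05

print(f"SST = {sst:.2f}, SSBG = {ssbg:.2f}, SSWG = {sswg:.2f}, F = {f:.2f}")
# SST = 29.14, SSBG = 12.29, SSWG = 16.86, F = 6.56
print("reject H0" if f > F_CRITICAL else "accept H0")  # reject H0
```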
  • 212. Chapter Nine Association, Correlation and Prediction
  • 214. A chi-square (χ²) test is useful for testing whether there is a statistical association between two categorical variables, each with two or more categories (usually two).
  • 216. df = (r - 1)(c - 1), where r = number of rows and c = number of columns. Example: Suppose a researcher wants to test whether people's knowledge is associated with service utilization. He conducted a sample survey of 100 individuals, of whom 78 had a high level of knowledge.
  • 217. Of the 78 who had high knowledge, 50 were service users, whereas of the 22 who had low knowledge, 10 used the service. Do these data provide evidence of an association between knowledge level and service utilization?
  • 219. 2. Assumptions: the sample was drawn randomly, the observations are independent, and the expected cell counts are large enough (commonly at least 5). 3. Hypotheses: H0: There is no association between “knowledge level” and “service utilization”. Ha: There is an association between “knowledge level” and “service utilization”. 4. Level of significance: α = 5% (0.05)
  • 222. 7. Compute the degrees of freedom (df): df = (r - 1)(c - 1) = (2 - 1)(2 - 1) = 1 8. Tabulated value of χ² with df = 1 at the 5% level of significance = 3.84 9. Compare the computed value with the tabulated value: calculated χ² (2.49) < tabulated χ² (3.84) 10. Decision: H0 is accepted 11. Conclusion: the data do not provide evidence of an association between knowledge level and service utilization.
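A minimal Python sketch of the chi-square test of association, built from the 2 × 2 table implied by the example (high knowledge: 50 users and 28 non-users; low knowledge: 10 users and 12 non-users). Expected counts use E = row total × column total / N, and the statistic is Σ(O - E)²/E; the critical value 3.84 (df = 1, α = 0.05) is taken from the chi-square table.

```python
# Observed counts: (knowledge level, service use) -> number of people.
observed = {
    ("high", "user"): 50, ("high", "non-user"): 28,
    ("low", "user"): 10, ("low", "non-user"): 12,
}
rows = ["high", "low"]
cols = ["user", "non-user"]
row_tot = {r: sum(observed[(r, c)] for c in cols) for r in rows}
col_tot = {c: sum(observed[(r, c)] for r in rows) for c in cols}
grand = sum(observed.values())

chi2 = 0.0
for r in rows:
    for c in cols:
        expected = row_tot[r] * col_tot[c] / grand  # E = row total × column total / N
        chi2 += (observed[(r, c)] - expected) ** 2 / expected

df = (len(rows) - 1) * (len(cols) - 1)
CHI2_CRITICAL = 3.84  # df = 1, α = 0.05 (chi-square table)
decision = "reject H0" if chi2 > CHI2_CRITICAL else "accept H0"
print(f"chi2 = {chi2:.2f}, df = {df} -> {decision}")  # chi2 = 2.49, df = 1 -> accept H0
```

The smallest expected count here is 8.8, so the usual "expected count of at least 5" condition holds.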
  • 224. When one quantitative variable changes with a change in another quantitative variable, the two are said to be correlated. The variable that changes the other is called the independent variable (IV), and the variable that is changed is called the dependent variable (DV). The DV is represented by Y and the IV by X.
  • 225. Example: Income and age are both quantitative. They are correlated because when age changes, income changes as well. Therefore age is the IV (X) and income is the DV (Y). When the change occurs at a constant rate, it is called a linear correlation. The correlation between one DV and one IV is called simple correlation, e.g. the correlation between income and age.
  • 226. The correlation between one DV and two or more IVs is called multiple correlation, e.g. the correlation of income with age and family size. Correlation Coefficient (r): To measure the correlation between variables, we use a measure called the correlation coefficient (r).
  • 228. Characteristics of the relationship: The correlation coefficient (r) indicates both the strength and the direction of a relationship. Strength (magnitude) of the relationship, judged by the absolute value of r: r = 0 indicates no correlation; |r| ≤ 0.3: weak correlation; 0.4-0.6: moderate correlation; 0.7-1: strong correlation
  • 229. When the correlation coefficient is 1 (either + or -), it indicates a perfect correlation. As r approaches 1 (either + or -), the strength of the relationship increases.
  • 230. Direction of the relationship: the relationship can be positive, negative or absent (no correlation). Positive correlation: the two variables move in the same direction (increase or decrease together), e.g. gestational period and birth weight. Here r is positive. Negative correlation: the two variables move in opposite directions (when one increases the other decreases), e.g. age and eyesight. Here r is negative.
  • 231. No correlation: a change in one variable is not associated with a change in the other variable, e.g. age and sex. Here r = 0. Example: Suppose 4 persons were selected as a sample to determine the correlation between weight and height.
  • 232.
Weight in pounds (Y) | Height in inches (X) | Y² | X² | XY
240 | 73 | 57600 | 5329 | 17520
210 | 70 | 44100 | 4900 | 14700
180 | 69 | 32400 | 4761 | 12420
160 | 68 | 25600 | 4624 | 10880
ΣY = 790 | ΣX = 280 | ΣY² = 159700 | ΣX² = 19614 | ΣXY = 55520
  • 234. Interpretation: There is a very strong positive correlation between the weight and height of the respondents.
  • 235. Coefficient of Determination (r²): The square of r is called the coefficient of determination. It measures the proportion of the variability in Y (DV) that is explained by X (IV), and is usually expressed as a percentage.
  • 236. Example: in the above example the correlation coefficient (r) is 0.97, so the coefficient of determination is r² = 0.97 × 0.97 ≈ 0.94, i.e. 94%. Interpretation: 94% of the variability in weight (DV) is explained by height (IV). The remaining 6% of the variability in weight is due to other variables, not height.
  • 237. Correlation Significance Test: To test the significance of the correlation coefficient, we use the following formula to find the calculated t-value: $t = r\sqrt{\frac{n - 2}{1 - r^2}} = 0.97 \times \sqrt{\frac{4 - 2}{1 - 0.94}} = 0.97 \times 5.77 = 5.6$ (calculated t-value)
  • 238. Next, assuming a significance level of 0.05, we look up the tabulated t-value in the t table at the junction of the significance level and the degrees of freedom. For testing a correlation coefficient, the degrees of freedom are n - 2 = 4 - 2 = 2. The tabulated two-tailed t-value at the 0.05 significance level with 2 degrees of freedom is 4.303.
  • 239. Since the calculated t-value of 5.6 is > the tabulated t-value of 4.303, the null hypothesis (no correlation) is rejected. Therefore we can conclude that there is a significant, very strong positive correlation between the height and weight of our participants.
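The correlation steps (r from the column sums, r², and the t test with df = n - 2) can be sketched in plain Python on the weight/height data. The critical value 4.303 (two-tailed, α = 0.05, df = 2) is taken from a t table; without rounding r, the calculated t is about 5.63.

```python
import math

weight = [240, 210, 180, 160]  # Y (pounds)
height = [73, 70, 69, 68]      # X (inches)
n = len(weight)

sx, sy = sum(height), sum(weight)
sxx = sum(x * x for x in height)
syy = sum(y * y for y in weight)
sxy = sum(x * y for x, y in zip(height, weight))

# Pearson correlation coefficient from the sums in the table.
r = (n * sxy - sx * sy) / math.sqrt((n * sxx - sx ** 2) * (n * syy - sy ** 2))
r2 = r * r                                  # coefficient of determination
t = r * math.sqrt((n - 2) / (1 - r2))       # t statistic for H0: no correlation
T_CRITICAL = 4.303                          # two-tailed, α = 0.05, df = n - 2 = 2

print(f"r = {r:.2f}, r^2 = {r2:.0%}, t = {t:.2f}, df = {n - 2}")
# r = 0.97, r^2 = 94%, t = 5.63, df = 2
print("significant" if abs(t) > T_CRITICAL else "not significant")  # significant
```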
  • 240. Regression Analysis: a statistical procedure used to find relationships among a set of variables. In regression analysis there is a dependent variable, which is the one you are trying to explain, and one or more independent variables that are related to it.
  • 241. Regression types: 1) Linear regression: quantitative DV A) Simple (1 DV and 1 IV) B) Multiple (multiple IVs and 1 DV) 2) Logistic regression: qualitative DV A) Binary: DV with 2 categories (simple: 1 DV and 1 IV; multiple: multiple IVs and 1 DV) B) Multinomial: DV with > 2 categories C) Ordinal: DV that is ordinal
  • 242. Linear Regression: Linear regression is used when the dependent variable is continuous and assumes a linear relationship with the independent variables. It aims to find the best-fitting line that represents the relationship between the dependent variable and one or more independent variables.
  • 243. For example, a study might use linear regression to determine the relationship between smoking behavior (independent variable) and lung function (dependent variable) among a sample of individuals.
  • 244. Logistic Regression: Logistic regression is used when the dependent variable is categorical or binary. It models the probability of an event occurring or the likelihood of an outcome belonging to a particular category. The dependent variable is usually binary (e.g., yes/no, success/failure), but it can also be multinomial (more than two categories) or ordinal (ordered categories).
  • 245. Why can regression analysis offer more than chi-square and correlation? 1. Prediction capability: regression analysis can estimate (predict) the value of the dependent variable from the values of the independent variables.
  • 246. 2. Handling both categorical and numerical variables 3. Control of confounding variables: Regression analysis enables researchers to control for the effects of confounding variables by including them as independent variables in the model.
  • 247. Confounding variables are factors that are associated with both the independent variable(s) and the dependent variable in a study. Age is frequently a confounding variable in health studies. Example: when studying the association between a specific medication and heart disease risk, age must be considered as a confounding variable because older individuals are more likely to have both a higher heart disease risk and higher medication usage.
  • 248. Regression equation: $Y = \beta_0 + \beta_1 X$, where Y = dependent variable and X = independent variable. β0 (constant/intercept) is the value of Y when X is zero; it shows what the DV would be if the IV were 0. $\beta_0 = \bar{Y} - \beta_1 \bar{X}$
  • 249. β1 (regression coefficient/slope) measures the amount of change in the DV (Y) for a one-unit change in the IV (X). It represents the relationship between the IV and the DV. $\beta_1 = \frac{\sum XY - (\sum X)(\sum Y)/n}{\sum X^2 - (\sum X)^2/n}$
  • 250. Example 1: The height and weight of 4 individuals are presented in the following table. Let us predict the weight (DV) of an individual whose height (IV) is 80 inches.
  • 251.
Weight in pounds (Y) | Height in inches (X) | Y² | X² | XY
240 | 73 | 57600 | 5329 | 17520
210 | 70 | 44100 | 4900 | 14700
180 | 69 | 32400 | 4761 | 12420
160 | 68 | 25600 | 4624 | 10880
ΣY = 790 | ΣX = 280 | ΣY² = 159700 | ΣX² = 19614 | ΣXY = 55520
  • 252. $\beta_0 = \bar{Y} - \beta_1 \bar{X} = 197.5 - 15.7 \times 70 = -901.5$. Interpretation of β0: a height of 0 inches would give a weight of -901.5 pounds, which is impossible; the intercept has no practical meaning here because a height of 0 lies far outside the observed data.
  • 253. $\beta_1 = \frac{\sum XY - (\sum X)(\sum Y)/n}{\sum X^2 - (\sum X)^2/n} = \frac{55520 - (280 \times 790)/4}{19614 - 280^2/4} = \frac{220}{14} = 15.7$. Interpretation of β1: for each one-inch increase in height, weight increases by 15.7 pounds.
  • 254. Regression equation: $Y = \beta_0 + \beta_1 X = -901.5 + 15.7 \times 80 = 354.5$. Interpretation: based on this data, an individual 80 inches tall is predicted to weigh about 354.5 pounds. Note that 80 inches lies outside the observed heights (68 to 73 inches), so this prediction is an extrapolation.
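A minimal Python sketch of the least-squares calculation above, using the same sums. Keeping β1 unrounded (220/14 ≈ 15.714) gives β0 = -902.5 and a predicted weight of about 354.6 pounds; the slide's -901.5 and 354.5 come from rounding β1 to 15.7 first. The function name `predict` is hypothetical.

```python
weight = [240, 210, 180, 160]  # Y (pounds)
height = [73, 70, 69, 68]      # X (inches)
n = len(height)

sx, sy = sum(height), sum(weight)
sxx = sum(x * x for x in height)
sxy = sum(x * y for x, y in zip(height, weight))

beta1 = (sxy - sx * sy / n) / (sxx - sx ** 2 / n)  # slope: 220 / 14
beta0 = sy / n - beta1 * sx / n                    # intercept: Ȳ - β1·X̄


def predict(x):
    """Predicted Y for a given X (illustrative helper, hypothetical name)."""
    return beta0 + beta1 * x


print(f"beta1 = {beta1:.2f}, beta0 = {beta0:.1f}, weight at 80 in = {predict(80):.1f}")
# beta1 = 15.71, beta0 = -902.5, weight at 80 in = 354.6
```

As with the slide's result, the prediction at 80 inches extrapolates beyond the observed heights and should be read with caution.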
  • 255. Chapter TEN Estimation: Estimation is a procedure for finding the value of a parameter based on the value of a statistic. Various techniques are available for different situations; we shall, however, limit our discussion to two types of estimation: • Point estimation • Interval estimation
  • 256. Point Estimation: Point estimation occurs when we estimate that the unknown parameter is equal to the calculated statistic, e.g. μ = $\bar{x}$, σ = s or p = $\hat{p}$. Remember that a statistic is a sample-based summary measure (e.g. $\bar{x}$) and a parameter is a population-based summary measure (e.g. μ).
  • 257. Interval Estimation: Interval estimation occurs when we estimate that the parameter lies within an interval. This interval is called the confidence interval, and the confidence we attach to it is called the confidence level. For example, a 95% confidence level means that if many random samples were drawn and an interval were computed from each in the same way, about 95% of those intervals would contain the true parameter.
  • 258. Estimation of a single population mean (μ) Example 1: The mean reading speed of a random sample of 81 Modern University students is 325 words per minute. Find the mean reading speed of all Modern University students (μ), given that the standard deviation for all Modern University students is 45 words per minute.
  • 259. Solution: Point estimation: μ = $\bar{x}$; since the mean reading speed of the sample is 325 words per minute, the estimated mean reading speed of all Modern University students is also 325 words per minute. Interval estimation for μ: $\mu = \bar{x} \pm Z \times SE(\bar{x})$, with Z = 1.96 and $SE(\bar{x}) = \sigma/\sqrt{n} = 45/\sqrt{81} = 5$, so 1.96 × 5 = 9.8 and 325 ± 9.8 = 315.2 to 334.8 words/minute. This means that if 100 samples of university students were selected and an interval computed from each, about 95 of those intervals would contain the true mean μ.
  • 260. Estimation of the difference between population means (μ1 - μ2) Example 2: A random sample of 50 non-smokers has a mean life of 76 years with a standard deviation of 8 years, and a random sample of 65 smokers has a mean life of 68 years with a standard deviation of 9 years. A) What is the point estimate of the difference between the population means? B) Find a 95% C.I. for the difference in mean lifetime between non-smokers and smokers.
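  • 261. Solution: Point estimation of μ1 - μ2: μ1 - μ2 = $\bar{x}_1 - \bar{x}_2$; since the difference in mean life between the samples is 76 - 68 = 8 years, the estimated difference between the population means is also 8 years. Interval estimation of μ1 - μ2: $\mu_1 - \mu_2 = (\bar{x}_1 - \bar{x}_2) \pm 1.96 \times SE(\bar{x}_1 - \bar{x}_2)$, where $SE(\bar{x}_1 - \bar{x}_2) = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}} = \sqrt{\frac{8^2}{50} + \frac{9^2}{65}} = 1.59$. So 1.96 × 1.59 = 3.1 and 8 ± 3.1 = 4.9 to 11.1 years. We are therefore 95% confident that the difference in population mean life between the two groups lies between about 4.9 and 11.1 years.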
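Both confidence intervals for means can be sketched with the standard library. This minimal example uses `statistics.NormalDist().inv_cdf(0.975)` (≈ 1.96) for the z multiplier; the helper names `ci_mean` and `ci_mean_diff` are hypothetical, and both use the normal (z) interval as in the slides.

```python
import math
from statistics import NormalDist

Z95 = NormalDist().inv_cdf(0.975)  # ≈ 1.96 for a 95% two-sided interval


def ci_mean(mean, sd, n, z=Z95):
    """z interval for a single mean: x̄ ± z·σ/√n (illustrative helper)."""
    se = sd / math.sqrt(n)
    return mean - z * se, mean + z * se


def ci_mean_diff(m1, sd1, n1, m2, sd2, n2, z=Z95):
    """z interval for μ1 - μ2: (x̄1 - x̄2) ± z·sqrt(s1²/n1 + s2²/n2) (illustrative helper)."""
    se = math.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)
    diff = m1 - m2
    return diff - z * se, diff + z * se


lo, hi = ci_mean(325, 45, 81)                        # reading speed
print(f"mu: {lo:.1f} to {hi:.1f} words/minute")     # mu: 315.2 to 334.8 words/minute
lo2, hi2 = ci_mean_diff(76, 8, 50, 68, 9, 65)        # non-smokers minus smokers
print(f"mu1 - mu2: {lo2:.1f} to {hi2:.1f} years")    # mu1 - mu2: 4.9 to 11.1 years
```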
  • 262. Estimation of a single population proportion (p) Example: An epidemiologist is worried about the ever-increasing trend of malaria in a certain locality and wants to estimate the proportion of persons infected during the peak malaria transmission period. He takes a random sample of 150 persons in that locality during the peak transmission period and finds that 60 of them are positive for malaria. Find a) the point estimate of p; b) a 95% CI for p.
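  • 263. Solution: Point estimation of p: $\hat{p} = 60/150 = 0.40$ (40%), so the estimated proportion of malaria-positive people in the population is 40%. Interval estimation of p: $p = \hat{p} \pm 1.96 \times SE(\hat{p})$, where $SE(\hat{p}) = \sqrt{\frac{\hat{p}(1 - \hat{p})}{n}} = \sqrt{\frac{0.4 \times 0.6}{150}} = 0.04$, so 1.96 × 0.04 = 0.078 (7.8%), and 40% ± 7.8% = 32.2% to 47.8%. So we are 95% confident that the proportion of malaria-positive individuals in the population lies between 32.2% and 47.8%.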
  • 264. Estimation of the difference between population proportions (p1 - p2) Example: Two groups each consist of 100 patients who have leukemia. A new drug is given to the first group but not to the second (the control group). In the first group 75 people have remission for 2 years, but only 60 in the second group. Find 95% confidence limits for the difference in the proportion of all patients with leukemia who have remission for 2 years.
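  • 265. Solution: Point estimation of p1 - p2: $\hat{p}_1 - \hat{p}_2 = 0.75 - 0.60 = 0.15$ (15%); that is, the estimated difference in proportions between the two groups is 15%. Interval estimation of p1 - p2: $p_1 - p_2 = (\hat{p}_1 - \hat{p}_2) \pm 1.96 \times SE$, where $SE = \sqrt{\frac{\hat{p}_1(1 - \hat{p}_1)}{n_1} + \frac{\hat{p}_2(1 - \hat{p}_2)}{n_2}} = \sqrt{\frac{0.75 \times 0.25}{100} + \frac{0.60 \times 0.40}{100}} = 0.065$ (6.5%), so 1.96 × 6.5% = 12.7%, and 15% ± 12.7% = 2.3% to 27.7%. So the population difference in proportions lies somewhere between 2.3% and 27.7% (95% confidence).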
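The two proportion intervals can be sketched the same way, using the normal-approximation (Wald) interval as in the slides. The helper names `ci_proportion` and `ci_proportion_diff` are hypothetical. Without rounding the SE to 6.5% first, the leukemia interval comes out as about 2.2% to 27.8% rather than 2.3% to 27.7%.

```python
import math
from statistics import NormalDist

Z95 = NormalDist().inv_cdf(0.975)  # ≈ 1.96


def ci_proportion(x, n, z=Z95):
    """Wald interval for one proportion: p̂ ± z·sqrt(p̂(1 - p̂)/n) (illustrative helper)."""
    p = x / n
    se = math.sqrt(p * (1 - p) / n)
    return p - z * se, p + z * se


def ci_proportion_diff(x1, n1, x2, n2, z=Z95):
    """Wald interval for p1 - p2 from two independent samples (illustrative helper)."""
    p1, p2 = x1 / n1, x2 / n2
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    return (p1 - p2) - z * se, (p1 - p2) + z * se


lo, hi = ci_proportion(60, 150)                # malaria-positive proportion
print(f"p: {lo:.1%} to {hi:.1%}")              # p: 32.2% to 47.8%
lo2, hi2 = ci_proportion_diff(75, 100, 60, 100)  # remission: drug minus control
print(f"p1 - p2: {lo2:.1%} to {hi2:.1%}")      # p1 - p2: 2.2% to 27.8%
```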
