Lectures of Stat -145
(Biostatistics)
Text book
Biostatistics
Basic Concepts and Methodology for the Health
Sciences
By
Wayne W. Daniel
Text Book : Basic Concepts and
Methodology for the Health
Sciences 2
Chapter 1
Introduction To
Biostatistics
Key words:
Statistics, data, biostatistics,
variable, population, sample
Introduction
Some Basic concepts
Statistics is a field of study concerned
with
1- collection, organization, summarization
and analysis of data.
2- drawing of inferences about a body of
data when only a part of the data is
observed.
Statisticians try to interpret and
communicate the results to others.
* Biostatistics:
The tools of statistics are employed in
many fields:
business, education, psychology,
agriculture, economics, … etc.
When the data analyzed are derived from
the biological sciences and medicine,
we use the term biostatistics to
distinguish this particular application of
statistical tools and concepts.
Data:
• The raw material of Statistics is data.
• We may define data as figures. Figures
result from the process of counting or
from taking a measurement.
• For example:
• - When a hospital administrator counts
the number of patients (counting).
• - When a nurse weighs a patient
(measurement)
* Sources of Data:
We search for suitable data to serve as
the raw material for our investigation.
Such data are available from one or more
of the following sources:
1- Routinely kept records.
For example:
- Hospital medical records contain
immense amounts of information on
patients.
- Hospital accounting records contain a
wealth of data on the facility’s business
activities.
2- External sources.
The data needed to answer a question may
already exist in the form of
published reports, commercially available
data banks, or the research literature,
i.e. someone else has already asked the
same question.
3- Surveys:
The source may be a survey, if the data
needed can be obtained by asking people
certain questions.
For example:
If the administrator of a clinic wishes to
obtain information regarding the mode of
transportation used by patients to visit
the clinic,
then a survey may be conducted among
patients to obtain this information.
4- Experiments.
Frequently the data needed to answer
a question are available only as the
result of an experiment.
For example:
If a nurse wishes to know which of several
strategies is best for maximizing patient
compliance,
she might conduct an experiment in which the
different strategies of motivating compliance
are tried with different patients.
* A variable:
It is a characteristic that takes on
different values in different persons,
places, or things.
For example:
- heart rate,
- the heights of adult males,
- the weights of preschool children,
- the ages of patients seen in a dental
clinic.
Types of variables: Quantitative and Qualitative
Quantitative Variables
A quantitative variable can be measured
in the usual sense.
For example:
- the heights of adult males,
- the weights of preschool children,
- the ages of patients seen in a dental clinic.
Qualitative Variables
Many characteristics are
not capable of being
measured. Some of them
can be ordered or
ranked.
For example:
- classification of people into socio-economic groups,
- social classes based on income, education, etc.
Types of quantitative variables: Discrete and Continuous
A discrete variable
is characterized by gaps or interruptions
in the values that it can assume.
For example:
- The number of daily admissions to a general hospital,
- The number of decayed, missing or filled teeth per child
in an elementary school.
A continuous variable
can assume any value within a
specified relevant interval of
values assumed by the variable.
For example:
- Height,
- weight,
- skull circumference.
No matter how close together the
observed heights of two people,
we can find another person
whose height falls somewhere
in between.
* A population:
It is the largest collection of values of a
random variable for which we have an
interest at a particular time.
For example:
The weights of all the children enrolled in
a certain elementary school.
Populations may be finite or infinite.
* A sample:
It is a part of a population.
For example:
The weights of only a fraction of
these children.
Types of Data
Data: Constant or Variable
CONSTANT DATA
These are observations that remain the same
from person to person, from time to time, or
from place to place.
Examples:
1- number of eyes, fingers, ears, etc.
2- number of minutes in an hour
3- the speed of light
4- number of centimeters in an inch
VARIABLE DATA 1
These are observations that vary from one
person to another or from one group of members
to another. They are classified as follows:
• Statistically:
1. Quantitative variable data
2. Qualitative variable data
• Epidemiologically:
1. Dependent (outcome variable)
2. Independent (study variables)
• Clinically:
- Measured (BP, Lab. parameters, etc.)
- Counted (Pulse rate, resp. rate, etc.)
- Observed (Jaundice, pallor, wound infection)
- Subjective (headache, colic, etc.)
VARIABLE DATA 2
Statistically, variable could be:
- Quantitative variable:
a- Continuous quantitative
b- Discrete quantitative
- Qualitative variable:
a- Nominal qualitative
b- Ordinal qualitative
VARIABLE DATA 3
1- Quantitative variable:
These may be continuous or discrete.
a- Continuous quantitative variable:
Obtained by measurement; its value may be an integer or a
fractional value.
Examples: Weight, height, Hgb, age, volume of urine.
b- Discrete quantitative variable:
Obtained by enumeration (counting); its value is
always an integer.
Examples: Pulse, family size, number of live births.
[Figure: Continuous & Discrete Variables. Panels: Continuous Variable (number line from -3 to 3); Discrete Variable (points 0, 1, 2, 3).]
VARIABLE DATA 4
2- Qualitative variable:
These are expressed as qualities and cannot be
enumerated or measured; they can only be categorized.
They can be nominal or ordinal.
a- Nominal qualitative: cannot be put in order, and is
further subdivided into dichotomous (e.g. sex:
male/female, and Yes/No variables) and
multichotomous (e.g. blood groups: A, B, AB, O).
b- Ordinal qualitative: can be put in order, e.g. degree of
success, level of education, stage of disease.
VARIABLE DATA 5
Epidemiologically, a variable could be:
Dependent Variable:
Usually the health outcome(s) that you are studying.
Independent Variables:
Risk factors, causal factors, experimental treatment,
and other relevant factors. They are also termed
“predictors”.
e.g. Lung cancer is the dependent variable while
smoking is the independent variable.
Section (2.4) :
Descriptive Statistics
Measures of Central
Tendency
Page 38 - 41
key words:
Descriptive Statistic, measure of
central tendency ,statistic, parameter,
mean (μ) ,median, mode.
The Statistic and The Parameter
• A Statistic:
It is a descriptive measure computed from the
data of a sample.
• A Parameter:
It is a descriptive measure computed from
the data of a population.
Since it is difficult to measure a parameter from the
population, a sample of size n is drawn, whose
values are $x_1, x_2, \ldots, x_n$. From these data, we
compute the statistic.
Measures of Central Tendency
A measure of central tendency is a measure which
indicates where the middle of the data is.
The three most commonly used measures of central
tendency are:
The Mean, the Median, and the Mode.
The Mean:
It is the average of the data.
The Population Mean:
$\mu = \frac{1}{N}\sum_{i=1}^{N} X_i$, which is usually unknown, so we use the
sample mean to estimate or approximate it.
The Sample Mean:
$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$
Example:
Here is a random sample of size 10 of ages, where
$x_1 = 42$, $x_2 = 28$, $x_3 = 28$, $x_4 = 61$, $x_5 = 31$,
$x_6 = 23$, $x_7 = 50$, $x_8 = 34$, $x_9 = 32$, $x_{10} = 37$.
$\bar{x} = (42 + 28 + \ldots + 37) / 10 = 36.6$
Properties of the Mean:
• Uniqueness. For a given set of data there is
one and only one mean.
• Simplicity. It is easy to understand and to
compute.
• Affected by extreme values, since all
values enter into the computation.
Example: Assume the values are 115, 110, 119, 117, 121 and
126. The mean = 118.
But assume that the values are 75, 75, 80, 80 and 280. The
mean = 118, a value that is not representative of the set of
data as a whole.
The Median:
When the data are ordered, it is the observation that divides the
set of observations into two equal parts, such that half of
the data are before it and the other half are after it.
* If n is odd, the median is the middle observation. It
is the (n+1)/2 th ordered observation.
When n = 11, the median is the 6th observation.
* If n is even, there are two middle observations. The median
is the mean of these two middle observations; position
(n+1)/2 falls halfway between them.
When n = 12, the median is at position 6.5, which
is halfway between the 6th and 7th ordered
observations.
Example:
For the same random sample, the ordered
observations will be as:
23, 28, 28, 31, 32, 34, 37, 42, 50, 61.
Since n = 10, then the median is the 5.5th
observation, i.e. = (32+34)/2 = 33.
Properties of the Median:
• Uniqueness. For a given set of data there is
one and only one median.
• Simplicity. It is easy to calculate.
• It is not affected by extreme values as
is the mean.
The Mode:
It is the value which occurs most frequently.
If all values are different, there is no mode.
Sometimes there is more than one mode.
Example:
For the same random sample, the value 28 is
repeated two times, so it is the mode.
Properties of the Mode:
• Sometimes, it is not unique.
• It may be used for describing qualitative
data.
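The three measures of central tendency can be checked with a short sketch. It uses only Python's standard `statistics` module and the ages from the example above; the variable names are illustrative assumptions, not part of the textbook.

```python
from statistics import mean, median, multimode

# Random sample of 10 ages from the example above
ages = [42, 28, 28, 61, 31, 23, 50, 34, 32, 37]

sample_mean = mean(ages)        # sum of the values divided by n
sample_median = median(ages)    # n is even, so the two middle values are averaged
sample_modes = multimode(ages)  # list of the most frequent value(s)

# The second data set from "Properties of the Mean": one extreme value
skewed = [75, 75, 80, 80, 280]
skewed_mean = mean(skewed)      # pulled upward by the value 280
skewed_median = median(skewed)  # unaffected by the extreme value

print(sample_mean, sample_median, sample_modes)
print(skewed_mean, skewed_median)
```

`multimode` returns a list, which matches the remark that the mode is sometimes not unique.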
Section (2.5) :
Descriptive Statistics
Measures of Dispersion
Page 43 - 46
key words:
Descriptive Statistic, measure of
dispersion , range ,variance, coefficient of
variation.
2.5. Descriptive Statistics –
Measures of Dispersion:
• A measure of dispersion conveys information
regarding the amount of variability present in a set of
data.
• Note:
1. If all the values are the same
→ There is no dispersion.
2. If the values are not all the same
→ There is dispersion:
a) If the values are close to each other
→ The amount of dispersion is small.
b) If the values are widely scattered
→ The dispersion is greater.
Ex. Figure 2.5.1 –Page 43
• ** Measures of Dispersion are :
1.Range (R).
2. Variance.
3. Standard deviation.
4.Coefficient of variation (C.V).
1.The Range (R):
• Range = Largest value - Smallest value = $x_L - x_S$
• Note:
• The range depends on only two values.
• Example 2.5.1 Page 40:
• Refer to Ex 2.4.2. Page 37
• Data:
• 43, 66, 61, 64, 65, 38, 59, 57, 57, 50.
• Find the range.
• Range = 66 - 38 = 28
2.The Variance:
• It measures dispersion relative to the scatter of the values
about their mean.
a) Sample Variance ($S^2$):
• $S^2 = \frac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{n-1}$, where $\bar{x}$ is the sample mean
• Example 2.5.2 Page 40:
• Refer to Ex 2.4.2. Page 37
• Find the sample variance of the ages, $\bar{x} = 56$
• Solution:
• $S^2 = [(43-56)^2 + (66-56)^2 + \ldots + (50-56)^2] / (10-1)$
• = 810/9 = 90
• b) Population Variance ($\sigma^2$):
• $\sigma^2 = \frac{\sum_{i=1}^{N}(x_i - \mu)^2}{N}$, where $\mu$ is the population mean
3.The Standard Deviation:
• It is the square root of the variance = $\sqrt{\text{Variance}}$
a) Sample Standard Deviation = S = $\sqrt{S^2}$
b) Population Standard Deviation = σ = $\sqrt{\sigma^2}$
4.The Coefficient of Variation
(C.V):
• It is a measure used to compare the
dispersion in two sets of data, and it is
independent of the unit of
measurement.
• $C.V = \frac{S}{\bar{X}}(100)$, where S: Sample standard
deviation.
• $\bar{X}$: Sample mean.
Example 2.5.3 Page 46:
• Suppose two samples of human males yield the
following data:
                     Sample 1        Sample 2
Age                  25-year-olds    11-year-olds
Mean weight          145 pounds      80 pounds
Standard deviation   10 pounds       10 pounds
• We wish to know which is more variable.
• Solution:
• C.V (Sample 1) = (10/145)*100% = 6.9%
• C.V (Sample 2) = (10/80)*100% = 12.5%
• Hence the weights of the 11-year-olds (Sample 2) show more
relative variation.
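The measures of dispersion can be reproduced with a short sketch. It assumes the ten ages from Example 2.5.1 and relies on `statistics.variance`, which uses the n − 1 divisor that defines the sample variance; the helper name `coefficient_of_variation` is an illustrative assumption.

```python
from statistics import mean, stdev, variance

# Ages from Example 2.5.1
ages = [43, 66, 61, 64, 65, 38, 59, 57, 57, 50]

data_range = max(ages) - min(ages)   # largest value minus smallest value
x_bar = mean(ages)                   # sample mean
s_squared = variance(ages)           # sample variance (divides by n - 1)
s = stdev(ages)                      # sample standard deviation


def coefficient_of_variation(std_dev, sample_mean):
    """Return the C.V. as a percentage: (S / mean) * 100."""
    return std_dev / sample_mean * 100


# Example 2.5.3: same standard deviation, different mean weights
cv_sample1 = coefficient_of_variation(10, 145)
cv_sample2 = coefficient_of_variation(10, 80)

print(data_range, x_bar, s_squared, round(s, 4))
print(round(cv_sample1, 1), round(cv_sample2, 1))
```

Because the C.V. is unit-free, the two samples can be compared even though their means differ widely.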
Chapter 4:
Probabilistic features of
certain data Distributions
Pages 93- 111
Key words
Probability distribution, random variable,
Bernoulli distribution, Binomial distribution,
Poisson distribution
The Random Variable (X):
When the values of a variable (height,
weight, or age) can’t be predicted in
advance, the variable is called a random
variable.
An example is the adult height.
When a child is born, we can’t predict
exactly his or her height at maturity.
4.2 Probability Distributions for
Discrete Random Variables
Definition:
The probability distribution of a
discrete random variable is a table,
graph, formula, or other device used
to specify all possible values of a
discrete random variable along with
their respective probabilities.
The Cumulative Probability
Distribution of X, F(x):
It shows the probability that the
variable X is less than or equal to a
certain value, P(X ≤ x).
Example 4.2.1 page 94:
Number of Programs (x)   Frequency   P(X = x)   F(x) = P(X ≤ x)
1                        62          0.2088     0.2088
2                        47          0.1582     0.3670
3                        39          0.1313     0.4983
4                        39          0.1313     0.6296
5                        58          0.1953     0.8249
6                        37          0.1246     0.9495
7                        4           0.0135     0.9630
8                        11          0.0370     1.0000
Total                    297         1.0000
See figure 4.2.1 page 96
See figure 4.2.2 page 97
Properties of the probability distribution
of a discrete random variable:
1. $0 \le P(X = x) \le 1$
2. $\sum P(X = x) = 1$
3. P(a ≤ X ≤ b) = P(X ≤ b) – P(X ≤ a-1)
4. P(X < b) = P(X ≤ b-1)
Example 4.2.2 page 96: (use table
in example 4.2.1)
What is the probability that a randomly
selected family will be one who used
three assistance programs?
Example 4.2.3 page 96: (use table
in example 4.2.1)
What is the probability that a randomly
selected family used either one or two
programs?
Example 4.2.4 page 98: (use table in
example 4.2.1)
What is the probability that a family picked
at random will be one who used two or
fewer assistance programs?
Example 4.2.5 page 98: (use table in
example 4.2.1)
What is the probability that a randomly
selected family will be one who used fewer
than four programs?
Example 4.2.6 page 98: (use table in
example 4.2.1)
What is the probability that a randomly
selected family used five or more
programs?
Example 4.2.7 page 98: (use table
in example 4.2.1)
What is the probability that a randomly
selected family is one who used
between three and five programs,
inclusive?
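Examples 4.2.2 to 4.2.7 can all be answered from the table in Example 4.2.1. The following is a minimal sketch that rebuilds the distribution from the frequencies and applies the properties P(a ≤ X ≤ b) = P(X ≤ b) – P(X ≤ a-1) and P(X < b) = P(X ≤ b-1); the function and variable names are assumptions made for illustration.

```python
# Frequencies from the table in Example 4.2.1
frequency = {1: 62, 2: 47, 3: 39, 4: 39, 5: 58, 6: 37, 7: 4, 8: 11}
total = sum(frequency.values())  # 297 families

# P(X = x): relative frequency of each number of programs
pmf = {x: count / total for x, count in frequency.items()}


def cdf(x):
    """Cumulative probability F(x) = P(X <= x)."""
    return sum(p for value, p in pmf.items() if value <= x)


p_exactly_3 = pmf[3]              # Example 4.2.2
p_1_or_2 = pmf[1] + pmf[2]        # Example 4.2.3
p_at_most_2 = cdf(2)              # Example 4.2.4
p_fewer_than_4 = cdf(3)           # Example 4.2.5: P(X < 4) = P(X <= 3)
p_5_or_more = 1 - cdf(4)          # Example 4.2.6: complement of P(X <= 4)
p_3_to_5 = cdf(5) - cdf(2)        # Example 4.2.7: P(X <= 5) - P(X <= 2)

for name, value in [("exactly 3", p_exactly_3), ("1 or 2", p_1_or_2),
                    ("at most 2", p_at_most_2), ("fewer than 4", p_fewer_than_4),
                    ("5 or more", p_5_or_more), ("3 to 5", p_3_to_5)]:
    print(f"{name}: {value:.4f}")
```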
4.3 The Binomial Distribution:
The binomial distribution is one of the most
widely encountered probability distributions
in applied statistics. It is derived from a
process known as a Bernoulli trial.
Bernoulli trial is :
When a random process or experiment
called a trial can result in only one of two
mutually exclusive outcomes, such as dead
or alive, sick or well, the trial is called a
Bernoulli trial.
The Bernoulli Process
A sequence of Bernoulli trials forms a Bernoulli
process under the following conditions
1- Each trial results in one of two possible,
mutually exclusive, outcomes. One of the
possible outcomes is denoted (arbitrarily) as a
success, and the other is denoted a failure.
2- The probability of a success, denoted by p,
remains constant from trial to trial. The
probability of a failure, 1-p, is denoted by q.
3- The trials are independent, that is the outcome
of any particular trial is not affected by the
outcome of any other trial
The probability distribution of the binomial
random variable X, the number of
successes in n independent trials, is:
$f(x) = P(X = x) = \binom{n}{x} p^{x} q^{n-x}, \quad x = 0, 1, 2, \ldots, n$
where $\binom{n}{x}$ is the number of combinations
of n distinct objects taken x of them at a
time:
$\binom{n}{x} = \frac{n!}{x!(n-x)!}$, with $x! = x(x-1)(x-2)\cdots(1)$
* Note: 0! = 1
Properties of the binomial
distribution
1. $f(x) \ge 0$
2. $\sum f(x) = 1$
3. The parameters of the binomial
distribution are n and p
4. $\mu = E(X) = np$
5. $\sigma^2 = \mathrm{var}(X) = np(1-p)$
Example 4.3.1 page 100
If we examine all birth records from the North
Carolina State Center for Health statistics for
year 2001, we find that 85.8 percent of the
pregnancies had delivery in week 37 or later
(full- term birth).
If we randomly selected five birth records from
this population what is the probability that
exactly three of the records will be for full-term
births?
Exercise: example 4.3.2 page 104
Example 4.3.3 page 104
Suppose it is known that in a certain
population 10 percent of the population is
color blind. If a random sample of 25
people is drawn from this population, find
the probability that
a) Five or fewer will be color blind.
b) Six or more will be color blind
c) Between six and nine inclusive will be color
blind.
d) Two, three, or four will be color blind.
Exercise: example 4.3.4 page 106
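Examples 4.3.1 and 4.3.3 can be worked directly from the binomial formula. The sketch below assumes only the standard `math.comb` function; `binomial_pmf` and `binomial_cdf` are hypothetical helper names, and cumulative probabilities are summed term by term rather than read from a table.

```python
from math import comb


def binomial_pmf(x, n, p):
    """P(X = x) for a binomial variable with n trials and success probability p."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)


def binomial_cdf(x, n, p):
    """P(X <= x), obtained by summing the pmf from 0 to x."""
    return sum(binomial_pmf(k, n, p) for k in range(x + 1))


# Example 4.3.1: five birth records, p = 0.858 for a full-term birth
p_three_full_term = binomial_pmf(3, 5, 0.858)

# Example 4.3.3: n = 25 people, p = 0.10 color blind
n, p = 25, 0.10
p_five_or_fewer = binomial_cdf(5, n, p)                        # (a)
p_six_or_more = 1 - binomial_cdf(5, n, p)                      # (b)
p_six_to_nine = binomial_cdf(9, n, p) - binomial_cdf(5, n, p)  # (c)
p_two_to_four = binomial_cdf(4, n, p) - binomial_cdf(1, n, p)  # (d)

print(f"{p_three_full_term:.4f}")
print(f"{p_five_or_fewer:.4f} {p_six_or_more:.4f} "
      f"{p_six_to_nine:.4f} {p_two_to_four:.4f}")
```

Parts (b) to (d) reuse the cumulative function, mirroring how a cumulative binomial table is used by hand.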
4.4 The Poisson Distribution
Let the random variable X be the number of
occurrences of some random event in a certain
period of time or space (or some volume of
matter).
The probability distribution of X is then given by:
$f(x) = P(X = x) = \frac{e^{-\lambda}\lambda^{x}}{x!}, \quad x = 0, 1, \ldots$
The symbol e is the constant approximately equal to 2.7183.
λ (lambda) is called the parameter of the
distribution and is the average number of
occurrences of the random event in the interval
(or volume).
Properties of the Poisson
distribution
1. $f(x) \ge 0$
2. $\sum f(x) = 1$
3. $\mu = E(X) = \lambda$
4. $\sigma^2 = \mathrm{var}(X) = \lambda$
Example 4.4.1 page 111
In a study of drug-induced anaphylaxis
among patients taking rocuronium bromide
as part of their anesthesia, Laake and
Rottingen found that the occurrence of
anaphylaxis followed a Poisson model with
λ = 12 incidents per year in Norway. Find
1- The probability that in the next year,
among patients receiving rocuronium,
exactly three will experience anaphylaxis.
2- The probability that less than two patients
receiving rocuronium, in the next year will
experience anaphylaxis?
3- The probability that more than two patients
receiving rocuronium, in the next year will
experience anaphylaxis?
4- The expected value of patients receiving
rocuronium, in the next year who will
experience anaphylaxis.
5- The variance of patients receiving
rocuronium, in the next year who will
experience anaphylaxis
6- The standard deviation of patients receiving
rocuronium, in the next year who will
experience anaphylaxis
Example 4.4.2 page 111: Refer to
example 4.4.1
1-What is the probability that at least three
patients in the next year will experience
anaphylaxis if rocuronium is administered
with anesthesia?
2-What is the probability that exactly one
patient in the next year will experience
anaphylaxis if rocuronium is administered
with anesthesia?
3-What is the probability that none of the
patients in the next year will experience
anaphylaxis if rocuronium is administered
with anesthesia?
4-What is the probability that at most
two patients in the next year will
experience anaphylaxis if rocuronium
is administered with anesthesia?
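The questions in Examples 4.4.1 and 4.4.2 follow from the Poisson formula with λ = 12. The sketch below is one way to evaluate them using only the standard `math` module; the helper names `poisson_pmf` and `poisson_cdf` are assumptions made for illustration.

```python
from math import exp, factorial, sqrt


def poisson_pmf(x, lam):
    """P(X = x) for a Poisson variable with mean lam."""
    return exp(-lam) * lam**x / factorial(x)


def poisson_cdf(x, lam):
    """P(X <= x), obtained by summing the pmf from 0 to x."""
    return sum(poisson_pmf(k, lam) for k in range(x + 1))


lam = 12  # average number of incidents per year

p_exactly_3 = poisson_pmf(3, lam)
p_less_than_2 = poisson_cdf(1, lam)      # P(X < 2) = P(X <= 1)
p_more_than_2 = 1 - poisson_cdf(2, lam)  # P(X > 2), same as "at least three"
p_exactly_1 = poisson_pmf(1, lam)
p_none = poisson_pmf(0, lam)
p_at_most_2 = poisson_cdf(2, lam)

expected_value = lam          # E(X) = lambda
variance = lam                # var(X) = lambda
std_deviation = sqrt(lam)     # square root of the variance

print(f"{p_exactly_3:.5f} {p_less_than_2:.6f} {p_more_than_2:.5f}")
print(expected_value, variance, round(std_deviation, 4))
```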
Exercises: examples 4.4.3, 4.4.4
and 4.4.5 pages111-113
Exercises: Questions 4.3.4 ,4.3.5,
4.3.7 ,4.4.1,4.4.5
4.5 Continuous
Probability Distribution
Pages 114 – 127
• Key words:
Continuous random variable, normal
distribution , standard normal
distribution , T-distribution
• Now consider distributions of
continuous random variables.
Properties of continuous
probability distributions:
1- Area under the curve = 1.
2- P(X = a) = 0, where a is a constant.
3- Area between two points a, b =
P(a < X < b).
4.6 The normal distribution:
• It is one of the most important probability
distributions in statistics.
• The normal density is given by
• $f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}$, - ∞ < x < ∞, - ∞ < µ < ∞, σ > 0
• π, e : constants
• µ: population mean.
• σ : Population standard deviation.
Characteristics of the normal
distribution: Page 111
• The following are some important
characteristics of the normal distribution:
1- It is symmetrical about its mean, µ.
2- The mean, the median, and the mode are all
equal.
3- The total area under the curve above the
x-axis is one.
4-The normal distribution is completely
determined by the parameters µ and σ.
5- The normal distribution
depends on the two
parameters µ and σ.
µ determines the
location of
the curve
(as seen in Figure 4.6.3),
whereas σ determines
the scale of the curve, i.e.
the degree of flatness or
peakedness of the curve
(as seen in Figure 4.6.4).
[Figure 4.6.3: three normal curves with means µ1 < µ2 < µ3]
[Figure 4.6.4: three normal curves with standard deviations σ1 < σ2 < σ3]
Note that : (As seen in Figure
4.6.2)
1. P( µ- σ < x < µ+ σ) = 0.68
2. P( µ- 2σ< x < µ+ 2σ)= 0.95
3. P( µ-3σ < x < µ+ 3σ) = 0.997
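These three areas do not depend on µ or σ, so they can be checked once on the standard normal scale. The sketch below assumes the identity P(-k < Z < k) = erf(k/√2), which relates the normal distribution to the error function in Python's `math` module; `prob_within` is a hypothetical helper name.

```python
from math import erf, sqrt


def prob_within(k):
    """P(mu - k*sigma < X < mu + k*sigma) for any normal distribution."""
    return erf(k / sqrt(2))


within_1 = prob_within(1)
within_2 = prob_within(2)
within_3 = prob_within(3)

for k, prob in [(1, within_1), (2, within_2), (3, within_3)]:
    print(f"within {k} standard deviation(s): {prob:.4f}")
```

The computed values agree with 0.68, 0.95 and 0.997 after rounding.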
The Standard normal
distribution:
• It is a special case of the normal distribution
with a mean of 0 and a standard deviation
of 1.
• The equation for the standard normal
distribution is written as
• $f(z) = \frac{1}{\sqrt{2\pi}}\, e^{-\frac{z^2}{2}}$, - ∞ < z < ∞
Characteristics of the
standard normal distribution
1- It is symmetrical about 0.
2- The total area under the curve
above the x-axis is one.
3- We can use table (D) to find the
probabilities and areas.
“How to use tables of Z”
Note that
The cumulative probabilities P(Z ≤ z) are given in
tables for -3.49 < z < 3.49. Thus,
P (-3.49 < Z < 3.49) ≈ 1.
For the standard normal distribution,
P (Z > 0) = P (Z < 0) = 0.5
Example 4.6.1:
If Z has a standard normal distribution, then
1) P( Z < 2) = 0.9772
is the area to the left of 2
and it equals 0.9772.
Example 4.6.2:
P(-2.55 < Z < 2.55) is the area between
-2.55 and 2.55, Then it equals
P(-2.55 < Z < 2.55) =0.9946 – 0.0054
= 0.9892.
Example 4.6.2:
P(-2.74 < Z < 1.53) is the area between
-2.74 and 1.53.
P(-2.74 < Z < 1.53) =0.9370 – 0.0031
= 0.9339.
Example 4.6.3:
P(Z > 2.71) is the area to the right of 2.71.
So,
P(Z > 2.71) = 1 – 0.9966 = 0.0034.
Example :
P(Z = 0.84) is the area at the single point z = 0.84.
A single point has no area under the curve, so
P(Z = 0.84) = 0.
How to transform normal
distribution (X) to standard
normal distribution (Z)?
• This is done by the following formula:
$z = \frac{x - \mu}{\sigma}$
• Example:
• If X is normal with µ = 3, σ = 2, find the
value of the standard normal Z if X = 6.
• Answer:
$z = \frac{x - \mu}{\sigma} = \frac{6 - 3}{2} = 1.5$
4.7 Normal Distribution Applications
The normal distribution can be used to model the distribution of
many variables that are of interest. This allows us to answer
probability questions about these random variables.
Example 4.7.1:
The ‘Uptime’ is a custom-made, lightweight, battery-operated
activity monitor that records the amount of time an individual
spends in the upright position. In a study of children ages 8 to 15
years, the researchers found that the amount of time children
spend in the upright position followed a normal distribution with
a mean of 5.4 hours and a standard deviation of 1.3 hours. Find:
If a child is selected at random, then
1- The probability that the child spends less than 3
hours in the upright position in a 24-hour period:
P( X < 3) = P( $\frac{X-\mu}{\sigma}$ < $\frac{3-5.4}{1.3}$ ) = P(Z < -1.85) = 0.0322
-------------------------------------------------------------------------
2- The probability that the child spends more than 5
hours in the upright position in a 24-hour period:
P( X > 5) = P( $\frac{X-\mu}{\sigma}$ > $\frac{5-5.4}{1.3}$ ) = P(Z > -0.31)
= 1- P(Z < - 0.31) = 1- 0.3783 = 0.6217
-----------------------------------------------------------------------
3- The probability that the child spends exactly 6.2
hours in the upright position in a 24-hour period:
P( X = 6.2) = 0
4- The probability that the child spends from 4.5 to
7.3 hours in the upright position in a 24-hour period:
P( 4.5 < X < 7.3) = P( $\frac{4.5-5.4}{1.3}$ < $\frac{X-\mu}{\sigma}$ < $\frac{7.3-5.4}{1.3}$ )
= P( -0.69 < Z < 1.46 ) = P(Z<1.46) – P(Z< -0.69)
= 0.9279 – 0.2451 = 0.6828
• Homework: Ex. 4.7.2 – 4.7.3
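The probabilities in Example 4.7.1 can also be computed without a Z table. This is a minimal sketch using `statistics.NormalDist` from the Python standard library with the mean 5.4 and standard deviation 1.3 from the example; the variable names are illustrative assumptions. Because z is not rounded to two decimals here, the results differ slightly in the third or fourth decimal place from table lookups.

```python
from statistics import NormalDist

# Upright time in hours: normal with mean 5.4 and standard deviation 1.3
uptime = NormalDist(mu=5.4, sigma=1.3)

# Standardizing by hand: z = (x - mu) / sigma
z_for_3 = (3 - uptime.mean) / uptime.stdev

p_less_than_3 = uptime.cdf(3)                  # P(X < 3)
p_more_than_5 = 1 - uptime.cdf(5)              # P(X > 5)
p_between = uptime.cdf(7.3) - uptime.cdf(4.5)  # P(4.5 < X < 7.3)

# The same probability on the standard normal scale
p_less_than_3_via_z = NormalDist().cdf(z_for_3)

print(f"z = {z_for_3:.2f}")
print(f"{p_less_than_3:.4f} {p_more_than_5:.4f} {p_between:.4f}")
```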
6.3 The T Distribution:
(Pages 167-173)
1- It has a mean of zero.
2- It is symmetric about the
mean.
3- It ranges from -∞ to ∞.
4- compared to the normal distribution,
the t distribution is less peaked in the
center and has higher tails.
5- It depends on the degrees of freedom
(n-1).
6- The t distribution approaches the
standard normal distribution as (n-1)
approaches ∞.
Examples
t (7, 0.975) = 2.3646
(area 0.975 to the left of t, 0.025 to the right)
------------------------------
t (24, 0.995) = 2.7969
(area 0.995 to the left of t, 0.005 to the right)
--------------------------
If P (T(18) > t) = 0.975,
then t = -2.1009
(area 0.025 to the left of t, 0.975 to the right)
-------------------------
If P (T(22) < t) = 0.99,
then t = 2.508
(area 0.99 to the left of t, 0.01 to the right)
Chapter 7
Using sample statistics to
Test Hypotheses
about population
parameters
Pages 215-233
Key words:
Null hypothesis H0, Alternative hypothesis HA,
testing hypothesis, test statistic, P-value
Hypothesis Testing
• One type of statistical inference, estimation,
was discussed in Chapter 6.
• The other type, hypothesis testing, is discussed
in this chapter.
Definition of a hypothesis
• It is a statement about one or more populations.
It is usually concerned with the parameters of
the population, e.g. the hospital administrator
may want to test the hypothesis that the average
length of stay of patients admitted to the
hospital is 5 days.
Definition of Statistical hypotheses
• They are hypotheses that are stated in such a way that
they may be evaluated by appropriate statistical
techniques.
• There are two hypotheses involved in hypothesis
testing:
• Null hypothesis H0: It is the hypothesis to be tested.
• Alternative hypothesis HA: It is a statement of what
we believe is true if our sample data cause us to reject
the null hypothesis.
7.2 Testing a hypothesis about the
mean of a population:
• We have the following steps:
1. Data: determine the variable, sample size (n), sample
mean ($\bar{x}$), and population standard deviation (σ), or sample
standard deviation (s) if σ is unknown.
2. Assumptions: We have two cases:
• Case 1: Population is normally or approximately
normally distributed with known or unknown
variance (sample size n may be small or large),
• Case 2: Population is not normal with known or
unknown variance (n is large, i.e. n ≥ 30).
3. Hypotheses:
• We have three cases
• Case I : H0: μ = μ0
HA: μ ≠ μ0
• e.g. we want to test whether the population mean is
different from 50
• Case II : H0: μ = μ0
HA: μ > μ0
• e.g. we want to test whether the population mean is greater
than 50
• Case III : H0: μ = μ0
HA: μ < μ0
• e.g. we want to test whether the population mean is less
than 50
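4.Test Statistic:
• Case 1: population is normal or approximately
normal
- σ² is known (n large or small): $Z = \frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}}$
- σ² is unknown, n large: $Z = \frac{\bar{X} - \mu_0}{s/\sqrt{n}}$
- σ² is unknown, n small: $T = \frac{\bar{X} - \mu_0}{s/\sqrt{n}}$
• Case 2: If population is not normally distributed and n is
large
i) If σ² is known: $Z = \frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}}$
ii) If σ² is unknown: $Z = \frac{\bar{X} - \mu_0}{s/\sqrt{n}}$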
5.Decision Rule:
i) If HA: μ ≠ μ0
• Reject H0 if $Z > Z_{1-\alpha/2}$ or $Z < -Z_{1-\alpha/2}$
(when using the Z-test)
Or reject H0 if $T > t_{1-\alpha/2,\,n-1}$ or $T < -t_{1-\alpha/2,\,n-1}$
(when using the T-test)
__________________________
ii) If HA: μ > μ0
• Reject H0 if $Z > Z_{1-\alpha}$ (when using the Z-test)
Or reject H0 if $T > t_{1-\alpha,\,n-1}$ (when using the T-test)
iii) If HA: μ < μ0
• Reject H0 if $Z < -Z_{1-\alpha}$ (when using the Z-test)
Or
reject H0 if $T < -t_{1-\alpha,\,n-1}$ (when using the T-test)
Note:
$Z_{1-\alpha/2}$, $Z_{1-\alpha}$, $Z_{\alpha}$ are tabulated values obtained
from table D
$t_{1-\alpha/2}$, $t_{1-\alpha}$, $t_{\alpha}$ are tabulated values obtained from
table E with (n-1) degrees of freedom (df)
6.Decision:
• If we reject H0, we can conclude that HA is
true.
• If, however, we do not reject H0, we
conclude that H0 may be true.
Text Book : Basic Concepts and
Methodology for the Health Sciences
96
An Alternative Decision Rule Using the p-value
• Definition: the p-value is the smallest value of
α for which the null hypothesis can be rejected.
• If the p-value is less than or equal to α, we
reject the null hypothesis (p ≤ α)
• If the p-value is greater than α, we do not
reject the null hypothesis (p > α)
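As a rough sketch of how a p-value can be obtained for a Z statistic, the snippet below uses the standard normal distribution from Python's standard library. The function name `z_p_value`, the alternative labels, and the illustrative value Z = -2.12 are assumptions made for this example.

```python
from statistics import NormalDist

def z_p_value(z, alternative):
    """p-value of an observed Z statistic under the standard normal."""
    cdf = NormalDist().cdf
    if alternative == "two-sided":   # HA: mu != mu0
        return 2 * (1 - cdf(abs(z)))
    if alternative == "greater":     # HA: mu > mu0
        return 1 - cdf(z)
    if alternative == "less":        # HA: mu < mu0
        return cdf(z)
    raise ValueError(f"unknown alternative: {alternative!r}")

alpha = 0.05
p_two_sided = z_p_value(-2.12, "two-sided")
reject_h0 = p_two_sided <= alpha     # the p-value rule: reject when p <= alpha
print(round(p_two_sided, 3), reject_h0)  # 0.034 True
```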
Example 7.2.1 Page 223
• Researchers are interested in the mean age of a
certain population.
• A random sample of 10 individuals drawn from the
population of interest has a mean of 27.
• Assuming that the population is approximately
normally distributed with variance 20, can we
conclude that the mean is different from 30 years?
(α = 0.05)
• If the p-value is 0.0340, how can we use it in making
a decision?
Solution
1. Data: variable is age, n = 10, x̄ = 27, σ² = 20, α = 0.05
2. Assumptions: the population is approximately
normally distributed with variance 20
3. Hypotheses:
• H0: μ = 30
• HA: μ ≠ 30
4. Test Statistic:
• $Z = \frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}} = \frac{27 - 30}{\sqrt{20/10}} = -2.12$
5. Decision Rule:
• The alternative hypothesis is
• HA: μ ≠ 30
• Hence we reject H0 if Z > Z1-0.05/2 = Z0.975
• or Z < -Z1-0.05/2 = -Z0.975
• Z0.975 = 1.96 (from table D)
6. Decision:
• We reject H0, since -2.12 is in the rejection
region.
• We can conclude that μ is not equal to 30.
• Using the p-value, we note that p-value
= 0.0340 < 0.05, therefore we reject H0.
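The arithmetic of Example 7.2.1 can be checked with a few lines of Python. This is a sketch only; the variable names are chosen for this illustration, and the normal quantile comes from `statistics.NormalDist` instead of table D. The p-value computed from the unrounded statistic is about 0.0339, which agrees with the tabulated 0.0340 up to rounding of Z.

```python
from math import sqrt
from statistics import NormalDist

# Data of Example 7.2.1: n = 10, sample mean 27, known variance 20, H0: mu = 30.
n, x_bar, sigma2, mu0, alpha = 10, 27, 20, 30, 0.05

z = (x_bar - mu0) / sqrt(sigma2 / n)             # known variance, so a Z statistic
critical = NormalDist().inv_cdf(1 - alpha / 2)   # Z_{0.975}
p_value = 2 * NormalDist().cdf(-abs(z))          # two-sided p-value
reject_h0 = abs(z) > critical

print(round(z, 2), round(critical, 2), reject_h0)  # -2.12 1.96 True
```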
Example 7.2.2 page 227
• Referring to Example 7.2.1, suppose that the
researchers have asked: can we conclude
that μ < 30?
1. Data: see previous example
2. Assumptions: see previous example
3. Hypotheses:
• H0: μ = 30
• HA: μ < 30
4. Test Statistic:
• $Z = \frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}} = \frac{27 - 30}{\sqrt{20/10}} = -2.12$
5. Decision Rule: Reject H0 if Z < Zα, where
• Zα = -Z1-α = -1.645 (from table D), with α = 0.05
6. Decision: Reject H0, since -2.12 < -1.645; thus we can conclude that the
population mean is smaller than 30.
Example 7.2.4 page 232
• Among 157 African-American men, the mean
systolic blood pressure was 146 mm Hg with a
standard deviation of 27. We wish to know if,
on the basis of these data, we may conclude
that the mean systolic blood pressure for a
population of African-American men is greater than
140. Use α = 0.01.
Solution
1. Data: variable is systolic blood pressure,
n = 157, x̄ = 146, s = 27, α = 0.01.
2. Assumption: population is not normal, σ² is
unknown, n is large
3. Hypotheses: H0: μ = 140
HA: μ > 140
4. Test Statistic:
• $Z = \frac{\bar{X} - \mu_0}{s/\sqrt{n}} = \frac{146 - 140}{27/\sqrt{157}} = \frac{6}{2.1548} = 2.78$
5. Decision Rule:
we reject H0 if Z > Z1-α
= Z0.99 = 2.33
(from table D)
6. Decision: We reject H0, since 2.78 > 2.33.
Hence we may conclude that the mean systolic
blood pressure for a population of African-American
men is greater than 140.
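A short Python check of Example 7.2.4 follows. It is a sketch with variable names chosen for this illustration; because n is large and σ is unknown, the sample standard deviation s stands in for σ in the Z statistic, and the critical value is taken from `statistics.NormalDist` in place of table D.

```python
from math import sqrt
from statistics import NormalDist

# Data of Example 7.2.4: n = 157, sample mean 146, s = 27, H0: mu = 140.
n, x_bar, s, mu0, alpha = 157, 146, 27, 140, 0.01

standard_error = s / sqrt(n)                 # about 2.1548
z = (x_bar - mu0) / standard_error
critical = NormalDist().inv_cdf(1 - alpha)   # Z_{0.99}, one-sided test
reject_h0 = z > critical                     # HA: mu > 140

print(round(standard_error, 4), round(z, 2), round(critical, 2), reject_h0)
# 2.1548 2.78 2.33 True
```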
7.3 Hypothesis Testing: The Difference
between two population means
• We have the following steps:
1. Data: determine variable, sample sizes (n1, n2), sample means,
population standard deviations or sample standard deviations
(s) if σ is unknown, for the two populations.
2. Assumptions: We have two cases:
• Case 1: Populations are normally or approximately normally
distributed with known or unknown variances (sample sizes
may be small or large),
• Case 2: Populations are not normal with known variances (n
is large, i.e. n ≥ 30).
 3.Hypotheses:
 we have three cases
 Case I : H0: μ 1 = μ2 → μ 1 - μ2 = 0
 HA: μ 1 ≠ μ 2 → μ 1 - μ 2 ≠ 0
 e.g. we want to test that the mean for first population is
different from second population mean.
 Case II : H0: μ 1 = μ2 → μ 1 - μ2 = 0
HA: μ 1 > μ 2 → μ 1 - μ 2 > 0
 e.g. we want to test that the mean for first population is
greater than second population mean.
 Case III : H0: μ 1 = μ2 → μ 1 - μ2 = 0
HA: μ 1 < μ 2 → μ 1 - μ 2 < 0
 e.g. we want to test that the mean for first population
is less than the second population mean.
4. Test Statistic:
• Case 1: the two populations are normal or approximately
normal
  - σ1², σ2² are known (n1, n2 large or small):
    $Z = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}}$
  - σ1², σ2² are unknown (n1, n2 small), population variances equal:
    $T = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{S_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}$
    where
    $S_p^2 = \frac{(n_1 - 1)S_1^2 + (n_2 - 1)S_2^2}{n_1 + n_2 - 2}$
  - σ1², σ2² are unknown (n1, n2 small), population variances not equal:
    $T = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{S_1^2}{n_1} + \frac{S_2^2}{n_2}}}$
• Case 2: the populations are not normally distributed,
n1 and n2 are large (n1 ≥ 30, n2 ≥ 30),
and the population variances are known:
    $Z = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}}$
5. Decision Rule:
i) If HA: μ1 ≠ μ2 → μ1 - μ2 ≠ 0
• Reject H0 if Z > Z1-α/2 or Z < -Z1-α/2 (when using the Z-test),
or reject H0 if T > t1-α/2,(n1+n2-2) or T < -t1-α/2,(n1+n2-2) (when using the T-test)
ii) If HA: μ1 > μ2 → μ1 - μ2 > 0
• Reject H0 if Z > Z1-α (when using the Z-test),
or reject H0 if T > t1-α,(n1+n2-2) (when using the T-test)
iii) If HA: μ1 < μ2 → μ1 - μ2 < 0
• Reject H0 if Z < -Z1-α (when using the Z-test),
or reject H0 if T < -t1-α,(n1+n2-2) (when using the T-test)
Note:
Z1-α/2 , Z1-α , Zα are tabulated values obtained
from table D
t1-α/2 , t1-α , tα are tabulated values obtained from
table E with (n1+n2-2) degrees of freedom (df)
6. Conclusion: reject or fail to reject H0
Example 7.3.1 page 238
• Researchers wish to know if the data they have collected provide
sufficient evidence to indicate a difference in mean serum
uric acid levels between normal individuals and individuals
with Down's syndrome. The data consist of serum uric acid
readings on 12 individuals with Down's syndrome, from a
normal distribution with variance 1, and 15 normal individuals,
from a normal distribution with variance 1.5. The means
are x̄1 = 4.5 mg/100 ml and x̄2 = 3.4 mg/100 ml, and α = 0.05.
Solution:
1. Data: variable is serum uric acid level, n1 = 12, n2 = 15,
σ1² = 1, σ2² = 1.5, α = 0.05.
2. Assumption: the two populations are normal, σ1² and σ2²
are known
3. Hypotheses: H0: μ1 = μ2 → μ1 - μ2 = 0
HA: μ1 ≠ μ2 → μ1 - μ2 ≠ 0
4. Test Statistic:
$Z = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}} = \frac{(4.5 - 3.4) - 0}{\sqrt{\frac{1}{12} + \frac{1.5}{15}}} = 2.57$
5. Decision Rule:
Reject H0 if Z > Z1-α/2 or Z < -Z1-α/2
Z1-α/2 = Z1-0.05/2 = Z0.975 = 1.96 (from table D)
6. Conclusion: Reject H0, since 2.57 > 1.96
Or, using the p-value: p-value = 0.0102 < α = 0.05, so we reject H0
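The two-sample Z statistic of Example 7.3.1 can be reproduced with the sketch below. Variable names are assumptions made for this illustration, and the normal quantile and tail probability come from `statistics.NormalDist` in place of table D.

```python
from math import sqrt
from statistics import NormalDist

# Example 7.3.1: Down's syndrome group (1) and normal group (2), known variances.
n1, n2 = 12, 15
mean1, mean2 = 4.5, 3.4
var1, var2 = 1.0, 1.5
alpha = 0.05

# H0: mu1 - mu2 = 0, so the hypothesized difference subtracted below is 0.
z = ((mean1 - mean2) - 0) / sqrt(var1 / n1 + var2 / n2)
critical = NormalDist().inv_cdf(1 - alpha / 2)   # Z_{0.975}
p_value = 2 * (1 - NormalDist().cdf(abs(z)))     # two-sided
reject_h0 = abs(z) > critical

print(round(z, 2), round(p_value, 4), reject_h0)  # 2.57 0.0102 True
```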
Example 7.3.2 page 240
The purpose of a study by Tam was to investigate wheelchair
maneuvering in individuals with lower-level spinal cord injury (SCI)
and healthy controls (C). Subjects used a modified wheelchair that
incorporated a rigid seat surface to facilitate the specified
experimental measurements. The data for measurements of the
left ischial tuberosity (the thigh bones and how they are affected
by the wheelchair) for SCI and control C are shown below:
C     131  115  124  131  122  117   88  114  150  169
SCI    60  150  130  180  163  130  121  119  130  148
We wish to know if we can conclude, on the
basis of the above data, that the mean of
left ischial tuberosity for control C is lower
than the mean of left ischial tuberosity for SCI.
Assume normal populations with equal
variances. α = 0.05, p-value > 0.10
Solution:
1. Data: nC = 10, nSCI = 10, SC = 21.8, SSCI = 32.2, α = 0.05,
x̄C = 126.1, x̄SCI = 133.1 (calculated from data)
2. Assumption: the two populations are normal, σ1² and σ2² are
unknown but equal
3. Hypotheses: H0: μC = μSCI → μC - μSCI = 0
HA: μC < μSCI → μC - μSCI < 0
4. Test Statistic:
$T = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{S_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}} = \frac{(126.1 - 133.1) - 0}{\sqrt{756.04}\sqrt{\frac{1}{10} + \frac{1}{10}}} = -0.569$
Where,
$S_p^2 = \frac{(n_1 - 1)S_1^2 + (n_2 - 1)S_2^2}{n_1 + n_2 - 2} = \frac{9(21.8)^2 + 9(32.2)^2}{10 + 10 - 2} = 756.04$
5. Decision Rule:
Reject H0 if T < -t1-α,(n1+n2-2)
t1-α,(n1+n2-2) = t0.95,18 = 1.7341 (from table E)
6. Conclusion: Fail to reject H0, since -0.569 > -1.7341
Or
Fail to reject H0, since p-value > 0.10 > α = 0.05 (-0.569 > -t0.90,18 = -1.33)
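The pooled-variance T statistic of Example 7.3.2 can be checked with the sketch below. It works from the summary statistics rather than the raw data. Variable names are assumptions made for this illustration; the SCI standard deviation is taken as 32.2, the value consistent with Sp² = 756.04, and the critical value 1.7341 is the table E entry quoted in the example, since the standard library has no t distribution.

```python
from math import sqrt

# Summary statistics of Example 7.3.2 (controls C and spinal cord injury SCI).
n_c, n_sci = 10, 10
mean_c, mean_sci = 126.1, 133.1
s_c, s_sci = 21.8, 32.2          # 32.2 is assumed, to match Sp^2 = 756.04

# Pooled variance and T statistic for H0: mu_C - mu_SCI = 0.
sp2 = ((n_c - 1) * s_c ** 2 + (n_sci - 1) * s_sci ** 2) / (n_c + n_sci - 2)
t = ((mean_c - mean_sci) - 0) / sqrt(sp2 * (1 / n_c + 1 / n_sci))

t_critical = 1.7341              # t_{0.95,18}, copied from table E
reject_h0 = t < -t_critical      # HA: mu_C < mu_SCI

print(round(sp2, 2), round(t, 3), reject_h0)  # 756.04 -0.569 False
```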
Example 7.3.3 page 241
Dernellis and Panaretou examined subjects with hypertension
and healthy control subjects. One of the variables of interest was
the aortic stiffness index. Measures of this variable were
calculated from the aortic diameter evaluated by M-mode and
blood pressure measured by a sphygmomanometer. Physicians wish
to reduce aortic stiffness. In the 15 patients with hypertension
(Group 1), the mean aortic stiffness index was 19.16 with a
standard deviation of 5.29. In the 30 control subjects (Group 2), the
mean aortic stiffness index was 9.53 with a standard deviation of
2.69. We wish to determine if the two populations represented by
these samples differ with respect to mean stiffness index.
For the IgG data on persons with and without thrombosis, we wish
to know if we can conclude that, in general, persons with
thrombosis have on average higher IgG levels than persons
without thrombosis, at α = 0.01, p-value = 0.0559
Solution:
1. Data: n1 = 53, n2 = 54, x̄1 = 59.01, x̄2 = 46.61, S1 = 44.89, S2 = 34.85, α = 0.01.

Group           Mean IgG level   Sample size   Standard deviation
Thrombosis      59.01            53            44.89
No thrombosis   46.61            54            34.85

2. Assumption: the two populations are not normal, σ1² and σ2²
are unknown and the sample sizes are large
3. Hypotheses: H0: μ1 = μ2 → μ1 - μ2 = 0
HA: μ1 > μ2 → μ1 - μ2 > 0
4. Test Statistic:
$Z = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{S_1^2}{n_1} + \frac{S_2^2}{n_2}}} = \frac{(59.01 - 46.61) - 0}{\sqrt{\frac{44.89^2}{53} + \frac{34.85^2}{54}}} = 1.59$
5. Decision Rule:
Reject H0 if Z > Z1-α
Z1-α = Z0.99 = 2.33 (from table D)
6. Conclusion: Fail to reject H0, since 1.59 < 2.33
Or
Fail to reject H0, since p = 0.0559 > α = 0.01
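A quick Python check of the large-sample Z statistic for the thrombosis IgG data is sketched below. Variable names are assumptions made for this illustration; the sample standard deviations replace the unknown population values because both samples are large.

```python
from math import sqrt
from statistics import NormalDist

# IgG levels: thrombosis group (1) and no-thrombosis group (2).
n1, n2 = 53, 54
mean1, mean2 = 59.01, 46.61
s1, s2 = 44.89, 34.85
alpha = 0.01

z = ((mean1 - mean2) - 0) / sqrt(s1 ** 2 / n1 + s2 ** 2 / n2)
critical = NormalDist().inv_cdf(1 - alpha)   # Z_{0.99}
p_value = 1 - NormalDist().cdf(z)            # one-sided, HA: mu1 > mu2
reject_h0 = z > critical

print(round(z, 2), round(critical, 2), reject_h0)  # 1.59 2.33 False
```

The p-value from the unrounded statistic is close to the 0.0559 quoted in the example, which corresponds to Z rounded to 1.59.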
7.5 Hypothesis Testing: A single
population proportion
• Testing hypotheses about a population proportion (P) is carried out
in much the same way as for a mean, when the conditions necessary for
using the normal curve are met
• We have the following steps:
1. Data: sample size (n), sample proportion (p̂), P0
$\hat{p} = \frac{a}{n} = \frac{\text{no. of elements in the sample with some characteristic}}{\text{total no. of elements in the sample}}$
2. Assumptions: p̂ is approximately normally distributed
3. Hypotheses:
• we have three cases
• Case I : H0: P = P0
HA: P ≠ P0
• Case II : H0: P = P0
HA: P > P0
• Case III : H0: P = P0
HA: P < P0
4. Test Statistic:
$Z = \frac{\hat{p} - p_0}{\sqrt{\frac{p_0 q_0}{n}}}$
which, when H0 is true, is distributed approximately as the standard
normal
5. Decision Rule:
i) If HA: P ≠ P0
• Reject H0 if Z > Z1-α/2 or Z < -Z1-α/2
ii) If HA: P > P0
• Reject H0 if Z > Z1-α
iii) If HA: P < P0
• Reject H0 if Z < -Z1-α
Note: Z1-α/2 , Z1-α , Zα are tabulated values obtained from
table D
6. Conclusion: reject or fail to reject H0
Example 7.5.1 page 259
Wagen collected data on a sample of 301 Hispanic women
living in Texas. One variable of interest was the percentage
of subjects with impaired fasting glucose (IFG). In the
study, 24 women were classified in the IFG stage. The article
cites population estimates for IFG among Hispanic women
in Texas as 6.3 percent. Is there sufficient evidence to
indicate that the population of Hispanic women in Texas has a
prevalence of IFG higher than 6.3 percent? Let α = 0.05.
Solution:
1. Data: n = 301, p0 = 6.3/100 = 0.063, a = 24,
q0 = 1 - p0 = 1 - 0.063 = 0.937, α = 0.05
$\hat{p} = \frac{a}{n} = \frac{24}{301} = 0.08$
2. Assumptions: p̂ is approximately normally distributed
3. Hypotheses:
• H0: P = 0.063
HA: P > 0.063
4. Test Statistic:
$Z = \frac{\hat{p} - p_0}{\sqrt{\frac{p_0 q_0}{n}}} = \frac{0.08 - 0.063}{\sqrt{\frac{0.063(0.937)}{301}}} = 1.21$
5. Decision Rule: Reject H0 if Z > Z1-α,
where Z1-α = Z1-0.05 = Z0.95 = 1.645
6. Conclusion: Fail to reject H0,
since Z = 1.21 < Z1-α = 1.645
Or,
if p-value = 0.1131,
fail to reject H0, since p > α
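The one-proportion Z statistic for the IFG example can be sketched as follows. The function name `one_proportion_z` is an assumption made for this illustration. The example rounds the sample proportion to 0.08, which gives Z = 1.21; with the unrounded proportion 24/301 the statistic is slightly smaller, and the conclusion is the same either way.

```python
from math import sqrt
from statistics import NormalDist

def one_proportion_z(p_hat, p0, n):
    """Z statistic for H0: P = p0, using the null standard error sqrt(p0*q0/n)."""
    return (p_hat - p0) / sqrt(p0 * (1 - p0) / n)

n, a, p0, alpha = 301, 24, 0.063, 0.05

z_rounded = one_proportion_z(0.08, p0, n)   # proportion rounded as in the example
z_exact = one_proportion_z(a / n, p0, n)    # proportion kept at full precision
critical = NormalDist().inv_cdf(1 - alpha)  # Z_{0.95}
reject_h0 = z_rounded > critical            # HA: P > 0.063

print(round(z_rounded, 2), round(critical, 3), reject_h0)  # 1.21 1.645 False
```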
7.6 Hypothesis Testing: The
Difference between two
population proportions
• Testing hypotheses about two population proportions (P1, P2) is
carried out in much the same way as for the difference between two
means, when the conditions necessary for using the normal curve are
met
• We have the following steps:
1. Data: sample sizes (n1 and n2), sample proportions (p̂1, p̂2),
number with the characteristic in the two samples (x1, x2),
and the pooled proportion
$\bar{p} = \frac{x_1 + x_2}{n_1 + n_2}$
2. Assumption: the two populations are independent.
3. Hypotheses:
• we have three cases
• Case I : H0: P1 = P2 → P1 - P2 = 0
HA: P1 ≠ P2 → P1 - P2 ≠ 0
• Case II : H0: P1 = P2 → P1 - P2 = 0
HA: P1 > P2 → P1 - P2 > 0
• Case III : H0: P1 = P2 → P1 - P2 = 0
HA: P1 < P2 → P1 - P2 < 0
4. Test Statistic:
$Z = \frac{(\hat{p}_1 - \hat{p}_2) - (P_1 - P_2)}{\sqrt{\frac{\bar{p}(1 - \bar{p})}{n_1} + \frac{\bar{p}(1 - \bar{p})}{n_2}}}$
which, when H0 is true, is distributed approximately as the standard
normal
5. Decision Rule:
i) If HA: P1 ≠ P2
• Reject H0 if Z > Z1-α/2 or Z < -Z1-α/2
ii) If HA: P1 > P2
• Reject H0 if Z > Z1-α
iii) If HA: P1 < P2
• Reject H0 if Z < -Z1-α
Note: Z1-α/2 , Z1-α , Zα are tabulated values obtained from
table D
6. Conclusion: reject or fail to reject H0
Example 7.6.1 page 262
Noonan syndrome is a genetic condition that can affect the heart, growth,
blood clotting, and mental and physical development. Noonan examined
the stature of men and women with Noonan syndrome. The study contained 29
male and 44 female adults. One of the cut-off values used to assess
stature was the third percentile of adult height. Eleven of the males fell
below the third percentile of adult male height, while 24 of the females
fell below the third percentile of female adult height. Does this study
provide sufficient evidence for us to conclude that among subjects with
Noonan syndrome, females are more likely than males to fall below the respective
third percentile of adult height? Let α = 0.05.
Solution:
1. Data: nM = 29, nF = 44, xM = 11, xF = 24, α = 0.05
$\bar{p} = \frac{x_M + x_F}{n_M + n_F} = \frac{11 + 24}{29 + 44} = 0.479$
$\hat{p}_M = \frac{x_M}{n_M} = \frac{11}{29} = 0.379, \quad \hat{p}_F = \frac{x_F}{n_F} = \frac{24}{44} = 0.545$
2. Assumption: the two populations are independent.
3. Hypotheses:
• Case II : H0: PF = PM → PF - PM = 0
HA: PF > PM → PF - PM > 0
4. Test Statistic:
$Z = \frac{(\hat{p}_F - \hat{p}_M) - (P_F - P_M)}{\sqrt{\frac{\bar{p}(1 - \bar{p})}{n_F} + \frac{\bar{p}(1 - \bar{p})}{n_M}}} = \frac{(0.545 - 0.379) - 0}{\sqrt{\frac{(0.479)(0.521)}{44} + \frac{(0.479)(0.521)}{29}}} = 1.39$
5. Decision Rule:
Reject H0 if Z > Z1-α, where Z1-α = Z1-0.05 = Z0.95 = 1.645
6. Conclusion: Fail to reject H0,
since Z = 1.39 < Z1-α = 1.645
Or, if p-value = 0.0823 → fail to reject H0, since p > α
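The two-proportion Z statistic of Example 7.6.1 can be reproduced with the following sketch. Variable names are assumptions made for this illustration; the pooled proportion is used in the standard error because H0 states that the two population proportions are equal.

```python
from math import sqrt
from statistics import NormalDist

# Example 7.6.1: adults with Noonan syndrome below the third height percentile.
n_m, n_f = 29, 44
x_m, x_f = 11, 24
alpha = 0.05

p_hat_m = x_m / n_m                    # about 0.379
p_hat_f = x_f / n_f                    # about 0.545
p_bar = (x_m + x_f) / (n_m + n_f)      # pooled proportion, about 0.479

standard_error = sqrt(p_bar * (1 - p_bar) / n_f + p_bar * (1 - p_bar) / n_m)
z = ((p_hat_f - p_hat_m) - 0) / standard_error
critical = NormalDist().inv_cdf(1 - alpha)   # Z_{0.95}
reject_h0 = z > critical                     # HA: P_F > P_M

print(round(p_bar, 3), round(z, 2), reject_h0)  # 0.479 1.39 False
```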
Chapter 9
Statistical Inference and The
Relationship between two variables
Prepared By : Dr. Shuhrat Khan
REGRESSION
CORRELATION
ANALYSIS OF VARIANCE
• Regression, Correlation and Analysis of
Covariance are all statistical techniques that
use the idea that one variable may be
related to one or more other variables through an
equation. Here we consider the relationship of
two variables only, in a linear form, which is
called linear regression and linear correlation,
or simple regression and correlation. The
relationships between more than two
variables, called multiple regression and
correlation, will be considered later.
• Simple regression uses the relationship
between the two variables to obtain
information about one variable by knowing
the values of the other. The equation showing
this type of relationship is called the simple linear
regression equation. The related method of
correlation is used to measure how strong the
relationship between the two variables is.
EQUATION OF REGRESSION
Line of Regression
• Simple Linear Regression:
• Suppose that we are interested in a variable Y, but we want to
know about its relationship to another variable X, or we want
to use X to predict (or estimate) the value of Y that might be
obtained without actually measuring it, provided the
relationship between the two can be expressed by a line. X is
usually called the independent variable and Y is called the
dependent variable.
• We assume that the values of variable X are either fixed or
random. By fixed, we mean that the values are chosen by the
researcher: either an experimental unit (patient) is given this
value of X (such as the dosage of a drug), or a unit (patient) is
chosen which is known to have this value of X.
• By random, we mean that units (patients) are chosen at
random from all the possible units, and both variables X and
Y are measured.
• We also assume that for each value x of X, there is a whole
range or population of possible Y values, and that the mean of
the Y population at X = x, denoted by µy/x , is a linear
function of x. That is,
µy/x = α + βx
[Diagram labels: dependent variable; independent variable; two random variables, or a bivariate random variable]
ESTIMATION
We select a sample of n observations (xi, yi)
from the population, with the goals:
• Estimate α and β.
• Predict the value of Y at a
given value x of X.
• Make tests to draw
conclusions about the model
and its usefulness.
We estimate the parameters α
and β by a and b,
respectively, using the sample
regression line:
Ŷ = a + bx
where we calculate a and b as follows.
ESTIMATION AND CALCULATION OF CONSTANTS a AND b
$b = \frac{\sum xy - \frac{(\sum x)(\sum y)}{n}}{\sum x^2 - \frac{(\sum x)^2}{n}}, \qquad a = \bar{y} - b\bar{x}$
EXAMPLE
• Investigators at a sports health centre are
interested in the relationship between oxygen
consumption and exercise time in athletes
recovering from injury. Appropriate mechanics
for exercising and measuring oxygen
consumption are set up, and the results are
presented below:

x variable: exercise time (min)    y variable: oxygen consumption
0.5                                620
1.0                                630
1.5                                800
2.0                                840
2.5                                840
3.0                                870
3.5                                1010
4.0                                940
4.5                                950
5.0                                1130
calculations
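One way to carry out the calculations for the oxygen consumption data is sketched below in Python. It assumes the usual least-squares formulas for the slope b and intercept a of the sample regression line Ŷ = a + bx; the variable names and the helper `predict` are introduced only for this illustration.

```python
# Exercise time (min) and oxygen consumption from the example table.
x = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0]
y = [620, 630, 800, 840, 840, 870, 1010, 940, 950, 1130]
n = len(x)

sum_x, sum_y = sum(x), sum(y)
sum_xy = sum(xi * yi for xi, yi in zip(x, y))
sum_x2 = sum(xi ** 2 for xi in x)

# Least-squares slope b and intercept a of the sample line Y-hat = a + b x.
b = (sum_xy - sum_x * sum_y / n) / (sum_x2 - sum_x ** 2 / n)
a = sum_y / n - b * sum_x / n

def predict(x_value):
    """Estimated oxygen consumption at a given exercise time (min)."""
    return a + b * x_value

print(round(a, 1), round(b, 2))  # 594.0 97.82
```

With these estimates the fitted line is approximately Ŷ = 594.0 + 97.82x, so `predict(3.0)` gives an estimated oxygen consumption of about 887 after 3 minutes of exercise.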
Pearson’s Correlation Coefficient
• With the aid of Pearson’s correlation coefficient (r),
we can determine the strength and the direction of
the relationship between the X and Y variables,
both of which must have been measured and must
be quantitative.
• For example, we might be interested in examining
the association between height and weight for the
following sample of eight children:
Heights and weights of 8 children
Child     Height (inches) X    Weight (pounds) Y
A         49                   81
B         50                   88
C         53                   87
D         55                   99
E         60                   91
F         55                   89
G         60                   95
H         50                   90
Average   X̄ = 54 inches       Ȳ = 90 pounds
[Figure: scatter plot of weight (pounds) against height (inches) for the 8 children]
Table : The Strength of a Correlation
Value of r (positive or negative)    Meaning
0.00 to 0.19                         A very weak correlation
0.20 to 0.39                         A weak correlation
0.40 to 0.69                         A modest correlation
0.70 to 0.89                         A strong correlation
0.90 to 1.00                         A very strong correlation
FORMULA FOR CORRELATION
COEFFICIENT ( r )
• With Pearson’s r,
$r = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{\sqrt{\sum (X - \bar{X})^2 \sum (Y - \bar{Y})^2}}$
• the numerator $\sum (X - \bar{X})(Y - \bar{Y})$ means that we add the products of the deviations to see if the positive
products or negative products are more abundant and sizable. Positive
products indicate cases in which the variables go in the same direction (that is,
both taller and heavier than average or both shorter and lighter than average);
• negative products indicate cases in which the variables go in opposite
directions (that is, taller but lighter than average or shorter but heavier than
average).
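To make the role of the deviation products concrete, the sketch below applies the deviation form of Pearson's r to the heights and weights of the eight children listed earlier. Variable names are assumptions made for this illustration.

```python
from math import sqrt

heights = [49, 50, 53, 55, 60, 55, 60, 50]   # X, inches (children A to H)
weights = [81, 88, 87, 99, 91, 89, 95, 90]   # Y, pounds

mean_x = sum(heights) / len(heights)   # 54.0
mean_y = sum(weights) / len(weights)   # 90.0

# Sum of the products of deviations, and the two sums of squared deviations.
sum_products = sum((x - mean_x) * (y - mean_y) for x, y in zip(heights, weights))
ss_x = sum((x - mean_x) ** 2 for x in heights)
ss_y = sum((y - mean_y) ** 2 for y in weights)

r = sum_products / sqrt(ss_x * ss_y)
print(sum_products, ss_x, ss_y, round(r, 2))  # 100.0 132.0 202.0 0.61
```

Only child F contributes a negative product (taller but lighter than average), so the sum is positive and r is about 0.61, a modest positive correlation on the scale of the strength table.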
Computational Formula for Pearson’s Correlation Coefficient r
$r = \frac{SP}{\sqrt{SS_x \, SS_y}}$
Where SP (sum of the products), SSx (sum of
the squares for x) and SSy (sum of the squares
for y) can be computed as follows:
$SP = \sum XY - \frac{(\sum X)(\sum Y)}{n}, \quad SS_x = \sum X^2 - \frac{(\sum X)^2}{n}, \quad SS_y = \sum Y^2 - \frac{(\sum Y)^2}{n}$
Child   X     Y     X²     Y²     XY
A       12    12    144    144    144
B       10    8     100    64     80
C       6     12    36     144    72
D       16    11    256    121    176
E       8     10    64     100    80
F       9     8     81     64     72
G       12    16    144    256    192
H       11    15    121    225    165
∑       84    92    946    1118   981
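The column totals in this table are what the computational form of Pearson's r needs. The sketch below recomputes them from the X and Y columns and then evaluates r, assuming the standard definitions SP = ΣXY - (ΣX)(ΣY)/n, SSx = ΣX² - (ΣX)²/n and SSy = ΣY² - (ΣY)²/n; variable names are chosen for this illustration.

```python
from math import sqrt

x = [12, 10, 6, 16, 8, 9, 12, 11]    # X column, children A to H
y = [12, 8, 12, 11, 10, 8, 16, 15]   # Y column
n = len(x)

# Column totals, matching the last row of the table.
sum_x, sum_y = sum(x), sum(y)                     # 84, 92
sum_x2 = sum(v * v for v in x)                    # 946
sum_y2 = sum(v * v for v in y)                    # 1118
sum_xy = sum(a * b for a, b in zip(x, y))         # 981

sp = sum_xy - sum_x * sum_y / n      # sum of products
ss_x = sum_x2 - sum_x ** 2 / n       # sum of squares for x
ss_y = sum_y2 - sum_y ** 2 / n       # sum of squares for y
r = sp / sqrt(ss_x * ss_y)

print(sp, ss_x, ss_y, round(r, 2))   # 15.0 64.0 60.0 0.24
```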
Table 2 : Chest circumference and Birth
Weight of 10 babies
• X(cm) y(kg) x2 y2 xy
• ___________________________________________________
• 22.4 2.00 501.76 4.00 44.8
• 27.5 2.25 756.25 5.06 61.88
• 28.5 2.10 812.25 4.41 59.85
• 28.5 2.35 812.25 5.52 66.98
• 29.4 2.45 864.36 6.00 72.03
• 29.4 2.50 864.36 6.25 73.5
• 30.5 2.80 930.25 7.84 85.4
• 32.0 2.80 1024.0 7.84 89.6
• 31.4 2.55 985.96 6.50 80.07
• 32.5 3.00 1056.25 9.00 97.5
• TOTAL
• 292.1 24.8 8607.69 62.42 731.61
Checking for significance
• There appears to be a strong correlation between chest circumference and birth
weight in babies.
• We need to check that such a correlation is unlikely to have arisen by
chance in a sample of ten babies.
• Tables are available that give the significant values of this correlation
coefficient at two probability levels.
• First we need to work out the degrees of freedom. They are the number
of pairs of observations less two, that is (n – 2) = 8.
• Looking at the table, we find that our calculated value of 0.86 exceeds
the tabulated value at 8 df of 0.765 at p = 0.01. Our correlation is
therefore statistically highly significant.
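The significance check can be sketched in Python from the raw chest circumference and birth weight data of Table 2. Variable names are assumptions made for this illustration, and the critical value 0.765 is simply the tabulated value quoted above for 8 df at p = 0.01, not something computed here.

```python
from math import sqrt

chest = [22.4, 27.5, 28.5, 28.5, 29.4, 29.4, 30.5, 32.0, 31.4, 32.5]   # x, cm
weight = [2.00, 2.25, 2.10, 2.35, 2.45, 2.50, 2.80, 2.80, 2.55, 3.00]  # y, kg
n = len(chest)

# Computational form of Pearson's r.
sp = sum(x * y for x, y in zip(chest, weight)) - sum(chest) * sum(weight) / n
ss_x = sum(x * x for x in chest) - sum(chest) ** 2 / n
ss_y = sum(y * y for y in weight) - sum(weight) ** 2 / n
r = sp / sqrt(ss_x * ss_y)

df = n - 2                 # pairs of observations less two
critical_r = 0.765         # tabulated value at 8 df, p = 0.01 (from the slide)
significant = abs(r) > critical_r

print(round(r, 2), df, significant)  # 0.86 8 True
```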

  • 1. Lectures of Stat -145 (Biostatistics) Text book Biostatistics Basic Concepts and Methodology for the Health Sciences By Wayne W. Daniel
  • 2. Text Book : Basic Concepts and Methodology for the Health Sciences 2 Chapter 1 Introduction To Biostatistics
  • 3. Text Book : Basic Concepts and Methodology for the Health Sciences 3  Key words :  Statistics , data , Biostatistics,  Variable ,Population ,Sample
  • 4. Text Book : Basic Concepts and Methodology for the Health 4 Introduction Some Basic concepts Statistics is a field of study concerned with 1- collection, organization, summarization and analysis of data. 2- drawing of inferences about a body of data when only a part of the data is observed. Statisticians try to interpret and communicate the results to others.
  • 5. Text Book : Basic Concepts and Methodology for the Health 5 * Biostatistics: The tools of statistics are employed in many fields: business, education, psychology, agriculture, economics, … etc. When the data analyzed are derived from the biological science and medicine, we use the term biostatistics to distinguish this particular application of statistical tools and concepts.
  • 6. Text Book : Basic Concepts and Methodology for the Health 6 Data: • The raw material of Statistics is data. • We may define data as figures. Figures result from the process of counting or from taking a measurement. • For example: • - When a hospital administrator counts the number of patients (counting). • - When a nurse weighs a patient (measurement)
  • 7. Text Book : Basic Concepts and Methodology for the Health 7 We search for suitable data to serve as the raw material for our investigation. Such data are available from one or more of the following sources: 1- Routinely kept records. For example: - Hospital medical records contain immense amounts of information on patients. - Hospital accounting records contain a wealth of data on the facility’s business - activities. * Sources of Data:
  • 8. Text Book : Basic Concepts and Methodology for the Health 8 2- External sources. The data needed to answer a question may already exist in the form of published reports, commercially available data banks, or the research literature, i.e. someone else has already asked the same question.
  • 9. Text Book : Basic Concepts and Methodology for the Health 9 3- Surveys: The source may be a survey, if the data needed is about answering certain questions. For example: If the administrator of a clinic wishes to obtain information regarding the mode of transportation used by patients to visit the clinic, then a survey may be conducted among patients to obtain this information.
  • 10. Text Book : Basic Concepts and Methodology for the Health 10 4- Experiments. Frequently the data needed to answer a question are available only as the result of an experiment. For example: If a nurse wishes to know which of several strategies is best for maximizing patient compliance, she might conduct an experiment in which the different strategies of motivating compliance are tried with different patients.
  • 11. Text Book : Basic Concepts and Methodology for the Health 11 * A variable: It is a characteristic that takes on different values in different persons, places, or things. For example: - heart rate, - the heights of adult males, - the weights of preschool children, - the ages of patients seen in a dental clinic.
  • 12. Types of variables: Quantitative and Qualitative. Quantitative Variables: A quantitative variable can be measured in the usual sense. For example: - the heights of adult males, - the weights of preschool children, - the ages of patients seen in a dental clinic. Qualitative Variables: Many characteristics are not capable of being measured. Some of them can be ordered or ranked. For example: - classification of people into socio-economic groups, - social classes based on income, education, etc.
  • 13. Types of quantitative variables: Discrete and Continuous. A discrete variable is characterized by gaps or interruptions in the values that it can assume. For example: - The number of daily admissions to a general hospital, - The number of decayed, missing or filled teeth per child in an elementary school. A continuous variable can assume any value within a specified relevant interval of values assumed by the variable. For example: - Height, - weight, - skull circumference. No matter how close together the observed heights of two people, we can find another person whose height falls somewhere in between.
  • 14. Text Book : Basic Concepts and Methodology for the Health 14 * A population: It is the largest collection of values of a random variable for which we have an interest at a particular time. For example: The weights of all the children enrolled in a certain elementary school. Populations may be finite or infinite.
  • 15. Text Book : Basic Concepts and Methodology for the Health 15 * A sample: It is a part of a population. For example: The weights of only a fraction of these children.
  • 17. CONSTANT DATA These are observations that remain the same from person to person, from time to time, or from place to place. Examples: 1- number of eyes, fingers, ears, etc. 2- number of minutes in an hour 3- the speed of light 4- number of centimeters in an inch
  • 18. VARIABLE DATA 1 These are observations which vary from one person to another or from one group of members to others, and are classified as follows: Statistically: 1. Quantitative variable data 2. Qualitative variable data. Epidemiologically: 1. Dependent (outcome variable) 2. Independent (study variables). Clinically: - Measured (BP, lab parameters, etc.) - Counted (pulse rate, resp. rate, etc.) - Observed (jaundice, pallor, wound infection) - Subjective (headache, colic, etc.)
  • 19. VARIABLE DATA 2 Statistically, variable could be: - Quantitative variable: a- Continuous quantitative b- Discrete quantitative - Qualitative variable: a- Nominal qualitative b- Ordinal qualitative 19
  • 20. VARIABLE DATA 3 1- Quantitative variable: These may be continuous or discrete. a- Continuous quantitative variable: obtained by measurement; its value may be an integer or a fractional value. Examples: weight, height, Hgb, age, volume of urine. b- Discrete quantitative variable: obtained by enumeration; its value is always an integer. Examples: pulse, family size, number of live births.
  • 21. [Figure: Continuous & Discrete Variables. Two number lines running from -3 to 3, one illustrating a continuous variable and one a discrete variable.]
  • 22. VARIABLE DATA 4 2- Qualitative variable: These are expressed as a quality and cannot be enumerated or measured, only categorized. They can be ordinal or nominal. a- Nominal qualitative: cannot be put in order, and is further subdivided into dichotomous (e.g. sex, male/female and Yes/No variables) and multichotomous (e.g. blood groups, A, B, AB, O). b- Ordinal qualitative: can be put in order, e.g. degree of success, level of education, stage of disease.
  • 23. VARIABLE DATA 5 Epidemiologically, a variable could be: Dependent Variable: Usually the health outcome(s) that you are studying. Independent Variables: Risk factors, causal factors, experimental treatment, and other relevant factors. They are also termed "predictors". e.g. Lung cancer is the dependent variable while smoking is the independent variable.
  • 24. Section (2.4) : Descriptive Statistics Measures of Central Tendency Page 38 - 41
  • 25. Text Book : Basic Concepts and Methodology for the Health Sciences 25 key words: Descriptive Statistic, measure of central tendency ,statistic, parameter, mean (μ) ,median, mode.
  • 26. The Statistic and The Parameter • A Statistic: It is a descriptive measure computed from the data of a sample. • A Parameter: It is a descriptive measure computed from the data of a population. Since it is difficult to measure a parameter from the population, a sample of size n is drawn, whose values are $x_1, x_2, \ldots, x_n$. From these data, we compute the statistic.
  • 27. Text Book : Basic Concepts and Methodology for the Health Sciences 27 Measures of Central Tendency A measure of central tendency is a measure which indicates where the middle of the data is. The three most commonly used measures of central tendency are: The Mean, the Median, and the Mode. The Mean: It is the average of the data.
  • 28. The Population Mean: $\mu = \frac{1}{N}\sum_{i=1}^{N} X_i$, which is usually unknown, so we use the sample mean to estimate or approximate it. The Sample Mean: $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$. Example: Here is a random sample of size 10 of ages, where $x_1 = 42$, $x_2 = 28$, $x_3 = 28$, $x_4 = 61$, $x_5 = 31$, $x_6 = 23$, $x_7 = 50$, $x_8 = 34$, $x_9 = 32$, $x_{10} = 37$. $\bar{x} = (42 + 28 + \ldots + 37)/10 = 36.6$
  • 29. Text Book : Basic Concepts and Methodology for the Health Sciences 29 Properties of the Mean: • Uniqueness. For a given set of data there is one and only one mean. • Simplicity. It is easy to understand and to compute. • Affected by extreme values. Since all values enter into the computation. Example: Assume the values are 115, 110, 119, 117, 121 and 126. The mean = 118. But assume that the values are 75, 75, 80, 80 and 280. The mean = 118, a value that is not representative of the set of data as a whole.
  • 30. The Median: When the data are ordered, it is the observation that divides the set of observations into two equal parts, such that half of the data are before it and the other half are after it. * If n is odd, the median is the middle observation, i.e. the (n+1)/2 th ordered observation. When n = 11, the median is the 6th observation. * If n is even, there are two middle observations, and the median is the mean of these two. Its position is again (n+1)/2. When n = 12, the median is the 6.5th observation, i.e. the value halfway between the 6th and 7th ordered observations.
  • 31. Text Book : Basic Concepts and Methodology for the Health Sciences 31 Example: For the same random sample, the ordered observations will be as: 23, 28, 28, 31, 32, 34, 37, 42, 50, 61. Since n = 10, then the median is the 5.5th observation, i.e. = (32+34)/2 = 33. Properties of the Median: • Uniqueness. For a given set of data there is one and only one median. • Simplicity. It is easy to calculate. • It is not affected by extreme values as is the mean.
  • 32. The Mode: It is the value which occurs most frequently. If all values are different there is no mode. Sometimes there is more than one mode. Example: For the same random sample, the value 28 occurs twice and every other value occurs once, so 28 is the mode. Properties of the Mode: • Sometimes, it is not unique. • It may be used for describing qualitative data.
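The three measures of central tendency can be checked for the age sample used in the examples above. This is only a minimal sketch using Python's standard `statistics` module; the variable names are illustrative and not from the textbook.

```python
import statistics

# Random sample of 10 ages from the worked example.
ages = [42, 28, 28, 61, 31, 23, 50, 34, 32, 37]

sample_mean = statistics.mean(ages)        # sum of the values divided by n
sample_median = statistics.median(ages)    # n is even, so the mean of the two middle ordered values
sample_modes = statistics.multimode(ages)  # list of every most-frequent value (the mode may not be unique)

print(sample_mean, sample_median, sample_modes)
```

`multimode` is used rather than `mode` because, as noted above, a data set can have more than one mode.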
  • 33. Section (2.5) : Descriptive Statistics Measures of Dispersion Page 43 - 46
  • 34. Text Book : Basic Concepts and Methodology for the Health Sciences 34 key words: Descriptive Statistic, measure of dispersion , range ,variance, coefficient of variation.
  • 35. 2.5. Descriptive Statistics – Measures of Dispersion: • A measure of dispersion conveys information regarding the amount of variability present in a set of data. • Note: 1. If all the values are the same → there is no dispersion. 2. If the values are different → there is dispersion: a) if the values are close to each other → the amount of dispersion is small; b) if the values are widely scattered → the dispersion is greater.
  • 36. Text Book : Basic Concepts and Methodology for the Health Sciences 36 Ex. Figure 2.5.1 –Page 43 • ** Measures of Dispersion are : 1.Range (R). 2. Variance. 3. Standard deviation. 4.Coefficient of variation (C.V).
  • 37. 1. The Range (R): • Range = Largest value − Smallest value = $x_L - x_S$ • Note: The range depends on only two values. • Example 2.5.1 Page 40: Refer to Ex 2.4.2 Page 37. Data: 43, 66, 61, 64, 65, 38, 59, 57, 57, 50. Find the range. Range = 66 − 38 = 28
  • 38. 2. The Variance: • It measures dispersion relative to the scatter of the values about their mean. a) Sample Variance ($S^2$): $S^2 = \frac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{n-1}$, where $\bar{x}$ is the sample mean. • Example 2.5.2 Page 40: Refer to Ex 2.4.2 Page 37. Find the sample variance of the ages, given $\bar{x} = 56$. • Solution: $S^2 = [(43-56)^2 + (66-56)^2 + \ldots + (50-56)^2]/(10-1) = 810/9 = 90$
  • 39. b) Population Variance ($\sigma^2$): $\sigma^2 = \frac{\sum_{i=1}^{N}(x_i - \mu)^2}{N}$, where $\mu$ is the population mean. 3. The Standard Deviation: • It is the square root of the variance. a) Sample Standard Deviation: $S = \sqrt{S^2}$ b) Population Standard Deviation: $\sigma = \sqrt{\sigma^2}$
  • 40. 4. The Coefficient of Variation (C.V): • It is a measure used to compare the dispersion in two sets of data, and it is independent of the unit of measurement. • $C.V = \frac{S}{\bar{X}}(100)$, where S is the sample standard deviation and $\bar{X}$ is the sample mean.
  • 41. Example 2.5.3 Page 46: • Suppose two samples of human males yield the following data:
                            Sample 1        Sample 2
      Age                   25-year-olds    11-year-olds
      Mean weight           145 pounds      80 pounds
      Standard deviation    10 pounds       10 pounds
  • 42. • We wish to know which is more variable. • Solution: • C.V (Sample 1) = (10/145)*100% = 6.9% • C.V (Sample 2) = (10/80)*100% = 12.5% • Hence the weights of the 11-year-olds (Sample 2) show more relative variation.
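The dispersion measures from Examples 2.5.1 to 2.5.3 can be reproduced with a short sketch. It assumes Python's standard `statistics` module; the helper `coefficient_of_variation` is a hypothetical name introduced here for illustration.

```python
import statistics

# Ages used in Examples 2.5.1 and 2.5.2.
ages = [43, 66, 61, 64, 65, 38, 59, 57, 57, 50]

data_range = max(ages) - min(ages)        # largest value minus smallest value
sample_variance = statistics.variance(ages)  # sample variance: divides by n - 1
sample_sd = statistics.stdev(ages)           # square root of the sample variance


def coefficient_of_variation(sd, mean):
    """Return the C.V. as a percentage: (S / mean) * 100."""
    return sd / mean * 100


# Summary figures from Example 2.5.3 (weights in pounds).
cv_sample1 = coefficient_of_variation(10, 145)
cv_sample2 = coefficient_of_variation(10, 80)

print(data_range, sample_variance, round(sample_sd, 2))
print(round(cv_sample1, 1), round(cv_sample2, 1))
```

Because the C.V. is unit-free, the two samples can be compared even though their means differ greatly.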
  • 43. Chapter 4: Probabilistic features of certain data Distributions Pages 93- 111
  • 44. Key words: Probability distribution, random variable, Bernoulli distribution, Binomial distribution, Poisson distribution
  • 45. Text Book : Basic Concepts and Methodology for the Health Sciences 45 The Random Variable (X): When the values of a variable (height, weight, or age) can’t be predicted in advance, the variable is called a random variable. An example is the adult height. When a child is born, we can’t predict exactly his or her height at maturity.
  • 46. Text Book : Basic Concepts and Methodology for the Health Sciences 46 4.2 Probability Distributions for Discrete Random Variables Definition: The probability distribution of a discrete random variable is a table, graph, formula, or other device used to specify all possible values of a discrete random variable along with their respective probabilities.
  • 47. The Cumulative Probability Distribution of X, F(x): It shows the probability that the variable X is less than or equal to a certain value, P(X ≤ x).
  • 48. Example 4.2.1 page 94:
      Number of Programs    Frequency    P(X=x)    F(x) = P(X ≤ x)
      1                     62           0.2088    0.2088
      2                     47           0.1582    0.3670
      3                     39           0.1313    0.4983
      4                     39           0.1313    0.6296
      5                     58           0.1953    0.8249
      6                     37           0.1246    0.9495
      7                     4            0.0135    0.9630
      8                     11           0.0370    1.0000
      Total                 297          1.0000
  • 49. See figure 4.2.1 page 96. See figure 4.2.2 page 97. Properties of the probability distribution of a discrete random variable: 1. $0 \le P(X = x) \le 1$ 2. $\sum P(X = x) = 1$ 3. P(a ≤ X ≤ b) = P(X ≤ b) – P(X ≤ a-1) 4. P(X < b) = P(X ≤ b-1)
  • 50. Text Book : Basic Concepts and Methodology for the Health Sciences 50 Example 4.2.2 page 96: (use table in example 4.2.1) What is the probability that a randomly selected family will be one who used three assistance programs? Example 4.2.3 page 96: (use table in example 4.2.1) What is the probability that a randomly selected family used either one or two programs?
  • 51. Text Book : Basic Concepts and Methodology for the Health Sciences 51 Example 4.2.4 page 98: (use table in example 4.2.1) What is the probability that a family picked at random will be one who used two or fewer assistance programs? Example 4.2.5 page 98: (use table in example 4.2.1) What is the probability that a randomly selected family will be one who used fewer than four programs? Example 4.2.6 page 98: (use table in example 4.2.1) What is the probability that a randomly selected family used five or more programs?
  • 52. Text Book : Basic Concepts and Methodology for the Health Sciences 52 Example 4.2.7 page 98: (use table in example 4.2.1) What is the probability that a randomly selected family is one who used between three and five programs, inclusive?
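The questions in Examples 4.2.2 to 4.2.7 can all be answered from the table in Example 4.2.1. The sketch below is one way to do so, rebuilding P(X = x) and F(x) from the frequencies; the names `pmf`, `cdf` and `prob_between` are illustrative assumptions, not textbook notation.

```python
from itertools import accumulate

# Frequencies from Example 4.2.1: number of programs -> number of families.
frequency = {1: 62, 2: 47, 3: 39, 4: 39, 5: 58, 6: 37, 7: 4, 8: 11}
total = sum(frequency.values())  # 297 families

pmf = {x: f / total for x, f in frequency.items()}  # P(X = x)
cdf = dict(zip(pmf, accumulate(pmf.values())))      # F(x) = P(X <= x)


def prob_between(a, b):
    """P(a <= X <= b) = F(b) - F(a - 1), taking F(x) = 0 below the smallest value."""
    return cdf[b] - cdf.get(a - 1, 0.0)


p_exactly_3 = pmf[3]                  # Example 4.2.2
p_one_or_two = prob_between(1, 2)     # Example 4.2.3
p_two_or_fewer = cdf[2]               # Example 4.2.4
p_fewer_than_4 = cdf[3]               # Example 4.2.5: P(X < 4) = P(X <= 3)
p_five_or_more = 1 - cdf[4]           # Example 4.2.6: complement of P(X <= 4)
p_three_to_five = prob_between(3, 5)  # Example 4.2.7

for name, value in [("P(X=3)", p_exactly_3), ("P(1<=X<=2)", p_one_or_two),
                    ("P(X<=2)", p_two_or_fewer), ("P(X<4)", p_fewer_than_4),
                    ("P(X>=5)", p_five_or_more), ("P(3<=X<=5)", p_three_to_five)]:
    print(name, round(value, 4))
```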
  • 53. 4.3 The Binomial Distribution: The binomial distribution is one of the most widely encountered probability distributions in applied statistics. It is derived from a process known as a Bernoulli trial. Bernoulli trial: When a random process or experiment, called a trial, can result in only one of two mutually exclusive outcomes, such as dead or alive, sick or well, the trial is called a Bernoulli trial.
  • 54. Text Book : Basic Concepts and Methodology for the Health Sciences 54 The Bernoulli Process A sequence of Bernoulli trials forms a Bernoulli process under the following conditions 1- Each trial results in one of two possible, mutually exclusive, outcomes. One of the possible outcomes is denoted (arbitrarily) as a success, and the other is denoted a failure. 2- The probability of a success, denoted by p, remains constant from trial to trial. The probability of a failure, 1-p, is denoted by q. 3- The trials are independent, that is the outcome of any particular trial is not affected by the outcome of any other trial
  • 55. The probability distribution of the binomial random variable X, the number of successes in n independent trials, is: $f(x) = P(X = x) = \binom{n}{x} p^{x} q^{n-x}, \quad x = 0, 1, 2, \ldots, n$, where $\binom{n}{x} = \frac{n!}{x!(n-x)!}$ is the number of combinations of n distinct objects taken x of them at a time, and $x! = x(x-1)(x-2)\cdots(1)$. * Note: 0! = 1
  • 56. Properties of the binomial distribution 1. $f(x) \ge 0$ 2. $\sum f(x) = 1$ 3. The parameters of the binomial distribution are n and p 4. $\mu = E(X) = np$ 5. $\sigma^2 = \mathrm{var}(X) = np(1-p)$
  • 57. Text Book : Basic Concepts and Methodology for the Health Sciences 57 Example 4.3.1 page 100 If we examine all birth records from the North Carolina State Center for Health statistics for year 2001, we find that 85.8 percent of the pregnancies had delivery in week 37 or later (full- term birth). If we randomly selected five birth records from this population what is the probability that exactly three of the records will be for full-term births? Exercise: example 4.3.2 page 104
  • 58. Text Book : Basic Concepts and Methodology for the Health Sciences 58 Example 4.3.3 page 104 Suppose it is known that in a certain population 10 percent of the population is color blind. If a random sample of 25 people is drawn from this population, find the probability that a) Five or fewer will be color blind. b) Six or more will be color blind c) Between six and nine inclusive will be color blind. d) Two, three, or four will be color blind. Exercise: example 4.3.4 page 106
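Examples 4.3.1 and 4.3.3 apply the binomial formula directly. The following is a minimal sketch, assuming only Python's standard `math.comb`; `binomial_pmf` and `binomial_cdf` are hypothetical helper names.

```python
from math import comb


def binomial_pmf(x, n, p):
    """P(X = x) = C(n, x) * p^x * q^(n - x), with q = 1 - p."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)


def binomial_cdf(x, n, p):
    """P(X <= x), obtained by summing the pmf from 0 to x."""
    return sum(binomial_pmf(k, n, p) for k in range(x + 1))


# Example 4.3.1: n = 5 birth records, p = 0.858, exactly three full-term births.
p_three_full_term = binomial_pmf(3, 5, 0.858)

# Example 4.3.3: n = 25 people, p = 0.10 color blind.
n, p = 25, 0.10
p_five_or_fewer = binomial_cdf(5, n, p)                        # a)
p_six_or_more = 1 - binomial_cdf(5, n, p)                      # b)
p_six_to_nine = binomial_cdf(9, n, p) - binomial_cdf(5, n, p)  # c)
p_two_to_four = binomial_cdf(4, n, p) - binomial_cdf(1, n, p)  # d)

print(round(p_three_full_term, 4))
print(round(p_five_or_fewer, 4), round(p_six_or_more, 4),
      round(p_six_to_nine, 4), round(p_two_to_four, 4))
```

Parts b) to d) use the cumulative rules P(X ≥ a) = 1 − P(X ≤ a−1) and P(a ≤ X ≤ b) = P(X ≤ b) − P(X ≤ a−1).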
  • 59. 4.4 The Poisson Distribution If the random variable X is the number of occurrences of some random event in a certain period of time or space (or some volume of matter), the probability distribution of X is given by: $f(x) = P(X = x) = \frac{e^{-\lambda}\lambda^{x}}{x!}, \quad x = 0, 1, \ldots$ The symbol e is the constant equal to 2.7183. λ (lambda) is called the parameter of the distribution and is the average number of occurrences of the random event in the interval (or volume).
  • 60. Properties of the Poisson distribution 1. $f(x) \ge 0$ 2. $\sum f(x) = 1$ 3. $\mu = E(X) = \lambda$ 4. $\sigma^2 = \mathrm{var}(X) = \lambda$
  • 61. Example 4.4.1 page 111 In a study of drug-induced anaphylaxis among patients taking rocuronium bromide as part of their anesthesia, Laake and Rottingen found that the occurrence of anaphylaxis followed a Poisson model with λ = 12 incidents per year in Norway. Find 1- The probability that in the next year, among patients receiving rocuronium, exactly three will experience anaphylaxis.
  • 62. Text Book : Basic Concepts and Methodology for the Health Sciences 62 2- The probability that less than two patients receiving rocuronium, in the next year will experience anaphylaxis? 3- The probability that more than two patients receiving rocuronium, in the next year will experience anaphylaxis? 4- The expected value of patients receiving rocuronium, in the next year who will experience anaphylaxis. 5- The variance of patients receiving rocuronium, in the next year who will experience anaphylaxis 6- The standard deviation of patients receiving rocuronium, in the next year who will experience anaphylaxis
  • 63. Text Book : Basic Concepts and Methodology for the Health Sciences 63 Example 4.4.2 page 111: Refer to example 4.4.1 1-What is the probability that at least three patients in the next year will experience anaphylaxis if rocuronium is administered with anesthesia? 2-What is the probability that exactly one patient in the next year will experience anaphylaxis if rocuronium is administered with anesthesia? 3-What is the probability that none of the patients in the next year will experience anaphylaxis if rocuronium is administered with anesthesia?
  • 64. Text Book : Basic Concepts and Methodology for the Health Sciences 64 4-What is the probability that at most two patients in the next year will experience anaphylaxis if rocuronium is administered with anesthesia? Exercises: examples 4.4.3, 4.4.4 and 4.4.5 pages111-113 Exercises: Questions 4.3.4 ,4.3.5, 4.3.7 ,4.4.1,4.4.5
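The parts of Examples 4.4.1 and 4.4.2 follow from the Poisson formula with λ = 12. This sketch uses only the standard `math` module; the function names are illustrative assumptions.

```python
from math import exp, factorial, sqrt


def poisson_pmf(x, lam):
    """P(X = x) = e^(-lambda) * lambda^x / x!"""
    return exp(-lam) * lam**x / factorial(x)


def poisson_cdf(x, lam):
    """P(X <= x), obtained by summing the pmf from 0 to x."""
    return sum(poisson_pmf(k, lam) for k in range(x + 1))


lam = 12  # average number of incidents per year

p_exactly_3 = poisson_pmf(3, lam)        # Example 4.4.1, part 1
p_less_than_2 = poisson_cdf(1, lam)      # part 2: P(X < 2) = P(X <= 1)
p_more_than_2 = 1 - poisson_cdf(2, lam)  # part 3: P(X > 2) = 1 - P(X <= 2)
expected_value = lam                     # part 4: E(X) = lambda
variance = lam                           # part 5: var(X) = lambda
std_dev = sqrt(lam)                      # part 6

p_at_least_3 = 1 - poisson_cdf(2, lam)   # Example 4.4.2, part 1
p_exactly_1 = poisson_pmf(1, lam)        # part 2
p_none = poisson_pmf(0, lam)             # part 3
p_at_most_2 = poisson_cdf(2, lam)        # part 4

print(p_exactly_3, p_less_than_2, p_more_than_2, std_dev)
```

Note that "more than two" and "at least three" describe the same event for a count, so the two probabilities coincide.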
  • 66. Text Book : Basic Concepts and Methodology for the Health 66 • Key words: Continuous random variable, normal distribution , standard normal distribution , T-distribution
  • 67. Text Book : Basic Concepts and Methodology for the Health 67 • Now consider distributions of continuous random variables.
  • 68. Properties of continuous probability distributions: 1- Area under the curve = 1. 2- P(X = a) = 0, where a is a constant. 3- Area between two points a, b = P(a < X < b).
  • 69. 4.6 The normal distribution: • It is one of the most important probability distributions in statistics. • The normal density is given by $f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}$, - ∞ < x < ∞, - ∞ < µ < ∞, σ > 0 • π, e: constants • µ: population mean. • σ: population standard deviation.
  • 70. Text Book : Basic Concepts and Methodology for the Health 70 Characteristics of the normal distribution: Page 111 • The following are some important characteristics of the normal distribution: 1- It is symmetrical about its mean, µ. 2- The mean, the median, and the mode are all equal. 3- The total area under the curve above the x-axis is one. 4-The normal distribution is completely determined by the parameters µ and σ.
  • 71. 5- The normal distribution depends on the two parameters µ and σ. µ determines the location of the curve (as seen in figure 4.6.3, where µ1 < µ2 < µ3), whereas σ determines the scale of the curve, i.e. the degree of flatness or peakedness of the curve (as seen in figure 4.6.4, where σ1 < σ2 < σ3).
  • 72. Text Book : Basic Concepts and Methodology for the Health 72 Note that : (As seen in Figure 4.6.2) 1. P( µ- σ < x < µ+ σ) = 0.68 2. P( µ- 2σ< x < µ+ 2σ)= 0.95 3. P( µ-3σ < x < µ+ 3σ) = 0.997
  • 73. The Standard normal distribution: • It is a special case of the normal distribution with a mean of 0 and a standard deviation of 1. • The equation for the standard normal distribution is written as $f(z) = \frac{1}{\sqrt{2\pi}}\, e^{-\frac{z^2}{2}}$, - ∞ < z < ∞
  • 74. Text Book : Basic Concepts and Methodology for the Health 74 Characteristics of the standard normal distribution 1- It is symmetrical about 0. 2- The total area under the curve above the x-axis is one. 3- We can use table (D) to find the probabilities and areas.
  • 75. “How to use tables of Z” Note that the cumulative probabilities P(Z ≤ z) are given in tables for -3.49 < z < 3.49. Thus, P(-3.49 < Z < 3.49) ≈ 1. For the standard normal distribution, P(Z > 0) = P(Z < 0) = 0.5. Example 4.6.1: If Z has a standard normal distribution, then P(Z < 2) = 0.9772 is the area to the left of 2.
  • 76. Example 4.6.2: P(-2.55 < Z < 2.55) is the area between -2.55 and 2.55. It equals P(-2.55 < Z < 2.55) = 0.9946 – 0.0054 = 0.9892. Example 4.6.2: P(-2.74 < Z < 1.53) is the area between -2.74 and 1.53. P(-2.74 < Z < 1.53) = 0.9370 – 0.0031 = 0.9339.
  • 77. Example 4.6.3: P(Z > 2.71) is the area to the right of 2.71. So, P(Z > 2.71) = 1 – 0.9966 = 0.0034. Example: P(Z = 0.84) is the area at the single point z = 0.84. Since Z is continuous, P(Z = 0.84) = 0.
  • 78. How to transform a normal distribution (X) to the standard normal distribution (Z)? • This is done by the following formula: $z = \frac{x - \mu}{\sigma}$ • Example: • If X is normal with µ = 3, σ = 2, find the value of the standard normal Z when X = 6. • Answer: $z = \frac{x - \mu}{\sigma} = \frac{6 - 3}{2} = 1.5$
  • 79. 4.7 Normal Distribution Applications The normal distribution can be used to model the distribution of many variables that are of interest. This allows us to answer probability questions about these random variables. Example 4.7.1: The ‘Uptime’ is a custom-made lightweight battery-operated activity monitor that records the amount of time an individual spends in the upright position. In a study of children ages 8 to 15 years, the researchers found that the amount of time children spend in the upright position followed a normal distribution with a mean of 5.4 hours and a standard deviation of 1.3. Find the following.
  • 80. If a child is selected at random, then 1- The probability that the child spends less than 3 hours in the upright position in a 24-hour period: $P(X < 3) = P\left(\frac{X - \mu}{\sigma} < \frac{3 - 5.4}{1.3}\right) = P(Z < -1.85) = 0.0322$ 2- The probability that the child spends more than 5 hours in the upright position in a 24-hour period: $P(X > 5) = P\left(\frac{X - \mu}{\sigma} > \frac{5 - 5.4}{1.3}\right) = P(Z > -0.31) = 1 - P(Z < -0.31) = 1 - 0.3783 = 0.6217$ 3- The probability that the child spends exactly 6.2 hours in the upright position in a 24-hour period: P(X = 6.2) = 0
  • 81. 4- The probability that the child spends from 4.5 to 7.3 hours in the upright position in a 24-hour period: $P(4.5 < X < 7.3) = P\left(\frac{4.5 - 5.4}{1.3} < \frac{X - \mu}{\sigma} < \frac{7.3 - 5.4}{1.3}\right) = P(-0.69 < Z < 1.46) = P(Z < 1.46) - P(Z < -0.69) = 0.9279 - 0.2451 = 0.6828$ • Hw: EX. 4.7.2 – 4.7.3
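The normal-distribution probabilities in Example 4.7.1 can be cross-checked without a Z table. This is a sketch assuming Python 3.8+ and its standard `statistics.NormalDist` class; because it does not round z to two decimals, its results differ slightly from table look-ups.

```python
from statistics import NormalDist

# Time in the upright position (hours): mean 5.4, standard deviation 1.3.
upright = NormalDist(mu=5.4, sigma=1.3)

# Standardised value for x = 3, i.e. z = (x - mu) / sigma.
z_for_3 = (3 - upright.mean) / upright.stdev

p_less_than_3 = upright.cdf(3)                   # P(X < 3)
p_more_than_5 = 1 - upright.cdf(5)               # P(X > 5)
p_between = upright.cdf(7.3) - upright.cdf(4.5)  # P(4.5 < X < 7.3)

print(round(z_for_3, 2))
print(round(p_less_than_3, 4), round(p_more_than_5, 4), round(p_between, 4))
```

P(X = 6.2) needs no computation: for a continuous variable the probability of any single value is 0.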
  • 82. 6.3 The T Distribution: (167-173) 1- It has a mean of zero. 2- It is symmetric about the mean. 3- It ranges from -∞ to ∞.
  • 83. 4- Compared to the normal distribution, the t distribution is less peaked in the center and has higher tails. 5- It depends on the degrees of freedom (n-1). 6- The t distribution approaches the standard normal distribution as (n-1) approaches ∞.
  • 84. Examples: t(7, 0.975) = 2.3646 (area 0.975 to the left, 0.025 to the right). t(24, 0.995) = 2.7969 (area 0.995 to the left, 0.005 to the right). If P(T(18) > t) = 0.975, then t = -2.1009. If P(T(22) < t) = 0.99, then t = 2.508.
  • 85. Chapter 7 Using sample statistics to Test Hypotheses about population parameters Pages 215-233
  • 86. Text Book : Basic Concepts and Methodology for the Health Sciences 86  Key words :  Null hypothesis H0, Alternative hypothesis HA , testing hypothesis , test statistic , P-value
  • 87. Text Book : Basic Concepts and Methodology for the Health Sciences 87 Hypothesis Testing  One type of statistical inference, estimation, was discussed in Chapter 6 .  The other type ,hypothesis testing ,is discussed in this chapter.
  • 88. Text Book : Basic Concepts and Methodology for the Health Sciences 88 Definition of a hypothesis  It is a statement about one or more populations . It is usually concerned with the parameters of the population. e.g. the hospital administrator may want to test the hypothesis that the average length of stay of patients admitted to the hospital is 5 days
  • 89. Text Book : Basic Concepts and Methodology for the Health Sciences 89 Definition of Statistical hypotheses  They are hypotheses that are stated in such a way that they may be evaluated by appropriate statistical techniques.  There are two hypotheses involved in hypothesis testing  Null hypothesis H0: It is the hypothesis to be tested .  Alternative hypothesis HA : It is a statement of what we believe is true if our sample data cause us to reject the null hypothesis
  • 90. 7.2 Testing a hypothesis about the mean of a population: We have the following steps: 1. Data: determine the variable, sample size (n), sample mean ($\bar{x}$), population standard deviation (σ), or sample standard deviation (s) if σ is unknown. 2. Assumptions: We have two cases: Case 1: The population is normally or approximately normally distributed with known or unknown variance (sample size n may be small or large). Case 2: The population is not normal with known or unknown variance (n is large, i.e. n ≥ 30).
  • 91. 3. Hypotheses: We have three cases. Case I: H0: μ = μ0, HA: μ ≠ μ0, e.g. we want to test that the population mean is different from 50. Case II: H0: μ = μ0, HA: μ > μ0, e.g. we want to test that the population mean is greater than 50. Case III: H0: μ = μ0, HA: μ < μ0, e.g. we want to test that the population mean is less than 50.
  • 92. 4. Test Statistic: Case 1: The population is normal or approximately normal. If σ² is known (n large or small): $Z = \frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}}$. If σ² is unknown and n is large: $Z = \frac{\bar{X} - \mu_0}{s/\sqrt{n}}$. If σ² is unknown and n is small: $T = \frac{\bar{X} - \mu_0}{s/\sqrt{n}}$. Case 2: The population is not normally distributed and n is large. i) If σ² is known: $Z = \frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}}$. ii) If σ² is unknown: $Z = \frac{\bar{X} - \mu_0}{s/\sqrt{n}}$.
  • 93. 5. Decision Rule: i) If HA: μ ≠ μ0: Reject H0 if Z > Z1-α/2 or Z < -Z1-α/2 (when using the Z-test), or reject H0 if T > t1-α/2,n-1 or T < -t1-α/2,n-1 (when using the T-test). ii) If HA: μ > μ0: Reject H0 if Z > Z1-α (when using the Z-test), or reject H0 if T > t1-α,n-1 (when using the T-test).
  • 94. Text Book : Basic Concepts and Methodology for the Health Sciences 94  iii) If HA: μ< μ0 Reject H0 if Z< - Z1-α (when use Z - test)  Or Reject H0 if T<- t1-α,n-1 (when use T - test) Note: Z1-α/2 , Z1-α , Zα are tabulated values obtained from table D t1-α/2 , t1-α , tα are tabulated values obtained from table E with (n-1) degree of freedom (df)
  • 95. 6. Decision: If we reject H0, we can conclude that HA is true. If, however, we do not reject H0, we may conclude only that H0 may be true.
  • 96. Text Book : Basic Concepts and Methodology for the Health Sciences 96 An Alternative Decision Rule using the p - value Definition  The p-value is defined as the smallest value of α for which the null hypothesis can be rejected.  If the p-value is less than or equal to α ,we reject the null hypothesis (p ≤ α)  If the p-value is greater than α ,we do not reject the null hypothesis (p > α)
  • 97. Text Book : Basic Concepts and Methodology for the Health Sciences 97 Example 7.2.1 Page 223  Researchers are interested in the mean age of a certain population.  A random sample of 10 individuals drawn from the population of interest has a mean of 27.  Assuming that the population is approximately normally distributed with variance 20,can we conclude that the mean is different from 30 years ? (α=0.05) .  If the p - value is 0.0340 how can we use it in making a decision?
  • 98. Solution 1- Data: the variable is age, n = 10, $\bar{x}$ = 27, σ² = 20, α = 0.05. 2- Assumptions: the population is approximately normally distributed with variance 20. 3- Hypotheses: H0: μ = 30, HA: μ ≠ 30
  • 99. 4- Test Statistic: $Z = \frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}} = \frac{27 - 30}{\sqrt{20/10}} = -2.12$ 5- Decision Rule: The alternative hypothesis is HA: μ ≠ 30. Hence we reject H0 if Z > Z1-0.05/2 = Z0.975 or Z < -Z1-0.05/2 = -Z0.975, where Z0.975 = 1.96 (from table D).
  • 100. Text Book : Basic Concepts and Methodology for the Health Sciences 100  6.Decision:  We reject H0 ,since -2.12 is in the rejection region .  We can conclude that μ is not equal to 30  Using the p value ,we note that p-value =0.0340< 0.05,therefore we reject H0
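The steps of Example 7.2.1 can be sketched in a few lines. The function below is a hypothetical helper, not a library routine; it assumes Python 3.8+ for `statistics.NormalDist`, a known population variance, and a two-sided alternative.

```python
from math import sqrt
from statistics import NormalDist


def two_sided_z_test(sample_mean, mu0, sigma, n, alpha=0.05):
    """Return (z, critical value, p-value) for H0: mu = mu0 vs HA: mu != mu0."""
    std_normal = NormalDist()                     # standard normal, mean 0 and sd 1
    z = (sample_mean - mu0) / (sigma / sqrt(n))   # test statistic
    critical = std_normal.inv_cdf(1 - alpha / 2)  # Z_{1 - alpha/2}
    p_value = 2 * std_normal.cdf(-abs(z))         # area in both tails beyond |z|
    return z, critical, p_value


# Example 7.2.1: n = 10, sample mean 27, variance 20, mu0 = 30, alpha = 0.05.
z, critical, p_value = two_sided_z_test(27, 30, sqrt(20), 10, alpha=0.05)
reject_h0 = abs(z) > critical  # same decision as checking p_value <= alpha

print(round(z, 2), round(critical, 2), round(p_value, 4), reject_h0)
```

Both decision rules agree here: the statistic falls in the rejection region, and the p-value is below α.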
  • 101. Example 7.2.2 page 227 Referring to example 7.2.1, suppose that the researchers have asked: Can we conclude that μ < 30? 1. Data: see previous example. 2. Assumptions: see previous example. 3. Hypotheses: H0: μ = 30, HA: μ < 30
  • 102. 4. Test Statistic: $Z = \frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}} = \frac{27 - 30}{\sqrt{20/10}} = -2.12$ 5. Decision Rule: Reject H0 if Z < Zα, where Zα = -1.645 (from table D). 6. Decision: Reject H0; thus we can conclude that the population mean is smaller than 30.
  • 103. Example 7.2.4 (page 232)
    Among 157 African-American men, the mean systolic blood pressure was 146 mm Hg with a standard deviation of 27. We wish to know if, on the basis of these data, we may conclude that the mean systolic blood pressure for a population of African-American men is greater than 140. Use α = 0.01.
  • 104. Solution
    1. Data: the variable is systolic blood pressure, n = 157, x̄ = 146, s = 27, α = 0.01.
    2. Assumptions: the population is not normal, σ² is unknown, and the sample is large.
    3. Hypotheses: H0: μ = 140, HA: μ > 140.
    4. Test statistic: $Z = \dfrac{\bar{X} - \mu_0}{s/\sqrt{n}} = \dfrac{146 - 140}{27/\sqrt{157}} = \dfrac{6}{2.1548} = 2.78$
  • 105. 5. Decision rule: we reject H0 if Z > Z1-α = Z0.99 = 2.33 (from Table D).
    6. Decision: we reject H0, since 2.78 > 2.33. Hence we may conclude that the mean systolic blood pressure for a population of African-American men is greater than 140.
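The three single-mean examples above (7.2.1, 7.2.2 and 7.2.4) can be checked with a short script. This is a minimal sketch, not part of the original slides: the function name `one_sample_z` and its `alternative` argument are illustrative choices, and it uses the p-value form of the decision rule (reject when p < α), which is equivalent to comparing Z with the Table D critical value.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: mean 0, standard deviation 1


def one_sample_z(xbar, mu0, sd, n, alpha, alternative):
    """Z test for a single mean (hypothetical helper, not from the slides).

    alternative is "two-sided" (HA: mu != mu0), "greater" (HA: mu > mu0)
    or "less" (HA: mu < mu0). Returns (z, p_value, reject_h0).
    """
    z = (xbar - mu0) / (sd / sqrt(n))
    if alternative == "two-sided":
        p_value = 2 * (1 - STD_NORMAL.cdf(abs(z)))
    elif alternative == "greater":
        p_value = 1 - STD_NORMAL.cdf(z)
    elif alternative == "less":
        p_value = STD_NORMAL.cdf(z)
    else:
        raise ValueError("alternative must be 'two-sided', 'greater' or 'less'")
    # Rejecting when p < alpha is equivalent to the critical-value rule.
    return z, p_value, p_value < alpha


# Example 7.2.1: sigma^2 = 20 is known, HA: mu != 30
z_721, p_721, reject_721 = one_sample_z(27, 30, sqrt(20), 10, 0.05, "two-sided")
# Example 7.2.2: same data, HA: mu < 30
z_722, p_722, reject_722 = one_sample_z(27, 30, sqrt(20), 10, 0.05, "less")
# Example 7.2.4: s = 27 stands in for sigma because n = 157 is large, HA: mu > 140
z_724, p_724, reject_724 = one_sample_z(146, 140, 27, 157, 0.01, "greater")

for label, z, p, reject in [
    ("7.2.1", z_721, p_721, reject_721),
    ("7.2.2", z_722, p_722, reject_722),
    ("7.2.4", z_724, p_724, reject_724),
]:
    print(f"Example {label}: Z = {z:.2f}, p = {p:.4f}, reject H0: {reject}")
```

The p-value for Example 7.2.1 comes out marginally below the slide's 0.0340 because the script does not round Z to -2.12 before looking up the probability.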
  • 106. 7.3 Hypothesis Testing: The Difference between Two Population Means
    We have the following steps:
    1. Data: determine the variable, the sample sizes (n1, n2), the sample means, and the population standard deviations (σ1, σ2), or the sample standard deviations (s1, s2) if σ is unknown, for the two populations.
    2. Assumptions: we have two cases.
       Case 1: the populations are normally or approximately normally distributed, with known or unknown variances (the sample sizes may be small or large).
       Case 2: the populations are not normal, with known variances (n is large, i.e. n ≥ 30).
  • 107. 3. Hypotheses: we have three cases.
       Case I:   H0: μ1 = μ2 (μ1 - μ2 = 0), HA: μ1 ≠ μ2 (μ1 - μ2 ≠ 0)
                 e.g. we want to test that the mean of the first population is different from the mean of the second population.
       Case II:  H0: μ1 = μ2 (μ1 - μ2 = 0), HA: μ1 > μ2 (μ1 - μ2 > 0)
                 e.g. we want to test that the mean of the first population is greater than the mean of the second population.
       Case III: H0: μ1 = μ2 (μ1 - μ2 = 0), HA: μ1 < μ2 (μ1 - μ2 < 0)
                 e.g. we want to test that the mean of the first population is less than the mean of the second population.
  • 108. 4. Test statistic:
    Case 1: the two populations are normal or approximately normal.
    (a) σ² known (n1, n2 large or small):
        $$Z = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}}$$
    (b) σ² unknown (n1, n2 small), population variances equal:
        $$T = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{S_p\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}}, \qquad \text{where } S_p^2 = \frac{(n_1 - 1)S_1^2 + (n_2 - 1)S_2^2}{n_1 + n_2 - 2}$$
    (c) σ² unknown (n1, n2 small), population variances not equal:
        $$T = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{S_1^2}{n_1} + \dfrac{S_2^2}{n_2}}}$$
  • 109. Case 2: if the populations are not normally distributed, n1 and n2 are large (n1 ≥ 30, n2 ≥ 30), and the population variances are known:
        $$Z = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}}$$
  • 110. 5. Decision rule:
    i) If HA: μ1 ≠ μ2 (μ1 - μ2 ≠ 0):
       reject H0 if Z > Z1-α/2 or Z < -Z1-α/2 (when using the Z test), or
       reject H0 if T > t1-α/2,(n1+n2-2) or T < -t1-α/2,(n1+n2-2) (when using the T test).
    ii) If HA: μ1 > μ2 (μ1 - μ2 > 0):
       reject H0 if Z > Z1-α (when using the Z test), or
       reject H0 if T > t1-α,(n1+n2-2) (when using the T test).
  • 111. iii) If HA: μ1 < μ2 (μ1 - μ2 < 0):
       reject H0 if Z < -Z1-α (when using the Z test), or
       reject H0 if T < -t1-α,(n1+n2-2) (when using the T test).
    Note: Z1-α/2, Z1-α and Zα are tabulated values obtained from Table D; t1-α/2, t1-α and tα are tabulated values obtained from Table E with (n1+n2-2) degrees of freedom (df).
    6. Conclusion: reject or fail to reject H0.
  • 112. Example 7.3.1 (page 238)
    Researchers wish to know if the data they have collected provide sufficient evidence to indicate a difference in mean serum uric acid levels between normal individuals and individuals with Down's syndrome. The data consist of serum uric acid readings on 12 individuals with Down's syndrome, from a normal distribution with variance 1, and 15 normal individuals, from a normal distribution with variance 1.5. The means are x̄1 = 4.5 mg/100 and x̄2 = 3.4 mg/100, and α = 0.05.
    Solution:
    1. Data: the variable is serum uric acid level, n1 = 12, n2 = 15, σ1² = 1, σ2² = 1.5, α = 0.05.
  • 113. 2. Assumptions: the two populations are normal; σ1² and σ2² are known.
    3. Hypotheses: H0: μ1 = μ2 (μ1 - μ2 = 0), HA: μ1 ≠ μ2 (μ1 - μ2 ≠ 0).
    4. Test statistic:
       $$Z = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}} = \frac{(4.5 - 3.4) - 0}{\sqrt{\dfrac{1}{12} + \dfrac{1.5}{15}}} = 2.57$$
    5. Decision rule: reject H0 if Z > Z1-α/2 or Z < -Z1-α/2, where Z1-α/2 = Z1-0.05/2 = Z0.975 = 1.96 (from Table D).
    6. Conclusion: reject H0, since 2.57 > 1.96. Equivalently, the p-value is 0.0102 and we reject H0 when p < α; since 0.0102 < 0.05, we reject H0.
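Example 7.3.1 can be reproduced numerically. The following is a minimal sketch under the example's assumptions (normal populations, known variances, hypothesised difference of zero); the function name `two_sample_z` is illustrative and not from the slides.

```python
from math import sqrt
from statistics import NormalDist


def two_sample_z(xbar1, xbar2, var1, var2, n1, n2, alpha=0.05):
    """Two-sided Z test for mu1 - mu2 = 0 with known population variances."""
    std_normal = NormalDist()
    z = ((xbar1 - xbar2) - 0) / sqrt(var1 / n1 + var2 / n2)
    z_crit = std_normal.inv_cdf(1 - alpha / 2)      # Z_{1-alpha/2}
    p_value = 2 * (1 - std_normal.cdf(abs(z)))      # two-sided p-value
    return z, z_crit, p_value


# Group 1: Down's syndrome (n = 12, variance 1); group 2: normal (n = 15, variance 1.5)
z, z_crit, p_value = two_sample_z(4.5, 3.4, 1.0, 1.5, 12, 15)
reject_h0 = abs(z) > z_crit
print(f"Z = {z:.2f}, critical value = {z_crit:.2f}, p = {p_value:.4f}, reject H0: {reject_h0}")
```

The two-sided p-value comes out close to 0.01, well below α = 0.05, which agrees with the rejection reached from the critical value 1.96.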
  • 114. Example 7.3.2 (page 240)
    The purpose of a study by Tam was to investigate wheelchair maneuvering in individuals with lower-level spinal cord injury (SCI) and healthy controls (C). Subjects used a modified wheelchair that incorporated a rigid seat surface to facilitate the specified experimental measurements. The measurements at the left ischial tuberosity (the thigh bones and how the wheelchair affects them) for the SCI and control (C) groups are shown below.
    C:   169 150 114 88 117 122 131 124 115 131
    SCI: 143 130 119 121 130 163 180 130 150 60
  • 115. We wish to know if we can conclude, on the basis of the above data, that the mean left ischial tuberosity measurement for the controls (C) is lower than the mean for SCI. Assume normal populations with equal variances. α = 0.05, p-value > 0.10.
  • 116. Solution:
    1. Data: nC = 10, nSCI = 10, SC = 21.8, SSCI = 32.2, α = 0.05; x̄C = 126.1, x̄SCI = 133.1 (calculated from the data).
    2. Assumptions: the two populations are normal; σ1² and σ2² are unknown but equal.
    3. Hypotheses: H0: μC = μSCI (μC - μSCI = 0), HA: μC < μSCI (μC - μSCI < 0).
    4. Test statistic:
       $$T = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{S_p\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}} = \frac{(126.1 - 133.1) - 0}{\sqrt{756.04\left(\dfrac{1}{10} + \dfrac{1}{10}\right)}} = -0.569$$
       where
       $$S_p^2 = \frac{(n_1 - 1)S_1^2 + (n_2 - 1)S_2^2}{n_1 + n_2 - 2} = \frac{9(21.8)^2 + 9(32.2)^2}{10 + 10 - 2} = 756.04$$
  • 117. 5. Decision rule: reject H0 if T < -t1-α,(n1+n2-2), where t1-α,(n1+n2-2) = t0.95,18 = 1.7341 (from Table E).
    6. Conclusion: fail to reject H0, since -0.569 > -1.7341, so T is not in the rejection region. Equivalently, |T| = 0.569 is smaller than t0.90,18 = 1.33, so the p-value exceeds 0.10 and therefore exceeds α = 0.05.
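The pooled-variance calculation in Example 7.3.2 can be sketched as follows. It works from summary statistics rather than the raw data, and it takes the SCI standard deviation as 32.2, which is an assumption: that is the value that reproduces the slide's pooled variance of 756.04. The critical value 1.7341 is the tabulated t0.95,18 quoted in the example, not something the script computes. The function name `pooled_t` is illustrative.

```python
from math import sqrt


def pooled_t(xbar1, xbar2, s1, s2, n1, n2):
    """Pooled-variance T statistic for H0: mu1 - mu2 = 0 (equal variances assumed)."""
    df = n1 + n2 - 2
    sp2 = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / df   # pooled variance S_p^2
    t = ((xbar1 - xbar2) - 0) / sqrt(sp2 * (1 / n1 + 1 / n2))
    return sp2, t, df


# Group 1 = controls (C), group 2 = SCI; s_SCI = 32.2 is assumed (see lead-in)
sp2, t, df = pooled_t(126.1, 133.1, 21.8, 32.2, 10, 10)

T_CRIT = 1.7341            # tabulated t_{0.95, 18} from Table E
reject_h0 = t < -T_CRIT    # lower-tailed test, HA: mu_C < mu_SCI
print(f"Sp^2 = {sp2:.2f}, T = {t:.3f}, df = {df}, reject H0: {reject_h0}")
```

Because T is negative but far above -1.7341, the script reports that H0 is not rejected.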
  • 118. Example 7.3.3 (page 241)
    Dernellis and Panaretou examined subjects with hypertension and healthy control subjects. One of the variables of interest was the aortic stiffness index. Measures of this variable were calculated from the aortic diameter evaluated by M-mode and blood pressure measured by a sphygmomanometer. Physicians wish to reduce aortic stiffness. In the 15 patients with hypertension (Group 1), the mean aortic stiffness index was 19.16 with a standard deviation of 5.29. In the 30 control subjects (Group 2), the mean aortic stiffness index was 9.53 with a standard deviation of 2.69. We wish to determine if the two populations represented by these samples differ with respect to mean stiffness index.
    We wish to know if we can conclude that, in general, persons with thrombosis have on average higher IgG levels than persons without thrombosis, at α = 0.01 (p-value = 0.0559).
  • 119. Solution:
    Group           Mean IgG level   Sample size   Standard deviation
    Thrombosis      59.01            53            44.89
    No thrombosis   46.61            54            34.85
    1. Data: n1 = 53, n2 = 54, S1 = 44.89, S2 = 34.85, α = 0.01.
    2. Assumptions: the two populations are not normal; σ1² and σ2² are unknown and the sample sizes are large.
    3. Hypotheses: H0: μ1 = μ2 (μ1 - μ2 = 0), HA: μ1 > μ2 (μ1 - μ2 > 0).
    4. Test statistic:
       $$Z = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{S_1^2}{n_1} + \dfrac{S_2^2}{n_2}}} = \frac{(59.01 - 46.61) - 0}{\sqrt{\dfrac{44.89^2}{53} + \dfrac{34.85^2}{54}}} = 1.59$$
  • 120. 5. Decision rule: reject H0 if Z > Z1-α, where Z1-α = Z0.99 = 2.33 (from Table D).
    6. Conclusion: fail to reject H0, since 1.59 < 2.33. Equivalently, fail to reject H0 since p = 0.0559 > α = 0.01.
  • 121. 7.5 Hypothesis Testing: A Single Population Proportion
    Testing hypotheses about a population proportion (P) is carried out in much the same way as for a mean, when the conditions necessary for using the normal curve are met.
    We have the following steps:
    1. Data: sample size (n), sample proportion (p̂) and P0, where
       $$\hat{p} = \frac{a}{n} = \frac{\text{no. of elements in the sample with some characteristic}}{\text{total no. of elements in the sample}}$$
    2. Assumptions: p̂ is approximately normally distributed.
  • 122. 3. Hypotheses: we have three cases.
       Case I:   H0: P = P0, HA: P ≠ P0
       Case II:  H0: P = P0, HA: P > P0
       Case III: H0: P = P0, HA: P < P0
    4. Test statistic:
       $$Z = \frac{\hat{p} - p_0}{\sqrt{\dfrac{p_0 q_0}{n}}}$$
       which, when H0 is true, is distributed approximately as the standard normal.
  • 123. 5. Decision rule:
    i) If HA: P ≠ P0: reject H0 if Z > Z1-α/2 or Z < -Z1-α/2.
    ii) If HA: P > P0: reject H0 if Z > Z1-α.
    iii) If HA: P < P0: reject H0 if Z < -Z1-α.
    Note: Z1-α/2, Z1-α and Zα are tabulated values obtained from Table D.
    6. Conclusion: reject or fail to reject H0.
  • 124. Example 7.5.1 (page 259)
    Wagen collected data on a sample of 301 Hispanic women living in Texas. One variable of interest was the percentage of subjects with impaired fasting glucose (IFG). In the study, 24 women were classified in the IFG stage. The article cites population estimates for IFG among Hispanic women in Texas as 6.3 percent. Is there sufficient evidence to indicate that the population of Hispanic women in Texas has a prevalence of IFG higher than 6.3 percent? Let α = 0.05.
    Solution:
    1. Data: n = 301, p0 = 6.3/100 = 0.063, a = 24, q0 = 1 - p0 = 1 - 0.063 = 0.937, α = 0.05, and
       $$\hat{p} = \frac{a}{n} = \frac{24}{301} = 0.08$$
  • 125. 2. Assumptions: p̂ is approximately normally distributed.
    3. Hypotheses: H0: P = 0.063, HA: P > 0.063.
    4. Test statistic:
       $$Z = \frac{\hat{p} - p_0}{\sqrt{\dfrac{p_0 q_0}{n}}} = \frac{0.08 - 0.063}{\sqrt{\dfrac{0.063(0.937)}{301}}} = 1.21$$
    5. Decision rule: reject H0 if Z > Z1-α, where Z1-α = Z1-0.05 = Z0.95 = 1.645.
  • 126. 6. Conclusion: fail to reject H0, since Z = 1.21 < Z1-α = 1.645. Equivalently, the p-value is 0.1131, and we fail to reject H0 because p > α.
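The one-proportion test of Example 7.5.1 can be sketched in a few lines. The function name `one_proportion_z` is an illustrative choice. Note one deliberate difference from the slides: the script keeps p̂ = 24/301 unrounded, so its Z is slightly below the slide's 1.21 (which rounds p̂ to 0.08); the conclusion is unchanged.

```python
from math import sqrt
from statistics import NormalDist


def one_proportion_z(a, n, p0):
    """Upper-tailed Z test for HA: P > p0, using the normal approximation."""
    p_hat = a / n                       # sample proportion
    q0 = 1 - p0
    z = (p_hat - p0) / sqrt(p0 * q0 / n)
    p_value = 1 - NormalDist().cdf(z)   # area to the right of z
    return p_hat, z, p_value


p_hat, z, p_value = one_proportion_z(a=24, n=301, p0=0.063)
z_crit = NormalDist().inv_cdf(1 - 0.05)   # Z_{0.95}, about 1.645
reject_h0 = z > z_crit
print(f"p-hat = {p_hat:.4f}, Z = {z:.2f}, p = {p_value:.4f}, reject H0: {reject_h0}")
```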
  • 127. 7.6 Hypothesis Testing: The Difference between Two Population Proportions
    Testing hypotheses about two population proportions (P1, P2) is carried out in much the same way as for the difference between two means, when the conditions necessary for using the normal curve are met.
    We have the following steps:
    1. Data: sample sizes (n1 and n2), sample proportions (p̂1, p̂2), the number with the characteristic in each sample (x1, x2), and the pooled proportion
       $$\bar{p} = \frac{x_1 + x_2}{n_1 + n_2}$$
    2. Assumption: the two populations are independent.
  • 128. 3. Hypotheses: we have three cases.
       Case I:   H0: P1 = P2 (P1 - P2 = 0), HA: P1 ≠ P2 (P1 - P2 ≠ 0)
       Case II:  H0: P1 = P2 (P1 - P2 = 0), HA: P1 > P2 (P1 - P2 > 0)
       Case III: H0: P1 = P2 (P1 - P2 = 0), HA: P1 < P2 (P1 - P2 < 0)
    4. Test statistic:
       $$Z = \frac{(\hat{p}_1 - \hat{p}_2) - (p_1 - p_2)}{\sqrt{\dfrac{\bar{p}(1 - \bar{p})}{n_1} + \dfrac{\bar{p}(1 - \bar{p})}{n_2}}}$$
       which, when H0 is true, is distributed approximately as the standard normal.
  • 129. 5. Decision rule:
    i) If HA: P1 ≠ P2: reject H0 if Z > Z1-α/2 or Z < -Z1-α/2.
    ii) If HA: P1 > P2: reject H0 if Z > Z1-α.
    iii) If HA: P1 < P2: reject H0 if Z < -Z1-α.
    Note: Z1-α/2, Z1-α and Zα are tabulated values obtained from Table D.
    6. Conclusion: reject or fail to reject H0.
  • 130. Example 7.6.1 (page 262)
    Noonan is a genetic condition that can affect the heart, growth, blood clotting, and mental and physical development. Noonan examined the stature of men and women with Noonan. The study contained 29 male and 44 female adults. One of the cut-off values used to assess stature was the third percentile of adult height. Eleven of the males fell below the third percentile of adult male height, while 24 of the females fell below the third percentile of adult female height. Does this study provide sufficient evidence for us to conclude that, among subjects with Noonan, females are more likely than males to fall below the respective third percentile of adult height? Let α = 0.05.
    Solution:
    1. Data: nM = 29, nF = 44, xM = 11, xF = 24, α = 0.05, with
       $$\bar{p} = \frac{x_M + x_F}{n_M + n_F} = \frac{11 + 24}{29 + 44} = 0.479, \qquad \hat{p}_M = \frac{x_M}{n_M} = \frac{11}{29} = 0.379, \qquad \hat{p}_F = \frac{x_F}{n_F} = \frac{24}{44} = 0.545$$
  • 131. 2. Assumption: the two populations are independent.
    3. Hypotheses (Case II): H0: PF = PM (PF - PM = 0), HA: PF > PM (PF - PM > 0).
    4. Test statistic:
       $$Z = \frac{(\hat{p}_F - \hat{p}_M) - (p_F - p_M)}{\sqrt{\dfrac{\bar{p}(1 - \bar{p})}{n_F} + \dfrac{\bar{p}(1 - \bar{p})}{n_M}}} = \frac{(0.545 - 0.379) - 0}{\sqrt{\dfrac{(0.479)(0.521)}{44} + \dfrac{(0.479)(0.521)}{29}}} = 1.39$$
    5. Decision rule: reject H0 if Z > Z1-α, where Z1-α = Z1-0.05 = Z0.95 = 1.645.
    6. Conclusion: fail to reject H0, since Z = 1.39 < Z1-α = 1.645. Equivalently, the p-value is 0.0823, and we fail to reject H0 because p > α.
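Example 7.6.1 can be checked with a short script. This is a sketch under the example's assumptions (independent samples, pooled proportion under H0); the function name `two_proportion_z` is illustrative, and females are passed as the first group so that a positive Z supports HA: PF > PM.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z(x1, n1, x2, n2):
    """Upper-tailed Z test for HA: P1 > P2 using the pooled proportion."""
    p1_hat, p2_hat = x1 / n1, x2 / n2
    p_bar = (x1 + x2) / (n1 + n2)                       # pooled proportion
    se = sqrt(p_bar * (1 - p_bar) / n1 + p_bar * (1 - p_bar) / n2)
    z = ((p1_hat - p2_hat) - 0) / se
    p_value = 1 - NormalDist().cdf(z)                   # area to the right of z
    return p_bar, z, p_value


# Group 1 = females (24 of 44), group 2 = males (11 of 29)
p_bar, z, p_value = two_proportion_z(24, 44, 11, 29)
reject_h0 = p_value < 0.05
print(f"p-bar = {p_bar:.3f}, Z = {z:.2f}, p = {p_value:.4f}, reject H0: {reject_h0}")
```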
  • 132. Chapter 9 Statistical Inference and The Relationship between two variables Prepared By : Dr. Shuhrat Khan
  • 133. REGRESSION CORRELATION ANALYSIS OF VARIANCE
    • Regression, Correlation and Analysis of Covariance are all statistical techniques that use the idea that one variable, say Y, may be related to one or more other variables through an equation. Here we consider the relationship between two variables only, in a linear form, which is called linear regression and linear correlation, or simple regression and correlation. The relationships between more than two variables, called multiple regression and correlation, will be considered later.
    • Simple regression uses the relationship between the two variables to obtain information about one variable by knowing the values of the other. The equation showing this type of relationship is called the simple linear regression equation. The related method of correlation is used to measure how strong the relationship between the two variables is.
    EQUATION OF REGRESSION
  • 134. Line of Regression
    • Simple linear regression: suppose that we are interested in a variable Y, but we want to know about its relationship to another variable X, or we want to use X to predict (or estimate) the value of Y that might be obtained without actually measuring it, provided the relationship between the two can be expressed by a line. X is usually called the independent variable and Y the dependent variable.
    • We assume that the values of variable X are either fixed or random. By fixed, we mean that the values are chosen by the researcher: either an experimental unit (patient) is given this value of X (such as the dosage of a drug), or a unit (patient) is chosen which is known to have this value of X.
    • By random, we mean that units (patients) are chosen at random from all the possible units, and both variables X and Y are measured.
    • We also assume that for each value x of X there is a whole range, or population, of possible Y values, and that the mean of the Y population at X = x, denoted by μy|x, is a linear function of x. That is,
      μy|x = α + βx
    [Diagram labels: dependent variable; independent variable; two random variables, or bivariate random variable]
  • 135. ESTIMATION
    We select a sample of n observations (xi, yi) from the population, with the goals:
    • Estimate α and β.
    • Predict the value of Y at a given value x of X.
    • Make tests to draw conclusions about the model and its usefulness.
    We estimate the parameters α and β by "a" and "b" respectively, using the sample regression line
      Ŷ = a + bx
    where a and b are calculated from the sample data.
  • 136. ESTIMATION AND CALCULATION OF CONSTANTS "a" AND "b" [the formulas on this slide were images and were not recovered]
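The formulas on this slide did not survive extraction. The standard least-squares estimates, which are presumably what was shown (this is an assumption, not a recovery of the slide), are:

$$b = \frac{\sum x_i y_i - \dfrac{\left(\sum x_i\right)\left(\sum y_i\right)}{n}}{\sum x_i^2 - \dfrac{\left(\sum x_i\right)^2}{n}}, \qquad a = \bar{y} - b\,\bar{x}$$

Here b is the slope and a the intercept of the sample regression line Ŷ = a + bx, and all sums run over the n sample pairs (xi, yi).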
  • 137. EXAMPLE
    • Investigators at a sports health centre are interested in the relationship between oxygen consumption and exercise time in athletes recovering from injury. Appropriate mechanics for exercising and measuring oxygen consumption are set up, and the results are presented below:
  • 138.
    x variable: exercise time (min)   0.5   1.0   1.5   2.0   2.5   3.0   3.5    4.0   4.5   5.0
    y variable: oxygen consumption    620   630   800   840   840   870   1010   940   950   1130
  • 139. Calculations [the worked calculations on this slide were images and were not recovered]
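Since the worked calculations for this example were not recovered, the following sketch fits the line Ŷ = a + bx to the tabulated exercise-time and oxygen-consumption data using the standard least-squares formulas. It is an illustration, not the slide's own working; the function name `fit_line` is hypothetical.

```python
def fit_line(x, y):
    """Least-squares intercept a and slope b for the line y-hat = a + b*x."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_x2 = sum(xi * xi for xi in x)
    b = (sum_xy - sum_x * sum_y / n) / (sum_x2 - sum_x**2 / n)   # slope
    a = sum_y / n - b * sum_x / n                                 # intercept
    return a, b


exercise_time = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0]   # minutes
oxygen = [620, 630, 800, 840, 840, 870, 1010, 940, 950, 1130]        # as tabulated

a, b = fit_line(exercise_time, oxygen)
print(f"Y-hat = {a:.1f} + {b:.2f} x")

# Predicted oxygen consumption at x = 3.0 minutes
predicted_at_3 = a + b * 3.0
print(f"Predicted Y at x = 3.0: {predicted_at_3:.1f}")
```

With these data the slope works out to roughly 97.8 units of oxygen consumption per extra minute of exercise, with an intercept of about 594.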
  • 140. Pearson's Correlation Coefficient
    • With the aid of Pearson's correlation coefficient (r), we can determine the strength and the direction of the relationship between the X and Y variables, both of which must have been measured and must be quantitative.
    • For example, we might be interested in examining the association between height and weight for the following sample of eight children:
  • 141. Heights and weights of 8 children
    Child     Height (inches) X   Weight (pounds) Y
    A         49                  81
    B         50                  88
    C         53                  87
    D         55                  99
    E         60                  91
    F         55                  89
    G         60                  95
    H         50                  90
    Average   X̄ = 54 inches       Ȳ = 90 pounds
  • 142. [Figure: scatter plot of weight against height for the 8 children]
  • 143. Table: The Strength of a Correlation
    Value of r (positive or negative)   Meaning
    0.00 to 0.19                        A very weak correlation
    0.20 to 0.39                        A weak correlation
    0.40 to 0.69                        A modest correlation
    0.70 to 0.89                        A strong correlation
    0.90 to 1.00                        A very strong correlation
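The strength table can be turned into a small lookup. This is a sketch only: the function name `correlation_strength` is illustrative, and because the table leaves gaps between bands (for example between 0.19 and 0.20), the code assumes each band starts at its lower bound and runs up to the next band's lower bound.

```python
def correlation_strength(r):
    """Describe the strength of a correlation using the bands in the table above."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1 and 1")
    size = abs(r)   # the table applies to positive and negative r alike
    if size < 0.20:
        return "very weak"
    if size < 0.40:
        return "weak"
    if size < 0.70:
        return "modest"
    if size < 0.90:
        return "strong"
    return "very strong"


for value in (0.10, -0.35, 0.55, 0.86, -0.95):
    print(f"r = {value:+.2f}: {correlation_strength(value)} correlation")
```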
  • 144. FORMULA FOR CORRELATION COEFFICIENT (r)
    • With Pearson's r, the term $\sum (X - \bar{X})(Y - \bar{Y})$ means that we add the products of the deviations to see whether the positive products or the negative products are more abundant and sizable. Positive products indicate cases in which the variables go in the same direction (that is, both taller and heavier than average, or both shorter and lighter than average);
    • negative products indicate cases in which the variables go in opposite directions (that is, taller but lighter than average, or shorter but heavier than average).
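To make the idea of deviation products concrete, the sketch below applies the definitional form of Pearson's r to the height and weight table for the eight children (means 54 inches and 90 pounds). The slides do not report r for these data, so the value computed here is an illustration rather than a figure from the lecture; the variable names are arbitrary.

```python
from math import sqrt

heights = [49, 50, 53, 55, 60, 55, 60, 50]   # inches, children A to H
weights = [81, 88, 87, 99, 91, 89, 95, 90]   # pounds, children A to H

mean_h = sum(heights) / len(heights)
mean_w = sum(weights) / len(weights)

# Deviation of each child from the mean, and the product of the two deviations
dev_h = [h - mean_h for h in heights]
dev_w = [w - mean_w for w in weights]
products = [dh * dw for dh, dw in zip(dev_h, dev_w)]

sum_products = sum(products)                  # sum of (X - Xbar)(Y - Ybar)
ss_h = sum(dh**2 for dh in dev_h)             # sum of squared height deviations
ss_w = sum(dw**2 for dw in dev_w)             # sum of squared weight deviations
r = sum_products / sqrt(ss_h * ss_w)

positive = sum(1 for p in products if p > 0)
negative = sum(1 for p in products if p < 0)
print(f"positive products: {positive}, negative products: {negative}")
print(f"sum of products = {sum_products:.0f}, r = {r:.3f}")
```

Most of the products are positive, so taller children in this sample tend also to be heavier, and r is positive.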
  • 145. Computational Formula for Pearson's Correlation Coefficient r
    • Here SP (the sum of the products), SSx (the sum of the squares for x) and SSy (the sum of the squares for y) are the quantities that enter the computation of r. [The formulas on this slide were images and were not recovered.]
  • 146.
    Child   X    Y    X²    Y²     XY
    A       12   12   144   144    144
    B       10   8    100   64     80
    C       6    12   36    144    72
    D       16   11   256   121    176
    E       8    10   64    100    80
    F       9    8    81    64     72
    G       12   16   144   256    192
    H       11   15   121   225    165
    ∑       84   92   946   1118   981
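The formulas themselves are missing from the extracted slide. The usual computational form, which is an assumption about what the slide showed, is:

$$r = \frac{SP}{\sqrt{SS_x \, SS_y}}, \qquad SP = \sum XY - \frac{\left(\sum X\right)\left(\sum Y\right)}{N}, \qquad SS_x = \sum X^2 - \frac{\left(\sum X\right)^2}{N}, \qquad SS_y = \sum Y^2 - \frac{\left(\sum Y\right)^2}{N}$$

where N is the number of (X, Y) pairs. This form is algebraically identical to dividing the sum of the deviation products by the square root of the product of the two sums of squared deviations.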
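The column totals in this table are exactly what the computational formula needs. The sketch below rebuilds the totals from the X and Y columns and then computes SP, SSx, SSy and r; the slides do not state the resulting r, so the value is illustrative, and the function name `pearson_r` is hypothetical.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson's r via the computational formula r = SP / sqrt(SSx * SSy)."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_x2 = sum(xi**2 for xi in x)
    sum_y2 = sum(yi**2 for yi in y)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sp = sum_xy - sum_x * sum_y / n        # sum of products
    ss_x = sum_x2 - sum_x**2 / n           # sum of squares for x
    ss_y = sum_y2 - sum_y**2 / n           # sum of squares for y
    return sp, ss_x, ss_y, sp / sqrt(ss_x * ss_y)


x = [12, 10, 6, 16, 8, 9, 12, 11]    # column X, children A to H
y = [12, 8, 12, 11, 10, 8, 16, 15]   # column Y, children A to H

sp, ss_x, ss_y, r = pearson_r(x, y)
print(f"SP = {sp:.0f}, SSx = {ss_x:.0f}, SSy = {ss_y:.0f}, r = {r:.2f}")
```

The totals it uses match the ∑ row of the table (84, 92, 946, 1118, 981), and the resulting r of about 0.24 would count as a weak correlation on the strength scale given earlier.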
  • 147. Table 2: Chest circumference and birth weight of 10 babies
    x (cm)   y (kg)   x²        y²      xy
    22.4     2.00     501.76    4.00    44.8
    27.5     2.25     756.25    5.06    61.88
    28.5     2.10     812.25    4.41    59.85
    28.5     2.35     812.25    5.52    66.98
    29.4     2.45     864.36    6.00    72.03
    29.4     2.50     864.36    6.25    73.5
    30.5     2.80     930.25    7.84    85.4
    32.0     2.80     1024.0    7.84    89.6
    31.4     2.55     985.96    6.50    80.07
    32.5     3.00     1056.25   9.00    97.5
    TOTAL
    292.1    24.8     8607.69   62.42   731.61
  • 148. Checking for significance
    • There appears to be a strong correlation between chest circumference and birth weight in babies.
    • We need to check that such a correlation is unlikely to have arisen by chance in a sample of ten babies.
    • Tables are available that give the significant values of the correlation coefficient at two probability levels.
    • First we need to work out the degrees of freedom. They are the number of pairs of observations less two, that is, (n - 2) = 8.
    • Looking at the table, we find that our calculated value of 0.86 exceeds the tabulated value at 8 df of 0.765 at p = 0.01. Our correlation is therefore statistically highly significant.
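The significance check above can be sketched from the column totals of Table 2. The critical value 0.765 (8 df, p = 0.01) is taken from the slide, not computed, and the variable names are illustrative. Because the y² column in the table is rounded to two decimals, the computed r may differ slightly in the third decimal from a calculation on the raw data.

```python
from math import sqrt

# Column totals from Table 2 (10 babies)
n = 10
sum_x, sum_y = 292.1, 24.8
sum_x2, sum_y2, sum_xy = 8607.69, 62.42, 731.61

sp = sum_xy - sum_x * sum_y / n      # sum of products
ss_x = sum_x2 - sum_x**2 / n         # sum of squares for chest circumference
ss_y = sum_y2 - sum_y**2 / n         # sum of squares for birth weight
r = sp / sqrt(ss_x * ss_y)

df = n - 2                           # degrees of freedom for the correlation
R_CRIT_8DF_P01 = 0.765               # tabulated value quoted on the slide
significant = abs(r) > R_CRIT_8DF_P01
print(f"r = {r:.3f}, df = {df}, significant at p = 0.01: {significant}")
```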