Biostatics part 7.pdf

Inferential Statistics
1/30/2024

• Inferential statistics are the statistical methods used to
draw conclusions from a sample and make inferences to
the entire population.
• The two primary methods for making inference are
estimation and hypothesis testing.

• Estimation is the process of determining a likely value for
a variable in the surveyed population, based on
information collected from the sample.
• Estimation is the use of sample statistics to estimate
population parameters.
• The true population parameter value is usually
unknown.

• Researchers are usually interested in looking at
estimates of many statistics.
• For example, a sample survey could be used to produce
any of the following statistics:
✓Estimates for the proportion of smokers among all
people aged 15 to 24 in the population;
✓The mean level of a certain enzyme among healthy
men.

Statistical estimation
• An estimate is either a point estimate or an interval estimate.
• Point estimate
✓ sample mean
✓ sample proportion
• Interval estimate
✓ confidence interval for the mean
✓ confidence interval for the proportion
• The point estimate is always within the interval estimate.

Point Estimate
• Single numerical value used to estimate the
corresponding population parameter.
• A single value quoted as an estimate of a population
parameter is of little use unless it is accompanied by
some indication of its precision.
• The following slides describe various ways of
enhancing the value of point estimates.

Quantity                   Parameter    Statistic
Mean                       μ            x̄
Mean difference            μ1 - μ2      x̄1 - x̄2
Variance                   σ²           s²
Proportion                 P            p̂
Proportion difference      P1 - P2      p̂1 - p̂2
Correlation coefficient    ρ            r
Odds ratio                 OR           sample OR
Relative risk              RR           sample RR

Properties of Estimators
• Desirable properties of estimators include:
✓ Unbiasedness
✓ Efficiency
✓ Consistency
✓ Sufficiency

Interval Estimation
• Usually, we only have a sample and don’t know the entire
population.
• Example: Point estimate of 0.30 for population proportion
• It is not reasonable to assume that the population
proportion is exactly 0.30.
• The probability of getting a sample statistic value that is
exactly equal to the corresponding population parameter
is usually quite small.

• It may be reasonable to assume that 0.30 is close to the
population proportion
• We use a point estimate to obtain an interval estimate
• Ideally, we would like to be completely certain that the
population parameter of interest is included in our
interval estimate. This is too much to ask for.
• We have to settle for something less than certainty,
namely, confidence levels.

• When constructing a confidence interval (CI), keep in mind that it:
✓ takes into consideration the variation in sample statistics from
sample to sample
✓ provides a range of values based on observations from one sample
✓ gives information about closeness to the unknown
population parameter
✓ is stated in terms of probability: we are never 100% sure

• Two questions to put bounds on our point estimates to reflect
our level of confidence
✓ How wide does the bracket have to be?
✓ What is our tolerance of error (variability, not mistake)?
• Scientists usually accept a 5% chance that the range will not
include the true population value
✓ The range or interval is called the 95% confidence interval

Parameter = Statistic ± Its Error

• The probability that the population parameter falls
somewhere within the interval is the level of confidence (LC).
• The confidence interval runs from the lower confidence limit to the
upper confidence limit, with the sample statistic at its centre.

Factors Affecting Interval Width

1. C.I. for a Population Mean (Normally
Distributed)
a) Known variance (or large sample size)
• A 100(1-α)% C.I. for μ is
$$\bar{x} \pm z_{1-\alpha/2}\,\frac{\sigma}{\sqrt{n}}$$
• α is chosen by the researcher; the most common values
of α are 0.05, 0.01, 0.001 and 0.1

• 100(1-α)% CI for μ when σ is known (sampling from a normal
population, or a large sample):
$$\underbrace{\bar{x}}_{\text{estimator}} \pm \underbrace{z_{1-\alpha/2}}_{\text{reliability coefficient}} \times \underbrace{\frac{\sigma}{\sqrt{n}}}_{\text{standard error}}$$
• The reliability coefficient times the standard error is the precision of
the estimate (margin of error).
• Interpretation:
a. Probabilistic: in repeated sampling, 100(1-α)% of all intervals will
include μ
b. Practical: we are 100(1-α)% confident that a single interval contains μ

Example
• Data on the systolic blood pressure of 199 patients give a
mean value of 125.8 mmHg. Let us assume that the
standard deviation for this patient population is known to
be 20 mmHg.
✓ Construct a 95 percent confidence interval for the
population mean.

• Solution
• 125.8 ± 1.96 × 20/√199 = 125.8 ± 2.78
• The 95% CI is (123.0, 128.6) mmHg
• We are 95% confident that the mean systolic blood pressure
for similar patients is between 123.0 and 128.6 mmHg
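The blood pressure interval above can be checked with a few lines of Python. This is a minimal sketch rather than part of the original slides: the function name `z_confidence_interval` is an assumption made for illustration, and the reliability coefficient is taken from the standard library's `statistics.NormalDist` instead of a printed z table.

```python
from math import sqrt
from statistics import NormalDist


def z_confidence_interval(mean, sigma, n, confidence=0.95):
    """CI for a population mean when the population SD (sigma) is known."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)  # reliability coefficient
    margin = z * sigma / sqrt(n)             # margin of error = z * standard error
    return mean - margin, mean + margin


# Systolic blood pressure example: n = 199, mean = 125.8, sigma = 20
low, high = z_confidence_interval(125.8, 20, 199)
print(round(low, 1), round(high, 1))
```

Rounded to one decimal place, the limits agree with the interval (123.0, 128.6) quoted on the slide.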

b) When σ Is Unknown - The t-Distribution
• If the population standard deviation, σ, is not known,
replace σ with the sample standard deviation, s. If the
population is normal, the resulting statistic
$$t = \frac{\bar{X} - \mu}{s/\sqrt{n}}$$
has a t distribution with (n - 1) degrees of freedom

• The t distribution is a family of bell-shaped, symmetric distributions,
one for each number of degrees of freedom.
• The expected value of t is 0.
• For df > 2, the variance of t is df/(df-2). This is greater than 1,
but approaches 1 as the number of degrees of freedom
increases. The t is flatter and has fatter tails than does the
standard normal.
• The t distribution approaches a standard normal as the number
of degrees of freedom increases

A (1-α)100% confidence interval for µ when σ is not
known (assuming a normally distributed population):
$$\bar{x} \pm t_{\alpha/2}\,\frac{s}{\sqrt{n}}$$
where $t_{\alpha/2}$ is the value of the t distribution with n-1 degrees
of freedom that cuts off a tail area of α/2 to its right.

df t0.100 t0.050 t0.025 t0.010 t0.005
--- ----- ----- ------ ------ ------
1 3.078 6.314 12.706 31.821 63.657
2 1.886 2.920 4.303 6.965 9.925
3 1.638 2.353 3.182 4.541 5.841
4 1.533 2.132 2.776 3.747 4.604
5 1.476 2.015 2.571 3.365 4.032
6 1.440 1.943 2.447 3.143 3.707
7 1.415 1.895 2.365 2.998 3.499
8 1.397 1.860 2.306 2.896 3.355
9 1.383 1.833 2.262 2.821 3.250
10 1.372 1.812 2.228 2.764 3.169
11 1.363 1.796 2.201 2.718 3.106
12 1.356 1.782 2.179 2.681 3.055
13 1.350 1.771 2.160 2.650 3.012
14 1.345 1.761 2.145 2.624 2.977
15 1.341 1.753 2.131 2.602 2.947
16 1.337 1.746 2.120 2.583 2.921
17 1.333 1.740 2.110 2.567 2.898
18 1.330 1.734 2.101 2.552 2.878
19 1.328 1.729 2.093 2.539 2.861
20 1.325 1.725 2.086 2.528 2.845
21 1.323 1.721 2.080 2.518 2.831
22 1.321 1.717 2.074 2.508 2.819
23 1.319 1.714 2.069 2.500 2.807
24 1.318 1.711 2.064 2.492 2.797
25 1.316 1.708 2.060 2.485 2.787
26 1.315 1.706 2.056 2.479 2.779
27 1.314 1.703 2.052 2.473 2.771
28 1.313 1.701 2.048 2.467 2.763
29 1.311 1.699 2.045 2.462 2.756
30 1.310 1.697 2.042 2.457 2.750
40 1.303 1.684 2.021 2.423 2.704
60 1.296 1.671 2.000 2.390 2.660
120 1.289 1.658 1.980 2.358 2.617
∞ 1.282 1.645 1.960 2.326 2.576

The t Distribution
Whenever σ is not known (and the population is assumed normal), the
correct distribution to use is the t distribution with n-1 degrees of
freedom. Note, however, that for large degrees of freedom, the t
distribution is approximated well by the Z distribution.
[Figure: t distribution with df = 10, f(t) plotted against t. The area beyond ±1.372 is 0.10 in each tail, and the area beyond ±2.228 is 0.025 in each tail.]

• Example
• In a study of preeclampsia, Kaminski and Rechberger found
the mean systolic blood pressure of 10 healthy, non-pregnant
women to be 119 with a standard deviation of 2.1.
• What is the estimated standard error of the mean?
• Construct the 99% confidence interval for the mean of the
population from which the 10 subjects may be presumed to be a
random sample.
• What is the precision of the estimate?
• What assumptions are necessary for the validity of the confidence
interval you constructed?

Solution
A. SE = s/√n = 2.1/√10 = 0.664
B. df = n - 1 = 10 - 1 = 9
99% confidence → α = 1%, α/2 = 0.005
t(n-1, α/2) = t(9, 0.005) = 3.250
x̄ ± t(n-1, α/2) × s/√n = 119 ± 3.25 × 0.664 = 119 ± 2.16
The 99% CI is (116.8, 121.2)
C. Precision = 3.25 × 0.664 = 2.16
D. i. The population should be normally distributed.
ii. The sample of 10 subjects should be a
random sample from this population.
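Parts A to C of this solution can be reproduced with a short sketch. The helper name `t_confidence_interval` is hypothetical, and because the Python standard library has no t distribution, the critical value 3.250 is passed in by hand from the t table above (df = 9, tail area 0.005).

```python
from math import sqrt


def t_confidence_interval(mean, s, n, t_crit):
    """CI for a mean when sigma is unknown; t_crit is read from a t table."""
    se = s / sqrt(n)        # estimated standard error of the mean
    margin = t_crit * se    # precision of the estimate (margin of error)
    return se, margin, (mean - margin, mean + margin)


# Preeclampsia example: n = 10, mean = 119, s = 2.1, t(9, 0.005) = 3.250
se, margin, (low, high) = t_confidence_interval(119, 2.1, 10, t_crit=3.250)
print(round(se, 3), round(margin, 2), round(low, 1), round(high, 1))
```

Passing the critical value as an argument keeps the sketch free of third-party dependencies; a statistics package could supply it instead.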

Example
• Suppose a researcher, interested in obtaining an estimate of the
average level of some enzyme in a certain human population, takes
a sample of 10 individuals, determines the level of the enzyme in
each, and computes a sample mean of approximately x̄ = 22.
• Suppose further it is known that the variable of interest is
approximately normally distributed with a variance of 45. We
wish to estimate μ (α = 0.05).
• Compute the 95% CI for μ.

Solution
• 1 - α = 0.95 → α = 0.05 → α/2 = 0.025
• variance = σ² = 45 → σ = √45, n = 10, x̄ = 22
• The 95% confidence interval for μ is given by:
P(x̄ - Z(1-α/2) σ/√n < μ < x̄ + Z(1-α/2) σ/√n) = 1 - α
• Z(1-α/2) = Z0.975 = 1.96 (refer to the z table)
• Z0.975 (σ/√n) = 1.96 (√45/√10) = 4.1578
• 22 ± 4.1578 → (22 - 4.1578, 22 + 4.1578)
→ (17.84, 26.16)
• We are 95% confident that the population mean level of
enzyme is between 17.84 and 26.16

Example
• The activity values of a certain enzyme measured in normal gastric
tissue of 35 patients with gastric carcinoma has a mean of 0.718 and
a standard deviation of 0.511.
• We want to construct a 90% confidence interval for the population
mean enzyme activity.
Solution
✓ The population is not assumed to be normal,
✓ n = 35 (n > 30), so n is large; σ is unknown, s = 0.511
✓ 1 - α = 0.90 → α = 0.1
✓ → α/2 = 0.05 → 1 - α/2 = 0.95

• Solution
• Then the 90% confidence interval for μ is given by:
• P(x̄ - Z(1-α/2) s/√n < μ < x̄ + Z(1-α/2) s/√n) = 1 - α
• Z(1-α/2) = Z0.95 = 1.645 (refer to the z table)
• Z0.95 (s/√n) = 1.645 (0.511/√35) = 0.1421
• 0.718 ± 0.1421 → (0.718 - 0.1421, 0.718 + 0.1421) →
(0.576, 0.860)
✓ We are 90% confident that the population mean enzyme
activity is between 0.576 and 0.860

Example
• Suppose a researcher studied the effectiveness of early
weight bearing and ankle therapies following acute repair
of a ruptured Achilles tendon.
• One of the variables measured following treatment
was muscle strength.
• In 19 subjects, the mean strength was 250.8 with a
standard deviation of 130.9.
• We assume that the sample was taken from an
approximately normally distributed population.
• Calculate the 95% confidence interval for the mean
strength.

Solution
• 1 - α = 0.95 → α = 0.05 → α/2 = 0.025
• Standard deviation = s = 130.9, n = 19, x̄ = 250.8
• The 95% confidence interval for μ is given by:
➢ P(x̄ - t(1-α/2),n-1 s/√n < μ < x̄ + t(1-α/2),n-1 s/√n) = 1 - α
• t(1-α/2),n-1 = t0.975,18 = 2.1009 (refer to the t table)
• t0.975,18 (s/√n) = 2.1009 (130.9/√19) = 63.1
• 250.8 ± 63.1 → (250.8 - 63.1, 250.8 + 63.1) →
• (187.7, 313.9)
We are 95% confident that the population mean strength
is between 187.7 and 313.9

Confidence interval for population
proportion (assuming n large)
• Assumptions
✓ two-category outcome
✓ population follows a binomial distribution
• The normal approximation can be used if
nP > 5 and n(1-P) > 5
• The 100(1-α)% CI for P (95% when α = 0.05) is given by:
$$\hat{p} - z_{1-\alpha/2}\sqrt{\frac{\hat{p}\hat{q}}{n}} < P < \hat{p} + z_{1-\alpha/2}\sqrt{\frac{\hat{p}\hat{q}}{n}}, \qquad \hat{q} = 1 - \hat{p}$$

• Example
• A research study obtained data regarding sexual
behavior from a sample of unmarried men and women
between the ages of 20 and 44 residing in geographic
areas characterized by high rates of sexually transmitted
diseases and admission to drug programs. Fifty percent of
the 1229 respondents reported that they never used a condom.
Construct a 95% CI for the population proportion never
using a condom.

• Solution
• n = 1229
• p̂ = 0.5, a point estimate of the population proportion
• α = 0.05, Z(1-α/2) = Z0.975 = 1.96
$$0.5 - 1.96\sqrt{\frac{0.5 \times 0.5}{1229}} < P < 0.5 + 1.96\sqrt{\frac{0.5 \times 0.5}{1229}}$$
• The 95% CI for the population proportion is (0.47, 0.53)
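The same interval can be computed directly from the normal-approximation formula. The sketch below is illustrative only; the function name `proportion_confidence_interval` is an assumption, and the z value is obtained from `statistics.NormalDist` rather than the table.

```python
from math import sqrt
from statistics import NormalDist


def proportion_confidence_interval(p_hat, n, confidence=0.95):
    """Normal-approximation CI for a population proportion (large n)."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    se = sqrt(p_hat * (1 - p_hat) / n)   # standard error of p_hat
    margin = z * se
    return p_hat - margin, p_hat + margin


# Condom-use example: p_hat = 0.5, n = 1229
low, high = proportion_confidence_interval(0.5, 1229)
print(round(low, 2), round(high, 2))
```

The approximation is only appropriate when the slide's conditions nP > 5 and n(1-P) > 5 hold, which is clearly the case here.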

HYPOTHESIS TESTING
Using sample statistics to test hypotheses
about population parameters

• Data are often collected to answer specific
questions, such as:
✓ Do children under five from urban areas have a lower
prevalence of malnutrition compared with rural
children?
✓ Is a new treatment beneficial to those suffering
from a certain disease compared with the
standard treatment?

• Such questions may be answered by setting up a
hypothesis and then using the data to test this
hypothesis.
• It is generally agreed that some caution should be
exercised before claiming that some effect, such as a
reduction in malnutrition or an improved cure rate, has
been established.
• The way to proceed is to set up a null hypothesis, that
there is no effect.

Definition
• Hypothesis: is a statement about one or more populations. It
is usually concerned with the parameters of the population
• Statistical hypotheses: are hypotheses that are stated in such
a way that they may be evaluated by appropriate statistical
techniques
• e.g. the hospital administrator may want to test the hypothesis
that the average length of stay of patients admitted to the
hospital is 5 days

Statistical hypotheses
• There are two hypotheses involved in hypothesis testing
✓ Null hypothesis H0: It is the hypothesis to be tested.
Also called the hypothesis of no difference
✓ Alternative hypothesis HA: It is a statement of what
we believe is true if our sample data cause us to
reject the null hypothesis

Steps in Testing Hypothesis
1. Data
2. Assumptions
3. Hypotheses
4. Test statistic
5. Select the level of significance (α):
6. Determine Critical value (ztab, ttab):
7. Calculation of test statistic (zcalc, tcalc):
8. Statistical decision
9. Conclusion
10. P-values

1. Data: understand the nature of data (e.g. counts or
measurements or proportions)
2. Assumptions: about normality of population
distribution, equality of variance, independence of
samples
3. Hypotheses: the H0 and HA should be explicitly
stated

Step 3. Hypotheses cont’d
• Rules for stating statistical hypotheses
a) What you hope to be able to conclude as a result of the test
usually should be placed in the alternative hypothesis.
b) The null hypothesis should contain a statement of equality,
either =,≥, or ≤.
c) The null hypothesis is the hypothesis that is tested.
d) The null and alternative hypotheses are complementary.
• That is, the two together exhaust all possibilities
regarding the value that the hypothesized parameter can
assume.

Step 3. Hypotheses cont’d
Examples: suppose that we want to answer the question:
i. Can we conclude that a certain population mean is not 50?
Our hypotheses are H0: μ = 50, HA: μ ≠ 50
ii. Can we conclude that the population mean is greater than
50? Our hypotheses are H0: μ ≤ 50, HA: μ > 50
iii. Can we conclude that the population mean is less than 50?
Our hypotheses are H0: μ ≥ 50, HA: μ < 50

4. Test statistic: It is a value computed from the sample data that is
used in making the decision about the rejection of the null hypothesis
• Decide on the appropriate test statistic for the hypothesis (z, t, etc.)
based on the
✓ sample size (n < 30 or n ≥ 30),
✓ type of data (count, i.e. qualitative, or measurement, i.e.
quantitative),
✓ functional form of the distribution (normal or non-normal),
✓ known or unknown population variance,
✓ number of means or proportions, etc.

General formula for test statistic
$$\text{test statistic} = \frac{\text{observed statistic} - \text{hypothesized parameter}}{\text{standard error of the observed statistic}}$$
5. Select the level of significance (α): α = 0.05, 0.01,
0.001, etc. If not given, take 0.05.
• The level of significance (α) is the probability of
rejecting a true null hypothesis.

Level of Significance, α
• Is the probability of rejecting a true Ho
• Defines unlikely values of sample statistic if Ho is true
✓Defines rejection region of the sampling distribution
• The decision is made on the basis of the level of
significance, designated by α.
• More frequently used values of α are 0.01, 0.05 and 0.10.
• α is selected by the researcher

One tail and two tail tests
• In a one tail test, the rejection region is at one end
of the distribution or the other.
• In a two tail test, the rejection region is split
between the two tails.
• Which one is used depends on the way the HA is
stated.

Level of Significance and the Rejection Region
Example:
• The average survival year after cancer diagnosis is less
than 3 years.

6. Determine Critical value (ztab, ttab):
• It is the value the test statistic must attain to be declared
significant (i.e. label the rejection & "acceptance"
regions)
7. Calculation of test statistic (zcalc, tcalc):
• calculate the test statistic based on step 4 and compare
it with the critical value

8. Statistical decision: the statistical decision consists of rejecting
or not rejecting the null hypothesis.
• It is rejected if the computed value of the test statistic falls in
the rejection area.
✓ i.e. for a two-tailed test, reject Ho if │Zcal│ > Ztab or │tcal│ > ttab
• It is not rejected if the computed value of the test statistic
falls in the non-rejection area.
✓ i.e. for a two-tailed test, do not reject Ho if │Zcal│ ≤ Ztab or │tcal│ ≤ ttab

Types of Errors in Hypothesis Tests
• Whenever we reject or fail to reject Ho, we risk committing
an error.
• Two types of errors can be committed:
• Type I Error
• Type II Error

Type I Error
• The error committed when
a true Ho is rejected
• The probability of a Type I
error is α
• α is called the level of significance
of the test
Type II Error
• The error committed
when a false Ho is not
rejected
• The probability of a Type II
error is β

                         Reality
Action (conclusion)      Ho true             Ho false
Do not reject Ho         Correct action      Type II error (β)
Reject Ho                Type I error (α)    Correct action
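The statement that α is the probability of rejecting a true Ho can be illustrated by simulation. In the sketch below, which is not from the original slides, samples are repeatedly drawn from a population in which Ho is true, and the share of two-tailed z tests that reject should settle near α. The function name and the default settings (n = 10, μ0 = 30, σ² = 20, 4000 trials, seed 0) are arbitrary assumptions chosen for illustration.

```python
import random
from math import sqrt
from statistics import NormalDist


def type_i_error_rate(trials=4000, n=10, mu0=30.0, sigma=sqrt(20),
                      alpha=0.05, seed=0):
    """Share of two-tailed z tests that reject Ho when Ho is actually true."""
    rng = random.Random(seed)
    z_tab = NormalDist().inv_cdf(1 - alpha / 2)
    rejections = 0
    for _ in range(trials):
        # Draw a sample from a population whose true mean equals mu0
        sample_mean = sum(rng.gauss(mu0, sigma) for _ in range(n)) / n
        z_cal = (sample_mean - mu0) / (sigma / sqrt(n))
        if abs(z_cal) > z_tab:
            rejections += 1   # a Type I error: true Ho rejected
    return rejections / trials


rate = type_i_error_rate()
print(rate)
```

The printed rate varies with the seed and the number of trials, but it should stay close to 0.05.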

9. Conclusion:
✓ If Ho is rejected, we conclude that HA is true.
✓ If Ho is not rejected, we conclude that Ho may be true.
10. P-values:
• The p-value is the probability, when Ho is true, of getting a value
of the test statistic as extreme as or more extreme than the observed
value just by random chance.
✓ Reject the null hypothesis if P ≤ α
✓ Do not reject the null hypothesis if P > α

TESTING A HYPOTHESIS ABOUT
THE MEAN OF A POPULATION:

Testing a hypothesis about the mean of a population:
1. Data: determine the variable, sample size (n), sample mean (x̄),
population standard deviation (σ), or sample standard deviation
(s) if σ is unknown
2. Assumptions: We have two cases:
• Case 1: Population is normally or approximately normally
distributed with known or unknown variance (sample size n
may be small or large),
• Case 2: Population is not normal with known or unknown
variance (n is large, i.e. n ≥ 30).

3. Hypotheses: we have three cases
✓ Case I: H0: μ = μ0
HA: μ ≠ μ0
• e.g. we want to test that the population mean is different
from 50
✓ Case II: H0: μ ≤ μ0
HA: μ > μ0
• e.g. we want to test that the population mean is greater
than 50
✓ Case III: H0: μ ≥ μ0
HA: μ < μ0
• e.g. we want to test that the population mean is less than
50

4. Test Statistic:
• Case 1: population is normal or approximately normal
✓ σ² is known (n large or small): $Z = \dfrac{\bar{X} - \mu_0}{\sigma/\sqrt{n}}$
✓ σ² is unknown, n large: $Z = \dfrac{\bar{X} - \mu_0}{s/\sqrt{n}}$
✓ σ² is unknown, n small: $T = \dfrac{\bar{X} - \mu_0}{s/\sqrt{n}}$
• Case 2: population is not normally distributed and n is
large
i) If σ² is known: $Z = \dfrac{\bar{X} - \mu_0}{\sigma/\sqrt{n}}$
ii) If σ² is unknown: $Z = \dfrac{\bar{X} - \mu_0}{s/\sqrt{n}}$

5. Decision Rule:
i) If HA: μ ≠ μ0
✓ Reject H0 if Z > Z1-α/2 or
Z < -Z1-α/2
✓ Reject H0 if T > t1-α/2,n-1
or T < -t1-α/2,n-1
ii) If HA: μ > μ0
✓ Reject H0 if Z > Z1-α
or
✓ Reject H0 if T > t1-α,n-1
iii) If HA: μ < μ0
✓ Reject H0 if Z < -Z1-α
or
✓ Reject H0 if T < -t1-α,n-1

Note:
✓ Z1-α/2 , Z1-α , Zα are tabulated values obtained from Z
table
✓ t1-α/2 , t1-α , tα are tabulated values obtained from t table
with (n-1) degree of freedom (df)

6. Decision:
• If we reject H0, we can conclude that HA is true.
• If, however, we do not reject H0, we may conclude
that H0 may be true.

An Alternative Decision Rule using the p-value
• The p-value is defined as the smallest value of α for
which the null hypothesis can be rejected.
• One-tailed test: p is the tail area beyond the observed test
statistic. Reject the null hypothesis if p ≤ α; do not reject it
if p > α.
• Two-tailed test: compare the one-tail area with α/2 (reject if it
is ≤ α/2), or, equivalently, double the one-tail area to get the
two-sided p-value and reject if it is ≤ α.

Example
• Researchers are interested in the mean age of a certain population.
• A random sample of 10 individuals drawn from the population of
interest has a mean of 27.
• Assuming that the population is approximately normally
distributed with variance 20,
• Can we conclude that the population mean is different from 30
years? (α=0.05) .

Solution
1. Data: variable is age, n = 10, x̄ = 27, σ² = 20, α = 0.05
2. Assumptions: the population is approximately normally
distributed with variance 20
3. Hypotheses:
• H0: μ = 30
• HA: μ ≠ 30

4. Distribution of test statistic:
$$Z = \frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}}$$
5. Level of significance: α = 0.05
• Decision rule: the alternative hypothesis is HA: μ ≠ 30, so
✓ reject H0 if Zcal > Ztab or Zcal < -Ztab
✓ Generally, when HA: μ ≠ μ0,
reject H0 if │Zcal│ > Ztab

6. Critical value
• Since HA is two-sided, we divide α by 2
• Ztab = Z1-α/2 = Z1-0.05/2 = Z0.975 = 1.96 in the right tail and
-1.96 in the left tail

7. Calculation of test statistic
• Zcal = (27 - 30)/(√20/√10) = -2.12
8. Statistical decision:
• We reject H0, since -2.12 is in the rejection region,
i.e. │-2.12│ > 1.96
9. Conclusion
• We can conclude that the mean age (μ) is different from 30
years
10. p-value: the one-tail area is P(Z ≤ -2.12) = 0.0170 < 0.025, i.e. p ≤ α/2, therefore we reject H0
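Steps 6 to 10 of this example can be sketched in Python as follows. The variable names are assumptions made for illustration, and the normal tail areas come from `statistics.NormalDist` rather than a z table, so the one-tail area is computed from the unrounded test statistic.

```python
from math import sqrt
from statistics import NormalDist

# Mean age example: H0: mu = 30 versus HA: mu != 30
n, x_bar, mu0, variance, alpha = 10, 27, 30, 20, 0.05

z_cal = (x_bar - mu0) / (sqrt(variance) / sqrt(n))   # test statistic
z_tab = NormalDist().inv_cdf(1 - alpha / 2)          # two-tailed critical value
one_tail_p = NormalDist().cdf(-abs(z_cal))           # area beyond |z_cal| in one tail
two_sided_p = 2 * one_tail_p

reject_h0 = abs(z_cal) > z_tab
print(round(z_cal, 2), round(z_tab, 2), round(one_tail_p, 3), reject_h0)
```

Comparing `one_tail_p` with α/2 and comparing `two_sided_p` with α always lead to the same decision.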

Example
• Among 157 African-American men ,the mean systolic
blood pressure was 146 mm Hg with a standard
deviation of 27.
• We wish to know if, on the basis of these data,
we may conclude that the mean systolic blood pressure
for a population of African-American men is greater than
140. Use α = 0.01.

Solution
1. Data: variable is systolic blood pressure, n = 157, x̄ = 146,
s = 27, α = 0.01.
2. Assumption: population is not normal, σ² is unknown, n > 30
3. Hypotheses: H0: μ ≤ 140
HA: μ > 140
4. Test statistic:
$$Z = \frac{\bar{X} - \mu_0}{s/\sqrt{n}} = \frac{146 - 140}{27/\sqrt{157}} = \frac{6}{2.1548} = 2.78$$
• Zcal = 2.78

5. Decision Rule:
✓ We reject H0 if Zcal > Z1-α
✓ Ztab = Z0.99 = 2.33
(from the z table)
6. Decision: We reject H0, since Zcal = 2.78 > 2.33.
• Hence we may conclude that the mean systolic blood
pressure for a population of African-American men is
greater than 140 mm Hg.

Example
• A simple random sample of 17 patients with muscle
injury were treated at a research center.
• The variable of interest was number of days between
injury and recovery. The number of days until recovery
was normally distributed in the population.
• Can we conclude that the mean number of days is not 15
days in the population represented by the sample data?
(See the data below)

Table: number of days until recovery for subjects with muscle injury
Subject   Days     Subject   Days
1         14       11        28
2         9        12        24
3         18       13        24
4         26       14        2
5         12       15        3
6         0        16        14
7         10       17        9
8         4
9         8
10        21

Solution
1. Data: number of days, n = 17, x̄ = 13.294, s = 8.886
(calculated from the data)
2. Assumptions: n < 30, simple random sampling, normally
distributed, unknown population variance
3. Hypotheses:
H0: μ = 15
HA: μ ≠ 15
4. Test statistic: our test statistic is distributed as Student's t
with 17 - 1 = 16 df, and is given by
$$T = \frac{\bar{X} - \mu_0}{s/\sqrt{n}}$$

5. Level of significance:
• Let α = 0.05. Since we have a two-tailed test, we put α/2 = 0.025 in
each tail of the distribution
• Decision rule: Reject H0 if │tcal│ > tn-1,1-α/2
6. Critical value:
ttab = tn-1,1-α/2 = t16,0.975 = 2.1199 to the right and
-2.1199 to the left

7. Calculation of test statistic:
$$t_{cal} = \frac{\bar{X} - \mu_0}{s/\sqrt{n}} = \frac{13.2941 - 15}{8.886/\sqrt{17}} = \frac{-1.7059}{2.1553} = -0.791$$
8. Statistical decision:
✓ Do not reject H0, since │-0.791│ < 2.1199
9. Conclusion: based on these data, the mean of the population from
which the sample came may be 15.
10. P-value: P(t ≤ -0.791) and P(t ≥ 0.791) each exceed 0.1, so the
two-sided p-value is greater than α, and we do not reject H0.
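The summary statistics and the test statistic in this example can be recomputed from the raw data. The sketch below is illustrative; the variable names are assumptions, and the critical value 2.1199 is copied from the solution above because the Python standard library has no t distribution.

```python
from math import sqrt
from statistics import mean, stdev

# Days until recovery for the 17 subjects in the table
days = [14, 9, 18, 26, 12, 0, 10, 4, 8, 21, 28, 24, 24, 2, 3, 14, 9]

n = len(days)
x_bar = mean(days)
s = stdev(days)                      # sample SD (divides by n - 1)
mu0 = 15

t_cal = (x_bar - mu0) / (s / sqrt(n))
t_tab = 2.1199                       # t(16, 0.975), taken from the solution above
reject_h0 = abs(t_cal) > t_tab
print(round(x_bar, 3), round(s, 3), round(t_cal, 3), reject_h0)
```

The sample SD may differ from the slide's 8.886 in the last digit because of rounding; the decision is unaffected.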

Hypothesis Testing:
A single population proportion

A single population proportion:
• Testing a hypothesis about a population proportion (P) has
the following steps:
1. Data: sample size (n), sample proportion (p̂),
hypothesized population proportion (P0)
$$\hat{p} = \frac{a}{n} = \frac{\text{no. of elements in the sample with some characteristic}}{\text{total no. of elements in the sample}}$$
2. Assumptions: the sampling distribution of p̂ is approximately normal

3. Hypotheses: we have three cases
• Case I: H0: P = P0
HA: P ≠ P0
• Case II: H0: P ≤ P0
HA: P > P0
• Case III: H0: P ≥ P0
HA: P < P0
4. Test Statistic:
$$Z = \frac{\hat{p} - p_0}{\sqrt{p_0 q_0 / n}}$$
✓ When H0 is true, Z is distributed approximately as the standard
normal

5. Decision Rule:
i) If HA: P ≠ P0
✓ Reject H0 if Z > Z1-α/2 or Z < -Z1-α/2
ii) If HA: P > P0
✓ Reject H0 if Z > Z1-α
iii) If HA: P < P0
✓ Reject H0 if Z < -Z1-α
Note: Z1-α/2 and Z1-α are tabulated values obtained from the z table
6. Conclusion: reject or fail to reject H0

Example
• A study of 301 Hispanic women in San Antonio, Texas
investigated the percentage of subjects with impaired fasting
glucose (IFG). In the study, 24 women were classified in
the IFG stage. The population estimate for IFG among
Hispanic women in Texas is 6.3%. Is there sufficient
evidence to indicate that the population of Hispanic
women in San Antonio has a prevalence of IFG higher
than 6.3%?

Solution
1. Data: n = 301, p0 = 6.3/100 = 0.063, a = 24,
$$\hat{p} = \frac{a}{n} = \frac{24}{301} = 0.08$$
✓ q0 = 1 - p0 = 1 - 0.063 = 0.937, α = 0.05
2. Assumptions: p̂ is approximately normally distributed
3. Hypotheses:
H0: P ≤ 0.063
HA: P > 0.063

4. Test Statistic
$$Z = \frac{\hat{p} - p_0}{\sqrt{p_0 q_0 / n}} = \frac{0.08 - 0.063}{\sqrt{0.063(0.937)/301}} = 1.21$$
5. Decision Rule: α = 0.05
✓ Reject H0 if Z > Z1-α
✓ where Z1-α = Z1-0.05 = Z0.95 = 1.645
6. Conclusion: Fail to reject H0, since
✓ Z = 1.21 < Z1-α = 1.645, or, equivalently, P-value = 0.1131 > α = 0.05
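The one-sample proportion test can be sketched as below. The function name `proportion_z_test` is hypothetical. The slides round p̂ to 0.08 before computing Z, so the sketch evaluates both the rounded value and the exact ratio 24/301; the exact ratio gives a slightly smaller Z, and the conclusion is the same either way.

```python
from math import sqrt
from statistics import NormalDist


def proportion_z_test(p_hat, p0, n):
    """Z statistic and upper-tail p-value for H0: P <= p0 vs HA: P > p0."""
    se0 = sqrt(p0 * (1 - p0) / n)          # standard error computed under H0
    z = (p_hat - p0) / se0
    p_value = 1 - NormalDist().cdf(z)      # area to the right of z
    return z, p_value


z_tab = NormalDist().inv_cdf(0.95)         # critical value for alpha = 0.05

z_slides, p_slides = proportion_z_test(0.08, 0.063, 301)      # p_hat rounded
z_exact, p_exact = proportion_z_test(24 / 301, 0.063, 301)    # p_hat unrounded

print(round(z_slides, 2), z_slides > z_tab)
print(z_exact > z_tab, p_exact > 0.05)
```

The p-value computed here differs a little from the slide's 0.1131, which corresponds to Z rounded to 1.21 in a printed table.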

Sample Size for Cross-sectional Study

• An essential part of planning any study is to decide how
many people need to be studied.
Sample Size
• The number of study subjects selected to represent a
given study population.
• Important to make inferences based on the findings from
the sample.
• Should be sufficient to represent the characteristics of
interest of the study population.

• In estimating a certain characteristic of a population,
sample size calculations are important to ensure that
estimates are obtained with required precision or
confidence.
• The accuracy required of the envisaged results determines the
size of the sample.

• Sample size determination depends on the:
– objective of the study;
– design of the study;
– plan for statistical analysis;
– accuracy of the measurements to be made;
–degree of precision required for generalization;
– degree of confidence with which to conclude.

• Common question:
– “How many subjects should I study?”
– Too small a sample: waste of time and
resources, because the results have no practical use
– Too large a sample: waste of resources, and data
quality may be compromised

• The feasible sample size is also determined by
the availability of resources:
– Human resource
– Time
– Transport
– Available facility, and
– Money

Sample Size: Single Sample
• The aim is to have a large enough sample with
which to estimate a population mean or
proportion within a narrow interval with high
reliability.
• Concerned with the precision of the estimate
(“narrowness of the CI”).
estimate ± d units

Sample Size For Single Sample Includes:
A. Sample size for estimating a single population
mean.
B. Sample size to estimate a single population
proportion.
• The minimum sample size required, for a very large
population (N ≥ 10,000)

Sample size for Estimating single population mean
• Suppose we want to estimate the average daily
caloric intake of people in a community. The daily
caloric intake is assumed to have a normal
distribution with mean µ and standard deviation (σ).
•The sample measure used to estimate µ is the sample
mean. The sampling distribution of the sample mean is
also normal, with the same mean, µ and standard
deviation, σ ⁄√n (the standard error of the mean).

Sample size for estimating a single population mean
• AIM: Estimate µ
• WANT: Estimate (x̄) ± d units
where d = margin of error
= absolute precision
= half of the width (w) of the CI
Steps:
1. Specify d (or w = 2d)
2. Use known σ² or estimate it using s²

3. Compute n from d = Z(1-α/2) × σ/√n, where σ/√n is the standard error of the
estimator of the parameter of interest:
$$n = \frac{z_{1-\alpha/2}^{2}\,\sigma^{2}}{d^{2}}$$
where d = e in some textbooks

Example:
1. Find the minimum sample size needed to estimate the
drop in heart rate (µ) for a new study using a higher
dose of propranolol than the standard one. We require
that the two-sided 95% CI for µ be no wider than 5
beats per minute and the sample sd for change in heart
rate equals 10 beats per minute.
Here d = 5/2 = 2.5, so
n = (1.96)²(10)²/(2.5)² = 61.5 ≈ 62 patients

2. Suppose that for a certain group of cancer patients,
we are interested in estimating the mean age at
diagnosis. We would like a 95% CI that is 5 years wide.
If the population SD is 12 years, how large should
our sample be?

• Suppose d = 1
• Then the sample size increases

3. A hospital director wishes to estimate the
mean weight of babies born in the hospital.
How large a sample of birth records should be
taken if she/he wants a 95% CI that is 0.5 units wide?
Assume that a reasonable estimate of σ is 2.
Ans: 246 birth records.
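The three sample size examples above follow the same pattern, which can be sketched as a small helper. The function name `sample_size_mean` is an assumption made for illustration; it uses z = 1.96 for 95% confidence and rounds up, since a fraction of a subject cannot be sampled.

```python
from math import ceil


def sample_size_mean(sigma, d, z=1.96):
    """Minimum n so that the CI for a mean has half-width d (large population)."""
    return ceil((z * sigma / d) ** 2)


print(sample_size_mean(sigma=10, d=2.5))   # example 1: heart rate, CI 5 beats wide
print(sample_size_mean(sigma=12, d=2.5))   # example 2: age at diagnosis, CI 5 years wide
print(sample_size_mean(sigma=12, d=1))     # example 2 with the tighter d = 1
print(sample_size_mean(sigma=2, d=0.25))   # example 3: birth weight, CI 0.5 wide
```

Halving d roughly quadruples the required sample size, because d enters the formula squared.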

But the population variance σ² is unknown most of the time.
As a result, it has to be estimated from:
• A pilot or preliminary study or survey:
– Select a pilot sample and estimate σ² with
the sample variance, s²
• Previous or similar studies

Sample size to estimate a single population proportion
• Aim: Estimate p
• Want: Estimate p̂ ± d units, where d = Z × SE
(95% CI of width = 2d)
Steps:
1. Specify d (or w = 2d)
2. Use estimated p (use p = 0.5 if no
information)

Example
1. Suppose that you are interested in the proportion
of infants who are breastfed beyond 18 months of age in a rural
area. Suppose that in a similar area, the proportion (p)
of breastfed infants was found to be 0.20. What
sample size is required to estimate the true proportion
within ±3 percentage points with 95% confidence? Let p = 0.20,
d = 0.03, α = 5%.

• Suppose there is no prior information about the
proportion (p) who breastfeed
• Assume p = q = 0.5 (most conservative)
• Then the required sample size increases

• An estimate of p is not always available.
• However, the formula may also be used for
sample size calculation based on various
assumptions for the values of p (here with d = 0.05 and 95% confidence):
P = 0.1 → n = (1.96)²(0.1)(0.9)/(0.05)² = 138
P = 0.2 → n = (1.96)²(0.2)(0.8)/(0.05)² = 246
P = 0.3 → n = (1.96)²(0.3)(0.7)/(0.05)² = 323
P = 0.5 → n = (1.96)²(0.5)(0.5)/(0.05)² = 384
P = 0.7 → n = (1.96)²(0.7)(0.3)/(0.05)² = 323
P = 0.8 → n = (1.96)²(0.8)(0.2)/(0.05)² = 246

➢ For a fixed absolute precision (d), the required
sample size increases as P increases from 0 to
0.5, and then decreases in the same way as the
prevalence approaches 1.
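The sample size formula for a proportion, n = Z²p(1-p)/d², can be sketched in the same way. The function name `sample_size_proportion` is an assumption, and the sketch always rounds up, whereas the table above rounds to the nearest whole number, so the two can differ by one (for example 385 rather than 384 at p = 0.5).

```python
from math import ceil


def sample_size_proportion(p, d, z=1.96):
    """Minimum n to estimate a proportion within +/- d (large population)."""
    return ceil(z ** 2 * p * (1 - p) / d ** 2)


# Breastfeeding example: p = 0.20, d = 0.03, 95% confidence
print(sample_size_proportion(0.20, 0.03))

# No prior information: the conservative choice p = 0.5 needs a larger sample
print(sample_size_proportion(0.50, 0.03))

# Required n peaks at p = 0.5 for a fixed d (here d = 0.05)
for p in (0.1, 0.2, 0.3, 0.5, 0.7, 0.8):
    print(p, sample_size_proportion(p, 0.05))
```

Because p(1-p) is largest at p = 0.5, using 0.5 when nothing is known about p never underestimates the required sample size.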

2. A survey is planned to determine what proportion
of the medical students have regularly chewed khat.
If no estimate of p is available and a pilot sample
cannot be drawn, what sample size would be
required if a 95% confidence is desired, and d=0.04
is to be used.
Ans: 600 students

3. Suppose an estimate is desired of the average retail
price of twenty tablets of a commonly used
tranquilizer. A random sample of retail
pharmacies is to be selected. The estimate is
required to be within 10 cents of the true average
price with 95% confidence. Based on a small pilot
study, the standard deviation in price, σ, can be
estimated as 85 cents. How many pharmacies
should be randomly selected?
• Solution
• Using the sample size formula for a single population mean, it follows that
• n = [(1.96)²(0.85)²]/(0.10)² = 277.56.
• As a result, a sample of 278 pharmacies should be
taken.

Points for Consideration
1. Sample size estimates might need to be adjusted to
compensate for non-response rate, patient dropout or loss
to follow-up, lack of compliance, etc.
2. If sampling is from a finite population of size N (<10,000),
then:
$$n = \frac{n_0}{1 + \dfrac{n_0}{N}}$$
where n0 is the sample size calculated for an infinite population. When N
is large in comparison to n (i.e., n/N ≤ 0.05), the finite
population correction may be ignored.
3. Design effect for complex cluster sampling: multiply n by the
design effect. Common values are 2, 3, … 5.
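The finite population correction in point 2 can be sketched as follows. The function name and the example figures (n0 = 384, N = 2,000) are hypothetical and chosen only for illustration; they do not come from the slides.

```python
from math import ceil


def finite_population_correction(n0, N):
    """Adjust the infinite-population sample size n0 for a population of size N."""
    return ceil(n0 / (1 + n0 / N))


# Hypothetical case: n0 = 384 from the large-population formula, N = 2,000
print(finite_population_correction(384, 2000))
```

When N is very large relative to n0, the denominator is close to 1 and the corrected size is almost the same as n0, which is why the correction is usually ignored for n/N ≤ 0.05.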
