Inferential Statistics
1/30/2024
• Inferential statistics are the statistical methods used to
draw conclusions from a sample and make inferences to
the entire population.
• The two primary methods for making inference are
estimation and hypothesis testing.
Infer. cont.…
• Estimation is the process of determining a likely value for
a variable in the survey population, based on
information collected from the sample.
• Estimation is the use of sample statistics to estimate
population parameters.
• The true population parameter value is usually
unknown.
cont.…
• Researchers are usually interested in estimates of many
statistics, such as totals, means and proportions.
• For example, a sample survey could be used to produce
any of the following statistics:
✓Estimates for the proportion of smokers among all
people aged 15 to 24 in the population;
✓The mean level of a certain enzyme among healthy
men.
cont.…
Statistical estimation
• Point estimate is always within the interval estimate
Point Estimate
✓ sample mean
✓ sample proportion
Interval Estimate
✓ confidence interval for
mean
✓ confidence interval for
proportion
Estimate
Point Estimate
• Single numerical value used to estimate the
corresponding population parameter.
• A single value quoted as an estimate of a population
parameter is of little use unless it is accompanied by
some indication of its precision.
• The following slides describe various ways of
enhancing the value of point estimates.
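Quantity                  Parameter   Statistic
Mean                      μ           x̄
Mean difference           μ1 − μ2     x̄1 − x̄2
Variance                  σ²          s²
Proportion                P           p̂
Proportion difference     P1 − P2     p̂1 − p̂2
Correlation coefficient   ρ           r
Odds ratio (OR)           OR          sample OR
Relative risk (RR)        RR          sample RR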
cont.…
•Desirable properties of estimators include:
✓Unbiasedness
✓Efficiency
✓Consistency
✓Sufficiency
Properties of Estimators
Interval Estimation
• Usually, we only have a sample and don’t know the entire
population.
• Example: Point estimate of 0.30 for population proportion
• It is not reasonable to assume that the population
proportion is exactly 0.30.
• The probability of getting a sample statistic value that is
exactly equal to the corresponding population parameter
is usually quite small.
• It may be reasonable to assume that 0.30 is close to the
population proportion
• We use a point estimate to obtain an interval estimate
• Ideally, we would like to be completely certain that the
population parameter of interest is included in our
interval estimate. This is too much to ask for.
• We have to settle for something less than certainty,
namely, confidence levels.
cont.…
• When constructing a CI, keep in mind that a confidence interval:
• Takes into consideration variation in sample statistics from
sample to sample
• Provides Range of Values
✓Based on Observations from 1 Sample
• Gives Information about Closeness to Unknown
Population Parameter
• Stated in terms of Probability
✓Never 100% Sure
cont.…
• Two questions to put bounds on our point estimates to reflect
our level of confidence
✓How wide does the bracket have to be?
✓What is our tolerance for error (variability, not mistake)?
• Scientists usually accept a 5% chance that the range will not
include the true population value
✓The range or interval is called 95% confidence interval
cont.…
Parameter = Statistic ± its error
cont.…
• The probability that the interval contains the population
parameter is the level of confidence (LC).
[Figure: a confidence interval, with the sample statistic lying between the lower and upper confidence limits.]
cont.…
Factors Affecting Interval Width
1. C.I. For a Population Mean (Normally
Distributed)
a) Known variance(large sample size)
• A 100(1-α)% C.I. for μ is $\bar{x} \pm z_{1-\alpha/2}\,\sigma/\sqrt{n}$
• α is to be chosen by the researcher, most common values
of α are 0.05, 0.01, 0.001 and 0.1
cont.…
• 100(1-α)% CI for μ when σ is known (sampling from normal
population or large sample)
• Interpretation:
a. Probabilistic: in repeated sampling, 100(1-α)% of all intervals will
include μ
b. Practical: we are 100(1-α)% confident that a single interval contains μ
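• In the interval $\bar{x} \pm z_{1-\alpha/2}\,\sigma/\sqrt{n}$: $\bar{x}$ is the estimator, $z_{1-\alpha/2}$ is the reliability coefficient, $\sigma/\sqrt{n}$ is the standard error, and their product $z_{1-\alpha/2}\,\sigma/\sqrt{n}$ is the precision of the estimate (margin of error).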
Example
• Data on the systolic blood pressure of 199 patients give a
mean value of 125.8 mmHg. Let us assume that the
standard deviation for this patient population is known to
be 20 mmHg.
✓Construct a 95 percent confidence interval for the
population mean.
cont.…
• Solution
• $\bar{x} \pm z_{1-\alpha/2}\,\sigma/\sqrt{n} = 125.8 \pm 1.96\,(20/\sqrt{199}) = 125.8 \pm 2.78$
• The 95% CI is (123.0, 128.6) mmHg
• We are 95% confident that the average systolic blood pressure
for similar patients is between 123.0 and 128.6 mmHg
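The arithmetic of this example can be checked with a short script. This is only a sketch: the function name `z_interval` is made up for illustration, and it relies on Python's standard-library `statistics.NormalDist` for the normal quantile instead of a printed z table.

```python
from math import sqrt
from statistics import NormalDist


def z_interval(mean, sigma, n, conf=0.95):
    """Confidence interval for a population mean when sigma is known."""
    alpha = 1 - conf
    z = NormalDist().inv_cdf(1 - alpha / 2)  # reliability coefficient
    se = sigma / sqrt(n)                     # standard error of the mean
    margin = z * se                          # precision (margin of error)
    return mean - margin, mean + margin


# Systolic blood pressure example: n = 199, mean = 125.8, sigma = 20
low, high = z_interval(125.8, 20, 199)
print(round(low, 1), round(high, 1))
```

Raising `conf` to 0.99 widens the interval, which illustrates the trade-off between confidence and precision.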
cont.…
b) When σ Is Unknown - The t-Distribution
• If the population standard deviation, σ, is not known,
replace σ with the sample standard deviation, s. If the
population is normal, the resulting statistic
$t = \frac{\bar{X} - \mu}{s/\sqrt{n}}$
has a t distribution with (n - 1) degrees of freedom
• The t is a family of bell-shaped, symmetric distributions,
one for each number of degrees of freedom.
• The expected value of t is 0.
• For df > 2, the variance of t is df/(df-2). This is greater than 1,
but approaches 1 as the number of degrees of freedom
increases. The t is flatter and has fatter tails than does the
standard normal.
• The t distribution approaches a standard normal as the number
of degrees of freedom increases
Cont…
A (1-α)100% confidence interval for µ when σ is not
known (assuming a normally distributed population):
$\bar{x} \pm t_{\alpha/2}\,\frac{s}{\sqrt{n}}$
where $t_{\alpha/2}$ is the value of the t distribution with n-1 degrees
of freedom that cuts off a tail area of α/2 to its right.
Cont…
df t0.100 t0.050 t0.025 t0.010 t0.005
--- ----- ----- ------ ------ ------
1 3.078 6.314 12.706 31.821 63.657
2 1.886 2.920 4.303 6.965 9.925
3 1.638 2.353 3.182 4.541 5.841
4 1.533 2.132 2.776 3.747 4.604
5 1.476 2.015 2.571 3.365 4.032
6 1.440 1.943 2.447 3.143 3.707
7 1.415 1.895 2.365 2.998 3.499
8 1.397 1.860 2.306 2.896 3.355
9 1.383 1.833 2.262 2.821 3.250
10 1.372 1.812 2.228 2.764 3.169
11 1.363 1.796 2.201 2.718 3.106
12 1.356 1.782 2.179 2.681 3.055
13 1.350 1.771 2.160 2.650 3.012
14 1.345 1.761 2.145 2.624 2.977
15 1.341 1.753 2.131 2.602 2.947
16 1.337 1.746 2.120 2.583 2.921
17 1.333 1.740 2.110 2.567 2.898
18 1.330 1.734 2.101 2.552 2.878
19 1.328 1.729 2.093 2.539 2.861
20 1.325 1.725 2.086 2.528 2.845
21 1.323 1.721 2.080 2.518 2.831
22 1.321 1.717 2.074 2.508 2.819
23 1.319 1.714 2.069 2.500 2.807
24 1.318 1.711 2.064 2.492 2.797
25 1.316 1.708 2.060 2.485 2.787
26 1.315 1.706 2.056 2.479 2.779
27 1.314 1.703 2.052 2.473 2.771
28 1.313 1.701 2.048 2.467 2.763
29 1.311 1.699 2.045 2.462 2.756
30 1.310 1.697 2.042 2.457 2.750
40 1.303 1.684 2.021 2.423 2.704
60 1.296 1.671 2.000 2.390 2.660
120 1.289 1.658 1.980 2.358 2.617
∞ 1.282 1.645 1.960 2.326 2.576

The t Distribution
[Figure: density f(t) of the t distribution with df = 10. An area of 0.10 lies to the right of 1.372 and to the left of -1.372; an area of 0.025 lies to the right of 2.228 and to the left of -2.228.]
Whenever σ is not known (and the
population is assumed normal), the
correct distribution to use is the t
distribution with n-1 degrees of
freedom. Note, however, that for
large degrees of freedom, the t
distribution is approximated well by
the Z distribution.
• Example
• In a study of preeclampsia, Kaminski and Rechberger found
the mean systolic blood pressure of 10 healthy, non-pregnant
women to be 119 with a standard deviation of 2.1.
A. What is the estimated standard error of the mean?
B. Construct the 99% confidence interval for the mean of the
population from which the 10 subjects may be presumed to be a
random sample.
C. What is the precision of the estimate?
D. What assumptions are necessary for the validity of the confidence
interval you constructed?
Solution
A. $SE = \frac{s}{\sqrt{n}} = \frac{2.1}{\sqrt{10}} = 0.664$
B. df = n-1 = 10-1 = 9
For 99% confidence, α = 1% and α/2 = 0.005
$t_{(n-1)}(\alpha/2) = 3.250$
$\bar{x} \pm t_{(n-1)}(\alpha/2) \cdot \frac{s}{\sqrt{n}} = 119 \pm 3.25 \times 0.664$
The 99% CI is (116.8, 121.2)
C. Precision = 3.25 × 0.664 = 2.16
Cont…
D. i. The population should be normally distributed.
ii. The sample of 10 subjects should be a
random sample from this population.
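Parts A to C can be reproduced in a few lines. This is a sketch under stated assumptions: the standard library has no t quantile function, so the critical values are copied by hand from the t0.005 column of the t table in these slides, and the names `T_005` and `t_interval_99` are illustrative.

```python
from math import sqrt

# Upper-tail critical values t_{0.005}, copied from the t table (df -> t).
T_005 = {9: 3.250, 16: 2.921, 18: 2.878}


def t_interval_99(mean, s, n):
    """99% CI for a mean when sigma is unknown (normal population assumed)."""
    df = n - 1
    t = T_005[df]          # reliability coefficient from the table
    se = s / sqrt(n)       # estimated standard error of the mean
    margin = t * se        # precision of the estimate
    return se, margin, (mean - margin, mean + margin)


# Ten healthy, non-pregnant women: mean 119, standard deviation 2.1
se, margin, (low, high) = t_interval_99(119, 2.1, 10)
print(round(se, 3), round(margin, 2), round(low, 1), round(high, 1))
```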
Example
• Suppose a researcher, interested in obtaining an estimate of the
average level of some enzyme in a certain human population, takes
a sample of 10 individuals, determines the level of the enzyme in
each, and computes a sample mean of approximately $\bar{x} = 22$.
• Suppose further it is known that the variable of interest is
approximately normally distributed with a variance of 45. We
wish to estimate μ (α = 0.05).
• Compute the 95% CI for μ.
Solution
• 1 - α = 0.95 → α = 0.05 → α/2 = 0.025
• variance = σ² = 45 → σ = √45, n = 10, $\bar{x} = 22$
• The 95% confidence interval for μ is given by:
$\bar{x} \pm z_{1-\alpha/2}\,\sigma/\sqrt{n}$
• $z_{1-\alpha/2} = z_{0.975} = 1.96$ (refer to z table)
• $z_{0.975}\,(\sigma/\sqrt{n}) = 1.96\,(\sqrt{45}/\sqrt{10}) = 4.1578$
• 22 ± 4.1578 → (22 - 4.1578, 22 + 4.1578)
→ (17.84, 26.16)
• We are 95% confident that the population mean level of
enzyme is between 17.84 and 26.16
Example
• The activity values of a certain enzyme measured in normal gastric
tissue of 35 patients with gastric carcinoma has a mean of 0.718 and
a standard deviation of 0.511.
• We want to construct a 90% confidence interval for the population
mean enzyme activity.
Solution
✓Note that the population is not assumed to be normal,
✓n = 35 (n > 30), so n is large; σ is unknown, s = 0.511
✓1 - α = 0.90 → α = 0.1
✓→ α/2 = 0.05 → 1 - α/2 = 0.95
• Then the 90% confidence interval for μ is given by:
$\bar{x} - z_{1-\alpha/2}\,s/\sqrt{n} < \mu < \bar{x} + z_{1-\alpha/2}\,s/\sqrt{n}$
• $z_{1-\alpha/2} = z_{0.95} = 1.645$ (refer to table)
• $z_{0.95}\,(s/\sqrt{n}) = 1.645\,(0.511/\sqrt{35}) = 0.1421$
• 0.718 ± 0.1421 → (0.718 - 0.1421, 0.718 + 0.1421) →
(0.576, 0.860)
✓We are 90% confident that the population mean enzyme
activity is between 0.576 and 0.860
Example
• Suppose a researcher studied the effectiveness of early
weight bearing and ankle therapies following acute repair
of a ruptured Achilles tendon.
• One of the variables measured following treatment
was muscle strength.
• In 19 subjects, the mean strength was 250.8 with a
standard deviation of 130.9.
• We assume that the sample was taken from an
approximately normally distributed population.
• Calculate the 95% confidence interval for the mean
strength.
Solution
• 1 - α = 0.95 → α = 0.05 → α/2 = 0.025
• Standard deviation = s = 130.9, n = 19, $\bar{x} = 250.8$
• The 95% confidence interval for μ is given by:
$\bar{x} - t_{1-\alpha/2,\,n-1}\,s/\sqrt{n} < \mu < \bar{x} + t_{1-\alpha/2,\,n-1}\,s/\sqrt{n}$
• $t_{1-\alpha/2,\,n-1} = t_{0.975,\,18} = 2.1009$ (refer to table)
• $t_{0.975,\,18}\,(s/\sqrt{n}) = 2.1009\,(130.9/\sqrt{19}) = 63.1$
• 250.8 ± 63.1 → (250.8 - 63.1, 250.8 + 63.1) →
• (187.7, 313.9)
We are 95% confident that the population mean strength
is between 187.7 and 313.9
Confidence interval for population
proportion (assuming n large)
• Assumption
• two category outcome
• Population follows binomial distribution
• Normal approximation can be used if
• nP > 5 and n(1-P) > 5
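• The 100(1-α)% CI for P (α = 0.05 for a 95% CI) is given by:
$\hat{p} - z_{1-\alpha/2}\sqrt{\frac{\hat{p}\hat{q}}{n}} < P < \hat{p} + z_{1-\alpha/2}\sqrt{\frac{\hat{p}\hat{q}}{n}}$, where $\hat{q} = 1 - \hat{p}$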
• Example
• A research study obtained data regarding sexual
behavior from a sample of unmarried men and women
between the ages of 20 and 44 residing in geographic
areas characterized by high rates of sexually transmitted
diseases and admission to drug programs. Fifty percent of
1229 respondents reported that they never used a condom.
Construct a 95% CI for the population proportion never
using a condom.
• Solution
• n = 1229
• $\hat{p} = 0.5$, a point estimator of the population proportion
• $\hat{p} - z_{1-\alpha/2}\sqrt{\frac{\hat{p}\hat{q}}{n}} < P < \hat{p} + z_{1-\alpha/2}\sqrt{\frac{\hat{p}\hat{q}}{n}}$
• α = 0.05, $z_{1-\alpha/2} = z_{0.975} = 1.96$
• $0.5 - 1.96\sqrt{\frac{0.5 \times 0.5}{1229}} < P < 0.5 + 1.96\sqrt{\frac{0.5 \times 0.5}{1229}}$
• The 95% CI for the population proportion is (0.47, 0.53)
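As a check on this result, the normal-approximation interval for a proportion can be sketched as follows. The function name `proportion_interval` is illustrative, and the normal quantile comes from the standard library instead of a printed table.

```python
from math import sqrt
from statistics import NormalDist


def proportion_interval(p_hat, n, conf=0.95):
    """Normal-approximation CI for a population proportion (n large)."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    se = sqrt(p_hat * (1 - p_hat) / n)   # estimated standard error of p_hat
    return p_hat - z * se, p_hat + z * se


# 50% of 1229 respondents reported never using a condom
low, high = proportion_interval(0.5, 1229)
print(round(low, 2), round(high, 2))
```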
Using sample statistics to Test Hypotheses
about population parameters
HYPOTHESES TESTING
Cont…
• Data are often collected to answer specified
questions, such as:
✓Do children under five from urban areas have a lower
prevalence of malnutrition compared with rural
children?
✓Is a new treatment beneficial to those suffering
from a certain disease compared with the
standard treatment?
• Such questions may be answered by setting up a
hypothesis and then using the data to test this
hypothesis.
• It is generally agreed that some caution should be
exercised before claiming that some effect, such as a
reduction in malnutrition or an improved cure rate, has
been established.
• The way to proceed is to set up a null hypothesis that
there is no effect.
Definition
• Hypothesis: is a statement about one or more populations. It
is usually concerned with the parameters of the population
• Statistical hypotheses: are hypotheses that are stated in such
a way that they may be evaluated by appropriate statistical
techniques
• e.g. the hospital administrator may want to test the hypothesis
that the average length of stay of patients admitted to the
hospital is 5 days
Statistical hypotheses
• There are two hypotheses involved in hypothesis testing
✓Null hypothesis H0: It is the hypothesis to be tested.
Also called the hypothesis of no difference
✓Alternative hypothesis HA : It is a statement of what
we believe is true if our sample data cause us to
reject the null hypothesis
Steps in Testing Hypothesis
1. Data
2. Assumptions
3. Hypotheses
4. Test statistic
5. Select the level of significance (α):
6. Determine Critical value (ztab, ttab):
7. Calculation of test statistic (zcalc, tcalc):
8. Statistical decision
9. Conclusion
10. P-values
1. Data: understand the nature of data (e.g. counts or
measurements or proportions)
2. Assumptions: about normality of population
distribution, equality of variance, independence of
samples
3. Hypotheses: the H0 and HA should be explicitly
stated
Cont…
Step 3. Hypotheses cont’d
• Rules for stating statistical hypotheses
a) What you hope to be able to conclude as a result of the test
usually should be placed in the alternative hypothesis.
b) The null hypothesis should contain a statement of equality,
either =,≥, or ≤.
c) The null hypothesis is the hypothesis that is tested.
d) The null and alternative hypotheses are complementary.
• That is, the two together exhaust all possibilities
regarding the value that the hypothesized parameter can
assume.
Cont…
Step 3. Hypotheses cont’d
Examples: suppose that we want to answer the question;
i. Can we conclude that a certain population mean is not 50?
Our hypotheses are H0 : μ = 50 HA : μ ≠ 50
ii. Can we conclude that the population mean is greater than
50? Our hypotheses are H0: μ ≤ 50 HA: μ >50
iii. Can we conclude that the population mean is less than 50?
Our hypotheses are H0: μ ≥ 50 HA: μ<50

Cont…
4. Test statistic: It is a value computed from the sample data that is
used in making the decision about the rejection of the null hypothesis
• Decide on the appropriate test statistic for the hypothesis (z, t, etc.)
based on the
✓sample size (n<30 or n≥30),
✓type of data (count, i.e. qualitative, or measurement, i.e.
quantitative),
✓functional form of the distribution (normal or non normal),
✓known or unknown population variance,
✓number of means or proportions, etc.
Cont…
General formula for test statistic
test statistic = (observed statistic − hypothesized parameter) / (standard error of the observed statistic)
5. Select the level of significance (α): (α =0.05, 0.01,
0.001, etc…). If not given take 0.05
• The level of significance (α) is the probability of
rejecting a true null hypothesis.
Cont…
Level of Significance, α
• Is the probability of rejecting a true Ho
• Defines unlikely values of sample statistic if Ho is true
✓Defines rejection region of the sampling distribution
• The decision is made on the basis of the level of
significance, designated by α.
• More frequently used values of α are 0.01, 0.05 and 0.10.
• α is selected by the researcher
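A small simulation can make the meaning of α concrete: when H0 is true, a two-tailed test at α = 0.05 should reject in roughly 5% of repeated samples. The sketch below is illustrative only; the population values (mean 50, σ = 10, n = 25), the seed, and the function name are assumptions, not taken from the slides.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean


def type_i_error_rate(mu0=50.0, sigma=10.0, n=25, alpha=0.05,
                      trials=2000, seed=0):
    """Fraction of samples in which a true H0: mu = mu0 is rejected."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # two-tailed critical value
    rejections = 0
    for _ in range(trials):
        # Draw the sample from a population where H0 really is true.
        sample = [rng.gauss(mu0, sigma) for _ in range(n)]
        z = (fmean(sample) - mu0) / (sigma / sqrt(n))
        if abs(z) > z_crit:
            rejections += 1
    return rejections / trials


rate = type_i_error_rate()
print(rate)  # expected to be close to alpha
```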
Cont…
One tail and two tail tests
• In a one tail test, the rejection region is at one end
of the distribution or the other.
• In a two tail test, the rejection region is split
between the two tails.
• Which one is used depends on the way the HA is
stated.
Cont…
Level of Significance and the Rejection Region
Example:
• The average survival year after cancer diagnosis is less
than 3 years.
Cont…
6. Determine Critical value (ztab, ttab):
• It is the value the test statistic must attain to be declared
significant (i.e. label the rejection & "acceptance"
regions)
7. Calculation of test statistic (zcalc, tcalc):
• calculate the test statistic based on step 4 and compare
it with the critical value
Cont…
8. Statistical decision: statistical decision consists of rejecting
or not rejecting the null hypothesis.
• It is rejected if the computed value of the test statistic falls in
the rejection area.
✓i.e. for a two-tailed test, reject Ho if │Zcal│ > Ztab or │tcal│ > ttab
• It is not rejected if the computed value of the test statistic
falls in the non-rejection area.
✓i.e. for a two-tailed test, don't reject Ho if │Zcal│ < Ztab or │tcal│ < ttab
Cont…
Types of Errors in Hypothesis Tests
• Whenever we reject or fail to reject Ho, we risk committing
an error.
• Two types of error are possible:
• Type I Error
• Type II Error
Type I Error
• The error committed when
a true Ho is rejected
• The probability of type I
error is α
• Called level of significance
of the test
Type II Error
• The error committed
when a false Ho is not
rejected
• The probability of Type II
Error is β
Cont…
Action (Conclusion)   Reality: Ho True    Reality: Ho False
-------------------   -----------------   -----------------
Do not reject Ho      Correct action      Type II error (β)
Reject Ho             Type I error (α)    Correct action
Cont…
9. Conclusion:
✓if Ho is rejected, we conclude that HA is true.
✓If Ho is not rejected, we conclude that Ho may be true.
10. P-values:
• The p-value is the probability, when Ho is true, of getting a value
for the test statistic as extreme as or more extreme than the observed
value of the test statistic just by random chance.
✓Reject the null hypothesis if P≤α
✓Don't reject ("accept") the null hypothesis if P>α
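For a z statistic, the p-value can be computed directly from the standard normal distribution. The helper below is a minimal sketch; the name `z_p_value` and its `tail` argument are illustrative choices, not part of the slides.

```python
from statistics import NormalDist


def z_p_value(z, tail="two"):
    """p-value of an observed z statistic under the standard normal."""
    phi = NormalDist().cdf
    if tail == "upper":      # HA: parameter > hypothesized value
        return 1 - phi(z)
    if tail == "lower":      # HA: parameter < hypothesized value
        return phi(z)
    # Two-tailed: both tails beyond |z| count as "as extreme or more extreme".
    return 2 * (1 - phi(abs(z)))


alpha = 0.05
p = z_p_value(2.5)           # two-tailed p-value for z = 2.5
print(round(p, 4), "reject H0" if p <= alpha else "do not reject H0")
```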
Cont…
TESTING A HYPOTHESIS ABOUT
THE MEAN OF A POPULATION:
Testing a hypothesis about the mean of a population:
1. Data: determine the variable, sample size (n), sample mean ($\bar{x}$),
population standard deviation (σ), or sample standard deviation
(s) if σ is unknown
2. Assumptions: We have two cases:
• Case 1: Population is normally or approximately normally
distributed with known or unknown variance (sample size n
may be small or large),
• Case 2: Population is not normal with known or unknown
variance (n is large i.e. n≥30).
3. Hypotheses: we have three cases
✓Case I: H0: μ=μ0
HA: μ ≠ μ0
• e.g. we want to test that the population mean is different
from 50
✓Case II: H0: μ ≤ μ0
HA: μ > μ0
• e.g. we want to test that the population mean is greater
than 50
✓Case III: H0: μ ≥ μ0
HA: μ < μ0
• e.g. we want to test that the population mean is less than
50


Cont…
4. Test Statistic:
• Case 1: population is normal or approximately normal
✓σ² is known (n large or small): $Z = \frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}}$
✓σ² is unknown, n large: $Z = \frac{\bar{X} - \mu_0}{s/\sqrt{n}}$
✓σ² is unknown, n small: $T = \frac{\bar{X} - \mu_0}{s/\sqrt{n}}$
• Case 2: If population is not normally distributed and n is
large
i) If σ² is known: $Z = \frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}}$
ii) If σ² is unknown: $Z = \frac{\bar{X} - \mu_0}{s/\sqrt{n}}$
Cont…
5.Decision Rule:
i) If HA: μ ≠ μ0
✓Reject H0 if Z > Z1-α/2 OR
Z < - Z1-α/2
✓Reject H0 if T > t1-α/2,n-1
OR T < -t1-α/2,n-1
ii) If HA: μ > μ0
✓Reject H0 if Z > Z1-α
Or
✓Reject H0 if T > t1-α,n-1
iii) If HA: μ < μ0
✓Reject H0 if Z < - Z1-α
Or
✓Reject H0 if T < - t1-α,n-1

Cont…
Note:
✓ Z1-α/2 , Z1-α , Zα are tabulated values obtained from Z
table
✓ t1-α/2 , t1-α , tα are tabulated values obtained from t table
with (n-1) degree of freedom (df)
Cont…
6. Decision :
• If we reject H0, we can conclude that HA is true.
• If, however, we do not reject H0, we may conclude
that H0 may be true.
Cont…
An Alternative Decision Rule using the p - value
• The p-value is defined as the smallest value of α for
which the null hypothesis can be rejected.
• If the p-value is less than or equal to α, we reject the null
hypothesis. For a two-tailed test the p-value is twice the one-tail
area, so this is the same as comparing the one-tail area with α/2.
• If the p-value is greater than α, we do not reject the null
hypothesis.
Cont…
Example
• Researchers are interested in the mean age of a certain population.
• A random sample of 10 individuals drawn from the population of
interest has a mean of 27.
• Assuming that the population is approximately normally
distributed with variance 20,
• Can we conclude that the population mean is different from 30
years? (α=0.05) .
Solution
1. Data: variable is age, n = 10, $\bar{x}$ = 27, σ² = 20, α = 0.05
2. Assumptions: the population is approximately normally
distributed with variance 20
3. Hypotheses:
• H0 : μ = 30
• HA: μ ≠ 30
Cont…
4. Test statistic: $Z = \frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}}$
5. Level of significance α = 0.05
6. Decision Rule
• The alternative hypothesis is HA: μ ≠ 30
✓reject H0 if Zcal > Ztab or Zcal < - Ztab
✓Generally when HA: μ ≠ μ0
Reject H0 if │Zcal│> Ztab
Cont…
6. Critical value
• Since the HA is two sided we divide α by 2
• Ztab = Z1-α/2 = Z1-0.05/2 = Z0.975 = 1.96 in the right tail and
-1.96 in the left tail
Cont…
7. Calculation of test statistic
• Zcal = (27-30)/(√20/√10) = -2.12
8. Statistical Decision:
• We reject H0 ,since -2.12 is in the rejection region
i.e. │-2.12│> 1.96
9. Conclusion
• We can conclude that the mean age (μ) is different from 30
years
10. p-value: the one-tail area is P(Z ≤ -2.12) = 0.0170 < 0.025 = α/2 (equivalently, the two-tailed p-value is 0.034 < 0.05), therefore we reject H0
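The steps of this example can be traced in code. This is a sketch, not a definitive implementation: the function name `z_test_two_sided` is illustrative, and the critical value and p-value come from `statistics.NormalDist` instead of the z table.

```python
from math import sqrt
from statistics import NormalDist


def z_test_two_sided(xbar, mu0, sigma, n, alpha=0.05):
    """Two-tailed z test of H0: mu = mu0 with known sigma."""
    z = (xbar - mu0) / (sigma / sqrt(n))            # test statistic
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)    # critical value
    p = 2 * (1 - NormalDist().cdf(abs(z)))          # two-tailed p-value
    return z, z_crit, p, abs(z) > z_crit


# Age example: n = 10, mean 27, variance 20, H0: mu = 30
z, z_crit, p, reject = z_test_two_sided(27, 30, sqrt(20), 10)
print(round(z, 2), round(z_crit, 2), round(p, 3), reject)
```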
Cont…
Example
• Among 157 African-American men, the mean systolic
blood pressure was 146 mm Hg with a standard
deviation of 27.
• We wish to know whether, on the basis of these data,
we may conclude that the mean systolic blood pressure
for a population of African-American men is greater than
140. Use α = 0.01.
Solution
1. Data: Variable is systolic blood pressure, n = 157, $\bar{x}$ = 146,
s = 27, α = 0.01.
2. Assumption: population is not normal, σ² is unknown, n > 30
3. Hypotheses: H0 : μ ≤ 140
HA : μ > 140
4. Test Statistic: $Z = \frac{\bar{X} - \mu_0}{s/\sqrt{n}} = \frac{146 - 140}{27/\sqrt{157}} = \frac{6}{2.1548}$
• Zcal = 2.78
Cont…
5. Decision Rule:
✓we reject H0 if Zcal>Z1-α
✓Ztab = Z0.99= 2.33
(from z table)
6. Decision: We reject H0.
• Hence we may conclude that the mean systolic blood
pressure for a population of African-American is
greater than 140 mm Hg.
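The one-tailed version of the test used in this example differs from the two-tailed case only in where α is placed. A minimal sketch, with an illustrative function name and the sample standard deviation s standing in for σ because n is large:

```python
from math import sqrt
from statistics import NormalDist


def z_test_upper(xbar, mu0, s, n, alpha=0.01):
    """Upper-tailed large-sample z test of H0: mu <= mu0 vs HA: mu > mu0."""
    z = (xbar - mu0) / (s / sqrt(n))
    z_crit = NormalDist().inv_cdf(1 - alpha)   # all of alpha in the right tail
    p = 1 - NormalDist().cdf(z)                # upper-tail p-value
    return z, z_crit, p, z > z_crit


# Blood pressure example: n = 157, mean 146, s = 27, H0: mu <= 140
z, z_crit, p, reject = z_test_upper(146, 140, 27, 157)
print(round(z, 2), round(z_crit, 2), reject)
```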
Cont…
Example
• A simple random sample of 17 patients with muscle
injury were treated at a research center.
• The variable of interest was number of days between
injury and recovery. The number of days until recovery
was normally distributed in the population.
• Can we conclude that the mean number of days is not 15
days in the population represented by the sample data?
(See the data below)
Table: number of days until recovery for subjects with muscle injury
Subject  Days    Subject  Days
-------  ----    -------  ----
1        14      11       28
2        9       12       24
3        18      13       24
4        26      14       2
5        12      15       3
6        0       16       14
7        10      17       9
8        4
9        8
10       21
Solution
1. Data: number of days, n = 17, $\bar{x}$ = 13.294, s = 8.886
(calculated from the data)
2. Assumptions: n < 30, simple random sampling, normally
distributed, unknown population variance
3. Hypotheses:
H0 : μ = 15
HA: μ ≠ 15
4. Test statistic: our test statistic is distributed as Student's t
with 17-1 = 16 df, and is given by
$T = \frac{\bar{X} - \mu_0}{s/\sqrt{n}}$
5. Level of significance:
• let α=0.05 since we have a two-tailed test we put α/2 =0.025 in
each tail of the distribution
• Decision rule: Reject H0 if │tcal│> tn-1,1- α/2
6. Critical value:
ttab = tn-1,1- α/2
= t16,0.975
= 2.1199 to right and
-2.1199 to left
Cont…
7. Calculation of test statistic:
$t_{cal} = \frac{\bar{X} - \mu_0}{s/\sqrt{n}} = \frac{13.2941 - 15}{8.886/\sqrt{17}} = \frac{-1.7059}{2.1553} = -0.791$
8.Statistical decision:
✓don’t reject H0, since │-0.791│< 2.1199
9. Conclusion: based on this data the mean of population from
which the sample came may be 15.
10. P-value: p = P(t ≤ -0.791) + P(t ≥ 0.791) > 0.2, since each tail area exceeds 0.10
(t0.100 = 1.337 for 16 df). This is greater than α, so don't reject H0.
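The summary statistics and the test statistic can be recomputed from the raw data in the table. In this sketch the critical value 2.1199 is the one quoted in the solution, since the standard library has no t quantile function; the variable names are illustrative.

```python
from math import sqrt
from statistics import mean, stdev

# Days until recovery for the 17 subjects in the table
days = [14, 9, 18, 26, 12, 0, 10, 4, 8, 21, 28, 24, 24, 2, 3, 14, 9]
mu0 = 15          # hypothesized mean under H0
t_crit = 2.1199   # t_{16, 0.975}, as quoted in the solution

n = len(days)
xbar = mean(days)
s = stdev(days)                       # sample standard deviation (n - 1 divisor)
t_cal = (xbar - mu0) / (s / sqrt(n))  # one-sample t statistic
reject = abs(t_cal) > t_crit          # two-tailed decision rule

print(n, round(xbar, 3), round(t_cal, 3), reject)
```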
Cont…
Hypothesis Testing:
A single population proportion
A single population proportion:
• Testing a hypothesis about a population proportion (P) has
the following steps:
1. Data: sample size (n), sample proportion ($\hat{p}$),
hypothesized population proportion (P0), where
$\hat{p} = \frac{a}{n} = \frac{\text{no. of elements in the sample with some characteristic}}{\text{total no. of elements in the sample}}$
2. Assumptions: $\hat{p}$ is approximately normally distributed (n large)
3. Hypotheses: we have three cases
• Case I: H0: P = P0
HA: P ≠ P0
• Case II: H0: P ≤ P0
HA: P > P0
• Case III: H0: P ≥ P0
HA: P < P0
4. Test Statistic:
$Z = \frac{\hat{p} - p_0}{\sqrt{p_0 q_0 / n}}$
✓When H0 is true, Z is distributed approximately as the standard
normal
Cont…
5. Decision Rule:
i) If HA: P ≠ P0
✓Reject H 0 if Z > Zα/2 or Z < - Zα/2
ii) If HA: P > P0
✓Reject H0 if Z > Zα
iii) If HA: P < P0
✓Reject H0 if Z < - Zα
Note: Zα/2 and Zα here denote upper-tail critical values (Z1-α/2 and Z1-α in the earlier notation), obtained from the Z table
6. Conclusion: reject or fail to reject H0
Cont…
Example
• A study on 301 Hispanic women in San Antonio, Texas
investigated the percentage of subjects with impaired fasting
glucose (IFG). In the study, 24 women were classified in
the IFG stage. The population estimate for IFG among
Hispanic women in Texas is 6.3%. Is there sufficient
evidence to indicate that the population of Hispanic
women in San Antonio has a prevalence of IFG higher
than 6.3%?
Solution
1. Data: n = 301, p0 = 6.3/100 = 0.063, a = 24,
$\hat{p} = \frac{a}{n} = \frac{24}{301} = 0.08$
✓q0 = 1 - p0 = 1 - 0.063 = 0.937, α = 0.05
2. Assumptions: $\hat{p}$ is approximately normally distributed
3. Hypotheses:
H0: P ≤ 0.063
HA: P > 0.063
4. Test Statistic
$Z = \frac{\hat{p} - p_0}{\sqrt{p_0 q_0 / n}} = \frac{0.08 - 0.063}{\sqrt{0.063\,(0.937)/301}} = 1.21$
5. Decision Rule: α = 0.05
✓Reject H0 if Z > Z1-α
✓Where Z1-α = Z1-0.05 = Z0.95 = 1.645
6. Conclusion: Fail to reject H0, since
✓Z = 1.21 < Z1-α = 1.645; equivalently, P-value = 0.1131,
✓Fail to reject H0 → P-value > α
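The same test can be run in code. This sketch uses an illustrative function name and the unrounded sample proportion 24/301 ≈ 0.0797, so it returns Z ≈ 1.19 where the hand calculation, which rounds the proportion to 0.08 first, gives 1.21; the decision is the same either way.

```python
from math import sqrt
from statistics import NormalDist


def proportion_z_test_upper(a, n, p0, alpha=0.05):
    """Upper-tailed z test of H0: P <= p0 vs HA: P > p0."""
    p_hat = a / n
    se0 = sqrt(p0 * (1 - p0) / n)      # standard error computed under H0
    z = (p_hat - p0) / se0
    z_crit = NormalDist().inv_cdf(1 - alpha)
    p_value = 1 - NormalDist().cdf(z)
    return p_hat, z, z_crit, p_value, z > z_crit


# IFG example: 24 of 301 women, hypothesized prevalence 6.3%
p_hat, z, z_crit, p_value, reject = proportion_z_test_upper(24, 301, 0.063)
print(round(p_hat, 3), round(z, 2), round(z_crit, 3), reject)
```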
Cont…
Sample Size For Cross-sectional Study
• An essential part of planning any study is to decide how
many people need to be studied.
Sample Size
• The number of study subjects selected to represent a
given study population.
• Important to make inferences based on the findings from
the sample.
• Should be sufficient to represent the characteristics of
interest of the study population.
• In estimating a certain characteristic of a population,
sample size calculations are important to ensure that
estimates are obtained with required precision or
confidence.
• The accuracy required of the envisaged results determines the
size of the sample.
Cont…
• Sample size determination depends on the:
– objective of the study;
– design of the study;
– plan for statistical analysis;
– accuracy of the measurements to be made;
–degree of precision required for generalization;
– degree of confidence with which to conclude.
Cont…
• Common questions:
– “How many subjects should I study?”
– Too small sample = Waste of time and
resources = Results have no practical use
– Too large sample = Waste of resources = Data
quality compromised
Cont…
• The feasible sample size is also determined by
the availability of resources:
– Human resource
– Time
– Transport
– Available facility, and
– Money
Cont…
Sample Size: Single Sample
• The aim is to have a large enough sample with
which to estimate a population mean or
proportion within a narrow interval with high
reliability.
• Concerned with the precision of the estimate
(“narrowness of the CI”).
estimate ± d units
Sample Size For Single Sample Includes:
A. Sample size for estimating a single population
mean.
B. Sample size to estimate a single population
proportion.
• The formulas that follow give the minimum sample size required
for a very large population (N ≥ 10,000)
Sample size for Estimating single population mean
• Suppose we want to estimate the average daily
caloric intake of people in a community. The daily
caloric intake is assumed to have a normal
distribution with mean µ and standard deviation (σ).
•The sample measure used to estimate µ is the sample
mean. The sampling distribution of the sample mean is
also normal, with the same mean, µ and standard
deviation, σ ⁄√n (the standard error of the mean).
Sample size for estimating a single population mean
• AIM: Estimate µ
• WANT: Estimate ($\bar{x}$) ± d units
where d = Margin of error
= Absolute precision
= Half of the width (w) of CI
Steps:
1. Specify d (or w = 2d)
2. Use known σ² or estimate using s²
3. Set d equal to the reliability coefficient times the standard error of the
estimator of the parameter of interest, $d = z_{1-\alpha/2}\,\sigma/\sqrt{n}$, and solve for n:
$n = \frac{z_{1-\alpha/2}^{2}\,\sigma^{2}}{d^{2}}$
Where d = e in some text books
Example:
1. Find the minimum sample size needed to estimate the
drop in heart rate (µ) for a new study using a higher
dose of propranolol than the standard one. We require
that the two-sided 95% CI for µ be no wider than 5
beats per minute and the sample sd for change in heart
rate equals 10 beats per minute.
$n = \frac{(1.96)^2 (10)^2}{(2.5)^2} = 61.5$, rounded up to 62 patients
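The calculation generalizes to any σ, d and confidence level. A minimal sketch, assuming the usual convention of rounding up to the next whole subject; the function name `sample_size_mean` is illustrative.

```python
from math import ceil
from statistics import NormalDist


def sample_size_mean(sigma, d, conf=0.95):
    """Minimum n so that the CI for a mean has half-width at most d."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    return ceil((z * sigma / d) ** 2)   # round up: n must be a whole number


# Propranolol example: sd = 10 beats/min, CI no wider than 5, so d = 2.5
print(sample_size_mean(10, 2.5))
```

Halving d roughly quadruples the required sample size, because n is proportional to 1/d².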
2. Suppose that for a certain group of cancer patients,
we are interested in estimating the mean age at
diagnosis. We would like a 95% CI of 5 years wide.
If the population SD is 12 years, how large should
our sample be?
• Suppose d = 1
• Then the sample size increases
Cont…
3. A hospital director wishes to estimate the
mean weight of babies born in the hospital.
How large a sample of birth records should be
taken if she/he wants a 95% CI of 0.5 wide?
Assume that a reasonable estimate of σ is 2.
Ans: 246 birth records.
But the population σ² is unknown most of the time.
As a result, it has to be estimated from:
• Pilot or preliminary study or survey:
– Select a pilot sample and estimate σ² with
the sample variance, s²
• Previous or similar studies
Sample size to estimate a single population proportion
• Aim: Estimate p
• Want: Estimate ± d units where d = Z•SE
(95% CI of width=2d)
Steps:
1. Specify d (or w = 2d)
2. Use estimated p (use p=0.5 if no
information)
3. Solve for n: setting $d = z_{1-\alpha/2}\sqrt{p(1-p)/n}$ gives
$n = \frac{z_{1-\alpha/2}^{2}\,p\,(1-p)}{d^{2}}$
Example
1. Suppose that you want to know the proportion
of infants who are breastfed beyond 18 months of age in a rural
area. Suppose that in a similar area, the proportion (p)
of breastfed infants was found to be 0.20. What
sample size is required to estimate the true proportion
to within ±3 percentage points with 95% confidence? Let p = 0.20,
d = 0.03, α = 5%
• Suppose there is no prior information about the
proportion (p) who breastfeed
• Assume p = q = 0.5 (most conservative)
• Then the required sample size increases
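Both cases, the prior estimate p = 0.20 and the conservative p = 0.5, can be worked out with a short function. This is a sketch that rounds up to the next whole subject; the function name `sample_size_proportion` is illustrative.

```python
from math import ceil
from statistics import NormalDist


def sample_size_proportion(p, d, conf=0.95):
    """Minimum n to estimate a proportion to within +/- d."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    return ceil(z ** 2 * p * (1 - p) / d ** 2)


n_prior = sample_size_proportion(0.20, 0.03)         # prior estimate p = 0.20
n_conservative = sample_size_proportion(0.50, 0.03)  # no prior information
print(n_prior, n_conservative)
```

The value p = 0.5 maximizes p(1 - p), which is why it gives the largest, most conservative sample size.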
• An estimate of p is not always available.
• However, the formula may also be used for
sample size calculation based on various
assumptions for the values of p.
• P = 0.1 → n = (1.96)^2 (0.1)(0.9)/(0.05)^2 = 138
P = 0.2 → n = (1.96)^2 (0.2)(0.8)/(0.05)^2 = 246
P = 0.3 → n = (1.96)^2 (0.3)(0.7)/(0.05)^2 = 323
P = 0.5 → n = (1.96)^2 (0.5)(0.5)/(0.05)^2 = 384
P = 0.7 → n = (1.96)^2 (0.7)(0.3)/(0.05)^2 = 323
P = 0.8 → n = (1.96)^2 (0.8)(0.2)/(0.05)^2 = 246
➢For a fixed absolute precision (d), the required
sample size increases as P increases from 0 to
0.5, and then decreases in the same way as the
prevalence approaches 1.
2. A survey is planned to determine what proportion
of the medical students have regularly chewed khat.
If no estimate of p is available and a pilot sample
cannot be drawn, what sample size would be
required if a 95% confidence is desired, and d=0.04
is to be used.
Ans: 600 students
3. Suppose an estimate is desired of the average retail
price of twenty tablets of a commonly used
tranquilizer. A random sample of retail
pharmacies is to be selected. The estimate is
required to be within 10 cents of the true average
price with 95% confidence. Based on a small pilot
study, the standard deviation in price, σ, can be
estimated as 85 cents. How many pharmacies
should be randomly selected?
• Solution
• Using the formula for a mean, n = Z²σ²/d², with
σ = 0.85 and d = 0.10 (both in dollars), it follows that
• n = [(1.960)²(0.85)²]/(0.10)² = 277.56.
• As a result, a sample of 278 pharmacies should be
taken.
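The pharmacy calculation can be sketched in Python for any confidence level. The function name `sample_size_mean` is hypothetical, and the critical value is taken from the standard library's normal distribution rather than from a table, so the unrounded result differs from 277.56 in the second decimal place.

```python
from math import ceil
from statistics import NormalDist

def sample_size_mean(sigma, d, confidence=0.95):
    """n = z^2 * sigma^2 / d^2, with z the two-sided normal critical value."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # about 1.96 for 95%
    return z**2 * sigma**2 / d**2

n_raw = sample_size_mean(sigma=0.85, d=0.10)  # both in dollars, as in the example
n = ceil(n_raw)  # round up to a whole pharmacy
print(f"{n_raw:.2f} -> {n}")
```

Rounding up gives 278 pharmacies, in agreement with the worked solution.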
Points for Consideration
1. Sample size estimates might need to be adjusted to
compensate for non-response rate, patient dropout or loss
to follow-up, lack of compliance, etc.
2. If sampling is from a finite population of size N (<10,000),
then:
n = n0 / (1 + n0/N)
where n0 is the sample size required for an infinite population. When N
is large in comparison to n (i.e., n/N ≤ 0.05), the finite
population correction may be ignored.
3. Design effect for complex cluster sampling. Common values:
multiply n by 2, 3, …5.
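A minimal sketch of these three adjustments follows. The population size, design effect and non-response rate are hypothetical values. Dividing by the expected response rate is one common way to inflate for non-response, assumed here because the slides do not give a formula, and each adjustment is shown on its own rather than chained.

```python
from math import ceil

def finite_population_correction(n0, N):
    """n = n0 / (1 + n0/N): shrink n0 when sampling from a finite population."""
    return n0 / (1 + n0 / N)

n0 = 384            # infinite-population size for P = 0.5, d = 0.05
N = 2000            # hypothetical population size (n0/N = 0.192 > 0.05)
deff = 2            # hypothetical design effect for cluster sampling
nonresponse = 0.10  # hypothetical anticipated non-response rate

n_fpc = ceil(finite_population_correction(n0, N))
n_cluster = ceil(n0 * deff)
n_inflated = ceil(n0 / (1 - nonresponse))  # assumption: divide by the response rate
print(n_fpc, n_cluster, n_inflated)
```

Here the finite population correction lowers the requirement from 384 to 323, while the design effect and the non-response allowance raise it to 768 and 427 respectively.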
Biostatics part 7.pdf

  • 10. • It may be reasonable to assume that 0.30 is close to the population proportion • We use a point estimate to obtain an interval estimate • Ideally, we would like to completely certain that the population parameter of interest is included in our interval estimate. This is too much to ask for. • We have to settle for something less than certainty, namely, confidence levels. cont.… 10 1/30/2024
  • 11. • While we construct CI we need to consider the following. • Takes into consideration variation in sample statistics from sample to sample • Provides Range of Values ✓Based on Observations from 1 Sample • Gives Information about Closeness to Unknown Population Parameter • Stated in terms of Probability ✓Never 100% Sure cont.… 11 1/30/2024
  • 12. • Two questions to put bounds on our point estimates to reflect our level of confidence ✓How wide does the bracket have to be? ✓What is our tolerance of error(variability, not mistake)? • Scientists usually accept a 5% chance that the range will not include the true that the range will not include the true population value ✓The range or interval is called 95% confidence interval cont.… 12 1/30/2024
  • 14. • A Probability That the Population Parameter Falls Somewhere Within the Interval is LC. Confidence Interval Sample Statistic Confidence Limit (Lower) Confidence Limit (Upper) cont.… 14 1/30/2024
  • 16. Factors Affecting Interval Width 16 1/30/2024
  • 17. 1. C.I. For a Population Mean (Normally Distributed) a) Known variance(large sample size) • A 100(1‐α)% C.I. for μ is • α is to be chosen by the researcher, most common values of α are 0.05, 0.01, 0.001 and 0.1 17 1/30/2024
  • 18. cont.… • 100(1-α)% CI for μ when σ is known (sampling from normal population or large sample) • Interpretation: a. Probabilistic: in repeated sampling, 100(1-α)% of all intervals will include μ b. Practical: we are 100(1-α)% confident that a single interval contains μ Estimator Precision of the estimate (margin of error) Reliability Coefficient Standard Error 18 1/30/2024
  • 19. Example • A data on 199 patients on systolic blood pressure gives a mean value of 125.8 mmHg. Let us assume that the standard deviation for this patient population is known to be 20 mmHg. ✓Construct a 95 percent confidence interval for the population mean. cont.… 19 1/30/2024
  • 20. • Solution • The 95% CI is (123.0, 128.6 mmHg ) • We are 95% sure that the average systolic blood pressure for similar patients is between 123 and 128.6 cont.… 20 1/30/2024
  • 21. t X s n = −  b) When σ Is Unknown - The t-Distribution • If the population standard deviation, σ, is not known, replace σ with the sample standard deviation, s. If the population is normal, the resulting statistic: has a t distribution with (n - 1) degrees of freedom 21 1/30/2024
  • 22. • The t is a family of bell-shaped and symmetric distributions, one for each number of degree of freedom. • The expected value of t is 0. • For df > 2, the variance of t is df/(df-2). This is greater than 1, but approaches 1 as the number of degrees of freedom increases. The t is flatter and has fatter tails than does the standard normal. • The t distribution approaches a standard normal as the number of degrees of freedom increases Cont… 22 1/30/2024
  • 24. A (1-α)100% confidence interval for µ when σ is not known (assuming a normally distributed population): where is the value of the t distribution with n-1 degrees of freedom that cuts off a tail area of to its right. 2  t  2 n s t x 2   Cont… 24 1/30/2024
  • 25. df t0.100 t0.050 t0.025 t0.010 t0.005 --- ----- ----- ------ ------ ------ 1 3.078 6.314 12.706 31.821 63.657 2 1.886 2.920 4.303 6.965 9.925 3 1.638 2.353 3.182 4.541 5.841 4 1.533 2.132 2.776 3.747 4.604 5 1.476 2.015 2.571 3.365 4.032 6 1.440 1.943 2.447 3.143 3.707 7 1.415 1.895 2.365 2.998 3.499 8 1.397 1.860 2.306 2.896 3.355 9 1.383 1.833 2.262 2.821 3.250 10 1.372 1.812 2.228 2.764 3.169 11 1.363 1.796 2.201 2.718 3.106 12 1.356 1.782 2.179 2.681 3.055 13 1.350 1.771 2.160 2.650 3.012 14 1.345 1.761 2.145 2.624 2.977 15 1.341 1.753 2.131 2.602 2.947 16 1.337 1.746 2.120 2.583 2.921 17 1.333 1.740 2.110 2.567 2.898 18 1.330 1.734 2.101 2.552 2.878 19 1.328 1.729 2.093 2.539 2.861 20 1.325 1.725 2.086 2.528 2.845 21 1.323 1.721 2.080 2.518 2.831 22 1.321 1.717 2.074 2.508 2.819 23 1.319 1.714 2.069 2.500 2.807 24 1.318 1.711 2.064 2.492 2.797 25 1.316 1.708 2.060 2.485 2.787 26 1.315 1.706 2.056 2.479 2.779 27 1.314 1.703 2.052 2.473 2.771 28 1.313 1.701 2.048 2.467 2.763 29 1.311 1.699 2.045 2.462 2.756 30 1.310 1.697 2.042 2.457 2.750 40 1.303 1.684 2.021 2.423 2.704 60 1.296 1.671 2.000 2.390 2.660 120 1.289 1.658 1.980 2.358 2.617 1.282 1.645 1.960 2.326 2.576  0 0 .4 0 .3 0 .2 0 .1 0 .0 t f(t) t D istrib utio n: d f=1 0 Area = 0.10 } Area = 0.10 } Area = 0.025 } Area = 0.025 } 1.372 -1.372 2.228 -2.228 Whenever σ is not known (and the population is assumed normal), the correct distribution to use is the t distribution with n-1 degrees of freedom. Note, however, that for large degrees of freedom, the t distribution is approximated well by the Z distribution. The t Distribution 25 1/30/2024
  • 27. • Example • In a study of preeclampsia, Kaminski and Rechberger found the mean systolic blood pressure of 10 healthy, non-pregnant women to be 119 with a standard deviation of 2.1. • What is the estimated standard error of the mean? • Construct the 99% confidence interval for the mean of the population from which the 10 subjects may be presumed to be a random sample. • What is the precision of the estimate? • What assumptions are necessary for the validity of the confidence interval you constructed? 27 1/30/2024
  • 28. Solution A. 𝑆𝐸 = 𝑠 𝑛 = 2.1 10 = 0.66 B. Df = n-1 =10-1 = 9 99% α = 1% , α/2 =0.005 𝑡(𝑛−1)(α/2)= 3.250 ҧ 𝑥+𝑡(𝑛−1)(α/2) ∗ 𝑠 𝑛 = 119 +3.25*0.66 The 99% CI is (116.8 --- 121.2) C. Precision = 3.25*0.66 = 2.16 28 1/30/2024
  • 29. Cont… D. i. A population should be normally distributed ii. The sample of 10 subjects should represent random sample from this population. 29 1/30/2024
  • 30. Example • Suppose a researcher , interested in obtaining an estimate of the average level of some enzyme in a certain human population, takes a sample of 10 individuals, determines the level of the enzyme in each, and computes a sample mean of approximately = 22 • Suppose further it is known that the variable of interest is approximately normally distributed with a variance of 45. We wish to estimate . (=0.05) • Compute the 95% CI for  30 1/30/2024
  • 31. Solution • 1-  = 0.95 →  = 0.05 → /2 = 0.025, • variance = σ2 = 45 → σ =  45, n = 10 • 95% confidence interval for  is given by: P( ± Z (1- /2) /n ) = 1-  • Z (1- /2) = Z 0.975 = 1.96 (refer to z table) • Z 0.975(/n) = 1.96 ( 45 / 10) = 4.1578 • 22 ± 1.96 4.1578 → (22- 4.1578, 22+4.1578) →(17.84, 26.16) • We are 95% confident that the population mean level of enzyme is between 17.84 and 26.16 22 = x x 31 1/30/2024
  • 32. Example • The activity values of a certain enzyme measured in normal gastric tissue of 35 patients with gastric carcinoma has a mean of 0.718 and a standard deviation of 0.511. • We want to construct a 90% confidence interval for the population mean enzyme activity. Solution ✓Note that the population is not normal, ✓ n = 35 (n > 30) n is large and  is unknown , s = 0.511 ✓1-  = 0.90 →  = 0.1 ✓→ /2 = 0.05 → 1-/2 = 0.95, 32 1/30/2024
  • 33. • Solution • Then 90% confident interval for  is given by : • P( - Z (1- /2) s/n <  < + Z (1- /2) s/n) = 1-  • Z (1- /2) = Z0.95 = 1.645 (refer to table) • Z 0.95(s/n) = 1.645 (0.511/ 35)= 0.1421 0.718 ± 0.1421 → (0.718-0.1421, 0.718+0.1421) → ✓ (0.576,0.860). ✓We are 90% confident that population mean enzyme activity is between 0.576 and 0.860 x x 33 1/30/2024
  • 34. Example • Suppose a researcher , studied the effectiveness of early weight bearing and ankle therapies following acute repair of a ruptured Achilles tendon. • One of the variables they measured following treatment was the muscle strength. • In 19 subjects, the mean of the strength was 250.8 with standard deviation of 130.9 • we assume that the sample was taken from is approximately normally distributed population. • Calculate 95% confident interval for the mean of the strength ? 34 1/30/2024
  • 35. Solution • 1- =0.95→ =0.05→ /2=0.025, • Standard deviation= S = 130.9 ,n=19 95%confidence interval for  is given by: ➢P( - t (1- /2),n-1 s/n <  < + t (1- /2),n-1 s/n) = 1-  • t (1- /2),n-1 = t 0.975,18 = 2.1009 (refer to table) • t 0.975,18(s/n) =2.1009 (130.9 / 19)= 63.1 • 250.8 ± 63.1→ (250.8- 63.1 , 22+63.1) → • (187.7, 313.9) We are 95% certain that the population mean of strength is between 187.7 and 313.9 8 . 250 = x x x 35 1/30/2024
  • 36. Confidence interval for population proportion (assuming n large) • Assumption • two category outcome • Population follows binomial distribution • Normal approximation can be used if • nP > 5 and n(1-P) > 5 • The 95% CI for P is given by: • [P( Ƹ 𝑝 - Z (1- /2) 𝑝𝑞 𝑛 <  < Ƹ 𝑝 + Z (1- /2) 𝑝𝑞 𝑛 )] 36 1/30/2024
  • 37. • Example • A research study obtained data regarding sexual behavior from a sample of unmarried men and women between the age of 20 and 44 residing in geographic areas characterized by high rate of sexually transmitted diseases admission to drug programs. Fifty percent of 1229 respondents reported that they never used condom. Construct a 95% CI for population proportion never using condom. 37 1/30/2024
  • 38. • Solution • n = 1229 • Ƹ 𝑝 = 0.5 a point estimator of population proportion • [P( Ƹ 𝑝 - Z (1- /2) 𝑝𝑞 𝑛 <  < Ƹ 𝑝 + Z (1- /2) 𝑝𝑞 𝑛 )] • α = 0.05, Z (1- /2) = Z (0.975) =1.96 • [P(0.5- 1.96) 0.5∗0.5 1229 <  < 0.5+ 1.96) 0.5∗0.5 1229 )] • The 95% CI for population proportion is (0.47, 0.53) 38 1/30/2024
  • 39. Using sample statistics to Test Hypotheses about population parameters HYPOTHESES TESTING 39 1/30/2024
  • 40. Cont… • Data are often collected to answer specified questions, such as: ✓Do children under five from Urban have a lower prevalence of malnutrition compared with Rural children? ✓Is a new treatment beneficial to those suffering from a certain disease compared with the standard treatment 40 1/30/2024
  • 41. • Such questions may be answered by setting up a hypothesis and then using the data to test this hypothesis. • It is generally agreed that some caution should be exercised before claiming that some effect, such as a reduction in malnutrition or an improved cure rate, has been established. • The way to proceed is to set up a null hypothesis, that there is no effect. 41 Cont… 1/30/2024
  • 42. Definition • Hypothesis: is a statement about one or more populations. It is usually concerned with the parameters of the population • Statistical hypotheses: are hypotheses that are stated in such a way that they may be evaluated by appropriate statistical techniques • e.g. the hospital administrator may want to test the hypothesis that the average length of stay of patients admitted to the hospital is 5 days 42 1/30/2024
  • 43. Statistical hypotheses • There are two hypotheses involved in hypothesis testing ✓Null hypothesis H0: It is the hypothesis to be tested. Also called hypotheses of no difference ✓Alternative hypothesis HA : It is a statement of what we believe is true if our sample data cause us to reject the null hypothesis 43 1/30/2024
  • 44. Steps in Testing Hypothesis 1. Data 2. Assumptions 3. Hypotheses 4. Test statistic 5. Select the level of significance (α): 6. Determine Critical value (ztab, ttab): 7. Calculation of test statistic (zcalc, tcalc): 8. Statistical decision 9. conclusion 10. P-values 44 1/30/2024
  • 45. 1. Data: understand the nature of data (e.g. counts or measurements or proportions) 2. Assumptions: about normality of population distribution, equality of variance, independence of samples 3. Hypotheses: the H0 and HA should be explicitly stated 45 Cont… 1/30/2024
  • 46. Step 3. Hypotheses cont’d • Rules for stating statistical hypotheses a) What you hope to be able to conclude as a result of the test usually should be placed in the alternative hypothesis. b) The null hypothesis should contain a statement of equality, either =,≥, or ≤. c) The null hypothesis is the hypothesis that is tested. d) The null and alternative hypotheses are complementary. • That is, the two together exhaust all possibilities regarding the value that the hypothesized parameter can assume. 46 Cont… 1/30/2024
  • 47. Step 3. Hypotheses cont’d Examples: suppose that we want to answer the question; i. Can we conclude that a certain population mean is not 50? Our hypotheses are H0 : μ=50 HA : μ 50 ii. Can we conclude that the population mean is greater than 50? Our hypotheses are H0: μ ≤ 50 HA: μ >50 iii. Can we conclude that the population mean is less than 50? Our hypotheses are H0: μ ≥ 50 HA: μ<50  47 Cont… 1/30/2024
  • 48. 4. Test statistic: It is a value computed from the sample data that is used in making the decision about the rejection of the null hypothesis • Decide on the appropriate test statistic for the hypothesis (z, t, etc.) Based on the ✓sample size (n<30 or n>30), ✓ type of data (count i.e. qualitative or measurement or quantitative), ✓functional form of the distribution (normal or non normal), ✓known or unknown population variance, ✓number of means or proportions, etc. 48 Cont… 1/30/2024
  • 49. General formula for test statistic test statistic = observed statistic − hypothesized parameter standard error of the observed statistic 5. Select the level of significance (α): (α =0.05, 0.01, 0.001, etc…). If not given take 0.05 • The level of significance (α) is the probability of rejecting a true null hypothesis. 49 Cont… 1/30/2024
  • 50. Level of Significance, α • Is the probability of rejecting a true Ho • Defines unlikely values of sample statistic if Ho is true ✓Defines rejection region of the sampling distribution • The decision is made on the basis of the level of significance, designated by α. • More frequently used values of α are 0.01, 0.05 and 0.10. • α is selected by the researcher 50 Cont… 1/30/2024
  • 51. One tail and two tail tests • In a one tail test, the rejection region is at one end of the distribution or the other. • In a two tail test, the rejection region is split between the two tails. • Which one is used depends on the way the HA is stated. 51 Cont… 1/30/2024
  • 52. Level of Significance and the Rejection Region Example: • The average survival year after cancer diagnosis is less than 3 years. 52 Cont… 1/30/2024
  • 54. 6. Determine Critical value (ztab, ttab): • It is the value the test statistic must attain to be declared significant (i.e. label the rejection & "acceptance" regions) 7. Calculation of test statistic (zcalc, tcalc): • calculate the test statistic based on step 4 and compare it with the critical value 54 Cont… 1/30/2024
  • 55. 8. Statistical decision: statistical decision consists of rejecting or not rejecting the null hypothesis. • It is rejected if the computed value of the test statistic falls in the rejection area. ✓i.e. Reject Ho if, Zcal > Ztab OR tcal>ttab • It is not rejected if the computed value of the test statistic falls in the non-rejection area. ✓i.e. Accept or don't reject Ho if, Zcal < Ztab OR tcal< t tab 55 Cont… 1/30/2024
  • 56. Types of Errors in Hypothesis Tests • Whenever we reject or accept the Ho, we commit errors. • Two types of errors are committed. • Type I Error • Type II Error 56 1/30/2024
  • 57. Type I Error • The error committed when a true Ho is rejected • The probability of type I error is α • Called level of significance of the test Type II Error • The error committed when a false Ho is not rejected • The probability of Type II Error is  57 Cont… 1/30/2024
  • 58. Action (Conclusion) Reality Ho True Ho False Do not reject Ho Correct action Type II error (β) Reject Ho Type I error (α) Correct action 58 Cont… 1/30/2024
  • 59. 9. Conclusion: ✓if Ho is rejected, we conclude that HA is true. ✓If Ho is not rejected, we conclude that Ho may be true. 10. P-values: • The p-value is the probability of getting a value for the test statistic as large or larger than the observed value of the test statistic just by random chance. ✓Reject the null hypothesis if P≤α ✓Don't reject ("accept") the null hypothesis if P>α 59 Cont… 1/30/2024
  • 60. TESTING A HYPOTHESIS ABOUT THE MEAN OF A POPULATION: 60 1/30/2024
  • 61. Testing a hypothesis about the mean of a population: 1.Data: determine variable, sample size (n), sample mean( ) , population standard deviation or sample standard deviation (s) if it is unknown 2. Assumptions: We have two cases: • Case1: Population is normally or approximately normally distributed with known or unknown variance (sample size n may be small or large), • Case 2: Population is not normal with known or unknown variance (n is large i.e. n≥30). x 61 1/30/2024
  • 62. 3. Hypotheses: we have three cases ✓Case I: H0: μ=μ0 HA: μ μ0 • e.g. we want to test that the population mean is different from 50 ✓Case II: H0: μ ≤ μ0 HA: μ > μ0 • e.g. we want to test that the population mean is greater than 50 ✓Case III: H0: μ ≥ μ0 HA: μ < μ0 • e.g. we want to test that the population mean is less than 50   62 Cont… 1/30/2024
  • 63. 4.Test Statistic: • Case 1: population is normal or approximately normal σ2 is known σ2 is unknown ( n large or small) n large n small • Case2: If population is not normally distributed and n is large i)If σ2 is known ii) If σ2 is unknown n X Z  o - = n s X Z o -  = n s X T o -  = n s X Z o -  = n X Z  o - = 63 Cont… 1/30/2024
  • 64. 5.Decision Rule: i) If HA: μ μ0 ✓Reject H0 if Z > Z1-α/2 OR Z < - Z1-α/2 ✓Reject H0 if T > t1-α/2,n-1 OR T < -t1-α/2,n-1 ii) If HA: μ > μ0 ✓Reject H0 if Z > Z1-α Or ✓Reject H0 if T > t1-α,n-1 iii) If HA: μ < μ0 ✓Reject H0 if Z < - Z1-α Or ✓Reject H0 if T < - t1-α,n-1  64 Cont… 1/30/2024
  • 65. Note: ✓ Z1-α/2 , Z1-α , Zα are tabulated values obtained from Z table ✓ t1-α/2 , t1-α , tα are tabulated values obtained from t table with (n-1) degree of freedom (df) 65 Cont… 1/30/2024
  • 66. 6. Decision : • If we reject H0, we can conclude that HA is true. • If ,however ,we do not reject H0, we may conclude that H0 is may be true. 66 Cont… 1/30/2024
  • 67. An Alternative Decision Rule using the p - value • The p-value is defined as the smallest value of α for which the null hypothesis can be rejected. • If the p-value is less than or equal to α ,we reject the null hypothesis (p ≤ α if one tailed test or p ≤ α/2, if two tailed test ) • If the p-value is greater than α ,we do not reject the null hypothesis (p > α if one tailed test or p > α/2, if two tailed test ) 67 Cont… 1/30/2024
  • 68. Example • Researchers are interested in the mean age of a certain population. • A random sample of 10 individuals drawn from the population of interest has a mean of 27. • Assuming that the population is approximately normally distributed with variance 20, • Can we conclude that the population mean is different from 30 years? (α=0.05) . 68 1/30/2024
  • 69. Solution 1-Data: variable is age, n = 10, = 27, σ2 = 20, α = 0.05 2-Assumptions: the population is approximately normally distributed with variance 20 3-Hypotheses: • H0 : μ=30 • HA: μ 30 x  69 Cont… 1/30/2024
  • 70. 4- Distribution of Test Statistic: 5. Level of significance α = 0.05 6. Decision Rule • The alternative hypothesis is HA: μ 30 ✓reject H0 if Zcal > Ztab or Zcal < - Ztab ✓Generally when HA: μ μ0 Reject H0 if │Zcal│> Ztab n X Z  o - =   70 Cont… 1/30/2024
  • 71. 6. Critical value • Since the HA is two sided we divide α by 2 • Ztab = Z1-α/2 = Z1-0.05/2 = Z0.975 = 1.96 in right tail and - 1.96 in left tail 71 Cont… 1/30/2024
  • 72. 7. Calculation of test statistic • Zcal = 27-30/(√20/√10) = -2.12 8. Statistical Decision: • We reject H0 ,since -2.12 is in the rejection region i.e. │-2.12│> 1.96 9. Conclusion • We can conclude that the mean age (μ) is different from 30 years 10. p-value: p =0.0174 < 0.025, i.e. p ≤ α/2 therefore we reject H0 72 Cont… 1/30/2024
  • 73. Example • Among 157 African-American men ,the mean systolic blood pressure was 146 mm Hg with a standard deviation of 27. • We wish to know if on the basis of these data, • we may conclude that the mean systolic blood pressure for a population of African-American is greater than 140. Use α=0.01. 73 1/30/2024
  • 74. Solution 1. Data: Variable is systolic blood pressure, n=157, = 146, s = 27, α = 0.01. 2. Assumption: population is not normal, σ2 is unknown, n>30 3. Hypotheses: H0 : μ ≤ 140 HA : μ > 140 4. Test Statistic: = = • Zcal = 2.78 n s X Z o -  = 157 27 140 146 − 1548 . 2 6 x 74 Cont… 1/30/2024
  • 75. 5. Decision Rule: ✓we reject H0 if Zcal>Z1-α ✓Ztab = Z0.99= 2.33 (from z table) 6. Decision: We reject H0. • Hence we may conclude that the mean systolic blood pressure for a population of African-American is greater than 140 mm Hg. 75 Cont… 1/30/2024
  • 76. Example • A simple random sample of 17 patients with muscle injury were treated at a research center. • The variable of interest was number of days between injury and recovery. The number of days until recovery was normally distributed in the population. • Can we conclude that the mean number of days is not 15 days in the population represented by the sample data? (See the data below) 76 1/30/2024
  • 77. Table: number of days until recovery for subjects with muscle injury Subject Days Subject Days 1 14 11 28 2 9 12 24 3 18 13 24 4 26 14 2 5 12 15 3 6 0 16 14 7 10 17 9 8 4 9 8 10 21 77 1/30/2024
  • 78. Solution 1. Data: number of days n = 17, =13.294, S = 8.886, (calculate from the data) 2. Assumptions: n < 30, Simple random sampling, normally distributed, unknown population variance 3. Hypotheses: H0 : μ = 15 HA: μ ≠ 15 4. Test statistic: our test statistics is distributed as students t with 17-1=16 df . And given by x n s X T o -  = 78 1/30/2024
  • 79. 5. Level of significance: • let α=0.05 since we have a two-tailed test we put α/2 =0.025 in each tail of the distribution • Decision rule: Reject H0 if │tcal│> tn-1,1- α/2 6. Critical value: ttab = tn-1,1- α/2 = t16,0.975 = 2.1199 to right and -2.1199 to left 79 Cont… 1/30/2024
  • 80. 7. Calculation of test statistic: ttab = 𝑋−𝑢0 𝑠√𝑛 = 13.2941−15 8.886√17 = −1.7059 2.1553 = -0.791 8.Statistical decision: ✓don’t reject H0, since │-0.791│< 2.1199 9. Conclusion: based on this data the mean of population from which the sample came may be 15. 10. P-value: P(t ≤ -0.791) and P( t ≥ 0.791) > 0.1 which is greater than α, so don’t reject H0. 80 Cont… 1/30/2024
  • 81. Hypothesis Testing: A single population proportion 81 1/30/2024
  • 82. A single population proportion: • Testing hypothesis about population proportion (P) have the following steps: 1. Data: sample size (n), sample proportion( ), hypothesized population proportion (P0) 2. Assumptions :normal distribution , p̂ n a p = = sample in the element of no. Total istic charachtar some with sample in the element of no. ˆ 82 1/30/2024
  • 83. 3. Hypotheses: we have three cases • Case I: H0: P = P0 HA: P ≠ P0 • Case II: H0: P ≤ P0 HA: P > P0 • Case III: H0: P ≥ P0 HA: P < P0 4. Test Statistic: ✓Where H0 is true, is distributed approximately as the standard normal n q p p p Z 0 0 0 ˆ − = 83 Cont… 1/30/2024
  • 84. 5. Decision rule: i) If HA: P ≠ P0, reject H0 if Z > Zα/2 or Z < −Zα/2 ii) If HA: P > P0, reject H0 if Z > Zα iii) If HA: P < P0, reject H0 if Z < −Zα Note: Zα/2 and Zα are tabulated values obtained from the standard normal table 6. Conclusion: reject or fail to reject H0
  • 85. Example • A study of 301 Hispanic women in San Antonio, Texas investigated the percentage of subjects with impaired fasting glucose (IFG). In the study, 24 women were classified in the IFG stage. The population estimate of IFG prevalence among Hispanic women in Texas is 6.3%. Is there sufficient evidence to indicate that the population of Hispanic women in San Antonio has a prevalence of IFG higher than 6.3%?
  • 86. Solution 1. Data: n = 301, a = 24, p0 = 6.3/100 = 0.063, q0 = 1 − p0 = 1 − 0.063 = 0.937, α = 0.05, $\hat{p} = \dfrac{a}{n} = \dfrac{24}{301} = 0.08$ 2. Assumptions: $\hat{p}$ is approximately normally distributed 3. Hypotheses (one-sided, Case II): H0: P ≤ 0.063, HA: P > 0.063
  • 87. 4. Test statistic: $Z = \dfrac{\hat{p} - p_0}{\sqrt{p_0 q_0 / n}} = \dfrac{0.08 - 0.063}{\sqrt{0.063(0.937)/301}} = 1.21$ 5. Decision rule: α = 0.05; reject H0 if Z > Z1−α, where Z1−α = Z1−0.05 = Z0.95 = 1.645 6. Conclusion: fail to reject H0, since Z = 1.21 < Z1−α = 1.645; equivalently, P-value = 0.1131 > α, so fail to reject H0
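As a check on the IFG example, the z statistic and one-sided p-value can be computed directly. This is a sketch using the standard library's `NormalDist`; the helper name `z_one_proportion` is illustrative, not part of the slides. Note that the slides round $\hat{p}$ to 0.08 before computing Z, so the unrounded value differs slightly in the second decimal.

```python
import math
from statistics import NormalDist

n, a = 301, 24     # sample size and number of women classified as IFG
p0 = 0.063         # hypothesized prevalence under H0
alpha = 0.05

def z_one_proportion(p_hat, p0, n):
    """Z statistic for a single proportion, using p0 in the standard error."""
    q0 = 1 - p0
    return (p_hat - p0) / math.sqrt(p0 * q0 / n)

z_rounded = z_one_proportion(0.08, p0, n)    # p-hat rounded, as in the slides
z_exact = z_one_proportion(a / n, p0, n)     # p-hat = 24/301, unrounded

z_crit = NormalDist().inv_cdf(1 - alpha)     # upper-tail critical value, about 1.645
p_value = 1 - NormalDist().cdf(z_exact)      # one-sided (upper-tail) p-value

reject_h0 = z_exact > z_crit
print(f"z (rounded p-hat) = {z_rounded:.2f}, z (exact) = {z_exact:.3f}")
print(f"critical value = {z_crit:.3f}, p-value = {p_value:.4f}, reject H0: {reject_h0}")
```

Either way of rounding leads to the same decision: Z stays below 1.645 and the p-value stays above 0.05.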
  • 88. Sample Size for a Cross-sectional Study
  • 89. • An essential part of planning any study is deciding how many people need to be studied. Sample Size • The number of study subjects selected to represent a given study population. • Important for making inferences based on the findings from the sample. • Should be sufficient to represent the characteristics of interest of the study population.
  • 90. • In estimating a certain characteristic of a population, sample size calculations are important to ensure that estimates are obtained with the required precision or confidence. • The accuracy required of the envisaged results determines the size of the sample.
  • 91. • Sample size determination depends on the: – objective of the study; – design of the study; – plan for statistical analysis; – accuracy of the measurements to be made; – degree of precision required for generalization; – degree of confidence with which to conclude.
  • 92. • Common questions: – “How many subjects should I study?” – Too small sample = Waste of time and resources = Results have no practical use – Too large sample = Waste of resources = Data quality compromised
  • 93. • The feasible sample size is also determined by the availability of resources: – Human resource – Time – Transport – Available facility, and – Money
  • 95. Sample Size: Single Sample • The aim is to have a large enough sample with which to estimate a population mean or proportion within a narrow interval with high reliability. • Concerned with the precision of the estimate (“narrowness of the CI”): estimate ± d units
  • 96. Sample Size For Single Sample Includes: A. Sample size for estimating a single population mean. B. Sample size for estimating a single population proportion. • In both cases, what is calculated is the minimum sample size required for a very large population (N ≥ 10,000).
  • 97. Sample size for estimating a single population mean • Suppose we want to estimate the average daily caloric intake of people in a community. The daily caloric intake is assumed to have a normal distribution with mean µ and standard deviation σ. • The sample measure used to estimate µ is the sample mean. The sampling distribution of the sample mean is also normal, with the same mean, µ, and standard deviation σ/√n (the standard error of the mean).
  • 98. Sample size for estimating a single population mean • AIM: Estimate µ • WANT: Estimate ($\bar{x}$) ± d units, where d = margin of error = absolute precision = half of the width (w) of the CI Steps: 1. Specify d (or w = 2d) 2. Use known σ² or estimate it using s²
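  • 99. 3. Solve for n: $n = \dfrac{z_{\alpha/2}^{2}\,\sigma^{2}}{d^{2}}$, which follows from $d = z_{\alpha/2}\cdot\dfrac{\sigma}{\sqrt{n}}$, where $\sigma/\sqrt{n}$ is the standard error of the estimator of the parameter of interest (d is written as e in some textbooks)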
  • 100. Example: 1. Find the minimum sample size needed to estimate the drop in heart rate (µ) for a new study using a higher dose of propranolol than the standard one. We require that the two-sided 95% CI for µ be no wider than 5 beats per minute, and the sample SD for change in heart rate equals 10 beats per minute. $n = (1.96)^2 (10)^2 / (2.5)^2 = 61.5$, rounded up to 62 patients
  • 101. 2. Suppose that for a certain group of cancer patients, we are interested in estimating the mean age at diagnosis. We would like a 95% CI that is 5 years wide. If the population SD is 12 years, how large should our sample be?
  • 102. • Suppose instead that d = 1 • Then the required sample size increases
  • 103. 3. A hospital director wishes to estimate the mean weight of babies born in the hospital. How large a sample of birth records should be taken if she/he wants a 95% CI that is 0.5 wide? Assume that a reasonable estimate of σ is 2. Ans: 246 birth records.
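The three worked examples for a population mean all use the same calculation, $n = (z\sigma/d)^2$ rounded up. A minimal sketch follows; the function name `n_for_mean` is illustrative, and z = 1.96 is assumed for 95% confidence, as in the slides.

```python
import math

def n_for_mean(sigma, d, z=1.96):
    """Minimum n so that the CI for a mean has half-width at most d.

    sigma: population SD (or a pilot estimate), d: margin of error,
    z: standard normal quantile (1.96 assumed for 95% confidence).
    """
    return math.ceil((z * sigma / d) ** 2)

# Example 1: CI no wider than 5 beats/min (d = 2.5), SD = 10
print(n_for_mean(sigma=10, d=2.5))    # 62

# Example 2: CI 5 years wide (d = 2.5), SD = 12 years
print(n_for_mean(sigma=12, d=2.5))    # 89

# Same setting with d = 1: the required n increases
print(n_for_mean(sigma=12, d=1))      # 554

# Example 3: CI 0.5 wide (d = 0.25), sigma estimated as 2
print(n_for_mean(sigma=2, d=0.25))    # 246
```

Rounding up rather than to the nearest integer keeps the achieved margin of error at or below the requested d.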
  • 104. But the population σ² is unknown most of the time. As a result, it has to be estimated from: • A pilot or preliminary study or survey: – Select a pilot sample and estimate σ² with the sample variance, s² • Previous or similar studies
  • 105. Sample size to estimate a single population proportion • Aim: Estimate p • Want: Estimate $\hat{p}$ ± d units, where d = Z•SE (95% CI of width = 2d) Steps: 1. Specify d (or w = 2d) 2. Use estimated p (use p = 0.5 if no information)
  • 106. 3. Solve for n: $n = \dfrac{z_{\alpha/2}^{2}\, p(1-p)}{d^{2}}$
  • 107. Example 1. Suppose that you want to know the proportion of infants who are breastfed beyond 18 months of age in a rural area. Suppose that in a similar area, the proportion (p) of breastfed infants was found to be 0.20. What sample size is required to estimate the true proportion within ±3 percentage points with 95% confidence? Let p = 0.20, d = 0.03, α = 5%.
  • 108. • Suppose there is no prior information about the proportion (p) who breastfeed • Assume p = q = 0.5 (most conservative) • Then the required sample size increases
  • 109. • An estimate of p is not always available. • However, the formula may also be used for sample size calculation based on various assumptions for the values of p:
      P = 0.1 → $n = (1.96)^2(0.1)(0.9)/(0.05)^2 = 138$
      P = 0.2 → $n = (1.96)^2(0.2)(0.8)/(0.05)^2 = 246$
      P = 0.3 → $n = (1.96)^2(0.3)(0.7)/(0.05)^2 = 323$
      P = 0.5 → $n = (1.96)^2(0.5)(0.5)/(0.05)^2 = 384$
      P = 0.7 → $n = (1.96)^2(0.7)(0.3)/(0.05)^2 = 323$
      P = 0.8 → $n = (1.96)^2(0.8)(0.2)/(0.05)^2 = 246$
  • 110. ➢ For a fixed absolute precision (d), the required sample size increases as P increases from 0 to 0.5, and then decreases in the same way as P approaches 1.
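The proportion formula and its symmetry around P = 0.5 can be checked with a short sketch. The function names below are illustrative, and z = 1.96 is assumed for 95% confidence. One difference from the slides is deliberate and worth flagging: this sketch rounds up, whereas the tabulated values appear to be rounded to the nearest integer, so P = 0.1 and P = 0.5 come out one higher here (139 and 385 rather than 138 and 384).

```python
import math

def n_raw(p, d, z=1.96):
    """Unrounded sample size for estimating a proportion p within +/- d."""
    return z**2 * p * (1 - p) / d**2

def n_for_proportion(p, d, z=1.96):
    """Minimum n (rounded up) for estimating a proportion p within +/- d."""
    return math.ceil(n_raw(p, d, z))

# Breastfeeding example: p = 0.20, d = 0.03
print(n_for_proportion(0.20, 0.03))    # 683

# No prior information: p = 0.5 gives the largest (most conservative) n
print(n_for_proportion(0.50, 0.03))    # 1068

# Sample size as a function of P for fixed d = 0.05
for p in (0.1, 0.2, 0.3, 0.5, 0.7, 0.8):
    print(f"P = {p}: raw = {n_raw(p, 0.05):.2f}, "
          f"nearest = {round(n_raw(p, 0.05))}, rounded up = {n_for_proportion(p, 0.05)}")
```

Because p(1 − p) is largest at p = 0.5, using 0.5 when nothing is known about p guarantees the sample is large enough whatever the true proportion turns out to be.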
  • 111. 2. A survey is planned to determine what proportion of medical students have regularly chewed khat. If no estimate of p is available and a pilot sample cannot be drawn, what sample size is required if 95% confidence is desired and d = 0.04 is to be used? Ans: 600 students
  • 112. 3. Suppose an estimate is desired of the average retail price of twenty tablets of a commonly used tranquilizer. A random sample of retail pharmacies is to be selected. The estimate is required to be within 10 cents of the true average price with 95% confidence. Based on a small pilot study, the standard deviation in price, σ, can be estimated as 85 cents. How many pharmacies should be randomly selected? Solution: Using the formula for a single population mean, it follows that $n = (1.960)^2 (0.85)^2 / (0.10)^2 = 277.56$. As a result, a sample of 278 pharmacies should be taken.
  • 113. Points for Consideration 1. Sample size estimates might need to be adjusted to compensate for non-response, patient dropout or loss to follow-up, lack of compliance, etc. 2. If sampling is from a finite population of size N (<10,000), then $n = \dfrac{n_0}{1 + \dfrac{n_0}{N}}$, where n0 is the sample size from an infinite population. When N is large in comparison to n (i.e., n/N ≤ 0.05), the finite population correction may be ignored. 3. Design effect for complex cluster sampling: multiply n by the design effect. Common values: 2, 3, …, 5.
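The finite population correction and the design-effect adjustment can be sketched as two small helpers. The function names, the population size N = 2,000, and the design effect of 2 are hypothetical values chosen for illustration only; n0 = 384 is the infinite-population figure tabulated earlier for P = 0.5 and d = 0.05.

```python
import math

def finite_population_correction(n0, N):
    """Adjust an infinite-population sample size n0 for a finite population N."""
    return math.ceil(n0 / (1 + n0 / N))

def apply_design_effect(n, deff):
    """Inflate n by a design effect (e.g. 2 for cluster sampling)."""
    return math.ceil(n * deff)

n0 = 384     # infinite-population sample size (P = 0.5, d = 0.05)
N = 2000     # hypothetical finite population size

n_fpc = finite_population_correction(n0, N)
print(n_fpc)                                # 323

# With a very large population the correction has almost no effect.
print(finite_population_correction(n0, 1_000_000))   # 384

# Hypothetical design effect of 2 for a cluster sample.
print(apply_design_effect(n_fpc, 2))        # 646
```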