2. Learning Objectives
• At the end of the class, the learners
will be able to:
• Define estimation
• Explain the types of estimation
• Apply the concepts of estimation
2/3/2023 Dr. Getabalew 2
3. • The process of drawing conclusions about an entire
population based on the data in a sample is known as
statistical inference.
• Estimation is the process of determining a likely value
for a variable in the survey population, based on
information collected from the sample.
• Estimation is the use of sample statistics to estimate
population parameters.
Estimation
4. Example
• A sample survey revealed:
– Proportion of smokers among a certain group of
population aged 15 to 24.
– Mean of SBP among sampled population
– Prevalence of HIV-positive among people involved
in the study
The next question is: what can we predict about the
characteristics of the population from which the
sample was drawn?
5. Point and Interval Estimates
A point estimate is a single value used as an estimate of a population
parameter.
An interval estimate is a range (interval) of numbers believed to include
the unknown population parameter with a certain degree of assurance.
The point estimate always lies within the interval estimate.
[Figure: an interval estimate running from the lower confidence limit to the upper confidence limit, with the point estimate inside it]
6. Estimation Process
[Figure: the estimation process. The population mean, µ, is unknown. A random sample is drawn and gives a sample mean of x̄ = 50, which is the point estimate. The interval estimate is the statement "I am 95% confident that µ is between 40 and 60."]
7. 1. Point Estimate
• A single numerical value used to estimate the
corresponding population parameter.
Sample statistics are estimators of population parameters:
Sample statistic | Population parameter
Sample mean, x̄ | µ
Sample variance, s² | σ²
Sample proportion, p̂ | P or π
Sample odds ratio, estimated OR | OR
Sample relative risk, estimated RR | RR
Sample correlation coefficient, r | ρ
8. a) Unbiasedness: An estimator is said to be
unbiased if its expected value is equal to the
population parameter it estimates.
For example, because E(x̄) = µ, the sample mean is an
unbiased estimator of the population mean.
Unbiasedness is an average or long-run property.
The mean of any single sample will probably not
equal the population mean, but the average of the
means of repeated independent samples from a
population will equal the population mean.
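The long-run property described above can be illustrated with a small simulation. This is only a sketch: the population values (mean 50, SD 10), the sample size and the number of repeats are hypothetical choices, not figures from the lecture.

```python
import random
from statistics import mean

random.seed(0)  # fixed seed so the run is repeatable

# Hypothetical population and sampling plan (assumptions for illustration)
POP_MEAN, POP_SD = 50.0, 10.0
N, REPEATS = 10, 2000

# Draw many independent samples and record each sample mean
sample_means = [
    mean(random.gauss(POP_MEAN, POP_SD) for _ in range(N))
    for _ in range(REPEATS)
]

# A single sample mean usually misses the population mean,
# but the average of all the sample means lands very close to it.
average_of_means = mean(sample_means)
print(f"first sample mean     : {sample_means[0]:.2f}")
print(f"average of {REPEATS} means: {average_of_means:.2f}")
```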
9. b) Minimum variance: (Efficiency)
An estimator with the minimum standard error
is a good (efficient) estimator.
For a symmetrical distribution, the mean has the
minimum standard error.
If the distribution is skewed, the median has the
minimum standard error.
10. c) Consistency: An estimator is said to be consistent if its
probability of being close to the parameter it estimates
increases as the sample size increases.
[Figure: Consistency. Sampling distributions of an estimator for n = 10 and n = 100.]
11. 2. Interval Estimation
Confidence Intervals
Give a plausible range of values of the estimate likely
to include the “true” (population) value with a given
confidence level.
An interval estimate provides more information about
a population characteristic than does a point estimate
Such interval estimates are called confidence
intervals.
12. General Formula:
The general formula for all CIs is:
point estimate ± (measure of how confident we want to be) × (standard error)
– Point estimate: the value of the statistic in the sample (e.g., mean, odds ratio, etc.)
– Measure of confidence: the critical value, from a Z table
– Standard error: the standard error of the statistic
Lower limit = Point Estimate - (Critical Value) x (Standard Error)
Upper limit = Point Estimate + (Critical Value) x (Standard Error)
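The general formula can be sketched as a small helper. The function name `confidence_interval` and the inputs in the usage line are hypothetical; the critical value is taken from the standard normal distribution, as on the slide.

```python
from statistics import NormalDist

def confidence_interval(point_estimate, standard_error, confidence=0.95):
    """Return (lower, upper) = point estimate -/+ critical value x SE."""
    alpha = 1.0 - confidence
    # Two-sided critical value from the standard normal (Z) distribution
    critical_value = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    margin = critical_value * standard_error
    return point_estimate - margin, point_estimate + margin

# Hypothetical inputs: point estimate 50, standard error 5
lower, upper = confidence_interval(50.0, 5.0)
print(f"95% CI: ({lower:.1f}, {upper:.1f})")
```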
13. A CI in general:
A CI expresses the confidence with which the interval will contain the
unknown population parameter.
– Based on observations from a sample
– Gives information about closeness to the unknown
population parameter
– Stated in terms of a level of confidence
• Never 100% sure
• A 95% confidence level is also written (1 - α) = 0.95
A wide interval suggests imprecise estimation.
A narrow CI reflects a large sample size, low
variability, or both.
14. Definition: 95% CI
When sampling is from a normally distributed population
with known standard deviation, we are 100(1 - α)% [e.g.,
95%] confident that the single computed interval contains
the unknown population parameter.
17. 1. CI for a Single Population Mean
A. Known variance (large sample size, normally
distributed)
Assumptions
– Population standard deviation (σ) is known
– Population is normally distributed
– If the population is not normal, use a large sample
18. • There are 3 elements to a CI:
1. Point estimate
2. SE of the point estimate
3. Confidence level;
A 100(1 - α)% CI for µ is: $\bar{x} \pm z_{\alpha/2}\,\sigma/\sqrt{n}$
α is chosen by the researcher; the most common values of α are
0.05, 0.01 and 0.1.
19. 3. Commonly used CLs are 90%, 95%, and
99%
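The critical values that go with these confidence levels can be looked up in a Z table or computed. A minimal sketch using Python's standard library (the dictionary name `z_values` is just for illustration):

```python
from statistics import NormalDist

# Two-sided critical value z_{alpha/2} for each common confidence level
z_values = {}
for level in (0.90, 0.95, 0.99):
    alpha = 1 - level
    z_values[level] = NormalDist().inv_cdf(1 - alpha / 2)
    print(f"{level:.0%} confidence level: z = {z_values[level]:.3f}")
```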
20. Example:
1. Waiting times (in hours) at a particular hospital
are believed to be approximately normally
distributed with a variance of 2.25 hr².
a. A sample of 20 outpatients revealed a mean waiting
time of 1.52 hours. Construct the 95% CI for the
estimate of the population mean.
b. Suppose that the mean of 1.52 hours had resulted
from a sample of 32 patients. Find the 95% CI.
c. What effect does larger sample size have on the CI?
21. a. $1.52 \pm 1.96\,\sqrt{2.25}/\sqrt{20} = 1.52 \pm 1.96(0.33) = 1.52 \pm 0.65 = (0.87,\ 2.17)$
• We are 95% confident that the true mean waiting time is between 0.87
and 2.17 hrs.
• An incorrect interpretation is that there is 95% probability that this
interval contains the true population mean.
b. $1.52 \pm 1.96\,\sqrt{2.25}/\sqrt{32} = 1.52 \pm 1.96(0.27) = 1.52 \pm 0.53 = (0.99,\ 2.05)$
c. A larger sample size makes the CI narrower (more
precise).
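Parts a to c can be checked with a few lines of code. This is a sketch with a hypothetical helper name; because it keeps the standard error unrounded, it gives roughly (0.86, 2.18) and (1.00, 2.04), which differ slightly from the slide's hand calculation that rounds the standard error to two decimals.

```python
import math
from statistics import NormalDist

def mean_ci_known_variance(sample_mean, variance, n, confidence=0.95):
    """CI for a population mean when the population variance is known."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    se = math.sqrt(variance / n)  # standard error of the mean
    return sample_mean - z * se, sample_mean + z * se

ci_20 = mean_ci_known_variance(1.52, 2.25, 20)  # part a
ci_32 = mean_ci_known_variance(1.52, 2.25, 32)  # part b

for n, (lo, hi) in ((20, ci_20), (32, ci_32)):
    print(f"n = {n}: ({lo:.2f}, {hi:.2f}), width = {hi - lo:.2f}")
```

The printed widths show part c directly: the interval for n = 32 is narrower than the one for n = 20.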
23. Student’s t Distribution
• Bell Shaped
• Symmetric about zero (the mean)
• Flatter than the Normal (0,1). This means
– The variability of a t is greater than that of a Z
that is normal(0,1)
– Thus, there is more area under the tails and less
at center
– Because variability is greater, resulting
confidence intervals will be wider.
24. • Note: t approaches z as n increases
28. 2. CI for the difference between
population means (normally distributed)
A. Known variances (2 independent samples)
• When σ1 and σ2 are known and both populations are
normal or both sample sizes are at least 30, the test
statistic is a z-value…
29. Examples
• We are interested in the similarity of the two groups.
1) Is mean blood pressure the same for males and
females?
2) Is body mass index (BMI) similar for breast cancer
cases versus non-cancer patients?
3) Is length of stay (LOS) for patients in hospital “A” the
same as that for similar patients in hospital “B”?
30. Example
• Researchers are interested in the difference between
serum uric acid levels in patients with and without
Down’s syndrome.
• Patients without Down’s syndrome
– n = 12, sample mean = 4.5 mg/100 ml, σ² = 1.0
• Patients with Down’s syndrome
– n = 15, sample mean = 3.4 mg/100 ml, σ² = 1.5
• Calculate the 95% CI.
• We are 95% confident that the true difference
between the two population means is between 0.26
and 1.94 mg/100 ml.
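The interval quoted above follows from the known-variance formula for two independent samples, (x̄1 - x̄2) ± z × √(σ1²/n1 + σ2²/n2). A minimal sketch, with a hypothetical function name, using the figures from this example:

```python
import math
from statistics import NormalDist

def diff_means_ci(mean1, var1, n1, mean2, var2, n2, confidence=0.95):
    """CI for mu1 - mu2 with known population variances."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    se = math.sqrt(var1 / n1 + var2 / n2)  # SE of the difference in means
    diff = mean1 - mean2
    return diff - z * se, diff + z * se

# Serum uric acid example: (mean, variance, n) for each group
lower, upper = diff_means_ci(4.5, 1.0, 12, 3.4, 1.5, 15)
print(f"95% CI for the difference: ({lower:.2f}, {upper:.2f})")
```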
35. 3. CIs for single population proportion, p
• Is based on the three elements of a CI:
– Point estimate
– SE of the point estimate
– Confidence level
37. Example 1
A random sample of 100 people shows that 25
are left-handed. Form a 95% CI for the true
proportion of left-handers.
Interpretation: we are 95% confident that the true percentage of
left-handers in the population is between 16.51% and 33.49%.
38. Example 2
• It was found that 28.1% of 153 cervical-cancer cases
had never had a Pap smear prior to the time of case’s
diagnosis. Calculate a 95% CI for the percentage of
cervical-cancer cases who never had a Pap test.
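Both proportion examples can be worked with the same normal-approximation interval, p̂ ± z × √(p̂(1 - p̂)/n). This is a sketch under that assumption (it reproduces the Example 1 result); the function name is hypothetical. For Example 2 it gives roughly 21% to 35%.

```python
import math
from statistics import NormalDist

def proportion_ci(p_hat, n, confidence=0.95):
    """Normal-approximation CI for a single population proportion."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    se = math.sqrt(p_hat * (1 - p_hat) / n)  # SE of the sample proportion
    return p_hat - z * se, p_hat + z * se

ex1 = proportion_ci(25 / 100, 100)  # Example 1: left-handers
ex2 = proportion_ci(0.281, 153)     # Example 2: never had a Pap smear

print(f"Example 1: ({ex1[0]:.2%}, {ex1[1]:.2%})")
print(f"Example 2: ({ex2[0]:.1%}, {ex2[1]:.1%})")
```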
39. 4. Two Population Proportions
• We are often interested in comparing proportions
from 2 populations:
• Is the incidence of disease A the same in two
populations?
• Patients are treated with either drug D, or with
placebo. Is the proportion “improved” the same in
both groups?
40. Confidence Interval for
Two Population Proportions
• SE of the difference = $\sqrt{\hat{p}_1(1-\hat{p}_1)/n_1 + \hat{p}_2(1-\hat{p}_2)/n_2}$
• The confidence interval for p1 – p2 is: $(\hat{p}_1 - \hat{p}_2) \pm z_{\alpha/2} \times \text{SE}$
41. Example
• In a clinical trial for a new drug to treat hypertension,
N1 = 50 patients were randomly assigned to receive
the new drug, and N2 = 50 patients to receive a
placebo. 34 of the patients receiving the drug showed
improvement, while 15 of those receiving placebo
showed improvement.
• Compute a 95% CI estimate for the difference
between proportions improved.
42. • p1 = 34/50 = 0.68, p2 = 15/50 = 0.30
• The point estimate for the difference is:
p1 - p2 = 0.68 - 0.30 = 0.38
• SE of the difference = $\sqrt{0.68(0.32)/50 + 0.30(0.70)/50} = 0.0925$
• 95% CI
– Lower = ( point estimate ) - (Zα/2) (SE)
= 0.38 – (1.96)(0.0925) = 0.20
– Upper = ( point estimate ) + (Zα/2) (SE)
= 0.38 + (1.96)(0.0925) = 0.56
• 95% CI = (0.20, 0.56)
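The same calculation as a short sketch. The function name is hypothetical, and the standard error is the unpooled form, which is the one that matches the 0.0925 used above.

```python
import math
from statistics import NormalDist

def diff_proportions_ci(x1, n1, x2, n2, confidence=0.95):
    """CI for p1 - p2 using the unpooled standard error."""
    p1, p2 = x1 / n1, x2 / n2
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    diff = p1 - p2
    return diff - z * se, diff + z * se, se

# Hypertension trial: 34 of 50 improved on the drug, 15 of 50 on placebo
lower, upper, se = diff_proportions_ci(34, 50, 15, 50)
print(f"SE = {se:.4f}, 95% CI = ({lower:.2f}, {upper:.2f})")
```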
44. • Hypothesis testing (HT) is one method of statistical inference.
• A hypothesis is a claim (assumption) about a population parameter.
• Hypotheses are formulated, experiments are performed,
and results are evaluated for their consistency (or
inconsistency) with a hypothesis.
• The purpose of HT is to aid the clinician, researcher or
administrator in reaching a decision (conclusion)
concerning a population by examining a sample from that
population.
45. Types of Hypothesis
1. The Null Hypothesis, H0
Is a statement claiming that there is no difference
between the hypothesized value and the population
value.
(The effect of interest is zero = no difference)
States the assumption (hypothesis) to be tested
H0 is a statement of agreement (or no difference) and is
always about a population parameter, not about a
sample statistic.
46. Always contains an “=”, “≤” or “≥” sign
May or may not be rejected
Begin with the assumption that Ho is true
– Similar to the notion of innocent until proven
guilty
47. 2. The Alternative Hypothesis, HA
Is a statement of what we will believe is true if our
sample data cause us to reject Ho.
Is generally the hypothesis that is believed (or needs
to be supported) by the researcher
Is a statement that disagrees with (opposes) Ho
(The effect of interest is not zero)
Never contains an “=”, “≤” or “≥” sign
May or may not be accepted
48. Steps in Hypothesis Testing
1. Formulate the appropriate statistical hypotheses clearly
• Specify HO and HA
H0: µ = µ0 | H0: µ ≤ µ0 | H0: µ ≥ µ0
H1: µ ≠ µ0 | H1: µ > µ0 | H1: µ < µ0
two-tailed | one-tailed | one-tailed
• Can we conclude that the proportion of patients with leukemia
who survive more than six years is not 60%?
Ho: ? HA: ?
• Can we conclude that a certain population mean is greater than
50?
Ho: ? HA: ?
49. 2. State the assumptions necessary for computing probabilities
• A distribution is approximately normal (Gaussian)
• Variance is known or unknown
3. Select a sample and collect data
• Categorical, continuous
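4. Decide on the appropriate test statistic for the hypothesis.
E.g., for one population mean: $z = (\bar{x} - \mu_0)/(\sigma/\sqrt{n})$ when the variance is known,
OR $t = (\bar{x} - \mu_0)/(s/\sqrt{n})$ when the variance is unknown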
50. 5. Specify the desired level of significance for the
statistical test (α = 0.05, 0.01, etc.)
6. Determine the critical value.
– A value the test statistic must attain to be
declared significant.
[Figure: critical values on the standard normal curve at α = 0.05: -1.96 and 1.96 (two-tailed); 1.645 or -1.645 (one-tailed)]
51. 7. Obtain sample evidence and compute the test
statistic
8. Reach a decision and draw the conclusion
• If Ho is rejected, we conclude that HA is true
(or accepted).
• If Ho is not rejected, we conclude that Ho may
be true.
52. Rejection and Non-Rejection Regions
• The possible values of the test statistic lie along the
horizontal axis of its distribution (here, the normal distribution) and are
divided into two groups:
• Rejection region, and
• Non-rejection region.
53. Example: Two-sided test at α = 5%
[Figure: standard normal curve with a rejection region of area α/2 = 0.025 in each tail, below -1.96 and above 1.96, and a central non-rejection region of area 0.95]
54. Statistical Decision
Reject Ho if the value of the test statistic that we
compute from our sample is one of the values in the
rejection region
Don’t reject Ho if the computed value of the test
statistic is one of the values in the non-rejection
region.
55. Level of Significance, α
Is the probability of rejecting a true Ho
For example, a significance level of 0.05 indicates a
5% risk of concluding that a difference exists when
there is no actual difference.
Alpha levels are set by the researcher and
are related to the confidence level.
The alpha level is obtained by subtracting the
confidence level from 100%; e.g., a 95% confidence
level corresponds to α = 5%.
57. Another way to state conclusion
• Reject Ho if P-value < α
• Do not reject Ho if P-value ≥ α
The P-value is the probability of obtaining a test statistic
as extreme as or more extreme than the one actually
observed, if Ho is true.
It is compared with α to judge whether the sample
provides enough evidence to reject the null
hypothesis.
The larger the absolute value of the test statistic, the smaller the P-value.
Equivalently, the smaller the P-value, the stronger the evidence
against Ho.
58. 1. Hypothesis Testing of a Single Mean
(Normally Distributed)
60. Example: Two-Tailed Test
1. A simple random sample of 10 people from a certain
population has a mean age of 27. Can we conclude
that the mean age of the population is not 30? The
variance is known to be 20. Let α = .05.
• Answer, "Yes we can, if we can reject the Ho that it is
30."
A. Data
n = 10, sample mean = 27, σ² = 20, α = 0.05
B. Assumptions
Simple random sample
Normally distributed population
variance is known
61. C. Hypotheses
Ho: µ = 30
HA: µ ≠ 30
D. Test statistic
As the population variance is known, we use Z
as the test statistic: $Z = (\bar{x} - \mu_0)/(\sigma/\sqrt{n})$
62. E. Decision Rule
• Reject Ho if the Z value falls in the rejection region.
• Don’t reject Ho if the Z value falls in the non-rejection region.
• Because of the structure of Ho it is a two tail test. Therefore,
reject Ho if Z ≤ -1.96 or Z ≥ 1.96.
63. F. Calculation of test statistic
$Z = (27 - 30)/\sqrt{20/10} = -3/1.414 = -2.12$
G. Statistical decision
We reject Ho because Z = -2.12 is in the rejection
region. The value is significant at α = 5%.
H. Conclusion
We conclude that µ is not 30. P-value = 0.0340
A Z value of -2.12 corresponds to a tail area of 0.0170. Since there
are two parts to the rejection region in a two-tailed test, the P-value is
twice this, which is 0.0340.
64. Example: One -Tailed Test
• A simple random sample of 10 people from a certain
population has a mean age of 27. Can we conclude that
the mean age of the population is less than 30? The
variance is known to be 20. Let α = 0.05.
• Data
n = 10, sample mean = 27, σ² = 20, α = 0.05
• Hypotheses
Ho: µ ?, HA: µ ?
65. • Test statistic
$Z = (27 - 30)/\sqrt{20/10} = -2.12$
• Rejection region
• With α = 0.05 and the “<” inequality in HA, the entire rejection region
is in the left tail (lower-tail test). The critical value is Z = -1.645.
Reject Ho if Z < -1.645.
66. • Statistical decision
– We reject the Ho because -2.12 < -1.645.
• Conclusion
– We conclude that µ < 30.
– P-value = 0.0170 this time, because this is a one-tailed test rather than a
two-tailed test.
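The two age examples (two-tailed and lower-tailed) differ only in how the P-value is read from the normal curve. A minimal sketch with a hypothetical helper; because it keeps Z unrounded, the P-values come out a touch below the slide values of 0.0340 and 0.0170, which use Z = -2.12.

```python
import math
from statistics import NormalDist

def z_test_mean(sample_mean, mu0, variance, n, tail="two-sided"):
    """Return (z, p_value) for a one-sample z test with known variance."""
    z = (sample_mean - mu0) / math.sqrt(variance / n)
    std_normal = NormalDist()
    if tail == "two-sided":
        p_value = 2 * std_normal.cdf(-abs(z))  # both tails
    elif tail == "lower":
        p_value = std_normal.cdf(z)            # left tail only
    else:
        raise ValueError("tail must be 'two-sided' or 'lower'")
    return z, p_value

z, p_two = z_test_mean(27, 30, 20, 10, tail="two-sided")
_, p_lower = z_test_mean(27, 30, 20, 10, tail="lower")

print(f"Z = {z:.2f}")
print(f"two-tailed P-value = {p_two:.4f}")
print(f"lower-tail P-value = {p_lower:.4f}")
```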
67. 1.2 Unknown Variance
• In most practical applications the standard deviation of
the underlying population is not known
• In this case, σ can be estimated by the sample standard
deviation s.
• If the underlying population is normally distributed,
then the test statistic is: $t = (\bar{x} - \mu_0)/(s/\sqrt{n})$, with n - 1 degrees of freedom
68. Example: Two-Tailed Test
• A simple random sample of 14 people from a certain population
gives a sample mean body mass index (BMI) of 30.5 and an sd of
10.64. Can we conclude that the mean BMI is not 35 at α = 5%?
• Ho: µ = 35, HA: µ ≠ 35
• Test statistic
$t = (30.5 - 35)/(10.64/\sqrt{14}) = -4.5/2.84 = -1.58$
• If the assumptions are correct and Ho is true, the test statistic
follows Student's t distribution with 13 degrees of freedom.
69. • Decision rule
– We have a two tailed test. With α = 0.05 it means that each tail is
0.025. The critical t values with 13 df are -2.1604 and 2.1604.
– We reject Ho if the t ≤ -2.1604 or t ≥ 2.1604.
• Do not reject Ho because -1.58 is not in the rejection
region. Based on the data of the sample, it is possible
that µ = 35. P-value = 0.1375
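The t statistic and the decision can be reproduced as follows. This sketch uses only the standard library, so the critical value 2.1604 is taken from the slide's t table rather than computed; the variable names are illustrative.

```python
import math

# BMI example: n, sample mean, sample SD, hypothesized mean
n, x_bar, s, mu0 = 14, 30.5, 10.64, 35.0

df = n - 1                                   # degrees of freedom
t_stat = (x_bar - mu0) / (s / math.sqrt(n))  # one-sample t statistic

# Two-tailed critical value for alpha = 0.05 and 13 df,
# copied from the slide (t table), not computed here
T_CRITICAL = 2.1604
reject_h0 = abs(t_stat) >= T_CRITICAL

print(f"t = {t_stat:.2f} with {df} df; reject Ho: {reject_h0}")
```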
71. 2.1 Known Variances
(Independent Samples)
• When two independent samples are drawn
from normally distributed populations with
known variances, the test statistic for testing
the Ho of equal population means is:
$Z = \dfrac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\sigma_1^2/n_1 + \sigma_2^2/n_2}}$
72. Example:
• Researchers wish to know whether there is a difference in mean serum
uric acid (SUA) levels between normal individuals and
individuals with Down’s syndrome. The mean SUA
levels of 12 individuals with Down’s syndrome and 15
normal individuals are 4.5 and 3.4 mg/100 ml,
respectively, with variances σ1² = 1 and σ2² = 1.5, respectively.
Is there a difference between the means of the two groups
at α = 5%?
• Hypotheses:
Ho: µ1- µ2 = 0 or Ho: µ1 = µ2
HA: µ1 - µ2 ≠ 0 or HA: µ1 ≠ µ2
73. • With α = 0.05, the critical values of Z are -1.96 and
+1.96. We reject Ho if Z < -1.96 or Z > +1.96.
• Reject Ho because 2.57 > 1.96.
• From these data, it can be concluded that the
population means are not equal. A 95% CI would
give the same conclusion. P-value = 0.01.
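The Z value of 2.57 and the P-value of about 0.01 can be reproduced with a short sketch (the function name is hypothetical; Ho is taken as µ1 - µ2 = 0):

```python
import math
from statistics import NormalDist

def two_sample_z(mean1, var1, n1, mean2, var2, n2):
    """Z statistic and two-sided P-value for Ho: mu1 - mu2 = 0."""
    se = math.sqrt(var1 / n1 + var2 / n2)  # SE of the difference in means
    z = (mean1 - mean2) / se
    p_value = 2 * NormalDist().cdf(-abs(z))
    return z, p_value

# SUA example: Down's syndrome group vs. normal group
z, p_value = two_sample_z(4.5, 1.0, 12, 3.4, 1.5, 15)
print(f"Z = {z:.2f}, P-value = {p_value:.4f}")
```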
74. 2.2 Unknown Variances
i. Equal variances (Independent samples)
• With equal population variances, we can
obtain a pooled value from the sample
variances.
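• The test statistic for µ1 - µ2 is: $t = \dfrac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{s_p\sqrt{1/n_1 + 1/n_2}}$
• Where the critical value tα/2 has (n1 + n2 – 2) df, and the pooled variance is
$s_p^2 = \dfrac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$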
75. Example:
• We wish to know if we may conclude, at the 95%
confidence level, that smokers, in general, have
greater lung damage than do non-smokers.
• Calculation of pooled variance
76. • Hypotheses:
Ho: µ1 ≤ µ2 (µ1 - µ2 ≤ 0), HA: µ1 > µ2
• With α = 0.05 and df = 23, the critical value of t is 1.7139. We
reject Ho if t > 1.7139.
• Test statistic
• Reject Ho because 2.6563 > 1.7139. On
the basis of the data, we conclude that µ1 >
µ2.
77. 3. Hypothesis Testing about a Single
Population Proportion
(Normal Approximation to Binomial Distribution)
• Involves categorical values
• Two possible outcomes
– “Success” (possesses a certain
characteristic)
– “Failure” (does not possess that
characteristic)
• Fraction or proportion of population in the
“success” category is denoted by p
78. Example
• In the general population of 0 to 4-year-olds, the annual
incidence of asthma is 1.4%. If 10 cases of asthma are observed
over a single year in a sample of 500 children whose mothers
smoke, can we conclude that this is different from the
underlying probability of p0 = 0.014? α = 5%
H0 : p = 0.014
HA: p ≠ 0.014
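79. • The test statistic is given by:
$Z = \dfrac{\hat{p} - p_0}{\sqrt{p_0(1 - p_0)/n}} = \dfrac{0.02 - 0.014}{\sqrt{0.014(0.986)/500}} = 1.14$, where $\hat{p} = 10/500 = 0.02$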
80. • The critical value of Zα/2 at α=5% is ±1.96.
• Don’t reject Ho, since Z (= 1.14) lies in the non-rejection
region between ±1.96.
• P-value = 0.2548
• We do not have sufficient evidence to conclude that
the probability of developing asthma for children
whose mothers smoke in the home is different from
the probability in the general population
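The asthma example can be sketched in code using the normal approximation, with the standard error computed under Ho (from p0). The function name is hypothetical. With Z left unrounded the two-sided P-value is about 0.25, marginally different from the 0.2548 quoted on the slide; the decision is the same.

```python
import math
from statistics import NormalDist

def one_proportion_z(x, n, p0):
    """Z statistic and two-sided P-value for Ho: p = p0."""
    p_hat = x / n
    se0 = math.sqrt(p0 * (1 - p0) / n)  # SE under the null hypothesis
    z = (p_hat - p0) / se0
    p_value = 2 * NormalDist().cdf(-abs(z))
    return z, p_value

# 10 asthma cases among 500 children; general-population incidence 1.4%
z, p_value = one_proportion_z(10, 500, 0.014)
reject_h0 = abs(z) >= 1.96
print(f"Z = {z:.2f}, P-value = {p_value:.3f}, reject Ho: {reject_h0}")
```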
81. 4. Hypothesis Tests about the Difference Between Two Population Proportions
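82. The test statistic is $Z = \dfrac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}(1 - \hat{p})(1/n_1 + 1/n_2)}}$, with pooled proportion $\hat{p} = (X_1 + X_2)/(n_1 + n_2)$,
where X1 = the observed number of events in the first sample
and X2 = the observed number of events in the second sample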
84. Example
• A study was conducted to investigate the
possible cause of a gastroenteritis outbreak
following a lunch served in a high school
cafeteria. Among the 225 students who ate the
sandwiches, 109 became ill, while among the
38 students who did not eat the sandwiches, 4
became ill. Is there a significant difference
between the two groups at α = 5%?
• We wish to test
Ho: p1 = p2 against the alternative
HA: p1 ≠ p2
86. • Assume that the sample sizes are large
enough, and the normal approximation to
the binomial distribution is valid.
• If the Ho is true, then p1 = p2 = p
87. The area under the standard normal curve to the
right of 4.36 is less than 0.0001. Therefore, p <
0.0002. We reject H0 at the 0.05 level.
The proportion of students who became ill
differs in the two groups; those who ate the
prepared sandwiches were more likely to
develop gastroenteritis.
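The gastroenteritis test can be sketched with a pooled proportion, which is what Ho: p1 = p2 = p implies. The function name is hypothetical. Keeping all digits gives Z of about 4.37, in line with the 4.36 quoted above, and a two-sided P-value far below 0.0002.

```python
import math
from statistics import NormalDist

def two_proportion_z(x1, n1, x2, n2):
    """Pooled Z statistic and two-sided P-value for Ho: p1 = p2."""
    p1, p2 = x1 / n1, x2 / n2
    p_pooled = (x1 + x2) / (n1 + n2)  # common proportion under Ho
    se = math.sqrt(p_pooled * (1 - p_pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = 2 * NormalDist().cdf(-abs(z))
    return z, p_value

# 109 of 225 who ate the sandwiches became ill; 4 of 38 who did not
z, p_value = two_proportion_z(109, 225, 4, 38)
print(f"Z = {z:.2f}, P-value = {p_value:.6f}")
```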
88. Types of Errors in Hypothesis Tests
• Whenever we reject or fail to reject Ho, we
risk committing an error.
• Two types of errors can be committed.
– Type I Error
– Type II Error
89. Type I Error
• The error committed when a true Ho is rejected
• Considered a serious type of error
• The probability of a type I error is the probability of
rejecting the Ho when it is true
• The probability of type I error is α
• Called level of significance of the test
• Set by researcher in advance
90. Type II Error
• The error committed when a false Ho is not rejected
• The probability of Type II Error is β
Power
• The probability of rejecting the Ho when it is false.
Power = 1 – β = 1- probability of type II error
• We would like to maintain low probability of a
Type I error (α) and low probability of a Type II
error (β) [high power = 1 - β].
91. Action (Conclusion) | Reality: Ho True | Reality: Ho False
Do not reject Ho | Correct action (Prob. = 1-α) | Type II error (Prob. = β = 1-Power)
Reject Ho | Type I error (Prob. = α = Sign. level) | Correct action (Prob. = Power = 1-β)