Estimation and hypothesis test lecture.pdf

Estimation
Getabalew E (MPH, Ph.D)
1
Dr. Getabalew
2/3/2023

Learning Objectives
• At the end of the class, the learners
will be able to:
• Define estimation
• Explain the types of estimation
• Apply the concepts of estimation

• The process of drawing conclusions about an entire
population based on the data in a sample is known as
statistical inference.
• Estimation is the process of determining a likely value
for a variable in the survey population, based on
information collected from the sample.
• Estimation is the use of sample statistics to estimate
population parameters.

Example
• A sample survey revealed:
– Proportion of smokers among a certain group of
population aged 15 to 24.
– Mean of SBP among sampled population
– Prevalence of HIV-positive among people involved
in the study
The next question is: what can we infer about the
characteristics of the population from which the
sample was drawn?

Point and Interval Estimates
A point estimate is a single value used as an estimate of a population
parameter.
An interval estimate is a range of numbers believed to include the
unknown population parameter with a stated degree of assurance.
The point estimate always lies within the interval estimate.
[Figure: a number line with the point estimate between the lower confidence limit and the upper confidence limit; the span between the two limits is the interval estimate.]

Estimation Process
[Figure: the population mean, µ, is unknown. A random sample gives a sample mean of X̄ = 50 (the point estimate). Interval estimate: "I am 95% confident that µ is between 40 and 60."]

1. Point Estimate
• A single numerical value used to estimate the
corresponding population parameter.
Sample statistics are estimators of population parameters:
Sample statistic (estimator)           Population parameter
Sample mean, x̄                         µ
Sample variance, S²                    σ²
Sample proportion, p̂                   P or π
Sample odds ratio, ÔR                  OR
Sample relative risk, R̂R               RR
Sample correlation coefficient, r      ρ

a) Unbiasedness: An estimator is said to be
unbiased if its expected value is equal to the
population parameter it estimates.
For example, since E(x̄) = µ, the sample mean is an
unbiased estimator of the population mean.
Unbiasedness is an average or long-run property.
The mean of any single sample will probably not
equal the population mean, but the average of the
means of repeated independent samples from a
population will equal the population mean.

b) Minimum variance: (Efficiency)
An estimator with the minimum standard error
is a good estimator.
For a symmetric distribution such as the normal, the mean
has the minimum standard error.
If the distribution is skewed, the median often has the
smaller standard error.

c) Consistency: An estimator is said to be consistent if its
probability of being close to the parameter it estimates
increases as the sample size increases.
[Figure: Consistency. Sampling distributions of an estimator for n = 10 and n = 100.]
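The unbiasedness and consistency properties can be illustrated by simulation. The following is a minimal sketch, assuming a hypothetical normal population with mean 50 and standard deviation 10; these values, the seed, and the number of repetitions are illustrative choices, not taken from the lecture.

```python
import random
from statistics import mean, stdev

rng = random.Random(0)            # fixed seed so the run is repeatable
POP_MEAN, POP_SD = 50.0, 10.0     # hypothetical population parameters
REPEATS = 2000                    # number of repeated independent samples

def sample_means(n):
    """Return the means of REPEATS independent samples of size n."""
    return [sum(rng.gauss(POP_MEAN, POP_SD) for _ in range(n)) / n
            for _ in range(REPEATS)]

results = {}
for n in (10, 100):
    means = sample_means(n)
    # Average of the sample means (unbiasedness) and their spread (consistency).
    results[n] = (mean(means), stdev(means))
    print(f"n = {n:3d}: average of sample means = {results[n][0]:.2f}, "
          f"spread of sample means = {results[n][1]:.2f}")
```

The average of the sample means should sit close to the population mean for both sample sizes, while the spread of the sample means should be clearly smaller for n = 100 than for n = 10.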

2. Interval Estimation
Confidence Intervals
Give a plausible range of values of the estimate likely
to include the “true” (population) value with a given
confidence level.
An interval estimate provides more information about
a population characteristic than does a point estimate
Such interval estimates are called confidence
intervals.

General Formula:
The general formula for all CIs is:
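point estimate ± (measure of how confident we
want to be) × (standard error)
where:
– point estimate: the value of the statistic in the sample
(e.g., mean, odds ratio, etc.)
– measure of how confident we want to be: the critical value, from a Z table
– standard error: the standard error of the statistic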
Lower limit = Point Estimate - (Critical Value) x (Standard Error)
Upper limit = Point Estimate + (Critical Value) x (Standard Error)
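The general formula can be sketched in a few lines of Python. This is an illustrative sketch only: the function name `confidence_interval` and the input values are hypothetical, and the critical value is taken from the standard normal distribution, which plays the role of the Z table.

```python
from statistics import NormalDist

def confidence_interval(point_estimate, standard_error, confidence=0.95):
    """Return (lower, upper) = point estimate -/+ critical value * standard error."""
    alpha = 1 - confidence
    # Two-sided critical value from the standard normal distribution.
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    margin = z_crit * standard_error
    return point_estimate - margin, point_estimate + margin

# Hypothetical inputs chosen only for illustration.
lower, upper = confidence_interval(point_estimate=50.0, standard_error=5.0)
print(f"95% CI: ({lower:.2f}, {upper:.2f})")
```

Raising `confidence` to 0.99 gives a larger critical value and therefore a wider interval for the same standard error.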

A CI in general:
Expresses the confidence with which the interval will contain the
unknown population parameter
– Based on observation from a sample
– Gives information about closeness to the unknown
population parameter
– Stated in terms of level of confidence
• Never 100% sure
• The confidence level is also written (1 - α), e.g., (1 - α) = 0.95
A wide interval suggests imprecision of estimation.
A narrow CI reflects a large sample size, low
variability, or both.

Definition: 95% CI
When sampling is from a normally distributed population
with known standard deviation, we are 100(1 - α)% [e.g.,
95%] confident that the single computed interval contains
the unknown population parameter.

1. CI for a Single Population Mean
A. Known variance (large sample size, normally
distributed)
Assumptions
Population standard deviation (σ) is known
Population is normally distributed
If the population is not normal, use a large
sample

• There are 3 elements to a CI:
1. Point estimate
2. SE of the point estimate
3. Confidence level;
A 100(1 - α)% CI for µ is:
x̄ ± Zα/2 × σ/√n
α is chosen by the researcher; the most common values of α are
0.05, 0.01 and 0.1.

3. Commonly used confidence levels (CLs) are 90%, 95%, and
99%, for which Zα/2 = 1.645, 1.96, and 2.576, respectively

Example:
1. Waiting times (in hours) at a particular hospital
are believed to be approximately normally
distributed with a variance of 2.25 hr².
a. A sample of 20 outpatients revealed a mean waiting
time of 1.52 hours. Construct the 95% CI for the
estimate of the population mean.
b. Suppose that the mean of 1.52 hours had resulted
from a sample of 32 patients. Find the 95% CI.
c. What effect does larger sample size have on the CI?

a. 1.52 ± 1.96 × √(2.25/20) = 1.52 ± 1.96(0.33) = 1.52 ± 0.65 = (0.87, 2.17)
• We are 95% confident that the true mean waiting time is between 0.87
and 2.17 hrs.
• An incorrect interpretation is that there is 95% probability that this
interval contains the true population mean.
b. 1.52 ± 1.96 × √(2.25/32) = 1.52 ± 1.96(0.27) = 1.52 ± 0.53 = (0.99, 2.05)
c. A larger sample size makes the CI narrower (more
precise).
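The waiting-time example can be checked with a short script. This is a sketch under the example's assumptions (normal population, known variance of 2.25); the function name `z_interval_for_mean` is illustrative. Because the script keeps the unrounded standard error, its limits can differ from the slide's hand-rounded figures in the second decimal place.

```python
from math import sqrt
from statistics import NormalDist

def z_interval_for_mean(sample_mean, variance, n, confidence=0.95):
    """CI for a population mean when the population variance is known."""
    z_crit = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    se = sqrt(variance / n)          # standard error of the sample mean
    return sample_mean - z_crit * se, sample_mean + z_crit * se

ci_20 = z_interval_for_mean(1.52, 2.25, 20)   # part a: n = 20
ci_32 = z_interval_for_mean(1.52, 2.25, 32)   # part b: n = 32
width_20 = ci_20[1] - ci_20[0]
width_32 = ci_32[1] - ci_32[0]

print(f"n = 20: ({ci_20[0]:.2f}, {ci_20[1]:.2f}), width = {width_20:.2f}")
print(f"n = 32: ({ci_32[0]:.2f}, {ci_32[1]:.2f}), width = {width_32:.2f}")
```

The width for n = 32 is smaller than for n = 20, which is the effect described in part c.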

Student’s t Distribution
• Bell Shaped
• Symmetric about zero (the mean)
• Flatter than the Normal (0,1). This means
– The variability of a t is greater than that of a Z
that is normal(0,1)
– Thus, there is more area under the tails and less
at center
– Because variability is greater, resulting
confidence intervals will be wider.

• Note: t approaches z as n increases

Student’s t Table

Example
• Standard error = s/√n
• t-value at 90% CL with 19 df = 1.729

2. CI for the difference between
population means (normally distributed)
A. Known variances (2 independent samples)
• When σ1 and σ2 are known and both populations are
normal or both sample sizes are at least 30, the test
statistic is a z-value…

Examples
• We are interested in the similarity of the two groups.
1) Is mean blood pressure the same for males and
females?
2) Is body mass index (BMI) similar for breast cancer
cases versus non-cancer patients?
3) Is length of stay (LOS) for patients in hospital “A” the
same as that for similar patients in hospital “B”?

Example
• Researchers are interested in the difference between
serum uric acid levels in patients with and without
Down’s syndrome.
• Patients with Down’s syndrome
– n = 12, sample mean = 4.5 mg/100 ml, σ² = 1.0
• Patients without Down’s syndrome
– n = 15, sample mean = 3.4 mg/100 ml, σ² = 1.5
• Calculate the 95% CI.
(4.5 - 3.4) ± 1.96 × √(1.0/12 + 1.5/15) = 1.1 ± 1.96(0.428) = 1.1 ± 0.84 = (0.26, 1.94)
• We are 95% confident that the true difference
between the two population means is between 0.26
and 1.94 mg/100 ml.
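The serum uric acid interval can be reproduced as follows. This is a sketch assuming two independent samples with known population variances, as in the example; the function name `z_interval_for_mean_difference` is illustrative.

```python
from math import sqrt
from statistics import NormalDist

def z_interval_for_mean_difference(mean1, var1, n1, mean2, var2, n2,
                                   confidence=0.95):
    """CI for mu1 - mu2 with known variances and independent samples."""
    z_crit = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    se = sqrt(var1 / n1 + var2 / n2)   # standard error of the difference
    diff = mean1 - mean2
    return diff - z_crit * se, diff + z_crit * se

# Group with mean 4.5 (n = 12, variance 1.0) versus group with mean 3.4 (n = 15, variance 1.5).
diff_lower, diff_upper = z_interval_for_mean_difference(4.5, 1.0, 12, 3.4, 1.5, 15)
print(f"95% CI for the difference: ({diff_lower:.2f}, {diff_upper:.2f})")
```

Since the interval excludes 0, it is consistent with a real difference between the two population means.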

3. CIs for single population proportion, p
• Is based on the three elements of a CI:
– Point estimate
– SE of the point estimate
– Confidence level
• A 100(1 - α)% CI for p is: p̂ ± Zα/2 × √(p̂(1 - p̂)/n)

Example 1
A random sample of 100 people shows that 25
are left-handed. Form a 95% CI for the true
proportion of left-handers.
0.25 ± 1.96 × √(0.25 × 0.75/100) = 0.25 ± 1.96(0.0433) = 0.25 ± 0.0849 = (0.1651, 0.3349)
Interpretation: we are 95% confident that the true percentage of
left-handers in the population is between 16.51% and 33.49%.

Example 2
• It was found that 28.1% of 153 cervical-cancer cases
had never had a Pap smear prior to the time of case’s
diagnosis. Calculate a 95% CI for the percentage of
cervical-cancer cases who never had a Pap test.
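Both single-proportion examples can be worked with the same normal-approximation interval. The sketch below assumes that approximation is adequate for these sample sizes; the function name `proportion_interval` is illustrative.

```python
from math import sqrt
from statistics import NormalDist

def proportion_interval(p_hat, n, confidence=0.95):
    """Normal-approximation CI for a single population proportion."""
    z_crit = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    se = sqrt(p_hat * (1 - p_hat) / n)   # standard error of the sample proportion
    return p_hat - z_crit * se, p_hat + z_crit * se

ex1 = proportion_interval(25 / 100, 100)   # Example 1: left-handers
ex2 = proportion_interval(0.281, 153)      # Example 2: never had a Pap smear

print(f"Example 1: ({ex1[0]:.4f}, {ex1[1]:.4f})")
print(f"Example 2: ({ex2[0]:.4f}, {ex2[1]:.4f})")
```

For Example 2 the interval runs from about 21% to about 35% of cervical-cancer cases.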

4. Two Population Proportions
• We are often interested in comparing proportions
from 2 populations:
• Is the incidence of disease A the same in two
populations?
• Patients are treated with either drug D, or with
placebo. Is the proportion “improved” the same in
both groups?

Confidence Interval for
Two Population Proportions
• SE of the difference = √(p1(1 - p1)/n1 + p2(1 - p2)/n2)
• The confidence interval for p1 – p2 is: (p1 - p2) ± Zα/2 × SE

Example
• In a clinical trial for a new drug to treat hypertension,
N1 = 50 patients were randomly assigned to receive
the new drug, and N2 = 50 patients to receive a
placebo. 34 of the patients receiving the drug showed
improvement, while 15 of those receiving placebo
showed improvement.
• Compute a 95% CI estimate for the difference
between proportions improved.

• p1 = 34/50 = 0.68, p2 = 15/50 = 0.30
• The point estimate for the difference is:
p1 - p2 = 0.68 - 0.30 = 0.38
• SE of the difference = √(0.68 × 0.32/50 + 0.30 × 0.70/50) = 0.0925
• 95% CI
– Lower = ( point estimate ) - (Zα/2) (SE)
= 0.38 – (1.96)(0.0925) = 0.20
– Upper = ( point estimate ) + (Zα/2) (SE)
= 0.38 + (1.96)(0.0925) = 0.56
• 95% CI = (0.20, 0.56)
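The hypertension trial calculation can be verified with a short script. This is a sketch using the unpooled standard error that matches the slide's value of 0.0925; the function name `two_proportion_interval` is illustrative.

```python
from math import sqrt
from statistics import NormalDist

def two_proportion_interval(x1, n1, x2, n2, confidence=0.95):
    """Normal-approximation CI for p1 - p2; also returns the standard error."""
    p1, p2 = x1 / n1, x2 / n2
    z_crit = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    diff = p1 - p2
    return diff - z_crit * se, diff + z_crit * se, se

# 34 of 50 improved on the drug, 15 of 50 improved on placebo.
trial_lower, trial_upper, trial_se = two_proportion_interval(34, 50, 15, 50)
print(f"SE = {trial_se:.4f}, 95% CI = ({trial_lower:.2f}, {trial_upper:.2f})")
```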

Hypothesis Testing

• Hypothesis testing (HT) is one method of statistical inference.
• A hypothesis is a claim (assumption) about a population parameter.
• Hypotheses are formulated, experiments are performed,
and results are evaluated for their consistency (or
inconsistency) with a hypothesis.
• The purpose of HT is to aid the clinician, researcher or
administrator in reaching a decision (conclusion)
concerning a population by examining a sample from that
population.

Types of Hypothesis
1. The Null Hypothesis, H0
Is a statement claiming that there is no difference
between the hypothesized value and the population
value.
(The effect of interest is zero = no difference)
States the assumption (hypothesis) to be tested
H0 is a statement of agreement (or no difference), is
always about a population parameter, not about a
sample statistic

Always contains “=” , “ ≤” or “≥ ” sign
May or may not be rejected
Begin with the assumption that the Ho is true
– Similar to the notion of innocent until proven
guilty

2. The Alternative Hypothesis, HA
Is a statement of what we will believe is true if our
sample data cause us to reject Ho.
Is generally the hypothesis that is believed (or needs
to be supported) by the researcher
Is a statement that disagrees (opposes) with Ho
(The effect of interest is not zero)
Never contains “=” , “ ≤” or “≥ ” sign
May or may not be accepted

Steps in Hypothesis Testing
1. Formulate the appropriate statistical hypotheses clearly
• Specify H0 and HA
H0: µ = µ0      H0: µ ≤ µ0      H0: µ ≥ µ0
HA: µ ≠ µ0      HA: µ > µ0      HA: µ < µ0
two-tailed      one-tailed      one-tailed
• Can we conclude that the proportion of patients with leukemia
who survive more than six years is not 60%?
Ho: ? HA: ?
• Can we conclude that a certain population mean is greater than
50?
Ho: ? HA: ?

2. State the assumptions necessary for computing probabilities
• A distribution is approximately normal (Gaussian)
• Variance is known or unknown
3. Select a sample and collect data
• Categorical, continuous
4. Decide on the appropriate test statistic for the hypothesis.
E.g., for one population mean: Z (variance known)
or t (variance unknown)

5. Specify the desired level of significance for the
statistical test (α = 0.05, 0.01, etc.)
6. Determine the critical value.
– A value the test statistic must attain to be
declared significant.
– E.g., at α = 0.05: -1.96 and 1.96 for a two-tailed test; 1.645 or -1.645 for a one-tailed test.
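The critical values for a Z test can be obtained from the inverse of the standard normal distribution instead of a printed table. This is a small sketch using Python's standard library; the variable names are illustrative.

```python
from statistics import NormalDist

alpha = 0.05
std_normal = NormalDist()   # mean 0, standard deviation 1

# Two-tailed test: alpha is split equally between the two tails.
z_two_tailed = std_normal.inv_cdf(1 - alpha / 2)
# One-tailed test: all of alpha is placed in a single tail.
z_one_tailed = std_normal.inv_cdf(1 - alpha)

print(f"two-tailed: -{z_two_tailed:.2f} and +{z_two_tailed:.2f}")
print(f"one-tailed: {z_one_tailed:.3f} (upper) or -{z_one_tailed:.3f} (lower)")
```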

7. Obtain sample evidence and compute the test
statistic
8. Reach a decision and draw the conclusion
• If Ho is rejected, we conclude that HA is true
(or accepted).
• If Ho is not rejected, we conclude that Ho may
be true.

Rejection and Non-Rejection Regions
• The possible values of the test statistic are the points on the
horizontal axis of its sampling distribution (here, the normal
distribution) and are divided into two groups:
• Rejection region, and
• Non-rejection region.

Example: Two-sided test at α = 5%
[Figure: standard normal curve with rejection regions of area α/2 = 0.025 in each tail, below -1.96 and above 1.96, and a non-rejection region of area 0.95 between them.]

Statistical Decision
Reject Ho if the value of the test statistic that we
compute from our sample is one of the values in the
rejection region
Don’t reject Ho if the computed value of the test
statistic is one of the values in the non-rejection
region.

Level of Significance, α
Is the probability of rejecting a true Ho
For example, a significance level of 0.05 indicates a
5% risk of concluding that a difference exists when
there is no actual difference.
Alpha levels are controlled by the researcher and
are related to the confidence level.
The alpha level is obtained by subtracting the
confidence level from 100% (e.g., 100% - 95% = 5%).

Another way to state conclusion
• Reject Ho if P-value < α
• Do not reject Ho if P-value ≥ α
The P-value is the probability of obtaining a test statistic
as extreme as or more extreme than the one actually
obtained, if Ho is true.
It measures the strength of the sample evidence against
the null hypothesis.
The larger the absolute value of the test statistic, the smaller the P-value.
Equivalently, the smaller the P-value, the stronger the evidence
against Ho.

1. Hypothesis Testing of a Single Mean
(Normally Distributed)

1.1 Known Variance

Example: Two-Tailed Test
1. A simple random sample of 10 people from a certain
population has a mean age of 27. Can we conclude
that the mean age of the population is not 30? The
variance is known to be 20. Let α = .05.
• Answer, "Yes we can, if we can reject the Ho that it is
30."
A. Data
n = 10, sample mean = 27, σ² = 20, α = 0.05
B. Assumptions
Simple random sample
Normally distributed population
variance is known

C. Hypotheses
Ho: µ = 30
HA: µ ≠ 30
D. Test statistic
As the population variance is known, we use Z
as the test statistic.

E. Decision Rule
• Reject Ho if the Z value falls in the rejection region.
• Don’t reject Ho if the Z value falls in the non-rejection region.
• Because of the structure of Ho it is a two tail test. Therefore,
reject Ho if Z ≤ -1.96 or Z ≥ 1.96.

F. Calculation of test statistic
Z = (x̄ - µ0)/(σ/√n) = (27 - 30)/√(20/10) = -3/1.414 = -2.12
G. Statistical decision
We reject Ho because Z = -2.12 is in the rejection
region. The value is significant at α = 5%.
H. Conclusion
We conclude that µ is not 30. P-value = 0.0340
A Z value of -2.12 corresponds to a lower-tail area of 0.0170. Since there
are two parts to the rejection region in a two-tailed test, the P-value is
twice this, which is 0.0340.
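The Z statistic and P-value for this example can be reproduced as follows. This is a sketch assuming a normal population with known variance, as stated in the example; the function name `one_sample_z` is illustrative. The script uses the unrounded Z value, so the P-value may differ from the table-based figure in the fourth decimal place.

```python
from math import sqrt
from statistics import NormalDist

def one_sample_z(sample_mean, hypothesized_mean, variance, n):
    """Z statistic for a single mean when the population variance is known."""
    return (sample_mean - hypothesized_mean) / sqrt(variance / n)

z_stat = one_sample_z(27, 30, 20, 10)
lower_tail_area = NormalDist().cdf(z_stat)        # area to the left of Z
p_two_tailed = 2 * NormalDist().cdf(-abs(z_stat)) # both tails combined

print(f"Z = {z_stat:.2f}")
print(f"lower-tail area = {lower_tail_area:.4f}")
print(f"two-tailed P-value = {p_two_tailed:.4f}")
print("reject Ho" if abs(z_stat) >= 1.96 else "do not reject Ho")
```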

Example: One -Tailed Test
• A simple random sample of 10 people from a certain
population has a mean age of 27. Can we conclude that
the mean age of the population is less than 30? The
variance is known to be 20. Let α = 0.05.
• Data
n = 10, sample mean = 27, σ² = 20, α = 0.05
• Hypotheses
Ho: µ ?, HA: µ ?

• Test statistic
Z = (27 - 30)/√(20/10) = -2.12
• Rejection Region
• With α = 0.05 and the "less than" inequality in HA, the entire rejection region
is in the left tail (a lower-tail test). The critical value is Z = -1.645.
Reject Ho if Z < -1.645.

• Statistical decision
– We reject the Ho because -2.12 < -1.645.
• Conclusion
– We conclude that µ < 30.
– p = .0170 this time because it is only a one tail test and not a two
tail test.

1.2 Unknown Variance
• In most practical applications the standard deviation of
the underlying population is not known
• In this case, σ can be estimated by the sample standard
deviation s.
• If the underlying population is normally distributed,
then the test statistic is:
t = (x̄ - µ0)/(s/√n), with n - 1 degrees of freedom

Example: Two-Tailed Test
• A simple random sample of 14 people from a certain population
gives a sample mean body mass index (BMI) of 30.5 and an sd of
10.64. Can we conclude that the mean BMI is not 35 at α = 5%?
• Ho: µ = 35, HA: µ ≠ 35
• Test statistic
t = (30.5 - 35)/(10.64/√14) = -4.5/2.844 = -1.58
• If the assumptions are correct and Ho is true, the test statistic
follows Student's t distribution with 13 degrees of freedom.

• Decision rule
– We have a two tailed test. With α = 0.05 it means that each tail is
0.025. The critical t values with 13 df are -2.1604 and 2.1604.
– We reject Ho if the t ≤ -2.1604 or t ≥ 2.1604.
• Do not reject Ho because -1.58 is not in the rejection
region. Based on the data of the sample, it is possible
that µ = 35. P-value = 0.1375
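The t statistic for the BMI example can be computed directly. In this sketch the critical value 2.1604 is taken from the slide rather than computed, because Python's standard library has no t distribution; the function name `one_sample_t` is illustrative.

```python
from math import sqrt

def one_sample_t(sample_mean, hypothesized_mean, sample_sd, n):
    """t statistic for a single mean when the population variance is unknown."""
    return (sample_mean - hypothesized_mean) / (sample_sd / sqrt(n))

t_stat = one_sample_t(30.5, 35, 10.64, 14)
df = 14 - 1
T_CRIT = 2.1604   # two-tailed critical value for 13 df at alpha = 0.05, from the slide

reject_h0 = abs(t_stat) >= T_CRIT
print(f"t = {t_stat:.2f} with {df} df")
print("reject Ho" if reject_h0 else "do not reject Ho")
```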

2. Hypothesis Testing of Two Population Means,
Independent Samples

2.1 Known Variances
(Independent Samples)
• When two independent samples are drawn
from normally distributed populations with
known variances, the test statistic for testing
the Ho of equal population means is:
Z = (x̄1 - x̄2)/√(σ1²/n1 + σ2²/n2)

Example:
• Researchers wish to know whether there is a difference in mean serum
uric acid (SUA) levels between normal individuals and
individuals with Down’s syndrome. The mean SUA
levels of 12 individuals with Down’s syndrome and 15
normal individuals are 4.5 and 3.4 mg/100 ml,
respectively, with variances σ1² = 1 and σ2² = 1.5, respectively.
Is there a difference between the means of the two groups
at α = 5%?
• Hypotheses:
Ho: µ1- µ2 = 0 or Ho: µ1 = µ2
HA: µ1 - µ2 ≠ 0 or HA: µ1 ≠ µ2

• With α = 0.05, the critical values of Z are -1.96 and
+1.96. We reject Ho if Z < -1.96 or Z > +1.96.
• Test statistic: Z = (4.5 - 3.4)/√(1/12 + 1.5/15) = 1.1/0.428 = 2.57
• Reject Ho because 2.57 > 1.96.
• From these data, it can be concluded that the
population means are not equal. A 95% CI would
give the same conclusion. P-value = 0.01.
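The two-sample Z test for the SUA example can be sketched as follows, assuming independent samples and known variances as stated; the function name `two_sample_z` is illustrative.

```python
from math import sqrt
from statistics import NormalDist

def two_sample_z(mean1, var1, n1, mean2, var2, n2):
    """Z statistic for Ho: mu1 = mu2 with known population variances."""
    return (mean1 - mean2) / sqrt(var1 / n1 + var2 / n2)

# Down's syndrome group: n = 12, mean 4.5, variance 1; normal group: n = 15, mean 3.4, variance 1.5.
z_sua = two_sample_z(4.5, 1.0, 12, 3.4, 1.5, 15)
p_sua = 2 * NormalDist().cdf(-abs(z_sua))   # two-tailed P-value

print(f"Z = {z_sua:.2f}, two-tailed P-value = {p_sua:.4f}")
print("reject Ho" if abs(z_sua) > 1.96 else "do not reject Ho")
```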

2.2 Unknown Variances
i. Equal variances (Independent samples)
• With equal population variances, we can
obtain a pooled value from the sample
variances.
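• The test statistic for µ1 - µ2, under the Ho of equal population means, is:
t = (x̄1 - x̄2)/√(sp²/n1 + sp²/n2)
• Where t (and hence the critical value tα/2) has (n1 + n2 – 2) df, and the pooled variance is
sp² = ((n1 - 1)s1² + (n2 - 1)s2²)/(n1 + n2 - 2)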
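The pooled-variance calculation can be sketched as below. All input values in this example are hypothetical and chosen only to show the arithmetic; they are not the lecture's data. The function name `pooled_t` is also illustrative.

```python
from math import sqrt

def pooled_t(mean1, var1, n1, mean2, var2, n2):
    """Pooled-variance t statistic for Ho: mu1 = mu2 (equal population variances)."""
    df = n1 + n2 - 2
    # Weighted average of the two sample variances.
    pooled_var = ((n1 - 1) * var1 + (n2 - 1) * var2) / df
    se = sqrt(pooled_var / n1 + pooled_var / n2)
    return (mean1 - mean2) / se, pooled_var, df

# Hypothetical samples: group 1 (n = 10, mean 20, s² = 16), group 2 (n = 12, mean 17, s² = 25).
t_pooled, sp2, df_pooled = pooled_t(20.0, 16.0, 10, 17.0, 25.0, 12)
print(f"pooled variance = {sp2:.2f}, t = {t_pooled:.3f}, df = {df_pooled}")
```

The resulting t value would then be compared with the critical value from a t table at the chosen α and the computed df.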

Example:
• We wish to know if we may conclude, at the 95%
confidence level, that smokers, in general, have
greater lung damage than do non-smokers.
• Calculation of Pooled
Variance

• Hypotheses:
Ho: µ1 ≤ µ2 (µ1 - µ2 ≤ 0), HA: µ1 > µ2 (µ1 - µ2 > 0)
• With α = 0.05 and df = 23, the critical value of t is 1.7139. We
reject Ho if t > 1.7139.
• Test statistic: t = 2.6563
• Reject Ho because 2.6563 > 1.7139. On
the basis of the data, we conclude that µ1 >
µ2.

3. Hypothesis Testing about a Single
Population Proportion
(Normal Approximation to Binomial Distribution)
• Involves categorical values
• Two possible outcomes
– “Success” (possesses a certain
characteristic)
– “Failure” (does not possess that
characteristic)
• Fraction or proportion of population in the
“success” category is denoted by p

Example
• In the general population of 0 to 4-year-olds, the annual
incidence of asthma is 1.4%. If 10 cases of asthma are observed
over a single year in a sample of 500 children whose mothers
smoke, can we conclude that this is different from the
underlying probability of p0 = 0.014? α = 5%
H0 : p = 0.014
HA: p ≠ 0.014

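• The test statistic is given by:
Z = (p̂ - p0)/√(p0(1 - p0)/n) = (0.02 - 0.014)/√(0.014 × 0.986/500) = 0.006/0.00525 = 1.14, where p̂ = 10/500 = 0.02
• The critical value of Zα/2 at α = 5% is ±1.96.
• Don’t reject Ho since Z (= 1.14) is in the non-rejection
region between ±1.96.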
• P-value = 0.2548
• We do not have sufficient evidence to conclude that
the probability of developing asthma for children
whose mothers smoke in the home is different from
the probability in the general population
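The asthma example can be checked with a short script. This sketch assumes the normal approximation to the binomial without a continuity correction; the function name `one_proportion_z` is illustrative. The computed P-value is close to, but may not exactly match, the slide's figure because of rounding.

```python
from math import sqrt
from statistics import NormalDist

def one_proportion_z(successes, n, p0):
    """Z statistic for Ho: p = p0 using the normal approximation."""
    p_hat = successes / n
    return (p_hat - p0) / sqrt(p0 * (1 - p0) / n)

z_asthma = one_proportion_z(10, 500, 0.014)
p_asthma = 2 * NormalDist().cdf(-abs(z_asthma))   # two-tailed P-value

print(f"Z = {z_asthma:.2f}, two-tailed P-value = {p_asthma:.4f}")
print("reject Ho" if abs(z_asthma) >= 1.96 else "do not reject Ho")
```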

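4. Hypothesis Tests about the Difference Between
Two Population Proportions

Z = (p̂1 - p̂2)/√(p̂(1 - p̂)(1/n1 + 1/n2)), with p̂1 = X1/n1, p̂2 = X2/n2 and pooled proportion p̂ = (X1 + X2)/(n1 + n2)
Where X1 = the observed number of events in the first sample
and X2 = the observed number of events in the second sample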

Example
• A study was conducted to investigate the
possible cause of a gastroenteritis outbreak
following a lunch served in a high school
cafeteria. Among the 225 students who ate the
sandwiches, 109 became ill, while among the
38 students who did not eat the sandwiches, 4
became ill. Is there a significant difference
between the two groups at α = 5%?
• We wish to test
Ho: p1 = p2 against the alternative
HA: p1 ≠ p2

• Assume that the sample sizes are large
enough, and the normal approximation to
the binomial distribution is valid.
• If the Ho is true, then p1 = p2 = p

The area under the standard normal curve to the
right of 4.36 is less than 0.0001. Therefore, p <
0.0002. We reject H0 at the 0.05 level.
The proportion of students who became ill
differs in the two groups; those who ate the
prepared sandwiches were more likely to
develop gastroenteritis.
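The gastroenteritis example can be reproduced with the pooled-proportion Z test. This is a sketch under the stated normal approximation; the function name `two_proportion_z` is illustrative. The unrounded Z comes out near 4.37, in line with the slide's 4.36.

```python
from math import sqrt
from statistics import NormalDist

def two_proportion_z(x1, n1, x2, n2):
    """Z statistic for Ho: p1 = p2 using the pooled sample proportion."""
    p1, p2 = x1 / n1, x2 / n2
    p_pooled = (x1 + x2) / (n1 + n2)   # common proportion estimated under Ho
    se = sqrt(p_pooled * (1 - p_pooled) * (1 / n1 + 1 / n2))
    return (p1 - p2) / se

# 109 of 225 who ate the sandwiches became ill; 4 of 38 who did not eat them became ill.
z_outbreak = two_proportion_z(109, 225, 4, 38)
p_outbreak = 2 * (1 - NormalDist().cdf(abs(z_outbreak)))   # two-tailed P-value

print(f"Z = {z_outbreak:.2f}, two-tailed P-value = {p_outbreak:.6f}")
print("reject Ho" if abs(z_outbreak) > 1.96 else "do not reject Ho")
```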

Types of Errors in Hypothesis
Tests
• Whenever we reject or fail to reject Ho, we
risk committing an error.
• Two types of error can be committed.
– Type I Error
– Type II Error

Type I Error
• The error committed when a true Ho is rejected
• Considered a serious type of error
• The probability of a type I error is the probability of
rejecting the Ho when it is true
• The probability of type I error is α
• Called level of significance of the test
• Set by researcher in advance

Type II Error
• The error committed when a false Ho is not rejected
• The probability of a Type II error is β
Power
• The probability of rejecting the Ho when it is false.
Power = 1 – β = 1- probability of type II error
• We would like to maintain low probability of a
Type I error (α) and low probability of a Type II
error (β) [high power = 1 - β].

Action (Conclusion)    Reality: Ho True                           Reality: Ho False
Do not reject Ho       Correct action (Prob. = 1-α)               Type II error (Prob. = β = 1-Power)
Reject Ho              Type I error (Prob. = α = Sign. level)     Correct action (Prob. = Power = 1-β)

Thank you
