1. HYPOTHESIS TESTING:
Inference on Proportions I
Dr. O. J. AKINSOLA
Department of Community Health and Primary Care,
College of Medicine
University of Lagos
LAGOS
2. Recall that the choice of a test statistic is dictated by
the following:
the study objectives
the type of data usually indicated by the level of
measurements
the type of study design (sample selection methods,
data collection techniques, matching, dependent and
independent samples, etc)
the power of the test required to detect a true
difference
3. This depends on the assumptions underlying the statistical
model for the test statistic. Remember also that the
difference between the power of a test and the level of
significance can be explained as follows.
The level of significance (α) is the risk we are willing to
take of rejecting the null hypothesis when it is in fact
true; the p-value is the probability, if the null hypothesis
were true, of a result at least as extreme as the one
observed. By contrast, the ability of a study to
detect a difference between two or more treatment
groups if the alternative hypothesis were true is the
4. power of the test. This power depends on the magnitude
of the difference to be detected, the sample size, the
variation in the measurements and the level of
significance (α).
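How power responds to these quantities can be illustrated with a short sketch. It assumes the usual normal approximation for a two-sided comparison of two proportions with equal group sizes. The function name `approx_power_two_proportions` and the inputs (20% versus 35%, with 50 or 200 patients per group) are illustrative assumptions, not figures from this lecture.

```python
from math import sqrt
from statistics import NormalDist

def approx_power_two_proportions(p1, p2, n_per_group, alpha=0.05):
    """Approximate power of a two-sided Z-test for two proportions (equal n)."""
    norm = NormalDist()
    z_crit = norm.inv_cdf(1 - alpha / 2)            # about 1.96 when alpha = 0.05
    p_bar = (p1 + p2) / 2                            # common proportion under H0
    se_null = sqrt(2 * p_bar * (1 - p_bar) / n_per_group)
    se_alt = sqrt(p1 * (1 - p1) / n_per_group + p2 * (1 - p2) / n_per_group)
    # Probability that the observed difference lands in the critical region
    # (the negligible opposite tail is ignored).
    return norm.cdf((abs(p1 - p2) - z_crit * se_null) / se_alt)

# Hypothetical scenario: same true difference, two different sample sizes.
small = approx_power_two_proportions(0.20, 0.35, n_per_group=50)
large = approx_power_two_proportions(0.20, 0.35, n_per_group=200)
```

Comparing `small` with `large` shows the effect of sample size; changing `alpha` or the gap between `p1` and `p2` shows the effect of the other inputs.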
6. Chi-Square Test of Independence
Chi-square Goodness of Fit Test
Fisher’s Exact Test
McNemar’s Test
Four Most Common Statistical Tests For
Categorical Data
7. This test checks whether two categorical
variables are independent or related, making it
useful to determine if knowing one category
affects the other.
Chi-Square Test of Independence
8. The test compares observed data with
expected data to see if they match. It’s used to
test how well the data fits a particular
distribution.
Chi-Square Goodness of Fit Test
9. It is accurate for small sample sizes and tests
the relationship between two categorical
variables. It is similar to the Chi-Square Test of
Independence but better suited to small samples.
Fisher’s Exact Test
10. This test is specifically designed for paired data and is often
used in before-and-after scenarios to detect changes in
proportions within the same group over time.
McNemar’s Test is a test on a 2x2 contingency table. It
checks the marginal homogeneity of
two dichotomous variables.
It is used when the data for the two groups come from the
same participants, i.e. paired data.
Paired data usually arise through matching, which
increases validity by controlling for confounders.
For example, it is used to analyze tests performed before
and after treatment in a population.
McNemar’s Test
11. For categorical data, the chi-square test is useful to
test the significance of association between any two
categorical variables. Recall that nominal data
expressed as proportions can be compared using the
Z-test for proportions.
The procedure is to estimate the difference between
the two proportions and divide this by the standard error
of the difference to obtain the calculated value of Z,
which is then referred to the Z-table to obtain the
p-value. At the 5% level of significance (two-sided), the
critical value is 1.96.
Chi-Square Test of Independence
12. Any calculated value beyond Z = ±1.96 is considered to fall
in the critical region and therefore leads to the conclusion
that the difference observed is statistically significant.
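The Z-test procedure just described can be sketched in a few lines. This is a minimal illustration that assumes the pooled standard error usually used under the null hypothesis. The function name `two_proportion_z_test` and the counts (30 of 100 versus 18 of 100) are hypothetical and chosen only for illustration.

```python
from math import sqrt
from statistics import NormalDist

def two_proportion_z_test(x1, n1, x2, n2):
    """Return (z, two-sided p-value) for H0: the two proportions are equal."""
    p1, p2 = x1 / n1, x2 / n2
    p_pooled = (x1 + x2) / (n1 + n2)                 # common proportion under H0
    se = sqrt(p_pooled * (1 - p_pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se                               # difference / standard error
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))     # area in both tails
    return z, p_value

# Hypothetical counts: 30 of 100 in group 1 versus 18 of 100 in group 2.
z, p_value = two_proportion_z_test(30, 100, 18, 100)
significant = abs(z) > 1.96                          # 5% two-sided critical value
```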
A statistical test for the same purpose is the Chi-square
test. This test demands that the data are
expressed as frequencies rather than proportions and
presented in contingency tables. The size of a
contingency table is determined by the number of
categories of each classifying nominal variable. A
special characteristic of the Chi-square test is its
dependence on the degrees of freedom.
13. In fact, the chi-square distribution approaches the normal
distribution as the degrees of freedom increase. The
r x c contingency table, which has r rows and c
columns, has (r-1) x (c-1) degrees of freedom. The
area under the chi-square distribution has been
calculated and tabulated at the corresponding degrees of
freedom.
Once the chi-square test statistic is evaluated, we
only need to read off the significance level directly
from this table, or compare the computed value with
the tabulated value at a chosen level of significance.
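As a rough stand-in for the printed table, the sketch below computes the degrees of freedom of an r x c table and the upper-tail area of the chi-square distribution using only the Python standard library. The helper names `degrees_of_freedom` and `chi2_upper_tail` are illustrative assumptions; statistical packages provide equivalent functions directly.

```python
from math import erfc, exp, gamma, sqrt

def degrees_of_freedom(rows, cols):
    """Degrees of freedom of an r x c contingency table."""
    return (rows - 1) * (cols - 1)

def chi2_upper_tail(x, df):
    """P(chi-square with `df` degrees of freedom > x), for x >= 0."""
    half_x = x / 2
    if df % 2 == 1:
        prob, k = erfc(sqrt(half_x)), 1              # closed form for df = 1
    else:
        prob, k = exp(-half_x), 2                    # closed form for df = 2
    # Recurrence: Q(k + 2) = Q(k) + (x/2)^(k/2) * e^(-x/2) / Gamma(k/2 + 1)
    while k < df:
        prob += half_x ** (k / 2) * exp(-half_x) / gamma(k / 2 + 1)
        k += 2
    return prob

df_2x2 = degrees_of_freedom(2, 2)                    # a 2 x 2 table
p_at_critical = chi2_upper_tail(3.84, df_2x2)        # area beyond the tabulated 5% value
```

Evaluating `chi2_upper_tail` at a tabulated critical value should return an area close to the corresponding significance level.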
14. Suppose 14 out of the 60 patients seen in a rural clinic in the
Northern region of Nigeria in one day had irritable
bowel syndrome (IBS), and 4 out of 50 patients seen
in another rural clinic on the same day in the
Southern region had irritable bowel syndrome.
Can we say that the number with irritable bowel
syndrome is contingent on the location of the clinic?
The result is presented in the 2 x 2 table as follows:
Region   IBS   No IBS   Totals
North    14    46       60
South    4     46       50
Totals   18    92       110
Example 1
16. The procedure to test the association between IBS
and REGION follows these steps.
Step 1: Ho: There is no association between Region
and presentation with irritable bowel syndrome.
Step 2: HA: There is an association between Region
and presentation with irritable bowel syndrome.
Step 3: Level of significance (α = 5%)
Step 4: $\chi^2 = \sum_{i=1}^{n} \frac{(O_i - E_i)^2}{E_i}$
17. Where $O_i$ are the observed frequencies and $E_i$ are the
expected frequencies in the cells of the table if the
null hypothesis were true. We now need to calculate
the expected frequency in each cell of the 2 x 2
table under the null hypothesis,
i.e. $E_i = \frac{R_T \times C_T}{G_T}$,
where $R_T$ is the row total, $C_T$ the column total and $G_T$ the grand total.
18. Oi     Ei      (Oi-Ei)^2   (Oi-Ei)^2/Ei
    14     9.82    17.47       1.77
    46     50.18   17.47       0.35
    4      8.18    17.47       2.14
    46     41.82   17.47       0.42
    TOTAL  -       -           4.68
19. χ² = 4.68 on 1 degree of freedom. The tabulated
chi-square value on 1 degree of freedom at the 5% level is
3.84. We can see that the calculated chi-square value
is greater than the tabulated value at the 5% level.
That is, 4.68 > 3.84. Therefore, p < 0.05
20. We reject the null hypothesis that there is no
difference in the rate of irritable bowel syndrome
between North and South. In other words, there is a
significant association between REGION and IBS.
Conclusion
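The hand calculation for this example can be checked with a short sketch. It uses only the counts given in the example (14 of 60 in the North, 4 of 50 in the South); the variable names are illustrative. Because no intermediate value is rounded, the result agrees with the tabulated 4.68 only up to rounding.

```python
from math import erfc, sqrt

# Observed counts from the example: rows = clinic region, columns = (IBS, no IBS)
observed = {
    "North": (14, 46),    # 14 of 60 patients with IBS
    "South": (4, 46),     # 4 of 50 patients with IBS
}

row_totals = {region: sum(counts) for region, counts in observed.items()}
col_totals = [sum(col) for col in zip(*observed.values())]
grand_total = sum(row_totals.values())

chi_square = 0.0
for region, counts in observed.items():
    for j, o in enumerate(counts):
        # Expected frequency = row total x column total / grand total
        e = row_totals[region] * col_totals[j] / grand_total
        chi_square += (o - e) ** 2 / e

p_value = erfc(sqrt(chi_square / 2))      # upper-tail area, valid for 1 df only
reject_h0 = chi_square > 3.84             # tabulated 5% critical value on 1 df
```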
21. In a study of the potential role of drug therapy in the
treatment of bladder instability in the elderly, 19
incontinent elderly patients received Imipramine and
14 received a placebo treatment. Of the Imipramine
patients, 14 became dry after treatment compared
with only 6 of the placebo patients.
Show the data in an appropriate table
Is there any evidence of genuine treatment
differences?
Example 2
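One possible way to approach this exercise is sketched below. The 2 x 2 table is derived from the counts in the question (19 - 14 = 5 Imipramine patients and 14 - 6 = 8 placebo patients did not become dry). Since the groups are small, the sketch computes both the chi-square statistic and a two-sided Fisher's exact p-value. The helper `fisher_exact_two_sided` is a hypothetical name, and the rule it uses (summing all tables no more probable than the observed one) is one common two-sided convention.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    total = comb(row1 + row2, col1)

    def prob(k):
        # Hypergeometric probability of k in the top-left cell, margins fixed
        return comb(row1, k) * comb(row2, col1 - k) / total

    p_observed = prob(a)
    lowest, highest = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables no more likely than the observed one
    return sum(prob(k) for k in range(lowest, highest + 1)
               if prob(k) <= p_observed * (1 + 1e-9))

# Rows: Imipramine, placebo; columns: dry, not dry (counts from the example)
a, b = 14, 19 - 14
c, d = 6, 14 - 6
n = a + b + c + d

# Shortcut form of the chi-square statistic for a 2 x 2 table
chi_square = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
p_fisher = fisher_exact_two_sided(a, b, c, d)
```

The value of `chi_square` can be compared with the tabulated 3.84 on 1 degree of freedom, and `p_fisher` with 0.05.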
22. The chi-square test is also used to determine in
quantitative terms whether a set of observations follows a
particular probability distribution. We compare the
observed frequencies of the individuals with the
given attribute with what we would have expected if
a certain theory or probability distribution had held.
Chi-Square Goodness of Fit Test
23. The data below give the number of spontaneous
abortions suffered by a sample of 71 women who had
been pregnant four times. If the risk of abortion were
independent of previous reproductive history and the
same for all women, the number of abortions out of
four pregnancies would follow a binomial distribution.
Test whether the number of abortions out of four
pregnancies follows a binomial distribution.
Example 1
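25. Since the total number of abortions is 82, the
probability of abortion is
$p = \frac{82}{4 \times 71} \approx 0.289$ (the calculation that follows uses 0.292).
Using $\Pr(X = x) = \binom{n}{x} p^x q^{n-x}$, with n = 4 and q = 1 - p, the expected probabilities of
0, 1, 2, 3, 4 spontaneous abortions are: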
27. Number of abortions Observed Expected
0 24 17.821
1 29 29.465
2 7 18.176
3 5 5.041
4 6 0.497
TOTAL 71 71
28. χ² = 69.951. The critical value at the 5% level on 3 degrees of
freedom (5 categories - 1, less 1 for the estimated probability)
is 7.81. The calculated value exceeds this critical value.
Hence, there is strong evidence against the goodness of fit of
the binomial distribution to the observed distribution of
spontaneous abortions, p<0.001. In other words, we reject the
null hypothesis in favour of the alternative hypothesis.
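The goodness-of-fit calculation can be reproduced with a short sketch built from the observed frequencies tabulated above. Here p is estimated directly from the data as 82 / (4 x 71), so the expected frequencies and the statistic differ slightly from the tabulated values, although the conclusion is the same. The variable names are illustrative.

```python
from math import comb

observed = {0: 24, 1: 29, 2: 7, 3: 5, 4: 6}    # abortions -> number of women
n_trials = 4                                    # pregnancies per woman
n_women = sum(observed.values())                # 71 women
total_events = sum(x * count for x, count in observed.items())   # 82 abortions

p = total_events / (n_trials * n_women)         # estimated from the data
q = 1 - p

chi_square = 0.0
expected = {}
for x, o in observed.items():
    # Binomial probability of x abortions, scaled to the sample size
    expected[x] = n_women * comb(n_trials, x) * p**x * q**(n_trials - x)
    chi_square += (o - expected[x]) ** 2 / expected[x]

# Categories minus 1, minus 1 more for the estimated parameter p
df = len(observed) - 1 - 1
reject_h0 = chi_square > 7.81                   # tabulated 5% critical value on 3 df
```

Most of the statistic comes from the last category, where 6 women were observed against an expected frequency below 1.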
29. Population statistics indicate that the probability of a newborn
being male is 0.52. A survey of 50 quadruplet births yielded
the following pattern of gender outcomes.
0 male 5
1 male 14
2 males 14
3 males 10
4 males 7
Test if these results follow a Binomial distribution. What does
this imply about the genders of the children in quadruplet
births?
Example 2
30. Procedure:
1. Enter
a) Value of 2x2 contingency table tabulating the
outcomes of 2 tests
b) Value of 1-α, the two-sided confidence level
2. Click the button “Calculate” to obtain
a) Test statistic and p-values (1 tail and 2 tails) of
McNemar’s Test
b) Odds Ratio
3. Click the button “Reset” for another new calculation
McNemar’s Test
31.              Test 2 Positive   Test 2 Negative   Totals
Test 1 Positive  a                 b                 n1 = a+b
Test 1 Negative  c                 d                 n2 = c+d
Totals           m1 = a+c          m2 = b+d          N = n1+n2
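32. The null hypothesis is $H_0: p_b = p_c$
The alternative hypothesis is $H_A: p_b \neq p_c$
where $p_b$ and $p_c$ are the probabilities of the two discordant outcomes (cells b and c).
The McNemar’s test statistic with Yates' continuity correction is:
$\chi^2 = \frac{(|b - c| - 1)^2}{b + c}$, with degree of freedom = 1
The Odds Ratio is $OR = b / c$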
Notation:
100(1-α)% confidence interval: We are 100(1-α)% confident that the true value of the parameter is
included in the confidence interval
$z_{1-\alpha/2}$: The z-value for standard normal distribution with left-tail probability $1 - \alpha/2$
33. To determine whether a drug influences the
disease, the result of diagnosis before and after the
treatment is tabulated on a 2x2 contingency table.
Example 1
After: Positive After: Negative Totals
Before: Positive 7 13 20
Before: Negative 1 8 9
Totals 8 21 29
Then the test statistic is 8.64286 and the 1-tail and 2-tail p-values are
0.00164 and 0.00328 respectively.
Therefore, the null hypothesis is rejected at the 5% significance level.
The Odds Ratio is 13.
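The reported figures can be reproduced with a minimal sketch. It assumes the usual continuity-corrected form of the statistic, (|b - c| - 1)^2 / (b + c), and takes the odds ratio as b / c; both assumptions reproduce the values quoted above. The function name `mcnemar_test` is illustrative.

```python
from math import erfc, sqrt

def mcnemar_test(b, c):
    """McNemar's test with continuity correction, from the discordant counts b and c."""
    statistic = (abs(b - c) - 1) ** 2 / (b + c)      # chi-square on 1 df
    p_two_tail = erfc(sqrt(statistic / 2))           # upper-tail area for 1 df
    p_one_tail = p_two_tail / 2
    odds_ratio = b / c                               # ratio of the discordant pairs
    return statistic, p_one_tail, p_two_tail, odds_ratio

# Discordant pairs from the example:
# 13 changed from positive to negative, 1 from negative to positive
statistic, p_one_tail, p_two_tail, odds_ratio = mcnemar_test(b=13, c=1)
```

Only the discordant cells b and c enter the calculation; the 7 and 8 concordant pairs do not affect the result.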
34. A study was carried out on post-menopausal women
in City A. Cases of women with endometrial cancer
were identified from this city. A control group was
selected, matched to the cases on age and length of
residence in City A. The medical question was whether
endometrial cancer was related to estrogen use.
Example 2
                      Estrogen (Control)   No Estrogen (Control)   Totals
Estrogen (Cases)      27                   29                      56
No Estrogen (Cases)   3                    4                       7
Totals                30                   33                      63
35. The test statistic is 19.53 and the 2-tail p-value is
<0.00001. Therefore, the null hypothesis is rejected at the 5%
significance level.
These data show a statistically significant association
between estrogen use and endometrial cancer (p<0.0001).
The odds of endometrial cancer are approximately 10 times
greater for women who were on estrogen therapy compared
to those who were not.