3. Types of Analysis for Categorical Data
5. Choosing the appropriate statistical test
The choice of test is based on three aspects of the data:
• Type of variables
• Number of groups being compared
• Sample size
6. Statistical test (cont.)
Chi-square test:
Study variable: Qualitative (categorical)
Outcome variable: Qualitative (categorical)
Comparison: two or more proportions
Sample size: > 30
Expected frequency: ≥ 5 in each cell
7. Chi-square test
Purpose
To find out whether the association between two categorical variables is statistically significant.
Null hypothesis
There is no association between the two variables.
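8. Chi-square statistic
$$\chi^2 = \sum \frac{(O - E)^2}{E}$$
The figure (O - E)^2 / E is calculated for each cell, and the figures are then summed.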
9. Reject H0 if $\chi^2 > \chi^2_{\alpha,\,df}$, where df = (r-1)(c-1), and

$$\chi^2 = \sum \frac{(O - E)^2}{E}$$

1. The summation is over all cells of the contingency table consisting of r rows and c columns.
2. O is the observed frequency.
3. E is the expected frequency:
$$E = \frac{(\text{total of row in which the cell lies}) \times (\text{total of column in which the cell lies})}{\text{total of all cells}}$$
4. The degrees of freedom are df = (r-1)(c-1).
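The calculation in slides 8 and 9 can be sketched in a few lines of Python. This is a minimal illustration, assuming the contingency table is given as a list of rows of counts; the function name `chi_square_test` is hypothetical. It computes E for every cell from the row, column and grand totals, sums (O - E)^2 / E, and returns df = (r-1)(c-1). It applies no continuity correction and does not look up critical values.

```python
from typing import List, Tuple


def chi_square_test(observed: List[List[float]]) -> Tuple[float, int, List[List[float]]]:
    """Pearson chi-square statistic for an r x c contingency table.

    Returns (chi2, df, expected), where expected[i][j] is the expected
    frequency of cell (i, j) under H0 (no association).
    """
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)  # total of all cells

    # E = (row total x column total) / total of all cells, for every cell
    expected = [[r * c / n for c in col_totals] for r in row_totals]

    # Sum (O - E)^2 / E over all r x c cells
    chi2 = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(observed) - 1) * (len(observed[0]) - 1)
    return chi2, df, expected


# Hypothetical 2 x 2 table, used only to illustrate the arithmetic
stat, df, exp = chi_square_test([[10, 20], [20, 10]])
print(f"chi2 = {stat:.3f}, df = {df}")  # every expected count here is 15
```

The decision step is then a comparison of `chi2` with the tabulated critical value for `df` at the chosen α.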
10. Requirements
Before using the chi-square test, certain requirements must be met:
• The data must be in the form of frequencies counted in each of a set of categories. Percentages cannot be used.
• The total number of observations must exceed 20.
11. Requirements (cont.)
• The expected frequency under H0 in any one cell should not normally be less than 5.
• All the observations must be independent of each other. In other words, one observation must not influence another.
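The requirements that can be read off a table (whole counts rather than percentages, a total above 20, expected frequencies of at least 5) can be checked mechanically. The sketch below is a hypothetical helper, `check_chi_square_requirements`, that uses the thresholds stated on these slides as defaults; they are rules of thumb, not hard limits. Independence of the observations cannot be checked from the counts and has to be judged from the study design.

```python
from typing import List


def check_chi_square_requirements(observed: List[List[int]],
                                  min_expected: float = 5.0,
                                  min_total: int = 20) -> List[str]:
    """Return a list of violated requirements (empty list = all checks passed).

    Only the table itself is checked; independence of the observations has
    to be judged from the study design, not from the counts.
    """
    problems = []

    # Data must be frequencies (whole counts), not percentages
    for row in observed:
        for x in row:
            if not isinstance(x, int) or x < 0:
                problems.append(f"cell value {x!r} is not a non-negative count")

    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)

    # Total number observed must exceed 20
    if n <= min_total:
        problems.append(f"total observed ({n}) does not exceed {min_total}")

    # Expected frequency under H0 should not be below 5 in any cell
    for i, r in enumerate(row_totals):
        for j, c in enumerate(col_totals):
            e = r * c / n if n else 0.0
            if e < min_expected:
                problems.append(f"expected frequency in cell ({i}, {j}) is {e:.2f}")

    return problems


# Malocclusion-by-feeding table (4, 16 / 1, 21) from the Fisher's exact test example
print(check_chi_square_requirements([[4, 16], [1, 21]]))
```

For the malocclusion table the two "normal teeth" cells have expected frequencies of about 2.4 and 2.6, which is why an exact test is preferred there.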
12. APPLICATIONS OF THE CHI-SQUARE TEST
• TESTING INDEPENDENCE (OR ASSOCIATION)
• TESTING HOMOGENEITY
• TESTING GOODNESS OF FIT
13. Chi-square test
Objective: To test whether smoking is a risk factor for MI
Null hypothesis: There is no association between smoking and MI

               D (MI)   No D (No MI)   Total
Smokers          29          21          50
Non-smokers      16          34          50
Total            45          55         100
18. Degrees of freedom
df = (r-1)(c-1) = (2-1)(2-1) = 1
Critical value (Table A.6) = 3.84
χ² = 6.84
The calculated value (6.84) is greater than the critical (table) value (3.84) at the 0.05 level with 1 d.f.
Hence we reject H0 and conclude that there is a highly statistically significant association between smoking and MI.
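The smoking and MI result can be reproduced with a short Python sketch (the function name `chi_square_2x2` is hypothetical). Computed without rounding, the statistic comes to about 6.83; the slide's 6.84 presumably comes from rounding each cell's contribution to two decimals. For 1 df the upper-tail chi-square probability equals erfc(sqrt(χ²/2)), because a chi-square variable with 1 df is the square of a standard normal variable, so only the standard library is needed.

```python
import math


def chi_square_2x2(a: int, b: int, c: int, d: int) -> float:
    """Pearson chi-square (no continuity correction) for [[a, b], [c, d]]."""
    n = a + b + c + d
    rows = (a + b, c + d)
    cols = (a + c, b + d)
    observed = ((a, b), (c, d))
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            e = rows[i] * cols[j] / n  # expected frequency for this cell
            chi2 += (observed[i][j] - e) ** 2 / e
    return chi2


# Rows: smokers, non-smokers; columns: MI, no MI (counts from the slide)
chi2 = chi_square_2x2(29, 21, 16, 34)
# For df = 1, P(chi-square > x) = erfc(sqrt(x / 2))
p_value = math.erfc(math.sqrt(chi2 / 2))
print(f"chi2 = {chi2:.2f}, p = {p_value:.4f}, reject H0 at 0.05: {chi2 > 3.84}")
```

The p-value is roughly 0.009, consistent with rejecting H0 at the 0.05 level.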
19. Association between Diabetes and Heart Disease?
Background:
Contradictory opinions:
1. A diabetic’s risk of dying after a first heart attack is the same as that of someone without diabetes; there is no association between diabetes and heart disease.
vs.
2. Diabetes takes a heavy toll on the body, and diabetic patients often suffer heart attacks and strokes or die from cardiovascular complications at a much younger age.
So we use a hypothesis test on the latest data to decide which conclusion is supported.
There are 5167 patients in total: 1131 without diabetes and 4036 with diabetes. Among the non-diabetic patients, 42% (475 of 1131) had their blood pressure properly controlled, whereas among the diabetic patients only 20% (807 of 4036) did.
20. Association between Diabetes and Heart Disease?
Data
                 Controlled   Uncontrolled   Total
Non-diabetes        475           656         1131
Diabetes            807          3229         4036
Total              1282          3885         5167
21. Association between Diabetes and Heart Disease?
Data:
Diabetes: 1 = Does not have diabetes, 2 = Has diabetes
Control: 1 = Controlled, 2 = Uncontrolled

DIABETES * CONTROL Crosstabulation (Count)
                         CONTROL
                     1.00     2.00     Total
DIABETES   1.00       475      656      1131
           2.00       807     3229      4036
Total                1282     3885      5167
22. Association between Diabetes and Heart Disease?
Hypothesis test:
H0: There is no association between diabetes
and heart disease. (or) Diabetes and heart
disease are independent.
vs
HA: There is an association between diabetes
and heart disease. (or) Diabetes and heart
disease are dependent.
--- Assume a significance level of 0.05
23. Association between Diabetes and Heart Disease?
--- The computer gives a chi-square statistic of 229.268.
--- The computer gives a p-value of .000 (< 0.0001).
--- Because the p-value is less than alpha, we reject the null hypothesis.
--- There is sufficient evidence to conclude that there is an association between diabetes and heart disease.
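The statistic reported by the computer can be checked by hand. The Python sketch below applies the expected-frequency formula from slide 9 to the crosstabulation; it assumes the reported value is the Pearson statistic without continuity correction, which is what the arithmetic reproduces (about 229.27). The p-value again uses erfc(sqrt(χ²/2)), valid for 1 df.

```python
import math

# Observed counts from the DIABETES * CONTROL crosstabulation
# rows: non-diabetes, diabetes; columns: controlled, uncontrolled
observed = [[475, 656],
            [807, 3229]]

row_totals = [sum(r) for r in observed]
col_totals = [sum(c) for c in zip(*observed)]
n = sum(row_totals)

chi2 = 0.0
for i, row in enumerate(observed):
    for j, o in enumerate(row):
        e = row_totals[i] * col_totals[j] / n  # expected count under H0
        chi2 += (o - e) ** 2 / e

# df = (2-1)(2-1) = 1, so the upper-tail probability is erfc(sqrt(chi2 / 2))
p_value = math.erfc(math.sqrt(chi2 / 2))
print(f"chi2 = {chi2:.3f}, p = {p_value:.2e}")
```

The p-value is many orders of magnitude below 0.0001, which is why the software displays it as .000.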
24. Chi-square test
Find out whether gender is equally distributed across the age groups.
Observed frequencies, with expected frequencies in parentheses:

                          Age
Gender      <30         30-45       >45         Total
Male        60 (60)     20 (30)     40 (30)     120
Female      40 (40)     30 (20)     10 (20)      80
Total       100         50          50          200
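The expected frequencies in parentheses follow from E = (row total × column total) / grand total. A minimal Python sketch of the full calculation for this 2 x 3 table is shown below; the variable names are illustrative only.

```python
# Gender by age group (observed counts from the slide)
age_groups = ["<30", "30-45", ">45"]
observed = [[60, 20, 40],   # Male
            [40, 30, 10]]   # Female

row_totals = [sum(r) for r in observed]          # 120, 80
col_totals = [sum(c) for c in zip(*observed)]    # 100, 50, 50
n = sum(row_totals)                              # 200

# Expected count for each cell under H0 (same age distribution for both genders)
expected = [[r * c / n for c in col_totals] for r in row_totals]
chi2 = sum((o - e) ** 2 / e
           for o_row, e_row in zip(observed, expected)
           for o, e in zip(o_row, e_row))
df = (len(observed) - 1) * (len(observed[0]) - 1)  # (2-1)(3-1) = 2

print("expected:", expected)
print(f"chi2 = {chi2:.2f}, df = {df}")
```

The result (about 16.67 with 2 df) is then compared with the tabulated critical value for 2 df, which is 5.99 at the 0.05 level.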
25. Test for Homogeneity (Similarity)
Tests whether two or more groups have similar frequency distributions. It is used, for example, to assess the similarity between responders and non-responders in a survey.
Observed frequencies, with expected frequencies in parentheses:

Age (yrs)   Responders   Non-responders   Total
<20          76 (82)       20 (14)           96
20-29       288 (289)      50 (49)          338
30-39       312 (310)      51 (53)          363
40-49       187 (185)      30 (32)          217
>50          77 (73)        9 (13)           86
Total       940           160              1100
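A test for homogeneity uses the same statistic as the test for association; only the question differs. The Python sketch below recomputes the expected frequencies for the survey table (rounded, they should match the figures in parentheses) and the chi-square statistic with df = (5-1)(2-1) = 4. Variable names are illustrative.

```python
# Age distribution of survey responders vs non-responders (observed counts)
age_groups = ["<20", "20-29", "30-39", "40-49", ">50"]
responders = [76, 288, 312, 187, 77]
non_responders = [20, 50, 51, 30, 9]

observed = [list(pair) for pair in zip(responders, non_responders)]  # rows = age groups
row_totals = [sum(r) for r in observed]
col_totals = [sum(responders), sum(non_responders)]  # 940, 160
n = sum(row_totals)                                 # 1100

# Expected count for each cell if both groups share one age distribution
expected = [[r * c / n for c in col_totals] for r in row_totals]
chi2 = sum((o - e) ** 2 / e
           for o_row, e_row in zip(observed, expected)
           for o, e in zip(o_row, e_row))
df = (len(row_totals) - 1) * (len(col_totals) - 1)  # (5-1)(2-1) = 4

for g, e_row in zip(age_groups, expected):
    print(g, [round(e) for e in e_row])
print(f"chi2 = {chi2:.2f}, df = {df}")
```

The statistic comes to about 4.4, which can be compared with the 0.05-level critical value for 4 df (9.49).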
26. Fisher’s exact test:
Study variable: Qualitative (categorical)
Outcome variable: Qualitative (categorical)
Comparison: two proportions
Sample size: < 30
28. Example
The following data compare malocclusion of teeth with the method of feeding infants.
              Normal teeth   Malocclusion
Breast fed       4 (a)         16 (b)
Bottle fed       1 (c)         21 (d)
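29. Fisher’s Exact Test:
Yates’s correction was useful when calculations were done by hand. Now that statistical packages are widely available, it is better to use Fisher’s exact test rather than Yates’s correction, as it gives an exact result.

$$P = \frac{R_1!\,R_2!\,C_1!\,C_2!}{n!\,a!\,b!\,c!\,d!}$$

where R1 and R2 are the row totals, C1 and C2 the column totals, n the grand total, and a, b, c, d the cell counts. P is the probability of the observed table given the fixed margins; the p-value is obtained by summing such probabilities over the observed table and all tables at least as extreme.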
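The formula above can be evaluated directly with exact integer factorials. The Python sketch below (function names hypothetical) computes P for the malocclusion table and a two-sided p-value using one common convention: sum P over every table with the same margins whose probability is no larger than that of the observed table. Other two-sided conventions exist, so packages may report slightly different values.

```python
from math import factorial


def table_probability(a: int, b: int, c: int, d: int) -> float:
    """P = R1! R2! C1! C2! / (n! a! b! c! d!) for the 2 x 2 table [[a, b], [c, d]]."""
    r1, r2 = a + b, c + d  # row totals
    c1, c2 = a + c, b + d  # column totals
    n = a + b + c + d
    numerator = factorial(r1) * factorial(r2) * factorial(c1) * factorial(c2)
    denominator = (factorial(n) * factorial(a) * factorial(b)
                   * factorial(c) * factorial(d))
    return numerator / denominator  # exact integers, one final float division


def fisher_exact_two_sided(a: int, b: int, c: int, d: int):
    """Return (P of observed table, two-sided p-value) with margins held fixed."""
    r1, c1 = a + b, a + c
    n = a + b + c + d
    p_obs = table_probability(a, b, c, d)
    p_value = 0.0
    # Once the margins are fixed, cell 'a' determines the whole table
    for x in range(max(0, r1 + c1 - n), min(r1, c1) + 1):
        p = table_probability(x, r1 - x, c1 - x, n - r1 - c1 + x)
        if p <= p_obs * (1 + 1e-7):  # tolerance so floating-point ties count
            p_value += p
    return p_obs, p_value


# Malocclusion example: breast fed 4 / 16, bottle fed 1 / 21
p_obs, p_value = fisher_exact_two_sided(4, 16, 1, 21)
print(f"P(observed table) = {p_obs:.4f}, two-sided p = {p_value:.4f}")
```

Here P for the observed table is about 0.125 and the two-sided p-value about 0.17.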
34. What should we do when we have paired samples and both the exposure and outcome variables are qualitative (binary)?
35. McNemar’s test (for paired samples):
Study variable: Qualitative (categorical)
Outcome variable: Qualitative (categorical)
Comparison: two proportions
Sample size: Any
36. Problem
A researcher has done a matched case-
control study of endometrial cancer
(cases) and exposure to conjugated
estrogens (exposed).
In the study cases were individually
matched 1:1 to a non-cancer hospital-
based control, based on age, race, date
of admission, and hospital.
37. McNemar’s test
Situation:
Two paired binary variables that form a particular type of 2 x 2 table,
e.g. a matched case-control study or a cross-over trial
39.
• We can’t use the ordinary chi-square test: the observations are not independent, they are paired.
• We must present the 2 x 2 table differently: each cell should contain a count of the number of pairs meeting certain criteria, with the rows and columns referring respectively to each of the subjects in the matched pair.
• The information in the standard 2 x 2 table used for unmatched studies is insufficient, because it doesn’t say who is in which pair; it ignores the matching.
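Because each cell counts pairs, only the discordant pairs carry information about a difference. A standard form of McNemar’s statistic is χ² = (b - c)² / (b + c) with 1 df, where b and c are the two discordant counts; some texts also apply a continuity correction, (|b - c| - 1)² / (b + c). The Python sketch below builds b and c from a list of matched pairs, which mirrors the paired table described above. The function name `mcnemar` and the example pairs are hypothetical and are not the study’s data.

```python
import math
from typing import Iterable, Tuple


def mcnemar(pairs: Iterable[Tuple[bool, bool]]):
    """McNemar chi-square for matched pairs, without continuity correction.

    Each item is one matched pair: (case exposed?, control exposed?).
    Concordant pairs (both exposed or both unexposed) do not enter the statistic.
    """
    b = c = 0
    for case_exposed, control_exposed in pairs:
        if case_exposed and not control_exposed:
            b += 1  # discordant: only the case was exposed
        elif control_exposed and not case_exposed:
            c += 1  # discordant: only the control was exposed
    if b + c == 0:
        raise ValueError("no discordant pairs; the test is undefined")
    chi2 = (b - c) ** 2 / (b + c)
    p_value = math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-square, df = 1
    return b, c, chi2, p_value


# Hypothetical pairs for illustration only (not the study's data)
example = ([(True, True)] * 5 + [(True, False)] * 10
           + [(False, True)] * 2 + [(False, False)] * 8)
b, c, chi2, p_value = mcnemar(example)
print(f"b = {b}, c = {c}, chi2 = {chi2:.2f}, p = {p_value:.4f}")
```

The 5 concordant exposed pairs and 8 concordant unexposed pairs change nothing in the result; only the 10 and 2 discordant pairs matter.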
44. Degrees of freedom
df = (r-1)(c-1) = (2-1)(2-1) = 1
Critical value (Table A.6) = 3.84
χ² = 25.92
The calculated value (25.92) is greater than the critical (table) value (3.84) at the 0.05 level with 1 d.f.
Hence we reject H0 and conclude that there is a highly statistically significant association between endometrial cancer and conjugated estrogens.
46. In conclusion
When both the study variables and outcome variables are categorical (qualitative), apply:
(i) Chi-square test
(ii) Fisher’s exact test (small samples)
(iii) McNemar’s test (paired samples)