4 Threats to validity from confounding bias and effect modification

Threats to Validity from Confounding and
Effect Modification
• Overview: Random vs. systematic error
• Confounding
• Effect Modification
• Logistic regression (time permitting)
• Special thanks for some of the materials in these
lectures:
– Professor Jen Ahern (UCB)
– Professor Madhu Pai (McGill, a former
250b GSI)
2014 Page 1

The cardinal rule of epidemiology
• Remember that all results based on epidemiology
studies are likely to be …

The cardinal rule of epidemiology (continued)
• WRONG…
– unless proper care has been taken to eliminate
all sources of error in the estimate (…and
sometimes even then the results will be
wrong because of unknown sources of error)

Example: Confounding
• A colleague with outside funding believes that cigarette smoke
is not a “cause” (in any sense) of lung cancer but that exposure
to matches (yes, matches) is the cause. This colleague has
conducted a large case control study to test the null hypothesis:
H0: “Matches are not associated with lung cancer”.
• What’s the rationale (in the Popperian sense) for stating the null
hypothesis rather than the alternative:
HA: “Matches are associated with lung cancer”.
• What does the colleague hope to do (in terms of hypothesis
testing)?
• What do you think of the term “associated”? Would it be better
to write “a cause of”?

• “We can never finally prove our scientific
theories, we can merely (provisionally)
confirm or (conclusively) refute them.”
– Karl Popper
Sir Karl Raimund Popper CH FBA FRS (28 July 1902 – 17 September 1994) was an Austrian-British
philosopher and professor at the London School of Economics. He is generally regarded as
one of the greatest philosophers of science of the 20th century. Popper is known for his rejection of the
classical inductivist views on the scientific method, in favour of empirical falsification.
(Source: Wikipedia)

Confounding: smoking, matches, and lung cancer
• Your colleague has located 1000 cases of lung cancer, of
whom 820 carry matches.
• Among 1000 reference patients (selected randomly from a
population with recently taken normal chest x-rays), 340
carry matches.
• Strengths of the reference selection process? Weaknesses?
• Describe the relationship between matches and lung cancer
in your colleague’s data.
• Would you like to analyze the data in any other fashion?

Confounding: smoking, matches, and lung cancer
• Odds ratio = (820 * 660) / (180 * 340)
• OR = 8.8
• 95% CI (7.2, 10.9)
              Cancer   No cancer
Matches         820       340
No matches      180       660
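The odds ratio and interval for the matches table can be checked with a few lines of Python. This is a minimal sketch that assumes the interval is the usual log-based (Woolf) approximation; the slide does not say which method produced (7.2, 10.9), and the function name `odds_ratio_ci` is introduced here only for illustration.

```python
import math

def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio with a log-based (Woolf) confidence interval.

    a = exposed cases, b = exposed non-cases,
    c = unexposed cases, d = unexposed non-cases.
    The Woolf method is an assumption; the slide does not name its method.
    """
    or_hat = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(or_hat) - z * se_log_or)
    upper = math.exp(math.log(or_hat) + z * se_log_or)
    return or_hat, lower, upper

# Matches vs. lung cancer: 820 and 340 carry matches, 180 and 660 do not.
or_hat, lower, upper = odds_ratio_ci(820, 340, 180, 660)
print(f"OR = {or_hat:.1f}, 95% CI ({lower:.1f}, {upper:.1f})")
```

Under this approximation the rounded values agree with the figures on the slide.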

Confounding: smoking, matches, and lung cancer
• You decide to look at the relationship between matches
and lung cancer in the smokers separately from the
non-smokers.
• You find that among the 1000 cases, 900 are smokers and
810 (of the 900) carry matches
• Among the 1000 reference patients, 300 are smokers and
270 (of the 300) carry matches
• Calculate the relevant measure(s) of effect.
• What should your colleague do about future funding?

Confounding: smoking, matches, and lung cancer
• ORpooled = 8.84 (7.2, 10.9)
• ORsmokers = 1.0 (0.6, 1.5)
• ORnonsmokers = 1.0 (0.5, 2.0)
Pooled        Cancer   No cancer
Matches         820       340
No matches      180       660
Smokers       Cancer   No cancer
Matches         810       270
No matches       90        30
Non-smokers   Cancer   No cancer
Matches          10        70
No matches       90       630
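The pooled and stratum-specific odds ratios can be reproduced directly from the cell counts. The sketch below is illustrative only (the helper name `odds_ratio` is not from the lecture); it also confirms that adding the two smoking strata cell by cell gives back the pooled table.

```python
def odds_ratio(a, b, c, d):
    """Cross-product odds ratio: a, b = exposed cases/non-cases; c, d = unexposed."""
    return (a * d) / (b * c)

# Cell order: (matches & cancer, matches & no cancer,
#              no matches & cancer, no matches & no cancer)
strata = {
    "smokers": (810, 270, 90, 30),
    "non-smokers": (10, 70, 90, 630),
}

# Collapsing the strata cell by cell gives the pooled table.
pooled = tuple(sum(cells) for cells in zip(*strata.values()))

ors = {"pooled": odds_ratio(*pooled)}
ors.update({name: odds_ratio(*cells) for name, cells in strata.items()})

for name, value in ors.items():
    print(f"OR ({name}) = {value:.2f}")
```

The association seen in the pooled table disappears within each smoking stratum, which is the pattern the slide uses to illustrate confounding.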

Confounding: smoking, matches, and lung cancer
• To be complete, you also decide to examine the
relationship between smoking and lung cancer.
• What tables should you construct to do this?

Confounding: smoking, matches, and lung cancer
• ORpooled = 21.0 (16.3, 27.1)
• ORmatches = 21.0 (10.5, 46.2)
• ORno matches = 21.0 (12.9, 34.7)
• Discuss your intuitions about the 95% CIs
Pooled        Cancer   No cancer
Smoking         900       300
No smoking      100       700
Matches       Cancer   No cancer
Smoking         810       270
No smoking       10        70
No matches    Cancer   No cancer
Smoking          90        30
No smoking       90       630

[Figure: schematic of Exposure and Disease with a possible third factor (“Confounder?”); compares the unadjusted RR with the adjusted RR]

BMJ 2004;329:868-869 (16 October)
Why is confounding so important in
epidemiology?
● BMJ Editorial: “The scandal of poor epidemiological
research” [16 October 2004]
● “Confounding, the situation in which an apparent
effect of an exposure on risk is explained by its
association with other factors, is probably the
most important cause of spurious associations in
observational epidemiology.”

Overview
● Causality is the central concern of epidemiology
● Confounding is the central concern with establishing
causality
● Confounding can be understood using multiple
different approaches
● A strong understanding of various approaches to
confounding and its control is essential for all those
who engage in health research

Confounding is one of the key biases in identifying
causal effects
[Figure: layers separating RR causal (“truth”) from RR association]
• Causal Effect
• Random Error
• Confounding
• Information bias (misclassification)
• Selection bias
• Bias in inference
• Reporting & publication bias
• Bias in knowledge use
Adapted from: Maclure M, Schneeweiss S. Epidemiology 2001;12:114-122.

Confounding: 4 ways to understand it!
1. “Mixing of effects”
2. “Classical” approach based on a priori criteria
3. Collapsibility and data-based criteria
4. “Counterfactual” and non-comparability approaches

Rothman KJ. Epidemiology. An introduction. Oxford: Oxford University Press, 2002
First approach:
Confounding: mixing of effects
● “Confounding is confusion, or mixing, of
effects; the effect of the exposure is mixed
together with the effect of another variable,
leading to bias” - Rothman, 2002
Latin: “confundere” is to mix together

Example
Association between birth order and Down syndrome
Data from Stark and Mantel (1966) Source: Rothman 2002

Association between maternal age and Down syndrome

Association between maternal age and Down syndrome, stratified by
birth order

Mixing of Effects: the water pipes analogy
[Figure: water pipes linking Confounder, Exposure, and Outcome]
Mixing of effects – cannot separate the effect of exposure from that of
confounder
Exposure and disease
share a common cause (‘parent’)
Adapted from Jewell NP. Statistics for Epidemiology. Chapman & Hall,
2003

Mixing of Effects: “control” of the confounder
[Figure: water pipes linking Confounder, Exposure, and Outcome, with the confounder pipe blocked]
Successful “control” of confounding (adjustment)
If the common cause (‘parent’)
is blocked, then the exposure –
disease association becomes
clearer
Adapted from: Jewell NP. Statistics for Epidemiology. Chapman & Hall,
2003

Second approach: “Classical” approach
based on a priori criteria
“Bias of the estimated effect of an exposure on an outcome due to
the presence of a common cause of the exposure and the
outcome” – Porta 2008
● A factor is a confounder if 3 criteria are met:
● a) a confounder must be causally or noncausally
associated with the exposure in the source population
(study base) being studied;
● b) a confounder must be a causal risk factor (or a
surrogate measure of a cause) for the disease in the
unexposed cohort; and
● c) a confounder must not be an intermediate cause (in
other words, a confounder must not be an intermediate
step in the causal pathway between the exposure and the
disease)

Confounding Schematic
[Figure: schematic with Exposure (E), Disease (outcome) (D), and Confounder (C)]
Szklo M, Nieto FJ. Epidemiology: Beyond the basics. Aspen Publishers, Inc.,
2000. Gordis L. Epidemiology. Philadelphia: WB Saunders, 4th
Edition.

[Figure: schematic with Exposure (E), Confounder (C), Intermediate cause, and Disease (D)]

[Figure: schematic with Exposure (E), Confounder (C), and Disease (D)]
General idea: a confounder could be a ‘parent’ of the
exposure, but should not be a ‘daughter’ of the
exposure

Example of schematic (from Gordis)

Confounding Schematic
[Figure: Birth Order (E) and Down Syndrome (D), with confounding factor Maternal Age (C)]

Association between HRT and heart disease
[Figure: HRT use and Heart disease, with SES as the confounding factor]
Are confounding criteria met?

[Figure: BRCA1 gene and Breast cancer, with Age as the candidate confounding factor, marked with an “x”]
Should we adjust for age, when evaluating the association
between a genetic factor and risk of breast cancer?
No!

[Figure: Sex with multiple partners and Cervical cancer, with HPV as the confounding factor]

[Figure: Sex with multiple partners, HPV, and Cervical cancer shown in sequence]
What if this was the underlying causal
mechanism?

[Figure: Obesity and Mortality, with Hypertension as the confounding factor]

[Figure: Obesity, Hypertension, and Mortality shown in sequence]
What if this was the underlying causal
mechanism?

Direct vs indirect effects
[Figure: Obesity, Hypertension, and Mortality; the path through Hypertension is labeled “Indirect effect” and the path from Obesity straight to Mortality is labeled “Direct effect”]
Direct effect is the portion of the total effect that does not act via an intermediate cause

Simple causal graphs
[Figure: causal graph with nodes E, D, and C]
Maternal age (C) can confound the association
between multivitamin use (E) and the risk of certain
birth defects (D)
Hernan MA, et al. Causal knowledge as a prerequisite for confounding evaluation: an
application to birth defects epidemiology. Am J Epidemiol 2002;155(2):176-84.

Complex causal graphs
[Figure: causal graph with nodes E, D, C, and U]
History of birth defects (C) may increase the chance of
periconceptional vitamin intake (E). A genetic factor (U) could
have been the cause of previous birth defects in the family,
and could again cause birth defects in the current pregnancy
Hernan MA, et al. Causal knowledge as a prerequisite for confounding evaluation:
an application to birth defects epidemiology. Am J Epidemiol 2002;155(2):176-84.

More complicated causal graphs!
[Figure: causal graph with nodes labeled A, B, C, D, E, and U; variables shown include smoking, calcium supplementation, bone fractures, BMI, and physical activity]
Source: Hertz-Picciotto

The ultimate complex causal
graph!
A PowerPoint diagram meant to portray the
complexity of American strategy in
Afghanistan!

Third approach: Collapsibility and data-based
approaches
● According to this definition, a factor is a confounding
variable if
● a) the effect measure is homogeneous across the strata
defined by the confounder and
● b) the crude and common stratum-specific (adjusted) effect
measures are unequal (this is called “lack of collapsibility”)
● Usually evaluated using 2x2 tables, and simple
stratified analyses to compare crude effects with
adjusted effects
“Collapsibility is equality of stratum-specific measures of effect with the crude
(collapsed), unstratified measure” Porta, 2008, Dictionary

Crude vs. Adjusted Effects
● Crude: does not take into account the effect of the
confounding variable
● Adjusted: accounts for the confounding variable(s)
(what we get by pooling stratum-specific effect
estimates)
● Generated using methods such as the Mantel-Haenszel
estimator
● Also generated using multivariate analyses (e.g. logistic
regression)
● Confounding is likely when:
● $RR_{crude} \neq RR_{adjusted}$
● $OR_{crude} \neq OR_{adjusted}$

Stratified Analysis
1. Start with the crude 2 x 2 table and calculate the crude OR (or RR)
2. Stratify by the confounder (Stratum 1, Stratum 2)
3. Calculate the ORs for each stratum (OR1, OR2)
4. If the stratum-specific ORs are similar, calculate the adjusted RR (e.g. MH)
5. If Crude OR =/= Adjusted OR, confounding is likely; if Crude OR = Adjusted OR, confounding is unlikely
JC: introduce “test of homogeneity”

Examples: crude vs adjusted RR
Study   Crude RR   Stratum 1 RR   Stratum 2 RR   Adjusted RR   Confounding?
1         6.00         3.20           3.50           3.30
2         2.00         1.02           1.10           1.08
3         1.10         2.00           2.00           2.00
4         0.56         0.50           0.60           0.54
5         4.20         4.00           4.10           4.04
6         1.70         0.03           3.50

Fourth approach:
Causality: counterfactual model
Maldonado & Greenland, Int J Epi 2002;31:422-29
● Ideal “causal contrast” between exposed and
unexposed groups:
● “A causal contrast compares disease frequency
under two exposure distributions, but in one target
population during one etiologic time period”
● If the ideal causal contrast is met, the observed
effect is the “causal effect”

What happens actually?
IDEAL: $RR_{causal} = I_{exp} / I_{unexp}$
ACTUAL: $RR_{assoc} = I_{exp} / I_{substitute}$

Ideal counterfactual comparison to determine causal effects
[Figure: exposed cohort ($I_{exp}$) compared with the counterfactual, unexposed cohort ($I_{unexp}$)]
$RR_{causal} = I_{exp} / I_{unexp}$
“A causal contrast compares disease frequency under two exposure distributions, but in one
target population during one etiologic time period”
“Initial conditions” are identical in
the exposed and unexposed groups
because they are the same
population!

What happens actually?
[Figure: exposed cohort ($I_{exp}$); the counterfactual state ($I_{unexp}$) is not observed, so a substitute, unexposed cohort ($I_{substitute}$) is used]
A substitute will usually be a population other than the target
population during the etiologic time period - INITIAL CONDITIONS
MAY BE DIFFERENT

Counterfactual definition of confounding
● “Confounding is present if the substitute
population imperfectly represents what the
target would have been like under the
counterfactual condition”
● “An association measure is confounded (or biased
due to confounding) for a causal contrast if it does
not equal that causal contrast because of such an
imperfect substitution”
$RR_{causal} \neq RR_{assoc}$

Residual confounding
• Confounding can persist, even after adjustment
• Why?
– Not all confounders were adjusted for (unmeasured confounding)
– Some variables were actually not confounders!
– Confounders were measured with error (misclassification of
confounders)
– Categories of the confounding variable are improperly defined
(e.g. age categories were too broad)

Simulating the counter-factual comparison:
Experimental Studies: RCT
[Figure: Eligible patients → Randomization → Treatment or Placebo → Outcomes]
Randomization helps to make the groups “comparable” (i.e. similar
initial conditions) with respect to known and unknown confounders
Therefore confounding is unlikely at randomization (time t0)

Confounding: Methods to control
or reduce confounding
• Methods used in study design to reduce confounding
– Randomization
– Restriction
– Matching
• Methods used in study analysis to reduce confounding
– Stratified analysis
– Multivariate analysis

Confounding: The use of randomization to
reduce confounding
• Randomization
– Useful only for intervention studies
– Definition: random assignment of study subjects to
exposure categories
– The special strength of randomization is its ability to
control/reduce the effect of confounding variables of
which the investigator is unaware
– If there is maldistribution of potentially confounding
variables after randomization (the reason for the classic
“Table I: Baseline characteristics” in the randomized trial)
then other confounding control options (see below) are
applied

[Figure: exposed cohort compared with the substitute, unexposed cohort]
“Confounding is present if the substitute population imperfectly represents what
the target would have been like under the counterfactual condition”

Confounding: The use of restriction to
reduce confounding
• Confounding cannot occur if the distribution of the
potential confounding factors does not vary across exposure
or disease categories
– Implication of this is that an investigator may restrict
study subjects to only those falling within specific level(s)
of a confounding variable
• Extreme example: an investigator only selects
subjects of exactly the same age.
• Advantages of restriction
– straightforward, convenient, inexpensive

Confounding: The use of restriction to
reduce confounding (cont.)
• Disadvantages
– May limit number of eligible subjects
– Residual confounding may persist if restriction
categories are not sufficiently narrow (e.g. “decade of age”
might be too broad)
– Not possible to evaluate the relationship of interest at
different levels of the confounder
• Question: How does restriction differ from matching?

Confounding: The use of matching to reduce
confounding
• Subjects with all levels of a potential confounder are
admitted into the study BUT the control/reference subjects
(either with respect to exposure in a cohort or disease in a
case-reference study) are chosen to have the same
distribution of the potential confounder
• The use of matching may also require special analysis
techniques (matched analyses and conditional logistic
regression)

• Disadvantages of matching
– Finding appropriate control/reference subjects may be
difficult and expensive and limit sample size
– Matching is most often used in case-reference (i.e.
case-control) studies because in a large cohort study the
cost of matching may be prohibitive
• Thus, in cohort studies it’s often cheaper to just
enroll available controls and use analytic methods
(below) to control confounding; this doesn’t apply
to computerized “free” data

Confounding: The use of matching to
reduce confounding (cont.)
• Disadvantages of matching (cont.)
– Confounding factor used to match subjects cannot be
itself evaluated with respect to the outcome/disease
– Obviously, matching does not control for confounding
by factors other than that used to match
– The use of matching makes the use of stratified analysis
(for the control of other potential but non-matched
factors) very difficult
• One way around this problem is the use of
conditional logistic regression but there is a large
reduction in “effective” sample size because only
discordant pairs are used.

• Advantages of matching
– Matching may be the only way to obtain sufficient
numbers of control/reference subjects with relevant
levels of the confounding factor(s)
– Example: controlling for “neighborhood” (and all that
it implies) by any approach other than matching is very
difficult

• Advantages of matching (cont.)
– Useful in very small studies in which chance
differences in confounding factors are likely to exist
between the study groups and other forms of control for
the confounders (such as stratification or multivariate
adjustment) are not possible (because of the limited
sample size)
– The full benefit of matching (in terms of the reduction
of confounding) is obtained only if the proper form of
matched analysis is used (to be reviewed later in the
course)

• Basic goal of stratification is to evaluate the relationship
between the predictor (“cause”) and outcome (“effect”)
variable in strata homogeneous with respect to potentially
confounding variables

Confounding: The use of stratification to
reduce confounding
• For example, to examine the relationship between smoking
and lung cancer while controlling for the potentially
confounding effect of gender:
– Create a 2x2 table (smoking vs. lung cancer) for men
and women separately
– To control for multiple confounders simultaneously,
stratify by pairs (or triplets or higher) of confounding
factors. For example, to control for gender and
race/ethnicity determine the OR for smoking vs. lung
cancer in multiple strata: white women, black
women, Hispanic women, white men, black men,
Hispanic men, etc.

• (From the earlier example): Goal: create a summary or
“adjusted” estimate for the relationship between matches
and lung cancer while adjusting for the two levels of
smoking (the potential confounder)
• This process is analogous to the standardization of rates
earlier in the course; in those examples the purpose of
adjustment was to remove the confounding effect of age on
the relationship between populations (A vs. B etc.) and
rates of disease or death.
• In the present example the goal is to remove the
confounding effect of smoking on the relationship between
matches and lung cancer.

Confounding: Types of summary estimators
to determine uniform effect over strata
• Mantel-Haenszel
– We will use this estimator in the present course
– Resistant to the effects of small strata or cells with a
value of “0”
– Computationally a piece of cake
• Directly pooled estimators (e.g. Woolf)
– Sensitive to small strata and cells with value “0”
– Computationally messy but doable
• Maximum likelihood
– The most “appropriate” estimator
– Resistant to the effects of small strata or cells with a
value of “0”
– Computationally challenging

Confounding: smoking, matches, and lung cancer
• ORpooled = 8.84 (7.2, 10.9)
• ORsmokers = 1.0 (0.6, 1.5)
• ORnonsmokers = 1.0 (0.5, 2.0)
Pooled        Cancer   No cancer
Matches         820       340
No matches      180       660
Smokers       Cancer   No cancer
Matches         810       270
No matches       90        30
Non-smokers   Cancer   No cancer
Matches          10        70
No matches       90       630

An aside:
Terminology
• Pooled = combined = collapsed = unadjusted
• Adjusted = summary = weighted, etc.
– All of these reflect some adjustment process such as
Mantel-Haenszel or Woolf or maximum likelihood
estimation to weight the strata and develop confidence
intervals about the estimate.

Confounding: Notation used in Mantel-Haenszel
estimators of relative risk
• Notation for case-control or cohort studies with count data
              Cases    Controls   Total
Exposed         a         b       a + b
Nonexposed      c         d       c + d
Total         a + c     b + d     a + b + c + d = T
Case-control: $RR = OR = ad / bc$
Cohort: $RR = I_e / I_0 = \frac{a / (a + b)}{c / (c + d)}$

Confounding: Notation used in Mantel-Haenszel
estimators of relative risk (cont.)
• Notation for cohort studies with person-time data
              Cases    Controls   Person-years
Exposed         a        ---         PY1
Nonexposed      c        ---         PY0
Total         a + c                   T
$RR = I_e / I_0 = \frac{a / PY_1}{c / PY_0}$

Confounding: Mantel-Haenszel estimators of
relative risk for stratified data
Case-Control Study:
$RR_{MH} = \frac{\sum_i (ad / T)_i}{\sum_i (bc / T)_i}$
Cohort Study with Count Denominators:
$RR_{MH} = \frac{\sum_i \{a(c + d) / T\}_i}{\sum_i \{c(a + b) / T\}_i}$
Cohort Study with Person-years Denominators:
$RR_{MH} = \frac{\sum_i \{a(PY_0) / T\}_i}{\sum_i \{c(PY_1) / T\}_i}$
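The three Mantel-Haenszel estimators can be written as short Python functions. This is a sketch under stated assumptions: the function names and the tuple layouts are invented for illustration, the cohort denominators use the nonexposed cases c (the standard Mantel-Haenszel form), and T in the person-years version is taken to be PY1 + PY0, following the "Total" row of the notation table.

```python
def mh_odds_ratio(strata):
    """Case-control data. strata: iterable of (a, b, c, d) per stratum."""
    num = sum(a * d / (a + b + c + d) for a, b, c, d in strata)
    den = sum(b * c / (a + b + c + d) for a, b, c, d in strata)
    return num / den

def mh_risk_ratio(strata):
    """Cohort data with count denominators. strata: iterable of (a, b, c, d)."""
    num = sum(a * (c + d) / (a + b + c + d) for a, b, c, d in strata)
    den = sum(c * (a + b) / (a + b + c + d) for a, b, c, d in strata)
    return num / den

def mh_rate_ratio(strata):
    """Cohort data with person-years. strata: iterable of (a, c, py1, py0).

    Assumes T = py1 + py0 within each stratum.
    """
    num = sum(a * py0 / (py1 + py0) for a, c, py1, py0 in strata)
    den = sum(c * py1 / (py1 + py0) for a, c, py1, py0 in strata)
    return num / den

# Smoking/matches example from the lecture (smokers, non-smokers).
print(mh_odds_ratio([(810, 270, 90, 30), (10, 70, 90, 630)]))

# Hypothetical single-stratum checks: with one stratum each estimator
# reduces to the crude measure.
print(mh_risk_ratio([(20, 80, 10, 90)]))     # crude RR = 0.20 / 0.10
print(mh_rate_ratio([(30, 10, 1000, 1000)])) # crude rate ratio = 0.03 / 0.01
```

With a single stratum each function collapses to the crude ratio, which is a quick sanity check on the weights.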

Confounding: smoking, matches, and lung cancer
• ORpooled = 8.84 (7.2, 10.9)
• ORsmokers = 1.0 (0.6, 1.5)
• ORnonsmokers = 1.0 (0.5, 2.0)
Pooled        Cancer   No cancer
Matches         820       340
No matches      180       660
Smokers       Cancer   No cancer
Matches         810       270
No matches       90        30
Non-smokers   Cancer   No cancer
Matches          10        70
No matches       90       630

Confounding: Mantel-Haenszel estimators of
relative risk for stratified data (smoking, matches,
lung cancer)
$RR_{MH} = \sum_i (ad / T)_i / \sum_i (bc / T)_i$
Numerator of MH estimator:
• For smokers: (ad/T)=(810*30)/1200=20.25;
• For nonsmokers: (ad/T)=(10*630)/800=7.88;
• Add these together: 20.25 + 7.88=28.13 (numerator)
Denominator of MH estimator:
• For smokers: (bc/T)=(270*90)/1200=20.25;
• For nonsmokers: (bc/T)=(90*70)/800=7.88;
• Add these together: 20.25 + 7.88=28.13 (denominator)
• ORMH = 28.13 / 28.13 = 1.0 (as expected since both stratified ORs were = 1.0)
• Be sure to try this on stratified data in which the two strata are not exactly equal
to each other (but also not so different as to suggest that effect modification is
present)
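The arithmetic above can be traced term by term in Python, and the same function can then be tried on strata whose odds ratios are not identical. The second data set below is hypothetical (made up for this illustration, not taken from the lecture), and the function name is likewise illustrative.

```python
def mh_odds_ratio(strata, verbose=False):
    """Mantel-Haenszel OR; strata maps a label to cell counts (a, b, c, d)."""
    numerator = 0.0
    denominator = 0.0
    for label, (a, b, c, d) in strata.items():
        total = a + b + c + d
        ad_term = a * d / total
        bc_term = b * c / total
        if verbose:
            print(f"{label}: ad/T = {ad_term:.3f}, bc/T = {bc_term:.3f}")
        numerator += ad_term
        denominator += bc_term
    return numerator / denominator

# Lecture example: matches vs. lung cancer, stratified by smoking.
smoking_example = {
    "smokers": (810, 270, 90, 30),
    "non-smokers": (10, 70, 90, 630),
}
or_mh_example = mh_odds_ratio(smoking_example, verbose=True)
print(f"OR_MH (lecture example) = {or_mh_example:.2f}")

# Hypothetical strata with similar but unequal ORs (about 2.67 and 2.43).
hypothetical = {
    "stratum 1": (40, 60, 20, 80),
    "stratum 2": (30, 70, 15, 85),
}
or_mh_hypothetical = mh_odds_ratio(hypothetical, verbose=True)
print(f"OR_MH (hypothetical) = {or_mh_hypothetical:.2f}")
```

For the hypothetical data the summary estimate lands between the two stratum-specific odds ratios, as a weighted average should.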

Confounding: Interpretation of ORMH
• If ORMH (=1.0 in this example) “differs meaningfully”
from ORunadjusted (=8.8 in this example) then confounding is
present
• What does “differs meaningfully” mean?
– This is a matter of judgment based on biologic/clinical
sense rather than on a statistical test
– Even if they “differ” only slightly, generally the ORMH
rather than the ORcombined is reported as the summary
effect estimate
• But what is one disadvantage of reporting ORMH?
– Although there do exist statistical tests of confounding,
they are not widely recommended (these tests evaluate
H0: ORMH = ORunadjusted)

JC: test of homogeneity

Hennekens, 1987, p305

Review what the X^2 means in this context.

Direction of Confounding Bias
• Confounding “pulls” the observed association away from the true
association
– It can either exaggerate/over-estimate the true association (positive
confounding)
• Example
– RRcausal = 1.0
– RRobserved = 3.0
or
– It can hide/under-estimate the true association (negative
confounding)
• Example
– RRcausal = 3.0
– RRobserved = 1.0

Confounding: Summary of steps to evaluate
confounding
Table 12-10. Steps for the control of confounding and the evaluation of effect
modification through stratified analysis
1. Stratify by levels of the potential confounding factor.
2. Compute stratum-specific unconfounded relative risk estimates.
3. Evaluate similarity of the stratum-specific estimates by either eyeballing or
performing test of statistical significance. (More on this step later)
4. If the effect is thought to be uniform, calculate a pooled unconfounded summary
estimate using RRMH. If effect is not uniform (i.e. effect modification is present),
skip to step 6.
5. Perform hypothesis testing on the unconfounded estimate, using Mantel-Haenszel
chi-square and compute confidence interval.
6. If effect is not thought to be uniform (i.e., if effect modification is present):
a. Report stratum-specific estimates, results of hypothesis testing, and
confidence intervals for each estimate
b. If desired, calculate a summary unconfounded estimate using a standardized
formula

JC: test of homogeneity

Effect modification (Interaction)
• Goals of stratification of data
– Evaluate and reduce/remove confounding
– Evaluate and describe effect modification
• Description of effect modification
– A change in the magnitude of an effect measure
(between exposure and disease) according to the level
of some third variable
– What two “classes” of effect measures have we used so
far in the course?

Effect modification: example #1
• Disease incidence by exposure and age
– Does the relationship between exposure and disease change
over the value of the potential confounder (age)? How?

Effect modification: example #2
• Disease incidence by exposure and age
• Does the relationship between exposure and disease
change over the value of the potential confounder
(age)? How?
Rothman ’86 (p 178)

Effect modification: contrast
with confounding
• Confounding
– A bias that an investigator hopes to remove
– A nuisance that may or may not be present in a given
study design
• Properties of a confounding variable: (Rothman, p123):
– a) be a risk factor for disease among the non-exposed;
– b) be associated with the exposure variable; and
– c) not be an intermediate step in the “causal pathway”

Effect modification: contrast with confounding
• Effect modification
– A more detailed description of the “true” relationship
between the exposure and the outcome
– Effect modification is a finding to be reported (even
celebrated), not a bias to be eliminated
– Effect modification is a “natural phenomenon” that
exists independently of the study design
– The presence and interpretation of effect modification
depends upon the choice of effect measure (ratio vs.
difference)

Some lingo
• Covariate
– Confounder, potential confounder
– Effect modification, interaction
– Intermediate variable

Effect modification: contrast with confounding
• Note that for any association under study, a given factor
may be:
– Both a confounder and an effect modifier or
– A confounder but not an effect modifier or
– An effect modifier but not a confounder or
– Neither

Examples of confounding/effect modification
Level 1   Level 2   Crude/collapsed/combined   Uniform estimate (ORMH)   Confounding        Interaction
                    (“unadjusted”)             (“adjusted”)              present            present
4.0       4.0       4.0                        4.0                       NO                 NO
4.0       0.25      1.0                        1.0                       NO                 YES
1.0       1.0       8.4                        1.0                       YES                NO
4.0       0.25      1.0                        2.0                       YES (?relevance)   YES

Effect modification: test of homogeneity
• Null hypothesis: The individual stratified estimates of the effect do not
differ from some uniform estimate of effect (such as a Mantel Haenszel
estimator)
• Notation:
– $\chi^2_{(N-1)}$ is chi-square with (N-1) degrees of freedom;
– N is the number of strata (N=2 in our smoking/matches example);
– $\ln \hat{R}_i$ is the natural logarithm of the estimated (hence the “^”) effect
measure for each stratum (ORi in our example);
– $\ln \hat{R}$ is the natural logarithm of the uniform effect estimate (e.g. ORMH in
our example, written $R_{MH}$ in the formula; the computer will use the maximum likelihood estimate)
• One formula to test homogeneity:
$$\chi^2_{(N-1)} = \sum_{i=1}^{N} \frac{[\ln(\hat{R}_i) - \ln(R_{MH})]^2}{\mathrm{Var}[\ln(\hat{R}_i)]}$$
JC: Comment on choice of significance level for test of homogeneity
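The homogeneity statistic can be sketched in Python for stratified 2x2 tables. Several choices here are assumptions rather than anything stated in the lecture: the variance of each ln(OR) is approximated by the Woolf formula 1/a + 1/b + 1/c + 1/d, ORMH is used as the uniform estimate, and the p-value helper only handles the two-stratum case (1 degree of freedom). The second data set is hypothetical.

```python
import math

def mh_odds_ratio(strata):
    """Mantel-Haenszel OR for a list of (a, b, c, d) strata."""
    num = sum(a * d / (a + b + c + d) for a, b, c, d in strata)
    den = sum(b * c / (a + b + c + d) for a, b, c, d in strata)
    return num / den

def homogeneity_chi2(strata):
    """Return (statistic, degrees of freedom) for the test of homogeneity.

    Var[ln(OR_i)] uses the Woolf approximation (an assumption).
    """
    ln_uniform = math.log(mh_odds_ratio(strata))
    statistic = 0.0
    for a, b, c, d in strata:
        ln_or_i = math.log((a * d) / (b * c))
        var_ln_or_i = 1 / a + 1 / b + 1 / c + 1 / d
        statistic += (ln_or_i - ln_uniform) ** 2 / var_ln_or_i
    return statistic, len(strata) - 1

def p_value_one_df(statistic):
    """Upper-tail p-value of a chi-square with 1 df (two strata only)."""
    return math.erfc(math.sqrt(statistic / 2))

# Lecture example: both stratum ORs equal ORMH, so the statistic is zero.
lecture = [(810, 270, 90, 30), (10, 70, 90, 630)]
stat_lecture, df_lecture = homogeneity_chi2(lecture)
print(stat_lecture, df_lecture, p_value_one_df(stat_lecture))

# Hypothetical strata with ORs of 4.0 and 0.25 (clear heterogeneity).
hetero = [(40, 20, 20, 40), (20, 40, 40, 20)]
stat_hetero, df_hetero = homogeneity_chi2(hetero)
print(round(stat_hetero, 2), df_hetero, p_value_one_df(stat_hetero))
```

For more than two strata the statistic is computed the same way, but the p-value would need a general chi-square distribution function, which this sketch leaves out.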

Paradox
• If effect modification is present, a uniform estimator of
effect (such as ORMH) cannot (or at least should not) be
reported.
• However, in order to determine if effect modification is
present, it is necessary to calculate the value of a uniform
estimator of effect (such as ORMH) because it is needed in
the calculation of the test of homogeneity.

Effect modification: test of homogeneity (or
is it heterogeneity?)
• Comments
– If the test of homogeneity is “significant” (=“reject homogeneity”)
this is evidence that there is heterogeneity (i.e. no homogeneity)
and that effect modification may be present.
• (Null hypothesis: The individual stratified estimates of the
effect do not differ from some uniform estimate of effect)
– The choice of a significance level (e.g. p < 0.05) is somewhat open
to interpretation.
• One “conservative” approach, because of inherent limitations in
the power of the test of homogeneity, is to treat the data as if
interaction is present for p < 0.20.
• In other words, one would rather err on the side of assuming
that interaction is present (and reporting the stratified estimates
of effect) than on reporting a uniform estimate that may not be
true across strata.

Additive versus multiplicative scale effect
modification
● Notation: RXZ is the risk at the given levels of X and Z (e.g. R10 is the risk when X=1 and Z=0)
● No additive interaction if (R11 – R01) = (R10 – R00)
○ Rewrite as: (R11-R01)-(R10-R00)=0
● In words: Difference in risk for (X=1 vs. X=0) when Z=1 is
equal to difference in risk for (X=1 vs. X=0) when Z=0
● Note: the values R11, R10, etc. are risks (not counts)

Additive versus multiplicative scale effect
modification
● Notation: RXZ is the risk at the given levels of X and Z (e.g. R10 is the risk when X=1 and Z=0)
● No multiplicative interaction if (R11/R01)=(R10/R00)
○ Rewrite as: (R11/R01)/(R10/R00)=1
● In words: Ratio of risks/rates when X=1 vs. X=0 when
Z=1 is equal to ratio of risks/rates when X=1 vs. X=0
when Z=0

Effect modification is scale-dependent
• Evidence for effect modification/statistical interaction
if the RR or the AR differs between two groups
• However, effect modification/statistical interaction is
scale-dependent
– If you do not have interaction on the additive scale (AR is
homogeneous) then you will have interaction on the multiplicative
scale (RR must be heterogeneous), provided that both the exposure
and the third variable affect risk
– If you do not have interaction on the multiplicative scale (RR is
homogeneous) then you will have interaction on the additive scale
(AR must be heterogeneous), under the same condition
– Note: It is common to have evidence of interaction on both
scales.

Example
● No additive scale interaction if (R11-R01)-(R10-R00)=0
● No relative scale interaction if (R11/R01)/(R10/R00)=1
● Additive scale: (60-20) - (50-10) = 0
○ Interaction not present on the additive scale
● Relative scale: (60/20) / (50/10)=0.6
○ Interaction present on the relative scale
       Z=1   Z=0
X=1     60    50
X=0     20    10

Example
● No additive scale interaction if (R11-R01)-(R10-R00)=0
● No relative scale interaction if (R11/R01)/(R10/R00)=1
● Additive scale: (60-20) - (30-10) = 20
○ Interaction present on the additive scale
● Relative scale: (60/20) / (30/10)=1
○ Interaction not present on the relative scale
       Z=1   Z=0
X=1     60    30
X=0     20    10
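Both examples can be checked with one small function that returns the additive and the multiplicative contrast. This is an illustrative sketch; the function name and argument order are assumptions, with arguments following the RXZ notation (first digit X, second digit Z).

```python
def interaction_contrasts(r11, r10, r01, r00):
    """Return (additive contrast, multiplicative contrast) for risks R_XZ.

    additive       = (R11 - R01) - (R10 - R00); 0 means no additive interaction
    multiplicative = (R11 / R01) / (R10 / R00); 1 means no multiplicative interaction
    """
    additive = (r11 - r01) - (r10 - r00)
    multiplicative = (r11 / r01) / (r10 / r00)
    return additive, multiplicative

# First example: X=1 row is (60, 50), X=0 row is (20, 10).
add_1, mult_1 = interaction_contrasts(r11=60, r10=50, r01=20, r00=10)
print(add_1, round(mult_1, 2))

# Second example: X=1 row is (60, 30), X=0 row is (20, 10).
add_2, mult_2 = interaction_contrasts(r11=60, r10=30, r01=20, r00=10)
print(add_2, round(mult_2, 2))
```

The first table shows no interaction on the additive scale but interaction on the relative scale; the second shows the reverse.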

Logistic Regression
(time permitting)

Confounding: smoking, matches, and lung cancer
• ORpooled = 21.0 (16.3, 27.1)
• ORmatches = 21.0 (10.5, 46.2)
• ORno matches = 21.0 (12.9, 34.7)
• Discuss your intuitions about the 95% CIs
Pooled        Cancer   No cancer
Smoking         900       300
No smoking      100       700
Matches       Cancer   No cancer
Smoking         810       270
No smoking       10        70
No matches    Cancer   No cancer
Smoking          90        30
No smoking       90       630

A brief introduction to logistic regression
Let X1 = smoking (1=yes; 0=no)
Let X2 = matches (1=yes; 0=no)
Let Cancer = cancer (1=yes; 0=no)
Recall earlier tables:
Collapsed (OR=21.0)   Cancer=1   Cancer=0
X1=1                     900        300
X1=0                     100        700
X2=1 (OR=21.0)        Cancer=1   Cancer=0
X1=1                     810        270
X1=0                      10         70
X2=0 (OR=21.0)        Cancer=1   Cancer=0
X1=1                      90         30
X1=0                      90        630
Conclusions: No confounding by matches of the relationship
between smoking and lung cancer; no effect modification by
matches of the relationship between smoking and lung cancer

Data structure for computer analysis
• Most computer programs would want to see the data for the individual subjects in the study in the following form:

Subject ID   X1   X2   Cancer   How many?
A             1    1     1
B             1    1     0
C             0    1     1
D             0    1     0
E             1    0     1
F             1    0     0
G             0    0     1
H             0    0     0

Data structure for computer analysis
• Most computer programs would want to see the data for the individual subjects in the study in the following form:

Subject ID   X1   X2   Cancer   How many?
A             1    1     1      810 of these
B             1    1     0      270 of these
C             0    1     1       10 of these
D             0    1     0       70 of these
E             1    0     1       90 of these
F             1    0     0       30 of these
G             0    0     1       90 of these
H             0    0     0      630 of these
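
A minimal Python sketch of this layout, assuming one record per subject is wanted: each of the eight covariate patterns (A to H) is repeated as many times as its count. The dictionary keys and field names are illustrative.

```python
# (X1 = smoking, X2 = matches, cancer) -> number of subjects, from the table above
pattern_counts = {
    (1, 1, 1): 810,
    (1, 1, 0): 270,
    (0, 1, 1): 10,
    (0, 1, 0): 70,
    (1, 0, 1): 90,
    (1, 0, 0): 30,
    (0, 0, 1): 90,
    (0, 0, 0): 630,
}

# Expand the grouped counts into one row per individual subject
rows = []
for (x1, x2, cancer), n in pattern_counts.items():
    rows.extend({"x1": x1, "x2": x2, "cancer": cancer} for _ in range(n))

print("subjects:", len(rows))
print("cancer cases:", sum(row["cancer"] for row in rows))
```

Many programs also accept the grouped form directly, with the count supplied as a frequency weight, which avoids storing repeated rows.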

The basic logistic equation for this problem
• ln (odds of disease) = a + b1 X1 + b2 X2 + b3 X1 X2
  = a + b1 (smoking) + b2 (matches) + b3 (smoking)(matches)
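
As a rough illustration of how this equation is evaluated, the Python sketch below computes ln(odds) for given values of X1 and X2 and converts it back to odds and to a probability. The coefficient values are hypothetical placeholders, not estimates from the data. Note also that in a case-control study the intercept reflects the sampling of cases and controls, so the converted probability should not be read as a population risk.

```python
import math


def ln_odds(a, b1, b2, b3, x1, x2):
    """ln(odds) = a + b1*X1 + b2*X2 + b3*X1*X2."""
    return a + b1 * x1 + b2 * x2 + b3 * x1 * x2


def probability_from_ln_odds(value):
    """Convert ln(odds) to a probability: p = odds / (1 + odds)."""
    odds = math.exp(value)
    return odds / (1.0 + odds)


# Hypothetical coefficients, chosen only to illustrate the arithmetic
example = ln_odds(a=-2.0, b1=1.0, b2=0.5, b3=0.0, x1=1, x2=1)
print("ln(odds):", example)
print("odds:", math.exp(example))
print("probability:", probability_from_ln_odds(example))
```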

Solving a logistic equation
ln (odds) = a + b1 X1 + b2 X2 + b3 X1 X2
• When X1 = 0 and X2 = 0, solve for “a”
• ln (odds) = a = ln ( ____ ) = ____
• a = ____
• So now: ln (odds) = ____

X2=1        Cancer=1   Cancer=0        X2=0        Cancer=1   Cancer=0
X1=1           810        270          X1=1            90         30
X1=0            10         70          X1=0            90        630
OR=21.0                                OR=21.0

Solving a logistic equation
ln (odds) = a + b1 X1 + b2 X2 + b3 X1 X2
• When X1 = 0 and X2 = 0, solve for “a”
• ln (odds) = a = ln (90/630) = -1.946
• a = -1.946
• So now: ln (odds) = -1.946 + b1 X1 + b2 X2 + b3 X1 X2

Solving a logistic equation (cont.)
• When X1 = 1 and X2 = 0, solve for b1
• ln (odds) = ____
• b1 = ____

X2=1        Cancer=1   Cancer=0        X2=0        Cancer=1   Cancer=0
X1=1           810        270          X1=1            90         30
X1=0            10         70          X1=0            90        630
OR=21.0                                OR=21.0

ln (odds) = a + b1 X1 + b2 X2 + b3 X1 X2
• When X1 = 1 and X2 = 0, solve for b1
• ln (odds) = ln (90/30) = 1.099 = -1.946 + b1
• b1 = 3.045
• So now: ln (odds) = -1.946 + 3.045 X1 + b2 X2 + b3 X1 X2

ln (odds) = a + b1 X1 + b2 X2 + b3 X1 X2
• When X1 = 0 and X2 = 1, solve for b2:
• ln (odds) = ln ( ____ ) = ____
• b2 = ____

X2=1        Cancer=1   Cancer=0        X2=0        Cancer=1   Cancer=0
X1=1           810        270          X1=1            90         30
X1=0            10         70          X1=0            90        630
OR=21.0                                OR=21.0

ln (odds) = a + b1 X1 + b2 X2 + b3 X1 X2
• When X1 = 0 and X2 = 1, solve for b2:
• ln (odds) = ln (10/70) = -1.946 = -1.946 + 0 + b2 X2 + 0
• b2 = 0
• So now: ln (odds) = -1.946 + 3.045 X1 + 0 + b3 X1 X2

ln (odds) = a + b1 X1 + b2 X2 + b3 X1 X2
• When X1 = 1 and X2 = 1 then:
• ln (odds) = ____
• ln (odds) = ____
• Solve for b3
• ln (odds) = ____
• b3 = ____

X2=1        Cancer=1   Cancer=0        X2=0        Cancer=1   Cancer=0
X1=1           810        270          X1=1            90         30
X1=0            10         70          X1=0            90        630
OR=21.0                                OR=21.0

ln (odds) = a + b1 X1 + b2 X2 + b3 X1 X2
• When X1 = 1 and X2 = 1 then:
• ln (odds) = -1.946 + b1 + b2 + b3
• ln (odds) = -1.946 + 3.045 + 0 + b3
• Solve for b3
• ln (odds) = ln (810/270) = 1.099 = -1.946 + 3.045 + b3
• b3 = 0
• So now: ln (odds) = -1.946 + 3.045 X1 + 0 + 0
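
The four steps above can be collected into one short Python sketch. Because the model has four coefficients and the data have four covariate patterns, each coefficient follows directly from the observed cell odds. The dictionary layout and function name are illustrative assumptions.

```python
import math

# cells[(x1, x2)] = (cancer, no cancer), taken from the stratified tables
cells = {
    (0, 0): (90, 630),
    (1, 0): (90, 30),
    (0, 1): (10, 70),
    (1, 1): (810, 270),
}


def observed_ln_odds(x1, x2):
    """Natural log of the observed odds of cancer in one cell."""
    cases, noncases = cells[(x1, x2)]
    return math.log(cases / noncases)


a = observed_ln_odds(0, 0)                 # X1 = 0, X2 = 0
b1 = observed_ln_odds(1, 0) - a            # X1 = 1, X2 = 0
b2 = observed_ln_odds(0, 1) - a            # X1 = 0, X2 = 1
b3 = observed_ln_odds(1, 1) - a - b1 - b2  # X1 = 1, X2 = 1

print(f"a = {a:.3f}, b1 = {b1:.3f}, b2 = {abs(b2):.3f}, b3 = {abs(b3):.3f}")
```

The values of b2 and b3 may differ from zero by floating-point rounding only, which is why the print statement shows their absolute values.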

ln (odds) = a + b1 X1 + b2 X2 + b3 X1 X2
• This simplifies (from the earlier calculations) to:
– ln (odds) = -1.946 + 3.045 X1 + 0 + 0
• One can now use the logistic equation to efficiently describe relationships in the table
• Calculate the ln(odds) for a smoker who uses matches: ln(odds) = ____
• Calculate the ln(odds) for a smoker who doesn’t use matches: ln(odds) = ____
• Now calculate the odds ratio for smokers vs. non-smokers among match users (matches+)
• At home, calculate the odds ratio for smokers vs. non-smokers among non-users of matches (matches-)

ln (odds) = a + b1 X1 + b2 X2 + b3 X1 X2
• This simplifies (from the earlier calculations) to:
ln (odds) = -1.946 + 3.045(X1) + 0(X2) + 0(X1 X2)
• One can now use the logistic equation to efficiently describe relationships in the table
• Calculate the ln(odds) for a smoker who uses matches (X1 = 1 and X2 = 1):
ln(odds) = -1.946 + 3.045 = 1.099
• Calculate the ln(odds) for a smoker who doesn’t use matches (X1 = 1 and X2 = 0):
ln(odds) = -1.946 + 3.045 = 1.099
• Now calculate the odds ratio for smokers vs. non-smokers among match users (matches)
• At home, calculate the odds ratio for smokers vs. non-smokers among non-users of matches (no matches)

Logistic Regression
Using the logistic model developed in class for the matches-smoking-lung cancer data (stratified by matches), evaluate the risk of lung cancer (expressed as an odds ratio) for:
1. (in-class) A smoker who uses matches vs. a non-smoker who uses matches.
2. (at home) A smoker who uses matches vs. a non-smoker who does not use matches.
SEPARATE ASSIGNMENT
Develop a logistic model for the matches-smoking-lung cancer data (stratified by smoking status). Use this model to evaluate the risk of lung cancer (expressed as an odds ratio) for:
1. (at home) A user of matches who smokes vs. a non-user of matches who smokes.
2. (at home) A smoker who uses matches vs. a non-smoker who uses matches. Is this result consistent with the one you arrived at in the class example above?

Find OR for smokers (who use matches) vs. non-smokers (who use matches)
For smokers who use matches: X1 = 1, X2 = 1
For non-smokers who use matches: X1 = 0, X2 = 1
From prior slides we determined that: ln (odds) = -1.946 + 3.045 (X1)

For smokers who use matches (X1 = 1; X2 = 1): ln (odds) = -1.946 + 3.045 (1) = 1.099
For non-smokers who use matches (X1 = 0; X2 = 1): ln (odds) = -1.946 + 0 + 0 + 0 = -1.946
We want to solve:
ln OR = 1.099 - (-1.946) = 3.045
e^(ln OR) = OR = e^3.045 = 21.0
Therefore, the odds ratio (determined using logistic regression) comparing smokers using matches to non-smokers using matches is 21.0. This agrees with the stratified data presented earlier.
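
The same calculation as a short Python sketch, using the rounded coefficients from the slides. The constant and function names are illustrative.

```python
import math

# Rounded coefficients from the slides: a, b1 (smoking), b2 (matches), b3 (interaction)
A, B1, B2, B3 = -1.946, 3.045, 0.0, 0.0


def ln_odds(x1, x2):
    """ln(odds) = a + b1*X1 + b2*X2 + b3*X1*X2."""
    return A + B1 * x1 + B2 * x2 + B3 * x1 * x2


# Smokers who use matches vs. non-smokers who use matches
ln_or_matches = ln_odds(1, 1) - ln_odds(0, 1)
or_matches = math.exp(ln_or_matches)

print(f"ln OR = {ln_or_matches:.3f}, OR = {or_matches:.1f}")
```

Changing the arguments of `ln_odds` gives any other comparison, because the difference of two ln(odds) values is the ln of the corresponding odds ratio.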

Matches-smoking-lung cancer data
• OR pooled = 21.0 (16.3, 27.1)
• OR matches = 21.0 (10.5, 46.2)
• OR no matches = 21.0 (12.9, 34.7)
• Discuss your intuitions about the 95% CIs

Pooled        Cancer   No cancer
Smoking         900       300
No Smoking      100       700

Matches       Cancer   No cancer
Smoking         810       270
No Smoking       10        70

No matches    Cancer   No cancer
Smoking          90        30
No Smoking       90       630

Some concluding comments on logistic regression
• Interpretation of the final logistic equation for these data:
ln (odds of disease) = a + b1 (smoking) + b2 (matches) + b3 (smoking)(matches)
ln (odds) = -1.946 + 3.045(smoking) + 0(matches) + 0(smoking)(matches)
• This equation describes the data whether they are stratified by matches or by smoking.
• The logistic equation can adjust for multiple variables simultaneously.
• The coefficients of the equation are estimated by maximum likelihood techniques.
• This technique is very widely used in epidemiologic (and other) applications when the outcome variable of interest is dichotomous.
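
To give a feel for what maximum likelihood estimation does, the sketch below fits the smoking-only model ln(odds) = a + b1 X1 to the collapsed table using Newton-Raphson iterations. This is an illustrative assumption about the algorithm: it is one common way to maximize the logistic likelihood, not necessarily the routine any particular package uses. The function and variable names are hypothetical.

```python
import math

# Collapsed table: (X1, cancer cases, total subjects)
groups = [(1, 900, 1200), (0, 100, 800)]


def fit_logistic(groups, iterations=25):
    """Newton-Raphson maximum likelihood fit of ln(odds) = a + b*x to grouped data."""
    a, b = 0.0, 0.0
    for _ in range(iterations):
        u_a = u_b = 0.0            # score (gradient of the log-likelihood)
        i_aa = i_ab = i_bb = 0.0   # information matrix entries
        for x, cases, n in groups:
            p = 1.0 / (1.0 + math.exp(-(a + b * x)))
            resid = cases - n * p  # observed minus expected cases
            w = n * p * (1.0 - p)
            u_a += resid
            u_b += x * resid
            i_aa += w
            i_ab += x * w
            i_bb += x * x * w
        det = i_aa * i_bb - i_ab * i_ab
        # Newton step: (a, b) += inverse(information) * score
        a += (i_bb * u_a - i_ab * u_b) / det
        b += (i_aa * u_b - i_ab * u_a) / det
    return a, b


a_hat, b1_hat = fit_logistic(groups)
print(f"a = {a_hat:.3f}, b1 = {b1_hat:.3f}, OR = {math.exp(b1_hat):.1f}")
```

With a single binary predictor the fitted values reproduce the hand calculation: the intercept is the ln(odds) among non-smokers and exp(b1) is the crude odds ratio.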

Some concluding comments on logistic regression
• Comments
– Having multiple strata (and how this technique makes that possible)
– Test of homogeneity (b3)
– Maximum likelihood estimation of the coefficients
• Modifications of logistic regression exist for coping with
– Outcome variables with multiple levels = polytomous logistic regression
– Studies in which matching was used = conditional logistic regression

. use http://www.stata-press.com/data/r8/lbw

              storage  display
variable name   type   format    variable label
-------------------------------------------------------------------------------
id              int    %8.0g     identification code
low             byte   %8.0g     birth weight<2500g
age             byte   %8.0g     age of mother
lwt             int    %8.0g     weight at last menstrual period
race            byte   %8.0g     race
smoke           byte   %8.0g     smoked during pregnancy
ptl             byte   %8.0g     premature labor history (count)
ht              byte   %8.0g     has history of hypertension
ui              byte   %8.0g     presence, uterine irritability
ftv             byte   %8.0g     number of visits to physician during 1st trimester
bwt             int    %8.0g     birth weight (grams)

Special (and very useful) STATA command “xi” (= “interaction expansion”)
• xi: logistic low age lwt i.race smoke ptl ht ui
• In this example, a variable named “race” has three levels (e.g. white/Hispanic/black) that might be coded as “0=white”; “1=Hispanic”; “2=black”
• The combined use of xi and i.race directs STATA to create indicator variables for the levels of race and compare each of them to the first (lowest-coded) level, which serves as the reference. This can be a HUGE time-saver (it avoids the user having to manually recode such variables)!
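
The recoding that xi performs can be pictured with a small Python sketch of indicator (dummy) coding. This is only an illustration of the idea, not of STATA's internals; the function name, the column names, and the five example values are hypothetical.

```python
def indicator_columns(values, reference, prefix="race"):
    """Build one 0/1 indicator column per level, omitting the reference level."""
    levels = sorted(set(values) - {reference})
    return {
        f"{prefix}_{level}": [1 if value == level else 0 for value in values]
        for level in levels
    }


# Hypothetical race codes for five subjects: 0=white, 1=Hispanic, 2=black
race = [0, 1, 2, 1, 0]
dummies = indicator_columns(race, reference=0)

for name, column in dummies.items():
    print(name, column)
```

Each indicator's coefficient in the logistic model is then the ln(odds ratio) for that level compared with the reference level, here white (code 0).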

Assignments
• Write the logistic model describing these data (next slide).
• What is the risk of low birth weight (LBW), expressed as an odds ratio, for a smoker, adjusted for all other variables?
• How can the 95% CI be determined?
• What is the risk of LBW for a Hispanic baby (compared to a white baby)?
• What is the risk of LBW for a black baby (compared to a Hispanic baby)?
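
A hedged sketch of the arithmetic these questions call for. An odds ratio is exp(coefficient), an approximate (Wald) 95% CI is exp(coefficient ± 1.96 × standard error), and a comparison between two non-reference levels uses the difference of their coefficients. All numbers below are hypothetical placeholders, NOT values from the lbw output.

```python
import math


def odds_ratio_from_coef(coef, se, z=1.96):
    """Odds ratio and Wald confidence interval from a logistic coefficient."""
    return math.exp(coef), math.exp(coef - z * se), math.exp(coef + z * se)


# Hypothetical coefficient and standard error for smoking (not from the lbw data)
or_hat, lower, upper = odds_ratio_from_coef(coef=0.9, se=0.4)
print(f"OR = {or_hat:.2f}, 95% CI ({lower:.2f}, {upper:.2f})")

# Hypothetical coefficients for two race indicators, each relative to the reference level
b_hispanic, b_black = 0.4, 1.2
or_black_vs_hispanic = math.exp(b_black - b_hispanic)
print(f"OR black vs. Hispanic = {or_black_vs_hispanic:.2f}")
```

A confidence interval for the black vs. Hispanic comparison would also need the covariance between the two coefficients, which this sketch does not cover.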

Discuss intercept
