Threats to Validity from Confounding and
Effect Modification
•  Overview: Random vs. systematic error
•  Confounding
•  Effect Modification
•  Logistic regression (time permitting)
•  Special thanks for some of the materials in these
lectures:
–  Professor Jen Ahern (UCB)
–  Professor Madhu Pai (McGill; a former
250b GSI)
2014 Page 1
The cardinal rule of epidemiology
• Remember that all results based on epidemiology
studies are likely to be …
2014 Page 2
The cardinal rule of epidemiology (continued)
• WRONG…
– unless proper care has been taken to eliminate
all sources of error in the estimate (…and
sometimes even then the results will be
wrong because of unknown sources of error)
2014 Page 3
Example: Confounding
• A colleague with outside funding believes that cigarette smoke
is not a “cause” (in any sense) of lung cancer but that exposure
to matches (yes, matches) is the cause. This colleague has
conducted a large case control study to test the null hypothesis:
H0: “Matches are not associated with lung cancer”.
• What’s the rationale (in the Popperian sense) for stating the null
hypothesis rather than the alternative:
HA: “Matches are associated with lung cancer”?
• What does the colleague hope to do (in terms of hypothesis
testing)?
• What do you think of the term “associated”? Would it be better
to write “a cause of”?
2014 Page 4
• “We can never finally prove our scientific
theories, we can merely (provisionally)
confirm or (conclusively) refute them.”
– Karl Popper
Sir Karl Raimund Popper CH FBA FRS (28 July 1902 – 17 September 1994) was an Austrian-British
philosopher and professor at the London School of Economics. He is generally regarded as
one of the greatest philosophers of science of the 20th century. Popper is known for his rejection of the
classical inductivist views on the scientific method, in favour of empirical falsification.
(wikipedia.com)
2014 Page 5
Confounding: smoking, matches,
and lung cancer
• Your colleague has located 1000 cases of lung cancer, of
whom 820 carry matches.
• Among 1000 reference patients (selected randomly from a
population with recently taken normal chest x-rays), 340
carry matches.
• Strengths of the reference selection process? Weaknesses?
• Describe the relationship between matches and lung cancer
in your colleague’s data.
• Would you like to analyze the data in any other fashion?
2014 Page 6
Confounding: smoking, matches,
and lung cancer
• Odds ratio = (820 * 660) / (180 * 340)
• OR = 8.8
• 95% CI (7.2, 10.9)

               Cancer   No cancer
Matches          820       340
No matches       180       660
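
The arithmetic on this slide can be checked with a short script. This is a minimal sketch: the slide does not say how the 95% CI was obtained, so the Woolf (log-based) interval used here is an assumption, and the function name `odds_ratio_ci` is made up for illustration.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio with a Woolf (log-based) confidence interval.

    a = exposed cases,   b = exposed non-cases,
    c = unexposed cases, d = unexposed non-cases.
    The Woolf interval is an assumption; the slide does not name a method.
    """
    or_hat = (a * d) / (b * c)
    # Standard error of ln(OR) under the Woolf approximation
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(or_hat) - z * se_log_or)
    upper = math.exp(math.log(or_hat) + z * se_log_or)
    return or_hat, lower, upper


# Matches vs. lung cancer, pooled table from the slide
or_hat, lower, upper = odds_ratio_ci(820, 340, 180, 660)
print(f"OR = {or_hat:.2f}, 95% CI ({lower:.1f}, {upper:.1f})")
```

With the counts above, the sketch should agree with the OR and interval quoted on the slide after rounding.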
2014 Page 7
Confounding: smoking, matches,
and lung cancer
• You decide to look at the relationship between matches
and lung cancer in the smokers separately from the non-
smokers.
• You find that among the 1000 cases, 900 are smokers and
810 (of the 900) carry matches
• Among the 1000 reference patients, 300 are smokers and
270 (of the 300) carry matches
• Calculate the relevant measure(s) of effect.
• What should your colleague do about future funding?
2014 Page 8
Confounding: smoking, matches, and lung
cancer
• ORpooled = 8.84 (7.2, 10.9)
• ORsmokers = 1.0 (0.6, 1.5)
• ORnonsmokers = 1.0 (0.5, 2.0)

Pooled         Cancer   No cancer
Matches          820       340
No Matches       180       660

Smokers        Cancer   No cancer
Matches          810       270
No Matches        90        30

Non-smokers    Cancer   No cancer
Matches           10        70
No Matches        90       630
2014 Page 9
Confounding: smoking, matches,
and lung cancer
• To be complete, you also decide to examine the
relationship between smoking and lung cancer.
• What tables should you construct to do this?
2014 Page 10
Confounding: smoking, matches, and lung
cancer
• ORpooled = 21.0 (16.3, 27.1)
• ORmatches = 21.0 (10.5, 46.2)
• ORno matches = 21.0 (12.9, 34.7)
• Discuss your intuitions about the 95% CIs

Pooled         Cancer   No cancer
Smoking          900       300
No Smoking       100       700

Matches        Cancer   No cancer
Smoking          810       270
No Smoking        10        70

No matches     Cancer   No cancer
Smoking           90        30
No Smoking        90       630
2014 Page 11
Confounder?
[Diagram: Exposure → Disease, with a possible confounder linked to both; compare the unadjusted RR with the adjusted RR]
2014 Page 12
BMJ 2004;329:868-869 (16 October)
Why is confounding so important in
epidemiology?
● BMJ Editorial: “The scandal of poor epidemiological
research” [16 October 2004]
● “Confounding, the situation in which an apparent
effect of an exposure on risk is explained by its
association with other factors, is probably the
most important cause of spurious associations in
observational epidemiology.”
2014 Page 13
Overview
● Causality is the central concern of epidemiology
● Confounding is the central concern with establishing
causality
● Confounding can be understood using multiple
different approaches
● A strong understanding of various approaches to
confounding and its control is essential for all those
who engage in health research
2014 Page 14
Adapted from: Maclure M, Schneeweiss S. Epidemiology 2001;12:114-122.
Confounding is one of the key biases in identifying
causal effects
[Diagram: the causal RR (“truth”) is separated from the observed RR (association) by:]
Causal Effect
Random Error
Confounding
Information bias (misclassification)
Selection bias
Bias in inference
Reporting & publication bias
Bias in knowledge use
Confounding: 4 ways to understand it!
1. “Mixing of effects”
2. “Classical” approach based on a priori
criteria
3. Collapsibility and data-based criteria
4. “Counterfactual” and non-comparability
approaches
2014 Page 16
Rothman KJ. Epidemiology. An introduction. Oxford: Oxford University Press, 2002
First approach:
Confounding: mixing of effects
● “Confounding is confusion, or mixing, of
effects; the effect of the exposure is mixed
together with the effect of another variable,
leading to bias” - Rothman, 2002
Latin: “confundere” is to mix together
2014 Page 17
Example
Association between birth order and Down syndrome
Data from Stark and Mantel (1966) Source: Rothman 2002
2014 Page 18
Association between maternal age and Down syndrome
Data from Stark and Mantel (1966) Source: Rothman 2002
2014 Page 19
Association between maternal age and Down syndrome, stratified by
birth order
Data from Stark and Mantel (1966) Source: Rothman 2002
2014 Page 20
Mixing of Effects: the water pipes analogy
Exposure
Adapted from Jewell NP. Statistics for Epidemiology. Chapman & Hall,
2003
Outcome
Confounder
Mixing of effects – cannot separate the effect of exposure from that of
confounder
Exposure and disease
share a common cause (‘parent’)
2014 Page 21
Mixing of Effects: “control” of the confounder
Exposure
Adapted from: Jewell NP. Statistics for Epidemiology. Chapman & Hall,
2003
Outcome
Confounder
Successful “control” of confounding (adjustment)
If the common cause (‘parent’)
is blocked, then the exposure –
disease association becomes
clearer
2014 Page 22
Second approach: “Classical” approach
based on a priori criteria
“Bias of the estimated effect of an exposure on an outcome due to
the presence of a common cause of the exposure and the
outcome” – Porta 2008
● A factor is a confounder if 3 criteria are met:
● a) a confounder must be causally or noncausally
associated with the exposure in the source population
(study base) being studied;
● b) a confounder must be a causal risk factor (or a
surrogate measure of a cause) for the disease in the
unexposed cohort; and
● c) a confounder must not be an intermediate cause (in
other words, a confounder must not be an intermediate
step in the causal pathway between the exposure and the
disease)
2014 Page 23
Exposure
E
Disease (outcome)
D
Confounder
C
Confounding Schematic
Szklo M, Nieto JF. Epidemiology: Beyond the basics. Aspen Publishers, Inc.,
2000. Gordis L. Epidemiology. Philadelphia: WB Saunders, 4th
Edition.
2014 Page 24
Exposure
E
Confounder
C
Intermediate cause
Disease
D
2014 Page 25
Exposure
E
Confounder
C
General idea: a confounder could be a ‘parent’ of the
exposure, but should not be a ‘daughter’ of the
exposure
Disease
D
2014 Page 26
Example of schematic (from Gordis)
2014 Page 27
Birth Order
E
Down Syndrome
D
Confounding factor:
Maternal Age
C
Confounding Schematic
2014 Page 28
HRT use Heart disease
Association between HRT and heart disease
Confounding factor:
SES
Are confounding criteria met?
2014 Page 29
BRCA1 gene Breast cancer
Confounding factor:
Age
x
Are confounding criteria met?
Should we adjust for age, when evaluating the association
between a genetic factor and risk of breast cancer?
No!
2014 Page 30
Sex with multiple partners Cervical cancer
Confounding factor:
HPV
Are confounding criteria met?
2014 Page 31
Sex with
multiple
partners
HPV Cervical
cancer
What if this was the underlying causal
mechanism?
2014 Page 32
Obesity Mortality
Are confounding criteria met?
Confounding factor:
Hypertension
2014 Page 33
Obesity Hypertension Mortality
What if this was the underlying causal
mechanism?
2014 Page 34
Direct vs indirect effects
Obesity Hypertension Mortality
Obesity
Indirect effect
Hypertension Mortality
Direct effect
Direct effect is the portion of the total effect that does not act via an intermediate cause
Indirect effect
2014 Page 35
Hernan MA, et al. Causal knowledge as a prerequisite for confounding evaluation: an
application to birth defects epidemiology. Am J Epidemiol 2002;155(2):176-84.
Simple causal
graphs
E DC
Maternal age (C) can confound the association
between multivitamin use (E) and the risk of certain
birth defects (D)
2014 Page 36
Complex causal graphs
Hernan MA, et al. Causal knowledge as a prerequisite for confounding evaluation:
an application to birth defects epidemiology. Am J Epidemiol 2002;155(2):176-84.
E DC
U
History of birth defects (C) may increase the chance of
periconceptional vitamin intake (E). A genetic factor (U) could
have been the cause of previous birth defects in the family,
and could again cause birth defects in the current pregnancy
2014 Page 37
Smoking
A
E
Calcium
D
Bone
fractures
C
BMI
supplementation
U
Physical
Activity
B
Source: Hertz-Picciotto
More complicated causal graphs!
2014 Page 38
The ultimate complex causal
graph!
A PowerPoint diagram meant to portray the
complexity of American strategy in
Afghanistan!
2014 Page 39
Third approach: Collapsibility and data-
based approaches
● According to this definition, a factor is a confounding
variable if
● a) the effect measure is homogeneous across the strata
defined by the confounder and
● b) the crude and common stratum-specific (adjusted) effect
measures are unequal (this is called “lack of collapsibility”)
● Usually evaluated using 2x2 tables, and simple
stratified analyses to compare crude effects with
adjusted effects
“Collapsibility is equality of stratum-specific measures of effect with the crude
(collapsed), unstratified measure” Porta, 2008, Dictionary
2014 Page 40
Crude vs. Adjusted Effects
● Crude: does not take into account the effect of the
confounding variable
● Adjusted: accounts for the confounding variable(s)
(what we get by pooling stratum-specific effect
estimates)
● Generated using methods such as the Mantel-Haenszel
estimator
● Also generated using multivariate analyses (e.g. logistic
regression)
● Confounding is likely when:
● $RR_{crude} \neq RR_{adjusted}$
● $OR_{crude} \neq OR_{adjusted}$
2014 Page 41
Stratified Analysis
1. Start with the crude 2 x 2 table and calculate the crude OR (or RR).
2. Stratify by the confounder (Stratum 1, Stratum 2).
3. Calculate the OR for each stratum (OR1, OR2).
4. If the stratum-specific ORs are similar, calculate the adjusted OR or RR (e.g. MH).
5. If crude OR =/= adjusted OR, confounding is likely.
   If crude OR = adjusted OR, confounding is unlikely.
JC: introduce “test of homogeneity”
2014 Page 42
Examples: crude vs adjusted RR

Study   Crude RR   Stratum 1 RR   Stratum 2 RR   Adjusted RR   Confounding?
1         6.00        3.20           3.50           3.30
2         2.00        1.02           1.10           1.08
3         1.10        2.00           2.00           2.00
4         0.56        0.50           0.60           0.54
5         4.20        4.00           4.10           4.04
6         1.70        0.03           3.50
2014 Page 43
Maldonado & Greenland, Int J Epi 2002;31:422-29
Fourth approach:
Causality: counterfactual
model
● Ideal “causal contrast” between exposed and
unexposed groups:
● “A causal contrast compares disease frequency
under two exposure distributions, but in one target
population during one etiologic time period”
● If the ideal causal contrast is met, the observed
effect is the “causal effect”
2014 Page 44
What happens actually?
IDEAL: $RR_{causal} = I_{exp} / I_{unexp}$
ACTUAL: $RR_{assoc} = I_{exp} / I_{substitute}$
2014 Page 45
Maldonado & Greenland, Int J Epi 2002;31:422-29
Ideal counterfactual comparison to determine causal effects
Exposed cohort: $I_{exp}$
Counterfactual, unexposed cohort: $I_{unexp}$
$RR_{causal} = I_{exp} / I_{unexp}$
“A causal contrast compares disease frequency under two exposure distributions, but in one
target population during one etiologic time period”
“Initial conditions” are identical in
the exposed and unexposed groups
– because they are the same
population!
2014 Page 46
What happens actually?
Exposed cohort: $I_{exp}$
Counterfactual, unexposed cohort: $I_{unexp}$ (the counterfactual state is not observed)
Substitute, unexposed cohort: $I_{substitute}$
A substitute will usually be a population other than the target
population during the etiologic time period - INITIAL CONDITIONS
MAY BE DIFFERENT
2014 Page 47
53
Maldonado & Greenland, Int J Epi 2002;31:422-29
Counterfactual definition of confounding
● “Confounding is present if the substitute
population imperfectly represents what the
target would have been like under the
counterfactual condition”
● “An association measure is confounded (or biased
due to confounding) for a causal contrast if it does
not equal that causal contrast because of such an
imperfect substitution”
$RR_{causal} \neq RR_{assoc}$
2014 Page 48
Residual confounding
• Confounding can persist, even after adjustment
• Why?
– Not all confounders were adjusted for (unmeasured confounding)
– Some variables were actually not confounders!
– Confounders were measured with error (misclassification of
confounders)
– Categories of the confounding variable are improperly defined
(e.g. age categories were too broad)
2014 Page 49
Simulating the counter-factual comparison:
Experimental Studies: RCT
Randomization helps to make the groups “comparable” (i.e. similar
initial conditions) with respect to known and unknown confounders
Therefore confounding is unlikely at randomization - time t0
Eligible patients
Treatment
Randomization
Placebo
Outcomes
Outcomes
2014 Page 50
Confounding: Methods to control
or reduce confounding
• Methods used in study design to reduce confounding
– Randomization
– Restriction
– Matching
• Methods used in study analysis to reduce confounding
– Stratified analysis
– Multivariate analysis
2014 Page 51
Confounding: The use of randomization to
reduce confounding
• Randomization
– Useful only for intervention studies
– Definition: random assignment of study subjects to
exposure categories
– The special strength of randomization is its ability to
control/reduce the effect of confounding variables of
which the investigator is unaware
– If there is maldistribution of potentially confounding
variables after randomization (the reason for the classic
“Table I: Baseline characteristics” in the randomized trial)
then other confounding control options (see below) are
applied
2014 Page 52
Maldonado & Greenland, Int J Epi 2002;31:422-29
[Diagram: exposed cohort; counterfactual, unexposed cohort; substitute, unexposed cohort]
“Confounding is present if the substitute population imperfectly represents what
the target would have been like under the counterfactual condition”
2014 Page 53
Confounding: The use of restriction to
reduce confounding
• Confounding cannot occur if the distribution of the
potential confounding factors does not vary across exposure
or disease categories
– The implication is that an investigator may restrict
study subjects to only those falling within specific level(s)
of a confounding variable
• Extreme example: an investigator only selects
subjects of exactly the same age.
• Advantages of restriction
– straightforward, convenient, inexpensive
2014 Page 54
Confounding: The use of restriction to
reduce confounding (cont.)
• Disadvantages
– May limit number of eligible subjects
– Residual confounding may persist if restriction
categories not sufficiently narrow (e.g. “decade of age”
might be too broad)
– Not possible to evaluate the relationship of interest at
different levels of the confounder
• Question: How does restriction differ from matching?
2014 Page 55
Confounding: The use of matching to reduce
confounding
• Subjects with all levels of a potential confounder are
admitted into the study BUT the control/reference subjects
(either with respect to exposure in a cohort or disease in a
case-reference study) are chosen to have the same
distribution of the potential confounder
• The use of matching (may) also require special analysis
techniques (matched analyses and conditional logistic
regression)
2014 Page 56
• Disadvantages of matching
– Finding appropriate control/reference subjects may be
difficult and expensive and limit sample size
– Matching is most often used in case-reference (i.e.
case-control) studies because in a large cohort study the
cost of matching may be prohibitive
• Thus, in cohort studies it’s often cheaper to just
enroll available controls and use analytic methods
(below) to control confounding; this doesn’t apply
to computerized “free” data
2014 Page 57
Confounding: The use of matching to
reduce confounding (cont.)
• Disadvantages of matching (cont.)
– Confounding factor used to match subjects cannot be
itself evaluated with respect to the outcome/disease
– Obviously, matching does not control for confounding
by factors other than that used to match
– The use of matching makes the use of stratified analysis
(for the control of other potential but non-matched
factors) very difficult
• One way around this problem is the use of
conditional logistic regression but there is a large
reduction in “effective” sample size because only
discordant pairs are used.
2014 Page 58
• Advantages of matching
– Matching may be the only way to obtain sufficient
numbers of control/reference subjects with relevant
levels of the confounding factor(s)
– Example: controlling for “neighborhood” (and all that
it implies) by any approach other than matching is very
difficult
2014 Page 59
• Advantages of matching (cont.)
– Useful in very small studies in which chance
differences in confounding factors are likely to exist
between the study groups and other forms of control for
the confounders (such as stratification or multivariate
adjustment) are not possible (because of the limited
sample size)
– The full benefit of matching (in terms of the reduction
of confounding) is obtained only if the proper form of
matched analysis is used (to be reviewed later in the
course)
2014 Page 60
• Basic goal of stratification is to evaluate the relationship
between the predictor (“cause”) and outcome (“effect”)
variable in strata homogenous with respect to potentially
confounding variables
2014 Page 61
Confounding: The use of stratification to
reduce confounding
• For example, to examine the relationship between smoking
and lung cancer while controlling for the potentially
confounding effect of gender:
– Create a 2x2 table (smoking vs. lung cancer) for men
and women separately
– To control for multiple confounders simultaneously,
stratify by pairs (or triplets or higher) of confounding
factors. For example, to control for gender and
race/ethnicity determine the OR for smoking vs. lung
cancer in multiple strata: white women, black
women, Hispanic women, white men, black men,
Hispanic men, etc.
2014 Page 62
• (From the earlier example): Goal: create a summary or
“adjusted” estimate for the relationship between matches
and lung cancer while adjusting for the two levels of
smoking (the potential confounder)
• This process is analogous to the standardization of rates
earlier in the course. In those examples the purpose of
adjustment was to remove the confounding effect of age on
the relationship between populations (A vs. B etc.) and
rates of disease or death.
• In the present example the goal is to remove the
confounding effect of smoking on the relationship between
matches and lung cancer.
2014 Page 63
Confounding: Types of summary estimators
to determine uniform effect over strata
• Mantel-Haenszel
– We will use this estimator in the present course
– Resistant to the effects of small strata or cells with a
value of “0”
– Computationally a piece of cake
• Directly pooled estimators (e.g. Woolf)
– Sensitive to small strata and cells with value “0”
– Computationally messy but doable
• Maximum likelihood
– The most “appropriate” estimator
– Resistant to the effects of small strata or cells with a
value of “0”
– Computationally challenging
2014 Page 64
An aside:
Terminology
• Pooled = combined = collapsed = unadjusted
• Adjusted = summary = weighted, etc.
– All of these reflect some adjustment process such as
Mantel-Haenszel or Woolf or maximum likelihood
estimation to weight the strata and develop confidence
intervals about the estimate.
2014 Page 66
Confounding: Notation used in Mantel-Haenszel
estimators of relative risk
• Notation for case-control or cohort studies with count data

              Cases    Controls   Total
Exposed         a         b       a + b
Nonexposed      c         d       c + d
Total         a + c     b + d     a + b + c + d = T

Case-control: $RR = OR = \frac{ad}{bc}$
Cohort: $RR = \frac{I_e}{I_0} = \frac{a / (a + b)}{c / (c + d)}$
2014 Page 67
Confounding: Notation used in Mantel-Haenszel
estimators of relative risk (cont.)
• Notation for cohort studies with person-time data

              Cases    Controls   Person-years
Exposed         a        ---         PY1
Nonexposed      c        ---         PY0
Total         a + c      ---          T

$RR = \frac{I_e}{I_0} = \frac{a / PY_1}{c / PY_0}$
2014 Page 68
Confounding: Mantel-Haenszel estimators of
relative risk for stratified data
Case-Control Study:
$RR_{MH} = \frac{\sum_i (a_i d_i / T_i)}{\sum_i (b_i c_i / T_i)}$
Cohort Study with Count Denominators:
$RR_{MH} = \frac{\sum_i \{a_i (c_i + d_i) / T_i\}}{\sum_i \{c_i (a_i + b_i) / T_i\}}$
Cohort Study with Person-years Denominators:
$RR_{MH} = \frac{\sum_i \{a_i \, PY_{0i} / T_i\}}{\sum_i \{c_i \, PY_{1i} / T_i\}}$
2014 Page 69
Confounding: smoking, matches, and lung
cancer
• ORpooled
= 8.84 (7.2, 10.9)
• ORsmokers
= 1.0 (0.6, 1.5)
• ORnonsmokers
= 1.0 (0.5, 2.0)

Pooled         Cancer   No cancer
Matches          820       340
No Matches       180       660

Smokers        Cancer   No cancer
Matches          810       270
No Matches        90        30

Non-smoker     Cancer   No cancer
Matches           10        70
No Matches        90       630
2014 Page 70
Confounding: Mantel-Haenszel estimators of
relative risk for stratified data (smoking, matches,
lung cancer)
$RR_{MH} = \frac{\sum_i (a_i d_i / T_i)}{\sum_i (b_i c_i / T_i)}$
Numerator of MH estimator:
• For smokers: (ad/T)=(810*30)/1200=20.25;
• For nonsmokers: (ad/T)=(10*630)/800=7.88;
• Add these together: 20.25 + 7.88=28.13 (numerator)
Denominator of MH estimator:
• For smokers: (bc/T)=(270*90)/1200=20.25;
• For nonsmokers: (bc/T)=(90*70)/800=7.88;
• Add these together: 20.25 + 7.88=28.13 (denominator)
• ORMH = 28.13 / 28.13 = 1.0 (as expected since both stratified OR’s were = 1.0)
• Be sure to try this on stratified data in which the two strata are not exactly equal
to each other (but also not so different as to suggest that effect modification is
present)
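
The hand calculation above can be reproduced in a few lines. This is a minimal sketch of the case-control Mantel-Haenszel formula from the slide; the function name `mantel_haenszel_or` and the second, "hypothetical" set of strata are illustrative assumptions, not course data.

```python
def mantel_haenszel_or(strata):
    """Mantel-Haenszel summary odds ratio for stratified 2x2 tables.

    Each stratum is a tuple (a, b, c, d):
    a = exposed cases,   b = exposed controls,
    c = unexposed cases, d = unexposed controls.
    """
    numerator = 0.0
    denominator = 0.0
    for a, b, c, d in strata:
        total = a + b + c + d          # T for this stratum
        numerator += a * d / total     # (ad / T)_i
        denominator += b * c / total   # (bc / T)_i
    return numerator / denominator


# Matches vs. lung cancer, stratified by smoking (counts from the slides)
smokers = (810, 270, 90, 30)
nonsmokers = (10, 70, 90, 630)
or_mh = mantel_haenszel_or([smokers, nonsmokers])
print(f"OR_MH (smoking/matches example) = {or_mh:.2f}")

# Hypothetical strata (made up) whose stratum-specific ORs are 3.0 and 2.5
hypothetical = [(40, 20, 30, 45), (15, 30, 10, 50)]
or_mh_hyp = mantel_haenszel_or(hypothetical)
print(f"OR_MH (hypothetical strata) = {or_mh_hyp:.2f}")
```

For the hypothetical strata the summary estimate falls between the two stratum-specific odds ratios, which is the behavior one expects from a weighted average.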
2014 Page 71
Confounding: Interpretation of ORMH
• If ORMH (=1.0 in this example) “differs meaningfully”
from ORunadjusted (=8.8 in this example) then confounding is
present
• What does “differs meaningfully” mean?
– This is a matter of judgment based on biologic/clinical
sense rather than on a statistical test
– Even if they “differ” only slightly, generally the ORMH
rather than the ORcombined is reported as the summary
effect estimate
• But what is one disadvantage of reporting ORMH?
– Although there do exist statistical tests of confounding
they are not widely recommended (these tests evaluate
H0: ORMH = ORunadjusted)
2014 Page 72
67
JC: test of homogeneity
2014 Page 73
Hennekens, 1987, p305
54
2014 Page 74
55
2014 Page 75
56
2014 Page 76
Review what the X^2 means in this context.
58
2014 Page 77
59
2014 Page 78
Direction of Confounding Bias
• Confounding “pulls” the observed association away from the true
association
– It can either exaggerate/over-estimate the true association (positive
confounding)
• Example
– RRcausal = 1.0
– RRobserved = 3.0
or
– It can hide/under-estimate the true association (negative
confounding)
• Example
– RRcausal = 3.0
– RRobserved = 1.0
2014 Page 79
Confounding: Summary of steps to evaluate
confounding
Table 12-10. Steps for the control of confounding and the evaluation of effect
modification through stratified analysis
1. Stratify by levels of the potential confounding factor.
2. Compute stratum-specific unconfounded relative risk estimates.
3. Evaluate similarity of the stratum-specific estimates by either eyeballing or
performing a test of statistical significance. (More on this step later)
4. If the effect is thought to be uniform, calculate a pooled unconfounded summary
estimate using RRMH. If the effect is not uniform (i.e., effect modification is present),
skip to step 6.
5. Perform hypothesis testing on the unconfounded estimate, using Mantel-Haenszel
chi-square and compute confidence interval.
6. If effect is not thought to be uniform (i.e., if effect modification is present):
a. Report stratum-specific estimates, results of hypothesis testing, and
confidence intervals for each estimate
b. If desired, calculate a summary unconfounded estimate using a standardized
formula
2014 Page 80
68
Effect modification (Interaction)
• Goals of stratification of data
– Evaluate and reduce/remove confounding
– Evaluate and describe effect modification
• Description of effect modification
– A change in the magnitude of an effect measure
(between exposure and disease) according to the level
of some third variable
– What two “classes” of effect measures have we used so
far in the course?
2014 Page 82
Effect modification: example #1
• Disease incidence by exposure and age
– Does the relationship between exposure and disease change
over the value of the potential confounder (age)? How?
2014 Page 83
Effect modification: example #2
• Disease incidence by exposure and age
• Does the relationship between exposure and disease
change over the value of the potential confounder
(age)? How?
Rothman ’86 (p 178)
2014 Page 84
Effect modification: contrast
with confounding
• Confounding
– A bias that an investigator hopes to remove
– A nuisance that may or may not be present in a given
study design
• Properties of a confounding variable: (Rothman, p123):
– a) be a risk factor for disease among the non-exposed;
– b) be associated with the exposure variable; and
– c) not be an intermediate step in the “causal pathway”
2014 Page 85
Effect modification: contrast
with confounding
• Effect modification
– A more detailed description of the “true” relationship
between the exposure and the outcome
– Effect modification is a finding to be reported (even
celebrated), not a bias to be eliminated
– Effect modification is a “natural phenomenon” that
exists independently of the study design
– The presence and interpretation of effect modification
depends upon the choice of effect measure (ratio vs.
difference)
2014 Page 86
Some lingo
• Covariate
– Confounder, potential confounder
– Effect modification, interaction
– Intermediate variable
2014 Page 87
Effect modification: contrast
with confounding
• Note that for any association under study, a given factor
may be:
– Both a confounder and an effect modifier or
– A confounder but not an effect modifier or
– An effect modifier but not a confounder or
– Neither
2014 Page 88
Examples of confounding/effect modification

Level 1   Level 2   Crude/collapsed/          Uniform estimate      Confounding        Interaction
                    combined (“unadjusted”)   (ORMH, “adjusted”)    present            present
4.0       4.0       4.0                       4.0                   NO                 NO
4.0       0.25      1.0                       1.0                   NO                 YES
1.0       1.0       8.4                       1.0                   YES                NO
4.0       0.25      1.0                       2.0                   YES (?relevance)   YES
2014 Page 89
2014 Page 90
Effect modification: test of homogeneity
• Null hypothesis: The individual stratified estimates of the effect do not
differ from some uniform estimate of effect (such as a Mantel-Haenszel
estimator)
• Notation:
– $X^2_{(N-1)}$ is chi-square with (N-1) degrees of freedom;
– N is the number of strata (N=2 in our smoking/matches example);
– $\ln \hat{R}_i$ is the natural logarithm of the estimated (hence the “^”) effect
measure for each stratum ($OR_i$ in our example);
– $\ln \hat{R}_{MH}$ is the natural logarithm of the uniform effect estimate (e.g. $OR_{MH}$ in
our example; the computer will use the maximum likelihood estimate)
• One formula to test homogeneity:
$X^2_{(N-1)} = \sum_{i=1}^{N} \frac{[\ln(\hat{R}_i) - \ln(\hat{R}_{MH})]^2}{\mathrm{Var}[\ln(\hat{R}_i)]}$
JC: Comment on choice of significance level for test of homogeneity
2014 Page 91
Paradox
• If effect modification is present, a uniform estimator of
effect (such as ORMH
) cannot (or at least should not) be
reported.
• However, in order to determine if effect modification is
present, it is necessary to calculate the value of a uniform
estimator of effect (such as ORMH
) because it is needed in
the calculation of the test of homogeneity.
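
The dependence of the homogeneity test on the uniform estimate can be seen in a short sketch. Two choices here are assumptions and not taken from the slides: the variance of each stratum's ln(OR) is approximated by the Woolf formula 1/a + 1/b + 1/c + 1/d, and the Mantel-Haenszel OR stands in for the uniform estimate (the slides note that software would use the maximum likelihood estimate). The hypothetical strata are made up.

```python
import math


def homogeneity_chi_square(strata):
    """Chi-square statistic for homogeneity of stratum-specific odds ratios.

    Each stratum is (a, b, c, d) as in the case-control notation.
    Returns (chi_square, degrees_of_freedom).
    """
    # Uniform estimate: Mantel-Haenszel OR (assumed stand-in for the MLE)
    num = sum(a * d / (a + b + c + d) for a, b, c, d in strata)
    den = sum(b * c / (a + b + c + d) for a, b, c, d in strata)
    ln_or_mh = math.log(num / den)

    chi_square = 0.0
    for a, b, c, d in strata:
        ln_or_i = math.log((a * d) / (b * c))
        var_i = 1 / a + 1 / b + 1 / c + 1 / d  # Woolf variance (assumption)
        chi_square += (ln_or_i - ln_or_mh) ** 2 / var_i
    return chi_square, len(strata) - 1


def p_value_one_df(chi_square):
    """Upper-tail p-value for a chi-square statistic with exactly 1 df."""
    return math.erfc(math.sqrt(chi_square / 2))


# Smoking/matches example: both stratum ORs equal 1.0
chi_sq, df = homogeneity_chi_square([(810, 270, 90, 30), (10, 70, 90, 630)])
print(chi_sq, df)

# Hypothetical strata (made up) with ORs of 3.0 and 2.5
chi_sq_hyp, df_hyp = homogeneity_chi_square([(40, 20, 30, 45), (15, 30, 10, 50)])
print(chi_sq_hyp, df_hyp, p_value_one_df(chi_sq_hyp))
```

Note that the uniform estimate is computed first and only then compared with each stratum, which is exactly the circularity the paradox describes.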
2014 Page 92
Effect modification: test of homogeneity (or
is it heterogeneity?)
• Comments
– If the test of homogeneity is “significant” (=“reject homogeneity”)
this is evidence that there is heterogeneity (i.e. no homogeneity)
and that effect modification may be present.
• (Null hypothesis: The individual stratified estimates of the
effect do not differ from some uniform estimate of effect)
– The choice of a significance level (e.g. p < 0.05) is somewhat open
to interpretation.
• One “conservative” approach, because of inherent limitations in
the power of the test of homogeneity, is to treat the data as if
interaction is present for p < 0.20.
• In other words, one would rather err on the side of assuming
that interaction is present (and reporting the stratified estimates
of effect) than on reporting a uniform estimate that may not be
true across strata.
2014 Page 93
UC Berkeley
34
2014 Page 94
81
2014 Page 95
Additive versus multiplicative scale effect
modification
● Notation: RXZ
● No additive interaction if (R11 – R01) = (R10 – R00)
○ Rewrite as: (R11-R01)-(R10-R00)=0
● In words: Difference in risk for (X=1 vs. X=0) when Z=1 is
equal to difference in risk for (X=1 vs. X=0) when Z=0
● Note: the values R11, R10, etc. are risks (not counts)
2014 Page 96
Additive versus multiplicative scale effect
modification
● Notation: RXZ
● No multiplicative interaction if (R11/R01)=(R10/R00)
Rewrite as: (R11/R01)/(R10/R00)=1
● In words: Ratio of risks/rates when X=1 vs. X=0 when
Z=1 is equal to ratio of risks/rates when X=1 vs. X=0
when Z=0
2014 Page 97
Effect modification is scale-dependent
• Evidence for effect modification/statistical interaction
if the RR or the AR differs between two groups
• However, effect modification/statistical interaction is
scale-dependent
– If the exposure has an effect and the baseline risk differs across strata, then
no interaction on the additive scale (AR is homogeneous) implies
interaction on the multiplicative scale (RR must be heterogeneous)
– Under the same conditions, no interaction on the multiplicative scale (RR is
homogeneous) implies interaction on the additive scale
(AR must be heterogeneous)
– Note: It is common to have evidence of interaction on both
scales.
2014 Page 98
Example
● No additive scale interaction if (R11-R01)-(R10-R00)=0
● No relative scale interaction if (R11/R01)/(R10/R00)=1
● Additive scale: (60-20) - (50-10) = 0
○ Interaction not present on the additive scale
● Relative scale: (60/20) / (50/10)=0.6
○ Interaction present on the relative scale
Z=1 Z=0
X=1 60 50
X=0 20 10
2014 Page 99
Example
● No additive scale interaction if (R11-R01)-(R10-R00)=0
● No relative scale interaction if (R11/R01)/(R10/R00)=1
● Additive scale: (60-20) - (30-10) = 20
○ Interaction present on the additive scale
● Relative scale: (60/20) / (30/10) = 1
○ Interaction not present on the relative scale

Risks (R_XZ):
        Z=1    Z=0
X=1     60     30
X=0     20     10

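The two example tables can be checked with a few lines of code. The following is a minimal sketch, not part of the original lecture; the function name `interaction_contrasts` is an assumption chosen for illustration, and its arguments follow the R_XZ notation used on the slides.

```python
def interaction_contrasts(r11, r10, r01, r00):
    """Return (additive, multiplicative) interaction contrasts.

    r_xz is the risk when X=x and Z=z.
    The additive contrast is 0 when there is no additive interaction;
    the multiplicative contrast is 1 when there is no multiplicative interaction.
    """
    additive = (r11 - r01) - (r10 - r00)
    multiplicative = (r11 / r01) / (r10 / r00)
    return additive, multiplicative


# First example table: R11=60, R10=50, R01=20, R00=10
print(interaction_contrasts(60, 50, 20, 10))

# Second example table: R11=60, R10=30, R01=20, R00=10
print(interaction_contrasts(60, 30, 20, 10))
```

In the first table the additive contrast is 0 while the ratio contrast is not 1; in the second the pattern reverses, which is the scale dependence described above.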
Logistic Regression
(time permitting)
Confounding: smoking, matches, and lung
cancer
• OR_pooled = 21.0 (16.3, 27.1)
• OR_matches = 21.0 (10.5, 46.2)
• OR_no matches = 21.0 (12.9, 34.7)
• Discuss your intuitions about the 95% CIs

Pooled         Cancer   No cancer
Smoking         900       300
No Smoking      100       700

Matches        Cancer   No cancer
Smoking         810       270
No Smoking       10        70

No matches     Cancer   No cancer
Smoking          90        30
No Smoking       90       630

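As an aid to the discussion of the confidence intervals, here is a minimal sketch that recomputes each odds ratio together with a Woolf (log-based) 95% interval. The choice of the Woolf approximation is an assumption: the slides do not say which method produced the printed limits, so the limits computed here need not match them exactly. The function name `odds_ratio_ci` is hypothetical.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio and Woolf (log-based) confidence interval for a 2x2 table.

    a = exposed cases,   b = exposed non-cases,
    c = unexposed cases, d = unexposed non-cases.
    """
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Cell counts taken from the smoking tables above
tables = {
    "pooled": (900, 300, 100, 700),
    "matches": (810, 270, 10, 70),
    "no matches": (90, 30, 90, 630),
}

for name, cells in tables.items():
    odds_ratio, lower, upper = odds_ratio_ci(*cells)
    print(f"{name}: OR = {odds_ratio:.1f}, 95% CI ({lower:.1f}, {upper:.1f})")
```

The standard error is driven by the smallest cells, which is one way to see why the matches stratum (with only 10 non-smoking cases) has the widest interval even though all three point estimates are identical.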
A brief introduction to logistic regression
Let X1 = smoking (1=yes; 0=no)
Let X2 = matches (1=yes; 0=no)
Let Cancer = cancer (1=yes; 0=no)
Recall earlier tables:

Collapsed    Cancer=1   Cancer=0
X1=1           900        300
X1=0           100        700
OR=21.0

X2=1         Cancer=1   Cancer=0
X1=1           810        270
X1=0            10         70
OR=21.0

X2=0         Cancer=1   Cancer=0
X1=1            90         30
X1=0            90        630
OR=21.0

Conclusions: No confounding by matches of the relationship
between smoking and lung cancer; no effect modification by
matches of the relationship between smoking and lung cancer

Data structure for computer analysis
• Most computer programs would want to see the data for
the individual subjects in the study in the following form:

Subject ID   X1   X2   Cancer   How many?
A            1    1    1        810 of these
B            1    1    0        270 of these
C            0    1    1        10 of these
D            0    1    0        70 of these
E            1    0    1        90 of these
F            1    0    0        30 of these
G            0    0    1        90 of these
H            0    0    0        630 of these

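To make the link between the grouped table and subject-level data concrete, the sketch below expands each covariate pattern into one record per subject. It is an illustration only; the names `grouped` and `expand` are assumptions, and real statistical packages can usually also accept the grouped form with a frequency weight.

```python
# (X1, X2, Cancer, count) rows copied from the table above
grouped = [
    (1, 1, 1, 810),
    (1, 1, 0, 270),
    (0, 1, 1, 10),
    (0, 1, 0, 70),
    (1, 0, 1, 90),
    (1, 0, 0, 30),
    (0, 0, 1, 90),
    (0, 0, 0, 630),
]


def expand(grouped_rows):
    """Return one (x1, x2, cancer) record per subject."""
    records = []
    for x1, x2, cancer, count in grouped_rows:
        records.extend([(x1, x2, cancer)] * count)
    return records


records = expand(grouped)
print(len(records))                 # total number of subjects
print(sum(r[2] for r in records))   # number of cancer cases
```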
The basic logistic equation for this problem
• $\ln(\text{odds of disease}) = a + b_1 X_1 + b_2 X_2 + b_3 X_1 X_2$
• $\ln(\text{odds of disease}) = a + b_1(\text{smoking}) + b_2(\text{matches}) + b_3(\text{smoking})(\text{matches})$

Solving a logistic equation
• $\ln(\text{odds of disease}) = a + b_1 X_1 + b_2 X_2 + b_3 X_1 X_2$
• When $X_1 = 0$ and $X_2 = 0$, solve for $a$
• $\ln(\text{odds}) = a = \ln(90/630) = -1.946$
• $a = -1.946$
• So now: $\ln(\text{odds}) = -1.946 + b_1 X_1 + b_2 X_2 + b_3 X_1 X_2$

Solving a logistic equation (cont.)
• $\ln(\text{odds of disease}) = a + b_1 X_1 + b_2 X_2 + b_3 X_1 X_2$
• When $X_1 = 1$ and $X_2 = 0$, solve for $b_1$
• $\ln(\text{odds}) = \ln(90/30) = 1.099 = -1.946 + b_1$
• $b_1 = 3.045$
• So now: $\ln(\text{odds}) = -1.946 + 3.045 X_1 + b_2 X_2 + b_3 X_1 X_2$

Solving a logistic equation (cont.)
• $\ln(\text{odds of disease}) = a + b_1 X_1 + b_2 X_2 + b_3 X_1 X_2$
• When $X_1 = 0$ and $X_2 = 1$, solve for $b_2$:
• $\ln(\text{odds}) = \ln(10/70) = -1.946 = -1.946 + 0 + b_2 + 0$
• $b_2 = 0$
• So now: $\ln(\text{odds}) = -1.946 + 3.045 X_1 + 0(X_2) + b_3 X_1 X_2$

Solving a logistic equation (cont.)
• $\ln(\text{odds of disease}) = a + b_1 X_1 + b_2 X_2 + b_3 X_1 X_2$
• When $X_1 = 1$ and $X_2 = 1$ then:
• $\ln(\text{odds}) = -1.946 + b_1 + b_2 + b_3$
• $\ln(\text{odds}) = -1.946 + 3.045 + 0 + b_3$
• Solve for $b_3$
• $\ln(\text{odds}) = \ln(810/270) = 1.099 = -1.946 + 3.045 + b_3$
• $b_3 = 0$
• So now: $\ln(\text{odds}) = -1.946 + 3.045 X_1 + 0(X_2) + 0(X_1 X_2)$

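The four solving steps can be reproduced directly from the cell counts. This is a minimal sketch under the assumption that the model is saturated (four coefficients for four covariate patterns), so each coefficient follows from observed log odds by subtraction; the dictionary `cells` and the helper `log_odds` are names introduced here for illustration.

```python
import math

# (cases, non-cases) for each covariate pattern, keyed by (X1, X2)
cells = {
    (0, 0): (90, 630),
    (1, 0): (90, 30),
    (0, 1): (10, 70),
    (1, 1): (810, 270),
}


def log_odds(x1, x2):
    """Observed ln(odds of disease) for the pattern (X1=x1, X2=x2)."""
    cases, noncases = cells[(x1, x2)]
    return math.log(cases / noncases)


a = log_odds(0, 0)                  # intercept: X1=0, X2=0
b1 = log_odds(1, 0) - a             # smoking:   X1=1, X2=0
b2 = log_odds(0, 1) - a             # matches:   X1=0, X2=1
b3 = log_odds(1, 1) - a - b1 - b2   # interaction: X1=1, X2=1

print(round(a, 3), round(b1, 3), round(b2, 3), round(b3, 3))
```

The rounded values should agree with the coefficients worked out by hand on the slides.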
Solving a logistic equation (cont.)
• $\ln(\text{odds of disease}) = a + b_1 X_1 + b_2 X_2 + b_3 X_1 X_2$
• From the earlier calculations, this simplifies to:
– $\ln(\text{odds}) = -1.946 + 3.045(X_1) + 0(X_2) + 0(X_1 X_2)$
• One can now use the logistic equation to efficiently describe
relationships in the table
• Calculate the ln(odds) for a smoker who uses matches ($X_1 = 1$ and $X_2 = 1$):
$\ln(\text{odds}) = -1.946 + 3.045 = 1.099$
• Calculate the ln(odds) for a smoker who doesn’t use matches ($X_1 = 1$ and $X_2 = 0$):
$\ln(\text{odds}) = -1.946 + 3.045 = 1.099$
• Now calculate the odds ratio for smokers vs. non-smokers among those who use matches
• At home, calculate the odds ratio for smokers vs. non-smokers among those who do not use matches

Logistic Regression
Using the logistic model developed in class for the matches-smoking-lung cancer data (stratified by matches), evaluate the risk
of lung cancer for:
1. (in-class) A smoker who uses matches vs. a non-smoker who uses
matches.
2. (at home) A smoker who uses matches vs. a non-smoker who
does not use matches.
SEPARATE ASSIGNMENT
Develop a logistic model for the matches-smoking-lung cancer
data (stratified by smoking status). Use this model to evaluate
the risk of lung cancer for:
1. (at home) A user of matches who smokes vs. a non-user of
matches who smokes.
2. (at home) A smoker who uses matches vs. a non-smoker who uses
matches. Is this result consistent with the one you arrived at in the
class example above?

Find OR for smokers (who use matches) vs. non-smokers
(who use matches)
For smokers who use matches: $X_1 = 1$, $X_2 = 1$
For non-smokers who use matches: $X_1 = 0$, $X_2 = 1$
From prior slides we determined that: $\ln(\text{odds}) = -1.946 + 3.045(X_1)$

For smokers who use matches ($X_1 = 1$; $X_2 = 1$):
$\ln(\text{odds}) = -1.946 + 3.045(1) = 1.099$
For non-smokers who use matches ($X_1 = 0$; $X_2 = 1$):
$\ln(\text{odds}) = -1.946 + 0 + 0 + 0 = -1.946$
We want to solve:
$\ln(\text{OR}) = 1.099 - (-1.946) = 3.045$
$\text{OR} = e^{\ln(\text{OR})} = e^{3.045} = 21.0$
Therefore, the odds ratio (determined using logistic regression)
comparing smokers using matches to non-smokers using
matches is 21.0. This agrees with the stratified data
presented earlier.

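The same calculation can be written as a small function that takes any two covariate patterns and returns their odds ratio. This is a sketch only: the coefficient values are the rounded ones from the slides, and the function names `ln_odds` and `odds_ratio` are assumptions made for illustration.

```python
import math

# Rounded coefficients from the slides: a, b1 (smoking), b2 (matches), b3 (interaction)
A, B1, B2, B3 = -1.946, 3.045, 0.0, 0.0


def ln_odds(x1, x2):
    """ln(odds of disease) from the fitted logistic equation."""
    return A + B1 * x1 + B2 * x2 + B3 * x1 * x2


def odds_ratio(group, reference):
    """Odds ratio comparing two (X1, X2) patterns: exp of the difference in ln(odds)."""
    return math.exp(ln_odds(*group) - ln_odds(*reference))


# Smokers who use matches (1, 1) vs. non-smokers who use matches (0, 1)
print(round(odds_ratio((1, 1), (0, 1)), 1))
```

Passing other pairs of patterns to `odds_ratio` gives the remaining comparisons in the assignments.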
Some concluding comments on logistic
regression
• Interpretations of the final logistic equation for these data:
$\ln(\text{odds of disease}) = a + b_1(\text{smoking}) + b_2(\text{matches}) + b_3(\text{smoking})(\text{matches})$
$\ln(\text{odds}) = -1.946 + 3.045(\text{smoking}) + 0(\text{matches}) + 0(\text{smoking})(\text{matches})$
• This equation describes the data whether they are stratified by matches or
by smoking.
• Multiple variables can be adjusted for simultaneously in
the logistic equation
• The estimates of the coefficients for the equation are derived through
maximum likelihood techniques
• This technique is very widely used in epidemiologic (and other)
applications when the outcome variable of interest is dichotomous.

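To give a feel for what "maximum likelihood techniques" means in practice, here is a minimal sketch that fits the one-variable model ln(odds) = a + b(smoking) to the collapsed table by Newton-Raphson iteration. The slides do not describe the fitting algorithm, so Newton-Raphson is an assumption (it is one common way to maximize the logistic likelihood), and `fit_logistic` is a hypothetical name.

```python
import math

# Collapsed table: (x, cases, total) for smokers (x=1) and non-smokers (x=0)
groups = [(1, 900, 1200), (0, 100, 800)]


def fit_logistic(groups, iterations=25):
    """Newton-Raphson maximum likelihood for ln(odds) = a + b*x on grouped data."""
    a, b = 0.0, 0.0
    for _ in range(iterations):
        g_a = g_b = 0.0            # gradient of the log-likelihood
        h_aa = h_ab = h_bb = 0.0   # entries of the information matrix
        for x, cases, n in groups:
            p = 1.0 / (1.0 + math.exp(-(a + b * x)))  # predicted probability
            resid = cases - n * p                     # observed minus expected cases
            w = n * p * (1.0 - p)                     # binomial variance weight
            g_a += resid
            g_b += resid * x
            h_aa += w
            h_ab += w * x
            h_bb += w * x * x
        # Solve the 2x2 system (information matrix) * step = gradient
        det = h_aa * h_bb - h_ab * h_ab
        a += (h_bb * g_a - h_ab * g_b) / det
        b += (h_aa * g_b - h_ab * g_a) / det
    return a, b


a_hat, b_hat = fit_logistic(groups)
print(round(a_hat, 3), round(b_hat, 3), round(math.exp(b_hat), 1))
```

Because this model has one parameter per row of the collapsed table, the fitted values reproduce the observed log odds, so exp(b) equals the pooled odds ratio computed by hand.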
Some concluding comments on logistic
regression
• Comments
– Having multiple strata (how this technique makes that
possible)
– Test of homogeneity ($b_3$)
– Maximum likelihood estimation of the coefficients
• Modifications of logistic regression exist for coping with
– Outcome variables with multiple levels = polytomous
logistic regression
– Studies in which matching was used = conditional
logistic regression

. use http://www.stata-press.com/data/r8/lbw

variable name   storage type   display format   variable label
--------------------------------------------------------------------------------------------
id              int            %8.0g            identification code
low             byte           %8.0g            birth weight<2500g
age             byte           %8.0g            age of mother
lwt             int            %8.0g            weight at last menstrual period
race            byte           %8.0g            race
smoke           byte           %8.0g            smoked during pregnancy
ptl             byte           %8.0g            premature labor history (count)
ht              byte           %8.0g            has history of hypertension
ui              byte           %8.0g            presence, uterine irritability
ftv             byte           %8.0g            number of visits to physician during 1st trimester
bwt             int            %8.0g            birth weight (grams)

Special (and very useful) Stata command
“xi” (= “interaction expansion”)
• xi: logistic low age lwt i.race smoke ptl ht ui
• In this example, a variable named “race” has three levels
(e.g. white/hispanic/black) that might be coded as
“0=white”; “1=hispanic”; “2=black”
• The combined use of xi and i.race directs Stata to
create indicator variables for the levels of race and to compare each
of the other levels to the first (lowest-coded) level. This can be a
HUGE time-saver (it avoids the user having to manually recode such
variables)!

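Conceptually, the expansion turns one three-level variable into two 0/1 indicator variables, with the omitted level serving as the reference. The sketch below illustrates that idea in Python; it is not Stata's implementation, and the function name `expand_indicators` and the sample codes are hypothetical.

```python
def expand_indicators(values, reference=None):
    """Expand a categorical variable into 0/1 indicator columns.

    The reference level is omitted; by default it is the lowest-coded level.
    Returns (kept_levels, rows), where each row has one indicator per kept level.
    """
    levels = sorted(set(values))
    if reference is None:
        reference = levels[0]
    kept = [level for level in levels if level != reference]
    rows = [[1 if value == level else 0 for level in kept] for value in values]
    return kept, rows


# Hypothetical codes for five subjects: 0=white, 1=hispanic, 2=black
race = [0, 1, 2, 1, 0]
kept, rows = expand_indicators(race)
print(kept)
print(rows)
```

Each coefficient estimated for an indicator is then a log odds ratio comparing that level with the reference level.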
Assignments
• Write the logistic model describing these data (next slide).
• What is the risk of low birth weight (LBW) for a smoker,
adjusted for all other variables?
• How can the 95% CI be determined?
• What is the risk of LBW for an Hispanic baby (compared
to a white baby)?
• What is the risk of LBW for a black baby (compared to an
Hispanic baby)?
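For the questions about odds ratios and their confidence intervals, one common route is to exponentiate a coefficient and its Wald limits. The sketch below shows that arithmetic only; the coefficient and standard error used in the example call are made-up placeholder values, not estimates from the low birth weight data, and both function names are hypothetical.

```python
import math


def or_with_ci(coef, se, z=1.96):
    """Odds ratio and Wald confidence interval from a logistic coefficient and its SE."""
    return math.exp(coef), math.exp(coef - z * se), math.exp(coef + z * se)


def or_between_levels(coef_a, coef_b):
    """Odds ratio comparing level A with level B when both coefficients
    are expressed relative to the same reference level."""
    return math.exp(coef_a - coef_b)


# Placeholder values for illustration only (NOT from the lbw output)
example_coef, example_se = 0.70, 0.35
print(or_with_ci(example_coef, example_se))
```

Note that a confidence interval for a comparison between two non-reference levels also needs the covariance of the two coefficients, which this sketch does not cover.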
Discuss intercept
  • 49. Residual confounding • Confounding can persist, even after adjustment • Why? – Not all confounders were adjusted for (unmeasured confounding) – Some variables were actually not confounders! – Confounders were measured with error (misclassification of confounders) – Categories of the confounding variable are improperly defined (e.g. age categories were too broad)
  • 50. Simulating the counterfactual comparison: Experimental Studies: RCT
        Diagram: Eligible patients → Randomization → Treatment → Outcomes; Randomization → Placebo → Outcomes
        Randomization helps to make the groups “comparable” (i.e. similar initial conditions) with respect to known and unknown confounders
        Therefore confounding is unlikely at randomization (time t0)
  • 51. Confounding: Methods to control or reduce confounding • Methods used in study design to reduce confounding – Randomization – Restriction – Matching • Methods used in study analysis to reduce confounding – Stratified analysis – Multivariate analysis 31 2014 Page 51
  • 52. Confounding: The use of randomization to reduce confounding • Randomization – Useful only for intervention studies – Definition: random assignment of study subjects to exposure categories – The special strength of randomization is its ability to control/reduce the effect of confounding variables about which the investigator is unaware – If there is maldistribution of potentially confounding variables after randomization (the reason for the classic “Table I: Baseline characteristics” in the randomized trial) then other confounding control options (see below) are applied
  • 53. Substitute, unexposed cohort 54 Maldonado & Greenland, Int J Epi 2002;31:422-29 Counterfactual, unexposed cohort Exposed cohort “Confounding is present if the substitute population imperfectly represents what the target would have been like under the counterfactual condition” 2014 Page 53
  • 54. Confounding: The use of restriction to reduce confounding • Confounding cannot occur if the distribution of the potential confounding factors does not vary across exposure or disease categories – The implication is that an investigator may restrict study subjects to only those falling within specific level(s) of a confounding variable • Extreme example: an investigator only selects subjects of exactly the same age. • Advantages of restriction – straightforward, convenient, inexpensive
  • 55. Confounding: The use of restriction to reduce confounding (cont.) • Disadvantages – May limit number of eligible subjects – Residual confounding may persist if restriction categories not sufficiently narrow (e.g. “decade of age” might be too broad) – Not possible to evaluate the relationship of interest at different levels of the confounder • Question: How does restriction differ from matching? 34 2014 Page 55
  • 56. Confounding: The use of matching to reduce confounding • Subjects with all levels of a potential confounder are admitted into the study BUT the control/reference subjects (either with respect to exposure in a cohort or disease in a case-reference study) are chosen to have the same distribution of the potential confounder • The use of matching may also require special analysis techniques (matched analyses and conditional logistic regression)
  • 57. • Disadvantages of matching – Finding appropriate control/reference subjects may be difficult and expensive and may limit sample size – Matching is most often used in case-reference (i.e. case-control) studies, because in a large cohort study the cost of matching may be prohibitive • Thus, in cohort studies it’s often cheaper to just enroll available controls and use analytic methods (below) to control confounding; this doesn’t apply to computerized “free” data
  • 58. Confounding: The use of matching to reduce confounding (cont.) • Disadvantages of matching (cont.) – Confounding factor used to match subjects cannot be itself evaluated with respect to the outcome/disease – Obviously, matching does not control for confounding by factors other than that used to match – The use of matching makes the use of stratified analysis (for the control of other potential but non-matched factors) very difficult • One way around this problem is the use of conditional logistic regression but there is a large reduction in “effective” sample size because only discordant pairs are used. 37 2014 Page 58
  • 59. • Advantages of matching – Matching may be the only way to obtain sufficient numbers of control/reference subjects with relevant levels of the confounding factor(s) – Example: controlling for “neighborhood” (and all that it implies) by any approach other than matching is very difficult 38 2014 Page 59
  • 60. • Advantages of matching (cont.) – Useful in very small studies in which chance differences in confounding factors are likely to exist between the study groups and other forms of control for the confounders (such as stratification or multivariate adjustment) are not possible (because of the limited sample size) – The full benefit of matching (in terms of the reduction of confounding) is obtained only if the proper form of matched analysis is used (to be reviewed later in the course) 39 2014 Page 60
  • 61. • The basic goal of stratification is to evaluate the relationship between the predictor (“cause”) and outcome (“effect”) variable in strata that are homogeneous with respect to potentially confounding variables
  • 62. Confounding:The use of stratification to reduce confounding • For example, to examine the relationship between smoking and lung cancer while controlling for the potentially confounding effect of gender: – Create a 2x2 table (smoking vs. lung cancer) for men and women separately – To control for multiple confounders simultaneously, stratify by pairs (or triplets or higher) of confounding factors. For example, to control for gender and race/ethnicity determine the OR for smoking vs. lung cancer in multiple strata: white women, black women, Hispanic women, white men, black men, Hispanic men,etc. 41 2014 Page 62
  • 63. • (From the earlier example): Goal: create a summary or “adjusted” estimate for the relationship between matches and lung cancer while adjusting for the two levels of smoking (the potential confounder) • This process is analogous to the standardization of rates earlier in the course. In those examples the purpose of adjustment was to remove the confounding effect of age on the relationship between populations (A vs. B etc.) and rates of disease or death. • In the present example the goal is to remove the confounding effect of smoking on the relationship between matches and lung cancer.
  • 64. Confounding:Types of summary estimators to determine uniform effect over strata • Mantel-Haenszel – We will use this estimator in the present course – Resistant to the effects of small strata or cells with a value of “0” – Computationally a piece of cake • Directly pooled estimators (e.g. Woolf) – Sensitive to small strata and cells with value “0” – Computationally messy but doable • Maximum likelihood – The most “appropriate” estimator – Resistant to the effects of small strata or cells with a value of “0” – Computationally challenging 43 2014 Page 64
  • 65. Confounding: smoking, matches, and lung cancer
      • ORpooled = 8.84 (7.2, 10.9)
      • ORsmokers = 1.0 (0.6, 1.5)
      • ORnonsmokers = 1.0 (0.5, 2.0)
        Pooled      | Cancer | No cancer
        Matches     | 820    | 340
        No matches  | 180    | 660

        Smokers     | Cancer | No cancer
        Matches     | 810    | 270
        No matches  | 90     | 30

        Non-smokers | Cancer | No cancer
        Matches     | 10     | 70
        No matches  | 90     | 630
  • 66. An aside: Terminology • Pooled = combined = collapsed = unadjusted • Adjusted = summary = weighted, etc. – All of these reflect some adjustment process such as Mantel-Haenszel or Woolf or maximum likelihood estimation to weight the strata and develop confidence intervals about the estimate. 45 2014 Page 66
  • 67. Confounding: Notation used in Mantel-Haenszel estimators of relative risk
      • Notation for case-control or cohort studies with count data
                    | Cases | Controls | Total
        Exposed     | a     | b        | a + b
        Nonexposed  | c     | d        | c + d
        Total       | a + c | b + d    | a + b + c + d = T
        Case-control: $RR = OR = \frac{ad}{bc}$
        Cohort: $RR = \frac{I_e}{I_0} = \frac{a / (a + b)}{c / (c + d)}$
  • 68. Confounding: Notation used in Mantel-Haenszel estimators of relative risk (cont.)
      • Notation for cohort studies with person-time data
                    | Cases | Controls | Person-years
        Exposed     | a     | ---      | PY1
        Nonexposed  | c     | ---      | PY0
        Total       | a + c |          | T
        $RR = \frac{I_e}{I_0} = \frac{a / PY_1}{c / PY_0}$
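The 2x2 notation above can be sketched in a few lines of Python. This is only an illustration under the slide's cell layout (a, b = exposed cases and controls; c, d = nonexposed cases and controls); the function names `odds_ratio` and `risk_ratio` are made up for this sketch and do not come from the lecture.

```python
def odds_ratio(a, b, c, d):
    """Case-control estimate: OR = ad / bc."""
    return (a * d) / (b * c)


def risk_ratio(a, b, c, d):
    """Cohort estimate with count denominators: [a / (a + b)] / [c / (c + d)]."""
    return (a / (a + b)) / (c / (c + d))


# Pooled matches / lung cancer table from the lecture:
# matches: 820 cancer, 340 no cancer; no matches: 180 cancer, 660 no cancer
pooled_or = odds_ratio(820, 340, 180, 660)
print(round(pooled_or, 2))  # 8.84
```

The printed value matches the ORpooled reported for the matches example.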
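  • 69. Confounding: Mantel-Haenszel estimators of relative risk for stratified data (each sum runs over strata i; a, b, c, d follow the notation slides: a, b = exposed cases and non-cases; c, d = nonexposed cases and non-cases)
        Case-Control Study: $RR_{MH} = \frac{\sum_i (a_i d_i / T_i)}{\sum_i (b_i c_i / T_i)}$
        Cohort Study with Count Denominators: $RR_{MH} = \frac{\sum_i \{a_i (c_i + d_i) / T_i\}}{\sum_i \{c_i (a_i + b_i) / T_i\}}$
        Cohort Study with Person-years Denominators: $RR_{MH} = \frac{\sum_i \{a_i PY_{0i} / T_i\}}{\sum_i \{c_i PY_{1i} / T_i\}}$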
  • 70. Confounding: smoking, matches, and lung cancer
      • ORpooled = 8.84 (7.2, 10.9)
      • ORsmokers = 1.0 (0.6, 1.5)
      • ORnonsmokers = 1.0 (0.5, 2.0)
        Pooled      | Cancer | No cancer
        Matches     | 820    | 340
        No matches  | 180    | 660

        Smokers     | Cancer | No cancer
        Matches     | 810    | 270
        No matches  | 90     | 30

        Non-smokers | Cancer | No cancer
        Matches     | 10     | 70
        No matches  | 90     | 630
  • 71. Confounding: Mantel-Haenszel estimators of relative risk for stratified data (smoking, matches, lung cancer)
        $RR_{MH} = \frac{\sum_i (a_i d_i / T_i)}{\sum_i (b_i c_i / T_i)}$
      Numerator of MH estimator:
      • For smokers: (ad/T) = (810*30)/1200 = 20.25
      • For nonsmokers: (ad/T) = (10*630)/800 = 7.88
      • Add these together: 20.25 + 7.88 = 28.13 (numerator)
      Denominator of MH estimator:
      • For smokers: (bc/T) = (270*90)/1200 = 20.25
      • For nonsmokers: (bc/T) = (70*90)/800 = 7.88
      • Add these together: 20.25 + 7.88 = 28.13 (denominator)
      • ORMH = 28.13 / 28.13 = 1.0 (as expected since both stratified ORs were = 1.0)
      • Be sure to try this on stratified data in which the two strata are not exactly equal to each other (but also not so different as to suggest that effect modification is present)
  • 72. Confounding: Interpretation of ORMH
      • If ORMH (= 1.0 in this example) “differs meaningfully” from ORunadjusted (= 8.8 in this example) then confounding is present
      • What does “differs meaningfully” mean?
        – This is a matter of judgment based on biologic/clinical sense rather than on a statistical test
        – Although statistical tests of confounding do exist, they are not widely recommended (these tests evaluate Ho: ORMH = ORunadjusted)
        – Even if they “differ” only slightly, generally the ORMH rather than the ORcombined is reported as the summary effect estimate
      • But what is one disadvantage of reporting ORMH?
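The hand calculation above can be checked with a short Python sketch of the case-control Mantel-Haenszel estimator. The function name `mantel_haenszel_or` and the tuple layout (a, b, c, d) per stratum are assumptions made for this illustration, not part of the lecture materials.

```python
def mantel_haenszel_or(strata):
    """Case-control MH estimator: sum(a*d/T) / sum(b*c/T) over strata.

    Each stratum is a tuple (a, b, c, d):
    a, b = exposed cases, exposed controls; c, d = nonexposed cases, nonexposed controls.
    """
    numerator = sum(a * d / (a + b + c + d) for a, b, c, d in strata)
    denominator = sum(b * c / (a + b + c + d) for a, b, c, d in strata)
    return numerator / denominator


# Matches (exposure) and lung cancer, stratified by smoking
strata = [
    (810, 270, 90, 30),   # smokers
    (10, 70, 90, 630),    # non-smokers
]
or_mh = mantel_haenszel_or(strata)
print(round(or_mh, 2))  # 1.0
```

Swapping in strata whose odds ratios differ slightly is a quick way to do the "try this on other data" exercise suggested on the slide.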
  • 73. 67 JC: test of homogeneity 2014 Page 73
  • 77. Review what the X^2 means in this context. 58 2014 Page 77
  • 79. Direction of Confounding Bias • Confounding “pulls” the observed association away from the true association – It can either exaggerate/over-estimate the true association (positive confounding) • Example – RRcausal = 1.0 – RR observed = 3.0 or – It can hide/under-estimate the true association (negative confounding) • Example – RRcausal = 3.0 – RR observed = 1.0
  • 80. Confounding: Summary of steps to evaluate confounding
      Table 12-10. Steps for the control of confounding and the evaluation of effect modification through stratified analysis
      1. Stratify by levels of the potential confounding factor.
      2. Compute stratum-specific unconfounded relative risk estimates.
      3. Evaluate similarity of the stratum-specific estimates by either eyeballing or performing a test of statistical significance. (More on this step later)
      4. If the effect is thought to be uniform, calculate a pooled unconfounded summary estimate using RRMH. If the effect is not uniform (i.e. effect modification is present), skip to step 6.
      5. Perform hypothesis testing on the unconfounded estimate, using Mantel-Haenszel chi-square, and compute a confidence interval.
      6. If the effect is not thought to be uniform (i.e., if effect modification is present):
         a. Report stratum-specific estimates, results of hypothesis testing, and confidence intervals for each estimate
         b. If desired, calculate a summary unconfounded estimate using a standardized formula
  • 82. 68 Effect modification (Interaction) • Goals of stratification of data – Evaluate and reduce/remove confounding – Evaluate and describe effect modification • Description of effect modification – A change in the magnitude of an effect measure (between exposure and disease) according to the level of some third variable – What two “classes” of effect measures have we used so far in the course? 2014 Page 82
  • 83. Effect modification: example #1 • Disease incidence by exposure and age – Does the relationship between exposure and disease change over the value of the potential confounder (age)? How? 69 2014 Page 83
  • 84. Effect modification: example #2 • Disease incidence by exposure and age • Does the relationship between exposure and disease change over the value of the potential confounder (age)? How? Rothman ’86 (p 178) 70 2014 Page 84
  • 85. Effect modification: contrast with confounding • Confounding – A bias that an investigator hopes to remove – A nuisance that may or may not be present in a given study design • Properties of a confounding variable: (Rothman, p123): – a) be a risk factor for disease among the non-exposed; – b) be associated with the exposure variable; and – c) not be an intermediate step in the “causal pathway” 71 2014 Page 85
  • 86. Effect modification: contrast with confounding • Effect modification – A more detailed description of the “true” relationship between the exposure and the outcome – Effect modification is a finding to be reported (even celebrated), not a bias to be eliminated – Effect modification is a “natural phenomenon” that exists independently of the study design – The presence and interpretation of effect modification depends upon the choice of effect measure (ratio vs. difference) 72 2014 Page 86
  • 87. 73 Some lingo • Covariate – Confounder, potential confounder – Effect modification, interaction – Intermediate variable 2014 Page 87
  • 88. Effect modification: contrast with confounding • Note that for any association under study, a given factor may be: – Both a confounder and an effect modifier or – A confounder but not an effect modifier or – An effect modifier but not a confounder or – Neither
  • 89. Examples of confounding/effect modification
        Level 1 | Level 2 | Crude/collapsed/combined (“unadjusted”) | Uniform estimate (ORMH) (“adjusted”) | Confounding present | Interaction present
        4.0     | 4.0     | 4.0                                     | 4.0                                  | NO                  | NO
        4.0     | 0.25    | 1.0                                     | 1.0                                  | NO                  | YES
        1.0     | 1.0     | 8.4                                     | 1.0                                  | YES                 | NO
        4.0     | 0.25    | 1.0                                     | 2.0                                  | YES (?relevance)    | YES
  • 91. Effect modification: test of homogeneity
      • Null hypothesis: The individual stratified estimates of the effect do not differ from some uniform estimate of effect (such as a Mantel-Haenszel estimator)
      • Notation:
        – $X^2_{(N-1)}$ is chi-square with (N-1) degrees of freedom;
        – N is the number of strata (N=2 in our smoking/matches example);
        – $\ln \hat{R}_i$ is the natural logarithm of the estimated (hence the “^”) effect measure for each stratum (ORi in our example);
        – $\ln \hat{R}$ is the natural logarithm of the uniform effect estimate (e.g. ORMH in our example; the computer will use the maximum likelihood estimate)
      • One formula to test homogeneity:
        $X^2_{(N-1)} = \sum_{i=1}^{N} \frac{[\ln(\hat{R}_i) - \ln(\hat{R})]^2}{\mathrm{Var}[\ln(\hat{R}_i)]}$
      JC: Comment on choice of significance level for test of homogeneity
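A minimal Python sketch of the homogeneity statistic for stratified odds ratios follows. Two assumptions are made that the slide does not specify: the uniform estimate is taken to be ORMH, and Var[ln(ORi)] is approximated by the common large-sample (Woolf) formula 1/a + 1/b + 1/c + 1/d. The function name `homogeneity_chi2` is hypothetical.

```python
import math


def homogeneity_chi2(strata):
    """Return (chi-square, degrees of freedom) for a test of homogeneity of ORs.

    Each stratum is (a, b, c, d). Assumes no zero cells.
    Uniform estimate: Mantel-Haenszel OR (an assumption of this sketch).
    Variance of ln(OR_i): 1/a + 1/b + 1/c + 1/d (Woolf approximation, also an assumption).
    """
    num = sum(a * d / (a + b + c + d) for a, b, c, d in strata)
    den = sum(b * c / (a + b + c + d) for a, b, c, d in strata)
    ln_uniform = math.log(num / den)

    chi2 = 0.0
    for a, b, c, d in strata:
        ln_or_i = math.log((a * d) / (b * c))
        var_i = 1 / a + 1 / b + 1 / c + 1 / d
        chi2 += (ln_or_i - ln_uniform) ** 2 / var_i
    return chi2, len(strata) - 1


# Smoking/matches example: both stratum ORs equal 1.0, so the statistic is 0
chi2, df = homogeneity_chi2([(810, 270, 90, 30), (10, 70, 90, 630)])
print(chi2, df)
```

The statistic would then be compared against a chi-square distribution with N-1 degrees of freedom at whatever significance level is chosen.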
  • 92. Paradox • If effect modification is present, a uniform estimator of effect (such as ORMH ) cannot (or at least should not) be reported. • However, in order to determine if effect modification is present, it is necessary to calculate the value of a uniform estimator of effect (such as ORMH ) because it is needed in the calculation of the test of homogeneity. 79 2014 Page 92
  • 93. Effect modification: test of homogeneity (or is it heterogeneity?) • Comments – If the test of homogeneity is “significant” (= “reject homogeneity”) this is evidence that there is heterogeneity (i.e. no homogeneity) and that effect modification may be present. • (Null hypothesis: The individual stratified estimates of the effect do not differ from some uniform estimate of effect) – The choice of a significance level (e.g. p < 0.05) is somewhat open to interpretation. • One “conservative” approach, because of inherent limitations in the power of the test of homogeneity, is to treat the data as if interaction is present for p < 0.20. • In other words, one would rather err on the side of assuming that interaction is present (and reporting the stratified estimates of effect) than on reporting a uniform estimate that may not be true across strata.
  • 96. Additive versus multiplicative scale effect modification ● Notation: RXZ ● No additive interaction if (R11 – R01) = (R10 – R00) ○ Rewrite as: (R11-R01)-(R10-R00)=0 ● In words: Difference in risk for (X=1 vs. X=0) when Z=1 is equal to difference in risk for (X=1 vs. X=0) when Z=0 ● Note: the values R11, R10, etc. are risks (not counts) 2014 Page 96
  • 97. Additive versus multiplicative scale effect modification ● Notation: RXZ ● No multiplicative interaction if (R11/R01)=(R10/R00) Rewrite as: (R11/R01)/(R10/R00)=1 ● In words: Ratio of risks/rates when X=1 vs. X=0 when Z=1 is equal to ratio of risks/rates when X=1 vs. X=0 when Z=0 2014 Page 97
  • 98. Effect modification is scale-dependent • Evidence for effect modification/statistical interaction if the RR or the AR differs between two groups • However, effect modification/statistical interaction is scale-dependent. When both the exposure and the third variable affect risk: – If you do not have interaction on the additive scale (AR is homogeneous) then you will have interaction on the multiplicative scale (RR must be heterogeneous) – If you do not have interaction on the multiplicative scale (RR is homogeneous) then you will have interaction on the additive scale (AR must be heterogeneous) – Note: It is common to have evidence of interaction on both scales.
  • 99. Example
      ● No additive scale interaction if (R11-R01)-(R10-R00)=0
      ● No relative scale interaction if (R11/R01)/(R10/R00)=1
              | Z=1 | Z=0
        X=1   | 60  | 50
        X=0   | 20  | 10
      ● Additive scale: (60-20) - (50-10) = 0
        ○ Interaction not present on the additive scale
      ● Relative scale: (60/20) / (50/10) = 0.6
        ○ Interaction present on the relative scale
  • 100. Example
      ● No additive scale interaction if (R11-R01)-(R10-R00)=0
      ● No relative scale interaction if (R11/R01)/(R10/R00)=1
              | Z=1 | Z=0
        X=1   | 60  | 30
        X=0   | 20  | 10
      ● Additive scale: (60-20) - (30-10) = 20
        ○ Interaction present on the additive scale
      ● Relative scale: (60/20) / (30/10) = 1
        ○ Interaction not present on the relative scale
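The two examples can be reproduced with a small Python sketch of the additive and multiplicative contrasts. The function name `interaction_contrasts` is made up for this illustration; the arguments follow the RXZ notation (first subscript X, second subscript Z).

```python
def interaction_contrasts(r11, r10, r01, r00):
    """Return (additive contrast, multiplicative contrast) for risks R_XZ.

    Additive: (R11 - R01) - (R10 - R00); 0 means no additive interaction.
    Multiplicative: (R11 / R01) / (R10 / R00); 1 means no multiplicative interaction.
    """
    additive = (r11 - r01) - (r10 - r00)
    multiplicative = (r11 / r01) / (r10 / r00)
    return additive, multiplicative


# First example: X=1 row is 60, 50; X=0 row is 20, 10
print(interaction_contrasts(60, 50, 20, 10))  # additive 0, ratio 0.6

# Second example: X=1 row is 60, 30; X=0 row is 20, 10
print(interaction_contrasts(60, 30, 20, 10))  # additive 20, ratio 1.0
```

In both examples the exposure and the third variable each change risk, and interaction is absent on one scale but present on the other.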
  • 102. Confounding: smoking, matches, and lung cancer
      • ORpooled = 21.0 (16.3, 27.1)
      • ORmatches = 21.0 (10.5, 46.2)
      • ORno matches = 21.0 (12.9, 34.7)
      • Discuss your intuitions about the 95% CIs
        Pooled      | Cancer | No cancer
        Smoking     | 900    | 300
        No smoking  | 100    | 700

        Matches     | Cancer | No cancer
        Smoking     | 810    | 270
        No smoking  | 10     | 70

        No matches  | Cancer | No cancer
        Smoking     | 90     | 30
        No smoking  | 90     | 630
  • 103. A brief introduction to logistic regression
      Let X1 = smoking (1=yes; 0=no)
      Let X2 = matches (1=yes; 0=no)
      Let Cancer = cancer (1=yes; 0=no)
      Recall earlier tables:
        Collapsed (OR=21.0) | Cancer=1 | Cancer=0
        X1=1                | 900      | 300
        X1=0                | 100      | 700

        X2=1 (OR=21.0)      | Cancer=1 | Cancer=0
        X1=1                | 810      | 270
        X1=0                | 10       | 70

        X2=0 (OR=21.0)      | Cancer=1 | Cancer=0
        X1=1                | 90       | 30
        X1=0                | 90       | 630
      Conclusions: No confounding by matches of the relationship between smoking and lung cancer; no effect modification by matches of the relationship between smoking and lung cancer
  • 104. Data structure for computer analysis
      • Most computer programs would want to see the data for the individual subjects in the study in the following form:
        Subject ID | X1 | X2 | Cancer | How many?
        A          | 1  | 1  | 1      |
        B          | 1  | 1  | 0      |
        C          | 0  | 1  | 1      |
        D          | 0  | 1  | 0      |
        E          | 1  | 0  | 1      |
        F          | 1  | 0  | 0      |
        G          | 0  | 0  | 1      |
        H          | 0  | 0  | 0      |
  • 105. Data structure for computer analysis
      • Most computer programs would want to see the data for the individual subjects in the study in the following form:
        Subject ID | X1 | X2 | Cancer | How many?
        A          | 1  | 1  | 1      | 810 of these
        B          | 1  | 1  | 0      | 270 of these
        C          | 0  | 1  | 1      | 10 of these
        D          | 0  | 1  | 0      | 70 of these
        E          | 1  | 0  | 1      | 90 of these
        F          | 1  | 0  | 0      | 30 of these
        G          | 0  | 0  | 1      | 90 of these
        H          | 0  | 0  | 0      | 630 of these
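To make the "one row per subject" layout concrete, here is a short Python sketch that expands the eight cell counts into individual records. The dictionary keys `x1`, `x2`, and `cancer` are illustrative names chosen for this sketch, not a format required by any particular statistics package.

```python
# Cell counts keyed by (X1 = smoking, X2 = matches, Cancer)
cell_counts = {
    (1, 1, 1): 810,
    (1, 1, 0): 270,
    (0, 1, 1): 10,
    (0, 1, 0): 70,
    (1, 0, 1): 90,
    (1, 0, 0): 30,
    (0, 0, 1): 90,
    (0, 0, 0): 630,
}

# One record per subject, repeated according to the cell count
records = [
    {"x1": x1, "x2": x2, "cancer": cancer}
    for (x1, x2, cancer), n in cell_counts.items()
    for _ in range(n)
]

print(len(records))                        # 2000 subjects in total
print(sum(r["cancer"] for r in records))   # 1000 cases
```

Many packages also accept the compact form directly (one row per covariate pattern plus a frequency weight), which avoids the expansion.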
  • 106. 88 The basic logistic equation for this problem • ln (odds of disease) = a + b1 X1 + b2 X2 + b3 X1 X2 • ln (odds of disease) = a + b1 (smoking) + b2 (matches) + b3 (smoking)(matches) 2014 Page 106
  • 107. Solving a logistic equation • ln (odds of disease) = a + b1 X1 + b2 X2 + b3 X1 X2 • When X1 = 0 and X2 = 0, solve for “a” • ln (odds) = a = ln ( ) = • a = • So now: ln (odds) = 89 2014 Page 107
  • 108.
        X2=1 (OR=21.0) | Cancer=1 | Cancer=0
        X1=1           | 810      | 270
        X1=0           | 10       | 70

        X2=0 (OR=21.0) | Cancer=1 | Cancer=0
        X1=1           | 90       | 30
        X1=0           | 90       | 630
  • 109. Solving a logistic equation • ln (odds of disease) = a + b1 X1 + b2 X2 + b3 X1 X2 • When X1 = 0 and X2 = 0, solve for “a” • ln (odds) = a = ln (90/630) = -1.946 • a = -1.946 • So now: ln (odds) = -1.946 + b1 X1 + b2 X2 + b3 X1 X2 91 2014 Page 109
  • 110. 92 Solving a logistic equation (cont.) • When X1 = 1 and X2 = 0, solve for b1 • ln (odds) = • b1 = • So now: ln (odds) = 2014 Page 110
  • 112. 94 Solving a logistic equation (cont.) • ln (odds of disease) = a + b1 X1 + b2 X2 + b3 X1 X2 • When X1 = 1 and X2 = 0, solve for b1 • ln (odds) = ln (90/30) = 1.099 = -1.946 + b1 • b1 = 3.045 • So now: ln (odds) = -1.946 + 3.045X1 + b2 X2 + b3 X1 X2 2014 Page 112
  • 113. 95 Solving a logistic equation (cont.) • ln (odds of disease) = a + b1 X1 + b2 X2 + b3 X1 X2 • When X1 = 0 and X2 = 1, solve for b2 : • ln (odds) = ln ( ) = • b2 = • So now: ln (odds) = 2014 Page 113
  • 115. Solving a logistic equation (cont.) • ln (odds of disease) = a + b1 X1 + b2 X2 + b3 X1 X2 • When X1 = 0 and X2 = 1, solve for b2 : • ln (odds) = ln (10/70) = -1.946 = -1.946 + b2 • b2 = 0 • So now: ln (odds) = -1.946 + 3.045X1 + 0 + b3 X1 X2
  • 116. Solving a logistic equation (cont.) • ln (odds of disease) = a + b1 X1 + b2 X2 + b3 X1 X2 • When X1 = 1 and X2 = 1 then: • ln (odds) = • ln (odds) = • Solve for b3 • ln (odds) = • b3 = • So now: ln (odds) = 98 2014 Page 116
  • 118. Solving a logistic equation (cont.) • ln (odds of disease) = a + b1 X1 + b2 X2 + b3 X1 X2 • When X1 = 1 and X2 = 1 then: • ln (odds) = -1.946 + b1 + b2 + b3 • ln (odds) = -1.946 + 3.045 + 0 + b3 • Solve for b3 • ln (odds) = ln (810/270) = 1.099 = -1.946 + 3.045 + b3 • b3 = 0 • So now: ln (odds) = -1.946 + 3.045X1 + 0 + 0 100 2014 Page 118
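The step-by-step solution above can be reproduced in Python by reading each coefficient off the four cell log-odds. This is a sketch of the hand calculation for these particular data (a model with one parameter per cell), not a general maximum likelihood fitting routine; the helper name `log_odds` is made up for the illustration.

```python
import math


def log_odds(cases, noncases):
    """Natural log of the odds of disease in one cell."""
    return math.log(cases / noncases)


# X1 = smoking, X2 = matches
a = log_odds(90, 630)                       # X1=0, X2=0
b1 = log_odds(90, 30) - a                   # X1=1, X2=0
b2 = log_odds(10, 70) - a                   # X1=0, X2=1
b3 = log_odds(810, 270) - a - b1 - b2       # X1=1, X2=1

# a is about -1.946, b1 about 3.045, b2 and b3 are 0 (up to floating-point rounding)
print(round(a, 3), round(b1, 3), round(abs(b2), 3), round(abs(b3), 3))
```

The zero value for b3 is the model's way of saying there is no effect modification by matches on the odds ratio scale.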
  • 119. Solving a logistic equation (cont.) • ln (odds of disease) = a + b1 X1 + b2 X2 + b3 X1 X2 • This simplifies (earlier calculations) to: – ln (odds) = -1.946 + 3.045X1 + 0 + 0 • One can now use the logistic equation to efficiently describe relationships in the table • Calculate the ln(odds) for a smoker who uses matches: ln (odds)= • Calculate the ln(odds) for a smoker who doesn’t use matches: ln(odds) = • Now calculate the odds ratio for (smokers vs. non-smokers// matches+) • At home, calculate the odds ratio for (smokers vs. non- smokers// matches-) 101 2014 Page 119
  • 120. Solving a logistic equation (cont.) • ln (odds of disease) = a + b1 X1 + b2 X2 + b3 X1 X2 • This simplifies (earlier calculations) to: ln (odds) = -1.946 + 3.045(X1 ) + 0(X2 ) + 0(X1 X2 ) • One can now use the logistic equation to efficiently describe relationships in the table • Calculate the ln(odds) for a smoker who uses matches (X1 = 1 and X2 = 1): ln (odds) = -1.946 + 3.045 = 1.099 • Calculate the ln(odds) for a smoker who doesn’t use matches (X1 = 1 and X2 = 0): ln(odds) = -1.946 + 3.045 = 1.099 • Now calculate the odds ratio for (smokers vs. non-smokers // matches) • At home, calculate the odds ratio for (smokers vs. non-smokers // no matches)
  • 121. Logistic Regression Using the logistic model developed in class for the matches-smoking-lung cancer data (stratified by matches), evaluate the risk of lung cancer for: 1. (in-class) A smoker who uses matches vs. a non-smoker who uses matches. 2. (at home) A smoker who uses matches vs. a non-smoker who does not use matches SEPARATE ASSIGNMENT Develop a logistic model for the matches-smoking-lung cancer data (stratified by smoking status). Use this model to evaluate the risk of lung cancer for: 1. (at home) A user of matches who smokes vs. a non-user of matches who smokes. 2. (at home) A smoker who uses matches vs. a non-smoker who uses matches. Is this result consistent with the one you arrived at in the class example above?
  • 122. Find OR for smokers (who use matches) vs. non-smokers (who use matches) For Smokers who use matches X1 = 1 X2 = 1 For non-smokers who use matches X1 = 0 X2 = 1 From prior slides we determined that: ln (odds) = -1.946 + 3.045 (X1 ) 105 2014 Page 122
  • 123. For smokers who use matches (X1 = 1; X2 = 1): ln (odds) = -1.946 + 3.045 (1) = 1.099. For non-smokers who use matches (X1 = 0; X2 = 1): ln (odds) = -1.946 + 0 + 0 + 0 = -1.946. We want to solve: ln OR = 1.099 – (-1.946) = 3.045, so $e^{\ln OR} = OR = e^{3.045} = 21.0$. Therefore, the odds ratio (determined using logistic regression) comparing smokers using matches to non-smokers using matches is 21.0. This agrees with the stratified data presented earlier.
  • 124. Confounding: smoking, matches, and lung cancer
      • ORpooled = 21.0 (16.3, 27.1)
      • ORmatches = 21.0 (10.5, 46.2)
      • ORno matches = 21.0 (12.9, 34.7)
      • Discuss your intuitions about the 95% CIs
        Pooled      | Cancer | No cancer
        Smoking     | 900    | 300
        No smoking  | 100    | 700

        Matches     | Cancer | No cancer
        Smoking     | 810    | 270
        No smoking  | 10     | 70

        No matches  | Cancer | No cancer
        Smoking     | 90     | 30
        No smoking  | 90     | 630
  • 125. Some concluding comments on logistic regression • Interpretations of the final logistic equation for these data: ln (odds of disease) = a + b1 (smoking) + b2 (matches) + b3 (smoking)(matches); ln(odds) = -1.946 + 3.045(smoking) + 0(matches) + 0(matches)(smoking) • This equation describes the data whether stratified by matches or by smoking. • The relationships of multiple variables may be simultaneously adjusted for by the logistic equation • The estimates of the coefficients for the equation are derived through maximum likelihood techniques • This technique is very widely used in epidemiologic (and other) applications when the outcome variable of interest is dichotomous.
  • 126. Some concluding comments on logistic regression • Comments – Having multiple strata (and how this technique makes that possible) – Test of homogeneity (b3 ) – Maximum likelihood estimation for coefficient estimation • Modifications of logistic regression exist for coping with – Outcome variables with multiple levels = polytomous logistic regression – Studies in which matching was used = conditional logistic regression
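The same odds ratio can be obtained in Python by evaluating the fitted equation at the two covariate patterns and exponentiating the difference. The function name `ln_odds` is hypothetical, and the default coefficients are the rounded values from the slides.

```python
import math


def ln_odds(x1, x2, a=-1.946, b1=3.045, b2=0.0, b3=0.0):
    """ln(odds of disease) = a + b1*X1 + b2*X2 + b3*X1*X2 (coefficients from the slides)."""
    return a + b1 * x1 + b2 * x2 + b3 * x1 * x2


# Smokers who use matches (X1=1, X2=1) vs. non-smokers who use matches (X1=0, X2=1)
ln_or = ln_odds(1, 1) - ln_odds(0, 1)
print(round(ln_or, 3))            # 3.045
print(round(math.exp(ln_or), 1))  # 21.0
```

Calling `ln_odds(1, 0) - ln_odds(0, 0)` gives the at-home comparison among those who do not use matches.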
  • 127. . use http://www.stata-press.com/data/r8/lbw
        variable name | storage type | display format | variable label
        id            | int          | %8.0g          | identification code
        low           | byte         | %8.0g          | birth weight<2500g
        age           | byte         | %8.0g          | age of mother
        lwt           | int          | %8.0g          | weight at last menstrual period
        race          | byte         | %8.0g          | race
        smoke         | byte         | %8.0g          | smoked during pregnancy
        ptl           | byte         | %8.0g          | premature labor history (count)
        ht            | byte         | %8.0g          | has history of hypertension
        ui            | byte         | %8.0g          | presence, uterine irritability
        ftv           | byte         | %8.0g          | number of visits to physician during 1st trimester
        bwt           | int          | %8.0g          | birth weight (grams)
  • 128. Special (and very useful) STATA command “xi” (=“interaction expansion”) • xi: logistic low age lwt i.race smoke ptl ht ui • In this example, a variable named “race” has three levels (e.g. white/hispanic/black) that might be coded as “0=white”; “1=hispanic”; “2=black” • The combined use of xi and i.race directs STATA to analyze all levels of race and compare them to a reference level (by default the lowest-valued category). This can be a HUGE time-saver (avoids the user having to manually recode such variables)!
  • 129. Assignments • Write the logistic model describing these data (next slide). • What is the risk of low birth weight (LBW) for a smoker, adjusted for all other variables? • How can the 95% CI be determined? • What is the risk of LBW for an Hispanic baby (compared to a white baby)? • What is the risk of LBW for a black baby (compared to an Hispanic baby)? 112 2014 Page 129