JHU Job Talk

A General Framework for
Multiple Testing Dependence
Jeffrey Leek
Johns Hopkins University School of Medicine

High-dimensional multiple hypothesis testing is common.
Problem:
Dependence between tests can result in incorrect statistical
and scientific results.
A solution:
Define and address multiple testing dependence at the
level of the data – not the P-values.
Big Picture Ideas

High-Dimensional Multiple Testing Is Common
Spatial Epidemiology
Brain Imaging
Molecular Biology

Inflammation and the Host Response to Injury
mRNA expression: ~50,000 genes
Clinical data: >150 clinical variables
Patient 1, Patient 2, …, Patient 166
MOF measures severity of injury

Data at Initial Time Point
Multiple Organ Failure

Simple Analysis
1. Fit the model to the data, $x_i$, for gene i:
$x_i = a_i + b_i \,\mathrm{MOF} + e_i$
2. Calculate P-values for testing the hypotheses:
$H_0: b_i = 0$ vs. $H_1: b_i \neq 0$
3
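The two steps above can be sketched in a few lines. This is a minimal illustration on simulated data, not the analysis behind the slides: the function name `simple_analysis`, the simulated MOF scores, and the normal approximation to the t reference distribution (used to keep the sketch NumPy-only) are all assumptions.

```python
import math
import numpy as np

def simple_analysis(X, mof):
    """Per-gene least squares fit of x_i = a_i + b_i * MOF + e_i.

    X   : (m, n) array, one row per gene, one column per patient
    mof : (n,) array of MOF scores
    Returns slope estimates, t-statistics, and two-sided P-values.
    """
    m, n = X.shape
    design = np.column_stack([np.ones(n), mof])           # intercept and MOF
    coef, *_ = np.linalg.lstsq(design, X.T, rcond=None)   # shape (2, m)
    resid = X.T - design @ coef                           # shape (n, m)
    sigma2 = (resid ** 2).sum(axis=0) / (n - 2)           # residual variance per gene
    sxx = ((mof - mof.mean()) ** 2).sum()
    t = coef[1] / np.sqrt(sigma2 / sxx)
    # Assumption: normal approximation to the t distribution (reasonable for large n).
    p = np.array([math.erfc(abs(v) / math.sqrt(2.0)) for v in t])
    return coef[1], t, p

rng = np.random.default_rng(0)
mof = rng.normal(size=166)           # simulated stand-in for MOF scores
X = rng.normal(size=(200, 166))      # 200 simulated genes, all null ...
X[0] += 2.0 * mof                    # ... except gene 0, which has slope 2
b_hat, t_stat, p_val = simple_analysis(X, mof)
```

Each gene is tested on its own here, which is exactly the setting in which dependence between tests goes unaddressed.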

Four “Replicated” Studies
[Figure: P-value histograms (Frequency vs. P-value) for Phase 1, Phase 2, Phase 3, and Phase 4.]

• Data for test i: $x_i = (x_{i1}, x_{i2}, \ldots, x_{in})$
• “Primary variable(s)”: $Y = (y_1, y_2, \ldots, y_n)$
• Model: $x_{ij} = a_i + \sum_{k=1}^{d} b_{ik} s_k(y_j) + e_{ij}$
• Hypothesis test i: $H_{0i}: b_i \in \Omega_0$ vs. $H_{1i}: b_i \in \Omega_1$
{m hypothesis tests, n observations per test}
Start With The Whole Data

$X = B\,S(Y) + E$
[Diagram: X, B S(Y), and E drawn as matrices; rows are tests, columns are observations.]
Underlying Model

A Simple Simulated Example
[Figure: simulated data matrices (genes × arrays), one with independent E and one with dependent E.]

Null P-Value Distributions
[Figure: histograms of null P-values (Frequency vs. P-value), four panels each for independent E and dependent E.]

Null P-Value Distributions
[Figure: histograms of null P-values (Frequency), four panels each for independent E and dependent E. Correlation: |ρ| = 0.40, 0.31, 0.10, 0.00.]
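A small simulation can reproduce the qualitative behavior in these histograms. This is a sketch under assumptions: the dependence is generated by a single shared factor, the test is a two-sample t-test with a normal approximation for the P-values, and the names `null_pvalues` and `fraction_below` are illustrative only.

```python
import math
import numpy as np

def null_pvalues(E, group):
    """Two-sample t-test P-values for each row of E (normal approximation)."""
    a, b = E[:, group == 0], E[:, group == 1]
    na, nb = a.shape[1], b.shape[1]
    pooled = ((na - 1) * a.var(axis=1, ddof=1)
              + (nb - 1) * b.var(axis=1, ddof=1)) / (na + nb - 2)
    t = (a.mean(axis=1) - b.mean(axis=1)) / np.sqrt(pooled * (1 / na + 1 / nb))
    return np.array([math.erfc(abs(v) / math.sqrt(2.0)) for v in t])

def fraction_below(alpha, dependent, rng, m=1000, n=20):
    """Fraction of null P-values below alpha in one simulated study."""
    group = np.repeat([0, 1], n // 2)
    E = rng.normal(size=(m, n))                  # independent noise
    if dependent:
        gamma = rng.normal(size=(m, 1))          # test-specific loadings
        g = rng.normal(size=(1, n))              # one shared factor across arrays
        E = E + gamma @ g                        # dependent noise
    return np.mean(null_pvalues(E, group) < alpha)

rng = np.random.default_rng(1)
indep = [fraction_below(0.05, False, rng) for _ in range(50)]
dep = [fraction_below(0.05, True, rng) for _ in range(50)]
print("sd across studies, independent E:", np.std(indep))
print("sd across studies, dependent E:  ", np.std(dep))
```

Every test is null in both settings, but under dependent E the fraction of small P-values swings widely from one simulated study to the next.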

Null Distribution Behavior
[Figure: null distribution behavior under dependent E and independent E.]

False Discovery Rate Estimates

Ranking Estimates

When To Address Dependence?
Data X → Fit model X = BS + E → Obtain $\hat{B}$ and R → Form test-statistics and null distribution → Calculate P-values → Form P-value threshold

Existing Approaches
Empirical null approaches modify the null distribution at the test-statistic level.
Dependence adjustments conservatively modify the P-value threshold.

Examples of Existing Approaches
• Empirical Null
– Devlin and Roeder Biometrics (1999)
– Efron JASA (2004)
– Schwartzman AOAS (2008)
• Error Rate Adjustments
– Benjamini and Yekutieli Annals of Statistics (2001)
– Romano, Shaikh, and Wolf Test (2001)
– Dudoit, Gilbert, van der Laan Biometrical Journal (2008)

Our Approach
Fit the model:
$X = BS + \Gamma G + U$
where G is a valid dependence kernel.
Dependence and bias are no longer present at any of the downstream steps (test-statistics, null distribution, P-values, P-value threshold); standard methods can be used.

New Dependence Definitions
Definition – Data X are population-level multiple testing
dependent if:
Definition - Data X are estimation-level multiple testing
dependent if:
Leek and Storey (2008)

Structure in E
[Figure: three heatmap panels (Signal + Dependent Noise; Dependent Noise; Independent Noise). Axis labels: Array, MOF, Genes.]

Decomposing E
$X = BS + E$
[Diagram: data X (rows: tests, columns: observations) = B × primary variables S + random variation E.]

Decomposing E
$X = BS + H + U$
[Diagram: data X = B × primary variables S + dependent variation H + independent variation U.]

Decomposing E
$X = BS + \Gamma G + U$, where $H = \Gamma G$
[Diagram: data X = B × primary variables S + Γ × dependence kernel G + independent variation U.]

Decomposing E
Theorem Let the data be distributed according to the model:
$X = BS + E$
Suppose that for each $e_i$ there is no Borel measurable function, g, such that $e_i = g(e_1, \ldots, e_{i-1}, e_{i+1}, \ldots, e_m)$ almost surely. Then there exist matrices $\Gamma_{m \times r}$, $G_{r \times n}$ ($r \le n$) and $U_{m \times n}$ such that:
$X = BS + \Gamma G + U$
where the rows of U are independent, $u_i \neq 0$, and $u_i = h_i(e_i)$ for a non-random Borel measurable function $h_i$.

Dependence Kernel
Definition – Dependence Kernel
An r × n matrix G forms a dependence kernel for the data X if the following equality holds:
$X = BS + E = BS + \Gamma G + U$
where the rows of U are independent.

Fitting S & G Results In Independent Tests
Theorem Let G be any valid dependence kernel for the data X. Suppose that the model:
$x_i = b_i S + \gamma_i G + u_i$
is fit by least squares resulting in residuals:
$r_i = x_i - \hat{b}_i S - \hat{\gamma}_i G$
If the row space jointly spanned by S and G has dimension less than n, then the $r_i$ and the $\hat{b}_i$ are jointly independent given S and G and:
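A quick numerical illustration of the idea, not of the theorem itself: on simulated data with one shared factor, residuals from fitting S alone still carry a dominant common direction, while residuals from fitting S and G together do not. The helper names `residuals` and `top_share`, the simulated dimensions, and the use of the true G are assumptions made for the sketch.

```python
import numpy as np

def residuals(X, design_rows):
    """Least squares residuals of each row of X on the rows of design_rows."""
    D = np.asarray(design_rows)                         # shape (k, n)
    coef, *_ = np.linalg.lstsq(D.T, X.T, rcond=None)    # shape (k, m)
    return X - coef.T @ D

def top_share(R):
    """Share of residual variance carried by the first right singular vector."""
    s = np.linalg.svd(R, compute_uv=False)
    return s[0] ** 2 / np.sum(s ** 2)

rng = np.random.default_rng(2)
m, n = 500, 20
S = np.vstack([np.ones(n), np.repeat([0.0, 1.0], n // 2)])  # intercept and group
G = rng.normal(size=(1, n))                                 # simulated dependence kernel
B = rng.normal(size=(m, 2))
Gamma = rng.normal(size=(m, 1))
X = B @ S + Gamma @ G + rng.normal(size=(m, n))

share_S = top_share(residuals(X, S))                        # fit S only
share_SG = top_share(residuals(X, np.vstack([S, G])))       # fit S and G
print(share_S, share_SG)
```

In practice G is unknown and has to be estimated, which is what the surrogate variable algorithm later in the talk is for.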

A “Blessing” of Dimensionality
$X = BS + \Gamma G + U$
[Diagram: data X (rows: tests, columns: observations) = B × primary variables S + Γ × dependence kernel G + independent variation U.]

Iteratively Reweighted Surrogate Variable Analysis
Whole data: $X = BS + \Gamma G + U$
Test i data: $x_i = b_i S + \gamma_i G + u_i$
1. Estimate the row dimension, $\hat{r}$, of G.
2. Form an initial estimate $\hat{G}_0$ equal to the first $\hat{r}$ right singular vectors of $R = X - \hat{B}S$.
Iterate for b = 0,…,B:
3. Estimate the probability weights.
4. Weight the ith row of X by its estimated probability weight and set $\hat{G}_{(b+1)}$ to be the first $\hat{r}$ right singular vectors of the weighted matrix.
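Steps 2 and 4 are plain singular value decompositions and can be sketched directly. The weight estimation in step 3 is not implemented here: `weight_fn` is a hypothetical caller-supplied function, and the demo uses oracle weights that are only available in a simulation. The function names and `n_iter` (playing the role of the iteration count) are likewise assumptions.

```python
import numpy as np

def initial_G(X, S, r):
    """Step 2: first r right singular vectors of R = X - B_hat S."""
    B_hat, *_ = np.linalg.lstsq(S.T, X.T, rcond=None)   # shape (d, m)
    R = X - B_hat.T @ S
    _, _, Vt = np.linalg.svd(R, full_matrices=False)
    return Vt[:r]

def reweighted_G(X, weights, r):
    """Step 4: first r right singular vectors of the row-weighted data."""
    Xw = weights[:, None] * X
    _, _, Vt = np.linalg.svd(Xw, full_matrices=False)
    return Vt[:r]

def irw_sva(X, S, r, weight_fn, n_iter=5):
    """Iterate steps 3-4; weight_fn(X, S, G) stands in for step 3."""
    G = initial_G(X, S, r)
    for _ in range(n_iter):
        G = reweighted_G(X, weight_fn(X, S, G), r)
    return G

rng = np.random.default_rng(3)
m, n = 1000, 20
S = np.repeat([-1.0, 1.0], n // 2)[None, :]      # one primary variable (group label)
g = rng.normal(size=(1, n))                      # simulated true G
b = np.where(np.arange(m) < 50, 2.0, 0.0)        # 5% of tests have b_i != 0
X = b[:, None] @ S + rng.normal(size=(m, 1)) @ g + rng.normal(size=(m, n))

def oracle_weights(X, S, G):
    # Simulation-only stand-in: weight 1 for tests with b_i = 0, else 0.
    return (b == 0).astype(float)

G_hat = irw_sva(X, S, r=1, weight_fn=oracle_weights)
cosine = abs(G_hat[0] @ g[0]) / np.linalg.norm(g)
print(cosine)
```

The point of the weighting is visible in the design: rows that carry primary-variable signal are down-weighted so the leading singular vectors pick up G rather than S.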

An Example of the IRW-SVA Algorithm
[Figure panels: The Data; True G; Estimate of G; Pr(G & !S).]

1. Buja and Eyuboglu (1992) proposed a
permutation approach.
2. Patterson, Price, and Reich (2006) proposed a sequential testing strategy based on Tracy-Widom theory.
3. Leek (in preparation) proposes an eigenvalue
estimator that is consistent in the number of
tests.
Estimating The Row Dimension of G

1. Assume the data follow $X = BS + \Gamma G + U$, where G and S have row dimensions r and d, with $r + d < n$.
2. Calculate the singular values $s_1, \ldots, s_n$ of X and choose b such that $r + d < b$.
3. Calculate the eigenvalues $\lambda_1, \ldots, \lambda_n$ of
$\frac{1}{m}\left[R^T R - s_b^2 P\right]$
where $P = I - S^T (S S^T)^{-1} S$ and $R = XP$.
4. Set
$\hat{r} = \sum_{j=1}^{n} 1\left(\lambda_j > m^{-1/3}\right)$
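The four steps translate almost line for line into NumPy. This is a sketch under assumptions: S is stored as a d × n array, so P is computed as the projection onto the orthogonal complement of its row space; b is supplied by the caller; and the simulated data use strong, well-separated factors so that the asymptotic threshold behaves well at finite m. The function name `estimate_r` is illustrative.

```python
import numpy as np

def estimate_r(X, S, b):
    """Count eigenvalues of (1/m)[R'R - s_b^2 P] that exceed m^(-1/3)."""
    m, n = X.shape
    s = np.linalg.svd(X, compute_uv=False)                # s[0] >= s[1] >= ...
    P = np.eye(n) - S.T @ np.linalg.solve(S @ S.T, S)     # removes the row space of S
    R = X @ P
    M = (R.T @ R - s[b - 1] ** 2 * P) / m                 # s[b - 1] is s_b (1-indexed)
    lam = np.linalg.eigvalsh((M + M.T) / 2)               # symmetrize for stability
    return int(np.sum(lam > m ** (-1.0 / 3.0)))

rng = np.random.default_rng(4)
m, n, d, r = 5000, 20, 1, 2
S = np.repeat([-1.0, 1.0], n // 2)[None, :]               # d = 1 primary variable
G = rng.normal(size=(r, n))                               # simulated kernel, r = 2
X = (rng.normal(size=(m, d)) @ S
     + rng.normal(size=(m, r)) @ G
     + rng.normal(size=(m, n)))
r_hat = estimate_r(X, S, b=r + d + 1)
print(r_hat)
```

The demo sets b just above r + d, which is only possible because r is known in the simulation; how to choose b on real data is not covered by this sketch.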

Theorem As $m \to \infty$,
$\hat{r} = \sum_{j=1}^{n} 1\left(\lambda_j > m^{-1/3}\right)$
is a consistent estimate of the row dimension of G, provided that:
(1) $u_{ij}$ are independent
(2) $E[u_{ij}] = 0$
(3) $E[u_{ij}^2] = \sigma_i^2 < M_1$
(4) $E[u_{ij}^4] < M_2$
(5) $\lim_{m \to \infty} \frac{1}{m} \Gamma^T \Gamma$ is positive definite with unique eigenvalues
Leek (In Prep.)

Break The Estimation Into Two Components

Estimating the Probability Weights
1. Form F-statistics $F_1, \ldots, F_m$ for testing the hypotheses:
2. Bootstrap from the conditional null model to obtain null statistics $F_1^{0k}, \ldots, F_m^{0k}$, k = 1,…,K.
3. From Bayes’ Theorem:
where $F_i^{0k} \sim g_0$ and $F_i \sim \pi_0 g_0 + (1 - \pi_0) g_1$.

With $F_i^{0k} \sim g_0$ and $F_i \sim \pi_0 g_0 + (1 - \pi_0) g_1$:
4. Estimate the ratio of the densities with a non-parametric logistic regression where $F_i$ are “successes” and $F_i^{0k}$ are “failures” (Anderson and Blair 1982).
5. Estimate $\pi_0$ according to Storey (2002).
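For step 5, the Storey (2002) estimator is usually written in terms of P-values: the proportion of P-values above a cutoff λ, rescaled by 1 − λ. The sketch below assumes P-values are already available for the statistics; the function name `storey_pi0`, the default λ = 0.5, and the cap at 1 are choices made for the illustration.

```python
import numpy as np

def storey_pi0(pvalues, lam=0.5):
    """Estimate pi_0 as #{p_i > lam} / (m * (1 - lam)), capped at 1."""
    p = np.asarray(pvalues, dtype=float)
    return min(1.0, np.mean(p > lam) / (1.0 - lam))

# Toy P-values: eight very small ones and two large ones.
toy_p = [0.001] * 8 + [0.55, 0.95]
pi0_hat = storey_pi0(toy_p)
print(pi0_hat)
```

The reasoning is that large P-values come almost entirely from true nulls, whose P-values are uniform, so their density above λ estimates $\pi_0$.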

Estimate of posterior
probability bi ≠ 0.

SVA-Adjusted Analysis
1. Estimate G with IRW-SVA
2. Fit $x_i = b_i S + \gamma_i \hat{G} + u_i$
3. Test the hypotheses
$H_{0i}: b_i \in \Omega_0$ vs. $H_{1i}: b_i \in \Omega_1$
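Steps 2 and 3 amount to a nested-model comparison for each test. The sketch below assumes the null hypothesis is $b_i = 0$, adds an intercept to both models, and takes the kernel estimate as given (the demo passes the simulated true G as a stand-in for the IRW-SVA output). The names `rss` and `f_statistics` are illustrative.

```python
import numpy as np

def rss(X, design_rows):
    """Residual sum of squares per row of X for a least squares fit."""
    coef, *_ = np.linalg.lstsq(design_rows.T, X.T, rcond=None)
    return ((X - coef.T @ design_rows) ** 2).sum(axis=1)

def f_statistics(X, S, G_hat):
    """F-statistic per test: full model (intercept, S, G_hat) vs. null (intercept, G_hat)."""
    n = X.shape[1]
    ones = np.ones((1, n))
    null_rows = np.vstack([ones, G_hat])
    full_rows = np.vstack([ones, S, G_hat])
    df1 = S.shape[0]                         # number of primary-variable terms tested
    df2 = n - full_rows.shape[0]             # residual degrees of freedom (full rank assumed)
    rss0, rss1 = rss(X, null_rows), rss(X, full_rows)
    return ((rss0 - rss1) / df1) / (rss1 / df2)

rng = np.random.default_rng(5)
m, n = 300, 20
S = np.repeat([0.0, 1.0], n // 2)[None, :]
G = rng.normal(size=(1, n))                  # stand-in for the IRW-SVA estimate
b = np.where(np.arange(m) < 30, 2.0, 0.0)    # first 30 tests are truly non-null
X = b[:, None] @ S + rng.normal(size=(m, 1)) @ G + rng.normal(size=(m, n))
F = f_statistics(X, S, G)
print(F[:30].mean(), F[30:].mean())
```

Because G enters both the null and the full model, the comparison isolates the contribution of S after the shared variation has been accounted for.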

Null Distribution Behavior
[Figure: null distribution behavior for dependent E, independent E, and dependent E + IRW-SVA.]

False Discovery Rate Estimates
[Figure: Q-value vs. true false discovery rate, three panels, including Dependent E + IRW-SVA.]

Ranking Estimates
[Figure: average ranking by T-statistic vs. ranking by true signal to noise, three panels, including Dependent E + IRW-SVA.]

Inflammation and the Host Response to Injury
mRNA expression: ~50,000 genes
Clinical data: >150 clinical variables
Patient 1, Patient 2, …, Patient 166
MOF measures severity of injury

Four “Replicated” Studies
[Figure: P-value histograms (Frequency) for Phase 1, Phase 2, Phase 3, and Phase 4.]

Functional Enrichment Across Phases
[Figure: percent of total significant pathways vs. number of phases in which a significant pathway appears (1 of 4, 2 of 4, 3 of 4, 4 of 4), for Unadjusted and IRW-SVA Adjusted analyses.]

• High-dimensional hypothesis testing is common.
• Dependence between tests can result in incorrect
statistical and scientific inference.
• We can define and address dependence at the
level of the model using the dependence kernel.
• IRW-SVA can be used to improve inference in
high-dimensional multiple hypothesis testing.
Summary

Future Work
• Multiple Testing
– Develop dependence kernel estimates for spatial data
– Develop diagnostic tests for multiple testing procedures
• High-Dimensional Asymptotics
– Extend methods for asymptotic SVD to binary data
• Feature Selection for High-Dimensional Classifiers
– Extensions of top-scoring pairs (TSP) to survival data
– Theoretical connections to LDA and SVM
– Embedding TSP in a logic regression framework

Buja and Eyuboglu (1992)
1. Calculate the residuals $R = X - \hat{B}S$.
2. Calculate the singular values of R, $d_1, \ldots, d_n$.
For k = 1,…,K do steps 3-4:
3. Permute each row of R individually to get $R_0$.
4. Take the SVD of the residuals $R^* = R_0 - \hat{B}_0 S$ to obtain null singular values $d_{i0}^k$.
5. Compare $d_i$ to $d_{i0}^k$ for k = 1,…,K to calculate a P-value for the ith right singular vector.
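The permutation procedure above can be sketched as follows. The P-value formula (1 + number of null values at least as large as the observed one) / (K + 1) is one common convention and is an assumption here, as are the function names and the simulated data.

```python
import numpy as np

def fit_residuals(X, S):
    """Residuals of each row of X after a least squares fit on the rows of S."""
    B_hat, *_ = np.linalg.lstsq(S.T, X.T, rcond=None)
    return X - B_hat.T @ S

def permutation_pvalues(X, S, K=20, rng=None):
    """Permutation P-values for the singular values of R = X - B_hat S."""
    rng = np.random.default_rng() if rng is None else rng
    R = fit_residuals(X, S)                                    # step 1
    d = np.linalg.svd(R, compute_uv=False)                     # step 2
    exceed = np.zeros_like(d)
    for _ in range(K):
        # Step 3: permute the entries of each row independently.
        idx = np.argsort(rng.random(R.shape), axis=1)
        R0 = np.take_along_axis(R, idx, axis=1)
        # Step 4: refit S to the permuted residuals and take the SVD.
        d0 = np.linalg.svd(fit_residuals(R0, S), compute_uv=False)
        exceed += d0 >= d
    return (1 + exceed) / (K + 1)                              # step 5

rng = np.random.default_rng(7)
m, n = 300, 20
S = np.vstack([np.ones(n), np.repeat([0.0, 1.0], n // 2)])    # intercept and group
X = (rng.normal(size=(m, 2)) @ S
     + rng.normal(size=(m, 1)) @ rng.normal(size=(1, n))      # one shared factor
     + rng.normal(size=(m, n)))
p_perm = permutation_pvalues(X, S, K=20, rng=rng)
print(p_perm[:3])
```

With one simulated shared factor, the first singular value should stand out from its permutation null while later ones should not.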

Why Does This Work?
Leek and Storey (2007), Leek and Storey (2008)
Useful Fact:
X = BS + E
= BS + ΓG + U
= BS + ΛH + U
if G and H have the same row space.
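One way to see the useful fact, assuming that G and H spanning the same space means $H = AG$ for some invertible $r \times r$ matrix $A$ (the matrix $A$ is introduced here for illustration only):

```latex
\Gamma G \;=\; \Gamma A^{-1} A G \;=\; \left(\Gamma A^{-1}\right)\left(A G\right) \;=\; \Lambda H,
\qquad \Lambda = \Gamma A^{-1}, \quad H = A G .
```

Under that assumption the fitted values $BS + \Gamma G$ are unchanged, so an estimate only needs to recover the span of G, not G itself.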

• References:
Benjamini Y and Hochberg Y. (1995), “Controlling the false discovery rate – a practical and powerful approach to multiple testing.” JRSSB, 57: 289-300.
De Castro MC, Monte-Mor RL, Sawyer DO, and Singer BH. (2005), “Malaria risk on the Amazon frontier.” PNAS, 103: 2452-2457.
Devlin B and Roeder K. (1999), “Genomic control for association studies.” Biometrics, 55: 997-1004.
Efron B. (2004), “Large-scale simultaneous hypothesis testing: The choice of a null hypothesis.” JASA, 99: 96-104.
Leek JT and Storey JD. (2008), “A general framework for multiple testing dependence.” Proceedings of the National Academy of Sciences, 105: 18718-18723.
Leek JT and Storey JD. (2007), “Capturing heterogeneity in gene expression studies by ‘Surrogate Variable Analysis’.” PLoS Genetics, 3: e161.
Taylor JE and Worsley KJ. (2007), “Detecting sparse signals in random fields, with applications to brain mapping.” JASA, 102: 913-928.
Thank You

1. Perform each hypothesis test individually.
2. Obtain the test-statistic for each test.
3. Compare distribution of test-statistics to the
theoretical null distribution.
4. Adjust theoretical null so that it matches the
observed statistics in a low signal region.
Empirical Null
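Step 4 can be illustrated with a deliberately simple stand-in: estimate the center and spread of the null from the bulk of the observed z-statistics using the median and the scaled median absolute deviation. This is not the estimator of Efron (2004); it is a robust approximation chosen for brevity, and the function name and simulated statistics are assumptions.

```python
import numpy as np

def empirical_null(z):
    """Robust (center, scale) of the bulk of z, as a stand-in empirical null."""
    z = np.asarray(z, dtype=float)
    center = np.median(z)
    # 1.4826 * MAD is a consistent estimate of the sd for normal data.
    scale = 1.4826 * np.median(np.abs(z - center))
    return center, scale

rng = np.random.default_rng(6)
z = np.concatenate([
    rng.normal(0.2, 1.3, size=9800),   # null statistics, shifted and over-dispersed
    rng.normal(6.0, 1.0, size=200),    # a small fraction of true signals
])
mu0, sd0 = empirical_null(z)
print(mu0, sd0)   # compare with the theoretical null N(0, 1)
```

The adjustment uses only the test-statistics, not the full data matrix.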

Theoretical Null
Empirical Null
Efron (2004)

Empirical Null Results in Incorrect Null Distribution
Dep. Kernel

• Observed statistics or observed P-values come
from mixture distribution:
π0g0 + π1g1
• Dependence distorts g0 … can go either way:
• Must use full data set to capture dependence
With Confounding Empirical Null is Ill-Posed
