Bayesian hypothesis testing

Clinical trial monitoring with Bayesian hypothesis testing

John D. Cook
Valen E. Johnson

August 6, 2008

Estimation and Testing

Bayesians typically approach a clinical trial as an estimation
problem, not a test.
Possible explanation: poor operating characteristics . . .
Unless you choose your alternative prior well.

Local prior operating characteristics

Point null hypothesis versus a local alternative prior, one that assigns
positive density to the null value.
When simulating from the alternative, the Bayes factor in favor of the
alternative grows like $e^{n}$.
When simulating from the null, the Bayes factor in favor of the null
grows only like $n^{1/2}$.
Hard to ever reject the null.
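
The two growth rates can be seen in a simple conjugate case. The sketch below is an illustration under assumptions that are not in the slides: normal data with known unit variance, $H_0: \theta = 0$, and a local alternative prior $\theta \sim N(0, \tau^2)$. The function name `log_bf10_normal` is hypothetical.

```python
import math


def log_bf10_normal(xbar: float, n: int, tau2: float = 1.0) -> float:
    """Log Bayes factor for H1: theta ~ N(0, tau2) against H0: theta = 0.

    Assumes x_1..x_n ~ N(theta, 1), so xbar ~ N(theta, 1/n). The marginal
    of xbar is N(0, 1/n) under H0 and N(0, tau2 + 1/n) under H1.
    """
    shrink = n * tau2 / (1.0 + n * tau2)
    return -0.5 * math.log1p(n * tau2) + 0.5 * n * xbar * xbar * shrink


if __name__ == "__main__":
    for n in (10, 100, 1000, 10000):
        # Idealized data: the sample mean sits exactly at the true mean.
        for_null = -log_bf10_normal(0.0, n)  # log BF in favor of H0, theta = 0
        for_alt = log_bf10_normal(0.5, n)    # log BF in favor of H1, theta = 0.5
        print(f"n={n:6d}  log BF(0|1)={for_null:6.2f}  log BF(1|0)={for_alt:9.2f}")
```

In this idealized setting the log Bayes factor for a true alternative grows linearly in n, while the log Bayes factor for a true null grows only like (1/2) log n.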

Inverse moment priors (iMOM)

[Figure: iMOM prior density, horizontal axis from 0.0 to 1.0]

$$\pi_1(\theta) \propto (\theta - \theta_0)^{-\nu-1} \exp\!\left(-\lambda(\theta - \theta_0)^{-2k}\right)\,[\theta > \theta_0]$$
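
A minimal sketch of this density, assuming the one-sided form on $\theta > \theta_0$ with no upper truncation. The normalizing constant $2k\lambda^{\nu/(2k)}/\Gamma(\nu/(2k))$ follows from the substitution $u = \lambda(\theta-\theta_0)^{-2k}$. The helper names and the choice $\nu = k = 1$ in the demo are illustrative assumptions, not values from the slides.

```python
import math


def imom_density(theta: float, theta0: float, nu: float, lam: float, k: float) -> float:
    """One-sided iMOM density on theta > theta0 (no upper truncation)."""
    x = theta - theta0
    if x <= 0.0:
        return 0.0
    # log of lam * x^(-2k); guard against overflow very close to theta0.
    log_penalty = math.log(lam) - 2.0 * k * math.log(x)
    if log_penalty > 700.0:
        return 0.0
    shape = nu / (2.0 * k)
    log_norm = math.log(2.0 * k) + shape * math.log(lam) - math.lgamma(shape)
    return math.exp(log_norm - (nu + 1.0) * math.log(x) - math.exp(log_penalty))


def imom_lambda_for_mode(mode: float, theta0: float, nu: float, k: float) -> float:
    """Value of lam that places the mode of the density at `mode` (> theta0)."""
    x = mode - theta0
    return (nu + 1.0) * x ** (2.0 * k) / (2.0 * k)


if __name__ == "__main__":
    # Hypothetical shape parameters nu = k = 1; null 0.2 and mode 0.3.
    lam = imom_lambda_for_mode(0.3, 0.2, nu=1.0, k=1.0)
    for t in (0.21, 0.25, 0.30, 0.40, 0.60):
        print(f"theta={t:.2f}  density={imom_density(t, 0.2, 1.0, lam, 1.0):.4f}")
```

The density vanishes faster than any polynomial as $\theta \to \theta_0$, which is what separates it from a local prior.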

iMOM Convergence rates

When simulating from alternative

$$\operatorname*{plim}_{n\to\infty}\; n^{-1} \log BF_n(1 \mid 0) = c > 0.$$

(Well known result.)

When simulating from null,

$$\operatorname*{plim}_{n\to\infty}\; n^{-k/(k+1)} \log BF_n(1 \mid 0) = c < 0.$$

(New result.)
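
One way to read the two limits side by side, as a sketch: $c_1$ and $c_0$ below stand for generic positive constants, and $k = 1$ is chosen only for illustration.

```latex
\[
\log BF_n(1 \mid 0) \approx c_1\, n \quad \text{under } H_1,
\qquad
\log BF_n(0 \mid 1) \approx c_0\, n^{k/(k+1)} \quad \text{under } H_0 .
\]
% Example with k = 1: evidence for a true null grows like
\[
BF_n(0 \mid 1) \approx e^{c_0 \sqrt{n}},
\]
% compared with the n^{1/2} rate of a local alternative prior.
% As k increases, k/(k+1) -> 1 and the two rates become nearly symmetric.
```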

Thall-Simon method

Historical standard: $\theta_S \sim \mathrm{Beta}(a_S, b_S)$. Parameters $a_S$ and
$b_S$ large.
Experimental treatment: $\theta_E \sim \mathrm{Beta}(a_E, b_E)$ a priori, $a_E$ and
$b_E$ small.
Stop for inferiority if $P(\theta_E < \delta + \theta_S \mid \text{data})$ is large.
Stop for superiority if $P(\theta_E > \theta_S \mid \text{data})$ is large.
Operating characteristics degrade without $\delta > 0$.
Inconsistent in the limit: when $\theta_S < \theta_E < \theta_S + \delta$, both stopping rules could apply.
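
The two posterior probabilities can be estimated by simulation. This is a minimal Monte Carlo sketch, not the authors' software: the function name, the number of draws, and the example data are assumptions, and the standard-arm distribution is held fixed while the experimental arm gets the usual conjugate Beta update.

```python
import random


def thall_simon_probs(a_s, b_s, a_e, b_e, responses, n, delta,
                      draws=20000, seed=0):
    """Monte Carlo estimates of the two Thall-Simon posterior probabilities.

    Returns (P(theta_E < delta + theta_S | data), P(theta_E > theta_S | data)).
    """
    rng = random.Random(seed)
    post_a = a_e + responses          # prior a_E plus observed responses
    post_b = b_e + (n - responses)    # prior b_E plus observed non-responses
    inferior = 0
    superior = 0
    for _ in range(draws):
        theta_s = rng.betavariate(a_s, b_s)
        theta_e = rng.betavariate(post_a, post_b)
        inferior += theta_e < delta + theta_s
        superior += theta_e > theta_s
    return inferior / draws, superior / draws


if __name__ == "__main__":
    # Illustrative inputs: Beta(200, 800) standard, Beta(0.6, 1.4) prior, delta 0.1.
    for responses, n in ((0, 30), (15, 30), (2500, 10000)):
        p_inf, p_sup = thall_simon_probs(200, 800, 0.6, 1.4, responses, n, 0.1)
        print(f"{responses}/{n}: P(inferior)={p_inf:.3f}  P(superior)={p_sup:.3f}")
```

The last case (a hypothetical 25% response rate with a very large sample) sits between the standard rate and the standard rate plus delta, so both probabilities come out close to 1.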

Thall-Simon plot

[Figure: Beta(60, 140) historical, Beta(12, 18) experimental; horizontal axis from 0.0 to 1.0]

Comparing Bayes factor with Thall-Simon

Historical response 20%, alternative 30%. Fifty patients maximum.

Bayes factor design:
$H_0: \theta = 0.2$
$H_1$: iMOM prior with mode 0.3.
Stop for inferiority if $P(H_0 \mid \text{data}) > 0.9$.
Stop for superiority if $P(H_1 \mid \text{data}) > 0.9$.
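
A sketch of how these posterior probabilities might be computed for binomial data. Several choices here are assumptions not stated in the slides: iMOM shape parameters $\nu = k = 1$, equal prior probabilities for the two hypotheses, truncation of the iMOM prior to $(\theta_0, 1)$, and midpoint-rule integration. The function names are hypothetical.

```python
import math


def imom_kernel(theta, theta0, nu, lam, k):
    """Unnormalized one-sided iMOM density; zero for theta <= theta0."""
    x = theta - theta0
    if x <= 0.0:
        return 0.0
    return x ** (-(nu + 1.0)) * math.exp(-lam * x ** (-2.0 * k))


def posterior_prob_h1(responses, n, theta0=0.2, mode=0.3,
                      nu=1.0, k=1.0, prior_h1=0.5, steps=4000):
    """P(H1 | data) for binomial data, H0: theta = theta0 against an iMOM H1.

    The iMOM prior is truncated to (theta0, 1) and renormalized numerically.
    Intended for small trials (n around 50), where exp() cannot overflow.
    """
    # lam chosen so that the untruncated density has its mode at `mode`.
    lam = (nu + 1.0) * (mode - theta0) ** (2.0 * k) / (2.0 * k)
    h = (1.0 - theta0) / steps
    prior_mass = 0.0
    weighted_lr = 0.0
    for i in range(steps):
        theta = theta0 + (i + 0.5) * h
        w = imom_kernel(theta, theta0, nu, lam, k)
        # Likelihood ratio of theta against theta0; the binomial coefficient cancels.
        log_lr = (responses * math.log(theta / theta0)
                  + (n - responses) * math.log((1.0 - theta) / (1.0 - theta0)))
        prior_mass += w
        weighted_lr += w * math.exp(log_lr)
    bf10 = weighted_lr / prior_mass
    odds = bf10 * prior_h1 / (1.0 - prior_h1)
    return odds / (1.0 + odds)


if __name__ == "__main__":
    for responses in (0, 3, 6, 9, 12, 15):
        p1 = posterior_prob_h1(responses, 30)
        print(f"{responses:2d}/30 responses: P(H0|data)={1.0 - p1:.3f}  P(H1|data)={p1:.3f}")
```

With continuous monitoring, a calculation like this would be repeated after each patient and compared with the 0.9 thresholds.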

Comparing Bayes factor with Thall-Simon, cont.

Thall-Simon design:
$\theta_S \sim \mathrm{Beta}(200, 800)$
$\theta_E \sim \mathrm{Beta}(0.6, 1.4)$ a priori
Stop for inferiority if $P(\theta_S + 0.1 > \theta_E \mid \text{data}) > 0.976$.
Stop for superiority if $P(\theta_E > \theta_S \mid \text{data}) > 0.99$.
Calibrated to match probability of stopping for wrong reason at
null and alternative.

Stopping for inferiority

[Figure: probability of concluding inferiority versus true response probability (0.0 to 1.0); curves for Thall-Simon and Bayes factor]

Stopping for superiority

[Figure: probability of concluding superiority (0.0 to 1.0) on the vertical axis; horizontal axis from 0.0 to 1.0; curves for Thall-Simon and Bayes factor]


Thall-Wooten time-to-event method

Analogous to Thall-Simon method for binary outcomes.
$t \mid \theta \sim$ exponential with mean $\theta$, $\theta \sim$ inverse gamma
Stop for inferiority if $P(\theta_S + 0.1 > \theta_E \mid \text{data})$ large . . .
Stop for superiority if $P(\theta_E > \theta_S \mid \text{data})$ large
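
The exponential and inverse gamma pair is conjugate, so the posterior is again inverse gamma. The sketch below shows that update and a Monte Carlo estimate of the two stopping probabilities. It is an illustration under assumptions: the shape and scale parameterization IG(a, b) with density proportional to $\theta^{-a-1} e^{-b/\theta}$, the function names, and the fixed standard-arm distribution are not taken from the slides.

```python
import random


def update_inverse_gamma(a, b, events, total_time):
    """Conjugate update for exponential data with mean theta, theta ~ IG(a, b).

    `events` is the number of observed events; `total_time` is the summed
    follow-up time over all patients, censored ones included.
    """
    return a + events, b + total_time


def sample_inverse_gamma(rng, a, b):
    """Draw from IG(shape a, scale b) as the reciprocal of a gamma draw."""
    # random.gammavariate takes (shape, scale); rate b means scale 1/b.
    return 1.0 / rng.gammavariate(a, 1.0 / b)


def thall_wooten_probs(std, exp_post, delta, draws=20000, seed=0):
    """Return (P(theta_S + delta > theta_E), P(theta_E > theta_S)).

    `std` and `exp_post` are (a, b) pairs for the standard-arm distribution
    and the experimental-arm posterior.
    """
    rng = random.Random(seed)
    inferior = 0
    superior = 0
    for _ in range(draws):
        theta_s = sample_inverse_gamma(rng, *std)
        theta_e = sample_inverse_gamma(rng, *exp_post)
        inferior += theta_s + delta > theta_e
        superior += theta_e > theta_s
    return inferior / draws, superior / draws
```

A typical call would pass the result of `update_inverse_gamma` as `exp_post` after each new event or follow-up update.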

Comparing Bayes factor and Thall-Wooten method

Standard treatment 6 months PFS, alternative 8 months,
maximum 50 patients

Bayes factor design:
$H_0: \theta = 6$
$H_1$: iMOM prior with mode 8.
Stop for inferiority if $P(H_0 \mid \text{data}) > 0.9$.
Stop for superiority if $P(H_1 \mid \text{data}) > 0.9$.

Comparing Bayes factor and Thall-Wooten method, cont.

Thall-Wooten design:
$\theta_S \sim$ Inverse Gamma(20, 1200)
$\theta_E \sim$ Inverse Gamma(3, 12) a priori
Stop for inferiority if $P(\theta_S + 2 > \theta_E \mid \text{data}) > 0.976$.
Stop for superiority if $P(\theta_E > \theta_S \mid \text{data}) > 0.93$.
Calibrated to match probability of stopping for wrong reason at
null and alternative.

Stopping for inferiority

[Figure: probability of early stopping for inferiority versus true mean survival time (2 to 12); curves for Thall-Wooten and Bayes factor]

Stopping for superiority

[Figure: probability of early stopping for superiority versus true mean survival time (2 to 12); curves for Thall-Wooten and Bayes factor]

Comparison with Simon two-stage design

Simon two-stage design to test null response rate 0.20 versus
alternative rate 0.40.

Rejects the treatment 95% of the time under the null, 20% under the alternative.

Maximum of 43 patients: 13 in first stage, 30 in second stage.
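
The operating characteristics of a two-stage design can be computed exactly from binomial probabilities. In the sketch below the stage sizes (13 and 30) come from the slide, but the cutoffs `r1 = 3` and `r = 12` are an assumption: they are the values usually tabulated for Simon's optimal design at these error rates, and the slides do not state them. Function names are hypothetical.

```python
from math import comb


def binom_pmf(x, n, p):
    """P(X = x) for X ~ Binomial(n, p)."""
    return comb(n, x) * p ** x * (1.0 - p) ** (n - x)


def binom_cdf(x, n, p):
    """P(X <= x) for X ~ Binomial(n, p); zero when x < 0."""
    if x < 0:
        return 0.0
    return sum(binom_pmf(i, n, p) for i in range(min(x, n) + 1))


def simon_two_stage(p, r1=3, n1=13, r=12, n=43):
    """Return (P(reject treatment), expected sample size) at response rate p.

    Stage 1: stop and reject if responses <= r1 out of n1.
    Stage 2: reject if total responses <= r out of n.
    """
    n2 = n - n1
    early_stop = binom_cdf(r1, n1, p)
    reject = early_stop
    for x1 in range(r1 + 1, n1 + 1):
        # Continue to stage 2 with x1 responses; reject if the total stays <= r.
        reject += binom_pmf(x1, n1, p) * binom_cdf(r - x1, n2, p)
    expected_n = n1 + (1.0 - early_stop) * n2
    return reject, expected_n


if __name__ == "__main__":
    for p in (0.1, 0.2, 0.3, 0.4, 0.5):
        reject, expected_n = simon_two_stage(p)
        print(f"p={p:.1f}  P(reject)={reject:.3f}  E[N]={expected_n:.1f}")
```

Evaluating the function on a grid of response rates gives rejection-probability and patients-used curves of the kind compared against the Bayes factor design.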

Comparison with Simon two-stage design:
rejection probability

[Figure: probability of rejecting treatment (0.0 to 1.0) on the vertical axis; horizontal axis from 0.0 to 1.0; curves for Simon and Bayes factor]


Comparison with Simon two-stage design:
patients used

[Figure: patients (0 to 40) on the vertical axis; horizontal axis from 0.0 to 1.0; curves for Simon, Bayes factor, and Naive Simon]


References

Valen E. Johnson, John D. Cook. Bayesian Design of
Single-Arm Phase II Clinical Trials with Continuous
Monitoring. Clinical Trials 2009; 6(3):217-26.
Software: http://biostatistics.mdanderson.org
http://www.JohnDCook.com
