Bootstrap Methods: Recent
Advances and New Applications
2007 Nonparametrics Conference
Michael Chernick
United BioSource Corporation
October 11, 2007
1
Bootstrap Topics
• Introduction to Bootstrap
• Wide Variety of Applications
• Confidence regions and hypothesis tests
• Examples of bootstrap applications:
(1) P-value adjustment - consulting example,
(2) Bioequivalence - Individual Bioequivalence
• Examples where bootstrap is not consistent and remedies:
(1) infinite variance case for a population mean and
(2) extreme values
• Available Software
2
Introduction
• The bootstrap is a general method for doing
statistical analysis without making strong
parametric assumptions.
• Efron’s nonparametric bootstrap resamples
the original data with replacement.
• It was originally designed to estimate bias and
standard errors for statistical estimates much
like the jackknife.
3
Introduction (continued)
• The bootstrap is similar to earlier
techniques which are also called
resampling methods:
– (1) jackknife,
– (2) cross-validation,
– (3) delta method,
– (4) permutation methods, and
– (5) subsampling.
4
Introduction (continued)
The technique was extended, modified and
refined to handle a wide variety of problems
including:
– (1) confidence intervals and hypothesis tests,
– (2) linear and nonlinear regression,
– (3) time series analysis and other problems
5
Introduction (continued)
Definition of Efron’s nonparametric bootstrap.
Given a sample of n independent identically
distributed (i.i.d.) observations X1, X2, …, Xn from
a distribution F and a parameter θ of the
distribution F with a real-valued estimator
θ̂(X1, X2, …, Xn), the bootstrap estimates the
accuracy of the estimator by replacing F with Fn,
the empirical distribution, where Fn places
probability mass 1/n at each observation Xi.
6
Introduction (continued)
• Let X1*, X2*, …, Xn* be a bootstrap sample, that
is, a sample of size n taken with replacement
from Fn.
• The bootstrap estimates the variance of
θ̂(X1, X2, …, Xn) by computing or
approximating the variance of
θ* = θ̂(X1*, X2*, …, Xn*).
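The Monte Carlo approximation described above can be sketched in a few lines. This is a minimal illustration, not code from the talk: the function name `bootstrap_variance`, the number of replications and the data values are all assumptions made for the example.

```python
import random
import statistics


def bootstrap_variance(data, estimator, n_boot=1000, seed=0):
    """Monte Carlo approximation to the bootstrap variance of estimator(data)."""
    rng = random.Random(seed)
    n = len(data)
    replicates = []
    for _ in range(n_boot):
        # Draw X1*, ..., Xn*: n values sampled with replacement from Fn.
        resample = rng.choices(data, k=n)
        replicates.append(estimator(resample))
    # The sample variance of the replicates approximates Var(theta-hat).
    return statistics.variance(replicates)


# Hypothetical data, for illustration only.
sample = [2.1, 3.4, 1.9, 5.6, 4.2, 3.8, 2.7, 6.1, 3.3, 4.9]
var_median = bootstrap_variance(sample, statistics.median)
var_mean = bootstrap_variance(sample, statistics.mean)
print(f"bootstrap variance of the median: {var_median:.4f}")
print(f"bootstrap variance of the mean:   {var_mean:.4f}")
```

For the sample mean the bootstrap variance is known in closed form (the plug-in variance divided by n), so the Monte Carlo value can be checked against it. For the median no such formula is available, which is where resampling earns its keep.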
7
Introduction (continued)
• Statistical Functionals - A functional is a
mapping that takes functions into real
numbers.
• Parameters of a distribution can usually be
expressed as functionals of the population
distribution.
• Often the standard estimate of a
parameter is the same functional applied
to the empirical distribution.
8
Introduction (continued)
• Statistical functionals and the bootstrap.
• A parameter θ is a functional T(F) where T
denotes the functional and F is a
population distribution.
• An estimator of θ is θ̂ = T(Fn) where Fn is
the empirical distribution function.
• Many statistical problems involve
properties of the distribution of θ − θ̂: its
mean (bias of θ̂), variance, median, etc.
9
Introduction (continued)
• Bootstrap idea: We cannot determine the distribution of
θ − θ̂, but through the bootstrap we can determine, or
approximate through Monte Carlo, the distribution of
θ̂ − θ*, where θ* = T(Fn*) and Fn* is the empirical
distribution for a bootstrap sample X1*, X2*, …, Xn*
(θ* is a bootstrap estimate of θ).
• Based on k bootstrap samples, the Monte Carlo
approximation to the distribution of θ̂ − θ* is used to
estimate bias, variance, etc. for θ̂.
• In bootstrapping, θ̂ substitutes for θ and θ* substitutes for
θ̂. This is called the bootstrap principle.
10
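As a rough sketch of the bootstrap principle at work, the snippet below estimates the bias of the plug-in variance T(Fn) by averaging T(Fn*) over k bootstrap samples and subtracting T(Fn). The function name `bootstrap_bias`, the value of k and the data are assumptions chosen for illustration.

```python
import random
import statistics


def bootstrap_bias(data, functional, k=2000, seed=0):
    """Estimate the bias of functional(data) by the bootstrap principle."""
    rng = random.Random(seed)
    n = len(data)
    theta_hat = functional(data)                                          # T(Fn)
    theta_stars = [functional(rng.choices(data, k=n)) for _ in range(k)]  # T(Fn*)
    # theta_hat plays the role of theta; theta* plays the role of theta_hat.
    return statistics.fmean(theta_stars) - theta_hat


# Hypothetical data, for illustration only.
sample = [2.1, 3.4, 1.9, 5.6, 4.2, 3.8, 2.7, 6.1, 3.3, 4.9]
theta_hat = statistics.pvariance(sample)       # plug-in variance (divisor n)
bias_hat = bootstrap_bias(sample, statistics.pvariance)
print(f"plug-in variance:        {theta_hat:.4f}")
print(f"bootstrap bias estimate: {bias_hat:.4f}")
print(f"exact bootstrap bias:    {-theta_hat / len(sample):.4f}")
```

The plug-in variance is a convenient test case because its exact bootstrap bias is −θ̂/n, so the Monte Carlo error of the approximation is visible directly.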
Introduction (continued)
• Basic Theory: Mathematical results show that
bootstrap estimates are consistent in particular
cases.
• Basic Idea: Empirical distributions behave in
large samples like population distributions.
The Glivenko-Cantelli theorem tells us this.
• A smoothness condition on the functional is
needed to transfer consistency to functionals of
Fn, such as the estimate of the parameter θ.
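A small simulation can make the Glivenko-Cantelli statement concrete. The sketch below assumes a Uniform(0, 1) population, so that F(x) = x, and prints the largest gap between Fn and F for growing n. The function name and the sample sizes are illustrative choices.

```python
import random


def sup_distance_uniform(n, rng):
    """sup over x of |Fn(x) - F(x)| for n draws from Uniform(0, 1), F(x) = x."""
    xs = sorted(rng.random() for _ in range(n))
    # The supremum is attained at a jump of Fn, just before or at an order statistic.
    return max(max((i + 1) / n - x, x - i / n) for i, x in enumerate(xs))


rng = random.Random(0)
for n in (10, 100, 1000, 10000):
    print(n, round(sup_distance_uniform(n, rng), 4))
```

The printed distances should shrink roughly like 1/√n, which is the sense in which Fn can stand in for F in large samples.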
11
Wide Variety of Applications
• Efron and others recognized that through
the power of fast computing the Monte
Carlo approximation could be used to
extend the bootstrap to many different
statistical problems.
12
Wide Variety of Applications
(continued)
• It can estimate process capability indices for
non-Gaussian data.
• It is used to adjust p-values in a variety of
multiple comparison situations.
• It can be extended to problems involving
dependent data including multivariate, spatial
and time series data and in sampling from
finite populations.
13
Wide Variety of Applications
(continued)
• It also has been applied to problems involving
missing data.
• In many cases, the theory justifying the use of
bootstrap (e.g. consistency theorems) has been
extended to these non i.i.d. settings.
• In other cases, the bootstrap has been modified
to “make it work.” The general case of
confidence interval estimation is a notable
example.
14
Confidence regions and hypothesis
tests
• The percentile method and other bootstrap
variations may require 1000 or more
bootstrap replications to be very useful.
• The percentile method only works under
special conditions.
• Bias correction and other adjustments are
sometimes needed to make the bootstrap
“accurate” and “correct” when the sample
size n is small or moderate.
15
Confidence regions and hypothesis
tests (continued)
• Confidence intervals are accurate or nearly
exact when the stated confidence level for the
intervals is approximately the long run
probability that the random interval contains the
“true” value of the parameter.
• Accurate confidence intervals are said to be
correct if they are approximately the shortest
length confidence intervals possible for the given
confidence level.
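To illustrate what "accurate" means here, the following sketch builds Efron's percentile interval and estimates its long-run coverage by simulation. The Exponential(1) population, the sample size n = 20 and the replication counts are assumptions chosen to keep the run short. They are not taken from the talk, and a serious study would use far more replications.

```python
import random
import statistics


def percentile_interval(data, estimator, level=0.90, n_boot=1000, rng=None):
    """Efron's percentile interval: empirical quantiles of the bootstrap replicates."""
    rng = rng or random.Random(0)
    n = len(data)
    reps = sorted(estimator(rng.choices(data, k=n)) for _ in range(n_boot))
    alpha = (1.0 - level) / 2.0
    lo = reps[int(alpha * n_boot)]
    hi = reps[min(n_boot - 1, int((1.0 - alpha) * n_boot))]
    return lo, hi


def coverage(n=20, level=0.90, n_trials=200, n_boot=300, seed=1):
    """Fraction of simulated samples whose interval contains the true mean."""
    rng = random.Random(seed)
    true_mean = 1.0                      # mean of the Exponential(1) population
    hits = 0
    for _ in range(n_trials):
        sample = [rng.expovariate(1.0) for _ in range(n)]
        lo, hi = percentile_interval(sample, statistics.fmean, level, n_boot, rng)
        hits += lo <= true_mean <= hi
    return hits / n_trials


cov = coverage()
print(f"estimated coverage of the nominal 90% interval: {cov:.3f}")
```

If the estimated coverage falls noticeably short of 0.90 for a skewed population and small n, that is the kind of inaccuracy that bias correction and the other adjustments are meant to reduce.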
16
Confidence regions and hypothesis
tests (continued)
• The BCa method, the iterated bootstrap (or
double bootstrap) and the bootstrap t
method are methods for constructing
bootstrap confidence intervals that are
closer to being exact (accurate) and
correct than the percentile method in many
circumstances.
• See Chernick (2007) pp. 57-65 for details
on these methods.
17
Confidence regions and hypothesis
tests (continued)
• Hall and Martin have derived the rates at which
the coverage of various bootstrap confidence
intervals approaches the advertised confidence
level as the size n of the original sample increases.
• They use Edgeworth and Cornish-Fisher
expansions to prove these results.
• See Hall (1992) Chapter 3 or Chernick (2007)
Section 3.1 for more discussion of this.
• See Ewens and Grant (2001) Chapter 12 for
another nice treatment and comparison with
permutation tests.
18
Four Methods for Setting Approximate
Confidence Intervals for a Real-Valued
Parameter θ
1. Standard normal approximation
   Abbreviation: θS[α]
   α-level endpoint: θ̂ + σ̂ z^(α)
   Correct if: θ̂ ~ N(θ, σ²) with σ constant
2. Percentile
   Abbreviation: θP[α]
   α-level endpoint: Ĝ⁻¹(α)
   Correct if: there exists a monotone transformation such that
   φ̂ = g(θ̂), where φ = g(θ), and φ̂ ~ N(φ, τ²) with τ constant
3. Bias-corrected
   Abbreviation: θBC[α]
   α-level endpoint: Ĝ⁻¹(Φ{2z0 + z^(α)})
   Correct if: there exists a monotone transformation such that
   φ̂ = g(θ̂), where φ = g(θ), and φ̂ ~ N(φ − z0τ, τ²) with τ
   and z0 constant
4. BCa
   Abbreviation: θBCa[α]
   α-level endpoint: Ĝ⁻¹(Φ{z0 + [z0 + z^(α)]/[1 − a(z0 + z^(α))]})
   Correct if: there exists a monotone transformation such that
   φ̂ = g(θ̂), where φ = g(θ), and φ̂ ~ N(φ − z0τφ, τφ²),
   where τφ = 1 + aφ and z0 and a are constant
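The bias-corrected and BCa endpoints can be sketched as follows. Here Ĝ is taken to be the empirical distribution of the bootstrap replicates and Φ the standard normal distribution function. The table does not say how z0 and a are obtained, so the estimates used below (z0 from the fraction of replicates below θ̂, a from a jackknife skewness formula) are assumptions borrowed from common practice. The function name and data are hypothetical.

```python
import random
import statistics
from statistics import NormalDist

STD_NORMAL = NormalDist()


def bca_endpoint(data, estimator, alpha, n_boot=2000, accel=None, seed=0):
    """alpha-level endpoint G^-1(Phi(z0 + (z0 + z_a) / (1 - a (z0 + z_a))))."""
    rng = random.Random(seed)
    n = len(data)
    theta_hat = estimator(data)
    reps = sorted(estimator(rng.choices(data, k=n)) for _ in range(n_boot))

    # Bias correction z0 = Phi^-1(G(theta_hat)); clamp to keep z0 finite.
    below = sum(r < theta_hat for r in reps)
    z0 = STD_NORMAL.inv_cdf(min(max(below / n_boot, 1 / n_boot), 1 - 1 / n_boot))

    if accel is None:
        # Jackknife estimate of the acceleration constant a (an assumption here).
        jack = [estimator(data[:i] + data[i + 1:]) for i in range(n)]
        mean_jack = statistics.fmean(jack)
        num = sum((mean_jack - j) ** 3 for j in jack)
        den = 6.0 * sum((mean_jack - j) ** 2 for j in jack) ** 1.5
        accel = num / den if den > 0 else 0.0

    z_alpha = STD_NORMAL.inv_cdf(alpha)
    adjusted = STD_NORMAL.cdf(z0 + (z0 + z_alpha) / (1.0 - accel * (z0 + z_alpha)))
    index = min(n_boot - 1, max(0, int(adjusted * n_boot)))
    return reps[index]


# Hypothetical right-skewed data, for illustration only.
sample = [0.3, 0.5, 0.8, 1.1, 1.2, 1.9, 2.4, 3.0, 4.7, 9.5]
for label, accel in (("BC", 0.0), ("BCa", None)):
    lo = bca_endpoint(sample, statistics.fmean, 0.05, accel=accel)
    hi = bca_endpoint(sample, statistics.fmean, 0.95, accel=accel)
    print(f"{label} 90% interval for the mean: ({lo:.2f}, {hi:.2f})")
```

Passing `accel=0.0` reduces the endpoint to the bias-corrected form Ĝ⁻¹(Φ{2z0 + z^(α)}), which shows how the BC method sits inside BCa as the special case a = 0.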
19
Hypothesis tests
• Since there is a 1-1 correspondence between
hypothesis tests and confidence intervals, a
hypothesis test about a parameter θ can be
constructed based on a bootstrap confidence
interval for θ.
• See Chernick (1999 or 2007) Section 3.2.
• Examples of hypothesis tests can be found in
Section 3.3 of Chernick (1999 or 2007).
• Advice on which method to use is also given in
Carpenter and Bithell (2000).
20
Examples of bootstrap applications
• P-value adjustment - a consulting example
• Many problems in the course of a clinical trial
involve multiple comparisons or repeated
significance tests for a key endpoint at various
follow-up times.
• In these cases, the unadjusted p-values of the
individual tests are not appropriate and p-value
adjustment is needed.
• Conservative estimates based on the Bonferroni
inequality are often used but sometimes may be
too conservative.
21
P-value Adjustment Application
• Westfall and Young (1993) have demonstrated useful
bootstrap and permutation approaches which work in a
wide variety of multiple testing situations.
• Their methods are implemented in the SAS software
package (Version 6.12 or higher) through a procedure
called PROC MULTTEST.
• Chernick has implemented this approach in a number of
clinical trials.
• As a consultant on a particular clinical trial he employed
p-value adjustment to determine if results differed
significantly depending on the country where the patient
was treated.
22
P-value Adjustment Application
(continued)
• This example is presented in Section 8.5.3 of
Chernick (2007).
• A company conducted a clinical trial for a
medical treatment in one country but due to
slow enrollment decided to extend the trial to
other countries.
• The initial country we denote as country E.
• The other four countries are labeled A, B, C
and D.
23
P-value Adjustment Application
(continued)
• Fisher’s exact test, the primary statistical
analysis of the endpoint, was used to compare
failure rates for the treatment with failure rates
for the control.
• In country E, the result showed that the
treatment was superior to the control, but this
was not the case in the other countries.
• The client wanted to show that there were
differences among countries which made the
poolability of the data questionable.
24
P-value Adjustment Application
(continued)
• They wanted to claim that only the data in
country E was relevant to the submission since
they were seeking regulatory approval only in
country E.
• This involved comparing treatment success in
each of the other countries with that in country E.
25
P-value Adjustment Application
(continued)
• There are 4 relevant pairwise
comparisons of other countries with
country E.
• Consequently, the raw p-values from the
individual Fisher tests are not
appropriate.
• The raw p-values were compared with
the Bonferroni adjustment and the
bootstrap adjustment.
26
P-value Adjustment Application
(continued)
TABLE 8.1 from Chernick (2007), page 152:
Comparison of Treatment Failure Rates
Country   Failure rate
A         40% (18/45)
B         41% (58/143)
C         29% (20/70)
D         29% (51/177)
E         22% (26/116)
TABLE 8.2 from Chernick (2007), page 153:
Comparison of p-value adjustments
Countries   Raw p    Bonf. p   Boot. p
E vs A      0.0307   0.1229    0.0855
E vs B      0.0021   0.0085    0.0062
E vs C      0.3826   1.0000    0.7654
E vs D      0.2776   1.0000    0.6193
27
P-value Adjustment Application
(continued)
• The raw p-values indicated that the failure
rate for E was statistically significantly
lower than the rates for A and B at the 5%
level.
• But these results are misleading since they
ignore the multiple testing.
• The Bonferroni bound shows only E and
B to be statistically significantly different
at the 10% level.
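The Bonferroni column can be checked with simple arithmetic: each raw p-value is multiplied by the number of comparisons (4) and capped at 1. The sketch below uses the raw p-values as printed in Table 8.2. Because those are rounded to four decimals, the products may differ from the tabulated Bonferroni values in the last digit.

```python
# Raw p-values as printed in Table 8.2 (rounded to four decimals).
raw_p = {"E vs A": 0.0307, "E vs B": 0.0021, "E vs C": 0.3826, "E vs D": 0.2776}


def bonferroni(p_values):
    """Multiply each p-value by the number of tests and cap the result at 1."""
    m = len(p_values)
    return {name: min(1.0, m * p) for name, p in p_values.items()}


adjusted = bonferroni(raw_p)
for name, p in adjusted.items():
    flag = "significant" if p < 0.10 else "not significant"
    print(f"{name}: raw {raw_p[name]:.4f} -> Bonferroni {p:.4f} ({flag} at 10%)")
```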
28
P-value Adjustment Application
(continued)
• But the Bonferroni bound is known to be
excessively conservative in many situations.
• Bootstrap provides an appropriate answer.
• For the bootstrap estimate we again find that E
and B are clearly different but now we find that
the p-value for E and A is below 0.10 and so E
is statistically significantly better than A at the
10% level.
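A resampling-based adjustment in the spirit of Westfall and Young can be sketched as below. This is a simplified single-step min-p version written for illustration, not the PROC MULTTEST algorithm. It assumes that the four comparisons are two-sided Fisher exact tests on the Table 8.1 counts and that resampling is done from the pooled outcomes of all five countries. With only 300 replications its output is noisy and need not reproduce the Boot. p column of Table 8.2.

```python
import math
import random

# Failures and sample sizes from Table 8.1.
COUNTS = {"A": (18, 45), "B": (58, 143), "C": (20, 70), "D": (51, 177), "E": (26, 116)}


def fisher_two_sided(f1, n1, f2, n2):
    """Two-sided Fisher exact p-value for f1 failures of n1 versus f2 of n2."""
    m = f1 + f2                                   # total failures (fixed margin)
    total = math.comb(n1 + n2, m)

    def pmf(k):
        return math.comb(n1, k) * math.comb(n2, m - k) / total

    observed = pmf(f1)
    support = range(max(0, m - n2), min(m, n1) + 1)
    # Sum the probabilities of all tables no more likely than the observed one.
    return min(1.0, sum(p for p in map(pmf, support) if p <= observed * (1 + 1e-9)))


def p_values_vs_e(counts):
    """Fisher p-values comparing each country with country E."""
    f_e, n_e = counts["E"]
    return {c: fisher_two_sided(f_e, n_e, f, n) for c, (f, n) in counts.items() if c != "E"}


def bootstrap_min_p(counts, n_boot=300, seed=0):
    """Single-step min-p adjustment, resampling from the pooled outcomes."""
    rng = random.Random(seed)
    raw = p_values_vs_e(counts)
    pooled = []
    for f, n in counts.values():
        pooled += [1] * f + [0] * (n - f)         # 1 = failure, 0 = success
    min_ps = []
    for _ in range(n_boot):
        # Under the complete null, every country draws from the same pooled data.
        star = {c: (sum(rng.choices(pooled, k=n)), n) for c, (_, n) in counts.items()}
        min_ps.append(min(p_values_vs_e(star).values()))
    adjusted = {c: sum(mp <= p for mp in min_ps) / n_boot for c, p in raw.items()}
    return raw, adjusted


raw, adjusted = bootstrap_min_p(COUNTS)
for country in raw:
    print(f"E vs {country}: raw p = {raw[country]:.4f}, adjusted p = {adjusted[country]:.4f}")
```

The adjusted p-value for a comparison is the fraction of resampled data sets whose smallest p-value is at least as extreme as the observed one. Because it uses the joint behavior of the four tests, it can be less conservative than the Bonferroni bound.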
29
Individual Bioequivalence
• The FDA has a Guidance document on how to
conduct bioequivalence (bioavailability) trials.
• Three types of bioequivalence have been
defined (1) average bioequivalence, (2)
population bioequivalence and (3) individual
bioequivalence.
• Currently the FDA only requires average
bioequivalence be shown (a change over past
policy).
• Bootstrap solutions useful in determining
individual bioequivalence and population
bioequivalence have been devised and shown to
be consistent.
30
Individual Bioequivalence: Model
In the model, we consider crossing over twice, with
the sequence TRR (new treatment first, then the
reference treatment twice) and the sequence RTR (reference
first, followed by the new treatment and then the reference again).
Consider the following model for pharmacokinetic
response in a two-treatment crossover design using
only sequences RTR and TRR randomized 1:1:
Yijkl = μ + Fl + Pj + Qk + Wijk + Sikl + εijkl,
where μ is the overall mean, Pj is the fixed effect for the jth
period with the constraint ∑ Pj = 0, Qk is the fixed effect for
the kth sequence with ∑ Qk = 0, and Fl is the fixed effect for the
lth drug.
31
Individual Bioequivalence: Model
(continued)
For these trials we have only two drugs, the new and old
formulations, denoted T for the new treatment and R for the
reference formulation. We also have the constraint
FT + FR = 0. Sikl is a random effect of the ith subject in
the kth sequence with the lth treatment, Wijk is the fixed
interaction between treatment, sequence and period, and
εijkl is a random noise (error) component with mean 0,
independent and identically distributed and independent of
all the fixed and random effects.
32
Individual Bioequivalence:
Definition
Under the linear model given on the previous
slides, individual bioequivalence is accepted if
H0: ΔPB ≤ Δ is rejected in favor of H1: ΔPB > Δ,
where ΔPB = PTR – PRR with
PTR = prob(|YT – YR| ≤ r) and
PRR = prob(|YR – Y’R| ≤ r), where Δ and r are
predetermined fixed constants and Y’R is the observed
response the second time the reference treatment is
given.
33
Bootstrap Results for this Trial
• See Schall and Luus (1993) for a description of
a bootstrap hypothesis test for this problem.
• Pigeot (2001) in a survey article describes the
Schall and Luus method in detail, shows that
their method is not consistent and modifies it by
constructing a bootstrap percentile method
confidence interval to use in the test.
• In an earlier work Shao, Kübler and Pigeot
(2000) prove that the bootstrap method Pigeot
describes in Pigeot (2001) is consistent.
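The idea of testing with a bootstrap percentile bound can be sketched as follows. This is a deliberately simplified illustration, not the procedure of Schall and Luus (1993) or Pigeot (2001): it resamples whole subjects, ignores sequence and period effects, and uses simulated data. The constants r and Δ, the function names and the data-generating model are all hypothetical.

```python
import random


def delta_pb(subjects, r):
    """Plug-in estimate of P_TR - P_RR from (y_T, y_R, y_R') triples."""
    n = len(subjects)
    p_tr = sum(abs(y_t - y_r) <= r for y_t, y_r, _ in subjects) / n
    p_rr = sum(abs(y_r - y_r2) <= r for _, y_r, y_r2 in subjects) / n
    return p_tr - p_rr


def lower_percentile_bound(subjects, r, alpha=0.05, n_boot=2000, seed=0):
    """Lower alpha-level percentile bound for Delta_PB, resampling subjects."""
    rng = random.Random(seed)
    n = len(subjects)
    reps = sorted(delta_pb(rng.choices(subjects, k=n), r) for _ in range(n_boot))
    return reps[int(alpha * n_boot)]


# Simulated, purely hypothetical log-responses for 40 subjects.
rng = random.Random(42)
subjects = []
for _ in range(40):
    level = rng.gauss(4.0, 0.3)                  # subject-specific level
    subjects.append(tuple(level + rng.gauss(0.0, 0.15) for _ in range(3)))

r, delta = 0.35, -0.20                           # hypothetical constants
estimate = delta_pb(subjects, r)
bound = lower_percentile_bound(subjects, r)
print(f"estimated Delta_PB = {estimate:.3f}, lower 5% bound = {bound:.3f}")
print("H0 rejected (individual bioequivalence accepted):", bound > delta)
```

H0: ΔPB ≤ Δ is rejected when the lower bound of the interval lies above Δ, which is how a confidence interval is turned into the one-sided test.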
34
Examples where the bootstrap fails
• Athreya (1987) shows that the bootstrap
estimate of the sample mean is
inconsistent when the population
distribution has an infinite variance.
• Angus (1993) provides similar
inconsistency results for the maximum
and minimum of a sequence of
independent identically distributed
observations.
35
Examples where the bootstrap fails
(continued)
We shall describe the inconsistency of the bootstrap in
these two cases and then provide remedies:
(1) sample mean with infinite population variance,
and
(2) maximum term in an i.i.d. sequence of observations.
36
Example where the bootstrap fails - Sample
Mean with Infinite Population Variance
• Singh (1981) and Bickel and Freedman (1981)
showed that in the case of estimating the mean
from an i.i.d. sample with a finite population
variance the bootstrap procedure is consistent.
• In the case of an infinite variance, the population
distribution F(x) might satisfy
1 − F(x) ~ c x^(−α) L(x) as x → ∞, where L is a slowly
varying function, c is a nonnegative constant and
0 < α ≤ 2.
• Under these conditions, the sample mean,
appropriately normalized, converges to a stable
distribution.
37
Example where the bootstrap fails - Sample
Mean with Infinite Population Variance
(continued)
• For α = 2 the variance of F is finite and the central
limit theorem applies. For α < 2 the population variance is
infinite.
• Theorem 1 of Athreya (1987) proves the
inconsistency of the bootstrap for the case where
1 < α < 2.
• The result tells us that when we appropriately
normalize the sample mean and apply the bootstrap
substitutions the bootstrap version of the normalized
mean converges to a random probability
distribution and not to the corresponding fixed
stable distribution that the sample mean
converges to.
38
Example where the bootstrap fails -
Estimating extreme values
• For i.i.d. random variables Gnedenko’s
theorem usually applies to the maximum or
minimum values.
• Gnedenko’s theorem states that when
appropriately normalized the minimum value
and the maximum value converge to one of
three extreme value distribution families.
• The appropriate family depends on the tail
behavior of the population distribution.
39
Example where the bootstrap fails -
Estimating extreme values (continued)
• Angus (1993) showed that using the
appropriate normalization and the bootstrap
substitution, the maximum and minimum
converge to a random probability
distribution and not the fixed extreme value
distribution from Gnedenko’s theorem that the
sample extremes converge to.
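One way to see the failure for the maximum: with continuous data, a bootstrap sample of size n contains the largest observation with probability 1 − (1 − 1/n)^n, which tends to 1 − 1/e ≈ 0.632. The bootstrap distribution of the maximum therefore keeps a large point mass at the sample maximum, whereas the limiting extreme value distribution is continuous. The sketch below checks this by simulation. The Uniform(0, 1) data, the function name and the replication count are illustrative assumptions.

```python
import math
import random


def prob_max_repeated(n, n_boot=2000, seed=0):
    """Monte Carlo estimate of P*(bootstrap sample max == original sample max)."""
    rng = random.Random(seed)
    data = [rng.random() for _ in range(n)]      # continuous data, so no ties
    top = max(data)
    hits = sum(max(rng.choices(data, k=n)) == top for _ in range(n_boot))
    return hits / n_boot


for n in (10, 100, 1000):
    exact = 1.0 - (1.0 - 1.0 / n) ** n
    print(n, round(prob_max_repeated(n), 3), round(exact, 3))
print("limit 1 - 1/e =", round(1.0 - math.exp(-1.0), 3))
```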
40
Bootstrap Remedies
• In the past decade, for many of the problems
where the bootstrap is inconsistent,
researchers have found remedies
that give good modified bootstrap solutions
that are consistent.
• For both problems described thus far, a
simple procedure called the m-out-of-n
bootstrap has been shown to lead to
consistent estimates.
41
The m-out-of-n Bootstrap
• This idea was proposed by Bickel and Ren (1996) for
handling doubly censored data.
• Instead of sampling n times with replacement from a
sample of size n, they suggest sampling only m times,
where m is much less than n.
• To get the consistency results both m and n need to get
large but at different rates. We need m=o(n). That is
m/n→0 as m and n both → ∞.
• This method leads to consistent bootstrap estimates in
many cases where the ordinary bootstrap has problems,
particularly (1) mean with infinite variance and (2)
extreme value distributions.
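A minimal sketch of m-out-of-n resampling applied to the maximum is given below. The choice m = √n is just one convenient rate with m/n → 0 and is an assumption of this example, as are the Uniform(0, 1) data and the function name. With resamples of size m the probability of redrawing the sample maximum is 1 − (1 − 1/n)^m, which goes to 0, so the point mass that spoils the ordinary bootstrap disappears.

```python
import math
import random


def m_out_of_n_replicates(data, statistic, m, n_boot=2000, seed=0):
    """Bootstrap replicates of a statistic computed on resamples of size m < n."""
    rng = random.Random(seed)
    return [statistic(rng.choices(data, k=m)) for _ in range(n_boot)]


rng = random.Random(1)
n = 2500
data = [rng.random() for _ in range(n)]          # Uniform(0, 1) sample
m = int(math.sqrt(n))                            # one choice with m/n -> 0
top = max(data)
reps = m_out_of_n_replicates(data, max, m)
atom = sum(rep == top for rep in reps) / len(reps)
print(f"P*(resample max == sample max) with m = {m}: {atom:.3f}")
print(f"exact value 1 - (1 - 1/n)^m: {1 - (1 - 1 / n) ** m:.3f}")
```

In practice the choice of m matters a great deal and is a research topic in its own right. The square-root rule here is only for demonstration.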
42
Available Software
• Resampling Stats from Resampling Stats Inc.
(provides basic bootstrap tools in easy to use
software and is good as an elementary teaching tool).
• S-PLUS from Insightful Corporation (good for
advanced bootstrap techniques such as BCa, easy to
use in the new Windows-based version). The current
module Resample is what I use in my bootstrap class
at statistics.com.
• S functions provided by Tibshirani (see the Appendix in
the Efron and Tibshirani text or visit Rob Tibshirani’s web
site http://www-stat.stanford.edu/~tibs).
43
Available Software (continued)
• Stata has a bootstrap algorithm available that
some users rave about.
• Mathworks and other examples (see Susan
Holmes’ web page,
http://www-stat.stanford.edu/~susan, or contact her
by email).
• SAS macros are available and PROC MULTTEST
does bootstrap sampling.
44
References on confidence intervals
and hypothesis tests
(1) Chernick, M.R. (1999). Bootstrap Methods: A
Practitioner’s Guide. Wiley, New York.
(2) Chernick, M.R. (2007). Bootstrap Methods: A
Guide for Practitioners and Researchers, 2nd
Edition. Wiley, New York.
(3) Hall, P. (1992). The Bootstrap and
Edgeworth Expansion. Springer-Verlag, New
York.
(4) Efron, B. (1982) The Jackknife, the Bootstrap
and Other Resampling Plans. Society for
Industrial and Applied Mathematics CBMS-NSF
Regional Conference Series 38, Philadelphia.
45
References on confidence intervals and
hypothesis tests (continued)
(5) Carpenter, J. and Bithell, J. (2000).
Bootstrap confidence intervals: when, which,
what? A practical guide for medical statisticians.
Statistics in Medicine 19, 1141-1164.
(6) Bahadur, R.R. and Savage, L.J. (1956). The
nonexistence of certain statistical procedures in
nonparametric problems. Annals of
Mathematical Statistics 27, 1115-1122.
(7) Ewens, W.J. and Grant, G.R. (2001).
Statistical Methods in Bioinformatics: An
Introduction.
46
References on p-value adjustment
(1) Chernick, M.R. (2007). Bootstrap Methods: A
Guide for Practitioners and Researchers, 2nd
Edition. Wiley, New York.
(2) Westfall, P. and Young, S. S. (1993).
Resampling-Based Multiple Testing: Examples
of p-Value Adjustment. Wiley, New York.
47
References for Individual Bioequivalence
(1) Chernick, M.R. (2007). Bootstrap Methods: A
Guide for Practitioners and Researchers, 2nd
Edition. Wiley, New York.
(2) Pigeot, I. (2001). The jackknife and bootstrap in
biomedical research – Common principles and
possible pitfalls. Drug Information J. 35, 1431-1443.
(3) Schall, R. and Luus, H.G. (1993). On
population and individual bioequivalence. Statist.
Med. 12, 1109-1124.
(4) Shao, J., Kübler, J. and Pigeot, I. (2000).
Consistency of the bootstrap procedure in
individual bioequivalence. Biometrika 87, 573-585.
48
References on when bootstrap fails
(1) Angus, J. E. (1993). Asymptotic theory for
bootstrapping the extremes. Communs. Statist.
Theory and Methods 22, 15-30.
(2) Athreya, K. B. (1987). Bootstrap estimation
of the mean in the infinite variance case. Ann.
Statist. 15, 724-731.
(3) Bickel, P. J. and Freedman, D. A. (1981).
Some asymptotic theory for the bootstrap. Ann.
Statist. 9, 1196-1217.
49
References on when bootstrap fails
(continued)
(4) Chernick, M.R. (1999). Bootstrap Methods: A
Practitioner’s Guide. Wiley, New York.
(5) Chernick, M.R. (2007). Bootstrap Methods: A
Guide for Practitioners and Researchers, 2nd
Edition. Wiley, New York.
(6) Cochran, W. (1977). Sampling Techniques.
3rd ed., Wiley, New York
50
References on when bootstrap fails
(continued)
(7) Knight, K. (1989). On the bootstrap of the
sample mean in the infinite variance case. Ann.
Statist. 17, 1168-1175.
(8) LePage, R. and Billard, L. (editors). (1992).
Exploring the Limits of Bootstrap. Wiley, New
York.
(9) Mammen, E. (1992). When Does the
Bootstrap Work? Asymptotic Results and
Simulations. Springer-Verlag, Heidelberg.
(10) Singh, K. (1981). On the asymptotic
accuracy of Efron’s bootstrap. Ann. Statist. 9,
1187-1195.
51

More Related Content

PPTX
Hypothesis testing
PPT
chapter12.ppt
PPTX
PPTX
pratik meshram-Unit 4 contemporary marketing research full notes pune univers...
PDF
Data Analytics Tools presentation having different DA tools
PPTX
UNIT 4 PPT.pptx
PPTX
Hypothesis Testing
DOCX
Hypothesis testing
chapter12.ppt
pratik meshram-Unit 4 contemporary marketing research full notes pune univers...
Data Analytics Tools presentation having different DA tools
UNIT 4 PPT.pptx
Hypothesis Testing

Similar to Chernick.Michael (1).ppt (20)

PPT
chapter18.ppt
PPT
chi sqare test.ppt
PPT
chapter18.ppt
PPT
chapter18.ppt
PPT
Spsshelp 100608163328-phpapp01
PPT
CFA Fit Statistics
PPTX
linearity concept of significance, standard deviation, chi square test, stude...
PPTX
DWM- CO2_WAREHOUSE_MINING [Autosaved].pptx
PDF
Are we really including all relevant evidence
PPTX
Chi Squ.pptx.statisticcs.109876543210987
PPT
Diagnostic Tests.ppt
PPT
Validity andreliability
PDF
STAT 778 Project Proposal - Jonathan Poon
PPTX
Presentation1
PDF
Principles of Diagnostic Testing and ROC 2016
PPTX
Basics of Hypothesis Testing
PPT
Intro to ecm models and cointegration.ppt
PPTX
chi_square test.pptx
PPTX
STATISTIC ESTIMATION
chapter18.ppt
chi sqare test.ppt
chapter18.ppt
chapter18.ppt
Spsshelp 100608163328-phpapp01
CFA Fit Statistics
linearity concept of significance, standard deviation, chi square test, stude...
DWM- CO2_WAREHOUSE_MINING [Autosaved].pptx
Are we really including all relevant evidence
Chi Squ.pptx.statisticcs.109876543210987
Diagnostic Tests.ppt
Validity andreliability
STAT 778 Project Proposal - Jonathan Poon
Presentation1
Principles of Diagnostic Testing and ROC 2016
Basics of Hypothesis Testing
Intro to ecm models and cointegration.ppt
chi_square test.pptx
STATISTIC ESTIMATION
Ad

More from alizain9604 (20)

PPT
Cell Communication4.ppt123457899987523412
PPTX
proteinfolding-170226165229.pptx12345747
PPT
13-miller-chap-7a-lecture.ppt1234578904578
PPT
13-miller-chap-15-lecture (2).ppt1234578
PPT
signalling.ppt12345789009875431234578754345
PPT
Regulation of gene expression.ppt234578w3e45
PPT
1589353475-fermentation.ppt12345789934578
PPT
celltocellcommunication-101021235148-phpapp01.ppt
PPTX
bicatalysispresentation1-211210145704.pptx
PPT
13-miller-chap-15-lecture (1).ppt23457834
PPT
signalling (1).ppt12345777788888885555554
PPTX
BLAST AND FASTA.pptx12345789999987544321234
PPTX
transcriptionfactor-180830142612345.pptx
PPT
Ch15 Cell Signaling and Communication.ppt
PPT
13-miller-chap-15-lecture.ppt1234578900000000000000009875444333333333333333332
PPTX
1ystr-211201195417.pptx212345783457890345
PPTX
mixed Sample and LCN.pptx2w345789o345789o
PPT
Y_Workshop_WI_planz (3).ppt12345789999987543
PPTX
Production of Vaccine.pptx123457888889999990000
PPT
Bill Holmberg.ppt1234578999999234578912345
Cell Communication4.ppt123457899987523412
proteinfolding-170226165229.pptx12345747
13-miller-chap-7a-lecture.ppt1234578904578
13-miller-chap-15-lecture (2).ppt1234578
signalling.ppt12345789009875431234578754345
Regulation of gene expression.ppt234578w3e45
1589353475-fermentation.ppt12345789934578
celltocellcommunication-101021235148-phpapp01.ppt
bicatalysispresentation1-211210145704.pptx
13-miller-chap-15-lecture (1).ppt23457834
signalling (1).ppt12345777788888885555554
BLAST AND FASTA.pptx12345789999987544321234
transcriptionfactor-180830142612345.pptx
Ch15 Cell Signaling and Communication.ppt
13-miller-chap-15-lecture.ppt1234578900000000000000009875444333333333333333332
1ystr-211201195417.pptx212345783457890345
mixed Sample and LCN.pptx2w345789o345789o
Y_Workshop_WI_planz (3).ppt12345789999987543
Production of Vaccine.pptx123457888889999990000
Bill Holmberg.ppt1234578999999234578912345
Ad

Recently uploaded (20)

PPTX
Introduction to cybersecurity and digital nettiquette
PPTX
IPCNA VIRTUAL CLASSES INTERMEDIATE 6 PROJECT.pptx
PPTX
June-4-Sermon-Powerpoint.pptx USE THIS FOR YOUR MOTIVATION
PDF
si manuel quezon at mga nagawa sa bansang pilipinas
PPTX
E -tech empowerment technologies PowerPoint
PPT
Design_with_Watersergyerge45hrbgre4top (1).ppt
PPT
Ethics in Information System - Management Information System
PPT
isotopes_sddsadsaadasdasdasdasdsa1213.ppt
PPT
250152213-Excitation-SystemWERRT (1).ppt
PPTX
SAP Ariba Sourcing PPT for learning material
PPTX
Mathew Digital SEO Checklist Guidlines 2025
DOC
Rose毕业证学历认证,利物浦约翰摩尔斯大学毕业证国外本科毕业证
PPTX
Funds Management Learning Material for Beg
PPTX
newyork.pptxirantrafgshenepalchinachinane
PPTX
artificialintelligenceai1-copy-210604123353.pptx
PDF
SlidesGDGoCxRAIS about Google Dialogflow and NotebookLM.pdf
PPTX
Layers_of_the_Earth_Grade7.pptx class by
PDF
SASE Traffic Flow - ZTNA Connector-1.pdf
PDF
Introduction to the IoT system, how the IoT system works
PPTX
t_and_OpenAI_Combined_two_pressentations
Introduction to cybersecurity and digital nettiquette
IPCNA VIRTUAL CLASSES INTERMEDIATE 6 PROJECT.pptx
June-4-Sermon-Powerpoint.pptx USE THIS FOR YOUR MOTIVATION
si manuel quezon at mga nagawa sa bansang pilipinas
E -tech empowerment technologies PowerPoint
Design_with_Watersergyerge45hrbgre4top (1).ppt
Ethics in Information System - Management Information System
isotopes_sddsadsaadasdasdasdasdsa1213.ppt
250152213-Excitation-SystemWERRT (1).ppt
SAP Ariba Sourcing PPT for learning material
Mathew Digital SEO Checklist Guidlines 2025
Rose毕业证学历认证,利物浦约翰摩尔斯大学毕业证国外本科毕业证
Funds Management Learning Material for Beg
newyork.pptxirantrafgshenepalchinachinane
artificialintelligenceai1-copy-210604123353.pptx
SlidesGDGoCxRAIS about Google Dialogflow and NotebookLM.pdf
Layers_of_the_Earth_Grade7.pptx class by
SASE Traffic Flow - ZTNA Connector-1.pdf
Introduction to the IoT system, how the IoT system works
t_and_OpenAI_Combined_two_pressentations

Chernick.Michael (1).ppt

  • 1. Bootstrap Methods: Recent Advances and New Applications 2007 Nonparametrics Conference Michael Chernick United BioSource Corporation October 11, 2007 1
  • 2. Bootstrap Topics • Introduction to Bootstrap • Wide Variety of Applications • Confidence regions and hypothesis tests • Examples of bootstrap applications: (1) P-value adjustment - consulting example, (2) Bioequivalence - Individual Bioequivalence • Examples where bootstrap is not consistent and remedies: (1) infinite variance case for a population mean and (2) extreme values • Available Software 2
  • 3. Introduction • The bootstrap is a general method for doing statistical analysis without making strong parametric assumptions. • Efron’s nonparametric bootstrap, resamples the original data. • It was originally designed to estimate bias and standard errors for statistical estimates much like the jackknife. 3
  • 4. Introduction (continued) • The bootstrap is similar to earlier techniques which are also called resampling methods: – (1) jackknife, – (2) cross-validation, – (3) delta method, – (4) permutation methods, and – (5) subsampling.. 4
  • 5. Introduction (continued) The technique was extended, modified and refined to handle a wide variety of problems including: – (1) confidence intervals and hypothesis tests, – (2) linear and nonlinear regression, – (3) time series analysis and other problems 5
  • 6. Introduction (continued) Definition of Efron’s nonparametric bootstrap. Given a sample of n independent identically distributed (i.i.d.) observations X1, X2, …, Xn from a distribution F and a parameter  of the distribution F with a real valued estimator (X1, X2, …, Xn ), the bootstrap estimates the accuracy of the estimator by replacing F with Fn, the empirical distribution, where Fn places probability mass 1/n at each observation Xi. 6
  • 7. Introduction (continued) • Let X1 *, X2 *, …, Xn * be a bootstrap sample, that is a sample of size n taken with replacement from Fn . • The bootstrap, estimates the variance of (X1, X2, …, Xn ) by computing or approximating the variance of * = (X1 *, X2 *, …, Xn * ). 7
  • 8. Introduction (continued) • Statistical Functionals - A functional is a mapping that takes functions into real numbers. • Parameters of a distribution can usually be expressed as functionals of the population distribution. • Often the standard estimate of a parameter is the same functional applied to the empirical distribution. 8
  • 9. Introduction (continued) • Statistical Functionals and the bootstrap. • A parameter  is a functional T(F) where T denotes the functional and F is a population distribution. • An estimator of  is h = T(Fn) where Fn is the empirical distribution function. • Many statistical problems involve properties of the distribution of  - h , its mean (bias of h ), variance, median etc. 9
  • 10. Introduction (continued) • Bootstrap idea: Cannot determine the distribution of  - h but through the bootstrap we can determine, or approximate through Monte Carlo, the distribution of h - *, where * = T(Fn *) and Fn * is the empirical distribution for a bootstrap sample X1 *, X2 *,…,Xn * (* is a bootstrap estimate of ). • Based on k bootstrap samples the Monte Carlo approximation to the distribution of h - * is used to estimate bias, variance etc. for h . • In bootstrapping h substitutes for  and * substitutes for h . Called the bootstrap principle. 10
  • 11. Introduction (continued) • Basic Theory: Mathematical results show that bootstrap estimates are consistent in particular cases. • Basic Idea: Empirical distributions behave in large samples like population distributions. Glivenko-Cantelli Theorem tells us this. • The smoothness condition is needed to transfer consistency to functionals of Fn, such as the estimate of the parameter . 11
  • 12. Wide Variety of Applications • Efron and others recognized that through the power of fast computing the Monte Carlo approximation could be used to extend the bootstrap to many different statistical problems . 12
  • 13. Wide Variety of Applications (continued) • It can estimate process capability indices for non-Gaussian data. • It is used to adjust p-values in a variety of multiple comparison situations. • It can be extended to problems involving dependent data including multivariate, spatial and time series data and in sampling from finite populations. 13
  • 14. Wide Variety of Applications (continued) • It also has been applied to problems involving missing data. • In many cases, the theory justifying the use of bootstrap (e.g. consistency theorems) has been extended to these non i.i.d. settings. • In other cases, the bootstrap has been modified to “make it work.” The general case of confidence interval estimation is a notable example. 14
  • 15. Confidence regions and hypothesis tests • The percentile method and other bootstrap variations may require 1000 or more bootstrap replications to be very useful. • The percentile method only works under special conditions. • Bias correction and other adjustments are sometimes needed to make the bootstrap “accurate” and “correct” when the sample size n is small or moderate. 15
  • 16. Confidence regions and hypothesis tests (continued) • Confidence intervals are accurate or nearly exact when the stated confidence level for the intervals is approximately the long run probability that the random interval contains the “true” value of the parameter. • Accurate confidence intervals are said to be correct if they are approximately the shortest length confidence intervals possible for the given confidence level. 16
  • 17. Confidence regions and hypothesis tests (continued) • The BCa method, the iterated bootstrap (or double bootstrap) and the bootstrap t method are methods for constructing bootstrap confidence intervals that are closer to being exact (accurate) and correct than the percentile method in many circumstances. • See Chernick (2007) pp. 57-65 for details on these methods. 17
  • 18. Confidence regions and hypothesis tests (continued) • Hall and Martin have shown the rate at which the coverage of various bootstrap confidence intervals approaches the advertised confidence level as the size n of the original sample increases. • They use Edgeworth and Cornish-Fisher expansions to prove these results. • See Hall (1992) Chapter 3 or Chernick (2007) Section 3.1 for more discussion of this. • See Ewens and Grant (2001) Chapter 12 for another nice treatment and a comparison with permutation tests.
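  • 19. Four Methods for Setting Approximate Confidence Intervals for a Real-Valued Parameter $\theta$
    Each entry gives: method (abbreviation); $\alpha$-level endpoint; condition under which the method is correct.
    1. Standard normal approximation (S): $\theta_S[\alpha] = \hat\theta + \hat\sigma z^{(\alpha)}$; correct if $\hat\theta \sim N(\theta, \sigma^2)$ with $\sigma$ constant.
    2. Percentile (P): $\theta_P[\alpha] = \hat G^{-1}(\alpha)$; correct if there exists a monotone transformation $\hat\phi = g(\hat\theta)$, where $\phi = g(\theta)$, such that $\hat\phi \sim N(\phi, \tau^2)$ and $\tau$ is constant.
    3. Bias-corrected (BC): $\theta_{BC}[\alpha] = \hat G^{-1}(\Phi\{2 z_0 + z^{(\alpha)}\})$; correct if there exists a monotone transformation $\hat\phi = g(\hat\theta)$, where $\phi = g(\theta)$, such that $\hat\phi \sim N(\phi - z_0 \tau, \tau^2)$ and $\tau$ and $z_0$ are constant.
    4. BCa (BCa): $\theta_{BCa}[\alpha] = \hat G^{-1}(\Phi\{z_0 + [z_0 + z^{(\alpha)}]/[1 - a(z_0 + z^{(\alpha)})]\})$; correct if there exists a monotone transformation $\hat\phi = g(\hat\theta)$, where $\phi = g(\theta)$, such that $\hat\phi \sim N(\phi - z_0 \tau_\phi, \tau_\phi^2)$, where $\tau_\phi = 1 + a\phi$ and $z_0$ and $a$ are constant.
    Here $\hat G$ is the bootstrap distribution function of $\theta^*$, $\Phi$ is the standard normal distribution function and $z^{(\alpha)} = \Phi^{-1}(\alpha)$.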
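The percentile and bias-corrected (BC) endpoints from the table can be sketched as follows. This is an illustration under stated assumptions rather than a reference implementation: the helper names, the simple index rule used for the empirical quantile, the clamping of the fraction used for $z_0$, and the sample data are all choices made for this example. The BCa endpoint is not implemented because it also needs an estimate of the acceleration constant a.

```python
import random
from statistics import NormalDist, fmean

def bootstrap_replicates(data, estimator, k_reps=4000, seed=1):
    """Sorted bootstrap replicates theta* of the estimator."""
    rng = random.Random(seed)
    n = len(data)
    return sorted(estimator(rng.choices(data, k=n)) for _ in range(k_reps))

def quantile(sorted_reps, p):
    """Empirical inverse CDF of the replicates (simple index rule, an assumption)."""
    idx = min(len(sorted_reps) - 1, max(0, int(p * len(sorted_reps))))
    return sorted_reps[idx]

def percentile_and_bc(data, estimator, alpha=0.05, k_reps=4000, seed=1):
    reps = bootstrap_replicates(data, estimator, k_reps, seed)
    theta_hat = estimator(data)
    std = NormalDist()  # standard normal: cdf is Phi, inv_cdf is Phi^{-1}
    lo, hi = alpha / 2, 1 - alpha / 2
    percentile = (quantile(reps, lo), quantile(reps, hi))
    # z0 = Phi^{-1}(fraction of replicates below the original estimate)
    frac = sum(r < theta_hat for r in reps) / len(reps)
    frac = min(max(frac, 1 / len(reps)), 1 - 1 / len(reps))  # keep inv_cdf finite
    z0 = std.inv_cdf(frac)
    # BC endpoint: G^{-1}(Phi(2*z0 + z_alpha))
    bc = tuple(quantile(reps, std.cdf(2 * z0 + std.inv_cdf(p))) for p in (lo, hi))
    return percentile, bc

# Hypothetical data, used only to exercise the functions.
sample = [2.1, 3.4, 1.9, 5.6, 4.4, 3.8, 2.7, 6.1, 3.3, 4.9, 7.8, 2.2]
percentile, bc = percentile_and_bc(sample, fmean)
print("percentile interval:", percentile)
print("bias-corrected interval:", bc)
```

When half of the replicates fall below the original estimate, $z_0 = 0$ and the BC interval reduces to the percentile interval.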
  • 20. Hypothesis tests • Since there is a 1-1 correspondence between hypothesis tests and confidence intervals, a hypothesis test about a parameter $\theta$ can be constructed from a bootstrap confidence interval for $\theta$. • See Chernick (1999 or 2007) Section 3.2. • Examples of hypothesis tests can be found in Section 3.3 of Chernick (1999 or 2007). • Advice on which method to use is also given in Carpenter and Bithell (2000).
  • 21. Examples of bootstrap applications • P-value adjustment - a consulting example • Many problems in the course of a clinical trial involve multiple comparisons or repeated significance tests for a key endpoint at various follow-up times. • In these cases, the individual test p-values are not appropriate and must be adjusted for multiplicity. • Conservative estimates based on the Bonferroni inequality are often used but may sometimes be too conservative.
  • 22. P-value Adjustment Application • Westfall and Young (1993) have demonstrated useful bootstrap and permutation approaches which work in a wide variety of multiple testing situations. • Their methods are implemented in the SAS software package (Version 6.12 or higher) through a procedure called PROC MULTTEST. • Chernick has implemented this approach in a number of clinical trials. • As a consultant on a particular clinical trial he employed p-value adjustment to determine if results differed significantly depending on the country where the patient was treated. 22
  • 23. P-value Adjustment Application (continued) • This example is presented in Section 8.5.3 of Chernick (2007). • A company conducted a clinical trial for a medical treatment in one country but due to slow enrollment decided to extend the trial to other countries. • The initial country we denote as country E. • The other four countries are labeled A, B, C and D. 23
  • 24. P-value Adjustment Application (continued) • Fisher’s exact test was used to compare failure rates for the treatment with failure rates for the control. This was the primary statistical analysis of the endpoint. • In country E, the result showed that the treatment was superior to the control, but this was not the case in the other countries. • The client wanted to show that there were differences among countries which made the poolability of the data questionable.
  • 25. P-value Adjustment Application (continued) • They wanted to claim that only the data from country E were relevant to the submission since they were seeking regulatory approval only in country E. • This involved comparing treatment success in each of the other countries with that in country E.
  • 26. P-value Adjustment Application (continued) • There are 4 relevant pairwise comparisons of other countries with country E. • Consequently, the raw p-values from the individual Fisher tests are not appropriate. • The raw p-values were compared with the Bonferroni adjustment and the bootstrap adjustment. 26
  • 27. P-value Adjustment Application (continued)
    TABLE 8.1 from Chernick (2007) page 152: Comparison of Treatment Failure Rates
    Country   Failure rate
    A         40% (18/45)
    B         41% (58/143)
    C         29% (20/70)
    D         29% (51/177)
    E         22% (26/116)
    TABLE 8.2 from Chernick (2007) page 153: Comparison of p-value adjustments
    Countries   Raw p    Bonf. p   Boot. p
    E vs A      0.0307   0.1229    0.0855
    E vs B      0.0021   0.0085    0.0062
    E vs C      0.3826   1.0000    0.7654
    E vs D      0.2776   1.0000    0.6193
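As a rough check on the raw p-values, the pairwise Fisher exact tests can be recomputed from the failure counts in Table 8.1. This is a sketch under assumptions: it uses the common two-sided convention of summing all tables no more likely than the observed one, which may not be the convention behind Table 8.2, so small differences from the published values would not be surprising. The function name is hypothetical.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of every table with the same
    margins that is no more likely than the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    total = comb(row1 + row2, col1)

    def prob(x):  # P(top-left cell = x) with all margins fixed
        return comb(row1, x) * comb(row2, col1 - x) / total

    p_obs = prob(a)
    x_min, x_max = max(0, col1 - row2), min(row1, col1)
    return sum(prob(x) for x in range(x_min, x_max + 1)
               if prob(x) <= p_obs * (1 + 1e-9))

# (failures, patients) per country, taken from Table 8.1.
counts = {"A": (18, 45), "B": (58, 143), "C": (20, 70), "D": (51, 177), "E": (26, 116)}
fail_e, n_e = counts["E"]
raw_p = {}
for country in "ABCD":
    fail, n = counts[country]
    raw_p[country] = fisher_exact_two_sided(fail_e, n_e - fail_e, fail, n - fail)
    print(f"E vs {country}: raw p = {raw_p[country]:.4f}")
```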
  • 28. P-value Adjustment Application (continued) • The raw p-values indicated that failure rate for E was statistically significantly different (lower) from A and B at the 5% level. • But results are misleading since they ignore the multiple testing. • The Bonferroni bound shows only E and B to be statistically significantly different at the 10% level. 28
  • 29. P-value Adjustment Application (continued) • But the Bonferroni bound is known to be excessively conservative in many situations. • Bootstrap provides an appropriate answer. • For the bootstrap estimate we again find that E and B are clearly different but now we find that the p-value for E and A is below 0.10 and so E is statistically significantly better than A at the 10% level. 29
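The Bonferroni column of Table 8.2 is simple arithmetic: each raw p-value is multiplied by the number of comparisons (four here) and capped at 1. The sketch below applies that rule to the rounded raw p-values printed in the table, so the last digit can differ slightly from the published Bonferroni values, which were presumably computed from unrounded p-values. The bootstrap column cannot be reproduced this way because it requires resampling the patient-level data.

```python
# Raw p-values as printed in Table 8.2 (rounded to four decimals).
raw_p = {"E vs A": 0.0307, "E vs B": 0.0021, "E vs C": 0.3826, "E vs D": 0.2776}

def bonferroni(p_values):
    """Multiply each p-value by the number of tests and cap the result at 1."""
    m = len(p_values)
    return {name: min(1.0, m * p) for name, p in p_values.items()}

adjusted = bonferroni(raw_p)
for name, p in adjusted.items():
    print(f"{name}: raw {raw_p[name]:.4f} -> Bonferroni {p:.4f}")
```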
  • 30. Individual Bioequivalence • The FDA has a Guidance document on how to conduct bioequivalence (bioavailability) trials. • Three types of bioequivalence have been defined (1) average bioequivalence, (2) population bioequivalence and (3) individual bioequivalence. • Currently the FDA only requires average bioequivalence be shown (a change over past policy). • Bootstrap solutions useful in determining individual bioequivalence and population bioequivalence have been devised and shown to be consistent. 30
  • 31. Individual Bioequivalence: Model In the model, we consider crossing over twice, using two sequences: TRR, meaning the new treatment first and then the reference treatment twice, and RTR, meaning the reference first, followed by the new treatment and then the reference again. Consider the following model for pharmacokinetic response in a two-treatment crossover design using only the sequences RTR and TRR, randomized 1:1: $Y_{ijkl} = \mu + F_l + P_j + Q_k + W_{ijk} + S_{ikl} + \varepsilon_{ijkl}$, where $\mu$ is the overall mean, $P_j$ is the fixed effect for the jth period with the constraint $\sum_j P_j = 0$, $Q_k$ is the fixed effect for the kth sequence with $\sum_k Q_k = 0$, and $F_l$ is the fixed effect for the lth drug.
  • 32. Individual Bioequivalence: Model (continued) For these trials we have only two drugs, the new and old formulations, denoted T for the new treatment and R for the reference formulation. We also have the constraint that $F_T + F_R = 0$. Now $S_{ikl}$ is a random effect of the ith subject in the kth sequence with the lth treatment, $W_{ijk}$ is the fixed interaction between treatment, sequence and period, and $\varepsilon_{ijkl}$ is a random noise (error) component with mean 0, independent and identically distributed and independent of all the fixed and random effects.
  • 33. Individual Bioequivalence: Definition Under the linear model given on the previous slides, individual bioequivalence is accepted if $H_0$ is rejected in the test of $H_0: \Delta_{PB} \le \Delta$ versus $H_1: \Delta_{PB} > \Delta$, where $\Delta_{PB} = P_{TR} - P_{RR}$ with $P_{TR} = \mathrm{prob}(|Y_T - Y_R| \le r)$ and $P_{RR} = \mathrm{prob}(|Y_R - Y'_R| \le r)$, where $\Delta$ and $r$ are predetermined fixed constants and $Y'_R$ is the observed response the second time the reference treatment is given.
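To make the definition concrete, the sketch below computes a plug-in estimate of $P_{TR} - P_{RR}$ and a bootstrap percentile lower bound for it by resampling subjects. It takes $P_{RR}$ to be the probability that the two reference responses of a subject differ by at most r. This is only an illustration: it ignores the period and sequence effects in the model, it is not the procedure of Schall and Luus (1993) or of Pigeot (2001), and the simulated data, the value of r and the margin `delta_margin` are hypothetical.

```python
import random

def delta_pb(subjects, r):
    """Plug-in estimate of P_TR - P_RR from (y_t, y_r, y_r_repeat) triples."""
    n = len(subjects)
    p_tr = sum(abs(y_t - y_r) <= r for y_t, y_r, _ in subjects) / n
    p_rr = sum(abs(y_r - y_r2) <= r for _, y_r, y_r2 in subjects) / n
    return p_tr - p_rr

def bootstrap_lower_bound(subjects, r, alpha=0.05, k_reps=2000, seed=2):
    """Percentile-method lower confidence bound, resampling whole subjects."""
    rng = random.Random(seed)
    n = len(subjects)
    reps = sorted(delta_pb(rng.choices(subjects, k=n), r) for _ in range(k_reps))
    return reps[int(alpha * k_reps)]

# Simulated, purely hypothetical responses for 40 subjects.
rng = random.Random(0)
subjects = []
for _ in range(40):
    base = rng.gauss(100, 10)  # subject-level mean response
    subjects.append((base + rng.gauss(0, 5),   # new treatment T
                     base + rng.gauss(0, 5),   # reference R, first time
                     base + rng.gauss(0, 5)))  # reference R, second time

r = 10.0
delta_margin = -0.2  # hypothetical value of the constant Delta
estimate = delta_pb(subjects, r)
lower = bootstrap_lower_bound(subjects, r)
print(f"estimate={estimate:.3f}  lower bound={lower:.3f}")
print("H0 rejected (bioequivalence accepted):", lower > delta_margin)
```

Rejecting $H_0$ when the lower bound exceeds the margin follows from the correspondence between confidence intervals and tests.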
  • 34. Bootstrap Results for this Trial • See Schall and Luus (1993) for a description of a bootstrap hypothesis test for this problem. • Pigeot (2001) in a survey article describes the Schall and Luus method in detail, shows that their method is not consistent and modifies it by constructing a bootstrap percentile method confidence interval to use in the test. • In an earlier work Shao, Kübler and Pigeot (2000) prove that the bootstrap method Pigeot describes in Pigeot (2001) is consistent. 34
  • 35. Examples where the bootstrap fails • Athreya (1987) shows that the bootstrap estimate of the sample mean is inconsistent when the population distribution has an infinite variance. • Angus (1993) provides similar inconsistency results for the maximum and minimum of a sequence of independent identically distributed observations. 35
  • 36. Examples where the bootstrap fails (continued) We shall describe the inconsistency of the bootstrap in these two cases and then provide remedies: (1) the sample mean with infinite population variance, and (2) the maximum term in an i.i.d. sequence of observations.
  • 37. Example where the bootstrap fails - Sample Mean with Infinite Population Variance • Singh (1981) and Bickel and Freedman (1981) showed that in the case of estimating the mean from an i.i.d. sample with a finite population variance the bootstrap procedure is consistent. • In the case of an infinite variance, the population might have a distribution function F(x) satisfying $1 - F(x) \sim c x^{-\alpha} L(x)$ as $x \to \infty$, where L is a slowly varying function, c is a nonnegative constant and $0 < \alpha \le 2$. • Under these conditions, the sample mean, appropriately normalized, converges to a stable distribution.
  • 38. Example where the bootstrap fails - Sample Mean with Infinite Population Variance (continued) • For $\alpha = 2$ the variance of F is finite and the central limit theorem applies. For $\alpha < 2$ the population variance is infinite. • Theorem 1 of Athreya (1987) proves the inconsistency of the bootstrap for the case where $1 < \alpha < 2$. • The result tells us that when we appropriately normalize the sample mean and apply the bootstrap substitutions, the bootstrap version of the normalized mean converges to a random probability distribution and not to the corresponding fixed stable distribution that the sample mean converges to.
  • 39. Example where the bootstrap fails - Estimating extreme values • For i.i.d. random variables Gnedenko’s theorem usually applies to the maximum or minimum values. • Gnedenko’s theorem states that when appropriately normalized the minimum value and the maximum value converge to one of three extreme value distribution families. • The appropriate family depends on the tail behavior of the population distribution. 39
  • 40. Example where the bootstrap fails - Estimating extreme values (continued) • Angus (1993) showed that using the appropriate normalization and the bootstrap substitution, the maximum and minimum converge to a random probability distribution and not the fixed extreme value distribution from Gnedenko’s theorem that the sample extremes converge to. 40
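One elementary way to see the problem with extremes, offered here as an illustration and not as Angus's argument: when the n sample values are distinct, a bootstrap sample contains the sample maximum with probability $1 - (1 - 1/n)^n$, which tends to $1 - e^{-1} \approx 0.632$. The bootstrap distribution of the maximum therefore keeps a large atom at the observed maximum however large n becomes. The function names below are hypothetical.

```python
import math
import random

def prob_bootstrap_max_equals_sample_max(n):
    """Exact P(bootstrap maximum = sample maximum) for n distinct values."""
    return 1.0 - (1.0 - 1.0 / n) ** n

def simulate(n, k_reps=5000, seed=3):
    """Monte Carlo check of the same probability on uniform data."""
    rng = random.Random(seed)
    data = [rng.random() for _ in range(n)]
    top = max(data)
    hits = sum(max(rng.choices(data, k=n)) == top for _ in range(k_reps))
    return hits / k_reps

for n in (10, 100, 1000):
    print(n, round(prob_bootstrap_max_equals_sample_max(n), 4))
print("limit 1 - 1/e:", round(1 - math.exp(-1), 4))
print("simulated, n=200:", simulate(200))
```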
  • 41. Bootstrap Remedies • In the past decade, researchers have found remedies for many of the problems where the bootstrap is inconsistent, giving modified bootstrap solutions that are consistent. • For both problems described thus far, a simple procedure called the m-out-of-n bootstrap has been shown to lead to consistent estimates.
  • 42. The m-out-of-n Bootstrap • This idea was proposed by Bickel and Ren (1996) for handling doubly censored data. • Instead of sampling n times with replacement from a sample of size n, they suggest sampling only m times, where m is much less than n. • To get the consistency results both m and n need to get large but at different rates. We need m = o(n), that is, m/n → 0 as m and n both → ∞. • This method leads to consistent bootstrap estimates in many cases where the ordinary bootstrap has problems, particularly (1) the mean with infinite variance and (2) extreme value distributions.
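A minimal sketch of the m-out-of-n resampling step follows. The slide only requires m = o(n); taking m as the integer part of the square root of n is an assumption made for this example, as are the function name and the simulated data. The sketch draws the replicates only and leaves out the normalization needed to turn them into a distributional approximation.

```python
import math
import random

def m_out_of_n_bootstrap(data, estimator, m, k_reps=2000, seed=4):
    """Replicates of the estimator on resamples of size m < n, with replacement."""
    if not 1 <= m <= len(data):
        raise ValueError("m must be between 1 and the sample size")
    rng = random.Random(seed)
    return [estimator(rng.choices(data, k=m)) for _ in range(k_reps)]

# Simulated, hypothetical data: n uniform observations.
rng = random.Random(0)
n = 2000
data = [rng.random() for _ in range(n)]
m = int(math.sqrt(n))  # one hypothetical choice with m/n small

reps = m_out_of_n_bootstrap(data, max, m)
atom = sum(rep == max(data) for rep in reps) / len(reps)
print(f"m={m}, share of replicates equal to the sample maximum: {atom:.3f}")
```

With m much smaller than n, a resample rarely contains the observed maximum, so the large atom that appears in the ordinary bootstrap of the maximum is mostly gone.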
  • 43. Available Software • Resampling Stats from Resampling Stats Inc. (provides basic bootstrap tools in easy-to-use software and is good as an elementary teaching tool). • S-PLUS from Insightful Corporation (good for advanced bootstrap techniques such as BCa, easy to use in the new Windows-based version). The current module Resample is what I use in my bootstrap class at statistics.com. • S functions provided by Tibshirani (see the Appendix in the Efron and Tibshirani text or visit Rob Tibshirani’s web site, http://www-stat.stanford.edu/~tibs).
  • 44. Available Software (continued) • Stata has a bootstrap algorithm available that some users rave about. • MathWorks and other examples: see Susan Holmes’s web page (http://www-stat.stanford.edu/~susan) or contact her by email. • SAS macros are available, and PROC MULTTEST does bootstrap sampling.
  • 45. References on confidence intervals and hypothesis tests (1) Chernick, M.R. (1999). Bootstrap Methods: A Practitioner’s Guide. Wiley, New York. (2) Chernick, M.R. (2007). Bootstrap Methods: A Guide for Practitioners and Researchers, 2nd Edition. Wiley, New York. (3) Hall, P. (1992). The Bootstrap and Edgeworth Expansion. Springer-Verlag, New York. (4) Efron, B. (1982) The Jackknife, the Bootstrap and Other Resampling Plans. Society for Industrial and Applied Mathematics CBMS-NSF Regional Conference Series 38, Philadelphia. 45
  • 46. References on confidence intervals and hypothesis tests (continued) (5) Carpenter, J. and Bithell, J. (2000). Bootstrap confidence intervals: when, which, what? A practical guide for medical statisticians. Statistics in Medicine 19, 1141-1164. (6) Bahadur, R.R. and Savage, L.J. (1956). The nonexistence of certain statistical procedures in nonparametric problems. Annals of Mathematical Statistics 27, 1115-1122. (7) Ewens, W.J. and Grant, G.R. (2001). Statistical Methods in Bioinformatics: An Introduction.
  • 47. References on p-value adjustment (1) Chernick, M.R. (2007). Bootstrap Methods: A Guide for Practitioners and Researchers, 2nd Edition. Wiley, New York. (2) Westfall, P. and Young, S. S. (1993). Resampling-Based Multiple Testing: Examples of p-Value Adjustment. Wiley, New York. 47
  • 48. References for Individual Bioequivalence (1) Chernick, M. R. (2007). Bootstrap Methods: A Guide for Practitioners and Researchers. Wiley, New York. (2) Pigeot, I. (2001). The jackknife and bootstrap in biomedical research – Common principles and possible pitfalls. Drug Information J. 35, 1431-1443. (3) Schall, R., and Luus, H. G. (1993). On population and individual bioequivalence. Statist. Med. 12, 1109-1124. (4) Shao, J., Kübler, and Pigeot, I. (2000). Consistency of the bootstrap procedure in individual bioequivalence. Biometrika 87, 573-585. 48
  • 49. References on when bootstrap fails (1) Angus, J. E. (1993). Asymptotic theory for bootstrapping the extremes. Communs. Statist. Theory and Methods 22, 15-30. (2) Athreya, K. B. (1987). Bootstrap estimation of the mean in the infinite variance case. Ann. Statist. 15, 724-731. (3) Bickel, P. J. and Freedman, D. A. (1981). Some asymptotic theory for the bootstrap. Ann. Statist. 9, 1196-1217. 49
  • 50. References on when bootstrap fails (continued) (4) Chernick, M.R. (1999). Bootstrap Methods: A Practitioner’s Guide. Wiley, New York. (5) Chernick, M.R. (2007). Bootstrap Methods: A Guide for Practitioners and Researchers, 2nd Edition. Wiley, New York. (6) Cochran, W. (1977). Sampling Techniques. 3rd ed., Wiley, New York 50
  • 51. References on when bootstrap fails (continued) (7) Knight, K. (1989). On the bootstrap of the sample mean in the infinite variance case. Ann. Statist. 17, 1168-1175. (8) LePage, R., and Billard, L. (editors). (1992). Exploring the Limit of Bootstrap. Wiley, New York. (9) Mammen, E. (1992). When Does the Bootstrap Work? Asymptotic Results and Simulations Springer-Verlag, Heidelberg. (10) Singh, K. (1981). On the asymptotic accuracy of Efron’s bootstrap. Ann. Statist. 9, 1187-1195. 51