Advanced Bayesian Methods For Medical Test Accuracy Lyle D Broemeling
Editor-in-Chief
Shein-Chung Chow, Ph.D.
Professor
Department of Biostatistics and Bioinformatics
Duke University School of Medicine
Durham, North Carolina
Series Editors
Byron Jones
Senior Director
Statistical Research and Consulting Centre
(IPC 193)
Pfizer Global Research and Development
Sandwich, Kent, U.K.
Jen-pei Liu
Professor
Division of Biometry
Department of Agronomy
National Taiwan University
Taipei, Taiwan
Karl E. Peace
Georgia Cancer Coalition
Distinguished Cancer Scholar
Senior Research Scientist and
Professor of Biostatistics
Jiann-Ping Hsu College of Public Health
Georgia Southern University
Statesboro, Georgia
Bruce W. Turnbull
Professor
School of Operations Research
and Industrial Engineering
Cornell University
Ithaca, New York
Adaptive Design Theory and
Implementation Using SAS and R
Mark Chang
Advanced Bayesian Methods for Medical
Test Accuracy
Lyle D. Broemeling
Advances in Clinical Trial Biostatistics
Nancy L. Geller
Applied Statistical Design for the
Researcher
Daryl S. Paulson
Basic Statistics and Pharmaceutical
Statistical Applications, Second Edition
James E. De Muth
Bayesian Adaptive Methods for
Clinical Trials
Scott M. Berry, Bradley P. Carlin,
J. Jack Lee, and Peter Muller
Bayesian Analysis Made Simple: An Excel
GUI for WinBUGS
Phil Woodward
Bayesian Methods for Measures of
Agreement
Lyle D. Broemeling
Bayesian Missing Data Problems: EM,
Data Augmentation and Noniterative
Computation
Ming T. Tan, Guo-Liang Tian,
and Kai Wang Ng
Bayesian Modeling in Bioinformatics
Dipak K. Dey, Samiran Ghosh,
and Bani K. Mallick
Causal Analysis in Biomedicine and
Epidemiology: Based on Minimal
Sufficient Causation
Mikel Aickin
Clinical Trial Data Analysis using R
Ding-Geng (Din) Chen and Karl E. Peace
Clinical Trial Methodology
Karl E. Peace and Ding-Geng (Din) Chen
Computational Methods in Biomedical
Research
Ravindra Khattree and Dayanand N. Naik
Computational Pharmacokinetics
Anders Källén
Controversial Statistical Issues in
Clinical Trials
Shein-Chung Chow
Data and Safety Monitoring Committees
in Clinical Trials
Jay Herson
Design and Analysis of Animal Studies in
Pharmaceutical Development
Shein-Chung Chow and Jen-pei Liu
Design and Analysis of Bioavailability and
Bioequivalence Studies, Third Edition
Shein-Chung Chow and Jen-pei Liu
Design and Analysis of Clinical Trials with
Time-to-Event Endpoints
Karl E. Peace
Design and Analysis of Non-Inferiority
Trials
Mark D. Rothmann, Brian L. Wiens,
and Ivan S. F. Chan
Difference Equations with Public Health
Applications
Lemuel A. Moyé and Asha Seth Kapadia
DNA Methylation Microarrays:
Experimental Design and Statistical
Analysis
Sun-Chong Wang and Arturas Petronis
DNA Microarrays and Related Genomics
Techniques: Design, Analysis, and
Interpretation of Experiments
David B. Allison, Grier P. Page,
T. Mark Beasley, and Jode W. Edwards
Dose Finding by the Continual
Reassessment Method
Ying Kuen Cheung
Elementary Bayesian Biostatistics
Lemuel A. Moyé
Frailty Models in Survival Analysis
Andreas Wienke
Generalized Linear Models: A Bayesian
Perspective
Dipak K. Dey, Sujit K. Ghosh,
and Bani K. Mallick
Handbook of Regression and Modeling:
Applications for the Clinical and
Pharmaceutical Industries
Daryl S. Paulson
Measures of Interobserver Agreement
and Reliability, Second Edition
Mohamed M. Shoukri
Medical Biostatistics, Second Edition
A. Indrayan
Meta-Analysis in Medicine and Health
Policy
Dalene Stangl and Donald A. Berry
Monte Carlo Simulation for the
Pharmaceutical Industry: Concepts,
Algorithms, and Case Studies
Mark Chang
Multiple Testing Problems in
Pharmaceutical Statistics
Alex Dmitrienko, Ajit C. Tamhane,
and Frank Bretz
Sample Size Calculations in Clinical
Research, Second Edition
Shein-Chung Chow, Jun Shao
and Hansheng Wang
Statistical Design and Analysis of
Stability Studies
Shein-Chung Chow
Statistical Evaluation of Diagnostic
Performance: Topics in ROC Analysis
Kelly H. Zou, Aiyi Liu, Andriy Bandos,
Lucila Ohno-Machado, and Howard Rockette
Statistical Methods for Clinical Trials
Mark X. Norleans
Statistics in Drug Research:
Methodologies and Recent
Developments
Shein-Chung Chow and Jun Shao
Statistics in the Pharmaceutical Industry,
Third Edition
Ralph Buncher and Jia-Yeong Tsay
Translational Medicine: Strategies and
Statistical Methods
Dennis Cosmatos and Shein-Chung Chow
Preface
Bayesian methods are being used more and more in medicine and biology. For
example, at the University of Texas MD Anderson Cancer Center and other
institutions, Bayesian sequential stopping rules are implemented somewhat
routinely in the design of clinical trials. Also, Bayesian techniques are being
used more frequently in diagnostic medicine, such as estimating the accuracy
of diagnostic tests and for screening large populations for various diseases.
Bayesian methods are quite attractive in many areas of medicine because
they are based on prior information, which is usually available in the form
of related previous studies. An example of this is in the planning of Phase II
clinical trials, where a new therapy will be administered to patients who have
advanced disease. Such therapies are developed by pharmaceutical companies
and their success depends on the success of previous Phase I or other relevant
Phase II trials. Bayes theorem allows a logical way to incorporate the previous
information with the information that will accrue in the future. Accuracy of
a medical test is an essential component of the diagnostic process and is the
key issue of this book. Of course, medicine and biology are not the only areas
where the concept of test accuracy plays a paramount role. For example, in the
area of sports (e.g., cycling or baseball), the accuracy of a test for “doping”
is of extreme importance for maintaining the integrity of the sport.
Advanced Bayesian Methods for Medical Test Accuracy is intended as a
textbook for graduate students in statistics and as a reference for consulting
statisticians. It will be an invaluable resource especially for biostatistics
students who will be working in the various areas of diagnostic medicine (e.g.,
pathology and/or diagnostic imaging). The book is very practical and the
student will learn many useful methods for measuring the accuracy of various
medical tests. Most of the book is focused on Bayesian inferential procedures,
but some is devoted to the design of such studies. A student should
have completed a year of introductory probability and mathematical statistics,
several introductory methods courses, such as regression and the analysis
of variance, and a course that is primarily an introduction to Bayesian
inference.
Consulting statisticians working in the areas of medicine and biology will
have an invaluable reference with Advanced Bayesian Methods for Medical Test
Accuracy, which will supplement the books Statistical Methods in Diagnostic
Medicine by Zhou, Obuchowski, and McClish, and The Statistical Evaluation
of Medical Tests for Classification and Prediction by Pepe. The two references
K11763 FM page: xv date: June 21, 2011
are not presented from a Bayesian viewpoint; thus, the present volume is
unique and will develop methods of test accuracy that should prove to be
very useful to the consultant. Another unique feature of the book is that all
computing and analysis is based on the WinBUGS package, which will allow
the user a platform that efficiently uses prior information. Many of the ideas in
the present volume are presented for the first time and go far beyond the two
standard references. For the novice, an appendix introduces the fundamentals
of programming and executing BUGS, and as a result, the reader will have the
tools and experience to successfully analyze studies for medical test accuracy.
A very attractive feature of the book is that the author’s blog,
http://medtestacc.blogspot.com, provides the BUGS code, which can be executed
as one progresses through the book and as one does the exercises at
the end of each chapter. Note, each chapter includes the code labeled as
BUGS CODE 4.1, BUGS CODE 4.2, etc., and this is also included in the
author’s blog; thus, the student can cycle between the book and the blog,
which reinforces the subject in a beneficial manner.
Acknowledgments
The author gratefully acknowledges the many departments and people at the
University of Texas MD Anderson Cancer Center in Houston, who assisted
him during the writing of this book. Many of the analyses that appear in
Advanced Bayesian Methods for Medical Test Accuracy are based on studies
performed at the Division of Diagnostic Imaging, and in particular, he would
like to thank Drs. Gayed, Munden, Kundra, Marom, Ng, Tamm, and Gupta.
Special thanks to Ana Broemeling for the editing and organization of the early
versions of the manuscript and for her encouragement during the writing of
this book.
Author
Lyle D. Broemeling, PhD, is director of Broemeling and Associates Inc.,
and is a consulting biostatistician. He has been involved with academic health
science centers for over twenty years and has taught and been a consultant at
the University of Texas Medical Branch in Galveston, the University of Texas
MD Anderson Cancer Center, and the University of Texas School of Public
Health. During his tenure at the University of Texas, he developed an interest
in medical test accuracy, which resulted in various publications in the medical
literature as well as in statistical journals. His main interest is in developing
Bayesian methods for use in medical and biological problems and in authoring
textbooks in statistics. His previous books are Bayesian Analysis of Linear
Models, Econometrics and Structural Change, written with Hiroki Tsurumi,
Bayesian Biostatistics and Diagnostic Medicine, and Bayesian Methods for
Measures of Agreement.
Chapter 1
Introduction
1.1 Introduction
This book describes Bayesian statistical methods for the design and analysis
of studies involving medical test accuracy. It grew out of the author’s
experience in consulting with many investigators of the Division of Diagnostic
Imaging at the University of Texas MD Anderson Cancer Center (MDACC) in
Houston, Texas. In a modern medical center, medical test accuracy is crucial
for patient management, from the initial diagnosis to assessing the extent of
disease as the patient is being treated.
Why a book on medical test accuracy? The short answer is that your life
depends on it! Every visit to the doctor involves the use of some medical test,
from measuring blood pressure and temperature to perhaps more expensive
follow-up tests. If you go to the doctor complaining of chest pain, the doctor
might refer you for an exercise stress test, and if that test is positive, you might
undergo a heart catheterization to detect coronary artery disease. If the
exercise stress test is mistakenly negative and the doctor does not order any
follow-up procedures, you go home with hidden heart disease. You pay
the price later when you are belatedly treated for the disease. On the other
hand, suppose the exercise stress test is mistakenly positive, and the doctor
orders an unnecessary heart catheterization, which gives the correct diagnosis,
namely, that you do not have heart disease. Then you have paid for an
unnecessary test. Diabetes is another example, where misdiagnosis can lead to
expensive treatment. This can happen when the blood glucose test mistakenly
indicates that you have type 2 diabetes, when in fact your blood glucose is
elevated, but not to the extent that drugs are needed to control the disease.
Of course, accuracy depends not only on the medical test, but also on the
interpretation of the test results. There are two sources of error: the inherent
variability of the medical test and the subjectivity involved in interpreting the
test output. Both of the above examples will be dealt with in more detail later
in the book, namely, the blood glucose test for type 2 diabetes, and the tests
for coronary artery disease. As a patient, you should know the accuracy of the
medical tests that will be administered to you when you go to the doctor for
routine visits and, more importantly, when you are undergoing treatment for
disease. An informative reference for the patient is Johnson, Sandmire, and
Klein [1].
1.2 Statistical Methods in Medical Test Accuracy
Biostatistics plays a pivotal role in the assessment of the accuracy of medical
tests, as can be discerned by reading papers in mainline journals, such
as Radiology and The Journal of Pathology, and more specialized journals,
such as The Journal of Computer Assisted Tomography, The Journal of Magnetic
Resonance Imaging, The Journal of Nuclear Medicine, and The Journal
of Infectious Diseases. As we will see, the usual methods, ranging from the
t-test and chi-square test to others such as the analysis of variance and various
regression techniques, are standard fare for assessing the accuracy of medical
tests.
However, there are also some methods that are somewhat unique to the
field, including ways to estimate diagnostic test accuracy and methods to
measure the agreement between various tests and/or readers. This topic will be
addressed throughout the book. The most basic indicators of test accuracy are
the true and false positive fractions for medical tests that have binary scores or
scores where a cutoff is used to declare disease, in effect providing binary scores.
For those patients with the disease, the fraction that test positive is referred to
as the true positive fraction. The fraction that test positive, among the
non-diseased patients, is called the false positive fraction. From the viewpoint of the
patient, the positive predictive value is important, because it is the fraction of
patients that have disease, among those that test positive for disease, but also
important is the negative predictive value, which is the fraction of subjects
who do not have the disease, among those that test negative. In most situations
for a medical test, these four values indicating test accuracy do not lead to
an unambiguous declaration that the test is a good one. Those factors that
affect the various measures of test accuracy will be described in the book.
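The four basic measures described above can be computed directly from a two-by-two table of counts. The following is a minimal sketch in Python (not the WinBUGS code used in the book); the function name and the counts are hypothetical and serve only to illustrate the definitions.

```python
def accuracy_measures(tp, fp, fn, tn):
    """Return the four basic accuracy measures from a 2x2 table of counts.

    tp: diseased, test positive        fn: diseased, test negative
    fp: non-diseased, test positive    tn: non-diseased, test negative
    """
    return {
        "tpf": tp / (tp + fn),  # true positive fraction
        "fpf": fp / (fp + tn),  # false positive fraction
        "ppv": tp / (tp + fp),  # positive predictive value
        "npv": tn / (tn + fn),  # negative predictive value
    }


# Hypothetical counts, for illustration only.
measures = accuracy_measures(tp=90, fp=50, fn=10, tn=850)
for name, value in measures.items():
    print(f"{name} = {value:.3f}")
```

With these made-up counts the true positive fraction is 0.9, yet the positive predictive value is only about 0.64 because the disease is uncommon in the sample, which illustrates why the four values together need not give an unambiguous verdict on a test.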
An overall measure of a medical test with continuous scores is provided
by the area under the receiver operating characteristic (ROC) curve. The area
can vary from 0 to 1 and, in general, is defined as follows:
ROC area = P[Y > X], (1.1)
where Y is the test score of an individual selected at random from the diseased
and X is the score of a subject selected at random from the non-diseased.
When the area is 1, the test scores discriminate perfectly between the diseased
and non-diseased subjects, and if the area is 0.5, the scores are not at all
informative for discriminating between the two groups. The above definition
is changed to
ROC area = P[Y > X] + (1/2)P[Y = X], (1.2)
when the test scores are ordinal.
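Definition (1.2) can be illustrated with a simple empirical calculation over all diseased and non-diseased pairs. The sketch below is a plug-in estimate, not the Bayesian posterior analysis developed later in the book; the function name and the ordinal scores are hypothetical.

```python
def roc_area(diseased, non_diseased):
    """Empirical estimate of P[Y > X] + (1/2) P[Y = X] over all pairs."""
    total = 0.0
    for y in diseased:
        for x in non_diseased:
            if y > x:
                total += 1.0   # diseased score exceeds non-diseased score
            elif y == x:
                total += 0.5   # ties count one half, as in (1.2)
    return total / (len(diseased) * len(non_diseased))


# Hypothetical ordinal scores on a 1-5 scale, for illustration only.
diseased = [3, 4, 5, 5, 2]
non_diseased = [1, 2, 2, 3, 1]
area = roc_area(diseased, non_diseased)
print(f"ROC area = {area:.2f}")
```

When the scores are continuous and no ties occur, the second branch is never taken and the calculation reduces to definition (1.1).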
The types of tests employed in this book are used for diabetes, heart
disease, various forms of cancer, and tests for infectious diseases. A large proportion
of the tests are imaging tests for cancer, while tests for heart disease
are also represented. There are only a few examples of diabetes and infectious
diseases. One very important case is for medical imaging tests, which
are employed in cancer clinical trials, such as Phase II trials, where the main
objective is to determine the response to a new therapy, where response is
based on an image measurement. Computed tomography (CT; a form of x-ray)
is used to measure the tumor size at baseline before the trial begins and is used
at various times throughout the trial. Thus, there are several measurements
of tumor size for each patient, where all the measurements are used to classify
the patient’s response. Thus, it is of paramount importance that the CT
measurements of tumor size be accurate, because inaccurate measurements
could lead to false declarations about the success or failure of a particular
therapy! This will be described in more detail later on when the Erasmus
et al. [2] study is explained. Also involved in this type of trial is the error
introduced by the several radiologists (readers) who are interpreting the CT
tumor measurements.
Along with the basic indicators of test accuracy, various statistical methodologies
will be employed. For example, patient covariates will be taken into
account by regression techniques. Obviously, the true and false positive fractions
are affected by patient covariates, such as age, gender, and medical
history, and these regression techniques are described and illustrated in the
book. When the scores are ordinal or continuous, the appropriate regression
techniques are employed to measure accuracy by the area under the ROC
curve. Regression also plays an important role when comparing several readers
and when estimating the agreement between readers who are interpreting
the test results.
Often, not all patients are subject to the gold standard, the test that is
used as a reference to compute the accuracy of medical tests. For example,
in an exercise stress test, those who test negative are usually not referred to
the gold standard (heart catheterization), whereas those who test positive
are usually given a heart catheterization. This is a special situation, called
verification bias, which requires specialized methods to estimate test accuracy
involving an application of Bayes theorem. When this is considered, various
generalizations will be implemented, including the consideration of several
tests and several readers and regression to take into account other patient
covariates.
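To see how Bayes theorem enters under verification bias, consider a plug-in correction of the kind usually attributed to Begg and Greenes. This is a sketch under a stated assumption, namely that the decision to verify depends only on the test result, and it is not the Bayesian analysis developed in this book. The function name and all counts are hypothetical.

```python
def corrected_fractions(n_pos, n_neg, pos_d, pos_nd, neg_d, neg_nd):
    """Plug-in TPF and FPF corrected for verification bias.

    Assumes verification depends only on the test result.
    n_pos, n_neg:   all subjects testing positive / negative
    pos_d, pos_nd:  verified test-positives with / without disease
    neg_d, neg_nd:  verified test-negatives with / without disease
    """
    p_pos = n_pos / (n_pos + n_neg)            # P(T+)
    p_d_pos = pos_d / (pos_d + pos_nd)         # P(D+ | T+), verified only
    p_d_neg = neg_d / (neg_d + neg_nd)         # P(D+ | T-), verified only
    p_d = p_d_pos * p_pos + p_d_neg * (1 - p_pos)  # P(D+), total probability
    tpf = p_d_pos * p_pos / p_d                # Bayes theorem: P(T+ | D+)
    fpf = (1 - p_d_pos) * p_pos / (1 - p_d)    # Bayes theorem: P(T+ | D-)
    return tpf, fpf


# Hypothetical study: 200 test positive (150 verified), 800 test negative
# (100 verified). Counts are for illustration only.
tpf, fpf = corrected_fractions(n_pos=200, n_neg=800,
                               pos_d=120, pos_nd=30, neg_d=10, neg_nd=90)
naive_tpf = 120 / (120 + 10)   # uses verified subjects only
naive_fpf = 30 / (30 + 90)
print(f"corrected TPF = {tpf:.3f}, naive TPF = {naive_tpf:.3f}")
print(f"corrected FPF = {fpf:.3f}, naive FPF = {naive_fpf:.3f}")
```

In this made-up example the naive estimates, which ignore the unverified subjects, overstate both fractions, because test-negative subjects are under-represented among those verified.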
Patient sample size for a clinical trial, based on Bayesian sequential stop-
ping rules, is another application that has proven to be quite beneficial in the
development of new medical therapies, where the accuracy of the medical tests
is key in the development of “new” therapies. The Bayesian approach will be
used throughout this book and is the foundation for estimating medical test
accuracy.
1.3 Datasets for This Book
The datasets used in this book come from the following sources: (1) the
protocol review process of clinical trials at MDACC, where the author was
either a reviewer or a collaborator on the protocol; (2) the author’s consultations
with the scientific and clinical faculty of the Division of Diagnostic
Imaging at MDACC with some 32 datasets; (3) the several datasets accompanying
the excellent book by Pepe [3], The Statistical Evaluation of Medical Tests
for Classification and Prediction (these can be downloaded at
http://www.fhcrc.org/labs/pepe/Book); (4) the information contained in the examples
of the WinBUGS package; and (5) other miscellaneous sources, including
the examples and problems in Statistical Methods in Diagnostic Medicine by
Zhou, McClish, and Obuchowski [4].
1.4 Software
WinBUGS will be used for the Bayesian analysis, specifically for sampling from the
posterior distribution, and the appendix introduces the reader to the
basic elements of using the software, including many examples. The WinBUGS
code is clearly labeled in each chapter, e.g., BUGS CODE 4.1 and BUGS
CODE 4.2 in Chapter 4, and the code can be downloaded from the author’s
blog (http://www.medtestacc.blogspot.com). The reader can easily reproduce
the many analyses included in the book, which should greatly facilitate the
reader’s understanding of the Bayesian approach to estimating medical test
accuracy. The blog also contains a detailed example of how to execute the
Bayesian analysis using WinBUGS.
Many specialized Bayesian programs for the design and analysis of clinical
trials have been developed at the Department of Biostatistics and Applied
Mathematics at MDACC, some of which will be used for the design of clinical
trials as well as for many other analyses involved in biostatistics. These can
be accessed at http://biostatistics.mdanderson.org/SoftwareDownload/.
1.5 Bayesian Approach
Why is the Bayesian approach taken here? The author has been a Bayesian
for many years, since 1974 when he took leave to study at University College
London. Dennis Lindley persuaded him of the advantages of such an approach
and, of course, the main advantage is that it is a practical way to utilize prior
information. Prior information, especially in a medical setting, is ubiquitous
and should be used to one’s advantage as it would be a pity not to use it. It is
assumed that the reader is familiar with the Bayesian approach to inference,
but a brief introduction will be given here.
Suppose X is a continuous observable random vector and θ ∈ Ω ⊂ R^m is
an unknown parameter vector, and suppose the conditional density of X given
θ is denoted by f(x/θ). If x = (x1, x2, . . . , xn) represents a random sample of
size n from a population with density f(x/θ), and ξ(θ) is the prior density
of θ, then Bayes theorem is given by
\[
\xi(\theta/x) = c \prod_{i=1}^{i=n} f(x_i/\theta)\,\xi(\theta), \qquad x_i \in R \text{ and } \theta \in \Omega,
\]
where the proportionality constant is c and the term
\[
\prod_{i=1}^{i=n} f(x_i/\theta)
\]
is called the likelihood function. The density ξ(θ) is the prior density of θ
and represents the knowledge one possesses about the parameter before one
observes X. Such prior information is most likely available to the experi-
menter from other previous related experiments. Note that θ is considered a
random variable and that Bayes theorem transforms one’s prior knowledge
of θ, represented by its prior density, to the posterior density, and that the
transformation is the combining of the prior information about θ with the
sample information represented by the likelihood function.
“An essay toward solving a problem in the doctrine of chances” by the
Reverend Thomas Bayes appeared and was the beginning of our subject. He
considered a binomial experiment with n trials and assumed that the probability
θ of success was uniformly distributed (by constructing a billiard table).
Bayes presented a way to calculate P(a ≤ θ ≤ b/x = p), where x is the number
of successes in n independent trials. This was a first in the sense that Bayes
was making inferences via ξ(θ/x), the conditional density of θ given x. Also,
by assuming that the parameter was uniformly distributed, he was assuming
vague prior information for θ. In what follows, the components of the parameter
vector θ will be various measures of medical test accuracy.
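Bayes' binomial problem has a closed-form answer that is easy to compute. With a uniform prior and x successes in n trials, the posterior density of θ is Beta(x + 1, n − x + 1), and for integer parameters its distribution function can be written as a binomial tail sum. The following Python sketch uses that identity; the function names and the data (7 successes in 10 trials) are hypothetical.

```python
from math import comb


def posterior_cdf(p, x, n):
    """P(theta <= p | x) for x successes in n trials under a uniform prior.

    The posterior is Beta(x + 1, n - x + 1). For integer parameters its CDF
    equals the probability that a Binomial(n + 1, p) count is at least x + 1.
    """
    m = n + 1
    return sum(comb(m, j) * p**j * (1 - p) ** (m - j)
               for j in range(x + 1, m + 1))


def posterior_interval_prob(a, b, x, n):
    """P(a <= theta <= b | x) under the uniform prior."""
    return posterior_cdf(b, x, n) - posterior_cdf(a, x, n)


# Hypothetical data: 7 successes in 10 trials.
prob = posterior_interval_prob(0.5, 0.9, x=7, n=10)
print(f"P(0.5 <= theta <= 0.9 | x = 7, n = 10) = {prob:.3f}")
```

With no data at all (x = 0, n = 0) the function returns p itself, which is the uniform prior, so the calculation shows directly how the sample information reshapes the prior into the posterior.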
It can well be argued that Laplace [5] made many significant contributions
to inverse probability (he did not know of Bayes), beginning in 1774 with his
own version of Bayes theorem, “Mémoire sur la probabilité des causes par
les événements,” and over a period of some 40 years culminating in “Théorie
analytique des probabilités.” See Stigler [6] and Chapters 9–20 of Hald [7] for
the history of Laplace’s many contributions to inverse probability.
It was in modern times that Bayesian statistics began its resurgence with
Lhoste [8], Jeffreys [9], Savage [10], and Lindley [11]. According to Broemeling
and Broemeling [12], Lhoste was the first to justify non-informative priors by
invariance principles, a tradition carried on by Jeffreys. Savage’s book was a
major contribution in that Bayesian inference and decision theory were put on
a sound theoretical footing as a consequence of certain axioms of probability
and utility, while Lindley’s two volumes showed the relevance of Bayesian
inference to everyday statistical problems and were quite influential, setting
the tone and style for later books such as Box and Tiao [13], Zellner [14], and
Broemeling [15]. Box and Tiao and Broemeling were essentially works that
presented Bayesian methods for the usual statistical problems of the analysis
of variance and regression, while Zellner focused Bayesian methods primarily
on certain regression problems in econometrics. During this period, inferential
problems were solved analytically or by numerical integration. Models with
many parameters (such as hierarchical models with many levels) were difficult
to use because at that time numerical integration methods had limited
capability in higher dimensions. For a good history of inverse probability, see
Chapter 3 of Stigler [6], and the two volumes of Hald [7], which present a comprehensive
history and are invaluable as a reference. Dale [16] gives a very
complete and interesting account of Bayes’ contributions.
The last 20 years are characterized by the rediscovery and development of
resampling techniques, where samples are generated from the posterior distribution
via Markov Chain Monte Carlo (MCMC) methods, such as Gibbs
sampling. Large samples generated from the posterior make it possible to
make statistical inferences and to employ multi-level hierarchical models to
solve complex, but practical problems, because computing technology is
available. See Leonard and Hsu [17], Gelman et al. [18], Congdon [19–21],
Carlin, Gelfand, and Smith [22], Gilks, Richardson, and Spiegelhalter [23],
who demonstrate the utility of MCMC techniques in Bayesian statistics. Of
course, in using WinBUGS, this book employs MCMC techniques to estimate
the parameters of the model. The output of the analysis typically includes the
posterior mean, standard deviation, median, and the upper and lower 2 1/2
percentiles. Also included is the MCMC error, which is an important component
of the analysis and tells one how close the MCMC estimate is to the
“true” posterior characteristic; consequently, the MCMC error can be utilized
to adjust the sample size for the simulation.
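The summaries listed above can be mimicked outside WinBUGS to see what each one measures. The sketch below is in Python, uses independent draws from a known Beta(8, 4) posterior as a stand-in for MCMC output, and computes a naive Monte Carlo error as the standard deviation divided by the square root of the number of draws. That formula is an assumption that holds only for independent draws; correlated MCMC output calls for a batch-means or similar estimate. The function name and data are hypothetical.

```python
import random
import statistics


def summarize(draws):
    """Posterior mean, sd, median, 2.5 and 97.5 percentiles, naive MC error."""
    cuts = statistics.quantiles(draws, n=40)  # 39 cut points, 2.5% apart
    sd = statistics.stdev(draws)
    return {
        "mean": statistics.fmean(draws),
        "sd": sd,
        "median": statistics.median(draws),
        "lower_2.5": cuts[0],
        "upper_97.5": cuts[-1],
        # Naive error of the posterior mean; valid for independent draws only.
        "mc_error": sd / len(draws) ** 0.5,
    }


random.seed(1)
# Beta(8, 4) is the posterior for 7 successes in 10 trials under a uniform
# prior (hypothetical data).
draws = [random.betavariate(8, 4) for _ in range(20000)]
summary = summarize(draws)
for name, value in summary.items():
    print(f"{name} = {value:.4f}")
```

Under the independence assumption the error shrinks in proportion to one over the square root of the number of draws, so halving it requires roughly four times as many draws. This is the sense in which the error can guide the choice of simulation sample size.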
References
[1] Johnson, D., Sandmire, D., and Klein, D. Medical Tests that can Save
your Life: 21 Tests your Doctor won’t Order. . . Unless You Know to Ask.
Rodale, Emmaus, PA, 2004.
[2] Erasmus, J.J., Gladish, G.W., Broemeling, L., Sabloff, B.S., Truong,
M.T., Herbst, R.S., and Munden, R.F. Interobserver variability in measurement
of non-small cell carcinoma of the lung lesions: Implications
for assessment of tumor response. Journal of Clinical Oncology, 21:2574,
2003.
[3] Pepe, M.S. The Statistical Evaluation of Medical Tests for Classification
and Prediction. Oxford University Press, Oxford, UK, 2000.
[4] Zhou, X.H., McClish, D.K., and Obuchowski, N.A. Statistical Methods
in Diagnostic Medicine. John Wiley, New York, 2002.
[5] Laplace, P.S. Mémoire sur les probabilités. Mémoires de l’Académie des
Sciences de Paris, 227, 1778.
[6] Stigler, S.M. The History of Statistics. The Measurement of Uncertainty
before 1900. The Belknap Press of Harvard University Press, Cambridge,
MA, 1986.
[7] Hald, A. A History of Mathematical Statistics from 1750 to 1930. Wiley
Interscience, London, 1990.
[8] Lhoste, E. Le calcul des probabilités appliqué à l’artillerie, lois de probabilité
a priori. Revue d’Artillerie, 91:405–423, 1923.
[9] Jeffreys, H. Theory of Probability. Clarendon Press, Oxford,
1939.
[10] Savage, L.J. The Foundations of Statistics. John Wiley, New York, 1954.
[11] Lindley, D.V. Introduction to Probability and Statistics from a Bayesian
Viewpoint, volumes I and II. Cambridge University Press, Cambridge,
1965.
[12] Broemeling, L.D. and Broemeling, A.L. Studies in the history of probability
and statistics XLVIII: The Bayesian contributions of Ernest
Lhoste. Biometrika, 90(3):728–731, 2003.
[13] Box, G.E.P. and Tiao, G.C. Bayesian Inference in Statistical Analysis.
Addison Wesley, Reading, MA, 1973.
[14] Zellner, A. An Introduction to Bayesian Inference in Econometrics. John
Wiley, New York, 1971.
[15] Broemeling, L.D. The Bayesian Analysis of Linear Models. Marcel Dekker,
New York, 1985.
[16] Dale, A. A History of Inverse Probability from Thomas Bayes to Karl
Pearson. Springer-Verlag, Berlin, 1991.
[17] Leonard, T. and Hsu, J.S.J. Bayesian Methods. An Analysis for Statisticians
and Interdisciplinary Researchers. Cambridge University Press,
Cambridge, 1999.
[18] Gelman, A., Carlin, J.B., Stern, H.S., and Rubin, D.B. Bayesian Data
Analysis. Chapman & Hall/CRC and Taylor & Francis, Boca Raton,
1997.
[19] Congdon, P. Bayesian Statistical Modeling. John Wiley, London, 2001.
[20] Congdon, P. Applied Bayesian Modeling. John Wiley, New York, 2003.
[21] Congdon, P. Bayesian Models for Categorical Data. John Wiley, New
York, 2005.
[22] Carlin, B.P., Gelfand, A.E., and Smith, A.F.M. Hierarchical Bayesian
analysis of changepoint problems. Applied Statistics, 41:389–405, 1992.
[23] Gilks, W.R., Richardson, S., and Spiegelhalter, D.J. Markov Chain Monte
Carlo in Practice. Chapman & Hall/CRC, New York, 1996.
Chapter 2
Medical Tests and Preliminary
Information
2.1 Introduction
This chapter gives a brief description of medical imaging tests and other
tests routinely used at a major health care institution. Diagnostic imaging
plays an extremely important role in the overall care of the patient, including
diagnosis, staging, and monitoring of the patient during their stay in hospital.
Some of the examples in this book are taken from diagnostic imaging studies
for cancer; however, there are many other ways to perform diagnoses, and some
of these will also be explained. In addition to cancer, medical tests for heart
disease and stroke will be described, as well as some medical tests for diag-
nosing diabetes and infectious diseases. Most of the medical tests described in
this chapter will appear in the many examples to follow in later chapters.
2.2 Medical Imaging Tests
The primary tests for diagnostic imaging are x-ray, fluoroscopy, mammo-
graphy, computed tomography (CT), ultrasonography (US), magnetic reso-
nance imaging (MRI), and nuclear medicine. Each test has advantages and
disadvantages with regard to image quality, depending on the particular cli-
nical situation. Broadly speaking, image quality consists of three components.
The first is contrast. Contrast is good when important physical differences in
anatomy and tissue are displayed with corresponding different shades of gray
levels. The ability to display fine detail is another important aspect of image
quality and is called resolution. Anything that interferes with image quality is
referred to as noise, which is the third component of image quality. Obviously,
noise should be minimized in order to improve image quality.
2.2.1 X-ray
Medical images are best thought of as being produced by tracking certain
probes as they pass through the body. A stream of x-rays is passed through
the patient and captured on film as the stream exits. An x-ray is a stream
of photons, which are discrete packets of energy. As they pass through the
body, various tissues interact with the photons and these collisions remove and
scatter some of the photons. The various tissues reduce the amount of energy
in various parts of the stream by different amounts. A shadow is produced that
appears on a special photographic plate, producing an image. If the density
of the target object (such as a bone) is much higher than that of the
surrounding environment, an x-ray does a good job of locating it. Some lesions
have densities that are quite similar to that of the surrounding medium and
are difficult to detect. Generally speaking, an x-ray has very good resolution
and its noise is easy to control, but it has low contrast in certain cases.
The x-ray is routine in all medical settings and is the most utilized of all
imaging devices.
A close relative of x-rays is fluoroscopy. In this modality, the exiting beam
is processed further by projecting it onto an image intensifier, which is a
vacuum tube that transforms the x-ray shadow into an optical image. This
mode has about the same image quality as an x-ray, but allows the radiologist
to manage images in real time. For example, it allows the operator to visualize
the movement of a contrast agent passing certain landmark locations in the
gastrointestinal tract or vascular system.
2.2.2 Computed tomography
Another variation of the x-ray is CT, which overcomes some of the
limitations of x-rays. The superimposition of shadows of overlapping tissues
and other anatomical structures often obscures detail in the image. CT does
use x-rays, but it produces images quite differently: the detection and
processing of the shadows is quite sophisticated and is the distinctive
feature of the modality that vastly improves the image over that of an x-ray.
CT has good contrast among soft tissues (e.g., lung and brain tissue) and good
resolution. An x-ray takes information from a three-dimensional structure and
projects it onto a two-dimensional image, which causes the loss of detail due
to overlapping tissues. How does CT overcome this problem? The patient is
placed inside a circle that holds an x-ray source and, embedded in the circle,
an array of detectors that capture the shadow of the x-ray beam. The
x-ray source irradiates a thin slice of tissue across the patient and the detector
captures the shadow. The x-ray source moves to an adjacent location and the
process is repeated, say 700 times. The x-ray source circumscribes the patient
through 360 degrees. The source then repeats the above process with another
thin slice. For a given slice, there are 700 projections of that slice and these
700 projections are processed via computer and back projection algorithms to
produce the two-dimensional representation. The computer works backward
from the projections to reconstruct the spatial distribution of the structure of
the thin slice. In other words, CT answers the following question: what does
the original structure have to look like in order to produce the 700 generated
projections?
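The idea of working backward from projections can be illustrated with a deliberately tiny sketch. It is not the algorithm of any particular scanner: it assumes only two viewing angles (0 and 90 degrees) instead of the 700 mentioned above, applies no filtering, and uses a hypothetical 5 x 5 slice. Each projection is smeared back across the grid, and the views are averaged.

```python
# Unfiltered back projection on a tiny grid, using only two view angles
# (0 and 90 degrees). Real CT uses hundreds of angles plus filtering.

def project(image):
    """Return (row_sums, col_sums): two parallel-beam projections."""
    row_sums = [sum(row) for row in image]
    col_sums = [sum(col) for col in zip(*image)]
    return row_sums, col_sums

def back_project(row_sums, col_sums):
    """Smear each projection back across the grid and average the views."""
    n_rows, n_cols = len(row_sums), len(col_sums)
    return [
        [(row_sums[r] / n_cols + col_sums[c] / n_rows) / 2.0
         for c in range(n_cols)]
        for r in range(n_rows)
    ]

# Hypothetical 5x5 slice with one dense object at row 1, column 3.
slice_true = [[0.0] * 5 for _ in range(5)]
slice_true[1][3] = 1.0

rows, cols = project(slice_true)
recon = back_project(rows, cols)

# Locate the brightest pixel of the reconstruction.
peak = max((recon[r][c], r, c) for r in range(5) for c in range(5))
peak_location = (peak[1], peak[2])
print(peak_location)
```

With only two views the reconstruction shows streaks along the object's row and column; adding more angles is what sharpens the estimate of the original structure.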
A good example of CT (using the Imatron C-100 Ultrafast) is screening for
coronary heart disease, where the coronary artery calcium (CAC) score indi-
cates the degree of disease severity. See Mielke, Shields, and Broemeling [1, 2],
DasGupta et al. [3], and Broemeling and Mielke [4], where the accuracy of
CAC to diagnose heart disease is estimated by the area under the receiver
operating characteristic (ROC) curve. These examples will be examined again
in later chapters from a Bayesian perspective.
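As a rough illustration of what the area under the ROC curve measures, the sketch below computes the nonparametric (Mann-Whitney) estimate: the proportion of diseased/non-diseased pairs in which the diseased subject has the higher score, counting ties as one half. The scores are hypothetical, and this is not necessarily the estimation method used in the cited studies.

```python
def empirical_auc(diseased_scores, healthy_scores):
    """Mann-Whitney estimate of the area under the ROC curve."""
    wins = 0.0
    for d in diseased_scores:
        for h in healthy_scores:
            if d > h:
                wins += 1.0      # diseased subject ranked higher
            elif d == h:
                wins += 0.5      # ties count as one half
    return wins / (len(diseased_scores) * len(healthy_scores))

# Hypothetical CAC scores (illustrative only, not data from the studies).
diseased = [120, 300, 45, 800]
healthy = [0, 10, 45, 150]

auc = empirical_auc(diseased, healthy)
print(round(auc, 4))
```

An area of 0.5 corresponds to a test no better than chance, and an area of 1 to perfect separation of the two groups.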
2.2.3 Mammography
Mammography is yet another variation of the x-ray. While some small
masses can be detected by a physician or by self-examination, mammography
has the ability to detect very small lesions. However, the smaller they are, the
more difficult they are to detect. The set up for mammography consists of a
specialized x-ray tube and generator, a breast compression device, an anti-
scatter grid, and film. The procedure must be able to reveal small differences
in breast density, possibly indicative of a suspicious mass, and it must also
be able to detect small calcifications that may be important to diagnosis. All
the attributes of good image quality are required, namely, high contrast, good
resolution, and low noise. Later in this book, the role of mammography in
screening for breast cancer will be described.
2.2.4 Magnetic resonance imaging
A completely different form of imaging is MRI. A beam of photons is not
passed through the body, but instead the body is placed in a large magnet
and hydrogen atoms (in the water molecules) line up in the same direction
as the magnetic field. When the magnetic field is disrupted by directing radio
energy into the field, the magnetic orientation of the hydrogen atoms is
disrupted. The radio source is switched off and the magnetic orientation of
the hydrogen atoms returns to the original state. The manner in which they
return to the original state (characterized by the T1 and T2 relaxation
times) produces the image. Essentially, what is being measured is the proton
density per unit volume of imaged material. The actual image looks like an
x-ray; however, the principles underlying MRI are completely different. The
same image processing technology used in CT can be used in MRI to process
the images. For example, thin slices and backward projection methods are
often used to improve MRI image quality. MRI has excellent resolution and
contrast among soft tissue, and displays good anatomical detail.
2.2.5 Nuclear medicine
Nuclear medicine is the joining of nuclear physics, nuclear chemistry, and
radiation detection. A radioactive chemical substance, called a
radiopharmaceutical, is injected, usually intravenously (IV), and concentrates in
a particular tissue or organ of interest. The substance emits gamma rays,
which are detected by gamma cameras. The gamma camera counts the number of
gamma photons it captures. There are two principal imaging techniques:
positron emission tomography (PET) and single photon emission computed
tomography (SPECT). Nuclear imaging is often used to view physiological
processes. For example, FDG-PET is often used to measure glucose metabolism,
where the radiopharmaceutical 18F-fluorodeoxyglucose is absorbed by every cell in the
body. The higher the observed radioactivity as measured by PET, the higher
the glucose metabolism. In some cancer studies, the malignant lesion has an
increased glucose metabolism compared to the adjacent non-malignant tissue,
thus FDG-PET is useful in the diagnosis and staging of disease. Another area
where nuclear medicine is useful is in cardiac perfusion studies. For example,
radiation therapy of esophageal cancer often induces damage to the heart in
the form of ischemia and scarring. The damage can be assessed by a nuclear
medicine procedure such as an exercise stress test, where thallium is adminis-
tered via IV to the patient and concentrates in the heart muscle; the resulting
radioactivity is counted by SPECT to produce the image. Among the soft
tissues, nuclear medicine procedures have fair to good contrast but poor
resolution, and noise can be a problem for image quality. Another very
important use of nuclear medicine is the diagnosis of coronary heart disease
via the exercise stress test; if the test is positive, the patient can be
referred to coronary angiography for a more accurate assessment of the
disease.
2.2.6 Ultrasound
US is the last modality to be described. It is based on a physical stream
of energy passing through the body to be imaged. The source is a transducer
that converts electrical energy into a brief pulse of high-frequency acoustical
energy to be transmitted into the patient’s tissues. The transducer acts as a
transmitter and receiver. The receiver detects echoes of sound deflected from
the tissues, where the depth of a particular echo is measured by the round
trip time of the transmitted emission. The images are viewed in real time
on a monitor and are produced by interrogating patient tissue in the field
of view. The real-time images are rapidly produced on the monitor, allowing
one to view tissue motion such as respiration and cardiac motion. The
US examination consists of applying the US transducer to the patient’s skin
using a water-soluble gel to make the connection secure for good transmission
of the signal. Image quality is adversely affected by bone and gas-filled struc-
tures such as the bowel and lung. For example, bone causes almost complete
absorption of the signal, producing an acoustic shadow on the image that
hides the detail of tissues near the bone, while gas-filled soft tissue structures
produce a complete reflection of sound energy that eliminates visualization of
deep structures. Despite these drawbacks, the mode has many advantages, one
of which is the non-invasive nature of the procedure. US is used to image a
multitude of clinical challenges and is very beneficial when solving a particular
clinical problem, such as viewing the development of the fetus.
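The relation between round trip time and echo depth described above can be sketched as follows. The pulse travels to the reflector and back, so the depth is half the distance covered. The speed of sound used here (1540 m/s, a commonly quoted average for soft tissue) is an assumption that does not come from the text, and the function name is hypothetical.

```python
# Assumed average speed of sound in soft tissue; not a value from the text.
SPEED_OF_SOUND_M_PER_S = 1540.0

def echo_depth_cm(round_trip_time_s, speed=SPEED_OF_SOUND_M_PER_S):
    """Depth of a reflector, in cm, from the round trip time of its echo."""
    one_way_distance_m = speed * round_trip_time_s / 2.0
    return one_way_distance_m * 100.0

# An echo returning after 130 microseconds.
depth = echo_depth_cm(130e-6)
print(round(depth, 2))
```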
2.2.7 Combined medical images
Various modalities are often combined to improve overall diagnostic accu-
racy. For example, recently, PET and CT have been combined to diagnose and
stage esophageal cancer. When two modalities are combined, one must formu-
late certain rules to decide when the combined procedure is deemed to produce
a “positive” or “negative” determination. In another interesting study, US
and CT were combined and their accuracy compared to FDG-PET. The ideas
involved in measuring the accuracy of combined modalities will be outlined in
Chapter 10.
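Two simple candidate rules for combining a pair of binary test results can be sketched as follows. They are offered only as an illustration of what such a rule might look like, not as the rules used in the studies mentioned above; the function names and readings are hypothetical.

```python
def combine_or(test_a_positive, test_b_positive):
    """Call the combined test positive if either modality is positive."""
    return test_a_positive or test_b_positive

def combine_and(test_a_positive, test_b_positive):
    """Call the combined test positive only if both modalities are positive."""
    return test_a_positive and test_b_positive

# Hypothetical paired readings (modality A, modality B) for four patients.
readings = [(True, True), (True, False), (False, True), (False, False)]

or_calls = [combine_or(a, b) for a, b in readings]
and_calls = [combine_and(a, b) for a, b in readings]
print(or_calls, and_calls)
```

The "either" rule can never call fewer patients positive than a single modality does, so it tends to favor sensitivity, while the "both" rule tends to favor specificity.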
It is important to remember that the imaging device does not make the
diagnosis; rather, the radiologist and others make the diagnosis! The modality
is an aid to the radiologist and to others who are responsible for the diagnosis.
After the radiologist reads the image, how is this information transformed to
a scale where the biostatistician and others are able to use it for their own
purposes?
For a non-technical introduction to medical imaging tests, Wolbarst [5]
presents a very readable account. In addition, Jawad [6], Chandra [7],
Seeram [8], and Markisz and Aguilia [9] are standard references to cardiac
ultrasound, nuclear medicine, CT, and MRI, respectively.
2.3 Other Medical Tests
2.3.1 Introduction
Establishing a definitive diagnosis involves several phases. For
example, a screening mammography might reveal a suspicious lesion, and this
will be followed with a biopsy of the suspected lesion. Many caregivers are
involved in the diagnostic process and, as has been emphasized, diagnostic
imaging plays a major role in that effort. However, radiologists are just one
of many groups, including oncologists, surgeons, nurses, pathologists,
geneticists, microbiologists, and many more. The pathologist plays a crucial
role in performing the histologic tests on cell specimens taken for biopsy, as
do the microbiologist and geneticist, who are developing new techniques to
measure gene sensitivity from deoxyribonucleic acid (DNA) specimens, etc. Two
examples are described below: (1) metastasis of the primary melanoma lesion to
the lymph nodes, and (2) biopsy of lung nodules.
2.3.2 Sentinel lymph node biopsy for melanoma
This technique involves the cooperation of a melanoma oncologist, a surgi-
cal team to dissect the lymph nodes, diagnostic radiologists who will perform
the nuclear medicine procedure, and pathologists who will do the histology
of the lymph node samples. The following description of the technique is
based on Pawlik and Gershenwald [10]. The early procedures are described by
Morton et al. [11] and consist of injecting a blue dye intradermally around the
primary lesion and biopsy site, where the lymphatic system takes up the dye
and carries it, via afferent lymphatics, to the draining regional node basins.
Surgeons then explore the draining nodal basin; the first draining lymph
nodes, the sentinel lymph nodes (SLNs), are identified by their uptake of
blue dye, then dissected and sent to pathology for histological examination
for malignancy.
These early methods were recently revised to include a nuclear medicine
application using a handheld gamma camera. (See Gershenwald et al. [12] for
a good explanation of this.) With this technique, intraoperative mapping
uses a handheld gamma probe, where 0.5–1.0 mCi of a radiopharmaceutical
is injected intradermally around the intact melanoma. The gamma camera
monitors the level of radioactivity from the injection sites to the location of
the SLNs and is also employed to assist the surgeons with the dissection of the
lymph nodes. This probe is used transcutaneously prior to surgery and has
an accuracy of 96–99% in correctly identifying the SLNs. Histological
examination of the lymph node specimens determines if the lymph node basin has
malignant melanoma cells.
2.3.3 Tumor depth to diagnose metastatic melanoma
An SLN biopsy for melanoma metastasis is illustrated with a recent study
by Rousseau et al. [13], where the records of 1376 melanoma patients were
reviewed. The main objective was to diagnose metastasis to the lymph nodes,
where the gold standard is the outcome of the SLN biopsy and the diagnosis is
made on the basis of tumor depth of the primary lesion, the Clark level of the
primary lesion, the age and gender of the patient, the presence of an ulcerated
primary lesion, and the site (axial or extremity) of the primary lesion. The
overall incidence of a positive biopsy was 16.9%, the median age was 51 years,
and 58% were male. A multivariate analysis with logistic regression showed
that tumor thickness and ulceration were highly significant in predicting SLN
status. For additional details about this study, refer to Rousseau et al., but
for the present the focus will be on tumor thickness for the diagnosis of lymph
node metastasis.
How accurate is tumor thickness for the diagnosis of lymph node
metastasis? The original measurement of tumor thickness was categorized into
four groups: (1) ≤1 mm, (2) 1.01–2.00 mm, (3) 2.01–4.00 mm, and (4) >4.00 mm.
If groups 3 and 4 are used to designate a positive (lymph node metastasis)
test, and groups 1 and 2 a negative test, the sensitivity and specificity are
calculated as 156/234 = 0.667 and 832/1147 = 0.725, respectively. There were
156 patients with a tumor thickness >2 mm among 234 patients with a positive
SLN biopsy; on the other hand, there were 832 patients with a tumor
thickness ≤2 mm among 1147 with a negative SLN biopsy. Also, using the
original continuous measurement and a conventional estimation method, the
area under the ROC curve is 0.767 with a standard deviation of 0.016. This is
the type of problem that will be studied in the following chapters, but from a
Bayesian perspective.
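The sensitivity and specificity quoted above can be reproduced directly from the counts. As a small preview of a Bayesian treatment, the sketch also reports posterior means under a uniform Beta(1, 1) prior; that prior is an assumption made here for illustration and is not necessarily the one adopted in later chapters.

```python
def proportion(successes, total):
    """Conventional estimate: observed fraction."""
    return successes / total

def beta_posterior_mean(successes, total, a=1.0, b=1.0):
    """Posterior mean of a binomial proportion under a Beta(a, b) prior.

    The default a = b = 1 (uniform prior) is an illustrative assumption.
    """
    return (successes + a) / (total + a + b)

# Counts from the tumor thickness example (thickness > 2 mm is "positive").
sensitivity = proportion(156, 234)    # positives among SLN-positive patients
specificity = proportion(832, 1147)   # negatives among SLN-negative patients

sens_post = beta_posterior_mean(156, 234)
spec_post = beta_posterior_mean(832, 1147)

print(round(sensitivity, 3), round(specificity, 3))
print(round(sens_post, 3), round(spec_post, 3))
```

With samples this large, the posterior means differ from the conventional estimates only in the third decimal place.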
2.3.4 Interventional radiology: A biopsy for non-small
cell lung cancer
At the MD Anderson Cancer Center (MDACC), the Department of Inter-
ventional Radiology is part of the Division of Diagnostic Imaging, and they
perform invasive biopsy procedures. For example, they perform biopsies of lung
lesions using a CT-guided technique; see Gupta et al. [14]. The Gupta example
described below compared two methods of biopsy, short vs. long needle path,
for target lesions ≤2 cm in size. The objective is to retrieve a specimen of the
lesion to be examined for malignancy by a cytopathologist.
Many people are involved, including those assisting the interventional radi-
ologist in guiding the needle to the target lesion, which was earlier detected and
located by various imaging modalities. Of main concern is the occurrence of a
pneumothorax, which can result in a collapsed lung and bleeding, sometimes
requiring a chest tube to drain fluid from the chest cavity.
This cohort study included 176 patients, 79 men and 97 women, with an
age range from 18 to 84 years. This was not a randomized study, and patient
information came from all persons who underwent a CT-guided biopsy for
lung nodules during the period from November 1, 2000 to December 31, 2002.
There were two groups: Group A with 48 patients, where the needle path was
<1 cm in length of aerated lung; and Group B with 128 patients, where the
needle path length was >1 cm.
The two groups were similar with regard to age, gender, lesion size, and
lesion location, and the major endpoints were diagnostic yield (number of diag-
nostic samples and test accuracy, measured by sensitivity and specificity) and
frequency of pneumothorax. The pathology report served as a gold standard
for test accuracy.
The statistical analysis consisted of estimating the test accuracy of the
two methods and comparing accuracy via the chi-square test. There was no
significant difference between the two groups with regard to sensitivity and
specificity; however, there were significant differences between the two with
regard to complications from the procedure. For example, the pneumothorax
rate of 35/48 = 0.73 for the short needle path group was larger than the rate
of 38/128 = 0.30 for the long needle path group.
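To see how a chi-square comparison of two proportions works, the sketch below applies the standard Pearson statistic for a 2 x 2 table (without continuity correction) to the pneumothorax counts quoted above. Applying the test to these particular counts is an illustration added here; the exact analysis in the cited study may differ.

```python
import math

def chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic (1 df, no continuity correction)
    for the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    numerator = n * (a * d - b * c) ** 2
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    return numerator / denominator

def p_value_1df(chi2):
    """Upper tail probability of a chi-square variable with 1 df."""
    return math.erfc(math.sqrt(chi2 / 2.0))

# Rows: short path (35 pneumothorax, 13 none), long path (38, 90).
chi2 = chi_square_2x2(35, 13, 38, 90)
p_value = p_value_1df(chi2)

print(round(chi2, 2), p_value < 0.001)
```

The statistic is far above 3.84, the 5% critical value for one degree of freedom, which agrees with the significant difference in complication rates reported above.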
As a follow up to this, Gupta et al. [15] recently studied 191 lung biopsy
patients who experienced a pneumothorax. In that study, the principal aim
was to identify those factors that significantly impact the development of a
persistent air leak of the lung.
A conventional statistical analysis was performed for these studies, but
later in this book, we will revisit them with a Bayesian approach for the
analysis.
2.3.5 Coronary artery disease
A common scenario in the diagnosis of coronary artery disease is as follows:
after complaints of chest pain, the patient undergoes an exercise stress test,
followed if necessary by an angiogram (a catheterization of the coronary
arteries). There are several experimental studies that involve a CT determination
of the CAC in the coronary arteries. One such study involved 1958 men and
1281 women, who were referred to the Shields Coronary Artery Center in
Spokane, Washington, from January 1990 to May 1998. Some of the subjects
had been diagnosed with coronary artery disease, while others were referred
because they were suspected of having the disease. Measurements of CAC
were made with the Imatron C-100 Ultrafast CT Scanner. In Chapter 4, the
diagnostic accuracy of CAC is examined with a Bayesian technique for this
study.
Another way to diagnose coronary artery disease is to measure the
degree of stenosis in the arteries by magnetic resonance angiography, where
Obuchowski [16] used the results of a study by Masaryk et al. [17] to illus-
trate a non-parametric way of estimating the area under the ROC curve for
clustered data. There were two readers and two measurements per patient,
one for the left and one for the right coronary arteries, and the correlation
introduced by this clustering effect was taken into account by Obuchowski’s
analysis.
2.3.6 Type 2 diabetes
There are several tests for type 2 diabetes, including a random plasma
glucose test, a fasting blood glucose test, and an oral glucose tolerance test.
The first does not require fasting and can be given at any time, even after a
meal. If the amount of glucose is ≥200 mg/dL, the subject is considered to be
diabetic.
A better method to test for type 2 diabetes is the fasting blood glucose test,
which requires the subject to fast for approximately 8 hours before the test.
The test is usually done in the morning before breakfast, where a blood glucose
level between 70 and 110 mg/dL is considered normal; however, a level between
111 and 125 mg/dL indicates some problems with glucose metabolism. Levels
in excess of 126 mg/dL are usually an indication that the subject has diabetes.
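The fasting blood glucose cutoffs just described can be written as a small decision rule. This is a sketch under stated assumptions: a reading of exactly 126 mg/dL is treated as indicating diabetes, any value above 110 and below 126 is treated as impaired, and readings below 70 are simply flagged as below the normal range, none of which the text specifies. The function name is hypothetical.

```python
def classify_fasting_glucose(mg_dl):
    """Classify a fasting blood glucose reading (mg/dL).

    Boundary handling (exactly 126, values between 110 and 111,
    values below 70) is an assumption made for illustration.
    """
    if mg_dl >= 126:
        return "diabetes indicated"
    if mg_dl > 110:
        return "impaired glucose metabolism"
    if mg_dl >= 70:
        return "normal"
    return "below normal range"

results = [classify_fasting_glucose(x) for x in (95, 118, 130)]
print(results)
```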
If the fasting blood glucose test indicates that the subject has the
disease, the doctor may order an oral glucose tolerance test, which requires
that the subject fast for 10 hours before the test. At baseline, a blood glucose
test is given; the subject is then given a large amount of sugar, and the blood
glucose level is measured 30 minutes, 1 hour, 2 hours, and 3 hours later;
thus, there are four measurements taken after baseline. In a person without
diabetes, the glucose level rises immediately after taking the sugar load, but
then falls back to “normal” as insulin is produced. On the other hand, in
diabetics the glucose levels rise higher than normal after drinking the sugar
load. A person whose 2-hour level is between 140 and 200 mg/dL is said to have
impaired glucose tolerance, which is referred to as prediabetes. A person with
a 2-hour level in excess of 200 mg/dL is considered to be diabetic and
should seek a physician’s advice in order to treat the disease.
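The 2-hour thresholds of the oral glucose tolerance test can be sketched in the same way. The treatment of the boundary values (exactly 140 counted as impaired, exactly 200 counted as impaired rather than diabetic) is an assumption, and the function name is hypothetical.

```python
def classify_two_hour_glucose(mg_dl):
    """Classify the 2-hour reading of an oral glucose tolerance test (mg/dL).

    Boundary handling at exactly 140 and 200 is an illustrative assumption.
    """
    if mg_dl > 200:
        return "diabetes"
    if mg_dl >= 140:
        return "impaired glucose tolerance (prediabetes)"
    return "normal"

two_hour_results = [classify_two_hour_glucose(x) for x in (120, 165, 230)]
print(two_hour_results)
```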
The fasting blood glucose test and the glucose tolerance test will be con-
sidered several times in later chapters as an illustration for estimating the
accuracy of medical tests.
2.3.7 Other medical tests
Johnson, Sandmire, and Klein [18] should be read for additional informa-
tion about the accuracy of medical tests. Of course, there are many other
medical tests that can be presented, but for the present, those for human
immunodeficiency virus (HIV), prostate cancer, and ovarian cancer will be described.
2.3.7.1 Tests for HIV
There are several tests for HIV, including enzyme linked immunosorbent
assay (ELISA) and oral tests.
The ELISA test is the most commonly used test to look for HIV antibodies;
if they are present, a confirmatory test called the Western blot analysis
is done. Once an antibody test shows that the subject has been exposed to
HIV, a plasma viral load (PVL) test is often ordered to measure the amount of
HIV in the blood. Three different PVL tests are commonly used: the reverse
transcription polymerase chain reaction (RT-PCR), the branched DNA (bDNA),
and the nucleic acid sequence-based amplification (NASBA) tests. All these
tests work well and measure the same thing, the amount of HIV in the blood,
but they can differ in the recorded amounts; thus, one test should be used
throughout the treatment for the disease. It is comforting to know that the
risk of a false positive with ELISA
is quite low. Note that several tests are administered in order to diagnose
the disease and their accuracy plays an important role both for diagnosis and
treatment.
If you are at a high risk for HIV and you have a negative ELISA, the test
should be repeated every 6 months. False negatives using RT-PCR are also
rare because of prior testing using ELISA.
2.3.7.2 Tests for ovarian cancer
The cancer antigen (CA) 125 blood test measures the levels of a protein
that is normally confined to the cell wall, but if the wall is inflamed or
damaged, the protein may be released into the blood stream. Ovarian cancer
cells may produce an excess of these protein molecules; thus, a test involving
CA 125 can help in the diagnosis and monitoring of the disease. It is important
to remember that basing the diagnosis of ovarian cancer only on CA 125 is
prone to error because elevated levels of CA 125 may not be present in the
early stages of the disease and false positives can occur. Used together with
transvaginal
ultrasound, CA 125 can be quite effective in detecting the disease. A transvagi-
nal ultrasound involves the use of sound waves to delineate internal structures
with a transducer placed in the vagina. An example is given later in the book
using CA 125 to detect ovarian cancer.
2.3.7.3 Prostate-specific antigen test for prostate cancer
Prostate-specific antigen (PSA), discovered in 1979, is a protein produced
by the cells that line the inside of the prostate gland. Prostate cancer
changes the cellular barriers that normally keep PSA within the ductal
system of the gland, and PSA is released into the blood stream in higher than
normal quantities. The total PSA test measures the total amount of PSA in
the blood, where the results are given in nanograms per milliliter (ng/mL)
and a level in excess of 4 ng/mL is considered a possible sign of prostate
cancer.
The total PSA test and the digital rectal examination are considered the
first line of defense against the disease and if suspicious findings are found
in either examination, follow-up tests are ordered, including the percent-free
PSA test and transrectal prostate ultrasound. Medical tests involving PSA
and transrectal ultrasound will be presented in later chapters. The percent-free
PSA test is mainly used as a follow-up test when the total PSA is found in the
gray area, between 4 and 9.9 ng/mL, to help determine who should undergo a
biopsy of the prostate. Currently, a biopsy is ordered if the percent-free PSA
level is ≤25%. The PSA test has problems with accuracy, where only 15–25%
of men who have elevated levels of total PSA in excess of 4 ng/mL develop
prostate cancer. Also, 30% of men who have prostate cancer have normal PSA
levels!
2.3.7.4 Parasitic infection with Strongyloides
Strongyloides is an infectious organism that affects certain groups and
is used as an example in this book where a gold standard is not available.
A group of Cambodian refugees immigrating to Canada is tested for the disease
with two medical tests, a serology test and a test based on a stool sample.
This example relies on prior information about the accuracy of the two tests,
and Bayesian inference is used to correct the observed accuracy of the two
tests.
2.3.7.5 Tuberculosis
Another case of infectious disease used in this book is a study of two tests
to diagnose tuberculosis at two different sites: the first is a southern school
district in the 1940s and the second is a tuberculosis sanatorium. The scenario
is a case when there is no gold standard, and Bayesian methods are used to
correct the accuracy of the two tests, namely, the Mantoux and Tine tests,
where the Mantoux test is based on a sputum sample, and the Tine is a
tuberculin skin test.
2.4 Activities Involved in Medical Testing
As stated earlier, medical tests are ubiquitous in the health care system.
These activities will be divided generally into two categories: (1) screening
for preclinical disease, such as breast cancer, heart disease, or lung cancer;
and (2) as part of patient management during the patient’s stay in a large,
modern health care facility. The emphasis in this book will be on the latter,
where the patient has been diagnosed with the help of imaging and is then
followed and monitored during their stay in the hospital. During the patient’s
stay, the following imaging activities are usually involved: primary diagnosis or
confirmation of earlier diagnoses, diagnostic imaging to determine the extent
of disease including biopsy procedures, so-called staging studies, and follow-up
medical procedures, such as surgery for biopsy or other forms of therapy, and
monitoring the progression of the disease during therapy, such as in Phase II
clinical trials.
Screening is performed to detect disease in the early phase, before symp-
toms appear. The main objective of screening is the early detection of disease
when treatment is more effective and less expensive. It is assumed that early
detection will lead to a more favorable prognosis, and that early treatment
will be more effective than treatment given after symptoms appear. Another
important goal of screening is to identify risk factors that would predispose
the subject to a higher than average risk of developing disease. Imaging is
almost always involved in the diagnosis of disease, but mammography is the
only examination in wide use today as a screening tool. There are some other
areas where screening is being tested, namely, in lung cancer with multide-
tector CT, and in the detection of colorectal adenomatous polyps. One of the
most important and difficult problems in clinical medicine is making recom-
mendations for imaging studies for disease screening.
Screening should only be performed if the disease is serious and in the pre-
clinical phase, and on a population that is at relatively high risk for developing
the disease. Screening would not be effective if the disease can be treated effec-
tively after the appearance of symptoms. If a false positive occurs, the patient
is subjected to unnecessary follow-up procedures, such as surgery, additional
imaging, and pathological testing for extent of disease.
A medical test like mammography is efficacious only if it is accurate, if it
has good diagnostic characteristics like high sensitivity, specificity, and posi-
tive predictive value, and if a survival advantage can be demonstrated. How
should a study be designed in order to evaluate the effectiveness of an imaging
screening procedure? Of course, randomized studies have an advantage and
are the basis for a recent paper by Shen et al. [19], who reported on the survival
advantage of screening detected cases over control groups. This investigation
used data from three randomized studies with a total of 65,170 patients, and
it used Cox regression techniques to control for the so-called lead time bias
(detection of early stage disease with screening), tumor size, stage of disease,
lymph node status, and age. They conclude that mammography screening is
indeed effective. For additional information on the advantages of mammogra-
phy, see Berry et al. [20]. For recent Bayesian contributions to the estimation
of sensitivity and lead time in mammography, see Wu, Rosner, and Broemeling
[21, 22].
The whole area of diagnostic screening has a voluminous literature. This
book will not focus on screening and the reader is referred to Shen et al. [19],
who cite the most relevant studies.
2.5 Accuracy and Agreement
How good is a diagnostic procedure? For example, if mammography is used
to diagnose breast cancer, how well does it correctly classify
patients who have disease and those who do not have disease? Among
those patients who have been classified with disease, what proportion actually
have it? And, among those who were designated without disease, how many
actually do not have it? To answer these questions, one must have a gold
standard by which the true status of disease is determined. Thus, the gold
standard will divide the patients into two groups: those with and those with-
out the disease.
Another question is how does the radiologist decide when to classify an
image as showing a malignant lesion? Often a confidence level scale is used,
where 1 designates definitely no malignancy, 2 probably no malignancy, 3 inde-
terminate, 4 probably a malignant lesion, and 5 definitely a malignant lesion.
Given this diagnostic ordinal scale, how does the reader decide when to
designate a patient as diseased? In the case of mammography, a score of 4
or 5 is often used to classify a patient as having the disease, in which case
each image can be classified as one of: (1) a true positive, (2) a true negative,
(3) a false positive, or (4) a false negative. Of course, these four possibilities
can only be used if one knows the true status of the disease as given by the
gold standard. Given these four outcomes, one may estimate the accuracy of
the procedure with the usual measures of sensitivity, specificity, and positive
and negative predictive values. For example, the specificity is estimated as the
proportion of patients who test negative among those who do not have the
disease. There are many statistical methods to estimate test accuracy and these
will be explained in detail in Chapter 4. The idea of the area under the ROC
curve will be explained and many examples introduced to demonstrate its use as an
overall measure of test accuracy.
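The dichotomization and the four measures described above can be sketched in a few lines of Python. This is only an illustration: the scores, the disease indicators, and the function names are hypothetical and are not taken from any study cited here. A confidence score of 4 or 5 is treated as a positive test.

```python
# Sketch: dichotomize 1-5 confidence scores at a cutoff and compute the four
# usual accuracy measures. All data below are hypothetical.

def dichotomize(scores, diseased, cutoff=4):
    """Count (tp, fn, fp, tn) when a score >= cutoff is called positive."""
    tp = fn = fp = tn = 0
    for score, has_disease in zip(scores, diseased):
        positive = score >= cutoff
        if has_disease and positive:
            tp += 1
        elif has_disease and not positive:
            fn += 1
        elif not has_disease and positive:
            fp += 1
        else:
            tn += 1
    return tp, fn, fp, tn


def accuracy_measures(tp, fn, fp, tn):
    """Sensitivity, specificity, and the positive and negative predictive values."""
    return {
        "sensitivity": tp / (tp + fn),  # positive tests among the diseased
        "specificity": tn / (tn + fp),  # negative tests among the non-diseased
        "ppv": tp / (tp + fp),          # diseased among the positive tests
        "npv": tn / (tn + fn),          # non-diseased among the negative tests
    }


# Hypothetical reader scores and gold standard status for ten images.
scores = [5, 4, 2, 5, 3, 1, 2, 4, 1, 3]
diseased = [True, True, True, True, False, False, False, False, False, False]

counts = dichotomize(scores, diseased)
measures = accuracy_measures(*counts)
print(counts, measures)
```

Changing the cutoff changes all four measures, which is the reason an overall summary such as the area under the ROC curve is useful.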
Other factors that need to be taken into account are: (1) the design of
the study, (2) the gold standard and how it is utilized, and (3) the variability
among and between observers and the input of others involved in diagnostic
decisions.
With regard to the design, several questions must be asked: How are the
patients selected? Is one group of patients selected at random from some pop-
ulation, or are two groups of patients, diseased and non diseased, selected? Or
are they selected from patient charts, such as in a retrospective review? Along
with this is the nature of the population from which the patients are selected.
Is it a screening population, a community clinic, or a group of patients under-
going biopsy? These factors all affect the final determination of the accuracy
as well as what biases will be introduced.
The gold standard often depends on surgery for biopsy, the pathology
report from the laboratory, and additional imaging procedures. When and
how the gold standard is used frequently depends on the results of the
diagnostic test. Often, only those who test positive for disease are subjected to the
gold standard, while those who test negative are not. For example, with
mammography those who test positive are tested further with biopsy and tests for
histology, while among those who test negative, follow-up of patient status is
the gold standard.
Lastly, with regard to reader variability, it is important to remember that
the medical test is an aid for the people who make the diagnosis, and that the
diagnosis is made by a group (e.g., cardiologists, oncologists, surgeons, radi-
ologists, and pathologists). All of this introduces variability and error into
the final determination of disease status. Is agreement between and among
observers (radiologists, pathologists, surgeons, etc.) an important component
of diagnostic medicine? Of course it is, for suppose a Phase II clinical trial
is being conducted to determine the efficacy of new treatment for advanced
prostate cancer with, say, 35 patients. The major endpoint is tumor response
to therapy, which is based on the change in tumor size from baseline to some
future time point. Often, the percentage change from baseline is used and,
furthermore, this determination depends on the readings of the same images
by several radiologists. Since they differ in regard to training and experience,
their determination of the percentage change varies from reader to reader.
How is this taken into account? How is a consensus reached?
Statistical methods that take into account and measure agreement are
well developed. For example, with ordinal test scores, agreement between
observers is often measured by the Kappa statistic, while if the test score
is continuous, regression techniques for calibration (e.g., Bland-Altman) are
frequently used to assess accuracy within and between observers. Analysis
of variance techniques that account for various sources of variability (patients,
readers, modalities, replications, etc.) help in estimating the between-reader
and within-reader variability via the intraclass correlation coefficient. In
Chapters 4 and 6, test accuracy and agreement between observers will be
discussed in detail. See Broemeling [23] for a Bayesian approach to the study of
agreement.
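As a minimal sketch of the Kappa statistic mentioned above, the following Python function computes Cohen's Kappa for two readers from a square table of counts. The table is hypothetical and the function name is an assumption made for illustration; this is the usual point estimate, not the Bayesian analysis developed later in the book.

```python
# Sketch: Cohen's Kappa for two readers from a k x k table of counts.
# Rows are reader A's categories, columns are reader B's. Counts are hypothetical.

def cohen_kappa(table):
    k = len(table)
    n = sum(sum(row) for row in table)
    # Observed agreement: proportion of cases on the diagonal.
    p_observed = sum(table[i][i] for i in range(k)) / n
    # Chance agreement: sum over categories of the product of the marginals.
    row_marginals = [sum(table[i]) / n for i in range(k)]
    col_marginals = [sum(table[i][j] for i in range(k)) / n for j in range(k)]
    p_chance = sum(row_marginals[i] * col_marginals[i] for i in range(k))
    return (p_observed - p_chance) / (1 - p_chance)


# Hypothetical 2 x 2 table: two radiologists rating 100 images as
# negative (first category) or positive (second category).
table = [[40, 10],
         [5, 45]]

kappa = cohen_kappa(table)
print(round(kappa, 3))
```

Kappa equals 1 when the readers agree on every case and 0 when the observed agreement is no better than that expected by chance.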
Kundel and Polansky [24] give a brief introduction to the various issues
concerning the measurement of agreement between observers in diagnostic
imaging, and Shoukri [25] has an excellent book on the subject.
2.6 Developmental Trials for Medical Devices
When developing a new imaging modality, the test must pass three phases
labeled I, II, and III. This is similar to the designation for patient clinical
trials, but what is being referred to here is the development of medical devices.
The different phases are for different objectives of test accuracy and are as
follows.
Phase I trials are exploratory and are usually retrospective with 10–50
patients and 2–3 readers. There are two populations: a homogeneous group of
diseased subjects who are definitely known to have the disease, and a second
homogeneous group of people who are definitely known not to have the disease.
The key word here is homogeneous: the manifestations of the disease
are more or less the same among diseased patients, while among the
non-diseased, health status is the same. The accuracy is measured by true
positive and false positive rates, as well as the area under the ROC curve.
If the accuracy is not good, the modality needs to be improved. See
Bogaert et al. [26] for a good example of a Phase I developmental trial involving
MRI angiography.
If a device has sufficient accuracy during Phase I, it is studied in a Phase II
trial, also called a challenge trial, with 50–200 cases and 5–10 observers.
Phase II trials are also retrospective, but with a wide spectrum of the disease in the
two groups. Thus, if the disease is, say, non-small cell lung cancer, patients
with different manifestations of disease (different ages, different stages of
disease, and patients who have disease similar to non-small cell lung cancer)
are included, which makes it more difficult for the device to distinguish between
diseased and non-diseased subjects. Among the non-diseased, the patients are
also heterogeneous. Test accuracy is measured as in a Phase I trial, and the
association between accuracy and the pathological, clinical, and co-morbid
features of the patient can be investigated with regression modeling. A
comparison between digital radiography and conventional chest imaging was
performed as a Phase II trial by Thaete et al. [27].
Beam, Lyde, and Sullivan [28] investigated the interpretation of screening
mammograms as a Phase III trial using 108 readers, 79 images read twice by
each reader, and many health care centers. The sensitivity ranged from 0.47
to 1 and specificity from 0.36 to 0.99 across the readers. Phase III trials are
prospective and are designed to estimate test performance in a well-defined
clinical population and involve at least 10 observers, several hundred cases, and
competing modalities. A device should pass all three phases before becoming
standard in a general clinical setting.
Note that it is important to know the interobserver variability in these
trials, because the accuracy of the modality depends not only on the device,
but also on the interpretation of the image by the various readers.
Pepe [29, Chapter 8] provides a more detailed description of developmental trials, and
Obuchowski [30] provides sample size tables for the number of observers and
the number of patients in trials for device development.
2.7 Literature
As mentioned earlier, biostatistics plays a pivotal role in the imaging liter-
ature, as can be discerned by reading papers in the mainline journals, such as
Academic Radiology, The American Journal of Roentgenology, and Radiology,
and the more specialized journals, such as The Journal of Computer Assisted
Tomography, The Journal of Magnetic Resonance Imaging, The Journal of
Nuclear Medicine, and Ultrasound in Medicine. For non-imaging studies, the
journal Pathology provides many examples of studies for medical test accuracy.
Among reference books in the area of general diagnostic imaging, the
standard one is Fundamentals of Diagnostic Radiology (second edition,
1999), edited by Brant and Helms [31]. It is written for radiologists
and gives the fundamentals of imaging principles plus a description of the
latest clinical applications. For some good general information for the patient,
Johnson, Sandmire, and Klein [18] describe medical tests for a large number
of diseases, including those for cancer, stroke, heart disease, diabetes, and
infectious diseases.
Two statistical books are relevant: The Statistical Evaluation of Medical
Tests for Classification and Prediction by Pepe [29], and Statistical Methods
in Diagnostic Medicine by Zhou, McClish, and Obuchowski [32]. Both are
excellent and are intended for biostatisticians.
References
[1] Mielke, C.H., Shields, J.P., and Broemeling, L.D. Coronary artery cal-
cium, coronary artery disease, and diabetes. Diabetes Research and Clin-
ical Practice, 53:55, 2001.
[2] Mielke, C.H., Shields, J.P., and Broemeling, L.D. Risk factors and coro-
nary artery disease for asymptomatic women using electron beam com-
puted tomography. Journal of Cardiovascular Risk, 8:81, 2001.
[3] DasGupta, N., Xie, P., Cheney, M.O., Broemeling, L., and Mielke, C.H.
The Spokane heart study: Weibull regression and coronary artery disease.
Communications in Statistics, 29:747, 2000.
[4] Broemeling, L.D. and Mielke, C.H. Coronary risk assessment in women.
The Lancet, 354:426, 1999.
[5] Wolbarst, A.B. Looking Within: How X-ray, CT, MRI, and Ultrasound
and Other Medical Images are Created and How They Help Physicians
Save Lives. University of California Press, Berkeley, 1999.
[6] Jawad, I.J. A Practical Guide to Echocardiography and Cardiac Doppler
Ultrasound (2nd ed.). Little, Brown & Co., Boston, 1996.
[7] Chandra, R. Nuclear Medicine Physics: The Basics. Williams & Wilkins,
Baltimore, MD, 1998.
[8] Seeram, E. Computed Tomography, Clinical Applications, and Quality
Control (2nd ed.). W.B. Saunders Company, Philadelphia, 2001.
[9] Markisz, J.A. and Aguilia, M. Technical Magnetic Imaging. Appleton &
Lange, Stamford, CT, 1996.
[10] Pawlik, T.M. and Gershenwald, J.E. Sentinel lymph node biopsy for
melanoma. Contemporary Surgery, 61(4):175, 2005.
[11] Morton, D.L., Wanek, L., Nizze, J.A., Elashoff, R.M., and Wong, J.H.
Improved long term survival lymphadenectomy of melanoma metastatic
to regional lymph nodes: Analysis of prognostic factors in 1134 patients
from the John Wayne Cancer Center Institute. Annals of Surgery,
214:491, 1991.
[12] Gershenwald, J.E., Tseng, C.H., Thompson, W., Mansfield, P., Lee, J.E.,
Bouvet, M., Lee, J.J., and Ross, M.I. Improved sentinel lymph node local-
ization in patients with primary melanoma with the use of radiolabeled
colloid. Surgery, 124:203, 1998.
[13] Rousseau, D.L., Ross, M.I., Johnson, M.M., Prieto, V.G., Lee, J.E.,
Mansfield, P.F., and Gershenwald, J.E. Revised American Joint Com-
mittee on Cancer staging criteria accurately predict sentinel lymph node
positivity in clinically node negative melanoma patients. Annals of Sur-
gical Oncology, 10(5):569, 2003.
[14] Gupta, S., Krishnamurth, S., Broemeling, L.D., Morello, F.A., Wallace,
M.J., Ahrar, K., Madoff, D.L., Murthy, R., and Hicks, M.E. Small
(≤2 cm) subpleural pulmonary lesions: short versus long needle path,
CT-guided biopsy: Comparison of diagnostic yields and complications.
Radiology, 234:631, 2005.
[15] Gupta, S., Kobayashi, S., Phongkitkarun, S., Broemeling, L.D., and Kun,
S. Effect of trans catheter hepatic arterial embolization on angiogenesis
in an animal model. Investigative Radiology, 41(6):516, 2006.
[16] Obuchowski, N.A. Non parametric analysis of clustered ROC curve data.
Biometrics, 53:567, 1997.
[17] Masaryk, A.M., Ross, J.S., DiCello, M.C., Modic, M.T., Paranandi, L.,
and Masaryk, T.J. Angiography of the carotid bifurcation: Potential and
limitations as a screening examination. Radiology, 121:337, 1991.
[18] Johnson, D., Sandmire, D., and Klein, D. Medical Tests that Can Save
Your Life: 21 Tests Your Doctor Won’t Order. . . Unless You Know to
Ask. Rodale, New York, 2004.
[19] Shen, Y., Inoue, L.Y.T., Munsell, M.F., Miller, A.B., and Berry, D.A.
Role of detection method in predicting breast cancer survival: analysis
of randomized screening trials. Journal of the National Cancer Institute,
97:1195, 2005.
[20] Berry, D.A., Cronin, K.A., and Plevritis, S.K. Effect of screening and
adjuvant therapy on mortality from breast cancer. The New England
Journal of Medicine, 353(17):1784, 2005.
[21] Wu, D., Rosner, G., and Broemeling, L.D. MLE and Bayesian inferences
of age-dependent sensitivity and transition probability in periodic screen-
ing. Biometrics, 61:1056, 2005.
[22] Wu, D., Rosner, G., and Broemeling, L.D. Bayesian inference for the lead
time in periodic cancer screening. Biometrics, 63:873, 2007.
[23] Broemeling, L.D. Bayesian Methods for Measures of Agreement.
Chapman & Hall/CRC, Boca Raton, 2010.
[24] Kundel, H.L. and Polansky, M. Measure of observer agreement. Radiol-
ogy, 228:303, 2003.
[25] Shoukri, M.M. Measures of Interobserver Agreement. Chapman & Hall/
CRC, Boca Raton, 2002.
[26] Bogaert, J., Kuzo, R., Dymarkowski, S., Becke, R., Piessens, J., and
Rademakers, F.E. Coronary artery imaging with real-time navigator
three dimensional turbo field echo MR coronary angiography: Initial
experience. Radiology, 226:707, 2003.
[27] Thaete, F.L., Fuhrman, C.R., Oliver, J.H., Britton, C.A., Campbell,
W.L., Feist, J.H., Staub, W.H., Davis, P.L., and Plunkett, M.B. Dig-
ital radiography and conventional imaging of the chest: a comparison
of observer performance. American Journal of Roentgenology, 162:575,
1994.
[28] Beam, C.A., Lyde, P.M., and Sullivan, D.C. Variability in the interpreta-
tion of screening mammograms by US radiologists. Archives of Internal
Medicine, 156:209, 1996.
[29] Pepe, M.S. The Statistical Evaluation of Medical Tests for Classification
and Prediction. Oxford University Press, Oxford, UK, 2003.
[30] Obuchowski, N.A. Sample size tables for receiver operating characteristic
studies. American Journal of Roentgenology, 175:603, 2000.
[31] Brant, W.E. and Helms, C.A. Fundamentals of Diagnostic Radiology
(2nd ed.). Lippincott Williams & Wilkins, New York, 1999.
[32] Zhou, X.H., McClish, D.K., and Obuchowski, N.A. Statistical Methods
in Diagnostic Medicine. John Wiley, New York, 2002.
Chapter 3
Preview of the Book
3.1 Introduction
This chapter should give the reader a good idea of what this book is about.
In one sentence, this book introduces the reader to the design and analysis
of medical test accuracy, with emphasis on a Bayesian analysis. The
approach is founded on Bayes' theorem, and all
inferences are expressed as posterior distributions of the relevant parameters.
WinBUGS is the software that will execute Bayesian inferences for medical
test accuracy and the associated code is labeled in the book and also appears
on the author’s blog. In what follows, I will carefully describe the contents of
each chapter, so that the reader will know what to expect.
3.2 Preliminary Information
The first three chapters present the preliminary information necessary for
the reader to understand the importance of knowing the accuracy of a medical
test.
3.2.1 Chapter 1: Introduction
The chapter begins with a short introduction previewing the chapter, fol-
lowed by a very brief introduction to the indicators of accuracy, including
the four basic measures: true positive fraction (TPF), false positive fraction
(FPF), positive predictive value, and negative predictive value. Such mea-
sures are applicable if the test scores are binary or if the scores have been
dichotomized with a cutoff value. The area under the receiver operating char-
acteristic (ROC) curve is described as a measure of overall accuracy for med-
ical tests that have ordinal or continuous scores. The next part of the chapter
explains the various datasets that are used for the examples. For example,
some of the datasets in the book by Pepe [1] will be used, as will some exam-
ples from the book by Zhou, McClish, and Obuchowski [2]. Also included for
analysis is information that the author obtained while consulting at the Uni-
versity of Texas MD Anderson Cancer Center (MDACC). The information
is quite valuable and contains many examples of imaging studies for cancer,
including studies involving x-ray, computed tomography (CT), magnetic reso-
nance imaging (MRI), nuclear medicine, and ultrasound. The various forms of
cancer include breast, prostate, lung, ovarian, etc., and will give the reader a
good idea of the important role played by the accuracy of a particular medical
test. Other sources used in the book are papers appearing in the Journal of
Radiology, with an emphasis on procedures that combine two or more tests.
The software employed in this book is WinBUGS and is most appropriate
for our purposes of expressing accuracy inferences via the posterior distribu-
tion of the appropriate parameter. Inference is expressed by computing the
posterior mean, median, standard deviation, and the 2.5th and 97.5th
percentiles of the posterior distribution. WinBUGS generates samples from
the posterior distribution, via Markov Chain Monte Carlo (MCMC), where the
simulation sample size can be adjusted by referring to the MCMC error. The
reader is expected to have some knowledge of Bayesian inference, but a brief
introduction is presented and some history from Bayes to the present day is
given.
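The kind of posterior summary described above can be imitated in Python for the simplest case, a single true positive fraction with a uniform prior. This sketch does not use WinBUGS or MCMC: because the beta prior is conjugate to the binomial likelihood, the posterior is a beta distribution and can be sampled directly. The counts, the prior, the seed, and the helper name `percentile` are all assumptions made for illustration.

```python
# Sketch: posterior summaries for a true positive fraction (TPF) using
# direct sampling from the conjugate beta posterior. This stands in for the
# MCMC output of WinBUGS; the counts below are hypothetical.
import random
import statistics

tp, fn = 80, 20          # hypothetical: 80 of 100 diseased patients test positive
n_draws = 45_000         # simulation sample size
rng = random.Random(1)   # fixed seed so the sketch is reproducible

# Uniform Beta(1, 1) prior + binomial likelihood -> Beta(1 + tp, 1 + fn) posterior.
draws = sorted(rng.betavariate(1 + tp, 1 + fn) for _ in range(n_draws))


def percentile(sorted_values, pct):
    """Simple empirical percentile of an already sorted list."""
    index = min(len(sorted_values) - 1, int(pct / 100 * len(sorted_values)))
    return sorted_values[index]


summary = {
    "mean": statistics.fmean(draws),
    "sd": statistics.stdev(draws),
    "median": percentile(draws, 50),
    "lower_2.5": percentile(draws, 2.5),
    "upper_97.5": percentile(draws, 97.5),
}
# Monte Carlo error of the posterior mean for independent draws.
mc_error = summary["sd"] / n_draws ** 0.5
print(summary, mc_error)
```

If the Monte Carlo error is too large relative to the precision required, the simulation sample size can be increased; with MCMC output the draws are correlated, so the error is typically larger than this independent-draw formula suggests.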
3.2.2 Chapter 2: Medical tests and preliminary information
Knowing the various medical tests used in health care is essential to
understanding the value of medical test accuracy, and this chapter gives brief
descriptions of several medical devices. First to be considered are the stan-
dard imaging tests found in the diagnostic radiology department of a modern
hospital and include descriptions of x-ray, CT, mammography, MRI, nuclear
medicine, and ultrasonography (US). Sometimes, more than one test is used to
give a better picture of the extent of the disease, e.g., MRI and CT to monitor
lung cancer patients. All the tests mentioned are used to diagnose and monitor
cancer patients; however, they are also used to diagnose and monitor heart
disease and other maladies. Next to be described are some specialized tests
for cancer, including nuclear medicine procedures for detecting metastasis of
melanoma from the primary tumor to the lymph nodes. Another diagnostic
test for melanoma metastasis uses the depth of the primary tumor.
Switching from cancer to other diseases, the use of CT for screening and
monitoring coronary heart disease is characterized. There are many tests for
diagnosing heart disease, including the exercise stress test, followed if neces-
sary by coronary angiography, but a promising CT test measures the amount
of calcium in the coronary arteries. The advantage of the CT test is that it is
safer than the stress test or coronary angiography; it is still in the experimental
stage, where its accuracy is being assessed. Type 2 diabetes is becoming more of
a problem and is diagnosed by the fasting blood glucose test and the blood
glucose tolerance test. Both these blood tests are explained in Chapter 2, and
will be used in a later chapter as a way to combine two tests to achieve better
accuracy. The remaining medical tests to be portrayed are the enzyme linked
immunosorbent assay (ELISA) test to detect antibodies for human immuno-
deficiency virus (HIV), the biomarker CA 125 test to detect ovarian cancer,
and the prostate-specific antigen (PSA) biomarker to diagnose prostate cancer.
The chapter continues by characterizing the interplay between agreement
and medical test accuracy. It is important to remember that several people
are sometimes involved in interpreting the output of a medical test. With the
aid of medical test(s), several health care workers use the medical test output
to give a diagnosis or to monitor the progress of the patient under treatment;
thus, agreement or disagreement between readers is present in the treatment
of the patient. Agreement among readers will be explicated in more detail in
Chapter 6, but is given a brief introduction in Chapter 2.
Developmental trials for medical devices, including medical tests, are
briefly explained at the end of the chapter. In this part of the book, the design
aspects of the subject are mentioned for the first time, where a promising med-
ical device is first examined with two different populations, a population with
the disease and the other without the disease. Under such conditions, the test,
if it has any accuracy, should be able to discriminate between the two popu-
lations. If the test passes the Phase I trial, it is subject to a more stringent
challenge involving many readers and institutions. This part of the chapter
describes in detail Phase I, II, and III studies for medical devices.
3.2.3 Chapter 3: Preview of the book
This chapter gives a preview of the book.
3.3 Fundamentals of Test Accuracy
Chapters 4 through 6 present the basics for understanding the measure-
ment of medical test accuracy, with Chapter 4 describing the four fundamental
indicators: the TPF and the FPF, and the positive and negative predictive
values. Chapter 5 is largely devoted to regression techniques for incorporat-
ing covariate information, while Chapter 6 stresses the study of agreement
between several readers who are interpreting the output of medical tests. How
does agreement or disagreement between readers affect the accuracy of a med-
ical test?
3.3.1 Chapter 4: Fundamentals of medical test accuracy
The chapter begins with an introduction to the design of a study to
measure the accuracy of a medical device by outlining the components that
are necessary for implementing the study, where the components of a good
design are listed as: objectives, background, patient and reader selection, study
design, number of patients, statistical design and analysis, and, lastly, a sec-
tion for the reference of the study.
Next, a description of the four fundamental indicators of test accuracy for
binary test scores is given, where the basic theory is presented, followed by a
WinBUGS program that illustrates the estimation of test accuracy. The first two
indicators are the so-called classification probabilities, namely, the true and
false positive fractions. The other two are the positive and negative predictive
values, which are of interest to the patient. The four indicators are estimated
in an example using the exercise stress test to diagnose coronary artery
disease. The Bayesian analysis is executed with BUGS CODE 4.1 using 45,000
observations, a burn in of 5,000 and a refresh of 100, and the results consist
of the posterior characteristics for the four indicators and a graph of their
posterior densities. The Bayesian approach is continued by defining the area
under the ROC curve and illustrated with an example of mammography, where
the test scores are ordinal: 1 indicating positively no evidence of malignancy;
2 indicating there is very little evidence of malignancy; 3 implying an ambigu-
ous situation for scoring the lesion malignant; 4 indicating some evidence of
malignancy; and 5 indicating that the lesion is definitely malignant. There are
30 patients with the disease and 30 without, and the analysis is executed with
BUGS CODE 4.2. Remember, the code is listed in the book and on the
author’s blog and is easily accessible to the reader. The ROC area is also illus-
trated with the Shields Heart Study, which uses CT to measure the extent of
coronary artery disease. The Bayesian methods for ordinal scores are devel-
oped by the author and appear to be unique.
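For ordinal scores, the area under the ROC curve can be read as the probability that a randomly chosen diseased patient scores higher than a randomly chosen non-diseased patient, with ties counted as one half. The Python sketch below computes this nonparametric estimate from counts per score category. It is not the Bayesian method of BUGS CODE 4.2, and the counts (30 patients per group, scores 1 to 5) are hypothetical.

```python
# Sketch: nonparametric ROC area for ordinal scores,
#   area = P(Y > X) + 0.5 * P(Y = X),
# where Y is a diseased patient's score and X a non-diseased patient's score.
# The counts per score category (1 to 5) are hypothetical.

def roc_area(diseased_counts, healthy_counts):
    n_diseased = sum(diseased_counts)
    n_healthy = sum(healthy_counts)
    greater = 0  # pairs where the diseased score exceeds the non-diseased score
    ties = 0     # pairs with equal scores
    for i, d in enumerate(diseased_counts):
        for j, h in enumerate(healthy_counts):
            if i > j:
                greater += d * h
            elif i == j:
                ties += d * h
    return (greater + 0.5 * ties) / (n_diseased * n_healthy)


# Number of patients with score 1, 2, 3, 4, 5 in each group (30 per group).
diseased_counts = [1, 2, 4, 8, 15]
healthy_counts = [14, 9, 4, 2, 1]

area = roc_area(diseased_counts, healthy_counts)
print(round(area, 4))
```

An area of 0.5 corresponds to a test that cannot discriminate between the two groups, while an area of 1 corresponds to perfect separation of the scores.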
The chapter continues with an interesting generalization of the ROC area
when the scores are ordinal, and portrays the case when the scores are clus-
tered, which is the case for mammography, that is, the image is partitioned into
several regions and the radiologist assigns a score from 1 to 5 to each region of
the mammogram. In this scenario, one would expect the scores to be correlated
and the chapter presents the theory and illustrates the idea with an example
taken from Zhou, McClish, and Obuchowski [2: 134] involving mammography,
where the Bayesian analysis is executed with BUGS CODE 4.4.
With ordinal scores, the subject is expanded to include a comparison
between the accuracies of two medical tests to diagnose the same disease,
and is illustrated with CT and MRI to detect lung cancer. The two tests are
compared based on their ROC areas, and the Bayesian analysis is executed
with BUGS CODE 4.6; it is noted that the design is paired in that both tests
are administered to the same patients. The chapter concludes by estimating
accuracy with ROC areas for tests with continuous scores and comparing two
tests via their ROC areas.
3.3.2 Chapter 5: Regression and medical test accuracy
This chapter deals with patient covariate information that can be accounted
for in the estimation of test accuracy. Regression techniques for ordinal and
continuous test scores are considered, and the chapter begins with an example
from audiology, where the subject’s covariate information is accounted for in
estimating the true and false positive fractions. The example, taken from
Pepe [1], concerns a test that is supposed to
detect impaired hearing; the dependent variable is
the false positive rate and the covariates are the age of the patient, the version
of the test, and the location where the test is given. This example is analyzed
using two link functions, a log link and a logistic link,
with the analysis based on BUGS CODE 5.1 for the log
link and BUGS CODE 5.2 for the logistic link. This example is continued by estimating
the positive diagnostic likelihood ratios using the same patient covariates. Next
to be considered is using patient covariates to estimate the area under the ROC
curve with an ordinal regression model formulated by Congdon [3: 108] and
illustrated with an example of a clinical trial that measures tumor response
to two therapies. The example is from Holtbrugge and Schumacher [4] and is
executed with BUGS CODE 5.4. Another example using ordinal regression
is a staging study for metastasis of melanoma and involves four radiologists
who all see the same information on the same patients.
When the test scores are continuous and normally distributed, the Bayesian
regression approach of O’Malley et al. [5] allows covariate information for
estimating the ROC area and is illustrated with an example from Pepe [1],
involving screening for prostate cancer, where the test scores are the total PSA
values. BUGS CODE 5.6 is executed to produce a posterior analysis where
the patient covariate is age, resulting in a posterior mean of 0.80 for the area.
The chapter concludes with another example with continuous scores using
yet another audiology example. There are 17 exercises that give the student
additional valuable information about the Bayesian analysis that estimates
test accuracy with the aid of regression models for ordinal and continuous
observations. When studying the exercises, remember to download the code
and data from the blog: http://medtestacc.blogspot.com.
3.3.3 Chapter 6: Agreement and test accuracy
Several readers are usually involved in interpreting the results of a medi-
cal test, and this chapter emphasizes how they affect the overall accuracy of
the test. Recall the melanoma example of Chapter 5, where four readers were
scoring the degree of metastasis of the disease. Since the readers are viewing
the same images, one would expect the reader scores to be correlated and
their results to be similar; however, some readers may have more experience
than others, a factor that introduces additional variability to the determina-
tion of test accuracy. The ROC area estimates the accuracy of the test, one
for each reader, but which areas do we use? All four are reported, but should
one employ some type of summary of the four areas?
The first case to be considered is the melanoma metastasis example of
Chapter 5, and the four ROC areas are estimated with a Bayesian approach
using BUGS CODE 6.1 with 65,000 observations generated from the poste-
rior distribution. It turns out that the four estimated areas varied from a low
of 0.64, estimated by reader 3, to a high of 0.80 for reader 2. In contrast,
a second example involving the blood glucose test for type 2 diabetes with
three readers revealed very little difference in the posterior means of the ROC
areas. The latter case involves a continuous score and the O’Malley et al. [5]
method of estimating the ROC curve, and is continued by expanding the anal-
ysis to include patient age and gender as covariates. A Bayesian analysis based
on BUGS CODE 6.2 estimates a summary ROC area with a weighted mean,
where the posterior mean area of each reader is weighted by the inverse of the
posterior variance. The unweighted mean is also computed as 0.8162(0.0130),
which compares to the weighted mean of 0.991(0.0022).
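The inverse-variance weighting described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, the reader means and standard deviations are invented, and the sketch combines posterior summaries after the fact, whereas the BUGS code in the book may form the weighted mean inside the simulation itself.

```python
def inverse_variance_weighted_mean(means, sds):
    """Combine per-reader posterior means, weighting each by 1 / variance."""
    if len(means) != len(sds) or not means:
        raise ValueError("means and sds must be non-empty and of equal length")
    weights = [1.0 / (sd * sd) for sd in sds]
    total = sum(weights)
    return sum(w * m for w, m in zip(weights, means)) / total


def unweighted_mean(means):
    """Simple average of the per-reader posterior means."""
    return sum(means) / len(means)


# Hypothetical posterior means and standard deviations of the ROC area
# for three readers (illustrative values only, not taken from the book).
reader_means = [0.80, 0.82, 0.85]
reader_sds = [0.04, 0.02, 0.01]

weighted = inverse_variance_weighted_mean(reader_means, reader_sds)
simple = unweighted_mean(reader_means)
print(f"weighted mean   = {weighted:.4f}")
print(f"unweighted mean = {simple:.4f}")
```

A reader whose ROC area is estimated precisely (small posterior standard deviation) pulls the weighted summary toward its own value, which is why the weighted and unweighted means can differ noticeably.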
A gold standard is present for the above scenarios. The chapter continues
by considering the case when no gold standard is available and introduces
the standard approach to estimating the agreement between the readers. Of
course, if the gold standard is present, the readers can be compared on the
basis of the ROC areas, but when the gold standard is not available, how
should agreement be estimated?
There is a long history of statistical agreement based mostly on the Kappa
coefficient, and that approach will be taken for the remainder of Chapter 6.
The Kappa coefficient is defined, and the Bayesian approach to the index is
described and illustrated for nominal scores with an example
from Von Eye and Mun [6: 12]. The example consists of a 3 × 3 table with two
psychiatrists, who are assigning scores that express the degree of depression in
each of 129 patients, where the three scores are defined as: 1 = not depressed,
2 = mildly depressed, and 3 = clinically depressed. The Bayesian analysis is
run with BUGS CODE 6.3, using 25,000 observations for the simulation; the code
is available on the author’s blog: http://medtestacc.blogspot.com. Also reported
is the density of the posterior distribution of conditional Kappa.
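A Bayesian analysis of Kappa for a square agreement table can be sketched as follows. The sketch assumes a uniform Dirichlet prior on the cell probabilities, which may differ from the prior used in BUGS CODE 6.3, and it uses only the Python standard library. The counts are invented (they merely sum to 129, the sample size mentioned above), and all function names are hypothetical.

```python
import random
import statistics


def kappa(p):
    """Cohen's Kappa for a square table of joint cell probabilities."""
    k = len(p)
    row = [sum(p[i]) for i in range(k)]
    col = [sum(p[i][j] for i in range(k)) for j in range(k)]
    observed = sum(p[i][i] for i in range(k))          # probability of agreement
    expected = sum(row[i] * col[i] for i in range(k))  # chance agreement
    return (observed - expected) / (1.0 - expected)


def sample_dirichlet(alphas, rng):
    """One Dirichlet draw, built from independent gamma variates."""
    g = [rng.gammavariate(a, 1.0) for a in alphas]
    total = sum(g)
    return [x / total for x in g]


def posterior_kappa(counts, prior=1.0, draws=5000, seed=1):
    """Posterior draws of Kappa under a Dirichlet(prior) prior on the cells."""
    rng = random.Random(seed)
    k = len(counts)
    alphas = [c + prior for r in counts for c in r]
    out = []
    for _ in range(draws):
        flat = sample_dirichlet(alphas, rng)
        table = [flat[i * k:(i + 1) * k] for i in range(k)]
        out.append(kappa(table))
    return out


# Hypothetical 3 x 3 counts (rows: rater 1, columns: rater 2), total 129.
counts = [[30, 5, 2],
          [6, 25, 8],
          [1, 7, 45]]

kappa_draws = posterior_kappa(counts)
print(f"posterior mean of Kappa = {statistics.mean(kappa_draws):.4f}")
print(f"posterior sd of Kappa   = {statistics.stdev(kappa_draws):.4f}")
```

Because the multinomial likelihood and the Dirichlet prior are conjugate, the posterior can be sampled directly and no Markov chain is needed in this simple case.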
Various generalizations of Kappa are presented in the remainder of
Chapter 6, including Kappa and stratification where a hypothetical exam-
ple portrays the essential components for estimating agreement. Suppose that
the agreement between x-ray and CT is estimated, where the study is con-
ducted at three different sites and calls for a total enrollment of 2500 subjects,
with 1000 each at two sites and 500 patients at a third site. Our objective is
to estimate the overall agreement between the two devices, using a weighted
Kappa where the weights are the inverse of the posterior variance of Kappa
for a particular site. In this case, there is good agreement at each site;
consequently, the weighted Kappa is very close to the simple average of the
posterior means of the three Kappas.
Chapter 6 continues with various generalizations of Kappa, including an
explanation of the Bayesian analysis for the so-called intraclass Kappa. The
situation is similar to that of a one-way layout with c groups and an unequal
number of binary observations in the various groups; observations between
different groups are assumed to be independent. A crucial assumption is that
each binary observation has the same probability of being “1.” The Bayesian
theory is described to estimate the intraclass Kappa, which is the common
correlation between the binary observations in the same group. The intraclass
correlation is estimated for an interesting example of three groups, where the
“subjects” in a group are rabbit fetuses, and each fetus responds or does
not respond to a treatment. BUGS CODE 6.3 is executed in order to estimate
intraclass Kappa with a posterior mean of 0.0907(0.1063) and a posterior mean
of 0.2262(0.0403) for the common probability of a response. In this case, Kappa
estimates the common correlation between the binary responses of the fetuses
in the same group, which is similar to the case of the usual one-way random
model with normally distributed observations.
Other measures of agreement are introduced, including the G coefficient
and the Jaccard index, both of which have the value “1” when there is perfect
agreement between two binary scores, but the Kappa coefficient remains the
index preferred by researchers in the social and medical sciences.
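For a 2 × 2 table of two binary scores, the three indices can be compared with a short Python sketch. The formulas below are the usual textbook definitions and may differ in notation from those in the chapter; the function name and the counts are hypothetical.

```python
def agreement_indices(a, b, c, d):
    """Agreement indices for a 2 x 2 table of two binary scores.

    a: both raters score 1
    b: rater 1 scores 1, rater 2 scores 0
    c: rater 1 scores 0, rater 2 scores 1
    d: both raters score 0
    Assumes a + b + c > 0 and that chance agreement is below 1.
    """
    n = a + b + c + d
    g = ((a + d) - (b + c)) / n    # G coefficient: agreements minus disagreements
    jaccard = a / (a + b + c)      # Jaccard index: ignores the joint-0 cell d
    observed = (a + d) / n
    expected = ((a + b) * (a + c) + (c + d) * (b + d)) / (n * n)
    kappa = (observed - expected) / (1 - expected)
    return {"G": g, "Jaccard": jaccard, "Kappa": kappa}


# Hypothetical counts: perfect agreement, then a table with some disagreement.
perfect = agreement_indices(40, 0, 0, 60)
partial = agreement_indices(20, 5, 10, 65)
print(perfect)
print(partial)
```

With no off-diagonal counts all three indices equal 1, while the second table shows that they can rank the same data quite differently once disagreements appear.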
There is a well-known relationship between the Kappa coefficient and
the sensitivity and specificity of two readers assigning scores. Kraemer [7]
expressed Kappa in terms of the specificity and sensitivity of the two readers
and showed the dependence of Kappa on the disease incidence. Ironically,
Kappa can be small even when the specificity and sensitivity are high,
provided the disease incidence is low. A similar situation occurs in diagnostic
testing when the disease rate is small: the positive predictive value can be
small even though the sensitivity and specificity are high.
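Both effects can be illustrated numerically. The sketch below is not Kraemer's derivation: it assumes that the two readers err independently given the true disease status, treats the disease rate as a single proportion called `prevalence`, and uses invented values of 0.95 for every sensitivity and specificity. The function names are hypothetical.

```python
def ppv(prevalence, sensitivity, specificity):
    """Positive predictive value from Bayes' theorem."""
    true_pos = prevalence * sensitivity
    false_pos = (1 - prevalence) * (1 - specificity)
    return true_pos / (true_pos + false_pos)


def kappa_two_readers(prevalence, se1, sp1, se2, sp2):
    """Kappa for two readers whose errors are independent given disease status."""
    both_pos = prevalence * se1 * se2 + (1 - prevalence) * (1 - sp1) * (1 - sp2)
    both_neg = prevalence * (1 - se1) * (1 - se2) + (1 - prevalence) * sp1 * sp2
    pos1 = prevalence * se1 + (1 - prevalence) * (1 - sp1)  # P(reader 1 positive)
    pos2 = prevalence * se2 + (1 - prevalence) * (1 - sp2)  # P(reader 2 positive)
    observed = both_pos + both_neg
    expected = pos1 * pos2 + (1 - pos1) * (1 - pos2)
    return (observed - expected) / (1 - expected)


# Hypothetical accuracy of 0.95 throughout, at three disease rates.
for prev in (0.01, 0.10, 0.50):
    k = kappa_two_readers(prev, 0.95, 0.95, 0.95, 0.95)
    v = ppv(prev, 0.95, 0.95)
    print(f"prevalence={prev:.2f}  kappa={k:.3f}  PPV={v:.3f}")
```

Under these assumptions both Kappa and the positive predictive value fall sharply as the disease rate drops, even though the readers' sensitivity and specificity are unchanged.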
Chapter 6 continues with a discussion of consensus between readers with
an example applicable to Phase II clinical trials. In such studies, two or more
radiologists grade the response of each patient, and at the end of the trial must
come to a conclusion about the success or failure of the trial.
Agreement is then generalized to more than two raters with binary scores.
The computation of Kappa is demonstrated with an example of four students who
assign scores to each image, where the analysis is executed with BUGS CODE 6.8
using 25,000 observations for the simulation. When there are more than two
raters, one can consider several ways to measure partial agreement. For
example, when six raters are assigning binary scores, one can consider the
agreement between, say, exactly two of the six. This is accomplished by
defining a Kappa coefficient and illustrating the idea with an example of six
pathologists who assign a 0 or 1 to indicate whether or not a certain lesion
is present in the image. The Bayesian analysis is done
with BUGS CODE 6.9 and estimates Kappa with a posterior mean of
0.6382(0.0598). Various other scenarios for partial agreement are discussed and
a relevant Kappa defined and further illustrated with real-life examples. The
last generalization for agreement is to define a Kappa when there are many
raters and ordinal scores. Twenty exercises reinforce the Bayesian analysis for
agreement presented in the chapter and are essential for a complete under-
standing of the subject.
“Drew,” Johnny said, turning to his sturdy young friend,
“I came here the moment I reached the city. How come
the place was locked up and dark?”
“Been on a vacation; just got back.” Drew’s face lighted.
“Went to the Rockies. Had some wonderful hunting—
grizzly bears. Can’t say that’s more exciting than
hunting crooks, though,” he laughed.
“Met a girl you’d like on the way back.” Drew Lane
turned to Joyce. “Came on the bus. People in a bus,
traveling far, get to be like one big family. Funny part
was—” He gave a low chuckle. “She’s coming here to
help her uncle. He has a store on Maxwell Street.
Maxwell Street! Can you imagine?”
“Rags, scrap-iron, poultry in crates, fish smells and
noise—that’s what Maxwell Street means to me!” Joyce
shuddered.
“Just that!” Drew agreed. “This truly nice girl from
somewhere in Kansas is going there to help in her
uncle’s store. She doesn’t know a thing about Chicago.
Thinks Maxwell Street is all the same as State Street,
I’m sure. Believes her uncle’s store is anyway six stories
high. Well, she’s in for a terrible shock. I feel sorry for
her. Have to get round and see her—gave me the
address. She asked me what I did in Chicago.” Drew
chuckled once more.
“What did you tell her?” Joyce asked.
“Said I looked after people, lots of them.”
“And for once you told the truth,” Johnny laughed.
“But Johnny!” Joyce exclaimed. “Tell me some more
about this ‘House of Magic’ you’ve discovered. Sounds
frightfully interesting. We all thought you were a little
delirious when you first talked of it. But now—”
“Now you begin to believe me.” Johnny’s eyes shone.
“It’s a truly wonderful place.”
“Tell us about it,” Captain Burns insisted from his corner.
“Heard about some of these things before. Shouldn’t
wonder if they’d do things in the end to lift the load off
us poor, over-worked detectives.”
“I’ll tell you all I know, which isn’t much,” Johnny
agreed.
And here I think we may safely leave our friends for a
little time while we look in upon Grace Krowl, the girl
from somewhere in Kansas. She had found her uncle’s
store on Maxwell Street. And how she had found it!
CHAPTER VI
A STORE IN CHICAGO
A slender mite of a girl, barely past her eighteenth
birthday, Grace Krowl was possessed of an indomitable
spirit and a will of her own; else she would not have
been walking down Maxwell Street in Chicago, hundreds
of miles from her home in Kansas.
The look in her eyes as she marched down that street
where all manner of junk and rags are mingled with
much that, after all, is pleasant and desirable, was one
of utter surprise.
“A store,” she murmured, more than once, “a store in
Chicago. And Maxwell Street. I am sure I can’t be
wrong. And yet—”
Arrived at the street number written on a slip of paper
in her hand, she stood staring at the narrow, two-story
building with its blank windows and unpainted walls for
a full moment. Then, a spirit of desperation seizing her,
she sprang up the low steps, grasped the doorknob,
then stepped resolutely inside.
Once inside, she stood quite still. Never in any place
had she witnessed such confusion. What place could
this be? Her mind was in a whirl. Then, like a flash, her
eyes fell upon an object that threw her into action. With
a startled cry, she sprang at a group of women.
She snatched a tortoise shell comb from a huge black
woman’s hand just as she was about to try it in her
kinky hair. She dragged a pink kimono from beneath a
tall, slim woman’s arm and, diving all but headforemost,
gathered in a whole armful of garments that an
astonished little lady had been hugging tight.
By this time the battle turned. She found herself at the
center of a concerted attack. The black woman banged
at her with a picture frame, the tall, thin one jabbed her
with sharp elbows and the little lady made a grab at her
hair.
“Ladies! Ladies!” came in a protesting man’s voice.
“Why must you fight in my store?”
“Fight? Who wants to fight!” the tall woman screamed.
“Here we are peaceful folks looking over the goods in
your store, and here comes this one!” She pointed an
accusing finger at Grace. “She comes in grabbing and
snatching, that’s what she does!”
“Store! Goods!” Grace’s head was in a whirl. How could
they call this a store? It was a place where people
robbed strangers,—stole their trunks and rifled them.
Surely there could be no mistaking that. Were not the
trunks open there before her, a half dozen or more of
them? And was not her own modest steamer trunk
among them? Had she not caught them going through
her trunk? Were not the articles in her arms, the
tortoise shell comb, the kimono and those other
garments her very own? Goods? Store? What could it all
mean? Her head was dizzy.
“A store,” she whispered to herself, “my uncle’s store in
Chicago. He gave me this address. He must be in the
business of stealing trunks and selling their contents!”
She felt, of a sudden, all hollow inside, and dropping
like an empty sack, half sat upon a partially emptied
trunk.
“Miss! Why do you do this?” The bearded man who now
spoke was almost apologetic in his approach. “Why do
you do this in my store? Many years I, Nicholas Fischer,
have sold goods here and never before have I seen
such as this!”
“Nich—Nicholas Fischer!” The girl’s eyes widened. “Then
you are Nicholas Fischer. And this is your store?
STORE!” she fairly screamed.
She wanted to rise and flee, but she was half stuck in
the trunk and her wobbly legs would not lift her out, so
she said shakily:
“I did it be—because that’s my trunk. I—I am Grace
Krowl, your niece who came from Camden Center,
Kansas, to help you keep your store. But I won’t, I
won’t stay a moment. I’ll never, never, never help a
thief!”
“You?” The bearded man’s face was a study. Surprise,
mortification registered themselves on his face. “Grace
Krowl, my niece,” he murmured. “Her trunk! It is her
trunk! A thief it is she says I am—I, Nicholas Fischer,
who never stole a penny! Tell me, what is all this?” He
stared from face to face as if expecting an answer. But
no answer came.
And then a slow smile overspread his face. “Now I begin
to understand,” he murmured. “It is all a mistake, a
terrible mistake!
“Ladies,” he said, turning pleading eyes on the group of
customers, “will you please put back into that little trunk
everything you have taken out? And if any have paid for
a thing, I will repay. It is my niece’s trunk. It is one
terrible mistake.” He began rocking backwards and
forwards like one in great pain.
“A thief, she said,” he murmured. “But who would not
have thought it?” His eyes took in the half-empty trunks
all about him, then he murmured again, “Who would
not have thought it?”
Four hours later, just after darkness had fallen, this
same girl, Grace Krowl, found herself walking the most
unusual street in America, Maxwell Street in Chicago.
She found it interesting, amusing, sometimes a little
startling, and always unspeakably sad, this place where
a strange sort of bedlam reigns.
Here, as she passed along, fat Jewish women held up
flimsy silk stockings to her view, screaming, “Buy, Miss,
buy now! The price goes up! Cheap! Cheap!” Here a
man seized her rudely by the shoulder, turned her half
around and all but shoved her into a narrow shop,
where gaudy dresses were displayed. This made her
angry. She wanted to fight.
“I fight?” She laughed softly to herself. “I, who have
always lived in Camden Center! A sort of madness
comes over one in such a place as this, I guess.”
Recalling her fight earlier in the day, her cheeks
crimsoned, and she hurried on.
“What a jumble!” she exclaimed aloud as she turned her
attention once more to Maxwell Street. “Shoes, scissors,
radios, geese, cabbages, rags and more rags, rusty
hardware, musical instruments. Where does it all come
from, and who will buy it?”
She paused to look at a crate of cute white puppies with
pink noses. They, too, were for sale. Then, of a sudden,
her face clouded.
“Can I do it?” she muttered. “Can I? I—I must! But
other people’s things? So often the little treasures they
prized! How can I?”
That she might remove her thoughts from a painful
subject, she forced her eyes to take in her present
surroundings. Then, with a little cry, she sprang
forward. “Books! ‘Everything in books.’” She read the
sign aloud. She disappeared through a dingy door into a
room which was brightly lighted. The lights and the face
that greeted her changed all. The madly fantastic world
was, for the moment, quite shut out. She was at home
with many books and with a girl whose face shone, she
told herself, “like the sun.”
“A book?” this sales girl smiled. “Something
entertaining? A novel, perhaps. Oh no, I don’t think
you’d like ‘Portrait of a Man with Red Hair.’ It’s really
rather terrible. One of the chief characters is a mad man
who loves torturing people.” The girl shuddered.
“But this now—” She took up a well-thumbed volume.
“‘A Lantern in Her Hand.’ It is truly lovely—the story of
brave and simple people. I’m afraid we’re neither very
brave nor very simple these days. Do you feel that we
are?”
“She really is able to think clearly,” Grace whispered to
herself. “I am sure I am going to like her.”
“I’ll take one, that one,” she said putting out her hand
for the book. And then, because she was alone in a
great city, because she was bursting to confide in
someone, she said, “He buys trunks, trunks full of other
people’s things. He takes the things out and sells them,
other people’s things. They packed them away with
such care, and now—now he takes them out, throws
them about and sells them!”
“Who does?” The girl’s eyes opened wide.
“My uncle, Nicholas Fischer.”
“Oh, Nicholas Fischer.” The girl’s voice dropped. “But he
is the kindest man! Comes here with books. He sells
them to Mr. Morrow who owns this store—secondhand
books. Perhaps they come from the trunks. And Mr.
Morrow says he helps poor people, your uncle does, and
he doesn’t let anyone know who it is.”
“But he buys trunks, other people’s trunks, and sells
them!” Grace insisted.
“Yes, buys them at auction, I guess. Several people on
this street do that. Express auctions, railway auctions,
storage house auctions and all that. And you are to help
him open them up!” she exclaimed quite suddenly. “You
are to explore them? How I envy you!”
“Envy?” Grace stared in unbelief.
“But why not? Think of the things you may find.
Diamonds perhaps; stocks and bonds; rare old coins
and rarer old books; ancient silver plate. Just think of
the things people pack away in their trunks! Letters;
diaries; quaint old pictures. It—why it’s like a trip
around the world!”
“But it—it seems so unfair,” Grace wavered.
“You’re not the one that’s being unfair,” the bright-eyed
one reasoned. “Those people can’t have their things in
those trunks. Perhaps they are dead. In some cases
they lost their trunks because they were too poor to pay
storage or express charges. You can’t well help that. So
why think about it?”
Grace Krowl was to think about it many times and in the
end to do something about it. That something was to
draw her into a great deal of trouble. For the moment
she left the little secondhand bookshop soothed,
comforted, and filled with a desire to call again.
“No doubt you think Maxwell Street a terrible place,” the
smiling girl said as she walked with her to the door,
“and that your uncle’s store is the worst on the street.
But I could tell you—” A shadow fell across her face. “I
could tell you things about grand stores on a very grand
street in this city of ours. Per—perhaps I will sometime.”
Grace was startled as she looked into her face. It had
suddenly become gray and old.
“How strange,” she murmured as, dodging a pushcart
laden with geese, she hurried away toward Nicholas
Fischer’s place on Maxwell Street. “How strange. And
how—how sort of terrible. And yet—”
The words of a great man came to her. “No situation in
life is ever so bad but that it might be worse.”
* * * * * * * *
“What,” you may be asking by this time, “have the
adventures of a girl from Kansas to do with Johnny
Thompson and his friends?” The answer is: “A great
deal.” In the first place, Drew Lane, having discovered
this little lady while traveling in a bus, was not the sort
to desert her in her plight. In the second place, an
invisible finger of light moving across the sky was
destined to join the fates of Johnny Thompson and
Grace Krowl.
However, for the time, we will return to Johnny and his
friends.
CHAPTER VII
THE UNHOLY FIVE
During the course of their conversation about the open
fire in Drew Lane’s shack, Captain Burns took from his
inside pocket a small package which proved to be five
photographs pasted securely upon a strip of stout cloth
in such a manner that they might be folded together in
the form of a small book. “Ever see any of these?” he
said to Johnny after spreading them out upon his knee.
For a moment Johnny studied the pictures thoughtfully.
Then he gave a sudden start. “That,” he exclaimed,
pointing a trembling finger at the third in the row, “is
the man who sat beside me in the auction—who got me
to bid in that package!”
“Are you sure?” The Captain’s tone was tense.
“Can’t be a doubt about it. See that scar like a cross?
Couldn’t well miss that, could I? He’s the one all right.
And, though I could never prove it, I’d swear he was
the one who struck me from the dark.
“And, by all that’s good!” Johnny sprang to his feet. “I’ll
get that man! See if I don’t! No man can strike me from
the shadows and get away with it!”
“Well, I guess that makes your friend Johnny here one
of us. That right, Drew?” the Captain rumbled.
Drew Lane nodded his head.
“Sit down, son,” said the Captain. “I’ll tell you what
those pictures mean. Drew here and Tom Howe carry
those pictures with them always. So does Joyce, though
I don’t know quite where—in her stocking perhaps.”
Joyce smiled.
“We joke at times,” the Captain went on, “but this affair
is no joke. Those men are our assignment. They are to
be our assignment until every man of them is behind
bars or in his grave. You may join us if you will.”
“I will.” Johnny’s voice was low.
The Captain extended his hand as a solemn pledge.
“You have a right to know,” he went on, “just what men
you are after, and what they have done.
“They are hardened criminals, every one, public
enemies of the worst sort. A little more than a month
ago they sealed their fate—they killed a policeman, the
finest copper that ever walked a beat.”
For a time the Captain stared at the fire. “My boy,” he
said at last, in a different voice, “I’m going to take you
with me somewhere, sometime. The finest little family
you ever saw!” he rumbled low as if talking to himself.
Then, with a sudden start, he repeated, “They killed a
policeman. Of course a policeman’s no better than any
other man. But with us there’s an unwritten law that no
officer shall go unavenged.
“That wasn’t all they did, this unholy five. They went to
a banker’s home at midnight and terrorized his family
until morning. Man’s wife was in ill health. But of course
—” The Captain’s voice rumbled with scorn and hate.
“Of course you couldn’t expect these robbers to take
note of a little thing like that! What do they care for
women and children?
“When morning came they took the man to his bank.
They compelled him to open the vault. They took the
bank’s securities, more than two hundred thousand
dollars worth. Then, of course, they went away.
“By some oversight, the bank’s insurance had been
allowed to lapse. Because of this heavy loss the bank
was forced to close its doors. It was a working man’s
bank. Thousands of common folks lost their savings.
These five men—no doubt they had a fine time with the
currency they took!
“But the bonds—” His voice rose again. “The bonds are
hot. We’ve kept them hot. They dare not sell them. And
we’ll get them back yet, see if we don’t!
“And those are the men we’re after!” he added a
moment later. “Are you still with us?”
“More than ever!” Johnny’s voice was husky.
Once again the Captain offered his hand. “You’re a lad
after my own heart,” he rumbled. “I’ve two places I
want to show you, and I’m sure you’ll like them both.”
CHAPTER VIII
DOWN A BEAM OF LIGHT
Grace Krowl, the girl from Kansas, found plenty of
things to occupy her thoughts as she sank into a chair
in one of the two small rooms allotted to her on the
upper floor of her uncle’s store in Chicago.
“A store in Chicago.” She laughed low. Her uncle’s store
in Chicago. What dreams had she not dreamed of this
store? Chicago was a grand city. His store must be a
grand place. She had of late pictured it as a six-story
building; pure fancy, for he had never written about its
size or importance. In fact, he had not written at all
until she had written first and asked for a position as
clerk in his store. He had been married to her mother’s
sister. The sister was dead.
When Grace had needed work badly she had written,
and he had replied briefly: “I can give you work at
fifteen dollars per week and board.”
So here she was. And her uncle’s store was little more
than a hole in the wall. No counters, no glass cases.
Things piled in heaps, and all secondhand; glass dishes
here, bed covers there, dresses, sheets, towels,
everything. And in the corner, like so many skeletons, a
great pile of bruised, battered and empty trunks.
“He buys trunks, other people’s trunks.” She shuddered
afresh.
Then the words of her new-found friend of the
bookstore came to her. “Diamonds, stocks and bonds.”
These were dreams. “But rare old books, wonderful bits
of Irish lace, why not?” Perhaps, after all, she could
drive away the ache that came in her throat at the
thought that someone who truly loved these things had
lost them because they were poor.
She thought of her own trunk and laughed aloud. What
a sight that must have been—she snatching at her
prized possessions and those other women poking her
and banging her on the head!
Of course it had all been a mistake. She had come to
Chicago by bus and had sent on her trunk by express.
The van that went for her trunk had also picked up a
half dozen others which her uncle had bought at
auction. The trunks had become mixed. The lock had
been pried off her own and the contents were being
sold when she arrived. Everything had been retrieved
except a pearl-backed brush she prized and a hideous
vase she abhorred.
“That did not turn out so badly,” she assured herself.
“Perhaps everything will come along quite as well.” And
yet, as she took a handful of silver coins and one paper
dollar from her purse and added them up, her face was
very sober. She was a long way from home, and there
could be no retreat.
The place she was to call home was above the store.
Too tired and preoccupied to notice at first, she received
a shock when she at last became conscious of her
surroundings. The room in which she sat was a tiny
parlor, all her own. Off from that was a bedroom.
Everything—furniture, rugs, decorations,—was in exquisite
taste and perfect harmony.
“Contrast!” she exclaimed. “Who could ask for greater
contrast? Rags below, and this above!” She stared in
speechless surprise.
One thing astonished her. Opposite the window in the
parlor was an oval, concave mirror, like an old-fashioned
light reflector. It was some two feet across.
“I wonder why it is here,” she murmured. She was to
wonder more as the days passed.
When she had prepared herself for the night’s rest, she
snapped out the light, then stood for a brief time at the
open window looking out into the night. She was on the
second floor of her uncle’s small building. Before her
were the low, flat roofs of some one-story shacks.
Looking far beyond these, she saw squares of light
against the night sky. These she knew were lighted
windows of distant skyscrapers. There were thousands
of these windows.
“What can they all do at night?” she asked herself.
“Struggling to make money, to get on, to keep their
families housed and fed,” the answer came to her. Then,
strangely enough, her mind carried her back over the
trail that had brought her to this city. It had been an
interesting adventure, that long bus ride. Six of the
passengers, including herself, had ridden hundreds of
miles together. They had become like a little community.
“It was as if these were pioneer days,” she told herself
now. “As if we were journeying in covered wagons in a
strange new land.” One of these long distance
passengers, as you will know, had been a young man.
In his golf knickers and soft, gray cap, he had seemed a
college boy. But he was not. “Out of college and at
work,” was the way he had expressed it.
“What work do you do?” she had asked.
He had hesitated before replying. Then his answer had
been vague. “Oh, I just look after people.”
“Look after people?”
“Lots of people. All sorts.” A queer smile had played
about the corners of his mouth.
She had not pressed the question further. But now,
standing there looking out into his city at night, she
whispered, “His name was Drew Lane. Wonder if I’ll
ever see him again? I hope so. He seemed a nice boy,
and I should love to know how he looks after ‘lots of
people—all sorts.’”
She looked again at the many lighted windows.
Suddenly those who toiled there seemed very near to
her. She found a strange comfort in this.
“I, too, must do my best,” she told herself. “God help
me to be wise and strong, helpful to others and kind to
all!” she prayed as she gave herself over to sleep.
She was wakened at dawn by a whisper. At first, so
closely did dream life blend with the life of day, it
seemed natural that she should be listening to this
whisper. When she had come into full consciousness she
sprang out of bed with a start.
“Good morning!” The words came in slowly, a distinct
whisper. “We hope you are happy this morning. Cheerio!
That’s the word!”
“When you have dressed,” the whisper continued,
“won’t you just step out into the little parlor and take a
seat by the table? It will be good to have a look at your
shining face.”
“Someone in my little parlor! I don’t like it. And that
whisper!”
She dressed hurriedly, then stepped through the door.
What sort of person had she expected to see? Probably
she could not have told. What she did see was an
empty room.
Greatly astonished, hardly knowing why she obeyed the
whispered orders, she took a seat by the table. Instantly
the whisper began once more:
“Ah! There you are! I am talking to you over a beam of
light. I am a mile away. I have interesting things to tell
you. You are going to aid me.”
For a brief space of time the whisper ended. The girl’s
mind was in a whirl. “Talking down a beam of light!” she
thought. “What nonsense! Going to aid that whisperer?”
Here surely was some strange mystery.
CHAPTER IX
CUT ADRIFT
For some time Grace Krowl remained at her small table
awaiting some further message from the mysterious
whisperer. No further message came. Had this whisper
told the truth? Was he a mile away? She could not
believe it.
On descending to the floor below, she found her strange
uncle prepared to leave his odd store.
“Today I go to an auction,” he said to her with a smile.
“Today there is nothing to unpack. Not many people will
come. They come only when there are trunks.
Tomorrow there will be trunks, perhaps many trunks.”
“Trunks,” Grace thought with an involuntary shudder.
“Today,” her uncle went on, “Margot will tend store.” He
nodded toward an aged woman bending over a pile of
soiled garments. “Today you are free. You may make
yourself at home in your new place.”
All that day in her little parlor, Grace had one ear open
for the Whisperer. She heard nothing. He spoke,
apparently, only at dawn. The day was, for her, quite
uneventful.
The same could not be said for our young friend Johnny.
Late that day, with a narrow bandage still about his
head, he returned to the “House of Magic.” And, almost
at once, adventure struck him squarely between the
eyes.
“You are just in time!” Felix, the inventor’s son, greeted
him. “I have not tried that new thing. We will begin at
dusk, in an hour or two in a captive balloon,—”
“A captive balloon!” Johnny felt a thrill course up his
spine.
“On the Fair grounds,” Felix added. “There is one over
there. The grounds are deserted. I have permission to
use the balloon. I have had it inflated. No one will
bother us there.”
It is better sometimes to do things where there are
crowds. Felix was to learn this. There is safety in
numbers.
At the gate of the deserted Fair grounds Felix presented
his pass. They were admitted.
“Sent the equipment over in a small truck,” he explained
to Johnny. “Rather heavy.”
“What equipment?” The words were on Johnny’s
tongue. He did not say them. Just in time he recollected
that he was to look, listen, help all he could and not ask
questions. “I’ll be told all I need to know in good time,”
he assured himself. Had he but known it, that night he
was to need wisdom not written in any book.
The streets they were passing through now were
strange. The falling darkness gave to everything an air
of mystery. Here some great man-made dragon opened
its mouth as if to swallow them, there a tattered sign
fluttered and cracked in the wind. “The great Century of
Progress!” Johnny whispered. “Here thousands swarmed
along the Midway. Now all is still. Now—
“What was that?” He stopped dead in his tracks. Had he
caught the sound of scurrying feet? Yes, he was sure of
it. And there, well defined against a wall, were the
shadows of two half crouching figures. One was tall, the
other short. Johnny felt a chill run up his spine.
Felix apparently had seen nothing, heard nothing. He
had gone plodding stolidly on into the gathering
darkness; was at this moment all but lost from sight.
With a little cry of consternation, Johnny sprang after
him.
By the time he caught up to him they were at the spot
where the balloon was kept.
“We just release this clutch when we are ready to go
up,” Felix explained, “then up we go. There is a time
arrangement that will set the electrically operated drum,
winding us back down again in two hours. We only go
up about three hundred feet. Cable holds us. Quite safe
tonight, no wind to speak of.”
Johnny thought this a rather strange arrangement. “No
guard here?” he asked.
“No need. No one’s allowed in the grounds unless they
have a pass. Climb in. All set.”
Johnny did climb in, and up they went.
Johnny had been in the air many times. For all that, he
experienced a strange sense of insecurity as they rose a
hundred, two hundred, three hundred feet into the
murky air of night. “Pooh!” he exclaimed in a low
breath. “It is nothing!”
That he might throw off this feeling of dread, he busied
himself with other thoughts. His gaze swept the city
where lights were gleaming. “Where,” he thought, “are
Drew and Tom? Hunting pickpockets perhaps. And
where is Captain Burns? I’m going to like him, I’m sure.
He is so solid and real; but jovial for all that. He said
he’d take me places. What places? I wonder. Dangerous
places? He said—”
His thoughts were broken in upon by Felix’s voice:
“Here we are at the top. Now for the test.”
The young inventor flashed on a powerful searchlight.
“All I have to do is to connect this through a switch, aim
my light at a window in our house, take up this
microphone and say, ‘Hello father!’ He hears me and no
one else in the world can. He—
“What!” he exclaimed in consternation. “The current is
off. Someone cut the light cable!”
“More than that!” Johnny’s tone was sober. He was
looking over the side of the balloon basket in which they
rode. “The cable that holds us has been cut! We’re
drifting!”
“You’re right!” Consternation sounded in the older boy’s
voice. “We’re going out into the night, over black
waters. And there is no ballast!”
“They got us, those two!” Johnny muttered.
“What two?” Felix demanded.
“I saw them on the grounds, a tall one and a short one
—anyway I saw their shadows. Should have told you.”
“Oh!” Felix groaned. “Wonder what we’ve done to them.
But they haven’t got us—not yet!” There was courage
and high resolve in Felix Van Loon’s tone. “We’ll beat
them yet. You’ll see!”
Would they? Johnny silently wondered.
Strangely enough, at that moment thoughts not related
at all to this adventure passed through his mind. He was
once more in that place of mystery, the professor’s
house, in the hallway seeing eyes in the wall,
shuddering at sight of his own skeleton. “How could all
that have happened?” he asked himself.
CHAPTER X
A RUNAWAY CAPTURED
Johnny had known a thrill or two, but none quite like
drifting through the night in a balloon that was not
meant for drifting.
“Not an ounce of ballast!” Felix groaned. “And the night
so dark we may plunge without a moment’s notice into
those cold, black waters. And then—oh well, what’s the
good of thinking about that?”
There truly was no use at all of thinking about it. If
worse came to worst and they were able to tell the
moment of great danger, they might throw his
instruments and the searchlight over to lighten the
balloon.
“All this equipment,” Felix moaned, “cost plenty of
money!”
In spite of their predicament, Johnny found himself
wondering about that equipment and what they had
been about to do.
For a time Johnny was silent. Then of a sudden he
exclaimed, “Felix, we are drifting northeast! That means
we’ll be over the lake for hours. If the wind rises, if a
strong gust drags us down, or if the gas bag leaks and
we are plunged into the lake we are lost! A three
hundred foot cable hangs beneath this balloon. It is
weighting us down. Suppose we could cut it away?”
“It’s an idea!” Felix was all alert. “But it hangs from
below. How’ll you reach it?”
“Here’s a rope. I’ll go over the side. You hang on to the
rope.”
“That,” said Felix slowly, “will be taking a long chance.”
“Whole thing’s a chance.” Johnny was tying a loop in the
rope. “Now I’ll put a foot in this loop, hold to the rope
with one hand and work with the other. Flashlight will
tell me all I need to know. Can hold the light in my
teeth.”
“You should be in a circus.” Felix laughed. For all that,
he made the other end of the rope fast, then prepared
to lower his companion.
As he climbed up and over, Johnny felt his heart miss a
beat. It was strange, this crawling out into space. All
was dark below. Was the water a hundred or a
thousand feet down? He could not tell. The majestic
Lindbergh light swept the sky, but its rays did not touch
them.
“If only it did,” he murmured, “someone would see us.”
Strangely enough, at this very moment the professor’s
golden-haired daughter, Beth, was making strenuous
efforts to bring that very thing to pass, to get one of
those eyes of the night, a powerful searchlight, focussed
upon the runaway balloon.
Her father, sensing that something had gone wrong with
the balloon, had hurried her away to the spot from
which the balloon had risen. Arrived there after a wild
taxi ride, she had discovered on the instant what had
happened.
“Some—someone cut the cable with an electric torch!”
In vain her eyes searched the sky for the balloon. She
was about to hurry away when a hand gripped her arm.
“Where would you go?”
“Why! I—”
Taking one look at the man, she sent forth an
involuntary scream. She had seen that man before. He
carried a knife in his sleeve. She was terribly afraid.
Her scream had electrifying results. A huge bulk of a
youth with tangled red hair emerged from somewhere.
“Here you!” he growled. “Let her go!”
Releasing the girl, the small dark man sprang at her
protector.
“Look out!” the girl screamed. “He—he has a knife!”
Her warning was not needed. The little man’s knife went
coursing through the air. Next instant the little man
followed it into the dark. The big fellow’s fists had done
all this.
“Now, sister,” the young giant turned to Beth, “where
was it you wanted to go?”
“The—the Skidmore Building.”
“The Skidmore? O.K.”
Fairly picking her up, he rushed her to the taxi that was
waiting for her, then climbed in beside her. “Skidmore
Building. Make it snappy!”
Once in the taxi and speeding away, Beth was able to
collect her thoughts. There was, at the top of the tall
Skidmore Building, a searchlight. This was not always in
operation, but was held in readiness for any emergency
either on the water or in the air. If only she could get
that light searching the air for the runaway balloon,
something, she felt sure, could be done about it.
The taxi came to a sudden jarring halt.
“Here you are!”
“Here.” She dropped a half dollar in the taxi driver’s
hand. At the same instant something was pressed into
the palm of her left hand. She looked up. Her powerful
young protector was gone. In her hand was a card.
A moment later as she shot toward the stars in an
elevator she looked at that card and smiled.
“Gunderson Shotts, 22 Diversey Way” it read. And in the
lower right hand corner, “Everybody’s Business.”
She smiled in spite of herself as she murmured,
“Gunderson Shotts, Everybody’s Business. What a
strange calling!”
* * * * * * * *
At that same moment Johnny was going over the side
into the dark. It was strange, this adventure. “Must be
careful,” he told himself. And indeed he must. Dark
waters awaited him. A drop from that height would
probably kill or at least maim him.
“No chance,” he murmured.
The bright lights of the city called to him from afar. He
had seen much of that bright and terrible city; had
meant to see much more. “Must see it all,” he told
himself.
“But now I must forget it,” he resolved.
And surely he must, for now he was beneath the
basket. The tiny finger of light from his electric torch
shot about here and there.
Steadying its motion, directing it toward the end of the
cable, he began studying the problem at hand.
And then—something happened. Did his hand slip? Did
the noose about his foot give way? He will never know.
Nor will he forget that instant when his flashlight,
slipping from his chattering teeth, shot downward and
he, by the merest chance, escaped following it.
How it happened he will never be able to tell. This much
he knew: he hung there in all that blackness supporting
his weight by one desperately gripping hand.
Somewhere below was the noose that should offer him
footing. Somewhere far, far below were black waters
waiting. And through his mind there flashed a thousand
pictures of the bright and beautiful world he might, in
ten seconds’ time, leave behind.
All this in the space of a split second, then groping
madly, he found the rope with his other hand. After that
began the heart-breaking task of groping in the dark
with his foot for the dangling rope loop, while the
muscles in his arms became burning bands of fire.
“I must win!” he whispered. “I must!”
“Johnny! Johnny Thompson!” came from above. “What
has happened?”
“Don’t know. I—I’m dangling. Dra—draw me up if you
can.”
Came a sudden tug on the rope that all but tore the
rope from his grip. “No! No! Wait!”
Once again he sought that noose with his toe.
* * * * * * * *
As for Beth, she had gone shooting up in that express
elevator in the Skidmore Building.
Like a rubber ball she bounded from the car, then raced
for a cubby-hole in a corner where two men were
standing.
“The balloon!” she exclaimed. “The captive balloon! It’s
loose, drifting! You must find it with your light!”
“What’s that?” one man demanded sharply. “Impossible!
There’s no gale. That cable couldn’t break!”
“It’s loose! Drifting!” the girl insisted. “They cut the
cable, someone cut it. My brother and another boy are
in the balloon. You must save them.”
One man glanced at the other. “All right, we better try
it, Ben!”
At that a long finger of white light began feeling its way
through the blackness that is sky above Lake Michigan
on a cloudy night.
Johnny, unable to find the loop in the rope, feeling his
strength unequal to a climb hand over hand, felt the
muscles of his arms weaken until all seemed lost.
And then, as if some miracle had been done, night
turned into day. The powerful light had reached him
only for a second, but that was enough. His keen eye
had caught the loop in the rope. It was by his knee. A
sudden fling and his knee was resting in that loop.
“All—all right now!” he called. “Try to pull me up.”
And at that the gleam of that powerful searchlight
returned to rest on the spot of air in which the runaway
balloon hung.
“I’ll step over and call the sausage balloon, Ben,” one of
the men in the great steel tower said to the other as
Beth, at sight of the balloon still drifting high, began
breathing more easily. “They’ll have to go to the
rescue.”
One more fierce struggle and Johnny tumbled over the
side into the balloon’s basket.
“It—it’s put on with steel rings,” he panted.
“It—what is?” Felix stared.