Sample Size Calculations for Clustered and Longitudinal Outcomes in Clinical Research
Chul Ahn, Moonseong Heo, and Song Zhang
Accurate sample size calculation ensures that clinical studies have
adequate power to detect clinically meaningful effects. This results in
the efficient use of resources and avoids exposing a disproportionate
number of patients to experimental treatments caused by an
overpowered study.

Sample Size Calculations for Clustered and Longitudinal Outcomes
in Clinical Research explains how to determine sample size
for studies with correlated outcomes, which are widely implemented
in medical, epidemiological, and behavioral studies.

The book focuses on issues specific to the two types of correlated
outcomes: longitudinal and clustered. For clustered studies, the
authors provide sample size formulas that accommodate variable
cluster sizes and within-cluster correlation. For longitudinal studies,
they present sample size formulas to account for within-subject
correlation among repeated measurements and various missing data
patterns. For multiple levels of clustering, the level at which to perform
randomization actually becomes a design parameter. The authors
show how this can greatly impact trial administration, analysis, and
sample size requirement.

Addressing the overarching theme of sample size determination for
correlated outcomes, this book provides a useful resource for
biostatisticians, clinical investigators, epidemiologists, and social
scientists whose research involves trials with correlated outcomes. Each
chapter is self-contained so readers can explore topics relevant to
their research projects without having to refer to other chapters.
Editor-in-Chief
Shein-Chung Chow, Ph.D., Professor, Department of Biostatistics and Bioinformatics,
Duke University School of Medicine, Durham, North Carolina
Series Editors
Byron Jones, Biometrical Fellow, Statistical Methodology, Integrated Information Sciences,
Novartis Pharma AG, Basel, Switzerland
Jen-pei Liu, Professor, Division of Biometry, Department of Agronomy,
National Taiwan University, Taipei, Taiwan
Karl E. Peace, Georgia Cancer Coalition, Distinguished Cancer Scholar, Senior Research Scientist
and Professor of Biostatistics, Jiann-Ping Hsu College of Public Health,
Georgia Southern University, Statesboro, Georgia
Bruce W. Turnbull, Professor, School of Operations Research and Industrial Engineering,
Cornell University, Ithaca, New York
Published Titles
Adaptive Design Methods in
Clinical Trials, Second Edition
Shein-Chung Chow and Mark Chang
Adaptive Design Theory and
Implementation Using SAS and R,
Second Edition
Mark Chang
Advanced Bayesian Methods for Medical
Test Accuracy
Lyle D. Broemeling
Advances in Clinical Trial Biostatistics
Nancy L. Geller
Applied Meta-Analysis with R
Ding-Geng (Din) Chen and Karl E. Peace
Basic Statistics and Pharmaceutical
Statistical Applications, Second Edition
James E. De Muth
Bayesian Adaptive Methods for
Clinical Trials
Scott M. Berry, Bradley P. Carlin,
J. Jack Lee, and Peter Muller
Bayesian Analysis Made Simple: An Excel
GUI for WinBUGS
Phil Woodward
Bayesian Methods for Measures of
Agreement
Lyle D. Broemeling
Bayesian Methods in Epidemiology
Lyle D. Broemeling
Bayesian Methods in Health Economics
Gianluca Baio
Bayesian Missing Data Problems: EM,
Data Augmentation and Noniterative
Computation
Ming T. Tan, Guo-Liang Tian,
and Kai Wang Ng
Bayesian Modeling in Bioinformatics
Dipak K. Dey, Samiran Ghosh,
and Bani K. Mallick
Benefit-Risk Assessment in
Pharmaceutical Research and
Development
Andreas Sashegyi, James Felli, and
Rebecca Noel
Biosimilars: Design and Analysis of
Follow-on Biologics
Shein-Chung Chow
Biostatistics: A Computing Approach
Stewart J. Anderson
Causal Analysis in Biomedicine and
Epidemiology: Based on Minimal
Sufficient Causation
Mikel Aickin
Clinical and Statistical Considerations
in Personalized Medicine
Claudio Carini, Sandeep Menon,
and Mark Chang
Clinical Trial Data Analysis using R
Ding-Geng (Din) Chen and Karl E. Peace
Clinical Trial Methodology
Karl E. Peace and Ding-Geng (Din) Chen
Computational Methods in Biomedical
Research
Ravindra Khattree and Dayanand N. Naik
Computational Pharmacokinetics
Anders Källén
Confidence Intervals for Proportions and
Related Measures of Effect Size
Robert G. Newcombe
Controversial Statistical Issues in
Clinical Trials
Shein-Chung Chow
Data and Safety Monitoring Committees
in Clinical Trials
Jay Herson
Design and Analysis of Animal Studies in
Pharmaceutical Development
Shein-Chung Chow and Jen-pei Liu
Design and Analysis of Bioavailability and
Bioequivalence Studies, Third Edition
Shein-Chung Chow and Jen-pei Liu
Design and Analysis of Bridging Studies
Jen-pei Liu, Shein-Chung Chow,
and Chin-Fu Hsiao
Design and Analysis of Clinical Trials with
Time-to-Event Endpoints
Karl E. Peace
Design and Analysis of Non-Inferiority
Trials
Mark D. Rothmann, Brian L. Wiens,
and Ivan S. F. Chan
Difference Equations with Public Health
Applications
Lemuel A. Moyé and Asha Seth Kapadia
DNA Methylation Microarrays:
Experimental Design and Statistical
Analysis
Sun-Chong Wang and Arturas Petronis
DNA Microarrays and Related Genomics
Techniques: Design, Analysis, and
Interpretation of Experiments
David B. Allison, Grier P. Page,
T. Mark Beasley, and Jode W. Edwards
Dose Finding by the Continual
Reassessment Method
Ying Kuen Cheung
Elementary Bayesian Biostatistics
Lemuel A. Moyé
Frailty Models in Survival Analysis
Andreas Wienke
Generalized Linear Models: A Bayesian
Perspective
Dipak K. Dey, Sujit K. Ghosh,
and Bani K. Mallick
Handbook of Regression and Modeling:
Applications for the Clinical and
Pharmaceutical Industries
Daryl S. Paulson
Inference Principles for Biostatisticians
Ian C. Marschner
Interval-Censored Time-to-Event Data:
Methods and Applications
Ding-Geng (Din) Chen, Jianguo Sun,
and Karl E. Peace
Joint Models for Longitudinal and Time-
to-Event Data: With Applications in R
Dimitris Rizopoulos
Measures of Interobserver Agreement
and Reliability, Second Edition
Mohamed M. Shoukri
Medical Biostatistics, Third Edition
A. Indrayan
Meta-Analysis in Medicine and Health
Policy
Dalene Stangl and Donald A. Berry
Mixed Effects Models for the Population
Approach: Models, Tasks, Methods and
Tools
Marc Lavielle
Monte Carlo Simulation for the
Pharmaceutical Industry: Concepts,
Algorithms, and Case Studies
Mark Chang
Multiple Testing Problems in
Pharmaceutical Statistics
Alex Dmitrienko, Ajit C. Tamhane,
and Frank Bretz
Noninferiority Testing in Clinical Trials:
Issues and Challenges
Tie-Hua Ng
Optimal Design for Nonlinear Response
Models
Valerii V. Fedorov and Sergei L. Leonov
Patient-Reported Outcomes:
Measurement, Implementation and
Interpretation
Joseph C. Cappelleri, Kelly H. Zou,
Andrew G. Bushmakin, Jose Ma. J. Alvir,
Demissie Alemayehu, and Tara Symonds
Quantitative Evaluation of Safety in Drug
Development: Design, Analysis and
Reporting
Qi Jiang and H. Amy Xia
Randomized Clinical Trials of
Nonpharmacological Treatments
Isabelle Boutron, Philippe Ravaud, and
David Moher
Randomized Phase II Cancer Clinical
Trials
Sin-Ho Jung
Sample Size Calculations for Clustered
and Longitudinal Outcomes in Clinical
Research
Chul Ahn, Moonseong Heo, and
Song Zhang
Sample Size Calculations in Clinical
Research, Second Edition
Shein-Chung Chow, Jun Shao
and Hansheng Wang
Statistical Analysis of Human Growth
and Development
Yin Bun Cheung
Statistical Design and Analysis of
Stability Studies
Shein-Chung Chow
Statistical Evaluation of Diagnostic
Performance: Topics in ROC Analysis
Kelly H. Zou, Aiyi Liu, Andriy Bandos,
Lucila Ohno-Machado, and Howard Rockette
Statistical Methods for Clinical Trials
Mark X. Norleans
Statistical Methods in Drug Combination
Studies
Wei Zhao and Harry Yang
Statistics in Drug Research:
Methodologies and Recent
Developments
Shein-Chung Chow and Jun Shao
Statistics in the Pharmaceutical Industry,
Third Edition
Ralph Buncher and Jia-Yeong Tsay
Survival Analysis in Medicine and
Genetics
Jialiang Li and Shuangge Ma
Theory of Drug Development
Eric B. Holmgren
Translational Medicine: Strategies and
Statistical Methods
Dennis Cosmatos and Shein-Chung Chow
Chul Ahn
University of Texas Southwestern Medical Center
Dallas, Texas, USA
Moonseong Heo
Albert Einstein College of Medicine
Bronx, New York, USA
Song Zhang
University of Texas Southwestern Medical Center
Dallas, Texas, USA
Sample Size Calculations
for Clustered and
Longitudinal Outcomes
in Clinical Research
CRC Press
Taylor & Francis Group
6000 Broken Sound Parkway NW, Suite 300
Boca Raton, FL 33487-2742
© 2015 by Taylor & Francis Group, LLC
CRC Press is an imprint of Taylor & Francis Group, an Informa business
No claim to original U.S. Government works
Version Date: 20141029
International Standard Book Number-13: 978-1-4665-5627-0 (eBook - PDF)
This book contains information obtained from authentic and highly regarded sources. Reasonable
efforts have been made to publish reliable data and information, but the author and publisher cannot
assume responsibility for the validity of all materials or the consequences of their use. The authors and
publishers have attempted to trace the copyright holders of all material reproduced in this publication
and apologize to copyright holders if permission to publish in this form has not been obtained. If any
copyright material has not been acknowledged please write and let us know so we may rectify in any
future reprint.

Except as permitted under U.S. Copyright Law, no part of this book may be reprinted, reproduced,
transmitted, or utilized in any form by any electronic, mechanical, or other means, now known or
hereafter invented, including photocopying, microfilming, and recording, or in any information
storage or retrieval system, without written permission from the publishers.

For permission to photocopy or use material electronically from this work, please access
www.copyright.com (http://www.copyright.com/) or contact the Copyright Clearance Center, Inc. (CCC), 222
Rosewood Drive, Danvers, MA 01923, 978-750-8400. CCC is a not-for-profit organization that
provides licenses and registration for a variety of users. For organizations that have been granted a
photocopy license by the CCC, a separate system of payment has been arranged.

Trademark Notice: Product or corporate names may be trademarks or registered trademarks, and are
used only for identification and explanation without intent to infringe.

Visit the Taylor & Francis Web site at
http://www.taylorandfrancis.com
and the CRC Press Web site at
http://www.crcpress.com
Contents
Preface
List of Figures
List of Tables
1 Sample Size Determination for Independent Outcomes
1.1 Introduction
1.2 Precision Analysis
1.3 Power Analysis
1.4 Further Readings
2 Sample Size Determination for Clustered Outcomes
2.1 Introduction
2.2 One–Sample Clustered Continuous Outcomes
2.3 One–Sample Clustered Binary Outcomes
2.4 Two–Sample Clustered Continuous Outcomes
2.5 Two–Sample Clustered Binary Outcomes
2.6 Stratified Cluster Randomization for Binary Outcomes
2.7 Nonparametric Approach for One–Sample Clustered Binary Outcomes
2.8 Further Readings
3 Sample Size Determination for Repeated Measurement Outcomes Using Summary Statistics
3.1 Introduction
3.2 Information Needed for Sample Size Estimation
3.3 Summary Statistics
3.4 Further Readings
4 Sample Size Determination for Correlated Outcome Measurements Using GEE
4.1 Motivation
4.2 Review of GEE
4.3 Compare the Slope for a Continuous Outcome
4.4 Test the TAD for a Continuous Outcome
4.5 Compare the Slope for a Binary Outcome
4.6 Test the TAD for a Binary Outcome
4.7 Compare the Slope for a Count Outcome
4.8 Test the TAD for a Count Outcome
4.9 Further Readings
5 Sample Size Determination for Correlated Outcomes from Two-Level Randomized Clinical Trials
5.1 Introduction
5.2 Statistical Models for Continuous Outcomes
5.3 Testing Main Effects
5.4 Two-Level Longitudinal Designs: Testing Slope Differences
5.5 Cross-Sectional Factorial Designs: Interactions between Treatments
5.6 Longitudinal Factorial Designs: Treatment Effects on Slopes
5.7 Sample Sizes for Binary Outcomes
5.8 Further Readings
6 Sample Size Determination for Correlated Outcomes from Three-Level Randomized Clinical Trials
6.1 Introduction
6.2 Statistical Model for Continuous Outcomes
6.3 Testing Main Effects
6.4 Testing Slope Differences
6.5 Cross-Sectional Factorial Designs: Interactions between Treatments
6.6 Longitudinal Factorial Designs: Treatment Effects on Slopes
6.7 Sample Sizes for Binary Outcomes
6.8 Further Readings
Index
Preface
One of the most common questions statisticians encounter when working
with clinical investigators is “How many subjects do I need for this study?”
Clinicians are often surprised to find out that the required sample size depends
on a number of factors. Obtaining the information needed for sample size
calculation is not trivial, and often involves preliminary studies, literature review,
and, more than occasionally, educated guesses. The validity of clinical research
is judged not by the results but by how it is designed and conducted.
Accurate sample size calculation ensures that a study has adequate power to
detect clinically meaningful effects and avoids the waste of resources and the
risk of exposing an excessive number of patients to experimental treatments
caused by an overpowered study.
In this book we focus on sample size determination for studies with
correlated outcomes, which are widely implemented in medical, epidemiological,
and behavioral studies. Correlated outcomes are usually categorized into two
types: clustered and longitudinal. The former arises from trials where
randomization is performed at the level of some aggregates (e.g., clinics) of research
subjects (e.g., patients). The latter arises when the outcome is measured at
multiple time points during follow-up from each subject. A key difference
between these two types is that for a clustered design, subjects within a
cluster are considered exchangeable, while for a longitudinal design, the multiple
measurements from each subject are distinguished by their unique time
stamps.
Designing a randomized trial with correlated outcomes poses special
challenges and opportunities for researchers. Appropriately accounting for
correlation with different structures requires more sophisticated methodologies
for analysis and sample size calculation. In practice, researchers are also likely
to encounter correlated outcomes with a hierarchical structure.
For example, multiple levels of nested clustering (e.g., patients nested in clinics
and clinics nested in hospital systems) can occur, and such designs can
become more complicated if longitudinal measurements are obtained from each
subject. Missing data leads to the challenge of “partially” observed data for
clinical trials with correlated outcomes, and its impact on the sample size
requirement depends on many factors: the number of longitudinal measurements, the
structure and strength of correlation, and the distribution of missing data. On
the other hand, researchers enjoy some additional flexibility in designing
randomized trials with correlated outcomes. When multiple levels of clustering
are involved, the level at which to perform randomization actually becomes a
design parameter, which can greatly impact trial administration, analysis, and
the sample size requirement. This issue is explored in Chapters 5 and 6. Another
example is that in longitudinal studies, to a certain extent, researchers can
compensate for a lack of unique subjects by increasing the number of measurements
from each subject, and vice versa. This feature has profound implications for
the design of clinical trials where the cost of recruiting an additional subject
is drastically different from the cost of obtaining an additional measurement
from an existing subject. It requires researchers to explore the trade-off
between the number of subjects and the number of measurements per subject
in order to achieve the optimal power under a given financial constraint. We
explore this topic in Chapters 3 and 4.
The outline of this book is as follows. In Chapter 1 we review sample size
determination for independent outcomes. Advanced readers who are already
familiar with sample size problems can skip this chapter. In Chapter 2 we
explore sample size determination for variants of clustered trials, including
one- and two-sample trials, continuous and binary outcomes, stratified cluster
design, and nonparametric approaches. In Chapter 3 we review sample size
methods based on summary statistics (such as individually estimated means
or slopes) obtained from longitudinal outcomes. In Chapter 4 we present
sample size determination based on GEE approaches for various types of
correlated outcomes, including continuous, binary, and count. The impact of
missing data, correlation structures, and financial constraints is investigated. In
Chapter 5 we present sample size determination based on mixed-effects model
approaches for randomized clinical trials with two-level data structures.
Longitudinal and cross-sectional factorial designs are explored. In Chapter 6 we
further extend the mixed-effects model sample size approaches to scenarios
where three-level data structures are involved in randomized trials.
We hope this book will serve as a useful resource for biostatisticians,
clinical investigators, epidemiologists, and social scientists whose research involves
randomized trials with correlated outcomes. While jointly addressing the
overarching theme of sample size determination for correlated outcomes under such
settings, individual chapters are written in a self-contained manner so that
readers can explore specific topics relevant to their research projects without
having to refer to other chapters.
We give special thanks to Dr. Mimi Y. Kim for her enthusiastic support
in providing critical reviews and suggestions, examples, edits, and corrections
throughout the chapters. Without her input, this book would not have taken
its present form. We also thank Acquisitions Editor David Grubbs for
providing the opportunity to work on this book, and Production Manager Suzanne
Lassandro for her outstanding support in publishing this book. In addition,
we are grateful for the support of the University of Texas Southwestern Medical Center
and the Albert Einstein College of Medicine.
Chul Ahn, PhD
Moonseong Heo, PhD
Song Zhang, PhD
List of Figures
1.1 Sample size estimation for a one–sided test in a one–sample problem
4.1 Numerical study to explore the relationship between s_t^2 and ρ, under
the scenario of complete data and various values of θ from the damped
exponential family. θ = 1 corresponds to AR(1) and θ = 0 corresponds to
CS. The measurement times are normalized such that t_m − t_1 = 1. Hence
ρ_1m = ρ under all values of θ.
4.2 Numerical study to explore the relationship between s_t^2 and ρ, under
the scenario of incomplete data and various values of θ from the damped
exponential family. θ = 1 corresponds to AR(1) and θ = 0 corresponds to
CS. IM and MM represent the independent and monotone missing pattern,
respectively. The measurement times are normalized such that t_m − t_1 = 1.
Hence ρ_1m = ρ under all values of θ.
4.3 A numerical study to explore n{m+1}/n{m} under missing data and
different correlation structures. The vertical axis is n{m+1}/n{m}.
“Complete” indicates the scenario of complete data. “IM” and “MM” indicate
the independent and monotone missing patterns, respectively, with marginal
observant probabilities computed by δ_j = 1 − 0.3 ∗ (j − 1)/(m − 1).
4.4 Different trends in the marginal observant probabilities. δ1
approximately follows a linear trend. δ2 is relatively steady initially but
drops quickly afterward. δ3 drops quickly from the beginning but plateaus.
5.1 Geometrical representations of fixed parameters in model (5.12) for a
parallel-arm longitudinal cluster randomized trial.
5.2 Geometrical representations of fixed parameters in model (5.31) for a
2-by-2 factorial longitudinal cluster randomized trial.
List of Tables
2.1 Proportion of infection (y_i/m_i) from n = 29 subjects (clusters)
2.2 Distribution of the number of infected sites (m_i)
2.3 Stepped wedge design, where C represents control and I represents
intervention
4.1 Sample sizes under various scenarios
5.1 Sample size and power for detecting a main effect δ_(2) in model (5.3)
when randomizations occur at the second level (two-sided significance level
α = 0.05)
5.2 Sample size and power for detecting a main effect δ_(1) in model (5.8)
when randomizations occur at the first level (two-sided significance level
α = 0.05)
5.3 Sample size and power for detecting an effect δ_(f) on slope differences
in a fixed-slope model (5.12) with r_τ = 0 when randomizations occur at the
second level (two-sided significance level α = 0.05)
5.4 Sample size and power for detecting an effect δ_(f) on slope differences
in a random-slope model (5.4.5) with r_τ = 0.1 when randomizations occur at
the second level (two-sided significance level α = 0.05)
5.5 Sample size and power for detecting a main effect δ_(e) at the end of
study in a fixed-slope model (5.22) when randomizations occur at the second
level (two-sided significance level α = 0.05)
5.6 Sample size and power for detecting a two-way interaction XZ effect
δ_XZ(2) in model (5.25) for a 2-by-2 factorial design when randomizations
occur at the second level (two-sided significance level α = 0.05)
5.7 Sample size and power for detecting a two-way interaction XZ effect
δ_XZ(1) in model (5.28) for a 2-by-2 factorial design when randomizations
occur at the first level (two-sided significance level α = 0.05)
5.8 Sample size and statistical power for detecting a three-way interaction
XZT effect δ_XZT in model (5.31) for a 2-by-2 factorial design when
randomizations occur at the second level (two-sided significance level
α = 0.05)
5.9 Sample size and statistical power for detecting a main effect |p_1 − p_0|
on binary outcome in model with m = 2 (5.34) when randomizations occur at
the second level (two-sided significance level α = 0.05)
5.10 Sample size and statistical power for detecting a main effect |p_1 − p_0|
on binary outcome in model with m = 1 (5.34) when randomizations occur at
the first level (two-sided significance level α = 0.05)
6.1 Sample size and power for detecting a main effect δ_(3) in model (6.4)
when randomizations occur at the third level with ρ_2 = 0.05 (two-sided
significance level α = 0.05)
6.2 Sample size and power for detecting a main effect δ_(2) in model (6.9)
when randomizations occur at the second level with ρ_2 = 0.05 (two-sided
significance level α = 0.05)
6.3 Sample size and power for detecting a main effect δ_(1) in model (6.13)
when randomizations occur at the first level with ρ_2 = 0.05 (two-sided
significance level α = 0.05)
6.4 Sample size and power for detecting an effect δ_(f) on slope differences
in a three-level fixed-slope model (6.17) with r_τ = 0 when randomizations
occur at the third level (two-sided significance level α = 0.05)
6.5 Sample size and power for detecting an effect δ_(r) on slope differences
in a three-level random-slope model (6.22) with r_τ = 0.1 when
randomizations occur at the third level (two-sided significance level
α = 0.05)
6.6 Sample size and power for detecting a main effect δ_(e) at the end of
study in a three-level fixed-slope model (6.28) when randomizations occur at
the third level (two-sided significance level α = 0.05)
6.7 Sample size and power for detecting a two-way interaction XZ effect
δ_XZ(3) in model with m = 3 (6.31) for a 2-by-2 factorial design when
randomizations occur at the third level with ρ_2 = 0.05 (two-sided
significance level α = 0.05)
6.8 Sample size and power for detecting a two-way interaction XZ effect
δ_XZ(2) in model with m = 2 (6.31) for a 2-by-2 factorial design when
randomizations occur at the second level with ρ_2 = 0.05 (two-sided
significance level α = 0.05)
6.9 Sample size and power for detecting a two-way interaction XZ effect
δ_XZ(1) in model with m = 1 (6.31) for a 2-by-2 factorial design when
randomizations occur at the first level with ρ_2 = 0.05 (two-sided
significance level α = 0.05)
6.10 Sample size and power for detecting a three-way interaction XZT effect
δ_XZT in model (6.38) for a 2-by-2 factorial design when randomizations
occur at the third level (two-sided significance level α = 0.05)
6.11 Sample size and statistical power for detecting a main effect
|p_1 − p_0| on binary outcome in model with m = 3 (6.41) when randomizations
occur at third level (two-sided significance level α = 0.05)
6.12 Sample size and statistical power for detecting a main effect
|p_1 − p_0| on binary outcome in model with m = 2 (6.41) when randomizations
occur at second level (two-sided significance level α = 0.05)
6.13 Sample size and statistical power for detecting a main effect
|p_1 − p_0| on binary outcome in model with m = 1 (6.41) when randomizations
occur at first level (two-sided significance level α = 0.05)
1 Sample Size Determination for Independent Outcomes
1.1 Introduction
One of the most common questions statisticians are asked by clinical
investigators is “How many subjects do I need?” Researchers are often
surprised to find out that the required sample size depends on a number of factors
and that they have to provide information to a statistician before they can get an
answer. Clinical research is judged to be valid not by the results but by how it
is designed and conducted. The cliché “do it right or do it over” is particularly
apt in clinical research.
One of the most important aspects in clinical research design is the sample
size estimation. In planning a clinical trial, it is necessary to determine the
number of subjects to be recruited for the clinical trial in order to achieve
sufficient power to detect the hypothesized effect. The ICH E9 guidance [1]
states: “The number of subjects in a clinical trial should always be large
enough to provide a reliable answer to the questions addressed. This number
is usually determined by the primary objective of the trial. If the sample size is
determined on some other basis, then this should be made clear and justified.
For example, a trial sized on the basis of safety questions or requirements
or important secondary objectives may need larger or smaller numbers of
subjects than a trial sized on the basis of the primary efficacy question.”
Sample size in clinical trials must be carefully estimated if the results are to
be credible. If the number of subjects is too small, even a well–conducted
trial will have little chance of detecting the hypothesized effect. Ideally, the
sample size should be large enough to have a high probability of detecting
a clinically important difference between treatment groups and to show it to
be statistically significant if such a difference really exists. If the number of
subjects is too large, the clinical trial may yield statistical significance for an
effect of little clinical importance. Conversely, if the number of subjects is too
small, the clinical trial may fail to reach statistical significance despite a
large difference that is clinically important.
When designing a study, an investigator should consider constraints
such as time, cost, and the number of available subjects. However,
these constraints should not dictate the sample size. There is no reason to
carry out a study that is too small, only to come up with results that are
inconclusive, since an investigator will then need to carry out another study
to confirm or refute the initial results. Selecting an appropriate sample size is
a crucial step in the design of a study. A study with an insufficient sample size
may not have sufficient statistical power to detect meaningful effects and may
not produce reliable answers to important research questions. Krzywinski and
Altman [2] note that the ability to detect an experimental effect is weakened in
studies that do not have sufficient power. Choosing the appropriate sample
size increases the chance of detecting a clinically meaningful effect and ensures
that the study is both ethical and cost-effective.
Sample size is usually estimated by precision analysis or power analysis.
In precision analysis, sample size is determined by the standard error or the
margin of error at a fixed significance level. Precision analysis offers a
simple way to estimate the sample size [3]. In power analysis,
sample size is estimated to achieve a desired power for detecting a clinically
or scientifically meaningful difference at a fixed type I error rate. Power anal-
ysis is the most commonly used method for sample size estimation in clinical
research. The sample size calculation requires assumptions that typically can-
not be tested until the data have been collected from the trial. Sample size
calculations are thus inherently hypothetical.
1.2 Precision Analysis
Sample size estimation is needed for the study in which the goal is to estimate
the unknown parameter with a certain degree of precision. Thus, some key
decisions in planning a study are “How precise will the parameter estimate
be if I select a particular sample size?” and “How large a sample size do I
need to attain a desirable level of precision?” What we are essentially saying
is that we want the confidence interval to be of a certain width, in which the
100(1−α)% confidence level reflects the probability of including the true (but
unknown) value of the parameter. Since the precision is determined by the
width of the confidence interval, the goal of precision analysis is to determine
the sample size that allows the confidence interval to be within a pre-specified
width. The narrower the confidence interval is, the more precise the parameter
inference is. Confidence interval estimation provides a convenient alternative
to significance testing in most situations. The confidence interval approach
is equivalent to the method of hypothesis testing. That is, if the confidence
interval does not include the parameter value under the null hypothesis, the
null hypothesis is rejected at a two–sided significance level of α. For example,
consider the hypothesis of no difference between means (µ1 and µ2). The
method of hypothesis testing rejects the hypothesis H0 : µ1 − µ2 = 0 at
the two–sided significance level of α if and only if the 100(1 − α)% confidence
interval for the mean difference (µ1−µ2) does not include the value zero. Thus,
the significance test can be performed with the confidence interval approach.
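The equivalence between the two–sided test and the confidence interval can be
checked numerically. The following Python sketch assumes a normally distributed
estimate of µ1 − µ2 with a known standard error; the function names and the input
values are illustrative assumptions, not taken from the text.

```python
from statistics import NormalDist


def z_test_rejects(diff, se, alpha=0.05):
    """Two-sided z-test of H0: mu1 - mu2 = 0 (standard error assumed known)."""
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    return abs(diff / se) > z_crit


def ci_excludes_zero(diff, se, alpha=0.05):
    """True if the 100(1 - alpha)% confidence interval does not contain 0."""
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    lower, upper = diff - z_crit * se, diff + z_crit * se
    return lower > 0 or upper < 0


# Hypothetical estimates of the mean difference, each with standard error 1.
for diff in (-3.0, -1.0, 0.5, 2.5):
    print(diff, z_test_rejects(diff, 1.0), ci_excludes_zero(diff, 1.0))
```

For each hypothetical estimate the two decisions coincide, which is the sense in
which the two approaches are equivalent.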
1.2.1 Continuous Outcomes
Suppose that $Y_1, \ldots, Y_n$ are independent and identically distributed normal
random variables with mean $\mu$ and variance $\sigma^2$. The parameter $\mu$ can be
estimated by the sample mean $\bar{y} = \sum_{i=1}^{n} Y_i / n$. When $\sigma^2$ is
known, the $100(1-\alpha)\%$ confidence interval is
$$\bar{y} \pm z_{1-\alpha/2} \frac{\sigma}{\sqrt{n}},$$
where z1−α/2 is the 100(1−α/2)th percentile of the standard normal distribu-
tion. Note that the sample size estimate based on precision analysis depends
on the type I error rate, not on the type II error rate. The maximum half
width of the confidence interval is called the maximum error of an estimate
of the unknown parameter. Suppose that the maximum error of µ is δ. Then,
the required minimum sample size is the smallest integer that is greater than
or equal to n solved from the following equation:
$$z_{1-\alpha/2} \frac{\sigma}{\sqrt{n}} = \delta.$$
Thus, the required sample size is the smallest integer that is greater than or
equal to n:
$$n = \frac{z_{1-\alpha/2}^{2} \sigma^{2}}{\delta^{2}}. \tag{1.1}$$
From Equation (1.1), we can obtain the required sample size once the
maximum error or the width of the 100(1 − α)% confidence interval of µ is
specified.
1.2.1.1 Example
Suppose that a clinical investigator is interested in estimating how much re-
duction will be made on the fasting serum–cholesterol level with administra-
tion of a new cholesterol–lowering drug for 6 months among recent Hispanic
immigrants with a given degree of precision. Suppose that the standard de-
viation (σ) for reduction in cholesterol level equals 40 mg/dl. We would like
to estimate the minimum sample size needed to estimate the reduction in
fasting serum–cholesterol level if we require that the 95% confidence interval
for reduction in cholesterol level is no wider than 20 mg/dl. The 100(1 − α)%
confidence interval for true reduction in fasting serum–cholesterol level is
$$\bar{y} \pm z_{1-\alpha/2} \frac{\sigma}{\sqrt{n}},$$
where ȳ is the mean change in fasting serum–cholesterol level after adminis-
tration of a drug, and z1−α/2 is the 100(1 − α/2)th percentile of the standard
normal distribution. The width of a 95% confidence interval is
$$2 \cdot z_{1-\alpha/2} \frac{\sigma}{\sqrt{n}} = 2 \cdot 1.96 \cdot \frac{40}{\sqrt{n}}.$$
We want the width of the 95% confidence interval to be no wider than 20
mg/dl. The required sample size is the smallest integer satisfying
$n \geq 4 \cdot (1.96)^2 (40)^2 / (20)^2 = 61.5$. In order for a 95% confidence
interval of reduction in cholesterol level to be no wider than 20 mg/dl, we need
at least 62 subjects when the standard deviation for reduction in cholesterol
level equals 40 mg/dl.
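Equation (1.1) can be sketched in a few lines of Python. This is a minimal
illustration under the same assumption of a known σ; the function name
n_for_mean_precision is hypothetical, and the exact normal quantile is used
instead of the rounded value 1.96.

```python
import math
from statistics import NormalDist


def n_for_mean_precision(sigma, max_error, alpha=0.05):
    """Equation (1.1): smallest n with z_{1-alpha/2} * sigma / sqrt(n) <= max_error."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return math.ceil((z * sigma / max_error) ** 2)


# A 95% interval no wider than 20 mg/dl has a half width (maximum error) of 10 mg/dl.
print(n_for_mean_precision(sigma=40, max_error=10))  # 62
```

Note that the argument is the half width δ of the interval, so a total width of
20 mg/dl corresponds to max_error=10.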
1.2.2 Binary Outcomes
The study goal may be based on finding a suitably narrow confidence interval
for the statistics of interest at a given significance level (α), where the signif-
icance level is usually considered as the maximum probability of type I error
that can be tolerated. We may want to know how many subjects are required
for the 100(1 − α)% confidence interval to be a certain width.
Suppose that $Y_1, \ldots, Y_n$ are independent and identically distributed
Bernoulli random variables with mean $p = E(Y_i)$, $(i = 1, \ldots, n)$. The
parameter $p$ can be estimated by the sample mean $\hat{p} = \sum_{i=1}^{n} Y_i / n$.
For large $n$, $\hat{p}$ is asymptotically normal with mean $p$ and variance
$p(1-p)/n$. The $100(1-\alpha)\%$ confidence interval for $p$ is
$$\hat{p} \pm z_{1-\alpha/2} \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}.$$
Suppose that the maximum error of p is δ. Then, the sample size can be
estimated by
$$z_{1-\alpha/2} \sqrt{\frac{\hat{p}(1-\hat{p})}{n}} = \delta.$$
Thus, the required sample size is
$$n = \frac{z_{1-\alpha/2}^{2}\, \hat{p}(1-\hat{p})}{\delta^{2}}. \tag{1.2}$$
We can estimate the sample size from Equation (1.2) once the maximum error
or the width of the 100(1 − α)% confidence interval for p is specified. There
are a number of alternative ways to estimate the confidence interval for a
binomial proportion [4].
1.2.2.1 Example
Suppose that a clinical investigator is interested in conducting a clinical trial
with a new cancer drug to estimate the response rate with a maximum er-
ror of 20%. In oncology, the response rate (RR) is generally defined as the
proportion of patients whose tumor completely disappears (termed a complete
response, CR) or shrinks more than 50% after treatment (termed a partial re-
sponse, PR). In simpler terms, RR = PR + CR. An investigator expects the
response rate of a new cancer drug to be 30%. How many patients are needed
to achieve a maximum error of 20%? Let $\hat{p}$ be the estimate of the response
rate. The maximum error of the response rate is
$z_{1-\alpha/2}\sqrt{\hat{p}(1-\hat{p})/n}$. With the guessed value of
$\hat{p} = 0.3$, a maximum error of $p$ is $z_{1-\alpha/2}\sqrt{0.3 \cdot 0.7/n}$.
Thus, we need $z_{1-\alpha/2}\sqrt{0.3 \cdot 0.7/n} \leq 0.2$, or $n \geq 21$. That
is, we need at least 21 subjects to obtain a maximum error ≤ 20%. When we do not
know the value of $p$, a conservative approach is to use $\hat{p}$ that yields the
maximum error. The maximum error of $p$ occurs when $\hat{p} = 0.5$. So, a
conservative maximum error of $p$ is
$z_{1-\alpha/2}\sqrt{0.5 \cdot 0.5/n} = z_{1-\alpha/2} \cdot 0.5/\sqrt{n}$. Thus,
$1.96 \cdot 0.5/\sqrt{n} \leq 0.2$
at a 5% significance level. Therefore, the required sample size is n = 25. An
investigator should recruit at least 25 subjects to achieve a maximum error of
20% in the response rate estimation.
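The two calculations in this example follow directly from Equation (1.2). The
Python sketch below is illustrative only: the function name
n_for_proportion_precision is hypothetical, and a 5% significance level is
assumed as in the example.

```python
import math
from statistics import NormalDist


def n_for_proportion_precision(p_guess, max_error, alpha=0.05):
    """Equation (1.2): n = z_{1-alpha/2}^2 * p(1 - p) / delta^2, rounded up."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return math.ceil(z ** 2 * p_guess * (1 - p_guess) / max_error ** 2)


print(n_for_proportion_precision(p_guess=0.3, max_error=0.2))  # 21
print(n_for_proportion_precision(p_guess=0.5, max_error=0.2))  # 25 (conservative)
```

Because p(1 − p) is largest at p = 0.5, the second call gives the largest sample
size that any guessed response rate could require for this maximum error.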
The larger the sample size, the more precise the estimate of the parameter
will be if all the other factors are equal. An investigator should specify the
degree of precision that the study aims to achieve. The cost and duration of a
trial increase with its size. In order to estimate the sample size using
precision analysis, we need to decide how large the maximum error of the unknown
parameter may be or how wide the confidence interval for the unknown parameter
may be, and we need to know the formula for the relevant maximum error.
1.3 Power Analysis
Power analysis uses two types of errors (type I and II errors) for sample size
estimation while precision analysis uses only one type of error (type I error)
for sample size estimation. Power analysis tests the null hypothesis at a pre-
determined level of significance with a desired power.
1.3.1 Information Needed for Power Analysis
A clinical trial that is conducted without attention to sample size or power
information takes the risks of either failing to detect clinically meaningful
differences (i.e., type II error) or using an unnecessarily excessive number of
subjects for a study. Either case fails to adhere to the Ethical Guidelines of
the American Statistical Association, which say, “Avoid the use of excessive
or inadequate number of research subjects by making informed
recommendations for study size” [5]. The sample size estimate is important for
economic and ethical reasons [6]. An oversized clinical trial exposes more
subjects than necessary to a potentially harmful trial, and uses more resources
than necessary. An undersized clinical trial exposes the subjects to a
potentially harmful trial and leads to a waste of resources without producing
useful results. The sample size estimate will allow the estimation of total cost
of the proposed study. While the exact final number that will be used for anal-
ysis will be unknown due to missing information such as lack of demographic
information and clinical information, it is still desirable to determine a target
sample size based on the proposed study design. In this section, we describe
the general information needed to estimate the sample size for the trial.
1. Choose the primary endpoint
The primary endpoint should be chosen so that the primary objective of the
trial can be assessed, and the primary endpoint is generally used for sample
size estimation. The primary endpoint measures the outcome that will answer the
primary question being asked by a trial. Suppose that the primary hypothesis
is to test whether the new cancer drug yields longer overall survival than the
standard cancer drug. In this case, the primary endpoint is overall survival.
The sample size for a trial is determined by the power needed to detect a
clinically meaningful difference in overall survival at a given significance level.
Secondary hypotheses address other relevant questions from the
same trial. For example, a secondary hypothesis may be to test whether the new
cancer drug produces better quality of life than the standard cancer drug, or
whether the new cancer drug yields longer progression–free survival than the
standard cancer drug.
The sample size calculation depends on the type of primary endpoint. The
variable type of the primary outcome must be defined before sample size and
power calculations can be conducted. The variable type may be continuous,
categorical, ordinal, or survival. Categorical variables may have only two cat-
egories or more than two categories.
• A quantitative (or continuous) outcome representing a specific measure (e.g.,
total cholesterol, quality of life, or blood pressure). Mean and median can
be used to compare the primary endpoint between treatment groups.
• A binary outcome indicating occurrence of an event (e.g., the occurrence of
myocardial infarction, or the occurrence of recurrent disease). Odds ratio,
risk difference, and risk ratio can be used to compare the primary endpoint
between treatment groups.
• Survival outcome for the time to occurrence of an event of interest (e.g., the
time from study entry to death, or time to progression). A Kaplan–Meier
survival curve is often used to graphically display the time to the event, and
log–rank test or Cox regression analysis is frequently used to test if there is
a significant difference in the treatment effect between treatment groups.
2. Determine the hypothesis of interest
The primary purpose of a clinical trial is to address a scientific hypothesis,
which is usually related to the evaluation of the efficacy and safety of a drug
product. To address a hypothesis, different statistical methods are used de-
pending on the type of question to be answered. Most often the hypothesis is
related to the effect of one treatment as compared to another. For example,
one trial could compare the effectiveness of a new drug to that of a standard
drug. Yet the specific comparison to be performed will depend on the hypoth-
esis to be addressed. Let µ1 and µ2 be the mean responses of a new drug and
a standard drug, respectively.
• A superiority test is designed to detect a meaningful difference in mean
response between a standard drug and a new drug [7]. The primary objective
is to show that the mean response of a new drug is different from that of a
standard drug.
H0 : µ1 = µ2 versus H1 : µ1 ≠ µ2
The null hypothesis (H0) says that the two drugs are not different with
respect to the mean response (µ1 = µ2). The alternative hypothesis (H1)
says that the two drugs are different with respect to the mean response
(µ1 ≠ µ2). The statistical test is a two–sided test since there are two chances
of rejecting the null hypothesis (µ1 > µ2 or µ1 < µ2) with each side allocated
an equal amount of the type I error of α/2.
If the alternative hypothesis is µ1 > µ2 or µ1 < µ2 instead of µ1 ≠ µ2, then
the statistical test is referred to as a one–sided test since there is only one
chance of rejecting the null hypothesis with one side allocated the type I
error of α.
• An equivalence test is designed to confirm the absence of a meaningful dif-
ference between a standard drug and a new drug. The primary objective is
to show that the mean responses to two drugs differ by an amount that is
clinically unimportant. This is usually demonstrated by showing that the
absolute difference in mean responses between drugs is likely to lie within
an equivalence margin (∆) of clinically acceptable differences.
H0 : |µ1 − µ2| ≥ ∆ versus H1 : |µ1 − µ2| < ∆
The null hypothesis (H0) says that the two drugs are different with respect
to the mean response (|µ1 − µ2| ≥ ∆). The alternative hypothesis (H1)
says that the two drugs are not different with respect to the mean response
(|µ1 − µ2| < ∆). In an equivalence test, an investigator wants to test if
the difference between a new drug and a standard drug is of no clinical
importance. This is to test for equivalence of two drugs.
The null hypothesis is expressed as a union (µ1 − µ2 ≥ ∆ or µ1 − µ2 ≤ −∆)
and the alternative hypothesis (H1) as an intersection (−∆ < µ1 − µ2 <
∆). Each component of the null hypothesis needs to be rejected to conclude
equivalence.
• A non–inferiority test is designed to show that a new drug is not less effective
than a standard drug by more than ∆, the margin of non–inferiority. The
null and alternative hypotheses can be specified as:
H0 : µ1 − µ2 ≤ −∆ versus H1 : µ1 − µ2 > −∆
The null hypothesis (H0) says that a new drug is inferior to a standard drug
with respect to the mean response. The alternative hypothesis (H1) says
that a new drug is non–inferior to a standard drug with respect to the mean
response. That is, the alternative hypothesis of a non–inferiority trial states
that a standard drug may indeed be more effective than a new drug, but
by no more than ∆. In phase III clinical trials that compare a new drug with
a standard drug, non–inferiority trials are more common than equivalence
trials since it is only the non–inferiority limit that is usually of interest. This
is to test for non–inferiority of the new drug.
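The three sets of hypotheses above can be contrasted with a small Python sketch.
It is only an illustration under simplifying assumptions that are not stated in
the text: the estimate of µ1 − µ2 is normally distributed with a known standard
error, the superiority test is two–sided at level α, and each one–sided component
of the equivalence and non–inferiority tests is carried out at level α. The
function name trial_decisions and the input values are hypothetical.

```python
from statistics import NormalDist


def trial_decisions(diff, se, margin, alpha=0.05):
    """Report which alternative hypotheses the estimate supports.

    diff   : estimate of mu1 - mu2 (new drug minus standard drug)
    se     : standard error of diff, assumed known
    margin : the margin Delta (> 0) for equivalence and non-inferiority
    """
    z_two = NormalDist().inv_cdf(1 - alpha / 2)   # two-sided critical value
    z_one = NormalDist().inv_cdf(1 - alpha)       # one-sided critical value
    reject_lower = (diff + margin) / se > z_one   # rejects mu1 - mu2 <= -Delta
    reject_upper = (diff - margin) / se < -z_one  # rejects mu1 - mu2 >= Delta
    return {
        "superiority": abs(diff / se) > z_two,
        "non_inferiority": reject_lower,
        # Both components of the union null must be rejected.
        "equivalence": reject_lower and reject_upper,
    }


print(trial_decisions(diff=0.5, se=1.0, margin=3.0))
```

With these hypothetical inputs the difference is too small to reject µ1 = µ2, yet
both equivalence and non–inferiority are concluded, which shows that the three
tests answer different questions.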
Choice of hypothesis depends on which scientific question an investigator is
trying to answer. All the above hypothesis tests are useful in the development
of drugs. In comparison studies with a standard drug, a non–inferiority trial is
used to demonstrate that a new drug provides at least the same benefit to the
subject as a standard drug. Non–inferiority trials are commonly used when a
new drug is easier to administer, less expensive, and less toxic than a standard
drug. Equivalence trials are used to show that a new drug is identical (within
an acceptable range) to a standard drug. This is used in the registration and
approval of biosimilar drugs that are shown to be equivalent to their branded
reference drugs [8]. Most equivalence trials are bioequivalence trials that aim
to compare a generic drug with the original branded reference drug.
3. Determine ∆
Sample size calculation depends on the hypothesis of interest. For a superiority
test, the necessary sample size depends on the clinically meaningful difference
(∆). In superiority trials, fewer subjects will be needed for a larger value of
∆ while more subjects will be needed for a smaller value of ∆. For instance,
we can detect a 40% difference in efficacy with a modest number of subjects.
However, a larger number of subjects will be needed to reliably detect a 10%
difference in efficacy. Because sample size is inversely related to the square of
∆, even a slightly misspecified difference can lead to a large change in the
sample size. Clinically meaningful differences are commonly specified using
one of two approaches. One is to select the drug effect deemed important to
detect, and the other is to calculate the sample size according to the best
guess concerning the true effect of drug [9].
For an equivalence test, the required sample size depends on the margin of
clinical equivalence. In an equivalence test, the equivalence margin of clinically
acceptable difference (∆) depends on the disease being studied. For example,
an absolute difference of 1% is often used as the clinically meaningful differ-
ence in thrombolytic trials while a 20% difference is considered as clinically
meaningful in most other situations including migraine headache [10]. Bioe-
quivalence trials aim to show the equivalent pharmacokinetic profile through
the most commonly used pharmacokinetic variables such as area under the
curve (AUC) and maximum concentration (Cmax). Average bioequivalence is
widely used for comparison of a generic drug with the original branded drug.
The 80/125 rule is currently used as regulation for the assessment of average
bioequivalence [11]. For average bioequivalence, the FDA [11] recommends
that the geometric means ratio between the test drug and the reference drug
is within 80% and 125% for the bioavailability measures (AUC and Cmax).
For a non–inferiority test, the necessary sample size depends on the up-
per bound for non–inferiority. Setting the non–inferiority margin is a major
issue in designing a non–inferiority trial. The Food and Drug Administration
[12] and the European Medicines Agency [13] issued guidances on the choice
of non–inferiority margin. The choice of the non–inferiority margin needs to
take account of both statistical reasoning and clinical judgement. An appro-
priate selection of non–inferiority margin should provide assurance that a new
drug has a clinically relevant superiority over placebo, and a new drug is not
substantially inferior to a standard drug, which results in a tighter margin.
The clinically or scientifically meaningful margin (∆) needs to be specified
to estimate the number of subjects for the trial since the purpose of the sample
size estimation is to provide sufficient power to reject the null hypothesis when
the alternative hypothesis is true.
In this book, we restrict the sample size estimation to a superiority test,
which is most commonly used in clinical trials. Julious [7, 14, 15] and Chow
et al. [3] provided general sample size formulas for equivalence trials and non–
inferiority trials.
4. Determine the variance of the primary endpoint
The variance of the primary endpoint is usually unknown in advance. In cross-
sectional studies, the variance or the standard deviation is generally obtained
from either previous studies or pilot studies. However, for correlated outcomes
such as clustered outcomes or repeated measurement outcomes, the variance of
the primary endpoint generally needs to be estimated utilizing various sources
of information such as missing proportion, correlation among measurements,
and the number of measurements, etc. Detailed description of the estimation
of the variance for correlated outcomes will be given in later chapters. A large
variance will lead to a large sample size for a study. That is, as the variance
increases, the sample size increases.
5. Choose type I error and power
Type I error (α) is the probability of rejecting the null hypothesis when the
null hypothesis is actually true. Type II error (β) is the probability of not
rejecting the null hypothesis when it is actually false. The aim of the sample
size calculation is to estimate the minimal sample size required to meet the
objectives of the study for a fixed probability of type I error to achieve a desired
power, which is defined as 1 − β. The power is the probability of rejecting the
null hypothesis when it is actually false. A two–sided type I error of 5% is
commonly used to reflect a 95% confidence interval for an unknown parameter,
and this is familiar to most investigators as the conventional benchmark of
5%. As α decreases, the sample size increases. For example, a study with an α
level of 0.01 requires a larger sample size than a study with an α level of 0.05.
Typically, the sample size is computed to provide a fixed level of power
under a specified alternative hypothesis. The alternative hypothesis usually
represents a minimal clinically or scientifically meaningful difference in efficacy
between treatment groups. Power (1 − β) is an important consideration in
sample size determination. Low power can cause a true difference in a clinical
outcome between study groups to go undetected. However, too much power
may make results statistically significant when results do not show a clinically
meaningful difference.
When there is a large difference such as a 100% real difference in thera-
peutic efficacy between a standard drug and a new drug, it is unlikely to be
missed by most studies. That is, type II error (β) is small when there is a large
difference in therapeutic efficacy. However, type II error is a common problem
in studies that aim to distinguish between a standard drug and a new drug
that may differ in therapeutic efficacy by only a small amount such as 1% or
5%. The number of subjects must be drastically increased to reduce type II
error when the aim is to discriminate a small difference between a standard
drug and a new drug. Otherwise, there is a high chance of incorrectly over-
looking small differences in therapeutic efficacy with an insufficient number of
subjects. Type II error (β) of 10% or 20% is commonly used for sample size
estimation. That is, the power (1 − β) of 80% or 90% is widely used for the
design of the study. The higher the power, the less likely the risk of type II
error. The power increases as the sample size increases. A sufficient sample
size ensures that the study is able to reliably detect a true difference and is
not underpowered.
6. Select a statistical method for data analysis
A statistical method for sample size estimation should adequately align with
the statistical method for data analysis [16]. For example, an investigator
would like to test whether there is a significant difference in total cholesterol
levels between those who take a new drug and those who take a standard drug.
The investigator plans to analyze the data using a two–sample t–test. In this
case, a sample size calculation based on a two-group chi–square test with
dichotomization of total cholesterol levels would be inappropriate since the
statistical method used for power analysis is different from that to be used
for data analysis. Discrepancy between the statistical method for sample size
estimation and the statistical method for data analysis can lead to a sample
size that is too large or too small. The statistical method used for sample size
calculation should be the same as that used for data analysis.
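The qualitative statements in items 3 to 5 (a smaller ∆, a larger variance, a
smaller α, or a higher power each increase the sample size) can be illustrated
numerically. The Python sketch below assumes the normal-approximation formula
for a two–sided test, n = (z1−α/2 + z1−β)²σ²/∆², which is the form of Equation
(1.4) later in this chapter; the function name n_unrounded and the values of ∆
and σ are hypothetical.

```python
from statistics import NormalDist


def n_unrounded(delta, sigma, alpha=0.05, power=0.80):
    """Unrounded n = (z_{1-alpha/2} + z_{1-beta})^2 * sigma^2 / delta^2."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)  # z_{1-beta}, since power = 1 - beta
    return (z_alpha + z_beta) ** 2 * sigma ** 2 / delta ** 2


base = n_unrounded(delta=10, sigma=20)
print(n_unrounded(delta=5, sigma=20) / base)               # halving Delta: about 4
print(n_unrounded(delta=10, sigma=40) / base)              # doubling sigma: about 4
print(n_unrounded(delta=10, sigma=20, alpha=0.01) > base)  # smaller alpha: True
print(n_unrounded(delta=10, sigma=20, power=0.90) > base)  # higher power: True
```

The factor of four in the first two lines reflects the inverse relation between
the sample size and the square of ∆, and the direct relation with the variance.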
1.3.2 One–Sample Test for Means
We illustrate the sample size calculation using a one–sided test through an
example. Suppose that the total cholesterol levels for male college students are
normally distributed with a mean (µ) of 180 mg/dl and a standard deviation
(σ) of 80 mg/dl. Suppose that an investigator would like to examine whether
the mean total cholesterol level of the physically inactive male college students
is higher than 180 mg/dl using a one–sided 5% significance level (α). That is,
an investigator would like to test the hypotheses: H0 : µ = µ0 = 180 mg/dl
(or µ ≤ 180 mg/dl) versus H1 : µ > 180 mg/dl assuming that the standard
deviation of the total cholesterol level is the same as that of male college
students. An investigator wants to risk a 10% chance (90% power) of failing
to reject the null hypothesis when the true mean (µ1) of the total cholesterol
level is as large as 210 mg/dl. How many subjects are needed to detect 30
mg/dl difference in total cholesterol level from the population mean of 180
mg/dl at a one–sided 5% significance level and a power of 90%?
For α = 0.05, we would reject the null hypothesis (H0) if the average total
cholesterol level is greater than the critical value (C) in Figure 1.1, where
$C = \mu_0 + z_{1-\alpha} \cdot \sigma/\sqrt{n} = 180 + 1.645 \cdot 80/\sqrt{n}$.
If the true mean is 210 mg/dl with a power of 90% (β = 0.1), we would not reject
the null hypothesis when the sample average is less than
$C = \mu_1 + z_{\beta} \cdot \sigma/\sqrt{n} = 210 - 1.282 \cdot 80/\sqrt{n}$.
The sample size (n) can be estimated by setting two equations equal to each
other:
$$180 + 1.645 \cdot 80/\sqrt{n} = 210 - 1.282 \cdot 80/\sqrt{n}.$$
Therefore, the required number of subjects is
$$n = \frac{(1.645 + 1.282)^2 \cdot 80^2}{(180 - 210)^2} \approx 60.9,$$
which is rounded up to 61 subjects.
In general, the estimated sample size for a one–sided test for testing
$H_0: \mu = \mu_0$ versus $H_1: \mu = \mu_1 > \mu_0$ with a significance level of
α and a power of 1 − β is the smallest integer that is larger than or equal to n
satisfying the following equation
$$n = \frac{(z_{1-\alpha} + z_{1-\beta})^2 \sigma^2}{(\mu_0 - \mu_1)^2}. \tag{1.3}$$
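Equation (1.3) can be evaluated with a short Python sketch. The function name
n_one_sided_mean is hypothetical, and σ is assumed known; exact normal quantiles
are used in place of the rounded values 1.645 and 1.282.

```python
import math
from statistics import NormalDist


def n_one_sided_mean(mu0, mu1, sigma, alpha=0.05, power=0.90):
    """Equation (1.3): one-sided, one-sample test for a mean with known sigma."""
    z_alpha = NormalDist().inv_cdf(1 - alpha)  # z_{1-alpha}
    z_beta = NormalDist().inv_cdf(power)       # z_{1-beta}
    return math.ceil((z_alpha + z_beta) ** 2 * sigma ** 2 / (mu0 - mu1) ** 2)


# Cholesterol example: mu0 = 180, mu1 = 210, sigma = 80 mg/dl.
print(n_one_sided_mean(mu0=180, mu1=210, sigma=80))  # 61
```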
We will show how the sample size can be estimated for a two–sided one–
sample test. Let n be the number of subjects. Let Yi denote the response for
subject $i$, $(i = 1, \ldots, n)$, and $\bar{y}$ be the sample mean. We assume that
the $Y_i$'s are independent and normally distributed random variables with mean
$\mu_0$ and variance $\sigma^2$. Suppose that we want to test the hypotheses
$H_0: \mu = \mu_0$ versus $H_1: \mu = \mu_1 \neq \mu_0$.
FIGURE 1.1
Sample size estimation for a one–sided test in a one–sample problem
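When $\sigma^2$ is known, we reject the null hypothesis at the significance level
α if
$$\left| \frac{\bar{y} - \mu_0}{\sigma/\sqrt{n}} \right| > z_{1-\alpha/2},$$
where $z_{1-\alpha/2}$ is the $100(1 - \alpha/2)$th percentile of the standard
normal distribution. Under the alternative hypothesis ($H_1: \mu = \mu_1$), the
power is given by
$$\Phi\left( \frac{\sqrt{n}(\mu_1 - \mu_0)}{\sigma} - z_{1-\alpha/2} \right) + \Phi\left( -\frac{\sqrt{n}(\mu_1 - \mu_0)}{\sigma} - z_{1-\alpha/2} \right),$$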
where Φ is the cumulative standard normal distribution function. By ignor-
ing the small value of the second term in the above equation, the power is
approximated by the first term. Thus, the sample size required to achieve the
power of 1 − β can be obtained by solving the following equation
$$\frac{\sqrt{n}(\mu_1 - \mu_0)}{\sigma} - z_{1-\alpha/2} = z_{1-\beta}.$$
The required sample size is the smallest integer that is larger than or equal
to n satisfying the following equation
$$n = \frac{(z_{1-\alpha/2} + z_{1-\beta})^2 \sigma^2}{(\mu_1 - \mu_0)^2}. \tag{1.4}$$
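The size of the term that is dropped from the power expression can be checked
numerically. The Python sketch below computes both terms of the two–sided power
under a known σ; the function name power_terms is hypothetical, and the inputs
(µ0 = 180, µ1 = 210, σ = 80, n = 56) are chosen only for illustration.

```python
import math
from statistics import NormalDist


def power_terms(n, mu0, mu1, sigma, alpha=0.05):
    """Return the two terms of the two-sided power for a one-sample z-test."""
    std_normal = NormalDist()
    z = std_normal.inv_cdf(1 - alpha / 2)
    shift = math.sqrt(n) * (mu1 - mu0) / sigma
    first = std_normal.cdf(shift - z)    # rejection in the direction of mu1
    second = std_normal.cdf(-shift - z)  # rejection in the opposite direction
    return first, second


first, second = power_terms(n=56, mu0=180, mu1=210, sigma=80)
print(first, second)
```

For these inputs the first term is slightly above 0.80 while the second is
below 0.00001, which supports approximating the power by the first term when
µ1 > µ0.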
If the population variance $\sigma^2$ is unknown, $\sigma^2$ can be estimated by the
sample variance $s^2 = \sum_{i=1}^{n} (y_i - \bar{y})^2/(n - 1)$, which is an
unbiased estimator of $\sigma^2$. For large n, we reject the null hypothesis
$H_0: \mu = \mu_0$ at the significance level α if
$$\left| \frac{\bar{y} - \mu_0}{s/\sqrt{n}} \right| > z_{1-\alpha/2}.$$
Therefore, the sample size estimates for a one–sided test and a two–sided test
can be obtained by replacing $\sigma^2$ by $s^2$ in Equations (1.3) and (1.4).
1.3.2.1 Example
Consider the design of a single-arm psychiatric study that evaluates the effect
of a test drug on cognitive functioning of children with mental retardation
before and after administration of a test drug. A pilot study shows that the
mean difference in cognitive functioning before and after taking a test drug
was 6 with a standard deviation equal to 9. We would like to estimate the
sample size needed to detect the mean difference of 6 in cognitive functioning
to achieve 80% power at a two–sided 5% significance level assuming a stan-
dard deviation of 9. Let µ denote the mean difference in cognitive functioning
between pre- and post-drug administration. The null hypothesis H0 : µ = 0
is to be tested against the alternative hypothesis H1 : µ = 6. From
Equation (1.4), $n = (1.960 + 0.842)^2 \cdot 9^2/6^2 = 17.7$. Therefore, a sample size of
18 subjects is needed to detect a change in mean difference of 6 in cognitive
functioning, assuming a standard deviation of 9 using a normal approximation
with a two–sided significance level of 5% and a power of 80%.
1.3.2.2 Example
Concerning the effect of a test drug on systolic blood pressure before and
after the treatment, a pilot study shows that the mean change in systolic blood
pressure after a 4–month administration of a test drug was 15 mm Hg with a
standard deviation of 40 mm Hg. We would like to estimate the sample size
needed to detect a change of 15 mm Hg in systolic blood pressure to achieve 80%
power at a two–sided 5% significance level assuming the standard deviation of 40
mm Hg. From Equation (1.4), $n = (1.960 + 0.842)^2 \cdot 40^2/15^2 = 55.8$.
Therefore, a
sample size of 56 subjects will have 80% power to detect a change in mean
of 15 mm Hg in systolic blood pressure, assuming a standard deviation of 40
mm Hg at a two–sided 5% significance level.
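The two examples above can be reproduced with a short sketch. It assumes that Equation (1.4) has the form $n = (z_{1-\alpha/2} + z_{1-\beta})^2 \sigma^2/\Delta^2$, which is what the arithmetic in the examples implies; the function name `n_one_sample_mean` and its parameters are illustrative and not taken from the text.

```python
import math
from statistics import NormalDist


def n_one_sample_mean(delta, sigma, alpha=0.05, power=0.80, two_sided=True):
    """Normal-approximation sample size for a one-sample test of a mean.

    delta: difference to detect; sigma: standard deviation (assumed known).
    """
    z = NormalDist()  # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2) if two_sided else z.inv_cdf(1 - alpha)
    z_beta = z.inv_cdf(power)  # z_{1-beta}
    return (z_alpha + z_beta) ** 2 * sigma ** 2 / delta ** 2


# Cognitive functioning example: difference 6, standard deviation 9
n_cog = n_one_sample_mean(delta=6, sigma=9)
# Systolic blood pressure example: difference 15 mm Hg, standard deviation 40 mm Hg
n_sbp = n_one_sample_mean(delta=15, sigma=40)

print(round(n_cog, 1), math.ceil(n_cog))
print(round(n_sbp, 1), math.ceil(n_sbp))
```

Rounding up with `math.ceil` mirrors the text's practice of reporting the next whole subject.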
1.3.3 One–Sample Test for Proportions
Let Yi denote a binary response variable of the ith subject with p = E(Yi),
(i = 1, . . . , n), where n is the number of subjects in the trial. For example, Yi
can denote the response or non–response in cancer clinical trials, where Yi = 0
denotes non–response, and Yi = 1 denotes response, which includes either
complete response or partial response. The response rate can be estimated by the observed proportion $\hat{p} = \sum_{i=1}^{n} Y_i/n$, where $n$ is the number of subjects.
We illustrate the sample size calculation using the one–sided test. Suppose we
wish to test the null hypothesis H0 : p = p0 versus the alternative hypothesis
$H_1: p = p_1 > p_0$ at the one–sided significance level of $\alpha$. Under the null hypothesis, the test statistic
$$Z = \frac{\hat{p} - p_0}{\sqrt{\hat{p}(1 - \hat{p})/n}}$$
approximately has a standard normal distribution for large $n$. We reject the null hypothesis at a significance level $\alpha$ if the test statistic $Z$ is greater than $z_{1-\alpha}$.
For $\alpha = 0.05$, we would reject the null hypothesis ($H_0$) if the average response rate is greater than the critical value ($C$), where $C = p_0 + z_{1-\alpha}\sqrt{p_0(1 - p_0)/n}$. If the alternative hypothesis is true, that is, if the true response rate is $p_1$, we would not reject the null hypothesis if the response rate is less than $C = p_1 + z_{\beta}\sqrt{p_1(1 - p_1)/n}$.
By setting the two equations equal, we get
$$p_0 + z_{1-\alpha}\sqrt{p_0(1 - p_0)/n} = p_1 + z_{\beta}\sqrt{p_1(1 - p_1)/n}.$$
The required sample size to test $H_0: p = p_0$ versus $H_1: p = p_1 > p_0$ at a one–sided significance level of $\alpha$ and a power of $1 - \beta$ is
$$n = \frac{\left(z_{1-\alpha}\sqrt{p_0(1 - p_0)} + z_{1-\beta}\sqrt{p_1(1 - p_1)}\right)^2}{(p_1 - p_0)^2}.$$
The sample size for a two–sided test $H_0: p = p_0$ versus $H_1: p = p_1$ for $p_1 \neq p_0$ can be obtained by replacing $z_{1-\alpha}$ by $z_{1-\alpha/2}$ as shown in a one–sample test for means:
$$n = \frac{\left(z_{1-\alpha/2}\sqrt{p_0(1 - p_0)} + z_{1-\beta}\sqrt{p_1(1 - p_1)}\right)^2}{(p_1 - p_0)^2}. \tag{1.5}$$
1.3.3.1 Example
Consider the design of a single-arm oncology clinical trial that evaluates if a
new molecular therapy has at least a 40% response rate. Let p be the response
rate of a new molecular therapy. We would like to estimate the sample size
needed to test the null hypothesis $H_0: p = p_0 = 0.20$ against the alternative hypothesis $H_1: p = p_1 \neq p_0$. The trial is designed based on a two–sided test that achieves 80% power at $p = p_1 = 0.40$ with a two–sided 5% significance level. From Equation (1.5),
$$n = \frac{\left(1.96\sqrt{0.2(1 - 0.2)} + 0.842\sqrt{0.4(1 - 0.4)}\right)^2}{(0.4 - 0.2)^2} = 35.8.$$
The required number of subjects is 36 to detect the difference between the
null hypothesis proportion of 0.2 and the alternative proportion of 0.4 at a
two–sided significance level of 5% and a power of 80%.
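As a minimal sketch, Equation (1.5) and its one–sided counterpart can be evaluated as follows. The function name `n_one_sample_proportion` and its keyword arguments are illustrative choices, not part of the text.

```python
import math
from statistics import NormalDist


def n_one_sample_proportion(p0, p1, alpha=0.05, power=0.80, two_sided=True):
    """Normal-approximation sample size for testing H0: p = p0 against p = p1."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2) if two_sided else z.inv_cdf(1 - alpha)
    z_beta = z.inv_cdf(power)  # z_{1-beta}
    # Null variance is used with z_alpha, alternative variance with z_beta
    numerator = (z_alpha * math.sqrt(p0 * (1 - p0))
                 + z_beta * math.sqrt(p1 * (1 - p1))) ** 2
    return numerator / (p1 - p0) ** 2


# Oncology example: p0 = 0.20, p1 = 0.40, two-sided 5% level, 80% power
n_onc = n_one_sample_proportion(p0=0.20, p1=0.40)
print(round(n_onc, 1), math.ceil(n_onc))
```

Setting `two_sided=False` gives the one–sided formula that precedes Equation (1.5).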
1.3.4 Two–Sample Test for Means
Suppose that $Y_{1i}$ $(i = 1, \ldots, n_1)$ and $Y_{2i}$ $(i = 1, \ldots, n_2)$ represent observations from groups 1 and 2, and $Y_{1i}$ and $Y_{2i}$ are independent and normally distributed with means $\mu_1$ and $\mu_2$ and variances $\sigma_1^2$ and $\sigma_2^2$, respectively. Let’s consider a one–sided test. Suppose that we want to test the hypotheses $H_0: \mu_1 = \mu_2$ versus $H_1: \mu_1 > \mu_2$.
Let $\bar{y}_1$ and $\bar{y}_2$ be the sample means of $Y_{1i}$ and $Y_{2i}$. Assume that the variances $\sigma_1^2$ and $\sigma_2^2$ are known, and $n_1 = n_2 = n$. Then, the Z–test statistic can be written as
$$Z = \frac{\bar{y}_1 - \bar{y}_2}{\sqrt{\sigma_1^2/n + \sigma_2^2/n}}.$$
Under the null hypothesis ($H_0$), the test statistic $Z$ is normally distributed with mean 0 and variance 1. Thus, we reject the null hypothesis if $Z > z_{1-\alpha}$. Under the alternative hypothesis ($H_1$), let $\mu_1 - \mu_2 = \Delta$, which is the clinically meaningful difference to be detected. Then, under the alternative hypothesis ($H_1$), the expected value of $(\bar{y}_1 - \bar{y}_2)$ is $\Delta$, and $Z$ follows the normal distribution with mean $\mu^*$ and variance 1, where $\mu^* = \Delta/\sqrt{\sigma_1^2/n + \sigma_2^2/n}$.
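Under the null hypothesis ($H_0$),
$$P\{Z > z_{1-\alpha} \mid H_0\} < \alpha.$$
Similarly, under the alternative hypothesis ($H_1$),
$$P\{Z > z_{1-\alpha} \mid H_1\} > 1 - \beta.$$
That is,
$$P\left\{\frac{\bar{y}_1 - \bar{y}_2}{\sqrt{\sigma_1^2/n + \sigma_2^2/n}} > z_{1-\alpha} \,\Big|\, H_1\right\} > 1 - \beta.$$
Under the alternative hypothesis, the expected value of $(\bar{y}_1 - \bar{y}_2)$ is $\Delta$. Thus,
$$P\left\{\frac{(\bar{y}_1 - \bar{y}_2) - \Delta}{\sqrt{\sigma_1^2/n + \sigma_2^2/n}} > z_{1-\alpha} - \frac{\Delta}{\sqrt{\sigma_1^2/n + \sigma_2^2/n}} \,\Big|\, H_1\right\} > 1 - \beta.$$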
The above equation can be written as follows due to the symmetry of the normal distribution:
$$z_{1-\alpha} - \frac{\Delta}{\sqrt{\sigma_1^2/n + \sigma_2^2/n}} = z_{\beta} = -z_{1-\beta}.$$
The simple manipulation yields the required sample size per group assuming equal allocation of subjects in each group,
$$n = \frac{(\sigma_1^2 + \sigma_2^2)(z_{1-\alpha} + z_{1-\beta})^2}{\Delta^2}.$$
If $\sigma_1^2 = \sigma_2^2 = \sigma^2$, then the required sample size per group is
$$n = \frac{2\sigma^2(z_{1-\alpha} + z_{1-\beta})^2}{\Delta^2}. \tag{1.6}$$
In some randomized clinical trials, more subjects are assigned to the treat-
ment group than to the control group to encourage participation of subjects
in a trial due to their higher chance of being randomized to the treatment
group than the control group. Let n1 = n be the number of subjects in the
control group and n2 = kn be the number of subjects in the treatment group.
Then, the sample size for the study will be
$$n_1 = n = (1 + 1/k)\,\sigma^2\,\frac{(z_{1-\alpha} + z_{1-\beta})^2}{\Delta^2}. \tag{1.7}$$
The total sample size for the trial is $n_1 + n_2 = (1 + k)n$. Relative to a trial with an equal number of subjects in each group, the total sample size required to maintain the same power and type I error rate is multiplied by $(1 + k)(1 + 1/k)/4 = (2 + k + 1/k)/4$. For example, a trial that randomizes subjects in a 2:1 ratio requires a 12.5% larger sample size in order to maintain the same power as a trial with a 1:1 randomization.
The sample size needed to detect the difference in means between two
groups with a two–sided test can be obtained by replacing z1−α by z1−α/2 as
shown in a one–sample test for means:
$$n_1 = n = (1 + 1/k)\,\sigma^2\,\frac{(z_{1-\alpha/2} + z_{1-\beta})^2}{\Delta^2}. \tag{1.8}$$
If the population variance $\sigma^2$ is unknown, $\sigma^2$ can be estimated by the sample pooled variance $s^2 = \left\{\sum_{i=1}^{n_1}(y_{1i} - \bar{y}_1)^2 + \sum_{i=1}^{n_2}(y_{2i} - \bar{y}_2)^2\right\}/(n_1 + n_2 - 2)$, which is an unbiased estimator of $\sigma^2$. For large $n_1$ and $n_2$, we reject the null hypothesis $H_0: \mu_1 = \mu_2$ against the alternative hypothesis $H_1: \mu_1 \neq \mu_2$ at the significance level $\alpha$ if the absolute value of the test statistic $Z$ is greater than $z_{1-\alpha/2}$, where
$$Z = \frac{\bar{y}_1 - \bar{y}_2}{s\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}.$$
If $n_1 = n$ and $n_2 = kn$, the Z test statistic becomes
$$Z = \frac{\bar{y}_1 - \bar{y}_2}{s\sqrt{\frac{k+1}{kn}}}.$$
Therefore, the sample size estimates for a one–sided test and a two–sided test can be obtained by replacing $\sigma^2$ by $s^2$ in Equations (1.7) and (1.8).
1.3.4.1 Example
In a prior randomized clinical trial [17] investigating the effect of propranolol versus no propranolol in geriatric patients with New York Heart Association functional class II or III congestive heart failure (CHF), the changes in mean left ventricular ejection fraction (LVEF) from baseline to 1 year after treatment were 6% and 2% for propranolol and no propranolol groups, respectively. We will conduct a two–arm randomized clinical trial with a placebo and a new beta blocker drug to investigate if patients taking the new drug significantly improve LVEF after 1 year compared with patients taking placebo. We assume a similar increase in LVEF as in the prior study and a common standard deviation of 8% in changes in LVEF from baseline to 1 year after treatment. How many subjects are needed to test the superiority of the new drug in improving LVEF over placebo with a two–sided 5% significance level and 80% power? The required sample size is
$$n = \frac{2\sigma^2(z_{1-\alpha/2} + z_{1-\beta})^2}{\Delta^2} = 2 \cdot 8^2 \cdot (1.960 + 0.842)^2/4^2 = 62.8.$$
The required sample size is 63 subjects per group.
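The following sketch evaluates Equations (1.7) and (1.8) for a general allocation ratio $k$ and checks the relative total sample size $(2 + k + 1/k)/4$. The function names `n_two_sample_means` and `relative_total_size` are hypothetical; the common standard deviation is assumed known.

```python
import math
from statistics import NormalDist


def n_two_sample_means(delta, sigma, k=1.0, alpha=0.05, power=0.80, two_sided=True):
    """Control-group size n1 when the treatment group has n2 = k * n1 subjects."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2) if two_sided else z.inv_cdf(1 - alpha)
    z_beta = z.inv_cdf(power)  # z_{1-beta}
    return (1 + 1 / k) * sigma ** 2 * (z_alpha + z_beta) ** 2 / delta ** 2


def relative_total_size(k):
    """Total sample size under k:1 allocation relative to 1:1 allocation."""
    return (2 + k + 1 / k) / 4


# LVEF example: difference 6% - 2% = 4%, common standard deviation 8%, k = 1
n_lvef = n_two_sample_means(delta=4, sigma=8)
print(round(n_lvef, 1), math.ceil(n_lvef))

# 2:1 allocation: compare totals computed directly with the closed form
n1_unequal = n_two_sample_means(delta=4, sigma=8, k=2)
total_ratio = (n1_unequal * (1 + 2)) / (2 * n_lvef)
print(total_ratio, relative_total_size(2))
```

Both ratios are computed before rounding to whole subjects, so they agree up to floating-point error.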
1.3.5 Two–Sample Test for Proportions
In a randomized clinical trial subjects are randomly assigned to one of two
treatment groups. Let Yij be the binary random variable (Yij = 1 for response,
0 for no response) of the jth subject in the ith treatment, j = 1, . . . , ni, and
$i = 1, 2$. We assume that the $Y_{ij}$'s are independent and identically distributed with $E(Y_{ij}) = p_i$ for a fixed $i$. The response rate $p_i$ is usually estimated by the observed proportion in the $i$th treatment group:
$$\hat{p}_i = \sum_{j=1}^{n_i} Y_{ij}/n_i.$$
Let p1 and p2 be the response rates of control and treatment arms, respec-
tively. The sample sizes are n1 and n2 in each treatment group, respectively.
Suppose that an investigator wants to test whether there is a difference in
the response rates between control and treatment arms. The null (H0) and
alternative (H1) hypotheses are:
$H_0$: The response rates are equal ($p_1 = p_2$).
$H_1$: The response rates are different ($p_1 \neq p_2$).
We reject the null hypothesis $H_0: p_1 = p_2$ at the significance level of $\alpha$ if
$$\left|\frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}_1(1 - \hat{p}_1)/n_1 + \hat{p}_2(1 - \hat{p}_2)/n_2}}\right| > z_{1-\alpha/2}.$$
Under the alternative hypothesis, the power of the test is approximated by
$$\Phi\left(\frac{|p_1 - p_2|}{\sqrt{p_1(1 - p_1)/n_1 + p_2(1 - p_2)/n_2}} - z_{1-\alpha/2}\right).$$
The sample size estimate needed to achieve a power of $1 - \beta$ can be obtained by solving the following equation:
$$\frac{|p_1 - p_2|}{\sqrt{p_1(1 - p_1)/n_1 + p_2(1 - p_2)/n_2}} - z_{1-\alpha/2} = z_{1-\beta}.$$
When $n_2 = k \cdot n_1$, $n_1$ can be written as
$$n_1 = \frac{(z_{1-\alpha/2} + z_{1-\beta})^2}{(p_1 - p_2)^2}\left[p_1(1 - p_1) + p_2(1 - p_2)/k\right].$$
Under equal allocation, $n_1 = n_2 = n$, the required sample size per group is
$$n_1 = n_2 = n = \frac{(z_{1-\alpha/2} + z_{1-\beta})^2}{(p_1 - p_2)^2}\left[p_1(1 - p_1) + p_2(1 - p_2)\right].$$
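The power approximation and the sample size formula for two proportions can be sketched as below. The response rates 0.20 and 0.40 are illustrative inputs chosen here, not an example from the text, and the function names are hypothetical.

```python
import math
from statistics import NormalDist

_Z = NormalDist()  # standard normal distribution


def power_two_proportions(p1, p2, n1, n2, alpha=0.05):
    """Approximate power of the two-sided test of H0: p1 = p2."""
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    return _Z.cdf(abs(p1 - p2) / se - _Z.inv_cdf(1 - alpha / 2))


def n1_two_proportions(p1, p2, k=1.0, alpha=0.05, power=0.80):
    """Size of group 1 when group 2 has n2 = k * n1 subjects."""
    z_sum = _Z.inv_cdf(1 - alpha / 2) + _Z.inv_cdf(power)
    return z_sum ** 2 / (p1 - p2) ** 2 * (p1 * (1 - p1) + p2 * (1 - p2) / k)


# Illustrative inputs (assumed): control rate 0.20, treatment rate 0.40
n_per_group = math.ceil(n1_two_proportions(0.20, 0.40))
print(n_per_group, power_two_proportions(0.20, 0.40, n_per_group, n_per_group))
```

Plugging the rounded-up sample size back into the power function is a quick consistency check that the target power is reached.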
1.4 Further Readings
Sample size calculation is an important issue in the experimental design of
biomedical research. The sample size formulas presented in this chapter are
based on asymptotic approximation and superiority trials. Closed–form sam-
ple size estimates for independent outcomes can be obtained using normal
approximation for equivalence trials, cross–over trials, non–inferiority trials,
and bioequivalence trials [14]. In some clinical trials such as phase II cancer
clinical trials [18], sample sizes are usually small. Therefore, the sample size
calculation based on asymptotic approximation would not be appropriate for
clinical trials with a small number of subjects. The small sample sizes for
typical phase II clinical trials imply the need for the use of exact statistical
methods in sample size estimation [19]. Chow et al. [3] provided procedures
for sample size estimation for proportions based on exact tests for small sam-
ples. Even though the closed–form formulas cannot be obtained for sample
size estimates based on exact tests, the sample size estimates can be obtained
numerically.
The tests for proportions using normal approximation to the binomial outcome are equivalent to the usual chi–square tests since $Z^2 = \chi^2$. The p–values for the two tests are equal. For example, the critical value of the chi–square with 1 degree of freedom is $\chi^2_{0.05} = 3.841$ at the $\alpha = 0.05$ level, which is equal to the square of the two–sided $Z_{\alpha/2} = Z_{0.025} = 1.96$. If one wishes to use a two–sided chi–square test, one should use a two–sided sample size or power determination by using $Z_{\alpha/2}$ instead of $Z_{\alpha}$ [20]. Others [21, 22, 23] have used the arcsine transformation of proportions, $A(p) = 2\arcsin(\sqrt{p})$, to stabilize variance in the sense that the variance formula of $A(\hat{p})$ is free of the proportion $p$. Given a proportion $\hat{p}$ with $E(\hat{p}) = p$, $A(\hat{p})$ is asymptotically normal with mean $A(p)$ and variance $1/n$, where $n$ is the sample size. Since the variance of $A(\hat{p})$ does not depend on the expectation, the sample size and power calculation becomes simplified.
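The variance–stabilizing property can be checked numerically. The sketch below, under the assumption that $\hat{p} = X/n$ with $X$ binomial, computes $n \cdot \mathrm{Var}\{A(\hat{p})\}$ exactly from the binomial probabilities; the function names and the choice $n = 200$ are illustrative.

```python
import math


def arcsine_transform(p):
    """A(p) = 2 * arcsin(sqrt(p))."""
    return 2 * math.asin(math.sqrt(p))


def scaled_variance(n, p):
    """n * Var[A(p_hat)] for p_hat = X / n with X ~ Binomial(n, p), computed exactly."""
    pmf = [math.comb(n, x) * p ** x * (1 - p) ** (n - x) for x in range(n + 1)]
    mean = sum(w * arcsine_transform(x / n) for x, w in enumerate(pmf))
    var = sum(w * (arcsine_transform(x / n) - mean) ** 2 for x, w in enumerate(pmf))
    return n * var


# n * Var(p_hat) = p(1 - p) changes with p, while n * Var[A(p_hat)] stays near 1
for p in (0.2, 0.5, 0.8):
    print(p, round(p * (1 - p), 3), round(scaled_variance(200, p), 3))
```

The untransformed scaled variance $p(1-p)$ varies across the three proportions, whereas the transformed one should stay close to 1, which is the sense in which the variance is free of $p$.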
Pre– and post–intervention studies have been widely used in medical and social behavioral studies [24, 25, 26, 27, 28]. In pre–post studies, each subject contributes a pair of dependent observations: one observation at pre–intervention and the other observation at post–intervention. The paired t–test has been used to detect the intervention effect on a continuous outcome, while McNemar’s test [29] has been the most widely used approach to detect the intervention effect on a binary outcome in pre–post studies. The paired t–test can be conducted by applying the one–sample t–test to the difference between pre–test and post–test observations. The sample size needed to detect a difference between a pair of continuous outcomes from pre–post tests can be estimated by using the sample size formula for a one–sample test for means in Equation (1.4). However, unlike paired continuous outcomes from pre–post tests, the sample size formulas for independent outcomes presented in this chapter cannot be used to estimate the sample size needed to detect a difference between a pair of binary observations from pre–post studies. Sample size determination for studies involving a pair of binary observations from pre–post studies will be discussed in Chapter 4.
Clustered data often arise in medical and behavioral studies such as den-
tal, ophthalmologic, radiologic, and community intervention studies in which
data are obtained from multiple units of each cluster. In radiologic studies, as
many as 60 lesions may be observed through positron emission tomography
(PET) in one patient since PET offers the possibility of imaging the whole
body [30]. Sample size estimation for clustered outcomes should be done in-
corporating the dependence of within–cluster observations. Here, the unit of
data collection is a cluster (subject), and the unit of data analysis is a lesion
within a cluster. Two major problems arise in a sample size calculation for
clustered data. One is that the number of units in each cluster, called cluster
size, tends to vary cluster by cluster with a certain distribution. The other
is that observations within each cluster are correlated. The sample size esti-
mate needs to incorporate the variable cluster size and the correlation among
observations within a cluster.
Controlled clinical trials often employ a parallel–groups repeated measures
design in which subjects are randomly assigned between treatment groups,
evaluated at baseline, and then evaluated at intervals across a treatment pe-
riod of fixed total duration. The repeated measurements are usually equally
spaced, although not necessarily so. The hypothesis of primary interest in
short–term efficacy trials concerns the difference in the rates of changes or
the time–averaged responses between treatment groups [31]. Major problems
in the sample size estimation of repeated measurement data are missing data
and the correlation among repeated observations within a subject. As in the
sample size estimate of clustered outcomes, sample size should be estimated
incorporating the correlation among repeated measurements within each
subject and the missing data mechanisms for studies with repeated measure-
ments. Here, a sample size means the number of subjects.
In the subsequent chapters, sample size estimates will be provided using
large sample approximation for correlated outcomes such as clustered out-
comes and repeated measurement outcomes. There are many complexities in
estimating sample size. For example, different sample size formulas are appro-
priate for different types of study designs, with computations more complex
for studies that recruit study subjects at multiple centers. Sample size de-
terminations also have to take into account that some subjects will be lost
to follow-up or otherwise drop out of a study. Certain manipulations, such
as increased precision of measurements or repeating measurements at various
time points, can be used to maximize power for a given sample size.
Bibliography
[1] ICH. Statistical Principles for Clinical Trials. Tripartite International
Conference on Harmonized Guidelines, E9, 1998.
[2] M. Krzywinski and N. Altman. Points of significance: Power and sample
size. Nature Methods, 10:1139–1140, 2013.
[3] S. C. Chow, J. Shao, and H. Wang. Sample Size Calculations in Clinical
Research. Chapman & Hall/CRC, 2008.
[4] R. G. Newcombe. Two sided confidence intervals for the single propor-
tion: Comparison of seven methods. Statistics in Medicine, 17:857–872,
1998.
[5] ASA. Ethical guidelines for statistical practice: Executive summary. Am-
stat News, April:12–15, 1999.
[6] R. V. Lenth. Some practical guidelines for effective sample size determi-
nation. American Statistician, 55(3):187–193, 2001.
[7] S. A. Julious. Tutorial in biostatistics: Sample size for clinical trials.
Statistics in Medicine, 23:1921–1986, 2004.
[8] S. C. Chow. Biosimilars: Design and Analysis of Follow-on Biologics.
Chapman & Hall/CRC, 2013.
[9] J. Wittes. Sample size calculations for randomized clinical trials. Epi-
demiologic Reviews, 24(1):39–53, 1984.
[10] J. S. Lee. Understanding equivalence trials (and why we should care).
Canadian Association of Emergency Physicians, 2(3):194–196, 2000.
[11] FDA. Guidance for Industry Bioavailability and Bioequivalence Studies
for Orally Administered Drug Products General Considerations. Center
for Drug Evaluation and Research, the U.S. Food and Drug Administra-
tion, Rockville, MD., 2003.
[12] FDA. Guideline for Industry on Non-Inferiority Clinical Trials. Center
for Drug Evaluation and Research and Center for Biologics Evaluation
and Research, Food and Drug Administration, Rockville, MD, 2010.
[13] EMEA. Guidelines on the Choice of the Non-Inferiority Margin. Euro-
pean Medicines Agency CHMP/EWP/2158/99, London, UK, 2005.
[14] S. A. Julious. Sample Sizes for Clinical Trials. Chapman & Hall/CRC,
2009.
[15] S. A. Julious and M. J. Campbell. Tutorial in biostatistics: Sample size
for parallel group clinical trials with binary data. Statistics in Medicine,
31:2904–2936, 2010.
[16] K. E. Muller, L. M. Lavange, S. L. Ramey, and C. T. Ramey. Power calcu-
lations for general linear multivariate models including repeated measures
applications. Journal of American Statistical Association, 87(420):1209–
1226, 1992.
[17] W. S. Aronow and C. Ahn. Postprandial hypotension in 499 elderly
persons in a long-term health care facility. Journal of the American
Geriatrics Society, 42(9):930–932, 1994.
[18] S. Piantadosi. Clinical Trials: A Methodologic Perspective, (2nd ed.).
John Wiley & Sons, Inc, 2005.
[19] R. P. Hern. Sample size tables for exact single–stage phase II designs.
Statistics in Medicine, 20:859–866, 2001.
[20] J. M. Lachin. Introduction to sample size determination and power anal-
ysis for clinical trials. Controlled Clinical Trials, 2:93–113, 1981.
[21] R. D. Sokal and F. J. Rohlf. Biometry: The Principles and Practice of
Statistics in Biometric Research. San Francisco: Freeman, 1969.
[22] S. H. Jung and C. Ahn. Estimation of response probability in correlated
binary data: A new approach. Drug Information Journal, 34:599–604,
2000.
[23] S. H. Jung, S. H. Kang, and C. Ahn. Sample size calculations for clustered
binary data. Statistics in Medicine, 20:1971–1982, 2001.
[24] M. C. Rossi, C. Perozzi, C. Consorti, T. Almonti, P. Foglini, N. Giostra,
P. Nanni, S. Talevi, D. Bartolomei, and G. Vespasiani. An interactive
diary for diet management (DAI): A new telemedicine system able to
promote body weight reduction, nutritional education, and consumption
of fresh local produce. Diabetes Technology and Therapeutics, 12(8):641–
647, 2010.
[25] A. Wajnberg, K. H. Wang, M. Aniff, and H. V. Kunins. Hospitalizations
and skilled nursing facility admissions before and after the implementa-
tion of a home-based primary care program. Journal of the American
Geriatric Society, 58(6):1144–1147, 2010.
[26] E. J. Knudtson, L. B. Lorenz, V. J. Skaggs, J. D. Peck, J. R. Good-
man, and A. A. Elimian. The effect of digital cervical examination on
group b streptococcal culture. Journal of the American Geriatric Society,
202(1):58.e1–4, 2010.
[27] T. Zieschang, I. Dutzi, E. Müller, U. Hestermann, K. Grunendahl, A. K.
Braun, D. Huger, D. Kopf, N. Specht-Leible, and P. Oster. Improving
care for patients with dementia hospitalized for acute somatic illness in a
specialized care unit: a feasibility study. International Psychogeriatrics,
22(1):139–146, 2010.
[28] A. M. Spleen, B. C. Kluhsman, A. D. Clark, M. B. Dignan, E. J.
Lengerich, and The ACTION Health Cancer Task Force. An increase in
HPV–related knowledge and vaccination intent among parental and non–
parental caregivers of adolescent girls, age 9–17 years, in Appalachian
Pennsylvania. Journal of Cancer Education, 27(2):312–319, 2012.
[29] Q. McNemar. Note on the sampling error of the difference between cor-
related proportions or percentages. Psychometrika, 12(2):153–157, 1947.
[30] M. Gonen, K. S. Panageas, and S. M. Larson. Statistical issues in analysis
of diagnostic imaging experiments with multiple observations per patient.
Radiology, 221:763–767, 2001.
[31] P. J. Diggle, P. Heagerty, K. Y. Liang, and S. L. Zeger. Analysis of
longitudinal data (2nd ed.). Oxford University Press, 2002.
2 Sample Size Determination for Clustered Outcomes
2.1 Introduction
Clustered data frequently arise in many fields of application. We frequently make observations from multiple sites of each subject (called a cluster). Observations from the same subject are correlated, although those from different subjects are independent. For example, in periodontal studies that observe each tooth, each patient usually contributes data from more than one tooth to the studies. In this case, a patient corresponds to a cluster, and a tooth corresponds to a site.
The degree of similarity or correlation is typically measured by the intracluster correlation coefficient ($\rho$). If one simply ignores the clustering effect and analyzes clustered data using standard statistical methods developed for the analysis of independent observations, one may underestimate the true p-value and inflate the type I error rate of such tests since the correlation among observations within a cluster tends to be positive [1, 2]. Therefore, clustered data should be analyzed using statistical methods that take into account the dependence of within–cluster observations. If one fails to take into account the clustered nature of the study design during the planning stage of the study, one will obtain a smaller sample size estimate and lower statistical power than planned. However, one will obtain a larger sample size estimate and higher statistical power than planned in some studies such as split–mouth trials [3, 4, 5], in which each of two treatments is randomly assigned to one of two segments of a subject's mouth. In split–mouth trials, both intervention and control treatments are applied in each subject.
The intracluster correlation coefficient ($\rho$) is defined by $\rho = \sigma_B^2/(\sigma_B^2 + \sigma_W^2)$, where $\sigma_B^2$ is the between–cluster variance, and $\sigma_W^2$ is the within–cluster variance. As the within–cluster variance ($\sigma_W^2$) approaches 0, $\rho$ approaches 1. Let $n$ be the number of clusters and $m$ be the number of observations in each cluster. When $\rho = 1$, all responses within a cluster are identical, so the effective sample size (ESS) is reduced to the number of clusters ($n$). A very small value of $\rho$ implies that the within–cluster variance ($\sigma_W^2$) is much larger than the between–cluster variance ($\sigma_B^2$). When $\rho = 0$, there is no correlation among observations within a cluster, and the effective sample size is the total number of observations across all clusters ($nm$). To get the effective sample size, the total number of observations (the number of observations per cluster ($m$) times the number of clusters ($n$)) is divided by a correction factor $[1 + (m - 1)\rho]$ that includes $\rho$ and the number of observations per cluster ($m$). That is, the effective sample size is $nm/[1 + (m - 1)\rho]$. The correction factor, $[1 + (m - 1)\rho]$, is called the design effect or the variance inflation factor [6].
In the TOSS (trial of cilostazol in symptomatic intracranial arterial stenosis) clinical trial [7], investigators examined the effect of cilostazol on the progression of intracranial arterial stenosis, a narrowing of an artery inside the brain that can lead to stroke. Cilostazol is a medication for the treatment of intermittent claudication, a condition caused by narrowing of the
arteries that supply blood to the legs. One hundred thirty–six subjects were
randomly allocated to receive either cilostazol or placebo with an equal prob-
ability. Three arteries (two middle cerebral arteries and one basilar artery)
were evaluated for the progression of intracranial stenosis in both cilostazol
and placebo groups.
The number of arteries evaluated in each treatment group is 204 (= 3 arteries/subject × 68 subjects). If the observations in the three arteries are independent ($\rho = 0$), then the effective number of observations is 204. If the observations in the three arteries are completely dependent ($\rho = 1$), then the effective number of observations is 68. If $\rho$ takes a value between 0 and 1, the effective number of observations is $204/[1 + (m - 1)\rho]$, where $m = 3$. In general, the effective number of observations in each treatment group is $nm/[1 + (m - 1)\rho]$ when $0 \leq \rho \leq 1$. As special cases, the effective number of observations is $nm$ when $\rho = 0$, and $n$ when $\rho = 1$.
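The effective sample size calculation for the TOSS setting (68 subjects per group, 3 arteries per subject) can be sketched as follows. The function names are illustrative, and the intermediate values of $\rho$ are arbitrary choices made here to show the trend.

```python
def design_effect(m, rho):
    """Design effect (variance inflation factor) 1 + (m - 1) * rho."""
    return 1 + (m - 1) * rho


def effective_sample_size(n, m, rho):
    """Effective number of observations for n clusters of size m."""
    return n * m / design_effect(m, rho)


# 68 subjects per treatment group, 3 arteries per subject
for rho in (0.0, 0.1, 0.5, 1.0):
    print(rho, effective_sample_size(n=68, m=3, rho=rho))
```

The two boundary cases recover the totals quoted in the text: all 204 arteries count when $\rho = 0$, and only the 68 subjects count when $\rho = 1$.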
2.2 One–Sample Clustered Continuous Outcomes
Clustered continuous outcomes occur frequently in biomedical studies. Examples include the size of tumors in cancer patients, and pocket probing depth and clinical attachment level in teeth of subjects undergoing root planing under local anesthetic.
2.2.1 Equal Cluster Size
We assume that the number of observations in each cluster ($m$) is small compared to the number of clusters ($n$) so that asymptotic theories can be applied to $n$ for sample size estimation. Let $Y_{ij}$ be a random variable of the $j$th ($j = 1, \ldots, m$) observation in the $i$th ($i = 1, \ldots, n$) cluster, where $Y_{ij}$ is assumed to be normally distributed with mean $E(Y_{ij}) = \mu$ and common variance $V(Y_{ij}) = \sigma^2$. We assume a pairwise common intracluster correlation coefficient, $\rho = \mathrm{corr}(Y_{ij}, Y_{ij'})$ for $j \neq j'$.
Let $y_i = \sum_{j=1}^{m} Y_{ij}$ denote the sum of responses in the $i$th cluster, and $\bar{y}_i$ be the mean response computed over $m$ observations in the $i$th cluster. The total number of observations is $nm$. The mean of $Y_{ij}$ computed over all observations is written as
$$\bar{y} = \frac{\sum_{i=1}^{n}\sum_{j=1}^{m} Y_{ij}}{nm},$$
where $\bar{y}$ estimates the population mean $\mu$.
The degree of dependence within clusters is measured by the intracluster correlation coefficient ($\rho$), which can be estimated by the analysis of variance (ANOVA) estimate [8] as
$$\hat{\rho} = \frac{MSC - MSW}{MSC + (m - 1)MSW},$$
where
$$MSC = m\sum_{i=1}^{n}\frac{(\bar{y}_i - \bar{y})^2}{n - 1},$$
$$MSW = \sum_{i=1}^{n}\sum_{j=1}^{m}\frac{(y_{ij} - \bar{y}_i)^2}{n(m - 1)}.$$
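A minimal sketch of the ANOVA estimate for equal cluster sizes follows. The function name `anova_icc` and the toy data are made up for illustration; the input is assumed to be a list of clusters, each a list of $m$ observations.

```python
def anova_icc(clusters):
    """ANOVA estimate of the intracluster correlation for equal cluster sizes."""
    n = len(clusters)        # number of clusters
    m = len(clusters[0])     # observations per cluster (assumed equal)
    cluster_means = [sum(c) / m for c in clusters]
    # With equal cluster sizes the overall mean is the mean of the cluster means
    overall_mean = sum(cluster_means) / n
    msc = m * sum((cm - overall_mean) ** 2 for cm in cluster_means) / (n - 1)
    msw = sum((y - cm) ** 2
              for c, cm in zip(clusters, cluster_means)
              for y in c) / (n * (m - 1))
    return (msc - msw) / (msc + (m - 1) * msw)


# Identical responses within each cluster: complete dependence
print(anova_icc([[1, 1, 1], [2, 2, 2], [3, 3, 3]]))
# Identical cluster means with within-cluster spread: the lower bound -1/(m - 1)
print(anova_icc([[1, 2, 3], [1, 2, 3], [1, 2, 3]]))
```

The second call shows that the estimate can be negative in small samples even though the true correlation within clusters is usually positive.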
The overall mean $\bar{y}$ has a normal distribution with mean $\mu$ and variance $V$, where
$$V = \frac{\sum_{i=1}^{n} m\{1 + (m - 1)\hat{\rho}\}\sigma^2}{(nm)^2} = \frac{\{1 + (m - 1)\hat{\rho}\}\sigma^2}{nm}.$$
We test the null hypothesis $H_0: \mu = \mu_0$ versus the alternative hypothesis $H_1: \mu = \mu_1$ for $\mu_0 \neq \mu_1$. The test statistic $Z = (\bar{y} - \mu_0)/\sqrt{V}$ is asymptotically normal with mean 0 and variance 1. We reject $H_0: \mu = \mu_0$ if the absolute value of $Z$ is larger than $z_{1-\alpha/2}$, the $100(1 - \alpha/2)$th percentile of the standard normal distribution.
We are interested in estimating the sample size $n$ with a power of $1 - \beta$ for the projected alternative hypothesis $H_1: \mu = \mu_1$. The sample size ($n$) needed to achieve a power of $1 - \beta$ can be obtained by solving the following equation:
$$\frac{|\mu_1 - \mu_0|}{\sqrt{V}} = z_{1-\alpha/2} + z_{1-\beta}.$$
The required number of clusters is
$$n = \frac{(z_{1-\alpha/2} + z_{1-\beta})^2}{(\mu_1 - \mu_0)^2}\,\frac{\{1 + (m - 1)\hat{\rho}\}}{m}\,\sigma^2. \tag{2.1}$$
The total number of observations is
$$n \cdot m = \frac{(z_{1-\alpha/2} + z_{1-\beta})^2\{1 + (m - 1)\hat{\rho}\}\sigma^2}{(\mu_1 - \mu_0)^2}.$$
When the cluster size is 1 ($m = 1$), the required number of observations is
$$n_1 = \frac{(z_{1-\alpha/2} + z_{1-\beta})^2\sigma^2}{(\mu_1 - \mu_0)^2}.$$
When the cluster size is $m$ ($m > 1$), the variance is inflated by a factor of $\{1 + (m - 1)\hat{\rho}\}$ compared with the variance under $m = 1$. The factor $\{1 + (m - 1)\hat{\rho}\}$ is called the variance inflation factor or design effect. That is, the total number of observations can be computed by multiplying $n_1$ by the design effect $\{1 + (m - 1)\hat{\rho}\}$.
2.2.2 Unequal Cluster Size
Cluster sizes are often unequal in cluster randomized studies. When the cluster sizes are not constant, one approach is to replace the cluster size ($m$) by an advance estimate of the average cluster size, which is referred to as the average cluster size method [9, 10]. The average cluster size method is likely to underestimate the actual required sample size [11]. Another approach is to replace the cluster size ($m$) by the largest expected cluster size in the sample, which is called the maximum cluster size method [10]. Here, we provide the sample size estimate under variable cluster size.
Let $n$ be the number of clusters in a clinical trial, and $m_i$ be the cluster size in the $i$th cluster ($i = 1, \ldots, n$). The number of observations in the $i$th cluster, $m_i$, may vary at random with a certain distribution. Here, we estimate the sample size using the information on varying cluster sizes. We assume that the cluster sizes ($m_i$, $i = 1, \ldots, n$) are independent and identically distributed, and the cluster sizes ($m_i$’s) are small compared to $n$ so that asymptotic theories can be applied to $n$ for sample size estimation. Let $Y_{ij}$ be a random variable of the $j$th observation ($j = 1, \ldots, m_i$) in the $i$th cluster, where $Y_{ij}$ is assumed to be normally distributed with mean $\mu$ and variance $\sigma^2$. We assume a pairwise common intracluster correlation coefficient, $\rho = \mathrm{corr}(Y_{ij}, Y_{ij'})$ for $j \neq j'$. The correlation is assumed not to vary with the number of observations per cluster.
Let $y_i = \sum_{j=1}^{m_i} Y_{ij}$ denote the sum of responses in the $i$th cluster, and $\bar{y}_i = \sum_{j=1}^{m_i} Y_{ij}/m_i$ be the mean response computed over $m_i$ responses in the $i$th cluster. Then, the mean of $y_{ij}$ computed over all clusters is written as
$$\bar{y} = \frac{\sum_{i=1}^{n} m_i\bar{y}_i}{\sum_{i=1}^{n} m_i},$$
where $\bar{y}$ estimates the population mean $\mu$. The mean cluster size is $\bar{m} = \sum_{i=1}^{n} m_i/n$.
The degree of dependence within clusters is measured by the intracluster
correlation coefficient (ρ), which can be estimated by analysis of variance
(ANOVA) estimate [8].
It can be shown that conditional on the empirical distribution of $m_i$’s, the overall mean ($\bar{y}$) has a normal distribution with mean $\mu$ and variance $V$, where
$$V = \frac{\sum_{i=1}^{n} m_i\{1 + (m_i - 1)\hat{\rho}\}\sigma^2}{\left(\sum_{i=1}^{n} m_i\right)^2}.$$
Based on the asymptotic result, we can reject $H_0: \mu = \mu_0$ if the absolute value of the test statistic $Z = (\bar{y} - \mu_0)/\sqrt{V}$ is larger than $z_{1-\alpha/2}$, the $100(1 - \alpha/2)$th percentile of the standard normal distribution.
We are interested in estimating the sample size $n$ with a power of $1 - \beta$ for the projected alternative hypothesis $H_1: \mu = \mu_1$. Since $m_i$’s are independent and identically distributed random variables, by the law of large numbers, as $n \to \infty$,
$$nV \to \frac{E[m\{1 + (m - 1)\hat{\rho}\}]\sigma^2}{E(m)^2},$$
where $m$ is the random variable associated with the cluster size and $E(\cdot)$ is the expectation with respect to the distribution of the cluster size.
The sample size needed to achieve a power of $1 - \beta$ can be obtained by solving the following equation:
$$\frac{|\mu_1 - \mu_0|}{\sqrt{V}} = z_{1-\alpha/2} + z_{1-\beta}.$$
This leads to
$$n = \frac{(z_{1-\alpha/2} + z_{1-\beta})^2\sigma^2}{(\mu_1 - \mu_0)^2}\,\frac{E[m\{1 + (m - 1)\hat{\rho}\}]}{E(m)^2}.$$
Let $E(m) = \theta$, $V(m) = \tau^2$, and $\gamma = \tau/\theta$, where $\gamma$ is the coefficient of variation of the cluster size. Then, we can write
$$n = \frac{(z_{1-\alpha/2} + z_{1-\beta})^2\sigma^2}{(\mu_1 - \mu_0)^2}\left\{(1 - \hat{\rho})\frac{1}{\theta} + \hat{\rho} + \hat{\rho}\gamma^2\right\}. \tag{2.2}$$
The sample size formula (2.2) provides the sample size estimate by accounting for variability in cluster size. When cluster sizes are equal across all clusters, then the sample size formula (2.2) is the same as the sample size formula (2.1) with $\gamma = 0$.
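The step from the expectation form to Equation (2.2) uses $E(m^2) = \tau^2 + \theta^2$, so that $E[m\{1 + (m - 1)\rho\}]/\theta^2 = (1 - \rho)/\theta + \rho + \rho\gamma^2$. The sketch below checks this identity numerically and evaluates Equation (2.2); the cluster size distribution (sizes 2, 3, 4 with probabilities 0.25, 0.5, 0.25), the other inputs, and all function names are assumptions made for illustration.

```python
import math
from statistics import NormalDist


def inflation_closed_form(theta, gamma, rho):
    """Bracketed factor of Equation (2.2): (1 - rho)/theta + rho + rho * gamma^2."""
    return (1 - rho) / theta + rho + rho * gamma ** 2


def inflation_direct(sizes, probs, rho):
    """E[m{1 + (m - 1) rho}] / E(m)^2 for a discrete cluster size distribution."""
    theta = sum(p * m for m, p in zip(sizes, probs))
    numerator = sum(p * m * (1 + (m - 1) * rho) for m, p in zip(sizes, probs))
    return numerator / theta ** 2


def n_clusters_variable_size(mu0, mu1, sigma, theta, gamma, rho,
                             alpha=0.05, power=0.80):
    """Number of clusters from Equation (2.2), two-sided test."""
    z = NormalDist()
    z_sum = z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)
    return (z_sum ** 2 * sigma ** 2 / (mu1 - mu0) ** 2
            * inflation_closed_form(theta, gamma, rho))


# Assumed cluster size distribution
sizes, probs, rho = (2, 3, 4), (0.25, 0.5, 0.25), 0.067
theta = sum(p * m for m, p in zip(sizes, probs))                 # mean cluster size
tau2 = sum(p * (m - theta) ** 2 for m, p in zip(sizes, probs))   # variance of cluster size
gamma = math.sqrt(tau2) / theta                                  # coefficient of variation

n_variable = n_clusters_variable_size(4.05, 3.0, 3.5, theta, gamma, rho)
n_equal = n_clusters_variable_size(4.05, 3.0, 3.5, theta, 0.0, rho)
print(math.ceil(n_variable), math.ceil(n_equal))
```

Setting `gamma=0.0` reproduces the equal cluster size case, so the difference between the two results reflects only the variability in cluster size.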
Let $(w_1, \ldots, w_n)$ be a set of weights assigned to clusters with $w_i \geq 0$ and $\sum_{i=1}^{n} w_i = 1$. The overall mean can be expressed as $\bar{y} = \sum_{i=1}^{n} w_i\bar{y}_i$. The overall mean ($\bar{y}$) is an unbiased estimate of $\mu$. The above sample size estimate is based on equal weights to observations by letting $w_i = m_i/\sum_{i=1}^{n} m_i$. Sample size can also be estimated by an estimator that assigns equal weights ($w_i = 1/n$) to each cluster or an estimator that minimizes the variance of an overall mean ($\bar{y}$). These weighting schemes will be described in detail for clustered binary outcomes.
28 Sample Size Calculations for Clustered and Longitudinal Outcomes
2.2.2.1 Example
Reports have established the effectiveness of minimally invasive periodontal
surgery (MIPS) in treating osseous defects [12, 13]. Since these papers were
published, new devices (including a videoscope and ultrasonic tips) have been
incorporated to enhance the effectiveness of the procedure. Haffajee et al. [14]
computed the intracluster correlation coefficients of periodontal measurements
for five groups of treated periodontal disease subjects and one group of un-
treated subjects with periodontal disease. The median intracluster correlation
coefficient (ρ) is 0.067 for clinical attachment level change. Harrel et al. [12]
showed clinical attachment loss (CAL) gains of 4.05 mm following application
of minimally invasive periodontal surgery (MIPS) in 16 subjects presenting
multiple sites with deep pockets associated with different morphologies, in-
cluding furcation involvements.
An investigator is proposing a prospective cohort study to evaluate the
effectiveness of the MIPS using these new devices. He expects CAL gains of
3.0 mm with a standard deviation of 3.5 mm over the 1–year study period.
The investigator will evaluate three sites in each subject and would like to
estimate the sample size to detect the mean difference of 1.05 mm in clinical
attachment loss (CAL) gains over the 1–year study period to achieve 80%
power at a two–sided 5% significance level. We estimate the sample size (n)
to test the null hypothesis of H0 : µ = 4.05 versus the alternative hypothesis
H1 : µ = 3.0 with a two–sided 5% significance level and 80% power assuming
three sites per subject (m = 3) and ρ = 0.067. From Equation (2.1) with the
fixed number of sites per subject (m = 3), the required sample size for testing
H0 : µ = 4.05 versus H1 : µ = 3.0 is
$$n = \frac{(1.96 + 0.842)^2}{(4.05 - 3.0)^2} \cdot \frac{\{1 + (3 - 1)0.067\}}{3} \cdot 3.5^2 = 33.$$
Suppose that the number of sites examined per subject varies among sub-
jects with a mean of 3 and a standard deviation of 2. Then, from Equation
(2.2) with a variable number of sites per subject (θ = 3 and γ = 2/3), the
required sample size is
$$n = \frac{(1.96 + 0.842)^2\, 3.5^2}{(4.05 - 3.0)^2} \left\{(1 - 0.067)/3 + 0.067 + 0.067(2/3)^2\right\} = 36.$$
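The two calculations in this example can be checked numerically. The sketch below is only an illustration of formula (2.2): the function name `clusters_for_mean` and its parameter names are hypothetical, normal quantiles come from Python's standard library rather than the rounded values 1.96 and 0.842, and the result is rounded up to the next whole cluster.

```python
from math import ceil
from statistics import NormalDist


def clusters_for_mean(mu0, mu1, sd, rho, mean_size, sd_size=0.0,
                      alpha=0.05, power=0.80):
    """Number of clusters from formula (2.2); sd_size=0 is the equal-size case."""
    z = NormalDist().inv_cdf(1 - alpha / 2) + NormalDist().inv_cdf(power)
    gamma = sd_size / mean_size  # coefficient of variation of the cluster size
    factor = (1 - rho) / mean_size + rho + rho * gamma ** 2
    n = z ** 2 * sd ** 2 / (mu1 - mu0) ** 2 * factor
    return ceil(n)  # round up to a whole number of clusters


# Inputs taken from the MIPS example: H0: mu = 4.05, H1: mu = 3.0, sd = 3.5, rho = 0.067.
n_equal = clusters_for_mean(4.05, 3.0, sd=3.5, rho=0.067, mean_size=3)
n_varying = clusters_for_mean(4.05, 3.0, sd=3.5, rho=0.067, mean_size=3, sd_size=2)
print(n_equal, n_varying)  # 33 36
```

Setting `sd_size=0` makes γ = 0, so one function covers both the fixed and the variable number of sites per subject.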
2.3 One–Sample Clustered Binary Outcomes
Clustered binary outcomes occur frequently in medical and behavioral studies.
Examples include the presence of cavities in one or more teeth, the presence
of arthritic pain in one or more joints, the presence of infection in one or two
eyes, and the occurrence of lymph node metastases in cancer patients.
2.3.1 Equal Cluster Size
We assume that cluster sizes are equal across clusters. Let n be the total
number of clusters in an experiment and m be the number of observations in
each cluster. Let Yij be the binary random variable of the jth (j = 1, . . . , m)
observation in the ith (i = 1, . . . , n) cluster, which is coded as 1 for response
and 0 for non–response.
We assume that observations within a cluster are exchangeable in the sense
that, given $m$, $Y_{i1}, \ldots, Y_{im}$ have a common marginal response probability
$P(Y_{ij} = 1) = p$ $(0 < p < 1)$ and a common pairwise intracluster correlation
coefficient $\rho = \mathrm{corr}(Y_{ij}, Y_{ij'})$ for $j \ne j'$.
Let $y_i = \sum_{j=1}^{m} Y_{ij}$ denote the total number of responses in the $i$th cluster.
Under the exchangeability assumption, we have $E(y_i) = mp$ and $\mathrm{var}(y_i) =
mp(1 - p)\{1 + (m - 1)\rho\}$. The proportion of responses in the $i$th cluster is
estimated by $\hat{p}_i = y_i/m$ with $E(\hat{p}_i) = p$. An unbiased estimate of $p$ is $\hat{p} = \sum_{i=1}^{n} \hat{p}_i/n$.
For large $n$, $\sqrt{n}(\hat{p} - p)$ is approximately normal with mean 0 and variance
$$\hat{\sigma}^2 = \hat{p}(1 - \hat{p})\,\frac{\{1 + (m - 1)\hat{\rho}\}}{m},$$
where ρ̂ can be obtained by the ANOVA method. The ANOVA method suitable
for continuous variables can be used to estimate the intracluster correlation
coefficient for binary outcomes. Ridout et al. [15] conducted simulation studies
to investigate the performance of various estimators of intracluster correlation
coefficient for clustered binary data under the common intracluster correlation,
$\rho = \mathrm{corr}(Y_{ij}, Y_{ij'})$ for $j \ne j'$. Their simulation studies showed that the
ANOVA estimator performed well for clustered binary data. The ANOVA
estimator of intracluster correlation coefficient can be written as
$$\hat{\rho} = \frac{\mathrm{MSC} - \mathrm{MSW}}{\mathrm{MSC} + (m - 1)\mathrm{MSW}},$$
where $\mathrm{MSC} = \sum m(\hat{p}_i - \hat{p})^2/(n - 1)$, and $\mathrm{MSW} = \sum y_i(1 - \hat{p}_i)/\{n(m - 1)\}$.
Suppose that we wish to test the null hypothesis H0 : p = p0 versus
$H_1 : p = p_1$ for $p_0 \ne p_1$ at a two–sided significance level of $\alpha$. Under the null
hypothesis, the test statistic
$$Z = \frac{\sqrt{n}(\hat{p} - p_0)}{\hat{\sigma}}$$
is asymptotically normal with mean 0 and variance 1. We reject H0 : p = p0
if the absolute value of the test statistic Z is larger than z1−α/2, the 100(1 −
α/2)th percentile of the standard normal distribution. We are interested in
calculating the sample size n against the alternative hypothesis H1 : p = p1
with a two–sided significance level of α and power of 1 − β. The required
sample size can be obtained by solving $\sqrt{n}\,|p_0 - p_1|/\hat{\sigma} = z_{1-\alpha/2} + z_{1-\beta}$. The
required number of clusters is
$$n = \frac{\hat{\sigma}^2 (z_{1-\alpha/2} + z_{1-\beta})^2}{(p_0 - p_1)^2} = \frac{p_1(1 - p_1)(z_{1-\alpha/2} + z_{1-\beta})^2}{(p_0 - p_1)^2} \cdot \frac{\{1 + (m - 1)\hat{\rho}\}}{m}.$$
When the cluster size is 1 (m = 1), the required sample size becomes
$$n_1 = \frac{p_1(1 - p_1)(z_{1-\alpha/2} + z_{1-\beta})^2}{(p_0 - p_1)^2}.$$
When the cluster size is $m$ ($m > 1$), the total number of observations ($nm$) is
$\{1 + (m - 1)\hat{\rho}\}$ times the required number of observations under $m = 1$. The
factor $\{1 + (m - 1)\hat{\rho}\}$ is called the variance inflation factor or design effect.
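A minimal numerical sketch of the design effect is given below. The function name `clusters_binary_equal` is hypothetical, and the inputs (p0 = 0.6, p1 = 0.7, m = 5, ρ = 0.2) are illustrative values chosen for this sketch rather than results from a study. The function returns the unrounded number of clusters so that the ratio of total observations can be compared directly with {1 + (m − 1)ρ}.

```python
from math import isclose
from statistics import NormalDist


def clusters_binary_equal(p0, p1, m, rho, alpha=0.05, power=0.80):
    """Unrounded number of clusters for H0: p = p0 vs H1: p = p1 with m observations each."""
    z = NormalDist().inv_cdf(1 - alpha / 2) + NormalDist().inv_cdf(power)
    design_effect = 1 + (m - 1) * rho
    return p1 * (1 - p1) * z ** 2 / (p0 - p1) ** 2 * design_effect / m


# Illustrative inputs only (hypothetical).
p0, p1, m, rho = 0.6, 0.7, 5, 0.2
n1 = clusters_binary_equal(p0, p1, m=1, rho=rho)  # independent observations (m = 1)
n = clusters_binary_equal(p0, p1, m=m, rho=rho)   # clusters of size m

# Total observations n*m, relative to n1, should equal 1 + (m - 1)*rho.
print(n * m / n1, 1 + (m - 1) * rho)
```

In practice the number of clusters would be rounded up to the next integer before use.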
2.3.2 Unequal Cluster Size
Let n be the total number of clusters in an experiment and mi be the number
of observations in the ith (i = 1, . . . , n) cluster. The number of observations
per cluster may vary at random with a certain distribution. Let Yij be the
binary random variable of the jth (j = 1, . . . , mi) observation in the ith
cluster, which is coded as 1 for response and 0 for non–response.
We assume that observations within a cluster are exchangeable with
$P(Y_{ij} = 1) = p$ $(0 < p < 1)$ and $\mathrm{corr}(Y_{ij}, Y_{ij'}) = \rho$ for $j \ne j'$, as in equal cluster
size. The intracluster correlation is assumed not to vary with the number of
observations per cluster.
Let $y_i = \sum_{j=1}^{m_i} Y_{ij}$ denote the total number of responses in the $i$th cluster.
The proportion of responses in the $i$th cluster is estimated by $\hat{p}_i = y_i/m_i$
with $E(\hat{p}_i) = p$. Under the exchangeability assumption, we have $E(y_i) = m_i p$
and $\mathrm{var}(y_i) = m_i p(1 - p)\{1 + (m_i - 1)\rho\}$. Let $(w_1, \ldots, w_n)$ be a set of weights
assigned to clusters with $w_i \ge 0$ and $\sum_{i=1}^{n} w_i = 1$. An unbiased estimate of $p$ is
$\hat{p} = \sum_{i=1}^{n} w_i \hat{p}_i$. Three weighting schemes have been proposed for parametric
and nonparametric sample size estimation for one–sample clustered binary
data [16, 17]: equal weights to observations,
equal weights to clusters, and minimum variance weights that minimize the
variance of the weighted estimator.
Cochran [18] and Donner and Klar [11] used the estimator $\hat{p}_u = \sum y_i/\sum m_i$
that assigns equal weights to observations with $w_i = m_i/\sum_{i'=1}^{n} m_{i'}$.
Lee [19] and Lee and Dubin [20] used the estimator $\hat{p}_c = \sum \hat{p}_i/n$
that assigns equal weights to clusters with $w_i = 1/n$. Ahn [21] showed
that the method of assigning equal weights to clusters is preferred to the
method of assigning equal weights to observations when the intracluster cor-
relation is 0.6 or greater in a simulation study. Jung et al. [16] also showed
that the sample size under equal weights to observations (nu) is usually smaller
than that under equal weights to clusters (nc) for small ρ while nc is gener-
ally smaller than nu for large ρ. If observations within a cluster are highly
dependent, then making another observation from the same cluster will not
add much information. In this case, the method assigning equal weights to
clusters is preferred to the method assigning equal weights to observations. If
all clusters have an equal number of observations, then these two weighting
methods are identical.
Jung and Ahn [22] proposed a minimum variance estimator, $\hat{p}_m$, that minimizes
the variance of $\hat{p} = \sum_{i=1}^{n} w_i \hat{p}_i$. The variance of the estimator ($\hat{p}_m$) is
minimized with weights
$$w_i = \frac{m_i\{1 + (m_i - 1)\hat{\rho}\}^{-1}}{\sum_{i'=1}^{n} m_{i'}\{1 + (m_{i'} - 1)\hat{\rho}\}^{-1}},$$
where ρ̂ can be obtained by the ANOVA method. The ANOVA estimator of
intracluster correlation coefficient can be written as
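$$\hat{\rho} = \frac{\mathrm{MSC} - \mathrm{MSW}}{\mathrm{MSC} + (m_A - 1)\mathrm{MSW}},$$
where $\mathrm{MSC} = \sum_{i=1}^{n} m_i(\hat{p}_i - \hat{p})^2/(n - 1)$, $\mathrm{MSW} = \sum_{i=1}^{n} y_i(1 - \hat{p}_i)/(M - n)$,
$m_A = (M - \sum_{i=1}^{n} m_i^2/M)/(n - 1)$, and $M = \sum_{i=1}^{n} m_i$. Note that $\hat{p}_m = \hat{p}_u$
if $\rho = 0$ and $\hat{p}_m = \hat{p}_c$ if $\rho = 1$. If cluster sizes are equal across all clusters
($m_i = m$), then $\hat{p}_m = \hat{p}_u = \hat{p}_c$.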
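The relationship among the three weighted estimators can be illustrated with a short sketch. The helper names (`weighted_estimate`, `p_u`, `p_c`, `p_m`) are hypothetical, and the data are the (y_i, m_i) pairs listed in Table 2.1 later in this section. The value ρ = 0.2 passed to `p_m` is simply supplied as an input here.

```python
from math import isclose

# (y_i, m_i) pairs transcribed from Table 2.1 (29 clusters).
DATA = [(3, 6), (2, 6), (2, 4), (5, 6), (4, 5), (5, 5), (4, 6), (3, 4),
        (2, 4), (3, 4), (5, 5), (4, 4), (6, 6), (3, 3), (5, 6), (1, 2),
        (4, 6), (0, 4), (5, 6), (4, 5), (4, 6), (0, 6), (4, 5), (3, 5),
        (0, 2), (2, 6), (2, 4), (5, 5), (4, 6)]


def weighted_estimate(data, weights):
    """Return sum_i w_i * (y_i / m_i) after normalising the weights to sum to 1."""
    total = sum(weights)
    return sum((w / total) * (y / m) for w, (y, m) in zip(weights, data))


def p_u(data):
    """Equal weights to observations: w_i proportional to m_i."""
    return weighted_estimate(data, [m for _, m in data])


def p_c(data):
    """Equal weights to clusters: w_i = 1/n."""
    return weighted_estimate(data, [1.0] * len(data))


def p_m(data, rho):
    """Minimum variance weights: w_i proportional to m_i / {1 + (m_i - 1) rho}."""
    return weighted_estimate(data, [m / (1 + (m - 1) * rho) for _, m in data])


print(p_u(DATA), p_c(DATA), p_m(DATA, rho=0.2))
```

At ρ = 0 the minimum variance weights reduce to weights proportional to m_i, and at ρ = 1 they become constant, which is why `p_m` coincides with `p_u` and `p_c` at those two extremes.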
We would like to test the null hypothesis H0 : p = p0 versus the alternative
hypothesis $H_1 : p = p_1$ for $p_0 \ne p_1$. The test statistic
$$Z_w = \frac{\sqrt{n}(\hat{p}_w - p_0)}{\hat{\sigma}_w}$$
is asymptotically normal with mean 0 and variance 1, where w = u, c, and m.
Hence, we reject H0 if the absolute value of Zw is larger than z1−α/2, which
is the 100(1 − α/2)th percentile of the standard normal distribution.
Jung et al. [16] provided the sample size formulas needed to test the null
hypothesis H0 : p = p0 versus the alternative hypothesis H1 : p = p1 with a
power of 1−β using three weighting schemes of equal weights to observations,
equal weights to clusters, and minimum variance weights.
2.3.2.1 Equal Weights to Observations
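Under equal weights to observations with $w_i = m_i/\sum_{i=1}^{n} m_i$, the variance of
$\sqrt{n}(\hat{p}_u - p_0)$ is
$$\hat{\sigma}_u^2 = V\{\sqrt{n}(\hat{p}_u - p_0)\} = \hat{p}_u(1 - \hat{p}_u)\,\frac{n \sum_i m_i\{1 + (m_i - 1)\hat{\rho}\}}{\left(\sum_i m_i\right)^2}.$$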
The test statistic
$$Z_u = \frac{\sqrt{n}(\hat{p}_u - p_0)}{\hat{\sigma}_u}$$
has a standard normal distribution with mean 0 and variance 1 for large n.
Under the alternative hypothesis ($H_1 : p = p_1$), $\hat{\sigma}_u^2$ converges to $\sigma_u^2$, where
$$\sigma_u^2 = p_1(1 - p_1)\,\frac{E(m) + [E(m^2) - E(m)]\hat{\rho}}{[E(m)]^2},$$
and $E(m)$ and $E(m^2)$ are computed using the probability distribution of cluster
sizes. The required sample size to test $H_0 : p = p_0$ versus $H_1 : p = p_1$ at a
two–sided significance level of $\alpha$ and a power of $1 - \beta$ is
$$n_u = \frac{p_1(1 - p_1)(z_{1-\alpha/2} + z_{1-\beta})^2}{(p_0 - p_1)^2} \cdot \frac{\{E(m) + [E(m^2) - E(m)]\hat{\rho}\}}{[E(m)]^2}.$$
2.3.2.2 Equal Weights to Clusters
Under equal weights to clusters with $w_i = 1/n$, the variance of
$\sqrt{n}(\hat{p}_c - p_0)$ is
$$\hat{\sigma}_c^2 = V\{\sqrt{n}(\hat{p}_c - p_0)\} = \hat{p}_c(1 - \hat{p}_c)\,\frac{1}{n}\sum_i \frac{1 + (m_i - 1)\hat{\rho}}{m_i}.$$
The test statistic
$$Z_c = \frac{\sqrt{n}(\hat{p}_c - p_0)}{\hat{\sigma}_c}$$
is asymptotically normal with mean 0 and variance 1. Under the alternative
hypothesis ($H_1 : p = p_1$), $\hat{\sigma}_c^2$ converges to $\sigma_c^2$, where
$$\sigma_c^2 = p_1(1 - p_1)\{E(1/m) + \{1 - E(1/m)\}\hat{\rho}\},$$
and E(1/m) is computed using the probability distribution of cluster sizes.
The required sample size with the power of 1−β for the alternative hypothesis
H1 : p = p1 is
$$n_c = \frac{p_1(1 - p_1)(z_{1-\alpha/2} + z_{1-\beta})^2}{(p_0 - p_1)^2}\,\{E(1/m) + \{1 - E(1/m)\}\hat{\rho}\}.$$
2.3.2.3 Minimum Variance Weights
The variance of the estimator $\hat{p} = \sum_{i=1}^{n} w_i \hat{p}_i$ is minimized when the weight,
$w_i$, is inversely proportional to the variance of $\hat{p}_i$, $V(\hat{p}_i) = V(y_i)/m_i^2$ [22]. The
weight that minimizes the variance of the estimator is
$$w_i = \frac{m_i\{1 + (m_i - 1)\hat{\rho}\}^{-1}}{\sum_{i'=1}^{n} m_{i'}\{1 + (m_{i'} - 1)\hat{\rho}\}^{-1}},$$
where $\hat{\rho}$ can be obtained by the ANOVA method. The variance of $\hat{p}_m$ is
consistently estimated by
$$\hat{\sigma}_m^2 = \frac{\hat{p}_m(1 - \hat{p}_m)}{n^{-1}\sum_i m_i\{1 + (m_i - 1)\hat{\rho}\}^{-1}}.$$
The test statistic
$$Z_m = \frac{\sqrt{n}(\hat{p}_m - p_0)}{\hat{\sigma}_m}$$
has a standard normal distribution with mean 0 and variance 1 for large $n$.
Under the alternative hypothesis ($H_1 : p = p_1$), $\hat{\sigma}_m^2$ converges to $\sigma_m^2$, where
$$\sigma_m^2 = p_1(1 - p_1)\,\frac{1}{E[m\{1 + (m - 1)\hat{\rho}\}^{-1}]}.$$
The required sample size against the alternative hypothesis H1 : p = p1
for a two–sided significance level of α and power of 1 − β is
$$n_m = \frac{p_1(1 - p_1)(z_{1-\alpha/2} + z_{1-\beta})^2}{(p_0 - p_1)^2} \cdot \frac{1}{E[m\{1 + (m - 1)\hat{\rho}\}^{-1}]}.$$
The sample size (nm) under minimum variance estimate is always smaller
than or equal to nu and nc.
2.3.2.4 Example
We use the data of Hujoel et al. [23] as pilot data to illustrate sample
size calculation for clustered binary outcomes. An enzymatic diagnostic test
was used to determine whether a site was infected by two specific organisms,
Treponema denticola and Bacteroides gingivalis. Each subject had a different
number of infected sites, as determined by the gold standard (an antibody
assay against the two organisms).
In a sample of 29 subjects, the number of true positive test results (yi)
and the number of infected sites (mi) are given in Table 2.1.
In the example of an enzymatic diagnostic test in Table 2.1, the ANOVA
estimate (ρ̂) of the intracluster correlation coefficient is 0.20.
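The ANOVA estimate quoted above can be reproduced from Table 2.1 with a few lines of code. This is a sketch rather than the authors' program: the function name `anova_icc` is hypothetical, and it assumes that the overall proportion in MSC is the pooled proportion (total responses divided by total sites), which the text does not state explicitly.

```python
# (y_i, m_i) pairs transcribed from Table 2.1 (29 clusters).
DATA = [(3, 6), (2, 6), (2, 4), (5, 6), (4, 5), (5, 5), (4, 6), (3, 4),
        (2, 4), (3, 4), (5, 5), (4, 4), (6, 6), (3, 3), (5, 6), (1, 2),
        (4, 6), (0, 4), (5, 6), (4, 5), (4, 6), (0, 6), (4, 5), (3, 5),
        (0, 2), (2, 6), (2, 4), (5, 5), (4, 6)]


def anova_icc(data):
    """ANOVA estimator of the intracluster correlation for unequal cluster sizes."""
    n = len(data)
    total_m = sum(m for _, m in data)           # M = sum of cluster sizes
    p_hat = sum(y for y, _ in data) / total_m   # assumption: pooled proportion
    msc = sum(m * (y / m - p_hat) ** 2 for y, m in data) / (n - 1)
    msw = sum(y * (1 - y / m) for y, m in data) / (total_m - n)
    m_a = (total_m - sum(m * m for _, m in data) / total_m) / (n - 1)
    return (msc - msw) / (msc + (m_a - 1) * msw)


rho_hat = anova_icc(DATA)
print(round(rho_hat, 2))  # 0.2
```

Under the pooled-proportion assumption the estimate is about 0.196, which agrees with the reported value of 0.20 after rounding.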
Suppose that we would like to estimate the sample size based on the
hypothesis $H_0 : p = 0.6$ versus $H_1 : p = 0.7$ using a two–sided significance level
of 5% and a power of 80%. Table 2.2 shows the distribution of the number of
infected sites ($m_i$).
Using the observed relative frequency from Table 2.2, $E(m) = 4.897$,
$E(1/m) = 0.224$, $E(m^2) = 25.379$, and $E[m\{1 + (m - 1)\hat{\rho}\}^{-1}] = 2.704$.
Therefore, the required sample sizes are $n_u = 62$, $n_c = 63$, and $n_m = 61$.
TABLE 2.1
Proportion of infection (yi/mi) from n = 29 subjects (clusters)
3/6, 2/6, 2/4, 5/6, 4/5, 5/5, 4/6, 3/4, 2/4, 3/4, 5/5, 4/4, 6/6, 3/3, 5/6,
1/2, 4/6, 0/4, 5/6, 4/5, 4/6, 0/6, 4/5, 3/5, 0/2, 2/6, 2/4, 5/5, 4/6.
TABLE 2.2
Distribution of the number of infected sites (mi)
m                          2      3      4      5      6
Relative frequency, f(m)   2/29   1/29   7/29   7/29   12/29
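The three sample sizes in this example can be recomputed from the distribution in Table 2.2. The sketch below is illustrative: the names `F_M`, `expect`, and `sample_sizes` are hypothetical, ρ̂ = 0.20 is taken from the text, normal quantiles come from Python's standard library, and each result is rounded up to a whole number of clusters.

```python
from math import ceil
from statistics import NormalDist

# Relative frequencies f(m) of the cluster size m, from Table 2.2.
F_M = {2: 2 / 29, 3: 1 / 29, 4: 7 / 29, 5: 7 / 29, 6: 12 / 29}


def expect(g, dist):
    """E[g(m)] with respect to the cluster-size distribution."""
    return sum(f * g(m) for m, f in dist.items())


def sample_sizes(p0, p1, rho, dist, alpha=0.05, power=0.80):
    """Return (n_u, n_c, n_m) for the three weighting schemes."""
    z = NormalDist().inv_cdf(1 - alpha / 2) + NormalDist().inv_cdf(power)
    base = p1 * (1 - p1) * z ** 2 / (p0 - p1) ** 2
    e_m = expect(lambda m: m, dist)
    e_m2 = expect(lambda m: m * m, dist)
    e_inv = expect(lambda m: 1 / m, dist)
    e_mv = expect(lambda m: m / (1 + (m - 1) * rho), dist)
    n_u = base * (e_m + (e_m2 - e_m) * rho) / e_m ** 2  # equal weights to observations
    n_c = base * (e_inv + (1 - e_inv) * rho)            # equal weights to clusters
    n_m = base / e_mv                                   # minimum variance weights
    return ceil(n_u), ceil(n_c), ceil(n_m)


n_u, n_c, n_m = sample_sizes(0.6, 0.7, rho=0.20, dist=F_M)
print(n_u, n_c, n_m)  # 62 63 61
```

The output matches the values reported in the example, and n_m is no larger than either n_u or n_c, as stated for the minimum variance weights.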
drawbridge are carved the Spanish arms and an inscription recording
the completion of the fort in 1756, when Ferdinand VI. was King of
Spain and Don Hereda Governor of Florida. It mounted one hundred
of the small guns of those days, and the interior is a square parade
ground, surrounded by large casemates. Upon each side of the
casemate opposite the sally-port is a niche for holy water, and at the
farther end the Chapel. Dungeons and subterranean passages
abound, of which ghostly tales are told. This fort is the most
interesting relic of the ancient city, a picturesque place, with charms
even in its dilapidation.
There are other quaint structures in this curious old town. A gray
gateway about ten feet wide, flanked by tall square towers, marks
the northern entrance to the city, the ditch from the fort passing in
front of it. In one of the streets is the palace of the Spanish
Governors, since changed into a post-office. The official centre of the
city is a public square, the Plaza de la Constitucion, having a
monument commemorating the Spanish Liberal Constitution of 1812,
and also a Confederate Soldiers' Monument. This square fronts on
the sea-wall, and alongside it and stretching westward is the
Alameda, known as King Street, leading to the group of grand hotels
recently constructed in Spanish and Moorish style, which have made
modern St. Augustine so famous. These are the Ponce de Leon, the
Alcazar and the Cordova, with the Casino, adjoined by spacious and
beautiful gardens. These buildings reproduce all types of the
Hispano-Moorish architecture, with many suggestions from the
Alhambra. The Ponce de Leon, the largest, is three hundred and
eighty by five hundred and twenty feet, enclosing an open court,
and its towers rise above the red-tiled roofs to a height of one
hundred and sixty-five feet, the adornments in colors being very
effective. To the southward of the town, adjoining the barracks, is
the military cemetery, where a monument and three white pyramids
tell the horrid story of the Dade massacre during the Seminole War.
Major Dade, a gallant officer, and one hundred and seven men, were
ambushed and massacred by eight hundred Indians in December,
1835, and their remains afterwards brought here and interred under
the pyramids. Opposite the barracks is what is claimed to be the
oldest house in the United States, occupied by Franciscan monks
from 1565 to 1580, and afterwards a dwelling. It has been restored,
and contains a collection of historical relics.
St. Augustine has had a chequered history. In 1586, Queen
Elizabeth's naval hero, Sir Francis Drake, sailing all over the world to
fight Spaniards, attacked and plundered the town and burnt the
greater part of it. Then for nearly a century the Indians, pirates,
French, English and neighboring Georgians and Carolinians made
matters lively for the harried inhabitants. In 1763 the British came
into possession, but they ceded it back to Spain twenty years later,
the town then containing about three hundred householders and
nine hundred negroes. It became American in 1821, and was an
important military post during the subsequent Seminole War, which
continued several years. It was early captured by the Union forces
during the Civil War, and was a valuable stronghold for them. This
curious old town has many traditions that tell of war and massacre
and the horrible cruelties of the Spanish Inquisition, the remains of
cages in which prisoners were starved to death being shown in the
fort. Its best modern story, however, is told of the escape of Coa-
coo-chee, the Seminole chief, whose adventurous spirit and savage
nature gained him the name of the Wild Cat. The ending of the
Seminole War was the signing of a treaty by the older chiefs
agreeing to remove west of the Mississippi. Coa-coo-chee, with other
younger chiefs, opposed this and renewed the conflict. He was
ultimately captured and taken to Fort Marion. Feigning sickness, he
was removed into a casemate giving him air, there being an aperture
two feet high by nine inches wide in the wall about thirteen feet
above the floor, and under it a platform five feet high. Here, while
still feigning illness, he became attenuated by voluntary abstinence
from food, and finally one night squeezed himself through the
aperture and dropped to the bottom of the moat, which was dry.
Eluding all the guards, he escaped and rejoined his people. The
flight caused a great sensation, and there was hot pursuit. After
some time he was recaptured, and being taken before General
Worth, was used to compel the remnant of the tribe to remove to
the West. Worth told him if his people were not at Tampa in twenty
days he would be killed, and he was ordered to notify them by
Indian runners. He hesitated, but afterwards yielded, and the
runners were given twenty twigs, one to be broken each day, so
they might know when the last one was broken his life would pay
the penalty. In seventeen days the task was accomplished. The tribe
came to Tampa, and the captive was released, accompanying his
warriors to the far West. This ended most of the Indian troubles in
Florida, but some descendants of the Seminoles still exist in the
remote fastnesses of the everglades.
THE FLORIDA EAST COAST.
All along the Atlantic shore of Florida south of St. Augustine are
popular winter resorts, their broad and attractive beaches, fine
climate and prolific tropical vegetation being among the charms that
bring visitors. Ormond is between the ocean front and the pleasant
Halifax River, its picturesque tributary, the Tomoka, being a favorite
resort for picnic parties. A few miles south on the Halifax River is
Daytona, known as the Fountain City, and having its suburb, the
City Beautiful, on the opposite bank. New Smyrna, settled by
Minorcan indigo planters in the eighteenth century, is on the
northern arm of Indian River. Here are found some of the ancient
Indian shell mounds that are frequent in Florida, and also the orange
groves that make this region famous. Inland about thirty miles are a
group of pretty lakes, and in the pines at Lake Helen is located the
Southern Cassadaga, or Spiritualists' Assembly. For more than a
hundred and fifty miles the noted Indian River stretches down the
coast of Florida. It is a long and narrow lagoon, parallel with the
ocean, and is part of the series of lagoons found on the eastern
coast almost continuously for more than three hundred miles from
St. Augustine south to Biscayne Bay, and varying in width from
about fifty yards to six or more miles. They are shallow waters,
rarely over twelve feet deep, and are entered by very shallow inlets
from the sea. The Indian River shores, stretching down to Jupiter
Inlet, are lined with luxuriant vegetation, and the water is at times
highly phosphorescent. Upon the western shore are most of the
celebrated Indian River orange groves whose product is so highly
prized. At Titusville, the head of navigation, where there are about a
thousand people, the river is, at its widest part, about six miles across.
Twenty miles below, at Rockledge, it narrows to about a mile in
width, washing against the perpendicular sides of a continuous
enclosing ledge of coquina rock, with pleasant overhanging trees.
Here comes in, around an island, its eastern arm, the Banana River,
and to the many orange groves are added plantations of the luscious
pineapple. Various limpid streams flow out from the everglade region
at the westward, and Fort Pierce is the trading station for that
district, to which the remnant of the Seminoles come to exchange
alligator hides, bird plumage and snake skins for various supplies,
not forgetting fire-water. Below this is the wide estuary of St. Lucie
River and the Jupiter River, with the lighthouse on the ocean's edge
at Jupiter Inlet, the mouth of Indian River.
Seventeen miles below this Inlet is Palm Beach, a noted resort,
situated upon the narrow strip of land between the long and narrow
lagoon of Lake Worth and the Atlantic Ocean. Here are the vast
Hotel Royal Poinciana and the Palm Beach Inn, with their cocoanut
groves, which also fringe for miles the pleasant shores of Lake
Worth. Prolific vegetation and every charm that can add to this
American Riviera bring a crowded winter population. The Poinciana
is a tree bearing gorgeous flowers, and the two magnificent hotels,
surrounded by an extensive tropical paradise, are connected by a
wide avenue of palms a half-mile long, one house facing the lake
and the other the ocean. There is not a horse in the settlement, and
only one mule, whose duty is to haul a light summer car between
the houses. The vehicles of Palm Beach are said to be confined to
bicycles, wheel-chairs and jinrickshas. Off to the westward the
distant horizon is bounded by the mysterious region of the
everglades. Far down the coast the railway terminates at Miami, the
southernmost railway station in the United States, a little town on
Miami River, where it enters the broad expanse of Biscayne Bay,
which is separated from the Atlantic by the first of the long chain of
Florida keys. Here are many fruit and vegetable plantations, and the
town, which is a railway terminal and steamship port for lines to
Nassau, Key West and Havana, is growing. Nassau is but one
hundred and seventy-five miles distant in the Bahamas, off the
Southern Florida coast, and has become a favorite American winter
tourist resort.
ASCENDING ST. JOHN'S RIVER.
The St. John's is the great river of Florida, rising in the region of
lakes, swamps and savannahs in the lower peninsula, and flowing
northward four hundred miles to Jacksonville, then turning eastward
to the ocean. It comes through a low and level region, with mostly a
sluggish current; is bordered by dense foliage, and in its northern
portion is a series of lagoons varying in width from one to six miles.
The river is navigable fully two hundred miles above Jacksonville.
The earlier portion of the journey is monotonous, the shores being
distant and the landings made at long piers jutting out over the
shallows from the villages and plantations. At Mandarin is the orange
grove which was formerly the winter home of Harriet Beecher
Stowe; Magnolia amid the pines is a resort for consumptives; and
nearby is Green Cove Springs, having a large sulphur spring of
medicinal virtue. In all directions stretch the pine forests; and the
river water, while clear and sparkling in the sunlight, is colored a
dark amber from the swamps whence it comes. The original Indian
name of this river was We-la-ka, or a chain of lakes, the literal
meaning, in the figurative idea of the savage, being the water has
its own way. It broadens into various bays, and at one of these,
about seventy-five miles south of Jacksonville, is the chief town of
the upper river, Palatka, having about thirty-five hundred inhabitants
and a much greater winter population. It is largely a Yankee town,
shipping oranges and early vegetables to the North; and across the
river, just above, is one of the leading orange plantations of Florida—
Colonel Hart's, a Vermonter who came here dying of consumption,
but lived to become, in his time, the leading fruit-grower of the
State. Above Palatka the river is narrower, excepting where it may
broaden into a lake; the foliage is greener, the shores more swampy,
the wild-fowl more frequent, and the cypress tree more general. The
young cypress knees can be seen starting up along the swampy
edge of the shore, looking like so many champagne bottles set to
cool in the water. The river also becomes quite crooked, and here is
an ancient Spanish and Indian settlement, well named Welaka,
opposite which flows in the weird Ocklawaha River, the haunt of the
alligator and renowned as the crookedest stream on the continent.
[Illustration: On the Ocklawaha]
GOING DOWN THE OCKLAWAHA.
The Ocklawaha, the dark, crooked water, comes from the south, by
tortuous windings, through various lakes and swamps, and then
turns east and southeast to flow into St. John's River, after a course
of over three hundred miles. It rises in Lake Apopka, down the
Peninsula, elevated about a hundred feet above the sea, the second
largest of the Florida Lakes, and covering one hundred and fifty
square miles. This lake has wooded highlands to the westward,
dignified by the title of Apopka Mountains, which rise probably one
hundred and twenty feet above its surface. To the northward is a
group of lakes—Griffin, Yale, Eustis, Dora, Harris and others—having
clear amber waters and low shores, which are all united by the
Ocklawaha, the stream finally flowing northward out of Lake Griffin.
This is a region of extensive settlement, mainly by Northern people.
The mouth of the Ocklawaha is sixty-five miles from Lake Eustis in a
straight line, but the river goes two hundred and thirty miles to get
there. To the northward of this lake district is the thriving town of
Ocala, with five thousand people, in a region of good agriculture and
having large phosphate beds, the settlement having been originally
started as a military post during the Seminole War. About five miles
east of Ocala is the famous Silver Spring, which is believed to have
been the fountain of perpetual youth, for which Juan Ponce de
Leon vainly searched. It is the largest and most beautiful of the
many Florida springs, having wonderfully clear waters, and covers
about three acres. The waters can be plainly seen pouring upwards
through fissures in the rocky bottom, like an inverted Niagara, eighty
feet beneath the surface. It has an enormous outflow, and a swift
brook runs from it, a hundred feet wide, for some eight miles to the
Ocklawaha.
This strange stream is hardly a river in the ordinary sense, having
fixed banks and a well-defined channel, but is rather a tortuous but
navigable passage through a succession of lagoons and cypress
swamps. Above the Silver Spring outlet, only the smallest boats of
light draft can get through the crooked channel. This outlet is thirty
miles in a direct line from the mouth of the river at the St. John's,
but the Ocklawaha goes one hundred and nine miles thither. The
swampy border of the stream is rarely more than a mile broad, and
beyond it are the higher pine lands. Through this curious channel,
amid the thick cypress forests and dense jungle of undergrowth, the
wayward and crooked river meanders. The swampy bottom in which
it has its course is so low-lying as to be undrainable and cannot be
improved, so that it will probably always remain as now, a refuge for
the sub-tropical animals, birds, reptiles and insects of Florida, which
abound in its inmost recesses. Here flourishes the alligator, coming
out to sun himself at mid-day on the logs and warm grassy lagoons
at the edge of the stream, in just the kinds of places one would
expect to find him. Yet the alligator is said to be a coward, rarely
attacking, unless his retreat to water in which to hide himself is cut
off. He thus becomes more a curiosity than a foe. These reptiles are
hatched from eggs which the female deposits during the spring, in
large numbers, in muddy places, where she digs out a spacious
cavity, fills it with several hundred eggs, and covering them thickly
with mud, leaves nature to do the rest. After a long incubation the
little fellows come out and make a bee-line for the nearest water.
The big alligators of the neighborhood have many breakfasts on the
newly-born little ones, but some manage to grow up, after several
years, to maturity, and exhibit themselves along this remarkable
river.
It is almost impossible to conceive of the concentrated crookedness
of the Ocklawaha and the difficulties of passage. It is navigated by
stout and narrow flat-bottomed boats of light draft, constructed so
as to quickly turn sharp corners, bump the shores and run on logs
without injury. The river turns constantly at short intervals and
doubles upon itself in almost every mile, while the huge cypress
trees often compress the water way so that a wider boat could not
get through. There are many beautiful views in its course displaying
the noble ranks of cypress trees rising as the stream bends along its
bordering edge of swamps. Occasionally a comparatively straight
river reach opens like the aisle of a grand building with the moss-
hung cypress columns in long and sombre rows on either hand. At
rare intervals fast land comes down to the stream bank, where there
is some cultivation attempted for oranges and vegetables. Terrapin,
turtles and water-fowl abound. When the passenger boat, after
bumping and swinging around the corners, much like a ponderous
teetotum, halts for a moment at a landing in this swampy fastness,
half-clad negroes usually appear, offering for sale partly-grown baby
alligators, which are the prolific crop of the district. Various Turkey
bends, Hell's half-acres, Log Jams, Bone Yards and Double S
Bends are passed, and at one place is the Cypress Gate, where
three large trees are in the way, and by chopping off parts of their
roots, a passage about twenty feet wide had been secured to let the
boats through. There are said to be two thousand bends in one
hundred miles of this stream, and many of them are like corrugated
circles, by which the narrow water way, in a mile or two of its
course, manages to twist back to within a few feet of where it
started. At night, to aid the navigation, the lurid glare of huge pine-
knot torches, fitfully blazing, gives the scene a weird and unnatural
aspect. The monotonous sameness of cypress trunks, sombre moss
and twisting stream for many hours finally becomes very tiresome,
but it is nevertheless a most remarkable journey of the strangest
character possible in this country to sail down the Ocklawaha.
LOWER FLORIDA AND THE SEMINOLES.
South of the mouth of the Ocklawaha the St. John's River broadens
into Lake George, the largest of its many lakes, a pretty sheet of
water six to nine miles wide and twelve miles long. Volusia, the site
of an ancient Spanish mission, is at the head of this lake, and the
discharge from the swift but narrow stream above has made sand
bars, so that jetties are constructed to deepen the channel. For a
long distance the upper river is narrow and tortuous, with numerous
islands and swamps, the dark coffee-colored water disclosing its
origin; but the Blue Spring in one place is unique, sending out an
ample and rich blue current to mix with the amber. Then Lake
Monroe is reached, ten miles long and five miles wide, the head of
navigation, by the regular lines of steamers, one hundred and
seventy miles above Jacksonville. Here are two flourishing towns,
Enterprise on the northern shore and Sanford on the southern, both
popular winter resorts, and the latter having two thousand people.
The St. John's extends above Lake Monroe, a crooked, narrow,
shallow stream, two hundred and fourteen miles farther
southeastward to its source. The region through which it there
passes is mostly a prairie with herds of cattle and much game, and is
only sparsely settled. The upper river approaches the seacoast,
being in one place but three miles from the lagoons bordering the
Atlantic. To the southward of Lake Monroe are the winter resorts of
Winter Park and Orlando, the latter a town of three thousand
population. There are numerous lakes in this district, and then
leaving the St. John's valley and crossing the watershed southward
through the pine forests, the Okeechobee waters are reached, which
flow down to that lake. This region was the home of a part of the
Seminole Indians, and Tohopekaliga was their chief, whom they
revered so highly that they named their largest lake in his honor. The
Kissimmee River flows southward through this lake, and then
traverses a succession of lakes and swamps to Lake Okeechobee,
about two hundred miles southward by the water-line. Kissimmee
City is on Lake Tohopekaliga, and extensive drainage operations have
been conducted here and to the southward, reclaiming a large extent
of valuable lands, and lowering the water-level in all these lakes and
attendant swamps.
From Lake Tohopekaliga through the tortuous water route to Lake
Okeechobee, and thence by the Caloosahatchie westward to the Gulf
of Mexico, is a winding channel of four hundred and sixty miles,
though in a direct line the distance is but one hundred and fifty
miles. Okeechobee, the word meaning "the large water," covers
about twelve hundred and fifty square miles, and almost all about it
are the everglades, or "grass water," the shores being generally a
swampy jungle. This district for many miles is a mass of waving
sedge grass eight to ten feet high above the water, and inaccessible
excepting through narrow, winding and generally hidden channels.
In one locality a few tall lone pines stand like sentinels upon Arpeika
Island, formerly the home of the bravest and most dreaded of the
Seminoles, and still occupied by some of their descendants. The
name of the Seminole means the "separatist" or "runaway" Indians,
they having centuries ago separated from the Creeks in Georgia and
gone southward into Florida. From the days of De Soto to the time of
their deportation in the nineteenth century the Spanish, British,
French and Americans made war with these Seminole Indians.
Gradually they were pressed southward through Florida. Their final
refuge was the green islands and hummocks of the everglades, and
they then clung to their last homes with the tenacity of despair. The
greater part of this region is an unexplored mystery; the deep
silence that can be actually felt, everywhere pervades; and once lost
within the labyrinth, the adventurer is doomed unless rescued. Only
the Indians knew its concealed and devious paths. On Arpeika Island
the Cacique of the Caribs is said to have ruled centuries ago, until
forced south out of Florida by the Seminoles. It was at times a
refuge for the buccaneer with his plunder and a shrine for the
missionary martyr who planted the Cross and was murdered beside
it. This island was the last retreat of the Seminoles in the desultory
war from 1835 to 1843, when they defied the Government, which,
during eight years, spent $50,000,000 upon expeditions sent against
them. Then the attempt to remove all of them was abandoned, and
the remnant have since rested in peace, living by hunting and a little
trading with the coast settlements. The names of the noted chiefs of
this great race—Osceola, Tallahassee, Tohopekaliga, Coa-coo-chee
and others—are preserved in the lakes, streams and towns of
Florida. Most of the deported tribe were sent to the Indian Territory.
There may be three or four hundred of them still in the everglades,
peaceful, it is true, yet haughty and suspicious, and sturdily rejecting
all efforts to educate or civilize them. They celebrate their great
feast, the Green Corn Dance, in late June; and they have
unwavering faith in the belief that the time will yet come when all
their prized everglade land will be theirs again, and the glory of the
past redeemed, if not in this world, then in the next one, beyond the
Big Sleep.
WESTERN FLORIDA.
Westward from Jacksonville, a railway runs through the pine forests
until it reaches the rushing Suwanee River, draining the Okifenokee
swamp out to the Gulf, just north of Cedar Key. This stream is best
known from the minstrel song, long so popular, of the "Old Folks at
Home." Beyond it the land rises into the rolling country of Middle
Florida, the undulating surface sometimes reaching four hundred
feet elevation, and presenting fertile soil and pleasant scenery, with a
less tropical vegetation than the Peninsula of Florida. Here is
Tallahassee, the capital of the State, one hundred and sixty-five
miles from Jacksonville, a beautiful town of four thousand
population, almost embedded in flowering plants, shrubbery and
evergreens, and familiarly known from these beauties as the "Floral
City," the gardens being especially attractive in the season of roses.
The Capitol and Court-house and West Florida Seminary, set on a
hill, are the chief public buildings. In the suburbs, at Monticello, lived
Prince Achille Murat, a son of the King of Naples, who died in 1847,
and his grave is in the Episcopal Cemetery. There are several lakes
near the town, one of them the curious Lake Miccosukie, which
contracts into a creek, finally disappearing underground. The noted
Wakulla Spring, an immense limestone basin of great depth and
volume of water, with wonderful transparency, is fifteen miles
southward.
Some distance to the westward the Flint and Chattahoochee Rivers
join to form the Appalachicola River, flowing down to the Gulf at
Appalachicola, a somewhat decadent port from loss of trade, its
exports being principally lumber and cotton. The shallowness of most
of these Gulf harbors, which readily silt up, destroys their usefulness
as ports for deep-draft shipping. The route farther westward skirts
the Gulf Coast, crosses Escambia Bay and reaches Pensacola, on its
spacious harbor, ten miles within the Gulf. This is the chief Western
Florida port, with fifteen thousand people, having a Navy Yard and
much trade in lumber, cotton, coal and grain, a large elevator for the
latter being erected in 1898. The Spaniards made this a frontier post
in 1696, and the remains of their forts, San Miguel and San
Bernardo, can be seen behind the town, while near the outer edge of
the harbor is the old-time Spanish defensive battery, Fort San Carlos
de Barrancas. The harbor entrance is now defended by Fort Pickens
and Fort McRee. Pensacola Bay was the scene of one of the first
spirited naval combats of the Civil War, when the Union forces early
in 1862 recaptured the Navy Yard and defenses. The name of
Pensacola was originally given by the Choctaws to the bearded
Europeans who first settled there, and signifies "the hair people."
THE FLORIDA GULF COAST.
The coast of Florida on the Gulf of Mexico has various attractive
places, reached by a convenient railway system. Homosassa is a
popular resort about fifty miles southwestward from Ocala. A short
distance in the interior is the locality where the Seminoles surprised
and massacred Major Dade and his men in December, 1835, only
three soldiers escaping alive to tell the horrid tale. The operations
against these Indians were then mainly conducted from the military
post of Tampa, and thither were taken for deportation the portions of
the tribe that were afterwards captured, or who surrendered under
the treaty. When Ferdinand de Soto entered this magnificent harbor
on his voyage of discovery and gold hunting, he called it Espiritu
Sancto Bay. It is from six to fifteen miles wide, and stretches nearly
forty miles into the land, being dotted with islands, its waters
swarming with sea-fowl, turtles and fish, deer abounding in the
interior and on some of the islands, and there being abundant
anchorage for the largest vessels. This is the great Florida harbor
and the chief winter resort on the western coast. It was the main
port of rendezvous and embarkation for the American forces in the
Spanish War of 1898. The head of the harbor divides into Old Tampa
and Hillsborough Bays, and on the latter and at the mouth of
Hillsborough River is the city, numbering about twenty-five thousand
inhabitants. The great hotels are surrounded by groves with orange
and lemon trees abounding, and everything is invoked that can add
to the tourist attractions. The special industry of the resident
population is cigar-making. Port Tampa is out upon the Peninsula
between the two bays, several miles below the city, and a long
railway trestle leads from the shore for a mile to deep water. Upon
the outer end of this long wharf is Tampa Inn, built on a mass of
piles, much like some of the constructions in Venice. The guests can
almost catch fish out of the bedroom windows, and while eating
breakfast can watch the pelican go fishing in the neighboring waters,
for this queer-looking bird, with the duck and gull, is everywhere
seen in these attractive regions. An outer line of keys defends Tampa
harbor from the storms of the Gulf. There are many popular resorts
on the islands and shores of Tampa Bay, and regular lines of
steamers are run to the West India ports, Mobile and New Orleans.
All the surroundings are attractive, and a pleased visitor writes of the
place: "Conditions hereabouts exhilarate the men; a perpetual sun
and ocean breeze are balm to the invalid and an inspiration to a
robust health. The landscape affords uncommon diversion, and the
sea its royal sport with rod and gaff."
Farther down the coast is Charlotte Harbor, also deeply indented and
sheltered from the sea by various outlying islands. It is eight to ten
miles long and extends twenty-five miles into the land, having
valuable oyster-beds and fisheries, and its port is Punta Gorda.
Below this is the projecting shore of Punta Rassa, where the outlet of
Lake Okeechobee, the Caloosahatchie River, flows to the sea, having
the military post of Fort Myers, another popular resort, a short
distance inland, upon its bank. The Gulf Coast now trends to the
southeast, with various bays, in one of which, with Cape Romano as
the guarding headland, is the archipelago of the ten thousand
islands, while below is Cape Sable, the southwestern extremity of
Florida. To the southward, distant from the shore, are the long line
of Florida Keys, the name coming from the Spanish word cayo, an
island. This remarkable coral formation marks the northern limit of
the Gulf Stream, where it flows swiftly out to round the extremity of
the Peninsula and begin its northern course through the Atlantic
Ocean. Although well lighted and charted, the Straits of Florida along
these reefs are dangerous to navigate and need special pilots.
Nowhere rising more than eight to twelve feet above the sea, the
Keys thus low-lying are luxuriantly covered with tropical vegetation.
From the Dry Tortugas at the west, around to Sand's Key at the
entrance to Biscayne Bay, off the Atlantic Coast, about two hundred
miles, is a continuous reef of coral, upon the whole extent of which
the little builder is still industriously working. The reef is occasionally
broken by channels of varying depth, and within the outer line are
many habitable islands. The whole space inside this reef is slowly
filling up, just as all the Keys are also slowly growing through
accretions from floating substances becoming entangled in the
myriad roots of the mangroves. The present Florida Reef is a good
example of the way in which a large part of the Peninsula was
formed. No less than seven old coral reefs have been found to exist
south of Lake Okeechobee, and the present one at the very edge of
the deep water of the Gulf Stream is probably the last that can be
formed, as the little coral-builder cannot live at a greater depth than
sixty feet. The Gulf Stream current is so swift and deep along the
outer reef that there is no longer a foundation on which to build.
The Gulf Stream is the best known of all the great ocean currents.
The northeast and southeast trade-winds, constantly blowing, drive a
great mass of water from the Atlantic Ocean westward through the
passages between the Windward Islands into the Caribbean Sea, where
it is contracted by the converging shores of the Yucatan
Peninsula and the Island of Cuba, so that it pours between them into
the Gulf of Mexico, raising its surface considerably above the level of
the Atlantic. These currents then move towards the Florida
Peninsula, and pass around the Florida Reef and out into the
Atlantic. It is estimated by the Coast Survey that the hourly flow of
the Gulf Stream past the reef is nearly ninety thousand million tons
of water, the speed at the surface of the axis of the stream being
over three and one-half miles an hour. To convey the immensity of
this flow, it is stated that if a single hour's flow of water were
evaporated, carrying the salt thus produced would require one
hundred times the number of ocean-going vessels now afloat. The
Gulf Stream water is of high temperature, great clearness
and a deep blue color; and when it meets the greener waters of the
Atlantic to the northward, the line of distinction is often very well
defined. At the exit to the Atlantic below Jupiter Inlet the stream is
forty-eight miles wide to Little Bahama Bank, and its depth over four
hundred fathoms.
There are numerous harbors of refuge among the Florida Keys, and
that at Key West is the best. This is a coral island seven miles long
and one to two miles broad, but nowhere elevated more than eleven
feet above the sea. Its name, by a free translation, comes from the
original Spanish name of Cayo Hueso, or the Bone Island, given
because the early mariners found human bones upon it. Here are
twenty thousand people, mostly Cubans and settlers from the
Bahamas, the chief industry being cigar-making, while catching fish
and turtles and gathering sponges also give much employment.
There are no springs on the island, and the inhabitants are
dependent on rain or distillation for water. The air is pure and the
climate healthy, the trees and shrubbery, with the residences
embowered in perennial flowers, giving the city a picturesque
appearance. Key West has a good harbor, and as it commands the
gateway to and from the Gulf near the western extremity of the
Florida coral reef, it is strongly defended, the prominent work being
Fort Taylor, constructed on an artificial island within the main harbor
entrance. The little Sand Key, seven miles to the southwest, is the
southernmost point of the United States. Forty miles to the westward
is the group of ten small, low and barren islands known as the Dry
Tortugas, from the Spanish tortuga, a tortoise. Upon the farthest
one, Loggerhead Key, stands the great guiding light for the Florida
Reef, of which this is the western extremity, the tower rising one
hundred and fifty feet. Fort Jefferson is on Garden Key, where there
is a harbor, and in it were confined various political prisoners during
the Civil War, among them some who were concerned in the
conspiracy to assassinate President Lincoln.
Here, with the encircling waters of the Gulf all around us, terminates
this visit to the Sunny South. As we have progressed, the gradual
blending of the temperate into the torrid zone, with the changing
vegetation, has reminded us of Bayard Taylor's words:
There, in the wondering airs of the Tropics,
Shivers the Aspen, still dreaming of cold:
There stretches the Oak from the loftiest ledges,
His arms to the far-away lands of his brothers,
And the Pine tree looks down on his rival, the Palm.
And as the journey down the Florida Peninsula has displayed some of
the most magnificent winter resorts of the American Riviera, with
their wealth of tropical foliage, fruits and flowers, and their seductive
and balmy climate, this too has reminded us of Cardinal Damiani's
glimpse of the Joys of Heaven:
Stormy winter, burning summer, rage within these regions never,
But perpetual bloom of roses and unfading spring forever;
Lilies gleam, the crocus glows, and dropping balms their scents
deliver.
Along this famous peninsula the sea rolls with ceaseless beat upon
some of the most gorgeous beaches of the American coast. To the
glories of tropical vegetation and the charms of the climate, Florida
thus adds the magnificence of its unrivalled marine environment.
Everywhere upon these pleasant coasts—
The bridegroom, Sea,
Is toying with his wedded bride,—the Shore.
He decorates her shining brow with shells,
And then retires to see how fine she looks,
Then, proud, runs up to kiss her.
VI.
TRAVERSING THE PRAIRIE LAND.
The Northwest Territory—Beaver River—Fort McIntosh—Mahoning Valley—
Steubenville—Youngstown—Canton—Massillon—Columbus—Scioto River—Wayne
Defeats the Miamis—Sandusky River—Findlay—Natural Gas Fields—Fort Wayne—
Maumee River—The Little Turtle—Old Tippecanoe—Tecumseh—Battle of
Tippecanoe—Harrison Defeats the Prophet—Tecumseh Slain in Canada—
Indianapolis—Wabash River—Terre Haute—Illinois River—Springfield—Lincoln's
Home and Tomb—Peoria—The Great West—Lake Erie—Tribe of the Cat—
Conneaut—The Western Reserve—Ashtabula—Mentor—Cleveland—Cuyahoga
River—Moses Cleaveland—Euclid Avenue—Oberlin—Elyria—The Fire Lands—
Sandusky—Put-in-Bay Island—Perry's Victory—Maumee River—Toledo—South
Bend—Chicago—The Pottawatomies—Fort Dearborn—Chicago Fire—Lake
Michigan—Chicago River—Drainage Canal—Lockport—Water Supply—Fine
Buildings, Streets and Parks—University of Chicago—Libraries—Federal Steel
Company—Great Business Establishments—Union Stock Yards—The Hog—The
Board of Trade—Speculative Activity—George M. Pullman—The Sleeping Car—
The Pioneer—Town of Pullman—Agricultural Wealth of the Prairies—The Corn
Crop—Whittier's Corn Song.
THE NORTHWEST TERRITORY.
Beyond the Allegheny ranges, which are gradually broken down into
their lower foothills, and then to an almost monotonous level, the
expansive prairie lands stretch towards the setting sun. From their
prolific agriculture has come much of the wealth and prosperity of
the United States. The rivers flowing out of the mountains seek the
Mississippi Valley, thus reaching the sea through the Great Father of
Waters. Among these rivers is the Ohio, and at its confluence with
the Beaver, near the western border of Pennsylvania, was, in the
early days, the Revolutionary outpost of Fort McIntosh, a defensive
work against the Indians. All about is a region of coal and gas,
extending across the boundary into the Mahoning district of Ohio,
the Mahoning River being an affluent of the Beaver. Numerous
railroads serve its many towns of furnaces and forges. To the
southward is Steubenville on the Ohio, and to the northward
Youngstown on the Mahoning, both busy manufacturing centres.
Salem and Alliance are also prominent, and some distance northwest
is Canton, a city of thirty thousand people, in a fertile grain district,
the home of President William McKinley. Massillon, upon the pleasant
Tuscarawas River, in one of the most productive Ohio coal-fields,
preserves the memory of the noted French missionary priest, Jean
Baptiste Massillon, for all this region was first traversed, and opened
to civilization, by the French religious explorers from Canada who
went out to convert the Indians.
In the centre of the State of Ohio is the capital, Columbus, built on
the banks of the Scioto River, a tributary of the Ohio flowing
southward and two hundred miles long. This river receives the
Olentangy or Whetstone River at Columbus, in a region of great
fertility, which is in fact the characteristic of the whole Scioto Valley.
The Ohio capital, which has a population of one hundred and twenty
thousand, large commerce and many important manufacturing
establishments, dates from 1812, and became the seat of the State
Government in 1816. The large expenditures of public money upon
numerous public institutions, all having fine buildings, the wide, tree-
shaded streets, and the many attractive residences, have made it
one of the finest cities in the United States. Broad Street, one
hundred and twenty feet wide, beautifully shaded with maples and
elms, extends for seven miles. The Capitol occupies a large park
surrounded with elms, and is an impressive Doric building of gray
limestone, three hundred and four feet long and one hundred and
eighty-four feet wide, the rotunda being one hundred and fifty-seven
feet high. There are fine parks on the north, south and east of the
city, the latter containing the spacious grounds of the Agricultural
Society. Almost all the Ohio State buildings, devoted to its
benevolence, justice or business, have been concentrated in
Columbus, adding to its attractions, and it is also the seat of the
Ohio State University with one thousand students. Railroads radiate
in all directions, adding to its commercial importance.
In going westward, the region we are traversing beyond the
Pennsylvania boundary gradually changes from coal and iron to a
rich agricultural section. As we move away from the influence of the
Allegheny ranges, the hills become gentler, and the rolling surface is
more and more subdued, until it is smoothed out into an almost level
prairie, heavily timbered where not yet cleared for cultivation. This
was the Northwest Territory, first explored by the French, who were
led by the Sieur de la Salle in his original discoveries in the
seventeenth century. The French held it until the conquest of
Canada, when that Dominion and the whole country west to the
Mississippi River came under the British flag by the treaty of 1763.
After the Revolution, the various older Atlantic seaboard States
claiming the region, ceded sovereignty to the United States
Government, and then its history was chequered by Indian wars until
General Wayne conducted an expedition against the Miamis and
defeated them in 1794, after which the Northwest Territory was
organized, and the State of Ohio taken out of it and admitted to the
Union in 1803, its first capital being Chillicothe. It was removed to
Zanesville for a couple of years, but finally located at Columbus.
Beyond the Scioto the watershed is crossed, by which the waters of
the Ohio are left behind and the valley of Sandusky River is reached,
a tributary of Lake Erie. Here is Bucyrus, in another prolific natural
gas region, the centre of which is Findlay. At this town, in 1887, the
inhabitants, who had then had just one year of natural gas
development, spent three days in exuberant festivity, to show their
appreciation of the wonderful discovery. They had thirty-one gas
wells pouring out ninety millions of cubic feet in a day, all piped into
town and feeding thirty thousand glaring natural gas torches of
enormous power, which blew their roaring flames as an
accompaniment to the oratory of John Sherman and Joseph B.
Foraker, who were then respectively Senator and Governor of Ohio.
The soldiers and firemen paraded, and a multitude of brass bands
tried to drown the Niagara of gas which was heard roaring five miles
away, while the country at night was illuminated for twenty miles
around. But the wells have since diminished their flow, although the
gas still exists; while another field with a prolific yield is in Fairfield
County, a short distance southeast of Columbus. Over the State
boundary in Indiana is yet another great gas-field covering five
thousand square miles in a dozen counties, with probably two
thousand wells and a yield which has reached three thousand
millions of cubic feet in a day. This gas supplies many cities and
towns, including Chicago, and it is one of the greatest gas-fields
known. In the same region there are also large petroleum deposits.
Not far beyond the State boundary is Fort Wayne, the leading city of
Northern Indiana, having forty thousand population, an important
railway centre, and prominent also in manufactures. It stands in a
fertile agricultural district, and being located at the highest part of
the gentle elevation, beyond the Sandusky Valley, diverting the
waters east and west, it is appropriately called the Summit City.
Here the Maumee River is formed by the confluence of the two
streams St. Joseph and St. Mary, and flows through the prairie
towards the northeast, to make the head of Lake Erie. The French,
under La Salle, in the seventeenth century established a fur-trading
post here, and erected Fort Miami, and in 1760 the British
penetrated to this then remote region and also built a fort. During
the Revolution this country was abandoned to the Indians, but when
General Wayne defeated the Miamis in 1794 he thought the place
would make a good frontier outpost to hold the savages in check,
and he then constructed a strong work, to which he gave the name
of Fort Wayne. Around this post the town afterwards grew, being
greatly prospered by the Wabash and Erie Canal, and by the various
railways subsequently constructed in all directions. All this prairie
region was the hunting-ground of the Miamis, whose domain
extended westward to Lake Michigan, and southward along the
valley of the Miami River to the Ohio. They were a warlike and
powerful tribe, and their adherence to the English during the
Revolution provoked almost constant hostilities with the settlers who
afterwards came across the mountains to colonize the Northwest
Territory. Under the leadership of their renowned chief
Mishekonequah, or the Little Turtle, they defeated repeated
expeditions sent against them, until finally beaten by Wayne.
Subsequently they dwindled in importance, and when removed
farther west, about 1848, they numbered barely two hundred and
fifty persons.
OLD TIPPECANOE.
Some distance westward is the Tippecanoe River, a stream flowing
southwest into the Wabash, and thence into the Ohio. The word
Tippecanoe is said to mean "the great clearing," and on this river
was fought the noted battle by "Old Tippecanoe," General William
Henry Harrison, against the combined forces of the Shawnees,
Miamis and several other tribes, which resulted in their complete
defeat. They were united under Elskwatawa, or the Prophet, the
brother of the famous Tecumseh. These two chieftains were
Shawnees, and they preached a crusade by which they gathered all
the northwestern tribes in a concerted movement to resist the
steady encroachments of the whites. The brother, who was a
medicine man, in 1805 set up as an inspired prophet, denouncing
the use of liquors, and of all food, manners and customs introduced
by the hated palefaces, and confidently predicted they would
ultimately be driven from the land. For years both chiefs travelled
over the country stirring up the Indians. General Harrison, who was
the Governor of the Indiana Territory, gathered his forces together
and advanced up the Wabash against the Prophet's town of
Tippecanoe, when the Indians, hoping to surprise him, suddenly
attacked his camp, but he being prepared, they were signally
defeated, thus giving Harrison his popular title of Old Tippecanoe,
which had much to do with electing him President in 1840. Some
time after this defeat the War of 1812 broke out, when Tecumseh
espoused the English cause, went to Canada with his warriors, and
was made a brigadier-general. He was killed there in the battle of
the Thames, in Ontario Province, and it is said had a premonition of
death, for, laying aside his general's uniform, he put on a hunting-
dress and fought desperately until he was slain. Tecumseh was the
most famous Indian chief of his time, and the honor of killing him
was claimed by several who fought in the battle, so that the problem
of "Who killed Tecumseh?" was long discussed throughout the
country.
The State of Indiana was admitted into the Union in 1816, and in its
centre, built upon a broad plain, on the east branch of White River, is
its capital and largest city, Indianapolis, having two hundred
thousand population. This is a great railway centre, having lines
radiating in all directions, and it also has extensive manufactures and
a large trade in live stock. The city plan, with wide streets crossing at
right angles, and four diagonal avenues radiating from a circular
central square, makes it very attractive; and the residential quarter,
displaying tasteful houses, ornate grounds and shady streets, is
regarded as one of the most beautiful in the country. The State
Capitol, in a spacious park, is a Doric building with colonnade,
central tower and dome, and in an enclosure on its eastern front is
erected one of the finest Soldiers' and Sailors' Monuments existing,
rising two hundred and eighty-five feet, out-topping everything
around, having been designed and largely constructed in Europe.
There are also many prominent public buildings throughout the city.
Indianapolis, first settled in 1819, had but a small population until
the railways centred there, the capital being removed from Corydon
in 1825. The Wabash River, to which reference has been made,
receives White River, and is one of the largest affluents of the Ohio,
about five hundred and fifty miles long, being navigable over half
that length. It rises in the State of Ohio, flows across Indiana, and,
turning southward, makes for a long distance the Illinois boundary.
Its chief city is Terre Haute, the "High Ground," about seventy miles
west of Indianapolis, another prominent railroad centre, having forty-
five thousand people, with extensive manufactures. It is surrounded
by valuable coal-fields, is built upon an elevated plateau, and, like all
these prairie cities, is noted for its many broad and well-shaded
streets. It was founded in 1816.
THE GREAT WEST.
Progressing westward, the timbered prairie gradually changes to the
grass-covered prairie, spreading everywhere a great ocean of
fertility. Across the Wabash is the Prairie State of Illinois, its name
coming from its principal river, which the Indians named after
themselves. The word is a French adaptation of the Indian name
Illini, meaning "the superior men," the earliest explorers and
settlers having been French, the first comers on the Illinois River
being Father Marquette and La Salle. At the beginning of the
eighteenth century their little settlements were flourishing, and the
most glowing accounts were sent home, describing the region, which
they called New France, on account of its beauty, attractiveness
and prodigious fertility, as a new Paradise. There were many years of
Indian conflicts and hostility, but after peace was restored and a
stable government established, population flowed in, and Illinois was
admitted as a State to the Union in 1818. The capital was
established at Springfield in 1837, an attractive city of about thirty
thousand inhabitants, built on a prairie a few miles south of
Sangamon River, a tributary of the Illinois, and from its floral
development and the adornment of its gardens and shade trees,
Springfield is popularly known as the "Flower City." There is a
magnificent State Capitol with high surmounting dome, patterned
somewhat after the Federal Capitol at Washington. Springfield has
coal-mines which add to its prosperity, but its great fame is
connected with Abraham Lincoln. He lived in Springfield, and the
house he occupied when elected President has been acquired by the
State and is on public exhibition. After his assassination in 1865, his
remains were brought from Washington to Springfield, and interred
in the picturesque Oak Ridge Cemetery, in the northern suburbs,
where a magnificent monument was erected to his memory and
dedicated in 1874. About sixty miles north of Springfield, the Illinois
River expands into Peoria Lake, and here came La Salle down the
river in 1680, and at the foot of the lake established a trading-post
and fort, one of the earliest in that region. When more than a
century had elapsed, a little town grew there which is now the busy
industrial city of Peoria, famous for its whiskey and glucose, and
turning out products that annually approximate a hundred millions,
furnishing vast traffic for numerous railroads. It is the chief city of
the corn belt, and is served by all the prominent trunk railway
lines.
Like the pioneers of a hundred years ago, we have left the Atlantic
seaboard, crossed the Allegheny Mountains and entered the
expansive Northwest Territory, which in the first half of the
nineteenth century was the Mecca of the colonist and frontiersman.
This was then the region of the Great West, though that has since
moved far beyond the Mississippi. Its agricultural wealth made the
prosperity of the country for many decades, and its prodigious
development was hardly realized until put to the test of the Civil War,
when it poured out the men and officers, and had the staying
qualities so largely contributing to the result of that great conflict.
Gradually overspread by a network of railways, the numerous cross-
roads have expanded everywhere into towns and cities, almost all
patterned alike, and all of them centres of rich farming districts.
Coal, oil and gas have come to minister to its manufacturing wants,
and thus growing into mature Commonwealths, this prolific region in
the later decades has been itself, in turn, contributing largely to the
tide of migration flowing to the present Great Northwest, a
thousand miles or more beyond. It presents a rich agricultural
picture, but little scenic attractiveness. Everywhere an almost dead
level, the numerous railways cross and recross the surface in all
directions at grade, and are easily built, it being only necessary to
dig a shallow ditch on either side, throw the earth in the centre, and
lay the ties and rails. Nature has made the prairie as smooth as a
lake, so that hardly any grading is necessary, and the region of
expansive green viewed out of the car window has been aptly
described as having "a face but no features," when one looks afar
over an ocean of waving verdure.
LAKE ERIE.
This vast prairie extends northward to and beyond the Great Lakes,
and it is recorded that in the early history of the proposed legislation
for the "Northwest Territory," Congress gravely selected as the
names of the States which were to be created out of it such
ponderous conglomerates as "Metropotamia," "Assenisipia,"
"Pelisipia" and "Polypotamia," titles which happily were long ago
permitted to pass into oblivion. Northward, in Ohio, the region
stretches to Lake Erie, the most southern and the smallest of the
group of Great Lakes above Niagara. It is regarded as the least
attractive lake, having neither romances nor much scenery. Yet, from
its favorable position, it carries an enormous commerce. It is elliptical
in form, about two hundred and forty miles long and sixty miles
broad, the surface being five hundred and sixty-five feet above the
ocean level. It is a very shallow lake, the depth rarely exceeding one
hundred and twenty feet, excepting at the lower end, while the other
lakes are much deeper, and in describing this difference of level it is
said that the surplus waters poured from the vast basins of Superior,
Michigan and Huron, flow across the plate of Erie into the deep bowl
of Ontario. This shallowness causes it to be easily disturbed, so that
it is the most dangerous of these fresh-water seas, and it has few
harbors, and those very poor, especially upon the southern shore.
The bottom of the lake is a light, clayey sediment, rapidly
accumulated from the wearing away of the shores, largely composed
of clay strata. The loosely-aggregated products of these
disintegrated strata are frequently seen along its coast, forming cliffs
extending back into elevated plateaus, through which the rivers cut
deep channels. Their mouths are clogged by sand-bars, and
dredging and breakwaters have made the harbors on the southern
shore, around which have grown the chief towns—Dunkirk, Erie,
Ashtabula, Cleveland, Sandusky and Toledo. The name of Lake Erie
comes from the Indian tribe of "the Cat," whom the French called
the "Chats," because their early explorers, penetrating to the shores
of the lake, found them abounding in wild cats, and thus they gave
the same name to the cats and the savages. In their own parlance,
these Indians were the "Eries," and in the seventeenth century they
numbered about two thousand warriors. In 1656 the Iroquois
attacked and almost annihilated them.
The Lake Erie ports in the "Buckeye State" of Ohio, so called from
the buckeye tree, are chiefly harbors for shipping coal and receiving
ores from the upper lakes, their railroads leading to the great
industrial centres to the southward. Near the eastern boundary of
Ohio is Conneaut, on the bank of a wide and deep ravine, formed by
a small river, broadening into a bay at the shore of the lake, the
name meaning "many fish." Here landed in 1796 the first settlers
from Connecticut, who entered the "Western Reserve," as all this
region was then called. On July 4th of that year, celebrating the
national anniversary, they pledged each other in tin cups of lake
water, accompanied by a salute of fowling-pieces, and the next day
began building the first house on the Reserve, constructed of logs,
and long known as "Stow Castle." Conneaut is consequently known
as the "Plymouth of the Western Reserve," as here began the
settlements made by the Puritan New England migration to Ohio. On
deep ravines making their harbors are Ashtabula, an enormous
entrepôt for ores, and a few miles farther westward, Painesville, on
Grand River, named for Thomas Paine. Beyond is Mentor, the home
of the martyred President Garfield, whose large white house stands
near the railway. All along here, the southern shore of Lake Erie is a
broad terrace at eighty to one hundred feet elevation above the
water, while farther inland is another and considerably higher
plateau. Each sharp declivity facing northward seems at one time to
have been the actual shore of the lake when its surface before the
waters receded was much higher than now. The outer plateau
having once been the overflowed lake bed, is level, excepting where
the crooked but attractive streams have deeply cut their winding
ravines down through it to reach Lake Erie.
THE CITY OF CLEVELAND.
Thus we come to Cleveland, the second city in Ohio, having four
hundred thousand people, and extensive manufacturing industries. It
is the capital of the Western Reserve and the chief city of Northern
Ohio, its commanding position upon a high bluff, falling off
precipitously to the edge of the water, giving it the most attractive
situation on the shore of Lake Erie. Shade trees embower it,
including many elms planted by the early settlers, who learned to
love them in New England, and hence it delights in the popular title
of the "Forest City." Were not the streets so wide, the profusion of
foliage might make Cleveland seem like a town in the woods. The
little Cuyahoga River, its name meaning "the crooked stream," flows
with wayward course down a deeply washed and winding ravine,
making a valley in the centre of the city, known as "the Flats," and
this, with the tributary ravines of some smaller streams, is packed
with factories and foundries, oil refineries and lumber mills, their
chimneys keeping the business section constantly under a cloud of
smoke. Railways run in all directions over these flats and through the
ravines, while, high above, the city has built a stone viaduct nearly a
half-mile long, crossing the valley. Here are the great works of the
Standard Oil Company, controlling that trade, and several of the
petroleum magnates have their palaces in the city.
Old Moses Cleaveland, a shrewd but unsatisfied Puritan of the town
of Windham, Connecticut, became the agent of the Connecticut Land
Company, who brought out the first colony in 1796 that landed at
Conneaut. They explored the lake shore, and selecting as a good
location the mouth of Cuyahoga River, Moses wrote back to his
former home that they had found a spot on the bank of Lake Erie
"which was called by my name, and I believe the child is now born
that may live to see that place as large as old Windham." In little
over a century the town has grown far beyond his wildest dreams,
although it did not begin to expand until the era of canals and
railways, and it was not so long ago that the people in grateful
memory erected a bronze statue of the founder. One of the local
antiquaries, delving into the records, has found why various original
settlers made their homes at Cleveland. He learned that one man,
on his way farther West, was laid up with the ague and had to stop;
another ran out of money and could get no farther; another had
been to St. Louis and wanted to get back home, but saw a chance to
make money in ferrying people across the river; another had $200
over, and started a bank; while yet another thought he could make a
living by manufacturing ox-yokes, and he stayed. This earnest
investigator continues: A man with an agricultural eye would look at
the soil and kick his toe into it, and then would shake his head and
declare that it would not grow white beans—but he knew not what
this soil would bring forth; his hope and trust was in beans, he
wanted to know them more, and wanted potatoes, corn, oats and
cabbage, and he knew not the future of Euclid Avenue.
On either side of the deep valley of the Flats stretch upon the
plateau the long avenues of Cleveland, with miles of pleasant
residences, surrounded by lawns and gardens, each house isolated in
green, and the whole appearing like a vast rural village more than a
city. This pleasant plan of construction had its origin in the New
England ideas of the people. Yet the city also has a numerous
population of Germans, and it is recorded that one of the early
landowners wrote, in explaining his project of settlement: "If I make
the contract for thirty thousand acres, I expect with all speed to send
you fifteen or twenty families of prancing Dutchmen." These Teutons
came and multiplied, for the original Puritan stock can hardly be
responsible for the vineyards of the neighborhood, the music and
dancing, and the public gardens along the pleasant lake shore,
where the crowds go, when work is over, to enjoy recreation and
watch the gorgeous summer sunsets across the bosom of the lake
which are the glory of Cleveland. Upon the plateau, the centre of the
city, is the Monumental Park, where stand the statue of Moses
Cleaveland, the founder, who died in 1806, and a fine Soldiers'
Monument, with also a statue of Commodore Perry. This Park is an
attractive enclosure of about ten acres, having fountains, gardens,
monuments and a little lake, and it is intersected at right angles by
two broad streets, and surrounded by important buildings. One of
the streets is the chief business highway, Superior Street, and the
other leads down to the edge of the bluff on the lake shore, where
the steep slope is made into a pleasure-ground, with more flower-
beds and fountains and a pleasant outlook over the water, although
at its immediate base is a labyrinth of railroads and an ample supply
of smoke from the numerous locomotives. A long breakwater
protects the harbor entrance, and out under the lake is bored the
water-works tunnel.
There extends far to the eastward, from a corner of the Monumental
Park, Cleveland's famous street—Euclid Avenue. The people regard it
as the handsomest highway in America, in the combined
magnificence of houses and grounds. It is a level avenue of about
one hundred and fifty feet width, with a central roadway and stone
footwalks on either hand, shaded by rows of grand overarching elms,
and bordered on both sides by well-kept lawns. This is the public
highway, every part being kept scrupulously neat, while a light railing
marks the boundary between the street and the private grounds. For
a long distance this noble avenue is bordered by stately residences,
each surrounded by ample gardens, the stretch of grass, flowers and
foliage extending back from one hundred to four hundred feet
between the street and the buildings. Embowered in trees, and with
all the delights of garden and lawn seen in every direction, this grand
avenue makes a delightful driveway and promenade. Upon it live the
multi-millionaires of Cleveland, the finest residences being upon the
northern side, where they have invested part of the profits of their
railways, mills, mines, oil wells and refineries in adorning their homes
and ornamenting their city. This splendid boulevard, in one way, is a
reproduction of the Parisian Avenue of the Champs Elysées and its
gardens, but with more attractions in the surroundings of its
bordering rows of palaces. Here live the men who vie with those of
Chicago in controlling the commerce of the lakes and the affairs of
the Northwest. Plenty of room and an abundance of income are
necessary to provide each man, in the heart of the city, with two to
ten acres of lawns and gardens around his house, but it is done here
with eminent success. About four miles out is the beautiful Wade
Park, opposite which are the handsome buildings of the Western
Reserve University, having, with its adjunct institutions, a thousand
students. Beyond this, the avenue ends at the attractive Lake View
Cemetery, where, on the highest part of the elevated plateau, with a
grand outlook over Lake Erie, is the grave of the assassinated
President Garfield. His imposing memorial rises to a height of one
hundred and sixty-five feet.
CLEVELAND TO CHICAGO.
Thirty-five miles southwest of Cleveland, and some distance inland
from Lake Erie, is Oberlin, where, in a fertile and prosperous district,
is the leading educational foundation of Northern Ohio—Oberlin
College—named in memory of the noted French philanthropist, and
established in 1833 by the descendants of the Puritan colonists, to
carry out their idea of thorough equality in education. It admits
students without distinction of sex or color, and has about thirteen
hundred, almost equally divided between the sexes, occupying a
cluster of commodious buildings. To the westward is the beautiful
ravine of Black River, which gets out to the lake by falling over a
rocky ledge in two streams, and on the peninsula formed by its forks
is the town of Elyria. Maria Ely was the wife of the founder of the
settlement, who named it after her in this peculiar reversible way.
This romantic stream bounds the "Fire Lands" of the Western
Reserve, a tract of nearly eight hundred square miles abutting on the
lake shore, which Connecticut set apart for colonization by her
people, who had been sufferers from destructive fires in the towns of
New London, Fairfield and Norwalk on Long Island Sound. They
secured this wilderness in the early part of the nineteenth century,
and their chief town is Sandusky, with twenty-five thousand
population. Here lived most of the Eries, the Indian tribe of "the
Cat," who fished in Sandusky Bay, its upper waters being an
archipelago of little green islands abounding with water fowl. They
were known to the adjoining tribes as the "Neutral Nation," for they
maintained two villages of refuge on Sandusky River, between the
warlike Indians of the east and the west, and whoever entered their
boundaries was safe from pursuit, the sanctuary being rigidly
observed. The early French missionaries who found them in the
seventeenth century speak of these anomalous villages among the
savages as having then been long in existence.
The name of Sandusky is a corruption of a Wyandot word meaning
"cold-water pools," the French having originally rendered it as
"Sandosquet." The shores are low, but there is a good harbor and
much trade, and here is located the Ohio State Fish Hatchery. The
railroads are laid among the savannahs and lagoons, and one of the
suburban stations has been not inaptly named "Venice." There are
extensive vineyards on the flat and sunny shores of the bay, and this
is one of the most prolific grape districts in the State. Sandusky Bay
is a broad sheet of water, in places six miles wide, and about twenty
miles long. Sandusky has a large timber trade, being noted for the
manufacture of hard woods. Out beyond the bold peninsula,
protruding into the lake at the entrance to the bay, is a group of
islands spreading over the southwestern waters of Lake Erie, of
which Kelly's Island is the chief, an archipelago formed largely from
the detritus washed out of the Detroit, Maumee and various other
rivers flowing into the head of the lake. Here the Erie Indians had a
fortified stronghold, whose outlines can still be traced. The most
noted of the group is Put-in-Bay Island, now a popular watering-
place, which got its name from Commodore Perry, who put in there
with the captured British fleet at the naval battle of Lake Erie,
September 10, 1813. It was from this place, just after his victory,
that he sent the historic despatch, giving him fame, "We have met
the enemy and they are ours." The killed of both fleets were buried
side by side near the beach on the island, the place being marked by
a mound. The lovely sheet of water of Put-in-Bay glistens in front,
having the towns of villa-crowned Gibraltar Island upon its surface.
Vineyards and roses abound, these islands, like the adjacent shores,
being noted for their wines.
The Maumee River, coming up from Fort Wayne, flows into the head
of Lake Erie, the largest stream on its southern coast. It comes from
the southwest through the region of the "Black Swamp," a vast
district, originally morass and forest, which has been drained to
make a most fertile country. This "miserable bog," as the original
settlers denounced it, when they were jolted over the rude corduroy
roads that sustained them upon the quaking morass, has since
become the "prolific garden and magnificent forest" described by
the modern tourist. The Maumee Valley was an almost continual
battle-ground with the Indians when "Mad Anthony" Wayne
commanded on that frontier, he being called by them "the Wind,
because he drives and tears everything before him." For a quarter
of a century border warfare raged along this river, then known as the
"Miami of the Lakes," and its chief settlement, Toledo, passed its
infancy in a baptism of blood and fire. It was at the battle of Fallen
Timbers, fought in 1794, almost on the site of Toledo, that Wayne
gave his laconic and noted field orders. General William Henry
Harrison, then his aide, told Wayne just before the battle he was
afraid he would get into the fight and forget to give the necessary
field orders. Wayne replied: "Perhaps I may, and if I do, recollect
that the standing order for the day is, charge the rascals with the
bayonets." Toledo is built on the flat surface on both sides of the
Maumee River and Bay, which make it a good harbor, stretching six
miles down to Lake Erie. There are a hundred thousand population
here, and this energetic reproduction of the ancient Spanish city has
named its chief newspaper the "Toledo Blade." The city has extensive
railway connections and a large trade in lumber and grain, coal and
ores, and does much manufacturing, it being well served with
natural gas. A dozen grain elevators line the river banks, and the
factory smokes overhang the broad low-lying city like a pall. To the
westward, crossing the rich lands of the reclaimed swamp, is the
Indiana boundary, that State being here a broad and level prairie,

Sample Size Calculations For Clustered And Longitudinal Outcomes In Clinical Research Chul Ahn

  • 5. Accurate sample size calculation ensures that clinical studies have adequate power to detect clinically meaningful effects. This results in the efficient use of resources and avoids exposing a disproportionate number of patients to experimental treatments caused by an overpowered study. Sample Size Calculations for Clustered and Longitudinal Outcomes in Clinical Research explains how to determine sample size for studies with correlated outcomes, which are widely implemented in medical, epidemiological, and behavioral studies. The book focuses on issues specific to the two types of correlated outcomes: longitudinal and clustered. For clustered studies, the authors provide sample size formulas that accommodate variable cluster sizes and within-cluster correlation. For longitudinal studies, they present sample size formulas to account for within-subject correlation among repeated measurements and various missing data patterns. For multiple levels of clustering, the level at which to perform randomization actually becomes a design parameter. The authors show how this can greatly impact trial administration, analysis, and sample size requirement. Addressing the overarching theme of sample size determination for correlated outcomes, this book provides a useful resource for biostatisticians, clinical investigators, epidemiologists, and social scientists whose research involves trials with correlated outcomes. Each chapter is self-contained so readers can explore topics relevant to their research projects without having to refer to other chapters. Chul Ahn, Moonseong Heo, Song Zhang. www.crcpress.com
  • 6. Sample Size Calculations for Clustered and Longitudinal Outcomes in Clinical Research
  • 7. Editor-in-Chief Shein-Chung Chow, Ph.D., Professor, Department of Biostatistics and Bioinformatics, Duke University School of Medicine, Durham, North Carolina Series Editors Byron Jones, Biometrical Fellow, Statistical Methodology, Integrated Information Sciences, Novartis Pharma AG, Basel, Switzerland Jen-pei Liu, Professor, Division of Biometry, Department of Agronomy, National Taiwan University, Taipei, Taiwan Karl E. Peace, Georgia Cancer Coalition, Distinguished Cancer Scholar, Senior Research Scientist and Professor of Biostatistics, Jiann-Ping Hsu College of Public Health, Georgia Southern University, Statesboro, Georgia Bruce W. Turnbull, Professor, School of Operations Research and Industrial Engineering, Cornell University, Ithaca, New York Published Titles Adaptive Design Methods in Clinical Trials, Second Edition Shein-Chung Chow and Mark Chang Adaptive Design Theory and Implementation Using SAS and R, Second Edition Mark Chang Advanced Bayesian Methods for Medical Test Accuracy Lyle D. Broemeling Advances in Clinical Trial Biostatistics Nancy L. Geller Applied Meta-Analysis with R Ding-Geng (Din) Chen and Karl E. Peace Basic Statistics and Pharmaceutical Statistical Applications, Second Edition James E. De Muth Bayesian Adaptive Methods for Clinical Trials Scott M. Berry, Bradley P. Carlin, J. Jack Lee, and Peter Muller Bayesian Analysis Made Simple: An Excel GUI for WinBUGS Phil Woodward Bayesian Methods for Measures of Agreement Lyle D. Broemeling Bayesian Methods in Epidemiology Lyle D. Broemeling Bayesian Methods in Health Economics Gianluca Baio Bayesian Missing Data Problems: EM, Data Augmentation and Noniterative Computation Ming T. Tan, Guo-Liang Tian, and Kai Wang Ng Bayesian Modeling in Bioinformatics Dipak K. Dey, Samiran Ghosh, and Bani K. 
Mallick Benefit-Risk Assessment in Pharmaceutical Research and Development Andreas Sashegyi, James Felli, and Rebecca Noel Biosimilars: Design and Analysis of Follow-on Biologics Shein-Chung Chow Biostatistics: A Computing Approach Stewart J. Anderson Causal Analysis in Biomedicine and Epidemiology: Based on Minimal Sufficient Causation Mikel Aickin Clinical and Statistical Considerations in Personalized Medicine Claudio Carini, Sandeep Menon, and Mark Chang Clinical Trial Data Analysis using R Ding-Geng (Din) Chen and Karl E. Peace
  • 8. Clinical Trial Methodology Karl E. Peace and Ding-Geng (Din) Chen Computational Methods in Biomedical Research Ravindra Khattree and Dayanand N. Naik Computational Pharmacokinetics Anders Källén Confidence Intervals for Proportions and Related Measures of Effect Size Robert G. Newcombe Controversial Statistical Issues in Clinical Trials Shein-Chung Chow Data and Safety Monitoring Committees in Clinical Trials Jay Herson Design and Analysis of Animal Studies in Pharmaceutical Development Shein-Chung Chow and Jen-pei Liu Design and Analysis of Bioavailability and Bioequivalence Studies, Third Edition Shein-Chung Chow and Jen-pei Liu Design and Analysis of Bridging Studies Jen-pei Liu, Shein-Chung Chow, and Chin-Fu Hsiao Design and Analysis of Clinical Trials with Time-to-Event Endpoints Karl E. Peace Design and Analysis of Non-Inferiority Trials Mark D. Rothmann, Brian L. Wiens, and Ivan S. F. Chan Difference Equations with Public Health Applications Lemuel A. Moyé and Asha Seth Kapadia DNA Methylation Microarrays: Experimental Design and Statistical Analysis Sun-Chong Wang and Arturas Petronis DNA Microarrays and Related Genomics Techniques: Design, Analysis, and Interpretation of Experiments David B. Allison, Grier P. Page, T. Mark Beasley, and Jode W. Edwards Dose Finding by the Continual Reassessment Method Ying Kuen Cheung Elementary Bayesian Biostatistics Lemuel A. Moyé Frailty Models in Survival Analysis Andreas Wienke Generalized Linear Models: A Bayesian Perspective Dipak K. Dey, Sujit K. Ghosh, and Bani K. Mallick Handbook of Regression and Modeling: Applications for the Clinical and Pharmaceutical Industries Daryl S. Paulson Inference Principles for Biostatisticians Ian C. Marschner Interval-Censored Time-to-Event Data: Methods and Applications Ding-Geng (Din) Chen, Jianguo Sun, and Karl E. 
Peace Joint Models for Longitudinal and Time-to-Event Data: With Applications in R Dimitris Rizopoulos Measures of Interobserver Agreement and Reliability, Second Edition Mohamed M. Shoukri Medical Biostatistics, Third Edition A. Indrayan Meta-Analysis in Medicine and Health Policy Dalene Stangl and Donald A. Berry Mixed Effects Models for the Population Approach: Models, Tasks, Methods and Tools Marc Lavielle Monte Carlo Simulation for the Pharmaceutical Industry: Concepts, Algorithms, and Case Studies Mark Chang Multiple Testing Problems in Pharmaceutical Statistics Alex Dmitrienko, Ajit C. Tamhane, and Frank Bretz
  • 9. Noninferiority Testing in Clinical Trials: Issues and Challenges Tie-Hua Ng Optimal Design for Nonlinear Response Models Valerii V. Fedorov and Sergei L. Leonov Patient-Reported Outcomes: Measurement, Implementation and Interpretation Joseph C. Cappelleri, Kelly H. Zou, Andrew G. Bushmakin, Jose Ma. J. Alvir, Demissie Alemayehu, and Tara Symonds Quantitative Evaluation of Safety in Drug Development: Design, Analysis and Reporting Qi Jiang and H. Amy Xia Randomized Clinical Trials of Nonpharmacological Treatments Isabelle Boutron, Philippe Ravaud, and David Moher Randomized Phase II Cancer Clinical Trials Sin-Ho Jung Sample Size Calculations for Clustered and Longitudinal Outcomes in Clinical Research Chul Ahn, Moonseong Heo, and Song Zhang Sample Size Calculations in Clinical Research, Second Edition Shein-Chung Chow, Jun Shao and Hansheng Wang Statistical Analysis of Human Growth and Development Yin Bun Cheung Statistical Design and Analysis of Stability Studies Shein-Chung Chow Statistical Evaluation of Diagnostic Performance: Topics in ROC Analysis Kelly H. Zou, Aiyi Liu, Andriy Bandos, Lucila Ohno-Machado, and Howard Rockette Statistical Methods for Clinical Trials Mark X. Norleans Statistical Methods in Drug Combination Studies Wei Zhao and Harry Yang Statistics in Drug Research: Methodologies and Recent Developments Shein-Chung Chow and Jun Shao Statistics in the Pharmaceutical Industry, Third Edition Ralph Buncher and Jia-Yeong Tsay Survival Analysis in Medicine and Genetics Jialiang Li and Shuangge Ma Theory of Drug Development Eric B. Holmgren Translational Medicine: Strategies and Statistical Methods Dennis Cosmatos and Shein-Chung Chow
  • 10. Chul Ahn University of Texas Southwestern Medical Center Dallas, Texas, USA Moonseong Heo Albert Einstein College of Medicine Bronx, New York, USA Song Zhang University of Texas Southwestern Medical Center Dallas, Texas, USA Sample Size Calculations for Clustered and Longitudinal Outcomes in Clinical Research
  • 11. CRC Press
Taylor & Francis Group
6000 Broken Sound Parkway NW, Suite 300
Boca Raton, FL 33487-2742

© 2015 by Taylor & Francis Group, LLC
CRC Press is an imprint of Taylor & Francis Group, an Informa business

No claim to original U.S. Government works

Version Date: 20141029
International Standard Book Number-13: 978-1-4665-5627-0 (eBook - PDF)

This book contains information obtained from authentic and highly regarded sources. Reasonable efforts have been made to publish reliable data and information, but the author and publisher cannot assume responsibility for the validity of all materials or the consequences of their use. The authors and publishers have attempted to trace the copyright holders of all material reproduced in this publication and apologize to copyright holders if permission to publish in this form has not been obtained. If any copyright material has not been acknowledged please write and let us know so we may rectify in any future reprint.

Except as permitted under U.S. Copyright Law, no part of this book may be reprinted, reproduced, transmitted, or utilized in any form by any electronic, mechanical, or other means, now known or hereafter invented, including photocopying, microfilming, and recording, or in any information storage or retrieval system, without written permission from the publishers.

For permission to photocopy or use material electronically from this work, please access www.copyright.com (http://www.copyright.com/) or contact the Copyright Clearance Center, Inc. (CCC), 222 Rosewood Drive, Danvers, MA 01923, 978-750-8400. CCC is a not-for-profit organization that provides licenses and registration for a variety of users. For organizations that have been granted a photocopy license by the CCC, a separate system of payment has been arranged.

Trademark Notice: Product or corporate names may be trademarks or registered trademarks, and are used only for identification and explanation without intent to infringe.

Visit the Taylor & Francis Web site at http://www.taylorandfrancis.com and the CRC Press Web site at http://www.crcpress.com
  • 12. Contents

Preface
List of Figures
List of Tables

1 Sample Size Determination for Independent Outcomes
    1.1 Introduction
    1.2 Precision Analysis
    1.3 Power Analysis
    1.4 Further Readings

2 Sample Size Determination for Clustered Outcomes
    2.1 Introduction
    2.2 One–Sample Clustered Continuous Outcomes
    2.3 One–Sample Clustered Binary Outcomes
    2.4 Two–Sample Clustered Continuous Outcomes
    2.5 Two–Sample Clustered Binary Outcomes
    2.6 Stratified Cluster Randomization for Binary Outcomes
    2.7 Nonparametric Approach for One–Sample Clustered Binary Outcomes
    2.8 Further Readings

3 Sample Size Determination for Repeated Measurement Outcomes Using Summary Statistics
    3.1 Introduction
    3.2 Information Needed for Sample Size Estimation
    3.3 Summary Statistics
    3.4 Further Readings

4 Sample Size Determination for Correlated Outcome Measurements Using GEE
    4.1 Motivation
    4.2 Review of GEE
    4.3 Compare the Slope for a Continuous Outcome
    4.4 Test the TAD for a Continuous Outcome
    4.5 Compare the Slope for a Binary Outcome
    4.6 Test the TAD for a Binary Outcome
    4.7 Compare the Slope for a Count Outcome
    4.8 Test the TAD for a Count Outcome
    4.9 Further Readings

5 Sample Size Determination for Correlated Outcomes from Two-Level Randomized Clinical Trials
    5.1 Introduction
    5.2 Statistical Models for Continuous Outcomes
    5.3 Testing Main Effects
    5.4 Two-Level Longitudinal Designs: Testing Slope Differences
    5.5 Cross-Sectional Factorial Designs: Interactions between Treatments
    5.6 Longitudinal Factorial Designs: Treatment Effects on Slopes
    5.7 Sample Sizes for Binary Outcomes
    5.8 Further Readings

6 Sample Size Determination for Correlated Outcomes from Three-Level Randomized Clinical Trials
    6.1 Introduction
    6.2 Statistical Model for Continuous Outcomes
    6.3 Testing Main Effects
    6.4 Testing Slope Differences
    6.5 Cross-Sectional Factorial Designs: Interactions between Treatments
    6.6 Longitudinal Factorial Designs: Treatment Effects on Slopes
    6.7 Sample Sizes for Binary Outcomes
    6.8 Further Readings

Index
  • 14. Preface

One of the most common questions statisticians encounter during interaction with clinical investigators is “How many subjects do I need for this study?” Clinicians are often surprised to find out that the required sample size depends on a number of factors. Obtaining such information for sample size calculation is not trivial, and often involves preliminary studies, literature review, and, more than occasionally, educated guesses. The validity of clinical research is judged not by the results but by how it is designed and conducted. Accurate sample size calculation ensures that a study has adequate power to detect clinically meaningful effects and avoids the waste of resources and the risk of exposing an excessive number of patients to experimental treatments caused by an overpowered study.

In this book we focus on sample size determination for studies with correlated outcomes, which are widely implemented in medical, epidemiological, and behavioral studies. Correlated outcomes are usually categorized into two types: clustered and longitudinal. The former arises from trials where randomization is performed at the level of some aggregates (e.g., clinics) of research subjects (e.g., patients). The latter arises when the outcome is measured at multiple time points during follow-up from each subject. A key difference between these two types is that for a clustered design, subjects within a cluster are considered exchangeable, while for a longitudinal design, the multiple measurements from each subject are distinguished by their unique time stamps.

Designing a randomized trial with correlated outcomes poses special challenges and opportunities for researchers. Appropriately accounting for the correlation with different structures requires more sophisticated methodologies for analysis and sample size calculation. In practice it is also likely that researchers will encounter correlated outcomes with a hierarchical structure. For example, multiple levels of nested clustering (e.g., patients nested in clinics and clinics nested in hospital systems) can occur, and such designs can become more complicated if longitudinal measurements are obtained from each subject. Missing data leads to the challenge of “partially” observed data for clinical trials with correlated outcomes, and its impact on the sample size requirement depends on many factors: the number of longitudinal measurements, the structure and strength of correlation, and the distribution of missing data.

On the other hand, researchers enjoy some additional flexibility in designing randomized trials with correlated outcomes. When multiple levels of clustering are involved, the level at which to perform randomization actually becomes a design parameter, which can greatly impact trial administration, analysis, and sample size requirement. This issue is explored in Chapters 5 and 6. Another example is that in longitudinal studies, to a certain extent, researchers can compensate for a lack of unique subjects by increasing the number of measurements from each subject, and vice versa. This feature has profound implications for the design of clinical trials where the cost of recruiting an additional subject is drastically different from the cost of obtaining an additional measurement from an existing subject. It requires researchers to explore the trade-off between the number of subjects and the number of measurements per subject in order to achieve the optimal power under a given financial constraint. We explore this topic in Chapters 3 and 4.

The outline of this book is as follows. In Chapter 1 we review sample size determination for independent outcomes. Advanced readers who are already familiar with sample size problems can skip this chapter. In Chapter 2 we explore sample size determination for variants of clustered trials, including one- and two-sample trials, continuous and binary outcomes, stratified cluster design, and nonparametric approaches. In Chapter 3 we review sample size methods based on summary statistics (such as individually estimated means or slopes) obtained from longitudinal outcomes. In Chapter 4 we present sample size determination based on GEE approaches for various types of correlated outcomes, including continuous, binary, and count. The impact of missing data, correlation structures, and financial constraints is investigated. In Chapter 5 we present sample size determination based on mixed-effects model approaches for randomized clinical trials with two-level data structures. Longitudinal and cross-sectional factorial designs are explored. In Chapter 6 we further extend the mixed-effects model sample size approaches to scenarios where three-level data structures are involved in randomized trials.

We wish this book to serve as a useful resource for biostatisticians, clinical investigators, epidemiologists, and social scientists whose research involves randomized trials with correlated outcomes. While jointly addressing the overarching theme of sample size determination for correlated outcomes under such settings, individual chapters are written in a self-contained manner so that readers can explore specific topics relevant to their research projects without having to refer to other chapters.

We give special thanks to Dr. Mimi Y. Kim for her enthusiastic support in providing critical reviews and suggestions, examples, edits, and corrections throughout the chapters. Without her input, this book would not have been in its present form. We also thank Acquisitions Editor David Grubbs for providing the opportunity to work on this book, and Production Manager Suzanne Lassandro for her outstanding support in publishing this book. In addition, we are grateful for the support of the University of Texas Southwestern Medical Center and the Albert Einstein College of Medicine.

Chul Ahn, PhD
Moonseong Heo, PhD
Song Zhang, PhD
  • 16. List of Figures

1.1 Sample size estimation for a one–sided test in a one–sample problem

4.1 Numerical study to explore the relationship between $s_t^2$ and $\rho$, under the scenario of complete data and various values of $\theta$ from the damped exponential family. $\theta = 1$ corresponds to AR(1) and $\theta = 0$ corresponds to CS. The measurement times are normalized such that $t_m - t_1 = 1$. Hence $\rho_{1m} = \rho$ under all values of $\theta$.

4.2 Numerical study to explore the relationship between $s_t^2$ and $\rho$, under the scenario of incomplete data and various values of $\theta$ from the damped exponential family. $\theta = 1$ corresponds to AR(1) and $\theta = 0$ corresponds to CS. IM and MM represent the independent and monotone missing pattern, respectively. The measurement times are normalized such that $t_m - t_1 = 1$. Hence $\rho_{1m} = \rho$ under all values of $\theta$.

4.3 A numerical study to explore the ratio n{m+1}/n{m} under missing data and different correlation structures. The vertical axis is n{m+1}/n{m}. “Complete” indicates the scenario of complete data. “IM” and “MM” indicate the independent and monotone missing patterns, respectively, with marginal observant probabilities computed by $\delta_j = 1 - 0.3 \times (j - 1)/(m - 1)$.

4.4 Different trends in the marginal observant probabilities. $\delta_1$ approximately follows a linear trend. $\delta_2$ is relatively steady initially but drops quickly afterward. $\delta_3$ drops quickly from the beginning but plateaus.

5.1 Geometrical representations of fixed parameters in model (5.12) for a parallel-arm longitudinal cluster randomized trial.

5.2 Geometrical representations of fixed parameters in model (5.31) for a 2-by-2 factorial longitudinal cluster randomized trial.
  • 18. List of Tables

2.1 Proportion of infection (yi/mi) from n = 29 subjects (clusters)
2.2 Distribution of the number of infected sites (mi)
2.3 Stepped wedge design, where C represents control and I represents intervention

4.1 Sample sizes under various scenarios

5.1 Sample size and power for detecting a main effect δ(2) in model (5.3) when randomizations occur at the second level (two-sided significance level α = 0.05)
5.2 Sample size and power for detecting a main effect δ(1) in model (5.8) when randomizations occur at the first level (two-sided significance level α = 0.05)
5.3 Sample size and power for detecting an effect δ(f) on slope differences in a fixed-slope model (5.12) with rτ = 0 when randomizations occur at the second level (two-sided significance level α = 0.05)
5.4 Sample size and power for detecting an effect δ(f) on slope differences in a random-slope model (5.4.5) with rτ = 0.1 when randomizations occur at the second level (two-sided significance level α = 0.05)
5.5 Sample size and power for detecting a main effect δ(e) at the end of study in a fixed-slope model (5.22) when randomizations occur at the second level (two-sided significance level α = 0.05)
5.6 Sample size and power for detecting a two-way interaction XZ effect δXZ(2) in model (5.25) for a 2-by-2 factorial design when randomizations occur at the second level (two-sided significance level α = 0.05)
5.7 Sample size and power for detecting a two-way interaction XZ effect δXZ(1) in model (5.28) for a 2-by-2 factorial design when randomizations occur at the first level (two-sided significance level α = 0.05)
5.8 Sample size and statistical power for detecting a three-way interaction XZT effect δXZT in model (5.31) for a 2-by-2 factorial design when randomizations occur at the second level (two-sided significance level α = 0.05)
5.9 Sample size and statistical power for detecting a main effect |p1 − p0| on binary outcome in model with m = 2 (5.34) when randomizations occur at the second level (two-sided significance level α = 0.05)
5.10 Sample size and statistical power for detecting a main effect |p1 − p0| on binary outcome in model with m = 1 (5.34) when randomizations occur at the first level (two-sided significance level α = 0.05)

6.1 Sample size and power for detecting a main effect δ(3) in model (6.4) when randomizations occur at the third level with ρ2 = 0.05 (two-sided significance level α = 0.05)
6.2 Sample size and power for detecting a main effect δ(2) in model (6.9) when randomizations occur at the second level with ρ2 = 0.05 (two-sided significance level α = 0.05)
6.3 Sample size and power for detecting a main effect δ(1) in model (6.13) when randomizations occur at the first level with ρ2 = 0.05 (two-sided significance level α = 0.05)
6.4 Sample size and power for detecting an effect δ(f) on slope differences in a three-level fixed-slope model (6.17) with rτ = 0 when randomizations occur at the third level (two-sided significance level α = 0.05)
6.5 Sample size and power for detecting an effect δ(r) on slope differences in a three-level random-slope model (6.22) with rτ = 0.1 when randomizations occur at the third level (two-sided significance level α = 0.05)
6.6 Sample size and power for detecting a main effect δ(e) at the end of study in a three-level fixed-slope model (6.28) when randomizations occur at the third level (two-sided significance level α = 0.05)
6.7 Sample size and power for detecting a two-way interaction XZ effect δXZ(3) in model with m = 3 (6.31) for a 2-by-2 factorial design when randomizations occur at the third level with ρ2 = 0.05 (two-sided significance level α = 0.05)
6.8 Sample size and power for detecting a two-way interaction XZ effect δXZ(2) in model with m = 2 (6.31) for a 2-by-2 factorial design when randomizations occur at the second level with ρ2 = 0.05 (two-sided significance level α = 0.05)
6.9 Sample size and power for detecting a two-way interaction XZ effect δXZ(1) in model with m = 1 (6.31) for a 2-by-2 factorial design when randomizations occur at the first level with ρ2 = 0.05 (two-sided significance level α = 0.05)
6.10 Sample size and power for detecting a three-way interaction XZT effect δXZT in model (6.38) for a 2-by-2 factorial design when randomizations occur at the third level (two-sided significance level α = 0.05)
6.11 Sample size and statistical power for detecting a main effect |p1 − p0| on binary outcome in model with m = 3 (6.41) when randomizations occur at third level (two-sided significance level α = 0.05)
6.12 Sample size and statistical power for detecting a main effect |p1 − p0| on binary outcome in model with m = 2 (6.41) when randomizations occur at second level (two-sided significance level α = 0.05)
6.13 Sample size and statistical power for detecting a main effect |p1 − p0| on binary outcome in model with m = 1 (6.41) when randomizations occur at first level (two-sided significance level α = 0.05)
  • 22. 1 Sample Size Determination for Independent Outcomes

1.1 Introduction

One of the most common questions statisticians are asked by clinical investigators is “How many subjects do I need?” Researchers are often surprised to find out that the required sample size depends on a number of factors and that they have to provide information to a statistician before they can get an answer. Clinical research is judged to be valid not by the results but by how it is designed and conducted. The cliché “do it right or do it over” is particularly apt in clinical research.

One of the most important aspects of clinical research design is sample size estimation. In planning a clinical trial, it is necessary to determine the number of subjects to be recruited in order to achieve sufficient power to detect the hypothesized effect. The ICH E9 guidance [1] states: “The number of subjects in a clinical trial should always be large enough to provide a reliable answer to the questions addressed. This number is usually determined by the primary objective of the trial. If the sample size is determined on some other basis, then this should be made clear and justified. For example, a trial sized on the basis of safety questions or requirements or important secondary objectives may need larger or smaller numbers of subjects than a trial sized on the basis of the primary efficacy question.”

Sample size in clinical trials must be carefully estimated if the results are to be credible. If the number of subjects is too small, even a well–conducted trial will have little chance of detecting the hypothesized effect. Ideally, the sample size should be large enough to have a high probability of detecting a clinically important difference between treatment groups and to show it to be statistically significant if such a difference really exists. If the number of subjects is too large, the clinical trial may yield statistical significance for an effect of little clinical importance. Conversely, if the number of subjects is too small, the clinical trial may fail to reach statistical significance even when the difference is large and clinically important.

When designing a study, an investigator should consider constraints such as time, cost, and the number of available subjects. However, these constraints should not dictate the sample size. There is no reason to
carry out a study that is too small, only to come up with results that are inconclusive, since the investigator will then need to carry out another study to confirm or refute the initial results.

Selecting an appropriate sample size is a crucial step in the design of a study. A study with an insufficient sample size may not have sufficient statistical power to detect meaningful effects and may not produce reliable answers to important research questions. Krzywinski and Altman [2] note that the ability to detect an experimental effect is weakened in studies that do not have sufficient power. Choosing the appropriate sample size increases the chance of detecting a clinically meaningful effect and ensures that the study is both ethical and cost-effective.

Sample size is usually estimated by precision analysis or power analysis. In precision analysis, sample size is determined by the standard error or the margin of error at a fixed significance level. Precision analysis is a simple way to estimate the sample size [3]. In power analysis, sample size is estimated to achieve a desired power for detecting a clinically or scientifically meaningful difference at a fixed type I error rate. Power analysis is the most commonly used method for sample size estimation in clinical research. The sample size calculation requires assumptions that typically cannot be tested until the data have been collected from the trial. Sample size calculations are thus inherently hypothetical.

1.2 Precision Analysis

Sample size estimation is needed for studies in which the goal is to estimate an unknown parameter with a certain degree of precision. Thus, some key questions in planning a study are “How precise will the parameter estimate be if I select a particular sample size?” and “How large a sample size do I need to attain a desirable level of precision?” What we are essentially saying is that we want the confidence interval to be of a certain width, in which the 100(1 − α)% confidence level reflects the probability of including the true (but unknown) value of the parameter. Since the precision is determined by the width of the confidence interval, the goal of precision analysis is to determine the sample size that allows the confidence interval to be within a pre-specified width. The narrower the confidence interval is, the more precise the parameter inference is.

Confidence interval estimation provides a convenient alternative to significance testing in most situations. The confidence interval approach is equivalent to the method of hypothesis testing. That is, if the confidence interval does not include the parameter value under the null hypothesis, the null hypothesis is rejected at a two–sided significance level of α. For example, consider the hypothesis of no difference between means (µ1 and µ2). The method of hypothesis testing rejects the hypothesis H0 : µ1 − µ2 = 0 at the two–sided significance level of α if and only if the 100(1 − α)% confidence
interval for the mean difference (µ1 − µ2) does not include the value zero. Thus, the significance test can be performed with the confidence interval approach.

1.2.1 Continuous Outcomes

Suppose that $Y_1, \ldots, Y_n$ are independent and identically distributed normal random variables with mean $\mu$ and variance $\sigma^2$. The parameter $\mu$ can be estimated by the sample mean $\bar{y} = \sum_{i=1}^{n} Y_i / n$. When $\sigma^2$ is known, the $100(1-\alpha)\%$ confidence interval is

$$\bar{y} \pm z_{1-\alpha/2} \frac{\sigma}{\sqrt{n}},$$

where $z_{1-\alpha/2}$ is the $100(1-\alpha/2)$th percentile of the standard normal distribution. Note that the sample size estimate based on precision analysis depends on the type I error rate, not on the type II error rate. The maximum half width of the confidence interval is called the maximum error of an estimate of the unknown parameter. Suppose that the maximum error of $\mu$ is $\delta$. Then, the required minimum sample size is obtained by solving the following equation for $n$:

$$z_{1-\alpha/2} \frac{\sigma}{\sqrt{n}} = \delta.$$

Thus, the required sample size is the smallest integer that is greater than or equal to

$$n = \frac{z_{1-\alpha/2}^{2} \sigma^{2}}{\delta^{2}}. \tag{1.1}$$

From Equation (1.1), we can obtain the required sample size once the maximum error or the width of the $100(1-\alpha)\%$ confidence interval of $\mu$ is specified.

1.2.1.1 Example

Suppose that a clinical investigator is interested in estimating, with a given degree of precision, how much the fasting serum–cholesterol level is reduced by administration of a new cholesterol–lowering drug for 6 months among recent Hispanic immigrants. Suppose that the standard deviation ($\sigma$) for reduction in cholesterol level equals 40 mg/dl. We would like to estimate the minimum sample size needed to estimate the reduction in fasting serum–cholesterol level if we require that the 95% confidence interval for reduction in cholesterol level is no wider than 20 mg/dl.
The $100(1-\alpha)\%$ confidence interval for the true reduction in fasting serum–cholesterol level is

$$\bar{y} \pm z_{1-\alpha/2} \frac{\sigma}{\sqrt{n}},$$

where $\bar{y}$ is the mean change in fasting serum–cholesterol level after administration of the drug, and $z_{1-\alpha/2}$ is the $100(1-\alpha/2)$th percentile of the standard normal distribution. The width of a 95% confidence interval is

$$2 \cdot z_{1-\alpha/2} \frac{\sigma}{\sqrt{n}} = 2 \cdot 1.96 \cdot \frac{40}{\sqrt{n}}.$$

We want the width of the 95% confidence interval to be no wider than 20 mg/dl. The required sample size is the smallest integer satisfying

$$n \ge \frac{4 \cdot (1.96)^2 (40)^2}{(20)^2} = 61.5.$$

In order for a 95% confidence interval of reduction in cholesterol level to be no wider than 20 mg/dl, we need at least 62 subjects when the standard deviation for reduction in cholesterol level equals 40 mg/dl.

1.2.2 Binary Outcomes

The study goal may be based on finding a suitably narrow confidence interval for the statistic of interest at a given significance level ($\alpha$), where the significance level is usually considered as the maximum probability of type I error that can be tolerated. We may want to know how many subjects are required for the $100(1-\alpha)\%$ confidence interval to be a certain width.

Suppose that $Y_1, \ldots, Y_n$ are independent and identically distributed Bernoulli random variables with mean $p = E(Y_i)$, $(i = 1, \ldots, n)$. The parameter $p$ can be estimated by the sample mean $\hat{p} = \sum_{i=1}^{n} Y_i / n$. For large $n$, $\hat{p}$ is approximately normal with mean $p$ and variance $p(1-p)/n$. The $100(1-\alpha)\%$ confidence interval for $p$ is

$$\hat{p} \pm z_{1-\alpha/2} \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}.$$

Suppose that the maximum error of $p$ is $\delta$. Then, the sample size can be estimated from

$$z_{1-\alpha/2} \sqrt{\frac{\hat{p}(1-\hat{p})}{n}} = \delta.$$

Thus, the required sample size is

$$n = \frac{z_{1-\alpha/2}^{2}\, \hat{p}(1-\hat{p})}{\delta^{2}}. \tag{1.2}$$

We can estimate the sample size from Equation (1.2) once the maximum error or the width of the $100(1-\alpha)\%$ confidence interval for $p$ is specified. There are a number of alternative ways to estimate the confidence interval for a binomial proportion [4].

1.2.2.1 Example

Suppose that a clinical investigator is interested in conducting a clinical trial with a new cancer drug to estimate the response rate with a maximum error of 20%.
In oncology, the response rate (RR) is generally defined as the
proportion of patients whose tumor completely disappears (termed a complete response, CR) or shrinks by more than 50% after treatment (termed a partial response, PR). In simpler terms, RR = PR + CR. An investigator expects the response rate of a new cancer drug to be 30%. How many patients are needed to achieve a maximum error of 20%? Let $\hat{p}$ be the estimate of the response rate. The maximum error of the response rate is $z_{1-\alpha/2}\sqrt{\hat{p}(1-\hat{p})/n}$. With the guessed value of $\hat{p} = 0.3$, the maximum error of $p$ is $z_{1-\alpha/2}\sqrt{0.3 \cdot 0.7/n}$. Thus, at a 5% significance level we need $1.96\sqrt{0.3 \cdot 0.7/n} \le 0.2$, which gives $n \ge 20.2$. That is, we need at least 21 subjects to obtain a maximum error $\le 20\%$.

When we do not know the value of $p$, a conservative approach is to use the value of $\hat{p}$ that yields the largest maximum error, which occurs at $\hat{p} = 0.5$. So, a conservative maximum error of $p$ is $z_{1-\alpha/2}\sqrt{0.5 \cdot 0.5/n} = z_{1-\alpha/2} \cdot 0.5/\sqrt{n}$. Thus, we need $1.96 \cdot 0.5/\sqrt{n} \le 0.2$ at a 5% significance level. Therefore, the required sample size is $n = 25$. An investigator should recruit at least 25 subjects to achieve a maximum error of 20% in the response rate estimation.

The larger the sample size, the more precise the estimate of the parameter will be if all the other factors are equal. An investigator should specify what degree of precision the study aims for. A trial costs more and takes longer as its size increases. In order to estimate the sample size using precision analysis, we need to decide how large the maximum error of the unknown parameter is or how wide the confidence interval for the unknown parameter is, and we need to know the formula for the relevant maximum error.

1.3 Power Analysis

Power analysis uses two types of errors (type I and II errors) for sample size estimation while precision analysis uses only one type of error (type I error) for sample size estimation.
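Before the power-analysis details, the precision-analysis calculations of Section 1.2 can be checked numerically. The following is a minimal sketch, not the authors' code: the function names `n_for_mean` and `n_for_proportion` are hypothetical, and the normal quantile comes from Python's standard library rather than the rounded value 1.96 used in the text.

```python
import math
from statistics import NormalDist


def z_quantile(prob):
    """Return the standard normal quantile z_prob."""
    return NormalDist().inv_cdf(prob)


def n_for_mean(sigma, max_error, alpha=0.05):
    """Equation (1.1): smallest n with z_{1-alpha/2} * sigma / sqrt(n) <= max_error."""
    z = z_quantile(1 - alpha / 2)
    return math.ceil((z * sigma / max_error) ** 2)


def n_for_proportion(p_guess, max_error, alpha=0.05):
    """Equation (1.2), with a guessed proportion p_guess in place of p-hat."""
    z = z_quantile(1 - alpha / 2)
    return math.ceil(z ** 2 * p_guess * (1 - p_guess) / max_error ** 2)


# Cholesterol example: a 95% interval no wider than 20 mg/dl corresponds to
# a maximum error (half width) of 10 mg/dl.
print(n_for_mean(sigma=40, max_error=10))
# Response-rate example: guessed rate 0.3, then the conservative choice 0.5.
print(n_for_proportion(0.3, 0.2))
print(n_for_proportion(0.5, 0.2))
```

Under these assumptions the three calls give 62, 21, and 25 subjects, matching the worked examples. Passing the half width rather than the full width to `n_for_mean` is a design choice that keeps both functions parameterized by the maximum error δ.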
Power analysis tests the null hypothesis at a pre-determined level of significance with a desired power.

1.3.1 Information Needed for Power Analysis

A clinical trial that is conducted without attention to sample size or power information runs the risk of either failing to detect clinically meaningful differences (i.e., type II error) or using an unnecessarily large number of subjects. Either case fails to adhere to the Ethical Guidelines of the American Statistical Association, which say, “Avoid the use of excessive or inadequate number of research subjects by making informed recommendations for study size” [5]. The sample size estimate is important for economic and ethical reasons [6]. An oversized clinical trial exposes more subjects than necessary to a potentially harmful trial, and uses more resources than necessary. An undersized clinical trial exposes the subjects to a
potentially harmful trial and leads to a waste of resources without producing useful results. The sample size estimate will allow the estimation of the total cost of the proposed study. While the exact final number that will be used for analysis will be unknown due to missing information such as lack of demographic information and clinical information, it is still desirable to determine a target sample size based on the proposed study design. In this section, we describe the general information needed to estimate the sample size for the trial.

1. Choose the primary endpoint

The primary endpoint should be chosen so that the primary objective of the trial can be assessed, and the primary endpoint is generally used for sample size estimation. The primary endpoint measures the outcome that will answer the primary question being asked by a trial. Suppose that the primary hypothesis is to test whether the new cancer drug yields longer overall survival than the standard cancer drug. In this case, the primary endpoint is overall survival. The sample size for a trial is determined by the power needed to detect a clinically meaningful difference in overall survival at a given significance level. The secondary hypothesis is to investigate other relevant questions from the same trial. For example, the secondary hypothesis is to test whether the new cancer drug produces better quality of life than the standard cancer drug, or whether the new cancer drug yields longer progression–free survival than the standard cancer drug.

The sample size calculation depends on the type of primary endpoint. The variable type of the primary outcome must be defined before sample size and power calculations can be conducted. The variable type may be continuous, categorical, ordinal, or survival. Categorical variables may have only two categories or more than two categories.
• A quantitative (or continuous) outcome representing a specific measure (e.g., total cholesterol, quality of life, or blood pressure). Mean and median can be used to compare the primary endpoint between treatment groups.
• A binary outcome indicating occurrence of an event (e.g., the occurrence of myocardial infarction, or the occurrence of recurrent disease). Odds ratio, risk difference, and risk ratio can be used to compare the primary endpoint between treatment groups.
• Survival outcome for the time to occurrence of an event of interest (e.g., the time from study entry to death, or time to progression). A Kaplan–Meier survival curve is often used to graphically display the time to the event, and log–rank test or Cox regression analysis is frequently used to test if there is a significant difference in the treatment effect between treatment groups.

2. Determine the hypothesis of interest

The primary purpose of a clinical trial is to address a scientific hypothesis, which is usually related to the evaluation of the efficacy and safety of a drug
product. To address a hypothesis, different statistical methods are used depending on the type of question to be answered. Most often the hypothesis is related to the effect of one treatment as compared to another. For example, one trial could compare the effectiveness of a new drug to that of a standard drug. Yet the specific comparison to be performed will depend on the hypothesis to be addressed. Let µ1 and µ2 be the mean responses of a new drug and a standard drug, respectively.

• A superiority test is designed to detect a meaningful difference in mean response between a standard drug and a new drug [7]. The primary objective is to show that the mean response of a new drug is different from that of a standard drug.

H0 : µ1 = µ2 versus H1 : µ1 ≠ µ2

The null hypothesis (H0) says that the two drugs are not different with respect to the mean response (µ1 = µ2). The alternative hypothesis (H1) says that the two drugs are different with respect to the mean response (µ1 ≠ µ2). The statistical test is a two–sided test since there are two chances of rejecting the null hypothesis (µ1 > µ2 or µ1 < µ2) with each side allocated an equal amount of the type I error of α/2. If the alternative hypothesis is µ1 > µ2 or µ1 < µ2 instead of µ1 ≠ µ2, then the statistical test is referred to as a one–sided test since there is only one chance of rejecting the null hypothesis with one side allocated the type I error of α.

• An equivalence test is designed to confirm the absence of a meaningful difference between a standard drug and a new drug. The primary objective is to show that the mean responses to two drugs differ by an amount that is clinically unimportant. This is usually demonstrated by showing that the absolute difference in mean responses between drugs is likely to lie within an equivalence margin (∆) of clinically acceptable differences.

H0 : |µ1 − µ2| ≥ ∆ versus H1 : |µ1 − µ2| < ∆

The null hypothesis (H0) says that the two drugs are different with respect to the mean response (|µ1 − µ2| ≥ ∆). The alternative hypothesis (H1) says that the two drugs are not different with respect to the mean response (|µ1 − µ2| < ∆). In an equivalence test, an investigator wants to test if the difference between a new drug and a standard drug is of no clinical importance. This is to test for equivalence of two drugs. The null hypothesis is expressed as a union (µ1 − µ2 ≥ ∆ or µ1 − µ2 ≤ −∆) and the alternative hypothesis (H1) as an intersection (−∆ < µ1 − µ2 < ∆). Each component of the null hypothesis needs to be rejected to conclude equivalence.

• A non–inferiority test is designed to show that a new drug is not less effective than a standard drug by more than ∆, the margin of non–inferiority. The null and alternative hypotheses can be specified as:

H0 : µ1 − µ2 ≤ −∆ versus H1 : µ1 − µ2 > −∆

The null hypothesis (H0) says that a new drug is inferior to a standard drug with respect to the mean response. The alternative hypothesis (H1) says that a new drug is non–inferior to a standard drug with respect to the mean response. That is, the alternative hypothesis of a non–inferiority trial states that a standard drug may indeed be more effective than a new drug, but by no more than ∆. In phase III clinical trials that compare a new drug with a standard drug, non–inferiority trials are more common than equivalence trials since it is only the non–inferiority limit that is usually of interest. This is to test for non–inferiority of the new drug.

Choice of hypothesis depends on which scientific question an investigator is trying to answer. All the above hypothesis tests are useful in the development of drugs. In comparison studies with a standard drug, a non–inferiority trial is used to demonstrate that a new drug provides at least the same benefit to the subject as a standard drug. Non–inferiority trials are commonly used when a new drug is easier to administer, less expensive, and less toxic than a standard drug. Equivalence trials are used to show that a new drug is identical (within an acceptable range) to a standard drug. This is used in the registration and approval of biosimilar drugs that are shown to be equivalent to their branded reference drugs [8]. Most equivalence trials are bioequivalence trials that aim to compare a generic drug with the original branded reference drug.

3. Determine ∆

Sample size calculation depends on the hypothesis of interest.
For a superiority test, the necessary sample size depends on the clinically meaningful difference (∆). In superiority trials, fewer subjects will be needed for a larger value of ∆ while more subjects will be needed for a smaller value of ∆. For instance, we can detect a 40% difference in efficacy with a modest number of subjects. However, a larger number of subjects will be needed to reliably detect a 10% difference in efficacy. Because sample size is inversely related to the square of ∆, even a slightly misspecified difference can lead to a large change in the sample size. Clinically meaningful differences are commonly specified using one of two approaches. One is to select the drug effect deemed important to detect, and the other is to calculate the sample size according to the best guess concerning the true effect of the drug [9].

For an equivalence test, the required sample size depends on the margin of clinical equivalence. In an equivalence test, the equivalence margin of clinically acceptable difference (∆) depends on the disease being studied. For example,
an absolute difference of 1% is often used as the clinically meaningful difference in thrombolytic trials while a 20% difference is considered as clinically meaningful in most other situations including migraine headache [10]. Bioequivalence trials aim to show the equivalent pharmacokinetic profile through the most commonly used pharmacokinetic variables such as area under the curve (AUC) and maximum concentration (Cmax). Average bioequivalence is widely used for comparison of a generic drug with the original branded drug. The 80/125 rule is currently used as regulation for the assessment of average bioequivalence [11]. For average bioequivalence, the FDA [11] recommends that the geometric means ratio between the test drug and the reference drug is within 80% and 125% for the bioavailability measures (AUC and Cmax).

For a non–inferiority test, the necessary sample size depends on the upper bound for non–inferiority. Setting the non–inferiority margin is a major issue in designing a non–inferiority trial. The Food and Drug Administration [12] and the European Medicines Agency [13] issued guidance on the choice of non–inferiority margin. The choice of the non–inferiority margin needs to take account of both statistical reasoning and clinical judgement. An appropriate selection of non–inferiority margin should provide assurance that a new drug has a clinically relevant superiority over placebo, and a new drug is not substantially inferior to a standard drug, which results in a tighter margin.

The clinically or scientifically meaningful margin (∆) needs to be specified to estimate the number of subjects for the trial since the purpose of the sample size estimation is to provide sufficient power to reject the null hypothesis when the alternative hypothesis is true. In this book, we restrict the sample size estimation to a superiority test, which is most commonly used in clinical trials.
Julious [7, 14, 15] and Chow et al. [3] provided general sample size formulas for equivalence trials and non–inferiority trials.

4. Determine the variance of the primary endpoint

The variance of the primary endpoint is usually unknown in advance. In cross-sectional studies, the variance or the standard deviation is generally obtained from either previous studies or pilot studies. However, for correlated outcomes such as clustered outcomes or repeated measurement outcomes, the variance of the primary endpoint generally needs to be estimated utilizing various sources of information such as missing proportion, correlation among measurements, and the number of measurements. Detailed description of the estimation of the variance for correlated outcomes will be given in later chapters. A large variance will lead to a large sample size for a study. That is, as the variance increases, the sample size increases.

5. Choose type I error and power

Type I error (α) is the probability of rejecting the null hypothesis when the null hypothesis is actually true. Type II error (β) is the probability of not rejecting the null hypothesis when it is actually false. The aim of the sample
size calculation is to estimate the minimal sample size required to meet the objectives of the study for a fixed probability of type I error to achieve a desired power, which is defined as 1 − β. The power is the probability of rejecting the null hypothesis when it is actually false. A two–sided type I error of 5% is commonly used to reflect a 95% confidence interval for an unknown parameter, and this is familiar to most investigators as the conventional benchmark of 5%. As α decreases, the sample size increases. For example, a study with an α level of 0.01 requires a larger sample size than a study with an α level of 0.05. Typically, the sample size is computed to provide a fixed level of power under a specified alternative hypothesis. The alternative hypothesis usually represents a minimal clinically or scientifically meaningful difference in efficacy between treatment groups. Power (1 − β) is an important consideration in sample size determination. Low power can cause a true difference in a clinical outcome between study groups to go undetected. However, too much power may make results statistically significant when results do not show a clinically meaningful difference.

When there is a large difference such as a 100% real difference in therapeutic efficacy between a standard drug and a new drug, it is unlikely to be missed by most studies. That is, type II error (β) is small when there is a large difference in therapeutic efficacy. However, type II error is a common problem in studies that aim to distinguish between a standard drug and a new drug that may differ in therapeutic efficacy by only a small amount such as 1% or 5%. The number of subjects must be drastically increased to reduce type II error when the aim is to discriminate a small difference between a standard drug and a new drug.
Otherwise, there is a high chance of incorrectly overlooking small differences in therapeutic efficacy with an insufficient number of subjects. Type II error (β) of 10% or 20% is commonly used for sample size estimation. That is, a power (1 − β) of 80% or 90% is widely used for the design of the study. The higher the power, the lower the risk of type II error. The power increases as the sample size increases. A sufficient sample size ensures that the study is able to reliably detect a true difference and is not underpowered.

6. Select a statistical method for data analysis

A statistical method for sample size estimation should adequately align with the statistical method for data analysis [16]. For example, suppose an investigator would like to test whether there is a significant difference in total cholesterol levels between those who take a new drug and those who take a standard drug. The investigator plans to analyze the data using a two–sample t–test. In this case, a sample size calculation based on a two-group chi–square test with dichotomization of total cholesterol levels would be inappropriate since the statistical method used for power analysis is different from that to be used for data analysis. Discrepancy between the statistical method for sample size estimation and the statistical method for data analysis can lead to a sample
size that is too large or too small. The statistical method used for sample size calculation should be the same as that used for data analysis.

1.3.2 One–Sample Test for Means

We illustrate the sample size calculation for a one–sided test through an example. Suppose that the total cholesterol levels for male college students are normally distributed with a mean ($\mu$) of 180 mg/dl and a standard deviation ($\sigma$) of 80 mg/dl. Suppose that an investigator would like to examine whether the mean total cholesterol level of the physically inactive male college students is higher than 180 mg/dl using a one–sided 5% significance level ($\alpha$). That is, an investigator would like to test the hypotheses

$$H_0: \mu = \mu_0 = 180 \text{ mg/dl (or } \mu \le 180 \text{ mg/dl)} \quad \text{versus} \quad H_1: \mu > 180 \text{ mg/dl},$$

assuming that the standard deviation of the total cholesterol level is the same as that of male college students. The investigator is willing to accept a 10% chance (90% power) of failing to reject the null hypothesis when the true mean ($\mu_1$) of the total cholesterol level is as large as 210 mg/dl. How many subjects are needed to detect a 30 mg/dl difference in total cholesterol level from the population mean of 180 mg/dl at a one–sided 5% significance level and a power of 90%? For $\alpha = 0.05$, we would reject the null hypothesis ($H_0$) if the average total cholesterol level is greater than the critical value ($C$) in Figure 1.1, where

$$C = \mu_0 + z_{1-\alpha} \cdot \sigma/\sqrt{n} = 180 + 1.645 \cdot 80/\sqrt{n}.$$

If the true mean is 210 mg/dl with a power of 90% ($\beta = 0.1$), we would not reject the null hypothesis when the sample average is less than

$$C = \mu_1 + z_{\beta} \cdot \sigma/\sqrt{n} = 210 - 1.282 \cdot 80/\sqrt{n}.$$

The sample size ($n$) can be estimated by setting the two equations equal to each other:

$$180 + 1.645 \cdot 80/\sqrt{n} = 210 - 1.282 \cdot 80/\sqrt{n}.$$

Therefore, the required number of subjects is

$$n = \frac{(1.645 + 1.282)^2 \cdot 80^2}{(180 - 210)^2} \approx 61.$$
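The cholesterol example can be verified by solving for the sample size at which the two critical values coincide. This is a minimal sketch under the same known-variance normal assumptions; the helper names `n_one_sided` and `critical_values` are hypothetical and not taken from the book.

```python
import math
from statistics import NormalDist

Z = NormalDist()  # standard normal distribution


def n_one_sided(mu0, mu1, sigma, alpha=0.05, beta=0.10):
    """Unrounded n at which the two critical-value equations coincide."""
    z_sum = Z.inv_cdf(1 - alpha) + Z.inv_cdf(1 - beta)
    return z_sum ** 2 * sigma ** 2 / (mu0 - mu1) ** 2


def critical_values(n, mu0, mu1, sigma, alpha=0.05, beta=0.10):
    """Return (C computed under H0, C computed under H1) for sample size n."""
    se = sigma / math.sqrt(n)
    c_null = mu0 + Z.inv_cdf(1 - alpha) * se  # mu0 + z_{1-alpha} * sigma / sqrt(n)
    c_alt = mu1 + Z.inv_cdf(beta) * se        # mu1 + z_beta * sigma / sqrt(n)
    return c_null, c_alt


n_raw = n_one_sided(mu0=180, mu1=210, sigma=80)
n_required = math.ceil(n_raw)  # round up to a whole subject
print(n_required)
print(critical_values(n_raw, 180, 210, 80))  # the two values agree at n_raw
```

Rounding up gives 61 subjects, as in the text. At the rounded sample size the critical value under the null hypothesis falls slightly below the one under the alternative, so the achieved power is marginally above 90%.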
In general, the estimated sample size for a one–sided test of $H_0: \mu = \mu_0$ versus $H_1: \mu = \mu_1 > \mu_0$ with a significance level of $\alpha$ and a power of $1-\beta$ is the smallest integer that is larger than or equal to

$$n = \frac{(z_{1-\alpha} + z_{1-\beta})^2 \sigma^2}{(\mu_0 - \mu_1)^2}. \tag{1.3}$$

We will show how the sample size can be estimated for a two–sided one–sample test. Let $n$ be the number of subjects. Let $Y_i$ denote the response for subject $i$, $(i = 1, \ldots, n)$, and $\bar{y}$ be the sample mean. We assume that the $Y_i$'s are independent and normally distributed random variables with mean $\mu$ and variance $\sigma^2$. Suppose that we want to test the hypotheses $H_0: \mu = \mu_0$ versus $H_1: \mu = \mu_1 \ne \mu_0$.

FIGURE 1.1 Sample size estimation for a one–sided test in a one–sample problem

When $\sigma^2$ is known, we reject the null hypothesis at the significance level $\alpha$ if

$$\left| \frac{\bar{y} - \mu_0}{\sigma/\sqrt{n}} \right| > z_{1-\alpha/2},$$

where $z_{1-\alpha/2}$ is the $100(1-\alpha/2)$th percentile of the standard normal distribution. Under the alternative hypothesis ($H_1: \mu = \mu_1$), the power is given by

$$\Phi\left( \frac{\sqrt{n}(\mu_1 - \mu_0)}{\sigma} - z_{1-\alpha/2} \right) + \Phi\left( -\frac{\sqrt{n}(\mu_1 - \mu_0)}{\sigma} - z_{1-\alpha/2} \right),$$

where $\Phi$ is the cumulative standard normal distribution function. By ignoring the small value of the second term in the above equation, the power is approximated by the first term. Thus, the sample size required to achieve the power of $1-\beta$ can be obtained by solving the following equation:

$$\frac{\sqrt{n}(\mu_1 - \mu_0)}{\sigma} - z_{1-\alpha/2} = z_{1-\beta}.$$

The required sample size is the smallest integer that is larger than or equal to

$$n = \frac{(z_{1-\alpha/2} + z_{1-\beta})^2 \sigma^2}{(\mu_1 - \mu_0)^2}. \tag{1.4}$$
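To see how little is lost by dropping the second term of the power expression, the sketch below evaluates both terms at the sample size given by Equation (1.4). The inputs (a mean difference of 6 and a standard deviation of 9) are the values used in the example of Section 1.3.2.1; the function names are hypothetical, and the code is an illustration rather than the authors' implementation.

```python
import math
from statistics import NormalDist

Z = NormalDist()  # standard normal distribution


def n_two_sided(mu0, mu1, sigma, alpha=0.05, power=0.80):
    """Equation (1.4), rounded up to the next integer."""
    z_sum = Z.inv_cdf(1 - alpha / 2) + Z.inv_cdf(power)
    return math.ceil(z_sum ** 2 * sigma ** 2 / (mu1 - mu0) ** 2)


def power_terms(n, mu0, mu1, sigma, alpha=0.05):
    """Return the two terms of the two-sided power expression."""
    z = Z.inv_cdf(1 - alpha / 2)
    shift = math.sqrt(n) * (mu1 - mu0) / sigma
    main_term = Z.cdf(shift - z)    # rejection in the direction of mu1
    tail_term = Z.cdf(-shift - z)   # rejection in the opposite direction
    return main_term, tail_term


n = n_two_sided(mu0=0, mu1=6, sigma=9)
main_term, tail_term = power_terms(n, mu0=0, mu1=6, sigma=9)
print(n, main_term, tail_term)
```

With these inputs the first term alone already reaches the 80% target, while the second term is negligible in comparison, which is why the approximation is safe whenever the standardized difference is not close to zero.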

If the population variance $\sigma^2$ is unknown, $\sigma^2$ can be estimated by the sample variance $s^2 = \sum_{i=1}^{n} (y_i - \bar{y})^2/(n-1)$, which is an unbiased estimator of $\sigma^2$. For large $n$, we reject the null hypothesis $H_0: \mu = \mu_0$ at the significance level $\alpha$ if

$$\left| \frac{\bar{y} - \mu_0}{s/\sqrt{n}} \right| > z_{1-\alpha/2}.$$

Therefore, the sample size estimates for a one–sided test and a two–sided test can be obtained by replacing $\sigma^2$ by $s^2$ in Equations (1.3) and (1.4).

1.3.2.1 Example

Consider the design of a single-arm psychiatric study that evaluates the effect of a test drug on cognitive functioning of children with mental retardation before and after administration of the test drug. A pilot study shows that the mean difference in cognitive functioning before and after taking the test drug was 6 with a standard deviation equal to 9. We would like to estimate the sample size needed to detect the mean difference of 6 in cognitive functioning to achieve 80% power at a two–sided 5% significance level assuming a standard deviation of 9. Let $\mu$ denote the mean difference in cognitive functioning between pre- and post-drug administration. The null hypothesis $H_0: \mu = 0$ is to be tested against the alternative hypothesis $H_1: \mu = 6$. From Equation (1.4), $n = (1.960 + 0.842)^2 \cdot 9^2/6^2 = 17.7$. Therefore, a sample size of 18 subjects is needed to detect a mean difference of 6 in cognitive functioning, assuming a standard deviation of 9 using a normal approximation with a two–sided significance level of 5% and a power of 80%.

1.3.2.2 Example

Concerning the effect of a test drug on systolic blood pressure before and after the treatment, a pilot study shows that the mean change in systolic blood pressure after a 4–month administration of a test drug was 15 mm Hg with a standard deviation of 40 mm Hg.
We would like to estimate the sample size needed to detect a change of 15 mm Hg in systolic blood pressure with 80% power at a two–sided 5% significance level, assuming a standard deviation of 40 mm Hg. From Equation (1.4), $n = (1.960 + 0.842)^2 \cdot 40^2/15^2 = 55.8$. Therefore, a sample size of 56 subjects will have 80% power to detect a change in mean of 15 mm Hg in systolic blood pressure, assuming a standard deviation of 40 mm Hg at a two–sided 5% significance level.

1.3.3 One–Sample Test for Proportions

Let $Y_i$ denote a binary response variable of the $i$th subject with $p = E(Y_i)$, $(i = 1, \ldots, n)$, where $n$ is the number of subjects in the trial. For example, $Y_i$ can denote the response or non–response in cancer clinical trials, where $Y_i = 0$ denotes non–response, and $Y_i = 1$ denotes response, which includes either complete response or partial response. The response rate can be estimated by
the observed proportion $\hat{p} = \sum_{i=1}^{n} Y_i/n$, where $n$ is the number of subjects. We illustrate the sample size calculation using the one–sided test. Suppose we wish to test the null hypothesis $H_0: p = p_0$ versus the alternative hypothesis $H_1: p = p_1 > p_0$ at the one–sided significance level of $\alpha$. Under the null hypothesis, the test statistic

$$Z = \frac{\hat{p} - p_0}{\sqrt{\hat{p}(1-\hat{p})/n}}$$

approximately has a standard normal distribution for large $n$. We reject the null hypothesis at a significance level $\alpha$ if the test statistic $Z$ is greater than $z_{1-\alpha}$. That is, we would reject the null hypothesis ($H_0$) if the observed response rate is greater than the critical value ($C$), where

$$C = p_0 + z_{1-\alpha} \sqrt{p_0(1-p_0)/n}.$$

If the alternative hypothesis is true, that is, if the true response rate is $p_1$, we would not reject the null hypothesis if the response rate is less than

$$C = p_1 + z_{\beta} \sqrt{p_1(1-p_1)/n}.$$

By setting the two equations equal, we get

$$p_0 + z_{1-\alpha} \sqrt{p_0(1-p_0)/n} = p_1 + z_{\beta} \sqrt{p_1(1-p_1)/n}.$$

The required sample size to test $H_0: p = p_0$ versus $H_1: p = p_1 > p_0$ at a one–sided significance level of $\alpha$ and a power of $1-\beta$ is

$$n = \frac{\left( z_{1-\alpha} \sqrt{p_0(1-p_0)} + z_{1-\beta} \sqrt{p_1(1-p_1)} \right)^2}{(p_1 - p_0)^2}.$$

The sample size for a two–sided test of $H_0: p = p_0$ versus $H_1: p = p_1$ for $p_1 \ne p_0$ can be obtained by replacing $z_{1-\alpha}$ by $z_{1-\alpha/2}$, as shown for the one–sample test for means:

$$n = \frac{\left( z_{1-\alpha/2} \sqrt{p_0(1-p_0)} + z_{1-\beta} \sqrt{p_1(1-p_1)} \right)^2}{(p_1 - p_0)^2}. \tag{1.5}$$

1.3.3.1 Example

Consider the design of a single-arm oncology clinical trial that evaluates if a new molecular therapy has at least a 40% response rate. Let $p$ be the response rate of a new molecular therapy. We would like to estimate the sample size needed to test the null hypothesis $H_0: p = p_0 = 0.20$ against the alternative hypothesis $H_1: p = p_1 \ne p_0$. The trial is designed based on a two–sided test that achieves 80% power at $p = p_1 = 0.40$ with a two–sided 5% significance level.
From Equation (1.5),

$$n = \frac{\left( 1.96 \sqrt{0.2(1-0.2)} + 0.842 \sqrt{0.4(1-0.4)} \right)^2}{(0.4 - 0.2)^2} = 35.8.$$

The required number of subjects is 36 to detect the difference between the null hypothesis proportion of 0.2 and the alternative proportion of 0.4 at a two–sided significance level of 5% and a power of 80%.
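The one–sample proportion formulas translate directly into a short function. The sketch below is illustrative only: `n_one_sample_proportion` and its `two_sided` flag are hypothetical names, and exact normal quantiles replace the rounded values 1.96 and 0.842.

```python
import math
from statistics import NormalDist


def n_one_sample_proportion(p0, p1, alpha=0.05, power=0.80, two_sided=True):
    """Sample size from Equation (1.5), or its one-sided analogue."""
    z = NormalDist().inv_cdf
    # Use alpha/2 in each tail for a two-sided test, alpha for a one-sided test.
    z_alpha = z(1 - alpha / 2) if two_sided else z(1 - alpha)
    z_beta = z(power)
    numerator = (z_alpha * math.sqrt(p0 * (1 - p0))
                 + z_beta * math.sqrt(p1 * (1 - p1))) ** 2
    return math.ceil(numerator / (p1 - p0) ** 2)


# Oncology example: null response rate 0.20, alternative 0.40.
print(n_one_sample_proportion(0.20, 0.40))
print(n_one_sample_proportion(0.20, 0.40, two_sided=False))
```

The two-sided call reproduces the 36 subjects of the example; the one-sided call needs fewer subjects because the whole type I error is allocated to one tail.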

1.3.4 Two–Sample Test for Means

Suppose that $Y_{1i}$ $(i = 1, \ldots, n_1)$ and $Y_{2i}$ $(i = 1, \ldots, n_2)$ represent observations from groups 1 and 2, and that $Y_{1i}$ and $Y_{2i}$ are independent and normally distributed with means $\mu_1$ and $\mu_2$ and variances $\sigma_1^2$ and $\sigma_2^2$, respectively. Consider a one–sided test. Suppose that we want to test the hypotheses

$$H_0: \mu_1 = \mu_2 \quad \text{versus} \quad H_1: \mu_1 > \mu_2.$$

Let $\bar{y}_1$ and $\bar{y}_2$ be the sample means of $Y_{1i}$ and $Y_{2i}$. Assume that the variances $\sigma_1^2$ and $\sigma_2^2$ are known, and $n_1 = n_2 = n$. Then, the Z–test statistic can be written as

$$Z = \frac{\bar{y}_1 - \bar{y}_2}{\sqrt{\sigma_1^2/n + \sigma_2^2/n}}.$$

Under the null hypothesis ($H_0$), the test statistic $Z$ is normally distributed with mean 0 and variance 1. Thus, we reject the null hypothesis if $Z > z_{1-\alpha}$. Under the alternative hypothesis ($H_1$), let $\mu_1 - \mu_2 = \Delta$, which is the clinically meaningful difference to be detected. Then, under the alternative hypothesis ($H_1$), the expected value of $(\bar{y}_1 - \bar{y}_2)$ is $\Delta$, and $Z$ follows the normal distribution with mean $\mu^*$ and variance 1, where

$$\mu^* = \frac{\Delta}{\sqrt{\sigma_1^2/n + \sigma_2^2/n}}.$$

Under the null hypothesis ($H_0$), $P\{Z > z_{1-\alpha} \mid H_0\} \le \alpha$. Similarly, under the alternative hypothesis ($H_1$), we require $P\{Z > z_{1-\alpha} \mid H_1\} \ge 1-\beta$. That is,

$$P\left\{ \frac{\bar{y}_1 - \bar{y}_2}{\sqrt{\sigma_1^2/n + \sigma_2^2/n}} > z_{1-\alpha} \,\middle|\, H_1 \right\} \ge 1-\beta.$$

Since the expected value of $(\bar{y}_1 - \bar{y}_2)$ is $\Delta$ under the alternative hypothesis,

$$P\left\{ \frac{(\bar{y}_1 - \bar{y}_2) - \Delta}{\sqrt{\sigma_1^2/n + \sigma_2^2/n}} > z_{1-\alpha} - \frac{\Delta}{\sqrt{\sigma_1^2/n + \sigma_2^2/n}} \,\middle|\, H_1 \right\} \ge 1-\beta.$$

Due to the symmetry of the normal distribution, the above condition holds with equality when

$$z_{1-\alpha} - \frac{\Delta}{\sqrt{\sigma_1^2/n + \sigma_2^2/n}} = z_{\beta} = -z_{1-\beta}.$$

Solving for $n$ yields the required sample size per group, assuming equal allocation of subjects to each group:

$$n = \frac{(\sigma_1^2 + \sigma_2^2)(z_{1-\alpha} + z_{1-\beta})^2}{\Delta^2}.$$
  • 37. 16 Sample Size Calculations for Clustered and Longitudinal Outcomes If σ2 1 = σ2 2 = σ2 , then the required sample size per group is n = 2σ2 (z1−α + z1−β)2 ∆2 . (1.6) In some randomized clinical trials, more subjects are assigned to the treat- ment group than to the control group to encourage participation of subjects in a trial due to their higher chance of being randomized to the treatment group than the control group. Let n1 = n be the number of subjects in the control group and n2 = kn be the number of subjects in the treatment group. Then, the sample size for the study will be n1 = n = (1 + 1/k)σ2 (z1−α + z1−β)2 ∆2 . (1.7) The total sample size for the trial is n1 +n2. The relative sample size required to maintain the power and type I error rate of a trial against the trial with an equal number of subjects in each group is (2 + k + 1/k)/4. For example, in a trial that randomizes subjects in a 2:1 ratio requires a 12.5% larger sample size in order to maintain the same power as a trial with a 1:1 randomization. The sample size needed to detect the difference in means between two groups with a two–sided test can be obtained by replacing z1−α by z1−α/2 as shown in a one–sample test for means: n1 = n = (1 + 1/k)σ2 (z1−α/2 + z1−β)2 ∆2 . (1.8) If the population variance σ2 is unknown, σ2 can be estimated by the sample pooled variance s2 = { Pn1 i=1(y1i −ȳ1)2 + Pn2 i=1(y2i −ȳ2)2 }/(n1 +n2 −2), which is an unbiased estimator of σ2 . For large n1 and n2, we reject the null hypothesis H0 : µ1 = µ2 against the alternative hypothesis H1 : µ1 6= µ2 at the significance level α if the absolute value of the test statistic Z is greater than z1−α/2. Z = ȳ1 − ȳ2 s q 1 n1 + 1 n2 . If n1 = n and n2 = kn, the Z test statistic becomes Z = ȳ1 − ȳ2 s q k+1 kn . Therefore, the sample size estimates for a one–sided test and a two–sided test can be obtained by replacing σ2 by s2 in Equations (1.7) and (1.8). 
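Equations (1.7) and (1.8), together with the relative sample size factor $(2 + k + 1/k)/4$, can be sketched in a few lines. The function names and the input values ($\Delta = 4$, $\sigma = 8$) are illustrative assumptions; only the formulas come from the text.

```python
from statistics import NormalDist


def two_sample_means_n(delta, sigma, k=1.0, alpha=0.05, power=0.80, two_sided=True):
    """Return (n1, n2) before rounding, with n2 = k * n1.

    Hypothetical helper: Equation (1.8) when two_sided is True,
    Equation (1.7) otherwise. Assumes a common standard deviation sigma.
    """
    z = NormalDist().inv_cdf
    z_alpha = z(1 - alpha / 2) if two_sided else z(1 - alpha)
    z_beta = z(power)
    n1 = (1 + 1 / k) * sigma ** 2 * (z_alpha + z_beta) ** 2 / delta ** 2
    return n1, k * n1


def relative_total_size(k):
    """Total size under 1:k allocation relative to 1:1 allocation."""
    return (2 + k + 1 / k) / 4


# Illustrative inputs: difference of 4 units, standard deviation of 8 units.
equal = two_sample_means_n(delta=4, sigma=8, k=1)
unequal = two_sample_means_n(delta=4, sigma=8, k=2)
print(sum(unequal) / sum(equal))  # ratio of total sizes, 2:1 versus 1:1
print(relative_total_size(2))     # 1.125
```

The first printed ratio is computed from the two sample sizes directly, so it serves as a check that the closed-form factor agrees with Equation (1.8).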
1.3.4.1 Example

In a prior randomized clinical trial [17] investigating the effect of propranolol versus no propranolol in geriatric patients with New York Heart Association
functional class II or III congestive heart failure (CHF), the changes in mean left ventricular ejection fraction (LVEF) from baseline to 1 year after treatment were 6% and 2% for the propranolol and no propranolol groups, respectively. We will conduct a two–arm randomized clinical trial with a placebo and a new beta blocker drug to investigate whether patients taking the new beta blocker significantly improve LVEF after 1 year compared with patients taking placebo. We assume a similar increase in LVEF as in the prior study and a common standard deviation of 8% in changes in LVEF from baseline to 1 year after treatment. How many subjects are needed to test the superiority of the new drug over placebo in improving LVEF with a two–sided 5% significance level and 80% power? The required sample size is
\[ n = \frac{2\sigma^2 (z_{1-\alpha/2} + z_{1-\beta})^2}{\Delta^2} = \frac{2 \cdot 8^2 \cdot (1.960 + 0.842)^2}{4^2} = 62.8. \]
The required sample size is 63 subjects per group.

1.3.5 Two–Sample Test for Proportions

In a randomized clinical trial, subjects are randomly assigned to one of two treatment groups. Let $Y_{ij}$ be the binary random variable ($Y_{ij} = 1$ for response, 0 for no response) of the $j$th subject in the $i$th treatment, $j = 1, \ldots, n_i$, and $i = 1, 2$. We assume that the $Y_{ij}$'s are independent and identically distributed with $E(Y_{ij}) = p_i$ for a fixed $i$. The response rate $p_i$ is usually estimated by the observed proportion in the $i$th treatment group:
\[ \hat{p}_i = \sum_{j=1}^{n_i} Y_{ij} / n_i. \]
Let $p_1$ and $p_2$ be the response rates of the control and treatment arms, respectively. The sample sizes are $n_1$ and $n_2$ in the two treatment groups, respectively. Suppose that an investigator wants to test whether there is a difference in the response rates between the control and treatment arms. The null ($H_0$) and alternative ($H_1$) hypotheses are:
H0: The response rates are equal ($p_1 = p_2$).
H1: The response rates are different ($p_1 \neq p_2$).
We reject the null hypothesis $H_0: p_1 = p_2$ at the significance level of $\alpha$ if
\[ \frac{|\hat{p}_1 - \hat{p}_2|}{\sqrt{\hat{p}_1(1 - \hat{p}_1)/n_1 + \hat{p}_2(1 - \hat{p}_2)/n_2}} > z_{1-\alpha/2}. \]
Under the alternative hypothesis, the power of the test is approximated by
\[ \Phi\left( \frac{|p_1 - p_2|}{\sqrt{p_1(1 - p_1)/n_1 + p_2(1 - p_2)/n_2}} - z_{1-\alpha/2} \right). \]
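The power approximation above is straightforward to evaluate. The sketch below computes it and, as one possible way to turn power into a sample size, searches for the smallest control-group size that reaches a target power. The function names, the search strategy, and the response rates used in the usage lines are assumptions made for illustration.

```python
import math
from statistics import NormalDist


def two_sample_proportion_power(p1, p2, n1, n2, alpha=0.05):
    """Approximate power of the two-sided test of H0: p1 = p2."""
    nd = NormalDist()
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    return nd.cdf(abs(p1 - p2) / se - nd.inv_cdf(1 - alpha / 2))


def smallest_n1(p1, p2, k=1.0, alpha=0.05, power=0.80):
    """Smallest integer n1 (with n2 = k * n1) whose approximate power
    reaches the target. A simple upward search, not a closed form."""
    n1 = 2
    while two_sample_proportion_power(p1, p2, n1, k * n1, alpha) < power:
        n1 += 1
    return n1


# Hypothetical response rates of 20% (control) and 40% (treatment).
n1 = smallest_n1(0.20, 0.40)
print(n1, round(two_sample_proportion_power(0.20, 0.40, n1, n1), 3))
```

Because the approximate power increases with the sample size when the allocation ratio is fixed, the search stops at the first size that meets the target.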
The sample size estimate needed to achieve a power of $1 - \beta$ can be obtained by solving the following equation:
\[ \frac{|p_1 - p_2|}{\sqrt{p_1(1 - p_1)/n_1 + p_2(1 - p_2)/n_2}} - z_{1-\alpha/2} = z_{1-\beta}. \]
When $n_2 = k \cdot n_1$, $n_1$ can be written as
\[ n_1 = \frac{(z_{1-\alpha/2} + z_{1-\beta})^2}{(p_1 - p_2)^2} \left[ p_1(1 - p_1) + p_2(1 - p_2)/k \right]. \]
Under equal allocation, $n_1 = n_2 = n$, the required sample size per group is
\[ n_1 = n_2 = n = \frac{(z_{1-\alpha/2} + z_{1-\beta})^2}{(p_1 - p_2)^2} \left[ p_1(1 - p_1) + p_2(1 - p_2) \right]. \]

1.4 Further Readings

Sample size calculation is an important issue in the experimental design of biomedical research. The sample size formulas presented in this chapter are based on asymptotic approximation and superiority trials. Closed–form sample size estimates for independent outcomes can be obtained using normal approximation for equivalence trials, cross–over trials, non–inferiority trials, and bioequivalence trials [14].

In some clinical trials, such as phase II cancer clinical trials [18], sample sizes are usually small. Therefore, the sample size calculation based on asymptotic approximation would not be appropriate for clinical trials with a small number of subjects. The small sample sizes for typical phase II clinical trials imply the need for the use of exact statistical methods in sample size estimation [19]. Chow et al. [3] provided procedures for sample size estimation for proportions based on exact tests for small samples. Even though closed–form formulas cannot be obtained for sample size estimates based on exact tests, the sample size estimates can be obtained numerically.

The tests for proportions using normal approximation to the binomial outcome are equivalent to the usual chi–square tests since $Z^2 = \chi^2$. The p–values for the two tests are equal. For example, the critical value of the chi–square with 1 degree of freedom is $\chi^2_{0.05} = 3.841$ at the $\alpha = 0.05$ level, which is equal to the square of the two–sided $Z_{\alpha/2} = Z_{0.025} = 1.96$.
If one wishes to use a two–sided chi–square test, one should use a two–sided sample size or power determination by using $Z_{\alpha/2}$ instead of $Z_{\alpha}$ [20]. Others [21, 22, 23] have used the arcsine transformation of proportions, $A(p) = 2 \arcsin(\sqrt{p})$, to stabilize variance in the sense that the variance formula of $A(p)$ is free of the proportion $p$. Given a proportion $\hat{p}$ with $E(\hat{p}) = p$, $A(\hat{p})$ is asymptotically normal with mean $A(p)$ and variance $1/n$, where $n$ is the sample size. Since
the variance of $A(\hat{p})$ does not depend on the expectation, the sample size and power calculation becomes simplified.

Pre– and post–intervention studies have been widely used in medical and social behavioral studies [24, 25, 26, 27, 28]. In pre–post studies, each subject contributes a pair of dependent observations: one observation at pre–intervention and the other observation at post–intervention. The paired t–test has been used to detect the intervention effect on a continuous outcome, while McNemar's test [29] has been the most widely used approach to detect the intervention effect on a binary outcome in pre–post studies. The paired t–test can be conducted by applying the one–sample t–test to the difference between pre–test and post–test observations. The sample size needed to detect a difference between a pair of continuous outcomes from pre–post tests can be estimated by using the sample size formula for a one–sample test for means in Equation (1.4). However, unlike paired continuous outcomes from pre–post tests, the sample size formulas for independent outcomes presented in this chapter cannot be used to estimate the sample size needed to detect a difference between a pair of binary observations from pre–post studies. Sample size determination for studies involving a pair of binary observations from pre–post studies will be discussed in Chapter 4.

Clustered data often arise in medical and behavioral studies, such as dental, ophthalmologic, radiologic, and community intervention studies, in which data are obtained from multiple units of each cluster. In radiologic studies, as many as 60 lesions may be observed through positron emission tomography (PET) in one patient since PET offers the possibility of imaging the whole body [30]. Sample size estimation for clustered outcomes should incorporate the dependence of within–cluster observations.
Here, the unit of data collection is a cluster (subject), and the unit of data analysis is a lesion within a cluster. Two major problems arise in a sample size calculation for clustered data. One is that the number of units in each cluster, called cluster size, tends to vary cluster by cluster with a certain distribution. The other is that observations within each cluster are correlated. The sample size estimate needs to incorporate the variable cluster size and the correlation among observations within a cluster.

Controlled clinical trials often employ a parallel–groups repeated measures design in which subjects are randomly assigned between treatment groups, evaluated at baseline, and then evaluated at intervals across a treatment period of fixed total duration. The repeated measurements are usually equally spaced, although not necessarily so. The hypothesis of primary interest in short–term efficacy trials concerns the difference in the rates of changes or the time–averaged responses between treatment groups [31]. Major problems in the sample size estimation of repeated measurement data are missing data and the correlation among repeated observations within a subject. As in the sample size estimate of clustered outcomes, sample size should be estimated incorporating the correlation among repeated measurements within each
subject and the missing data mechanisms for studies with repeated measurements. Here, a sample size means the number of subjects.

In the subsequent chapters, sample size estimates will be provided using large sample approximation for correlated outcomes such as clustered outcomes and repeated measurement outcomes. There are many complexities in estimating sample size. For example, different sample size formulas are appropriate for different types of study designs, with computations more complex for studies that recruit study subjects at multiple centers. Sample size determinations also have to take into account that some subjects will be lost to follow-up or otherwise drop out of a study. Certain manipulations, such as increased precision of measurements or repeating measurements at various time points, can be used to maximize power for a given sample size.

Bibliography

[1] ICH. Statistical Principles for Clinical Trials. Tripartite International Conference on Harmonized Guidelines, E9, 1998.
[2] M. Krzywinski and N. Altman. Points of significance: Power and sample size. Nature Methods, 10:1139–1140, 2013.
[3] S. C. Chow, J. Shao, and H. Wang. Sample Size Calculations in Clinical Research. Chapman & Hall/CRC, 2008.
[4] R. G. Newcombe. Two sided confidence intervals for the single proportion: Comparison of seven methods. Statistics in Medicine, 17:857–872, 1998.
[5] ASA. Ethical guidelines for statistical practice: Executive summary. Amstat News, April:12–15, 1999.
[6] R. V. Lenth. Some practical guidelines for effective sample size determination. American Statistician, 55(3):187–193, 2001.
[7] S. A. Julious. Tutorial in biostatistics: Sample size for clinical trials. Statistics in Medicine, 23:1921–1986, 2004.
[8] S. C. Chow. Biosimilars: Design and Analysis of Follow-on Biologics. Chapman & Hall/CRC, 2013.
[9] J. Wittes. Sample size calculations for randomized clinical trials.
Epidemiologic Reviews, 24(1):39–53, 1984.
[10] J. S. Lee. Understanding equivalence trials (and why we should care). Canadian Association of Emergency Physicians, 2(3):194–196, 2000.
[11] FDA. Guidance for Industry Bioavailability and Bioequivalence Studies for Orally Administered Drug Products General Considerations. Center for Drug Evaluation and Research, the U.S. Food and Drug Administration, Rockville, MD, 2003.
[12] FDA. Guideline for Industry on Non-Inferiority Clinical Trials. Center for Drug Evaluation and Research and Center for Biologics Evaluation and Research, Food and Drug Administration, Rockville, MD, 2010.
[13] EMEA. Guidelines on the Choice of the Non-Inferiority Margin. European Medicines Agency CHMP/EWP/2158/99, London, UK, 2005.
[14] S. A. Julious. Sample Sizes for Clinical Trials. Chapman & Hall/CRC, 2009.
[15] S. A. Julious and M. J. Campbell. Tutorial in biostatistics: Sample size for parallel group clinical trials with binary data. Statistics in Medicine, 31:2904–2936, 2010.
[16] K. E. Muller, L. M. Lavange, S. L. Ramey, and C. T. Ramey. Power calculations for general linear multivariate models including repeated measures applications. Journal of American Statistical Association, 87(420):1209–1226, 1992.
[17] W. S. Aronow and C. Ahn. Postprandial hypotension in 499 elderly persons in a long-term health care facility. Journal of the American Geriatrics Society, 42(9):930–932, 1994.
[18] S. Piantadosi. Clinical Trials: A Methodologic Perspective, (2nd ed.). John Wiley & Sons, Inc, 2005.
[19] R. P. A'Hern. Sample size tables for exact single–stage phase II designs. Statistics in Medicine, 20:859–866, 2001.
[20] J. M. Lachin. Introduction to sample size determination and power analysis for clinical trials. Controlled Clinical Trials, 2:93–113, 1981.
[21] R. D. Sokal and F. J. Rohlf. Biometry: The Principles and Practice of Statistics in Biometric Research. San Francisco: Freeman, 1969.
[22] S. H. Jung and C. Ahn. Estimation of response probability in correlated binary data: A new approach. Drug Information Journal, 34:599–604, 2000.
[23] S. H. Jung, S. H.
Kang, and C. Ahn. Sample size calculations for clustered binary data. Statistics in Medicine, 20:1971–1982, 2001.
[24] M. C. Rossi, C. Perozzi, C. Consorti, T. Almonti, P. Foglini, N. Giostra, P. Nanni, S. Talevi, D. Bartolomei, and G. Vespasiani. An interactive diary for diet management (DAI): A new telemedicine system able to
promote body weight reduction, nutritional education, and consumption of fresh local produce. Diabetes Technology and Therapeutics, 12(8):641–647, 2010.
[25] A. Wajnberg, K. H. Wang, M. Aniff, and H. V. Kunins. Hospitalizations and skilled nursing facility admissions before and after the implementation of a home-based primary care program. Journal of the American Geriatric Society, 58(6):1144–1147, 2010.
[26] E. J. Knudtson, L. B. Lorenz, V. J. Skaggs, J. D. Peck, J. R. Goodman, and A. A. Elimian. The effect of digital cervical examination on group b streptococcal culture. Journal of the American Geriatric Society, 202(1):58.e1–4, 2010.
[27] T. Zieschang, I. Dutzi, E. Müller, U. Hestermann, K. Grunendahl, A. K. Braun, D. Huger, D. Kopf, N. Specht-Leible, and P. Oster. Improving care for patients with dementia hospitalized for acute somatic illness in a specialized care unit: a feasibility study. International Psychogeriatrics, 22(1):139–146, 2010.
[28] A. M. Spleen, B. C. Kluhsman, A. D. Clark, M. B. Dignan, E. J. Lengerich, and The ACTION Health Cancer Task Force. An increase in HPV–related knowledge and vaccination intent among parental and non–parental caregivers of adolescent girls, age 9–17 years, in Appalachian Pennsylvania. Journal of Cancer Education, 27(2):312–319, 2012.
[29] Q. McNemar. Note on the sampling error of the difference between correlated proportions or percentages. Psychometrika, 12(2):153–157, 1947.
[30] M. Gonen, K. S. Panageas, and S. M. Larson. Statistical issues in analysis of diagnostic imaging experiments with multiple observations per patient. Radiology, 221:763–767, 2001.
[31] P. J. Diggle, P. Heagerty, K. Y. Liang, and S. L. Zeger. Analysis of longitudinal data (2nd ed.). Oxford University Press, 2002.
2 Sample Size Determination for Clustered Outcomes

2.1 Introduction

Clustered data frequently arise in many fields of applications. We frequently make observations from multiple sites of each subject (called a cluster). Observations from the same subject are correlated, although those from different subjects are independent. For example, in periodontal studies that observe each tooth, each patient usually contributes data from more than one tooth to the studies. In this case, a patient corresponds to a cluster, and a tooth corresponds to a site.

The degree of similarity or correlation is typically measured by the intracluster correlation coefficient ($\rho$). If one simply ignores the clustering effect and analyzes clustered data using standard statistical methods developed for the analysis of independent observations, one may underestimate the true p-value and inflate the type I error rate of such tests since the correlation among observations within a cluster tends to be positive [1, 2]. Therefore, clustered data should be analyzed using statistical methods that take into account the dependence of within–cluster observations. If one fails to take into account the clustered nature of the study design during the planning stage of the study, one will obtain a smaller sample size estimate and lower statistical power than planned. However, one will obtain a larger sample size estimate and higher statistical power than planned in some studies, such as split–mouth trials [3, 4, 5], in which each of two treatments is randomly assigned to one of two segments of a subject's mouth. In split–mouth trials, both intervention and control treatments are applied in each subject.

The intracluster correlation coefficient ($\rho$) is defined by
\[ \rho = \frac{\sigma_B^2}{\sigma_B^2 + \sigma_W^2}, \]
where $\sigma_B^2$ is the between–cluster variance, and $\sigma_W^2$ is the within–cluster variance. As the within–cluster variance ($\sigma_W^2$) approaches 0, $\rho$ approaches 1.
Let $n$ be the number of clusters and $m$ be the number of observations in each cluster. When $\rho = 1$, all responses within a cluster are identical, so the effective sample size (ESS) is reduced to the number of clusters ($n$). A very small value of $\rho$ implies that the within–cluster variance ($\sigma_W^2$) is much larger than the between–cluster variance ($\sigma_B^2$). When $\rho = 0$, there is no correlation among observations within a
cluster. The effective sample size is the total number of observations across all clusters ($nm$) when $\rho = 0$. To get the effective sample size, the total number of observations (the number of observations per cluster ($m$) times the number of clusters ($n$)) is divided by a correction factor $[1 + (m - 1)\rho]$ that includes $\rho$ and the number of observations per cluster ($m$). That is, the effective sample size is $nm/[1 + (m - 1)\rho]$. The correction factor, $[1 + (m - 1)\rho]$, is called the design effect or the variance inflation factor [6].

In the TOSS (trial of cilostazol in symptomatic intracranial arterial stenosis) clinical trial [7], investigators examined the effect of cilostazol on the progression of intracranial arterial stenosis, a narrowing of an artery inside the brain that can lead to stroke. Cilostazol is a medication for the treatment of intermittent claudication, a condition caused by narrowing of the arteries that supply blood to the legs. One hundred thirty–six subjects were randomly allocated to receive either cilostazol or placebo with an equal probability. Three arteries (two middle cerebral arteries and one basilar artery) were evaluated for the progression of intracranial stenosis in both cilostazol and placebo groups.

The number of arteries evaluated in each treatment group is 204 (= 3 arteries/subject × 68 subjects). If observations in three arteries are independent ($\rho = 0$), then the effective number of observations is 204. If the observations in three arteries are completely dependent ($\rho = 1$), then the effective number of observations is 68. If $\rho$ takes a value between 0 and 1, the effective number of observations is $204/[1 + (m - 1)\rho]$, where $m = 3$. In general, the effective number of observations in each treatment group is $nm/[1 + (m - 1)\rho]$ when $0 \le \rho \le 1$. As a special case, the effective number of observations is $nm$ when $\rho = 0$, and $n$ when $\rho = 1$.
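The effective sample size calculation for the TOSS setting can be reproduced with a short sketch. The function names are illustrative, and the intermediate value $\rho = 0.5$ is a hypothetical choice added only to show a case between the two extremes discussed above.

```python
def design_effect(m, rho):
    """Variance inflation factor 1 + (m - 1) * rho for clusters of size m."""
    return 1 + (m - 1) * rho


def effective_sample_size(n, m, rho):
    """Total observations n * m divided by the design effect."""
    return n * m / design_effect(m, rho)


# TOSS setting: 68 subjects per group, 3 arteries per subject.
# rho = 0.5 is a made-up intermediate value, not taken from the trial.
for rho in (0.0, 0.5, 1.0):
    print(rho, effective_sample_size(n=68, m=3, rho=rho))
# 0.0 204.0
# 0.5 102.0
# 1.0 68.0
```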
2.2 One–Sample Clustered Continuous Outcomes

Clustered continuous outcomes occur frequently in biomedical studies. Examples include size of tumors in cancer patients, and pocket probing depth and clinical attachment level in teeth of subjects undergoing root planing under local anesthetic.

2.2.1 Equal Cluster Size

We assume that the number of observations in each cluster ($m$) is small compared to the number of clusters ($n$) so that asymptotic theories can be applied to $n$ for sample size estimation. Let $Y_{ij}$ be a random variable of the $j$th ($j = 1, \ldots, m$) observation in the $i$th ($i = 1, \ldots, n$) cluster, where $Y_{ij}$ is assumed to be normally distributed with mean $E(Y_{ij}) = \mu$ and common
variance $V(Y_{ij}) = \sigma^2$. We assume a pairwise common intracluster correlation coefficient, $\rho = \mathrm{corr}(Y_{ij}, Y_{ij'})$ for $j \neq j'$. Let $y_i = \sum_{j=1}^{m} Y_{ij}$ denote the sum of responses in the $i$th cluster, and $\bar{y}_i = y_i/m$ be the mean response computed over $m$ observations in the $i$th cluster. The total number of observations is $nm$. The mean of $Y_{ij}$ computed over all observations is written as
\[ \bar{y} = \frac{\sum_{i=1}^{n} \sum_{j=1}^{m} Y_{ij}}{nm}, \]
where $\bar{y}$ estimates the population mean $\mu$. The degree of dependence within clusters is measured by the intracluster correlation coefficient ($\rho$), which can be estimated by the analysis of variance (ANOVA) estimate [8] as
\[ \hat{\rho} = \frac{MSC - MSW}{MSC + (m - 1)MSW}, \]
where
\[ MSC = m \sum_{i=1}^{n} \frac{(\bar{y}_i - \bar{y})^2}{n - 1}, \qquad MSW = \sum_{i=1}^{n} \sum_{j=1}^{m} \frac{(y_{ij} - \bar{y}_i)^2}{n(m - 1)}. \]
The overall mean $\bar{y}$ has a normal distribution with mean $\mu$ and variance $V$, where
\[ V = \frac{\sum_{i=1}^{n} m\{1 + (m - 1)\hat{\rho}\}\sigma^2}{(nm)^2} = \frac{\{1 + (m - 1)\hat{\rho}\}\sigma^2}{nm}. \]
We test the null hypothesis $H_0: \mu = \mu_0$ versus the alternative hypothesis $H_1: \mu = \mu_1$ for $\mu_0 \neq \mu_1$. The test statistic $Z = (\bar{y} - \mu_0)/\sqrt{V}$ is asymptotically normal with mean 0 and variance 1. We reject $H_0: \mu = \mu_0$ if the absolute value of $Z$ is larger than $z_{1-\alpha/2}$, the $100(1 - \alpha/2)$th percentile of the standard normal distribution. We are interested in estimating the sample size $n$ with a power of $1 - \beta$ for the projected alternative hypothesis $H_1: \mu = \mu_1$. The sample size ($n$) needed to achieve a power of $1 - \beta$ can be obtained by solving the following equation:
\[ \frac{|\mu_1 - \mu_0|}{\sqrt{V}} = z_{1-\alpha/2} + z_{1-\beta}. \]
The required number of clusters is
\[ n = \frac{(z_{1-\alpha/2} + z_{1-\beta})^2}{(\mu_1 - \mu_0)^2} \cdot \frac{\{1 + (m - 1)\hat{\rho}\}}{m} \sigma^2. \quad (2.1) \]
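Equation (2.1) requires an advance estimate of $\rho$. As a rough sketch of how the ANOVA estimate defined earlier in this subsection could be computed from pilot data with equal cluster sizes, consider the following; the function name and the nested-list data layout are assumptions made for illustration.

```python
def anova_icc(clusters):
    """ANOVA estimate of the intracluster correlation for equal cluster sizes.

    `clusters` is a list of n lists, each holding the m observations
    of one cluster (a hypothetical data layout).
    """
    n = len(clusters)
    m = len(clusters[0])
    if n < 2 or m < 2 or any(len(c) != m for c in clusters):
        raise ValueError("need at least 2 clusters of equal size m >= 2")

    cluster_means = [sum(c) / m for c in clusters]
    grand_mean = sum(cluster_means) / n  # equals the overall mean when sizes are equal

    # Between-cluster mean square (MSC) and within-cluster mean square (MSW).
    msc = m * sum((cm - grand_mean) ** 2 for cm in cluster_means) / (n - 1)
    msw = sum((y - cm) ** 2
              for c, cm in zip(clusters, cluster_means)
              for y in c) / (n * (m - 1))
    return (msc - msw) / (msc + (m - 1) * msw)


# Identical responses within every cluster give MSW = 0, hence rho-hat = 1.
print(anova_icc([[1, 1, 1], [2, 2, 2], [4, 4, 4]]))  # 1.0
```

The estimate can be negative when the cluster means vary less than expected by chance; the sketch returns such values unchanged rather than truncating them at zero.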
The total number of observations is
\[ n \cdot m = \frac{(z_{1-\alpha/2} + z_{1-\beta})^2 \{1 + (m - 1)\hat{\rho}\}\sigma^2}{(\mu_1 - \mu_0)^2}. \]
When the cluster size is 1 ($m = 1$), the required number of observations is
\[ n_1 = \frac{(z_{1-\alpha/2} + z_{1-\beta})^2 \sigma^2}{(\mu_1 - \mu_0)^2}. \]
When the cluster size is $m$ ($m > 1$), the variance is inflated by a factor of $\{1 + (m - 1)\hat{\rho}\}$ compared with the variance under $m = 1$. The factor $\{1 + (m - 1)\hat{\rho}\}$ is called the variance inflation factor or design effect. That is, the total number of observations can be computed by multiplying $n_1$ by the design effect $\{1 + (m - 1)\hat{\rho}\}$.

2.2.2 Unequal Cluster Size

Cluster sizes are often unequal in cluster randomized studies. When the cluster sizes are not constant, one approach is to replace the cluster size ($m$) by an advance estimate of the average cluster size, which is referred to as the average cluster size method [9, 10]. The average cluster size method is likely to underestimate the actual required sample size [11]. Another approach is to replace the cluster size ($m$) by the largest expected cluster size in the sample, which is called the maximum cluster size method [10]. Here, we provide the sample size estimate under variable cluster size.

Let $n$ be the number of clusters in a clinical trial, and $m_i$ be the cluster size in the $i$th cluster ($i = 1, \ldots, n$). The number of observations in the $i$th cluster, $m_i$, may vary at random with a certain distribution. Here, we estimate the sample size using the information on varying cluster sizes. We assume that the cluster sizes ($m_i$, $i = 1, \ldots, n$) are independent and identically distributed, and the cluster sizes ($m_i$'s) are small compared to $n$ so that asymptotic theories can be applied to $n$ for sample size estimation. Let $Y_{ij}$ be a random variable of the $j$th observation ($j = 1, \ldots, m_i$) in the $i$th cluster, where $Y_{ij}$ is assumed to be normally distributed with mean $\mu$ and variance $\sigma^2$.
We assume a pairwise common intracluster correlation coefficient, $\rho = \mathrm{corr}(Y_{ij}, Y_{ij'})$ for $j \neq j'$. The correlation is assumed not to vary with the number of observations per cluster. Let $y_i = \sum_{j=1}^{m_i} Y_{ij}$ denote the sum of responses in the $i$th cluster, and $\bar{y}_i = \sum_{j=1}^{m_i} Y_{ij}/m_i$ be the mean response computed over $m_i$ responses in the $i$th cluster. Then, the mean of $Y_{ij}$ computed over all clusters is written as
\[ \bar{y} = \frac{\sum_{i=1}^{n} m_i \bar{y}_i}{\sum_{i=1}^{n} m_i}, \]
where $\bar{y}$ estimates the population mean $\mu$. The mean cluster size is $\bar{m} = \sum_{i=1}^{n} m_i/n$.
The degree of dependence within clusters is measured by the intracluster correlation coefficient ($\rho$), which can be estimated by the analysis of variance (ANOVA) estimate [8]. It can be shown that, conditional on the empirical distribution of the $m_i$'s, the overall mean ($\bar{y}$) has a normal distribution with mean $\mu$ and variance $V$, where
\[ V = \frac{\sum_{i=1}^{n} m_i\{1 + (m_i - 1)\hat{\rho}\}\sigma^2}{\left( \sum_{i=1}^{n} m_i \right)^2}. \]
Based on the asymptotic result, we can reject $H_0: \mu = \mu_0$ if the absolute value of the test statistic $Z = (\bar{y} - \mu_0)/\sqrt{V}$ is larger than $z_{1-\alpha/2}$, the $100(1 - \alpha/2)$th percentile of the standard normal distribution. We are interested in estimating the sample size $n$ with a power of $1 - \beta$ for the projected alternative hypothesis $H_1: \mu = \mu_1$. Since the $m_i$'s are independent and identically distributed random variables, by the law of large numbers, as $n \to \infty$,
\[ nV \to \frac{E[m\{1 + (m - 1)\hat{\rho}\}]\sigma^2}{E(m)^2}, \]
where $m$ is the random variable associated with the cluster size and $E(\cdot)$ is the expectation with respect to the distribution of the cluster size. The sample size needed to achieve a power of $1 - \beta$ can be obtained by solving the following equation:
\[ \frac{|\mu_1 - \mu_0|}{\sqrt{V}} = z_{1-\alpha/2} + z_{1-\beta}. \]
This leads to
\[ n = \frac{(z_{1-\alpha/2} + z_{1-\beta})^2 \sigma^2}{(\mu_1 - \mu_0)^2} \cdot \frac{E[m\{1 + (m - 1)\hat{\rho}\}]}{E(m)^2}. \]
Let $E(m) = \theta$, $V(m) = \tau^2$, and $\gamma = \tau/\theta$, where $\gamma$ is the coefficient of variation of the cluster size. Then, we can write
\[ n = \frac{(z_{1-\alpha/2} + z_{1-\beta})^2 \sigma^2}{(\mu_1 - \mu_0)^2} \left\{ (1 - \hat{\rho})\frac{1}{\theta} + \hat{\rho} + \hat{\rho}\gamma^2 \right\}. \quad (2.2) \]
The sample size formula (2.2) provides the sample size estimate by accounting for variability in cluster size. When cluster sizes are equal across all clusters, $\gamma = 0$ and the sample size formula (2.2) reduces to the sample size formula (2.1).

Let $(w_1, \ldots, w_n)$ be a set of weights assigned to clusters with $w_i \ge 0$ and $\sum_{i=1}^{n} w_i = 1$. The overall mean can be expressed as $\bar{y} = \sum_{i=1}^{n} w_i \bar{y}_i$. The overall mean ($\bar{y}$) is an unbiased estimate of $\mu$.
The above sample size estimate is based on equal weights to observations by letting $w_i = m_i/\sum_{i=1}^{n} m_i$. Sample size can also be estimated by an estimator that assigns equal weights ($w_i = 1/n$) to each cluster or an estimator that minimizes the variance of an overall mean ($\bar{y}$). These weighting schemes will be described in detail for clustered binary outcomes.
2.2.2.1 Example

Reports have established the effectiveness of minimally invasive periodontal surgery (MIPS) in treating osseous defects [12, 13]. Since these papers were published, new devices (including a videoscope and ultrasonic tips) have been incorporated to enhance the effectiveness of the procedure. Haffajee et al. [14] computed the intracluster correlation coefficients of periodontal measurements for five groups of treated periodontal disease subjects and one group of untreated subjects with periodontal disease. The median intracluster correlation coefficient ($\rho$) is 0.067 for clinical attachment level change. Harrel et al. [12] showed clinical attachment loss (CAL) gains of 4.05 mm following application of MIPS in 16 subjects presenting multiple sites with deep pockets associated with different morphologies, including furcation involvements.

An investigator is proposing a prospective cohort study to evaluate the effectiveness of MIPS using these new devices. He expects CAL gains of 3.0 mm with a standard deviation of 3.5 mm over the 1–year study period. The investigator will evaluate three sites in each subject and would like to estimate the sample size needed to detect a mean difference of 1.05 mm in CAL gains over the 1–year study period with 80% power at a two–sided 5% significance level.

We estimate the sample size ($n$) to test the null hypothesis $H_0: \mu = 4.05$ versus the alternative hypothesis $H_1: \mu = 3.0$ with a two–sided 5% significance level and 80% power, assuming three sites per subject ($m = 3$) and $\rho = 0.067$. From Equation (2.1) with the fixed number of sites per subject ($m = 3$), the required sample size for testing $H_0: \mu = 4.05$ versus $H_1: \mu = 3.0$ is
\[ n = \frac{(1.96 + 0.842)^2}{(4.05 - 3.0)^2} \cdot \frac{\{1 + (3 - 1)0.067\}}{3} \cdot 3.5^2 = 33. \]
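The arithmetic in this example can be verified with a small sketch of Equation (2.2), which reduces to Equation (2.1) when the coefficient of variation of the cluster size is zero. The function name and argument names are illustrative; the numerical inputs are those of the periodontal example.

```python
import math
from statistics import NormalDist


def clusters_needed(mu0, mu1, sigma, rho, mean_size, cv_size=0.0,
                    alpha=0.05, power=0.80):
    """Unrounded number of clusters from Equation (2.2).

    mean_size is theta = E(m) and cv_size is gamma = tau / theta.
    With cv_size = 0 the expression equals Equation (2.1).
    """
    z = NormalDist().inv_cdf
    base = (z(1 - alpha / 2) + z(power)) ** 2 * sigma ** 2 / (mu1 - mu0) ** 2
    return base * ((1 - rho) / mean_size + rho + rho * cv_size ** 2)


# Fixed cluster size: three sites per subject, rho = 0.067.
fixed = clusters_needed(4.05, 3.0, sigma=3.5, rho=0.067, mean_size=3)
print(math.ceil(fixed))  # 33

# Variable cluster size: mean 3 and standard deviation 2, so gamma = 2/3.
variable = clusters_needed(4.05, 3.0, sigma=3.5, rho=0.067,
                           mean_size=3, cv_size=2 / 3)
print(math.ceil(variable))  # 36
```

Using library quantiles instead of the rounded values 1.96 and 0.842 changes the unrounded results only in the second decimal place, so the rounded-up cluster counts are unaffected.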
Suppose that the number of sites examined per subject varies among subjects with a mean of 3 and a standard deviation of 2. Then, from Equation (2.2) with a variable number of sites per subject ($\theta = 3$ and $\gamma = 2/3$), the required sample size is
\[ n = \frac{(1.96 + 0.842)^2 \cdot 3.5^2}{(4.05 - 3.0)^2} \left\{ (1 - 0.067)/3 + 0.067 + 0.067(2/3)^2 \right\} = 36. \]

2.3 One–Sample Clustered Binary Outcomes

Clustered binary outcomes occur frequently in medical and behavioral studies. Examples include the presence of cavities in one or more teeth, the presence of arthritic pain in one or more joints, the presence of infection in one or two eyes, and the occurrence of lymph node metastases in cancer patients.
  • 50. Sample Size Determination for Clustered Outcomes 29 2.3.1 Equal Cluster Size We assume that cluster sizes are equal across clusters. Let n be the total number of clusters in an experiment and m be the number of observations in each cluster. Let Yij be the binary random variable of the jth (j = 1, . . . , m) observation in the ith (i = 1, . . . , n) cluster, which is coded as 1 for response and 0 for non–response. We assume that observations within a cluster are exchangeable in the sense that, given m, Yi1, . . . , Yim have a common marginal response probability P(Yij = 1) = p(0 p 1) and a common pairwise intracluster correlation coefficient ρ = corr(Yij, Yij0 ) for j 6= j0 . Let yi = Pm j=1 Yij denote the total number of responses in the ith cluster. Under the exchangeability assumption, we have E(yi) = mp and var(yi) = mp(1 − p){1 + (m − 1)ρ}. The proportion of responses in the ith cluster is estimated by p̂i = yi/m with E(p̂i) = p. An unbiased estimate of p is p̂ = Pn i=1 p̂i/n. For large n, √ n(p̂ − p) is approximately normal with mean 0 and variance σ̂2 = p̂(1 − p̂) {1 + (m − 1)ρ̂} m , where ρ̂ can be obtained by ANOVA method. The ANOVA method suitable for continuous variables can be used to estimate the intracluster correlation coefficient for binary outcomes. Ridout et al. [15] conducted simulation studies to investigate the performance of various estimators of intracluster correlation coefficient for clustered binary data under the common intracluster correla- tion, ρ = corr(Yij, Yij0 ) for j 6= j0 . Their simulation studies showed that the ANOVA estimator performed well for clustered binary data. The ANOVA estimator of intracluster correlation coefficient can be written as ρ̂ = MSC − MSW MSC + (m − 1)MSW , where MSC = P m(p̂i − p̂)2 /(n − 1), and MSW = P yi(1 − p̂i)/{n(m − 1)}. Suppose that we wish to test the null hypothesis H0 : p = p0 versus H1 : p = p1 for p0 6= p1 at a two–sided significance level of α. 
Under the null hypothesis, the test statistic

$$Z = \frac{\sqrt{n}(\hat{p} - p_0)}{\hat{\sigma}}$$

is asymptotically normal with mean 0 and variance 1. We reject $H_0: p = p_0$ if the absolute value of the test statistic $Z$ is larger than $z_{1-\alpha/2}$, the $100(1 - \alpha/2)$th percentile of the standard normal distribution. We are interested in calculating the sample size $n$ against the alternative hypothesis $H_1: p = p_1$ with a two-sided significance level of $\alpha$ and power of $1 - \beta$. The required sample size can be obtained by solving $\sqrt{n}\,|p_0 - p_1|/\hat{\sigma} = z_{1-\alpha/2} + z_{1-\beta}$. The required number of clusters is

$$n = \frac{\hat{\sigma}^2 (z_{1-\alpha/2} + z_{1-\beta})^2}{(p_0 - p_1)^2} = \frac{p_1(1 - p_1)(z_{1-\alpha/2} + z_{1-\beta})^2}{(p_0 - p_1)^2} \cdot \frac{1 + (m - 1)\hat{\rho}}{m}.$$

When the cluster size is 1 ($m = 1$), the required sample size becomes

$$n_1 = \frac{p_1(1 - p_1)(z_{1-\alpha/2} + z_{1-\beta})^2}{(p_0 - p_1)^2}.$$

When the cluster size is $m$ ($m > 1$), the total number of observations ($nm$) is $\{1 + (m - 1)\hat{\rho}\}$ times the required number of observations under $m = 1$. The factor $\{1 + (m - 1)\hat{\rho}\}$ is called the variance inflation factor or design effect.

2.3.2 Unequal Cluster Size

Let $n$ be the total number of clusters in an experiment and $m_i$ be the number of observations in the $i$th ($i = 1, \ldots, n$) cluster. The number of observations per cluster may vary at random with a certain distribution. Let $Y_{ij}$ be the binary random variable of the $j$th ($j = 1, \ldots, m_i$) observation in the $i$th cluster, which is coded as 1 for response and 0 for non-response. We assume that observations within a cluster are exchangeable with $P(Y_{ij} = 1) = p$ ($0 < p < 1$) and $\mathrm{corr}(Y_{ij}, Y_{ij'}) = \rho$ for $j \neq j'$, as in the equal cluster size case. The intracluster correlation is assumed not to vary with the number of observations per cluster. Let $y_i = \sum_{j=1}^{m_i} Y_{ij}$ denote the total number of responses in the $i$th cluster. The proportion of responses in the $i$th cluster is estimated by $\hat{p}_i = y_i/m_i$ with $E(\hat{p}_i) = p$. Under the exchangeability assumption, we have $E(y_i) = m_i p$ and $\mathrm{var}(y_i) = m_i p(1 - p)\{1 + (m_i - 1)\rho\}$.

Let $(w_1, \ldots, w_n)$ be a set of weights assigned to clusters with $w_i \geq 0$ and $\sum_{i=1}^{n} w_i = 1$. An unbiased estimate of $p$ is $\hat{p} = \sum_{i=1}^{n} w_i \hat{p}_i$. Three weighting schemes have been proposed for parametric and nonparametric sample size estimation for one-sample clustered binary data [16, 17]: equal weights to observations, equal weights to clusters, and minimum variance weights that minimize the variance of the weighted estimator.
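As a small illustration of the weighted estimator $\hat{p} = \sum_i w_i \hat{p}_i$, the Python sketch below evaluates it under equal weights to observations and equal weights to clusters. The three clusters are hypothetical, and the function names are ours rather than anything from the cited papers. The variance helper is a derived step: it assumes independent clusters and uses the expression for $\mathrm{var}(y_i)$ given above.

```python
def weighted_proportion(y, m, weights):
    """Weighted estimate p_hat = sum_i w_i * (y_i / m_i); weights must sum to 1."""
    assert abs(sum(weights) - 1.0) < 1e-12
    return sum(w * yi / mi for w, yi, mi in zip(weights, y, m))


def observation_weights(m):
    """Equal weights to observations: w_i = m_i / sum(m)."""
    total = sum(m)
    return [mi / total for mi in m]


def cluster_weights(m):
    """Equal weights to clusters: w_i = 1 / n."""
    return [1.0 / len(m)] * len(m)


def estimator_variance(m, weights, p, rho):
    """Var(p_hat) = sum_i w_i^2 * p(1-p){1 + (m_i-1)rho} / m_i.

    Derived from var(y_i) = m_i p(1-p){1 + (m_i-1)rho},
    under the added assumption that clusters are independent.
    """
    return sum(w * w * p * (1 - p) * (1 + (mi - 1) * rho) / mi
               for w, mi in zip(weights, m))


# Hypothetical data: y_i responses out of m_i observations in three clusters.
y = [1, 3, 6]
m = [2, 4, 6]

p_u = weighted_proportion(y, m, observation_weights(m))  # same as sum(y) / sum(m)
p_c = weighted_proportion(y, m, cluster_weights(m))      # mean of cluster proportions
print(p_u, p_c)
```

For these hypothetical clusters, weighting by observations gives the smaller variance when $\rho = 0$, and weighting by clusters gives the smaller variance when $\rho = 1$.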
Cochran [18] and Donner and Klar [11] used the estimator $\hat{p}_u = \sum_i y_i / \sum_i m_i$, which assigns equal weights to observations with $w_i = m_i / \sum_{i'=1}^{n} m_{i'}$. Lee [19] and Lee and Dubin [20] used the estimator $\hat{p}_c = \sum_i \hat{p}_i / n$, which assigns equal weights to clusters with $w_i = 1/n$. Ahn [21] showed in a simulation study that the method of assigning equal weights to clusters is preferred to the method of assigning equal weights to observations when the intracluster correlation is 0.6 or greater. Jung et al. [16] also showed that the sample size under equal weights to observations ($n_u$) is usually smaller than that under equal weights to clusters ($n_c$) for small $\rho$, while $n_c$ is generally smaller than $n_u$ for large $\rho$. If observations within a cluster are highly dependent, then making another observation from the same cluster will not add much information. In this case, the method assigning equal weights to clusters is preferred to the method assigning equal weights to observations. If all clusters have an equal number of observations, then these two weighting methods are identical.

Jung and Ahn [22] proposed a minimum variance estimator, $\hat{p}_m$, that minimizes the variance of $\hat{p} = \sum_{i=1}^{n} w_i \hat{p}_i$. The variance of the estimator ($\hat{p}_m$) is minimized with weights

$$w_i = \frac{m_i\{1 + (m_i - 1)\hat{\rho}\}^{-1}}{\sum_{i=1}^{n} m_i\{1 + (m_i - 1)\hat{\rho}\}^{-1}},$$

where $\hat{\rho}$ can be obtained by the ANOVA method. The ANOVA estimator of the intracluster correlation coefficient can be written as

$$\hat{\rho} = \frac{MSC - MSW}{MSC + (m_A - 1)\,MSW},$$

where $MSC = \sum_{i=1}^{n} m_i(\hat{p}_i - \hat{p})^2/(n - 1)$, $MSW = \sum_{i=1}^{n} y_i(1 - \hat{p}_i)/(M - n)$, $m_A = (M - \sum_{i=1}^{n} m_i^2/M)/(n - 1)$, and $M = \sum_{i=1}^{n} m_i$. Note that $\hat{p}_m = \hat{p}_u$ if $\rho = 0$ and $\hat{p}_m = \hat{p}_c$ if $\rho = 1$. If cluster sizes are equal across all clusters ($m_i = m$), then $\hat{p}_m = \hat{p}_u = \hat{p}_c$.

We would like to test the null hypothesis $H_0: p = p_0$ versus the alternative hypothesis $H_1: p = p_1$ for $p_0 \neq p_1$. The test statistic

$$Z_w = \frac{\sqrt{n}(\hat{p}_w - p_0)}{\hat{\sigma}_w}$$

is asymptotically normal with mean 0 and variance 1, where $w = u$, $c$, and $m$. Hence, we reject $H_0$ if the absolute value of $Z_w$ is larger than $z_{1-\alpha/2}$, which is the $100(1 - \alpha/2)$th percentile of the standard normal distribution. Jung et al. [16] provided the sample size formulas needed to test the null hypothesis $H_0: p = p_0$ versus the alternative hypothesis $H_1: p = p_1$ with a power of $1 - \beta$ using the three weighting schemes of equal weights to observations, equal weights to clusters, and minimum variance weights.

2.3.2.1 Equal Weights to Observations

Under equal weights to observations with $w_i = m_i / \sum_{i=1}^{n} m_i$, the variance of $\sqrt{n}(\hat{p}_u - p_0)$ is

$$\hat{\sigma}_u^2 = V\{\sqrt{n}(\hat{p}_u - p_0)\} = \hat{p}_u(1 - \hat{p}_u)\,\frac{n \sum_i m_i\{1 + (m_i - 1)\hat{\rho}\}}{(\sum_i m_i)^2}.$$

The test statistic

$$Z_u = \frac{\sqrt{n}(\hat{p}_u - p_0)}{\hat{\sigma}_u}$$

has approximately a standard normal distribution for large $n$.
Under the alternative hypothesis ($H_1: p = p_1$), $\hat{\sigma}_u^2$ converges to $\sigma_u^2$, where

$$\sigma_u^2 = p_1(1 - p_1)\,\frac{E(m) + [E(m^2) - E(m)]\hat{\rho}}{[E(m)]^2},$$

and $E(m)$ and $E(m^2)$ are computed using the probability distribution of cluster sizes. The required sample size to test $H_0: p = p_0$ versus $H_1: p = p_1$ at a two-sided significance level of $\alpha$ and a power of $1 - \beta$ is

$$n_u = \frac{p_1(1 - p_1)(z_{1-\alpha/2} + z_{1-\beta})^2}{(p_0 - p_1)^2} \cdot \frac{E(m) + [E(m^2) - E(m)]\hat{\rho}}{[E(m)]^2}.$$

2.3.2.2 Equal Weights to Clusters

Under equal weights to clusters with $w_i = 1/n$, the variance of $\sqrt{n}(\hat{p}_c - p_0)$ is

$$\hat{\sigma}_c^2 = V\{\sqrt{n}(\hat{p}_c - p_0)\} = \hat{p}_c(1 - \hat{p}_c)\,\frac{1}{n} \sum_i \frac{1 + (m_i - 1)\hat{\rho}}{m_i}.$$

The test statistic

$$Z_c = \frac{\sqrt{n}(\hat{p}_c - p_0)}{\hat{\sigma}_c}$$

is asymptotically normal with mean 0 and variance 1. Under the alternative hypothesis ($H_1: p = p_1$), $\hat{\sigma}_c^2$ converges to $\sigma_c^2$, where

$$\sigma_c^2 = p_1(1 - p_1)\left\{E(1/m) + \{1 - E(1/m)\}\hat{\rho}\right\},$$

and $E(1/m)$ is computed using the probability distribution of cluster sizes. The required sample size with the power of $1 - \beta$ for the alternative hypothesis $H_1: p = p_1$ is

$$n_c = \frac{p_1(1 - p_1)(z_{1-\alpha/2} + z_{1-\beta})^2}{(p_0 - p_1)^2}\left\{E(1/m) + \{1 - E(1/m)\}\hat{\rho}\right\}.$$

2.3.2.3 Minimum Variance Weights

The variance of the estimator $\hat{p} = \sum_{i=1}^{n} w_i \hat{p}_i$ is minimized when the weight, $w_i$, is inversely proportional to the variance of $\hat{p}_i$, $V(\hat{p}_i) = V(y_i)/m_i^2$ [22]. The weight that minimizes the variance of the estimator is

$$w_i = \frac{m_i\{1 + (m_i - 1)\hat{\rho}\}^{-1}}{\sum_{i=1}^{n} m_i\{1 + (m_i - 1)\hat{\rho}\}^{-1}},$$

where $\hat{\rho}$ can be obtained by the ANOVA method. The variance of $\sqrt{n}(\hat{p}_m - p_0)$ is consistently estimated by

$$\hat{\sigma}_m^2 = \frac{\hat{p}_m(1 - \hat{p}_m)}{n^{-1} \sum_i m_i\{1 + (m_i - 1)\hat{\rho}\}^{-1}}.$$

The test statistic

$$Z_m = \frac{\sqrt{n}(\hat{p}_m - p_0)}{\hat{\sigma}_m}$$

has approximately a standard normal distribution for large $n$. Under the alternative hypothesis ($H_1: p = p_1$), $\hat{\sigma}_m^2$ converges to $\sigma_m^2$, where

$$\sigma_m^2 = \frac{p_1(1 - p_1)}{E[m\{1 + (m - 1)\hat{\rho}\}^{-1}]}.$$

The required sample size against the alternative hypothesis $H_1: p = p_1$ for a two-sided significance level of $\alpha$ and power of $1 - \beta$ is

$$n_m = \frac{p_1(1 - p_1)(z_{1-\alpha/2} + z_{1-\beta})^2}{(p_0 - p_1)^2} \cdot \frac{1}{E[m\{1 + (m - 1)\hat{\rho}\}^{-1}]}.$$

The sample size ($n_m$) under the minimum variance estimator is always smaller than or equal to $n_u$ and $n_c$.

2.3.2.4 Example

We use the data of Hujoel et al. [23] as pilot data to illustrate sample size calculation for clustered binary outcomes. An enzymatic diagnostic test was used to determine whether a site was infected by two specific organisms, Treponema denticola and Bacteroides gingivalis. Each subject had a different number of infected sites, as determined by the gold standard (an antibody assay against the two organisms). In a sample of 29 subjects, the number of true positive test results ($y_i$) and the number of infected sites ($m_i$) are given in Table 2.1. In the example of an enzymatic diagnostic test in Table 2.1, the ANOVA estimate ($\hat{\rho}$) of the intracluster correlation coefficient is 0.20.

Suppose that we would like to estimate the sample size based on the hypothesis $H_0: p = p_0 = 0.6$ versus $H_1: p = p_1 = 0.7$ using a two-sided significance level of 5% and a power of 80%. Table 2.2 shows the distribution of the number of infected sites ($m_i$). Using the observed relative frequency from Table 2.2, $E(m) = 4.897$, $E(1/m) = 0.224$, $E(m^2) = 25.379$, and $E[m\{1 + (m - 1)\hat{\rho}\}^{-1}] = 2.704$. Therefore, the required sample sizes are $n_u = 62$, $n_c = 63$, and $n_m = 61$.

TABLE 2.1
Proportion of infection ($y_i/m_i$) from $n = 29$ subjects (clusters)

3/6, 2/6, 2/4, 5/6, 4/5, 5/5, 4/6, 3/4, 2/4, 3/4,
5/5, 4/4, 6/6, 3/3, 5/6, 1/2, 4/6, 0/4, 5/6, 4/5,
4/6, 0/6, 4/5, 3/5, 0/2, 2/6, 2/4, 5/5, 4/6.
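The ANOVA estimate quoted in the example can be reproduced from Table 2.1 with a few lines of Python. This is a minimal sketch rather than the authors' code: the function name is ours, and we assume that $\hat{p}$ in $MSC$ is the pooled proportion $\sum y_i / \sum m_i$, which the text does not state explicitly.

```python
# (y_i, m_i) pairs transcribed from Table 2.1: true positives / infected sites.
TABLE_2_1 = [
    (3, 6), (2, 6), (2, 4), (5, 6), (4, 5), (5, 5), (4, 6), (3, 4), (2, 4), (3, 4),
    (5, 5), (4, 4), (6, 6), (3, 3), (5, 6), (1, 2), (4, 6), (0, 4), (5, 6), (4, 5),
    (4, 6), (0, 6), (4, 5), (3, 5), (0, 2), (2, 6), (2, 4), (5, 5), (4, 6),
]


def anova_icc(clusters):
    """ANOVA estimator of the intracluster correlation for unequal cluster sizes."""
    n = len(clusters)
    total_m = sum(m for _, m in clusters)                # M = sum of m_i
    p_hat = sum(y for y, _ in clusters) / total_m        # assumption: pooled proportion
    msc = sum(m * (y / m - p_hat) ** 2 for y, m in clusters) / (n - 1)
    msw = sum(y * (1 - y / m) for y, m in clusters) / (total_m - n)
    m_a = (total_m - sum(m * m for _, m in clusters) / total_m) / (n - 1)
    return (msc - msw) / (msc + (m_a - 1) * msw)


rho_hat = anova_icc(TABLE_2_1)
print(round(rho_hat, 2))  # about 0.20, in line with the value quoted in the text
```

When all clusters have the same size $m$, $m_A$ reduces to $m$ and $M - n$ to $n(m - 1)$, so the same function also covers the equal cluster size estimator of Section 2.3.1.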

TABLE 2.2
Distribution of the number of infected sites ($m_i$)

m                           2      3      4      5      6
Relative frequency, f(m)    2/29   1/29   7/29   7/29   12/29
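The example's arithmetic can be checked with the sketch below, which evaluates the three sample size formulas using the relative frequencies of Table 2.2 and $\hat{\rho} = 0.20$. The helper names are ours. Normal quantiles come from Python's `statistics.NormalDist` rather than the rounded values 1.96 and 0.842, which does not change the rounded-up results here.

```python
import math
from statistics import NormalDist

# Relative frequencies f(m) of the number of infected sites, from Table 2.2.
CLUSTER_SIZE_DIST = {2: 2 / 29, 3: 1 / 29, 4: 7 / 29, 5: 7 / 29, 6: 12 / 29}


def expect(g, dist):
    """E[g(m)] under a discrete cluster-size distribution {m: probability}."""
    return sum(prob * g(m) for m, prob in dist.items())


def raw_sample_sizes(p0, p1, rho, dist, alpha=0.05, power=0.80):
    """Return unrounded (n_u, n_c, n_m) for the three weighting schemes."""
    z = NormalDist().inv_cdf(1 - alpha / 2) + NormalDist().inv_cdf(power)
    base = p1 * (1 - p1) * z ** 2 / (p0 - p1) ** 2  # size needed when m = 1
    e_m = expect(lambda m: m, dist)
    e_m2 = expect(lambda m: m * m, dist)
    e_inv = expect(lambda m: 1 / m, dist)
    e_eff = expect(lambda m: m / (1 + (m - 1) * rho), dist)
    n_u = base * (e_m + (e_m2 - e_m) * rho) / e_m ** 2  # equal weights to observations
    n_c = base * (e_inv + (1 - e_inv) * rho)            # equal weights to clusters
    n_m = base / e_eff                                  # minimum variance weights
    return n_u, n_c, n_m


raw = raw_sample_sizes(p0=0.6, p1=0.7, rho=0.20, dist=CLUSTER_SIZE_DIST)
n_u, n_c, n_m = (math.ceil(v) for v in raw)
print(n_u, n_c, n_m)  # 62 63 61
```

As a sanity check on the formulas, passing a degenerate distribution such as `{4: 1.0}` makes all three values coincide with the equal cluster size result, the $m = 1$ sample size multiplied by $\{1 + (m - 1)\hat{\rho}\}/m$.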
Another Random Document on Scribd Without Any Related Topics
drawbridge are carved the Spanish arms and an inscription recording the completion of the fort in 1756, when Ferdinand VI. was King of Spain and Don Hereda Governor of Florida. It mounted one hundred of the small guns of those days, and the interior is a square parade ground, surrounded by large casemates. Upon each side of the casemate opposite the sally-port is a niche for holy water, and at the farther end the Chapel. Dungeons and subterranean passages abound, of which ghostly tales are told. This fort is the most interesting relic of the ancient city, a picturesque place, with charms even in its dilapidation. There are other quaint structures in this curious old town. A gray gateway about ten feet wide, flanked by tall square towers, marks the northern entrance to the city, the ditch from the fort passing in front of it. In one of the streets is the palace of the Spanish Governors, since changed into a post-office. The official centre of the city is a public square, the Plaza de la Constitucion, having a monument commemorating the Spanish Liberal Constitution of 1812, and also a Confederate Soldiers' Monument. This square fronts on the sea-wall, and alongside it and stretching westward is the Alameda, known as King Street, leading to the group of grand hotels recently constructed in Spanish and Moorish style, which have made modern St. Augustine so famous. These are the Ponce de Leon, the Alcazar and the Cordova, with the Casino, adjoined by spacious and beautiful gardens. These buildings reproduce all types of the Hispano-Moorish architecture, with many suggestions from the Alhambra. The Ponce de Leon, the largest, is three hundred and eighty by five hundred and twenty feet, enclosing an open court, and its towers rise above the red-tiled roofs to a height of one hundred and sixty-five feet, the adornments in colors being very effective.
To the southward of the town, adjoining the barracks, is the military cemetery, where a monument and three white pyramids tell the horrid story of the Dade massacre during the Seminole War. Major Dade, a gallant officer, and one hundred and seven men, were ambushed and massacred by eight hundred Indians in December, 1835, and their remains afterwards brought here and interred under
the pyramids. Opposite the barracks is what is claimed to be the oldest house in the United States, occupied by Franciscan monks from 1565 to 1580, and afterwards a dwelling. It has been restored, and contains a collection of historical relics. St. Augustine has had a chequered history. In 1586, Queen Elizabeth's naval hero, Sir Francis Drake, sailing all over the world to fight Spaniards, attacked and plundered the town and burnt the greater part of it. Then for nearly a century the Indians, pirates, French, English and neighboring Georgians and Carolinians made matters lively for the harried inhabitants. In 1763 the British came into possession, but they ceded it back to Spain twenty years later, the town then containing about three hundred householders and nine hundred negroes. It became American in 1821, and was an important military post during the subsequent Seminole War, which continued several years. It was early captured by the Union forces during the Civil War, and was a valuable stronghold for them. This curious old town has many traditions that tell of war and massacre and the horrible cruelties of the Spanish Inquisition, the remains of cages in which prisoners were starved to death being shown in the fort. Its best modern story, however, is told of the escape of Coa-coo-chee, the Seminole chief, whose adventurous spirit and savage nature gained him the name of the Wild Cat. The ending of the Seminole War was the signing of a treaty by the older chiefs agreeing to remove west of the Mississippi. Coa-coo-chee, with other younger chiefs, opposed this and renewed the conflict. He was ultimately captured and taken to Fort Marion. Feigning sickness, he was removed into a casemate giving him air, there being an aperture two feet high by nine inches wide in the wall about thirteen feet above the floor, and under it a platform five feet high.
Here, while still feigning illness, he became attenuated by voluntary abstinence from food, and finally one night squeezed himself through the aperture and dropped to the bottom of the moat, which was dry. Eluding all the guards, he escaped and rejoined his people. The flight caused a great sensation, and there was hot pursuit. After some time he was recaptured, and being taken before General
Worth, was used to compel the remnant of the tribe to remove to the West. Worth told him if his people were not at Tampa in twenty days he would be killed, and he was ordered to notify them by Indian runners. He hesitated, but afterwards yielded, and the runners were given twenty twigs, one to be broken each day, so they might know when the last one was broken his life would pay the penalty. In seventeen days the task was accomplished. The tribe came to Tampa, and the captive was released, accompanying his warriors to the far West. This ended most of the Indian troubles in Florida, but some descendants of the Seminoles still exist in the remote fastnesses of the everglades.

THE FLORIDA EAST COAST.

All along the Atlantic shore of Florida south of St. Augustine are popular winter resorts, their broad and attractive beaches, fine climate and prolific tropical vegetation being among the charms that bring visitors. Ormond is between the ocean front and the pleasant Halifax River, its picturesque tributary, the Tomoka, being a favorite resort for picnic parties. A few miles south on the Halifax River is Daytona, known as the Fountain City, and having its suburb, the City Beautiful, on the opposite bank. New Smyrna, settled by Minorcan indigo planters in the eighteenth century, is on the northern arm of Indian River. Here are found some of the ancient Indian shell mounds that are frequent in Florida, and also the orange groves that make this region famous. Inland about thirty miles are a group of pretty lakes, and in the pines at Lake Helen is located the Southern Cassadaga, or Spiritualists' Assembly. For more than a hundred and fifty miles the noted Indian River stretches down the coast of Florida. It is a long and narrow lagoon, parallel with the ocean, and is part of the series of lagoons found on the eastern coast almost continuously for more than three hundred miles from St.
Augustine south to Biscayne Bay, and varying in width from about fifty yards to six or more miles. They are shallow waters, rarely over twelve feet deep, and are entered by very shallow inlets
from the sea. The Indian River shores, stretching down to Jupiter Inlet, are lined with luxuriant vegetation, and the water is at times highly phosphorescent. Upon the western shore are most of the celebrated Indian River orange groves whose product is so highly prized. At Titusville, the head of navigation, where there are about a thousand people, the river is, at its widest part, about six miles across. Twenty miles below, at Rockledge, it narrows to about a mile in width, washing against the perpendicular sides of a continuous enclosing ledge of coquina rock, with pleasant overhanging trees. Here comes in, around an island, its eastern arm, the Banana River, and to the many orange groves are added plantations of the luscious pineapple. Various limpid streams flow out from the everglade region at the westward, and Fort Pierce is the trading station for that district, to which the remnant of the Seminoles come to exchange alligator hides, bird plumage and snake skins for various supplies, not forgetting fire-water. Below this is the wide estuary of St. Lucie River and the Jupiter River, with the lighthouse on the ocean's edge at Jupiter Inlet, the mouth of Indian River. Seventeen miles below this Inlet is Palm Beach, a noted resort, situated upon the narrow strip of land between the long and narrow lagoon of Lake Worth and the Atlantic Ocean. Here are the vast Hotel Royal Poinciana and the Palm Beach Inn, with their cocoanut groves, which also fringe for miles the pleasant shores of Lake Worth. Prolific vegetation and every charm that can add to this American Riviera bring a crowded winter population. The Poinciana is a tree bearing gorgeous flowers, and the two magnificent hotels, surrounded by an extensive tropical paradise, are connected by a wide avenue of palms a half-mile long, one house facing the lake and the other the ocean. There is not a horse in the settlement, and only one mule, whose duty is to haul a light summer car between the houses.
The vehicles of Palm Beach are said to be confined to bicycles, wheel-chairs and jinrickshas. Off to the westward the distant horizon is bounded by the mysterious region of the everglades. Far down the coast the railway terminates at Miami, the southernmost railway station in the United States, a little town on
Miami River, where it enters the broad expanse of Biscayne Bay, which is separated from the Atlantic by the first of the long chain of Florida keys. Here are many fruit and vegetable plantations, and the town, which is a railway terminal and steamship port for lines to Nassau, Key West and Havana, is growing. Nassau is but one hundred and seventy-five miles distant in the Bahamas, off the Southern Florida coast, and has become a favorite American winter tourist resort.

ASCENDING ST. JOHN'S RIVER.

The St. John's is the great river of Florida, rising in the region of lakes, swamps and savannahs in the lower peninsula, and flowing northward four hundred miles to Jacksonville, then turning eastward to the ocean. It comes through a low and level region, with mostly a sluggish current; is bordered by dense foliage, and in its northern portion is a series of lagoons varying in width from one to six miles. The river is navigable fully two hundred miles above Jacksonville. The earlier portion of the journey is monotonous, the shores being distant and the landings made at long piers jutting out over the shallows from the villages and plantations. At Mandarin is the orange grove which was formerly the winter home of Harriet Beecher Stowe; Magnolia amid the pines is a resort for consumptives; and nearby is Green Cove Springs, having a large sulphur spring of medicinal virtue. In all directions stretch the pine forests; and the river water, while clear and sparkling in the sunlight, is colored a dark amber from the swamps whence it comes. The original Indian name of this river was We-la-ka, or a chain of lakes, the literal meaning, in the figurative idea of the savage, being the water has its own way. It broadens into various bays, and at one of these, about seventy-five miles south of Jacksonville, is the chief town of the upper river, Palatka, having about thirty-five hundred inhabitants and a much greater winter population.
It is largely a Yankee town, shipping oranges and early vegetables to the North; and across the river, just above, is one of the leading orange plantations of Florida—
Colonel Hart's, a Vermonter who came here dying of consumption, but lived to become, in his time, the leading fruit-grower of the State. Above Palatka the river is narrower, excepting where it may broaden into a lake; the foliage is greener, the shores more swampy, the wild-fowl more frequent, and the cypress tree more general. The young cypress knees can be seen starting up along the swampy edge of the shore, looking like so many champagne bottles set to cool in the water. The river also becomes quite crooked, and here is an ancient Spanish and Indian settlement, well named Welaka, opposite which flows in the weird Ocklawaha River, the haunt of the alligator and renowned as the crookedest stream on the continent.

[Illustration: On the Ocklawaha]

GOING DOWN THE OCKLAWAHA.

The Ocklawaha, the dark, crooked water, comes from the south, by tortuous windings, through various lakes and swamps, and then turns east and southeast to flow into St. John's River, after a course of over three hundred miles. It rises in Lake Apopka, down the Peninsula, elevated about a hundred feet above the sea, the second largest of the Florida Lakes, and covering one hundred and fifty square miles. This lake has wooded highlands to the westward, dignified by the title of Apopka Mountains, which rise probably one
hundred and twenty feet above its surface. To the northward is a group of lakes (Griffin, Yale, Eustis, Dora, Harris and others) having clear amber waters and low shores, which are all united by the Ocklawaha, the stream finally flowing northward out of Lake Griffin. This is a region of extensive settlement, mainly by Northern people. The mouth of the Ocklawaha is sixty-five miles from Lake Eustis in a straight line, but the river goes two hundred and thirty miles to get there. To the northward of this lake district is the thriving town of Ocala, with five thousand people, in a region of good agriculture and having large phosphate beds, the settlement having been originally started as a military post during the Seminole War. About five miles east of Ocala is the famous Silver Spring, which is believed to have been the fountain of perpetual youth, for which Juan Ponce de Leon vainly searched. It is the largest and most beautiful of the many Florida springs, having wonderfully clear waters, and covers about three acres. The waters can be plainly seen pouring upwards through fissures in the rocky bottom, like an inverted Niagara, eighty feet beneath the surface. It has an enormous outflow, and a swift brook runs from it, a hundred feet wide, for some eight miles to the Ocklawaha. This strange stream is hardly a river in the ordinary sense, having fixed banks and a well-defined channel, but is rather a tortuous but navigable passage through a succession of lagoons and cypress swamps. Above the Silver Spring outlet, only the smallest boats of light draft can get through the crooked channel. This outlet is thirty miles in a direct line from the mouth of the river at the St. John's, but the Ocklawaha goes one hundred and nine miles thither. The swampy border of the stream is rarely more than a mile broad, and beyond it are the higher pine lands.
Through this curious channel, amid the thick cypress forests and dense jungle of undergrowth, the wayward and crooked river meanders. The swampy bottom in which it has its course is so low-lying as to be undrainable and cannot be improved, so that it will probably always remain as now, a refuge for the sub-tropical animals, birds, reptiles and insects of Florida, which abound in its inmost recesses. Here flourishes the alligator, coming
out to sun himself at mid-day on the logs and warm grassy lagoons at the edge of the stream, in just the kinds of places one would expect to find him. Yet the alligator is said to be a coward, rarely attacking, unless his retreat to water in which to hide himself is cut off. He thus becomes more a curiosity than a foe. These reptiles are hatched from eggs which the female deposits during the spring, in large numbers, in muddy places, where she digs out a spacious cavity, fills it with several hundred eggs, and covering them thickly with mud, leaves nature to do the rest. After a long incubation the little fellows come out and make a bee-line for the nearest water. The big alligators of the neighborhood have many breakfasts on the newly-born little ones, but some manage to grow up, after several years, to maturity, and exhibit themselves along this remarkable river. It is almost impossible to conceive of the concentrated crookedness of the Ocklawaha and the difficulties of passage. It is navigated by stout and narrow flat-bottomed boats of light draft, constructed so as to quickly turn sharp corners, bump the shores and run on logs without injury. The river turns constantly at short intervals and doubles upon itself in almost every mile, while the huge cypress trees often compress the water way so that a wider boat could not get through. There are many beautiful views in its course displaying the noble ranks of cypress trees rising as the stream bends along its bordering edge of swamps. Occasionally a comparatively straight river reach opens like the aisle of a grand building with the moss-hung cypress columns in long and sombre rows on either hand. At rare intervals fast land comes down to the stream bank, where there is some cultivation attempted for oranges and vegetables. Terrapin, turtles and water-fowl abound.
When the passenger boat, after bumping and swinging around the corners, much like a ponderous teetotum, halts for a moment at a landing in this swampy fastness, half-clad negroes usually appear, offering for sale partly-grown baby alligators, which are the prolific crop of the district. Various Turkey bends, Hell's half-acres, Log Jams, Bone Yards and Double S Bends are passed, and at one place is the Cypress Gate, where
three large trees are in the way, and by chopping off parts of their roots, a passage about twenty feet wide had been secured to let the boats through. There are said to be two thousand bends in one hundred miles of this stream, and many of them are like corrugated circles, by which the narrow water way, in a mile or two of its course, manages to twist back to within a few feet of where it started. At night, to aid the navigation, the lurid glare of huge pine-knot torches, fitfully blazing, gives the scene a weird and unnatural aspect. The monotonous sameness of cypress trunks, sombre moss and twisting stream for many hours finally becomes very tiresome, but it is nevertheless a most remarkable journey of the strangest character possible in this country to sail down the Ocklawaha.

LOWER FLORIDA AND THE SEMINOLES.

South of the mouth of the Ocklawaha the St. John's River broadens into Lake George, the largest of its many lakes, a pretty sheet of water six to nine miles wide and twelve miles long. Volusia, the site of an ancient Spanish mission, is at the head of this lake, and the discharge from the swift but narrow stream above has made sand bars, so that jetties are constructed to deepen the channel. For a long distance the upper river is narrow and tortuous, with numerous islands and swamps, the dark coffee-colored water disclosing its origin; but the Blue Spring in one place is unique, sending out an ample and rich blue current to mix with the amber. Then Lake Monroe is reached, ten miles long and five miles wide, the head of navigation, by the regular lines of steamers, one hundred and seventy miles above Jacksonville. Here are two flourishing towns, Enterprise on the northern shore and Sanford on the southern, both popular winter resorts, and the latter having two thousand people. The St. John's extends above Lake Monroe, a crooked, narrow, shallow stream, two hundred and fourteen miles farther southeastward to its source.
The region through which it there passes is mostly a prairie with herds of cattle and much game, and is only sparsely settled. The upper river approaches the seacoast,
being in one place but three miles from the lagoons bordering the Atlantic. To the southward of Lake Monroe are the winter resorts of Winter Park and Orlando, the latter a town of three thousand population. There are numerous lakes in this district, and then leaving the St. John's valley and crossing the watershed southward through the pine forests, the Okeechobee waters are reached, which flow down to that lake. This region was the home of a part of the Seminole Indians, and Tohopekaliga was their chief, whom they revered so highly that they named their largest lake in his honor. The Kissimmee River flows southward through this lake, and then traverses a succession of lakes and swamps to Lake Okeechobee, about two hundred miles southward by the water-line. Kissimmee City is on Lake Tohopekaliga, and extensive drainage operations have been conducted here and to the southward, reclaiming a large extent of valuable lands, and lowering the water-level in all these lakes and attendant swamps. From Lake Tohopekaliga through the tortuous water route to Lake Okeechobee, and thence by the Caloosahatchie westward to the Gulf of Mexico, is a winding channel of four hundred and sixty miles, though in a direct line the distance is but one hundred and fifty miles. Okeechobee, the word meaning the large water, covers about twelve hundred and fifty square miles, and almost all about it are the everglades or grass water, the shores being generally a swampy jungle. This district for many miles is a mass of waving sedge grass eight to ten feet high above the water, and inaccessible excepting through narrow, winding and generally hidden channels. In one locality a few tall lone pines stand like sentinels upon Arpeika Island, formerly the home of the bravest and most dreaded of the Seminoles, and still occupied by some of their descendants.
The name of the Seminole means the separatist or runaway Indians, they having centuries ago separated from the Creeks in Georgia and gone southward into Florida. From the days of De Soto to the time of their deportation in the nineteenth century the Spanish, British, French and Americans made war with these Seminole Indians. Gradually they were pressed southward through Florida. Their final
refuge was the green islands and hummocks of the everglades, and they then clung to their last homes with the tenacity of despair. The greater part of this region is an unexplored mystery; the deep silence that can be actually felt, everywhere pervades; and once lost within the labyrinth, the adventurer is doomed unless rescued. Only the Indians knew its concealed and devious paths. On Arpeika Island the Cacique of the Caribs is said to have ruled centuries ago, until forced south out of Florida by the Seminoles. It was at times a refuge for the buccaneer with his plunder and a shrine for the missionary martyr who planted the Cross and was murdered beside it. This island was the last retreat of the Seminoles in the desultory war from 1835 to 1843, when they defied the Government, which, during eight years, spent $50,000,000 upon expeditions sent against them. Then the attempt to remove all of them was abandoned, and the remnant have since rested in peace, living by hunting and a little trading with the coast settlements. The names of the noted chiefs of this great race (Osceola, Tallahassee, Tohopekaliga, Coa-coo-chee and others) are preserved in the lakes, streams and towns of Florida. Most of the deported tribe were sent to the Indian Territory. There may be three or four hundred of them still in the everglades, peaceful, it is true, yet haughty and suspicious, and sturdily rejecting all efforts to educate or civilize them. They celebrate their great feast, the Green Corn Dance, in late June; and they have unwavering faith in the belief that the time will yet come when all their prized everglade land will be theirs again, and the glory of the past redeemed, if not in this world, then in the next one, beyond the Big Sleep.

WESTERN FLORIDA.

Westward from Jacksonville, a railway runs through the pine forests until it reaches the rushing Suwanee River, draining the Okifenokee swamp out to the Gulf, just north of Cedar Key.
This stream is best known from the minstrel song, long so popular, of the Old Folks at Home. Beyond it the land rises into the rolling country of Middle
Florida, the undulating surface sometimes reaching four hundred feet elevation, and presenting fertile soil and pleasant scenery, with a less tropical vegetation than the Peninsula of Florida. Here is Tallahassee, the capital of the State, one hundred and sixty-five miles from Jacksonville, a beautiful town of four thousand population, almost embedded in flowering plants, shrubbery and evergreens, and familiarly known from these beauties as the Floral City, the gardens being especially attractive in the season of roses. The Capitol and Court-house and West Florida Seminary, set on a hill, are the chief public buildings. In the suburbs, at Monticello, lived Prince Achille Murat, a son of the King of Naples, who died in 1847, and his grave is in the Episcopal Cemetery. There are several lakes near the town, one of them the curious Lake Miccosukie, which contracts into a creek, finally disappearing underground. The noted Wakulla Spring, an immense limestone basin of great depth and volume of water, with wonderful transparency, is fifteen miles southward. Some distance to the westward the Flint and Chattahoochee Rivers join to form the Appalachicola River, flowing down to the Gulf at Appalachicola, a somewhat decadent port from loss of trade, its exports being principally lumber and cotton. The shallowness of most of these Gulf harbors, which readily silt up, destroys their usefulness as ports for deep-draft shipping. The route farther westward skirts the Gulf Coast, crosses Escambia Bay and reaches Pensacola, on its spacious harbor, ten miles within the Gulf. This is the chief Western Florida port, with fifteen thousand people, having a Navy Yard and much trade in lumber, cotton, coal and grain, a large elevator for the latter being erected in 1898.
The Spaniards made this a frontier post in 1696, and the remains of their forts, San Miguel and San Bernardo, can be seen behind the town, while near the outer edge of the harbor is the old-time Spanish defensive battery, Fort San Carlos de Barrancos. The harbor entrance is now defended by Fort Pickens and Fort McRae. Pensacola Bay was the scene of one of the first spirited naval combats of the Civil War, when the Union forces early in 1862 recaptured the Navy Yard and defenses. The name of
Pensacola was originally given by the Choctaws to the bearded Europeans who first settled there, and signifies the hair people.

THE FLORIDA GULF COAST.

The coast of Florida on the Gulf of Mexico has various attractive places, reached by a convenient railway system. Homosassa is a popular resort about fifty miles southwestward from Ocala. A short distance in the interior is the locality where the Seminoles surprised and massacred Major Dade and his men in December, 1835, only three soldiers escaping alive to tell the horrid tale. The operations against these Indians were then mainly conducted from the military post of Tampa, and thither were taken for deportation the portions of the tribe that were afterwards captured, or who surrendered under the treaty. When Ferdinand de Soto entered this magnificent harbor on his voyage of discovery and gold hunting, he called it Espiritu Sancto Bay. It is from six to fifteen miles wide, and stretches nearly forty miles into the land, being dotted with islands, its waters swarming with sea-fowl, turtles and fish, deer abounding in the interior and on some of the islands, and there being abundant anchorage for the largest vessels. This is the great Florida harbor and the chief winter resort on the western coast. It was the main port of rendezvous and embarkation for the American forces in the Spanish War of 1898. The head of the harbor divides into Old Tampa and Hillsborough Bays, and on the latter and at the mouth of Hillsborough River is the city, numbering about twenty-five thousand inhabitants. The great hotels are surrounded by groves with orange and lemon trees abounding, and everything is invoked that can add to the tourist attractions. The special industry of the resident population is cigar-making. Port Tampa is out upon the Peninsula between the two bays, several miles below the city, and a long railway trestle leads from the shore for a mile to deep water.
Upon the outer end of this long wharf is Tampa Inn, built on a mass of piles, much like some of the constructions in Venice. The guests can almost catch fish out of the bedroom windows, and while eating
breakfast can watch the pelican go fishing in the neighboring waters, for this queer-looking bird, with the duck and gull, is everywhere seen in these attractive regions. An outer line of keys defends Tampa harbor from the storms of the Gulf. There are many popular resorts on the islands and shores of Tampa Bay, and regular lines of steamers are run to the West India ports, Mobile and New Orleans. All the surroundings are attractive, and a pleased visitor writes of the place: "Conditions hereabouts exhilarate the men; a perpetual sun and ocean breeze are balm to the invalid and an inspiration to a robust health. The landscape affords uncommon diversion, and the sea its royal sport with rod and gaff." Farther down the coast is Charlotte Harbor, also deeply indented and sheltered from the sea by various outlying islands. It is eight to ten miles long and extends twenty-five miles into the land, having valuable oyster-beds and fisheries, and its port is Punta Gorda. Below this is the projecting shore of Punta Rassa, where the outlet of Lake Okeechobee, the Caloosahatchie River, flows to the sea, having the military post of Fort Myers, another popular resort, a short distance inland, upon its bank. The Gulf Coast now trends to the southeast, with various bays, in one of which, with Cape Romano as the guarding headland, is the archipelago of the ten thousand islands, while below is Cape Sable, the southwestern extremity of Florida. To the southward, distant from the shore, are the long line of Florida Keys, the name coming from the Spanish word cayo, an island. This remarkable coral formation marks the northern limit of the Gulf Stream, where it flows swiftly out to round the extremity of the Peninsula and begin its northern course through the Atlantic Ocean. Although well lighted and charted, the Straits of Florida along these reefs are dangerous to navigate and need special pilots.
Nowhere rising more than eight to twelve feet above the sea, the Keys thus low-lying are luxuriantly covered with tropical vegetation. From the Dry Tortugas at the west, around to Sand's Key at the entrance to Biscayne Bay, off the Atlantic Coast, about two hundred miles, is a continuous reef of coral, upon the whole extent of which the little builder is still industriously working. The reef is occasionally
broken by channels of varying depth, and within the outer line are many habitable islands. The whole space inside this reef is slowly filling up, just as all the Keys are also slowly growing through accretions from floating substances becoming entangled in the myriad roots of the mangroves. The present Florida Reef is a good example of the way in which a large part of the Peninsula was formed. No less than seven old coral reefs have been found to exist south of Lake Okeechobee, and the present one at the very edge of the deep water of the Gulf Stream is probably the last that can be formed, as the little coral-builder cannot live at a greater depth than sixty feet. The Gulf Stream current is so swift and deep along the outer reef that there is no longer a foundation on which to build. The Gulf Stream is the best known of all the great ocean currents. The northeast and southeast trade-winds, constantly blowing, drive a great mass of water from the Atlantic Ocean into the Caribbean Sea, and westward through the passages between the Windward Islands, which is contracted by the converging shores of the Yucatan Peninsula and the Island of Cuba, so that it pours between them into the Gulf of Mexico, raising its surface considerably above the level of the Atlantic. These currents then move towards the Florida Peninsula, and pass around the Florida Reef and out into the Atlantic. It is estimated by the Coast Survey that the hourly flow of the Gulf Stream past the reef is nearly ninety thousand million tons of water, the speed at the surface of the axis of the stream being over three and one-half miles an hour. To conceive what the immensity of this flow means, it is stated that if a single hour's flow of water were evaporated, the salt thus produced would require to carry it one hundred times the number of ocean-going vessels now afloat.
The Gulf Stream water is of high temperature, great clearness and a deep blue color; and when it meets the greener waters of the Atlantic to the northward, the line of distinction is often very well defined. At the exit to the Atlantic below Jupiter Inlet the stream is forty-eight miles wide to Little Bahama Bank, and its depth over four hundred fathoms.
There are numerous harbors of refuge among the Florida Keys, and that at Key West is the best. This is a coral island seven miles long and one to two miles broad, but nowhere elevated more than eleven feet above the sea. Its name, by a free translation, comes from the original Spanish name of Cayo Hueso, or the Bone Island, given because the early mariners found human bones upon it. Here are twenty thousand people, mostly Cubans and settlers from the Bahamas, the chief industry being cigar-making, while catching fish and turtles and gathering sponges also give much employment. There are no springs on the island, and the inhabitants are dependent on rain or distillation for water. The air is pure and the climate healthy, the trees and shrubbery, with the residences embowered in perennial flowers, giving the city a picturesque appearance. Key West has a good harbor, and as it commands the gateway to and from the Gulf near the western extremity of the Florida coral reef, it is strongly defended, the prominent work being Fort Taylor, constructed on an artificial island within the main harbor entrance. The little Sand Key, seven miles to the southwest, is the southernmost point of the United States. Forty miles to the westward is the group of ten small, low and barren islands known as the Dry Tortugas, from the Spanish tortuga, a tortoise. Upon the farthest one, Loggerhead Key, stands the great guiding light for the Florida Reef, of which this is the western extremity, the tower rising one hundred and fifty feet. Fort Jefferson is on Garden Key, where there is a harbor, and in it were confined various political prisoners during the Civil War, among them some who were concerned in the conspiracy to assassinate President Lincoln. Here, with the encircling waters of the Gulf all around us, terminates this visit to the Sunny South.
As we have progressed, the gradual blending of the temperate into the torrid zone, with the changing vegetation, has reminded of Bayard Taylor's words:
There, in the wondering airs of the Tropics,
Shivers the Aspen, still dreaming of cold:
There stretches the Oak from the loftiest ledges,
His arms to the far-away lands of his brothers,
And the Pine tree looks down on his rival, the Palm.

And as the journey down the Florida Peninsula has displayed some of the most magnificent winter resorts of the American Riviera, with their wealth of tropical foliage, fruits and flowers, and their seductive and balmy climate, this too has reminded of Cardinal Damiani's glimpse of the Joys of Heaven:

Stormy winter, burning summer, rage within these regions never,
But perpetual bloom of roses and unfading spring forever;
Lilies gleam, the crocus glows, and dropping balms their scents deliver.

Along this famous peninsula the sea rolls with ceaseless beat upon some of the most gorgeous beaches of the American coast. To the glories of tropical vegetation and the charms of the climate, Florida thus adds the magnificence of its unrivalled marine environment. Everywhere upon these pleasant coasts—

The bridegroom, Sea,
Is toying with his wedded bride,—the Shore.
He decorates her shining brow with shells,
And then retires to see how fine she looks,
Then, proud, runs up to kiss her.
VI. TRAVERSING THE PRAIRIE LAND.

The Northwest Territory—Beaver River—Fort McIntosh—Mahoning Valley—Steubenville—Youngstown—Canton—Massillon—Columbus—Scioto River—Wayne Defeats the Miamis—Sandusky River—Findlay—Natural Gas Fields—Fort Wayne—Maumee River—The Little Turtle—Old Tippecanoe—Tecumseh—Battle of Tippecanoe—Harrison Defeats the Prophet—Tecumseh Slain in Canada—Indianapolis—Wabash River—Terre Haute—Illinois River—Springfield—Lincoln's Home and Tomb—Peoria—The Great West—Lake Erie—Tribe of the Cat—Conneaut—The Western Reserve—Ashtabula—Mentor—Cleveland—Cuyahoga River—Moses Cleaveland—Euclid Avenue—Oberlin—Elyria—The Fire Lands—Sandusky—Put-in-Bay Island—Perry's Victory—Maumee River—Toledo—South Bend—Chicago—The Pottawatomies—Fort Dearborn—Chicago Fire—Lake Michigan—Chicago River—Drainage Canal—Lockport—Water Supply—Fine Buildings, Streets and Parks—University of Chicago—Libraries—Federal Steel Company—Great Business Establishments—Union Stock Yards—The Hog—The Board of Trade—Speculative Activity—George M. Pullman—The Sleeping Car—The Pioneer—Town of Pullman—Agricultural Wealth of the Prairies—The Corn Crop—Whittier's Corn Song.

THE NORTHWEST TERRITORY.

Beyond the Allegheny ranges, which are gradually broken down into their lower foothills, and then to an almost monotonous level, the expansive prairie lands stretch towards the setting sun. From their prolific agriculture has come much of the wealth and prosperity of the United States. The rivers flowing out of the mountains seek the
Mississippi Valley, thus reaching the sea through the Great Father of Waters. Among these rivers is the Ohio, and at its confluence with the Beaver, near the western border of Pennsylvania, was, in the early days, the Revolutionary outpost of Fort McIntosh, a defensive work against the Indians. All about is a region of coal and gas, extending across the boundary into the Mahoning district of Ohio, the Mahoning River being an affluent of the Beaver. Numerous railroads serve its many towns of furnaces and forges. To the southward is Steubenville on the Ohio, and to the northward Youngstown on the Mahoning, both busy manufacturing centres. Salem and Alliance are also prominent, and some distance northwest is Canton, a city of thirty thousand people, in a fertile grain district, the home of President William McKinley. Massillon, upon the pleasant Tuscarawas River, in one of the most productive Ohio coal-fields, preserves the memory of the noted French missionary priest, Jean Baptiste Massillon, for all this region was first traversed, and opened to civilization, by the French religious explorers from Canada who went out to convert the Indians. In the centre of the State of Ohio is the capital, Columbus, built on the banks of the Scioto River, a tributary of the Ohio flowing southward and two hundred miles long. This river receives the Olentangy or Whetstone River at Columbus, in a region of great fertility, which is in fact the characteristic of the whole Scioto Valley. The Ohio capital, which has a population of one hundred and twenty thousand, large commerce and many important manufacturing establishments, dates from 1812, and became the seat of the State Government in 1816. The large expenditures of public money upon numerous public institutions, all having fine buildings, the wide, tree-shaded streets, and the many attractive residences, have made it one of the finest cities in the United States.
Broad Street, one hundred and twenty feet wide, beautifully shaded with maples and elms, extends for seven miles. The Capitol occupies a large park surrounded with elms, and is an impressive Doric building of gray limestone, three hundred and four feet long and one hundred and eighty-four feet wide, the rotunda being one hundred and fifty-seven
feet high. There are fine parks on the north, south and east of the city, the latter containing the spacious grounds of the Agricultural Society. Almost all the Ohio State buildings, devoted to its benevolence, justice or business, have been concentrated in Columbus, adding to its attractions, and it is also the seat of the Ohio State University with one thousand students. Railroads radiate in all directions, adding to its commercial importance. In going westward, the region we are traversing beyond the Pennsylvania boundary gradually changes from coal and iron to a rich agricultural section. As we move away from the influence of the Allegheny ranges, the hills become gentler, and the rolling surface is more and more subdued, until it is smoothed out into an almost level prairie, heavily timbered where not yet cleared for cultivation. This was the Northwest Territory, first explored by the French, who were led by the Sieur de la Salle in his original discoveries in the seventeenth century. The French held it until the conquest of Canada, when that Dominion and the whole country west to the Mississippi River came under the British flag by the treaty of 1763. After the Revolution, the various older Atlantic seaboard States claiming the region, ceded sovereignty to the United States Government, and then its history was chequered by Indian wars until General Wayne conducted an expedition against the Miamis and defeated them in 1794, after which the Northwest Territory was organized, and the State of Ohio taken out of it and admitted to the Union in 1803, its first capital being Chillicothe. It was removed to Zanesville for a couple of years, but finally located at Columbus. Beyond the Scioto the watershed is crossed, by which the waters of the Ohio are left behind and the valley of Sandusky River is reached, a tributary of Lake Erie. Here is Bucyrus, in another prolific natural gas region, the centre of which is Findlay.
At this town, in 1887, the inhabitants, who had then had just one year of natural gas development, spent three days in exuberant festivity, to show their appreciation of the wonderful discovery. They had thirty-one gas wells pouring out ninety millions of cubic feet in a day, all piped into
town and feeding thirty thousand glaring natural gas torches of enormous power, which blew their roaring flames as an accompaniment to the oratory of John Sherman and Joseph B. Foraker, who were then respectively Senator and Governor of Ohio. The soldiers and firemen paraded, and a multitude of brass bands tried to drown the Niagara of gas which was heard roaring five miles away, while the country at night was illuminated for twenty miles around. But the wells have since diminished their flow, although the gas still exists; while another field with a prolific yield is in Fairfield County, a short distance southeast of Columbus. Over the State boundary in Indiana is yet another great gas-field covering five thousand square miles in a dozen counties, with probably two thousand wells and a yield which has reached three thousand millions of cubic feet in a day. This gas supplies many cities and towns, including Chicago, and it is one of the greatest gas-fields known. In the same region there are also large petroleum deposits. Not far beyond the State boundary is Fort Wayne, the leading city of Northern Indiana, having forty thousand population, an important railway centre, and prominent also in manufactures. It stands in a fertile agricultural district, and being located at the highest part of the gentle elevation, beyond the Sandusky Valley, diverting the waters east and west, it is appropriately called the Summit City. Here the Maumee River is formed by the confluence of the two streams St. Joseph and St. Mary, and flows through the prairie towards the northeast, to make the head of Lake Erie. The French, under La Salle, in the eighteenth century established a fur-trading post here, and erected Fort Miami, and in 1760 the British penetrated to this then remote region and also built a fort.
During the Revolution this country was abandoned to the Indians, but when General Wayne defeated the Miamis in 1794 he thought the place would make a good frontier outpost to hold the savages in check, and he then constructed a strong work, to which he gave the name of Fort Wayne. Around this post the town afterwards grew, being greatly prospered by the Wabash and Erie Canal, and by the various railways subsequently constructed in all directions. All this prairie
region was the hunting-ground of the Miamis, whose domain extended westward to Lake Michigan, and southward along the valley of the Miami River to the Ohio. They were a warlike and powerful tribe, and their adherence to the English during the Revolution provoked almost constant hostilities with the settlers who afterwards came across the mountains to colonize the Northwest Territory. Under the leadership of their renowned chief Mishekonequah, or the Little Turtle, they defeated repeated expeditions sent against them, until finally beaten by Wayne. Subsequently they dwindled in importance, and when removed farther west, about 1848, they numbered barely two hundred and fifty persons.

OLD TIPPECANOE.

Some distance westward is the Tippecanoe River, a stream flowing southwest into the Wabash, and thence into the Ohio. The word Tippecanoe is said to mean the great clearing, and on this river was fought the noted battle by Old Tippecanoe, General William Henry Harrison, against the combined forces of the Shawnees, Miamis and several other tribes, which resulted in their complete defeat. They were united under Elskwatawa, or the Prophet, the brother of the famous Tecumseh. These two chieftains were Shawnees, and they preached a crusade by which they gathered all the northwestern tribes in a concerted movement to resist the steady encroachments of the whites. The brother, who was a medicine man, in 1805 set up as an inspired prophet, denouncing the use of liquors, and of all food, manners and customs introduced by the hated palefaces, and confidently predicted they would ultimately be driven from the land. For years both chiefs travelled over the country stirring up the Indians. General Harrison, who was the Governor of the Northwest Territory, gathered his forces together and advanced up the Wabash against the Prophet's town of Tippecanoe, when the Indians, hoping to surprise him, suddenly attacked his camp, but he being prepared, they were signally
defeated, thus giving Harrison his popular title of Old Tippecanoe, which had much to do with electing him President in 1840. Some time after this defeat the War of 1812 broke out, when Tecumseh espoused the English cause, went to Canada with his warriors, and was made a brigadier-general. He was killed there in the battle of the Thames, in Ontario Province, and it is said had a premonition of death, for, laying aside his general's uniform, he put on a hunting-dress and fought desperately until he was slain. Tecumseh was the most famous Indian chief of his time, and the honor of killing him was claimed by several who fought in the battle, so that the problem of "Who killed Tecumseh?" was long discussed throughout the country. The State of Indiana was admitted into the Union in 1816, and in its centre, built upon a broad plain, on the east branch of White River, is its capital and largest city, Indianapolis, having two hundred thousand population. This is a great railway centre, having lines radiating in all directions, and it also has extensive manufactures and a large trade in live stock. The city plan, with wide streets crossing at right angles, and four diagonal avenues radiating from a circular central square, makes it very attractive; and the residential quarter, displaying tasteful houses, ornate grounds and shady streets, is regarded as one of the most beautiful in the country. The State Capitol, in a spacious park, is a Doric building with colonnade, central tower and dome, and in an enclosure on its eastern front is erected one of the finest Soldiers' and Sailors' Monuments existing, rising two hundred and eighty-five feet, out-topping everything around, having been designed and largely constructed in Europe. There are also many prominent public buildings throughout the city. Indianapolis, first settled in 1819, had but a small population until the railways centred there, the Capitol being removed from Corydon in 1825.
The Wabash River, to which reference has been made, receives White River, and is one of the largest affluents of the Ohio, about five hundred and fifty miles long, being navigable over half that length. It rises in the State of Ohio, flows across Indiana, and, turning southward, makes for a long distance the Illinois boundary.
Its chief city is Terre Haute, the High Ground, about seventy miles west of Indianapolis, another prominent railroad centre, having forty-five thousand people, with extensive manufactures. It is surrounded by valuable coal-fields, is built upon an elevated plateau, and, like all these prairie cities, is noted for its many broad and well-shaded streets. It was founded in 1816.

THE GREAT WEST.

Progressing westward, the timbered prairie gradually changes to the grass-covered prairie, spreading everywhere a great ocean of fertility. Across the Wabash is the Prairie State of Illinois, its name coming from its principal river, which the Indians named after themselves. The word is a French adaptation of the Indian name Illini, meaning the superior men, the earliest explorers and settlers having been French, the first comers on the Illinois River being Father Marquette and La Salle. At the beginning of the eighteenth century their little settlements were flourishing, and the most glowing accounts were sent home, describing the region, which they called New France, on account of its beauty, attractiveness and prodigious fertility, as a new Paradise. There were many years of Indian conflicts and hostility, but after peace was restored and a stable government established, population flowed in, and Illinois was admitted as a State to the Union in 1818. The capital was established at Springfield in 1837, an attractive city of about thirty thousand inhabitants, built on a prairie a few miles south of Sangamon River, a tributary of the Illinois, and from its floral development and the adornment of its gardens and shade trees, Springfield is popularly known as the Flower City. There is a magnificent State Capitol with high surmounting dome, patterned somewhat after the Federal Capitol at Washington. Springfield has coal-mines which add to its prosperity, but its great fame is connected with Abraham Lincoln.
He lived in Springfield, and the house he occupied when elected President has been acquired by the State and is on public exhibition. After his assassination in 1865, his
remains were brought from Washington to Springfield, and interred in the picturesque Oak Ridge Cemetery, in the northern suburbs, where a magnificent monument was erected to his memory and dedicated in 1874. About sixty miles north of Springfield, the Illinois River expands into Peoria Lake, and here came La Salle down the river in 1680, and at the foot of the lake established a trading-post and fort, one of the earliest in that region. When more than a century had elapsed, a little town grew there which is now the busy industrial city of Peoria, famous for its whiskey and glucose, and turning out products that annually approximate a hundred millions, furnishing vast traffic for numerous railroads. It is the chief city of the corn belt, and is served by all the prominent trunk railway lines. Like the pioneers of a hundred years ago, we have left the Atlantic seaboard, crossed the Allegheny Mountains and entered the expansive Northwest Territory, which in the first half of the nineteenth century was the Mecca of the colonist and frontiersman. This was then the region of the Great West, though that has since moved far beyond the Mississippi. Its agricultural wealth made the prosperity of the country for many decades, and its prodigious development was hardly realized until put to the test of the Civil War, when it poured out the men and officers, and had the staying qualities so largely contributing to the result of that great conflict. Gradually overspread by a network of railways, the numerous cross-roads have expanded everywhere into towns and cities, almost all patterned alike, and all of them centres of rich farming districts. Coal, oil and gas have come to minister to its manufacturing wants, and thus growing into mature Commonwealths, this prolific region in the later decades has been itself, in turn, contributing largely to the tide of migration flowing to the present Great Northwest, a thousand miles or more beyond.
It presents a rich agricultural picture, but little scenic attractiveness. Everywhere an almost dead level, the numerous railways cross and recross the surface in all directions at grade, and are easily built, it being only necessary to dig a shallow ditch on either side, throw the earth in the centre, and
lay the ties and rails. Nature has made the prairie as smooth as a lake, so that hardly any grading is necessary, and the region of expansive green viewed out of the car window has been aptly described as having a face but no features, when one looks afar over an ocean of waving verdure.

LAKE ERIE.

This vast prairie extends northward to and beyond the Great Lakes, and it is recorded that in the early history of the proposed legislation for the Northwest Territory, Congress gravely selected as the names of the States which were to be created out of it such ponderous conglomerates as Metropotamia, Assenispia, Pelisipia and Polypotamia, titles which happily were long ago permitted to pass into oblivion. Northward, in Ohio, the region stretches to Lake Erie, the most southern and the smallest of the group of Great Lakes above Niagara. It is regarded as the least attractive lake, having neither romances nor much scenery. Yet, from its favorable position, it carries an enormous commerce. It is elliptical in form, about two hundred and forty miles long and sixty miles broad, the surface being five hundred and sixty-five feet above the ocean level. It is a very shallow lake, the depth rarely exceeding one hundred and twenty feet, excepting at the lower end, while the other lakes are much deeper, and in describing this difference of level it is said that the surplus waters poured from the vast basins of Superior, Michigan and Huron, flow across the plate of Erie into the deep bowl of Ontario. This shallowness causes it to be easily disturbed, so that it is the most dangerous of these fresh-water seas, and it has few harbors, and those very poor, especially upon the southern shore. The bottom of the lake is a light, clayey sediment, rapidly accumulated from the wearing away of the shores, largely composed of clay strata.
The loosely-aggregated products of these disintegrated strata are frequently seen along its coast, forming cliffs extending back into elevated plateaus, through which the rivers cut deep channels. Their mouths are clogged by sand-bars, and
dredging and breakwaters have made the harbors on the southern shore, around which have grown the chief towns—Dunkirk, Erie, Ashtabula, Cleveland, Sandusky and Toledo. The name of Lake Erie comes from the Indian tribe of the Cat, whom the French called the Chats, because their early explorers, penetrating to the shores of the lake, found them abounding in wild cats, and thus they gave the same name to the cats and the savages. In their own parlance, these Indians were the Eries, and in the seventeenth century they numbered about two thousand warriors. In 1656 the Iroquois attacked and almost annihilated them. The Lake Erie ports in the Buckeye State of Ohio, so called from the buckeye tree, are chiefly harbors for shipping coal and receiving ores from the upper lakes, their railroads leading to the great industrial centres to the southward. Near the eastern boundary of Ohio is Conneaut, on the bank of a wide and deep ravine, formed by a small river, broadening into a bay at the shore of the lake, the name meaning many fish. Here landed in 1796 the first settlers from Connecticut, who entered the Western Reserve, as all this region was then called. On July 4th of that year, celebrating the national anniversary, they pledged each other in tin cups of lake water, accompanied by a salute of fowling-pieces, and the next day began building the first house on the Reserve, constructed of logs, and long known as Stow Castle. Conneaut is consequently known as the Plymouth of the Western Reserve, as here began the settlements made by the Puritan New England migration to Ohio. On deep ravines making their harbors are Ashtabula, an enormous entrepôt for ores, and a few miles farther westward, Painesville, on Grand River, named for Thomas Paine. Beyond is Mentor, the home of the martyred President Garfield, whose large white house stands near the railway.
All along here, the southern shore of Lake Erie is a broad terrace at eighty to one hundred feet elevation above the water, while farther inland is another and considerably higher plateau. Each sharp declivity facing northward seems at one time to have been the actual shore of the lake when its surface before the waters receded was much higher than now. The outer plateau
having once been the overflowed lake bed, is level, excepting where the crooked but attractive streams have deeply cut their winding ravines down through it to reach Lake Erie.
THE CITY OF CLEVELAND.
Thus we come to Cleveland, the second city in Ohio, having four hundred thousand people, and extensive manufacturing industries. It is the capital of the Western Reserve and the chief city of Northern Ohio, its commanding position upon a high bluff, falling off precipitously to the edge of the water, giving it the most attractive situation on the shore of Lake Erie. Shade trees embower it, including many elms planted by the early settlers, who learned to love them in New England, and hence it delights in the popular title of the "Forest City." Were not the streets so wide, the profusion of foliage might make Cleveland seem like a town in the woods. The little Cuyahoga River, its name meaning "the crooked stream," flows with wayward course down a deeply washed and winding ravine, making a valley in the centre of the city, known as the Flats, and this, with the tributary ravines of some smaller streams, is packed with factories and foundries, oil refineries and lumber mills, their chimneys keeping the business section constantly under a cloud of smoke. Railways run in all directions over these flats and through the ravines, while, high above, the city has built a stone viaduct nearly a half-mile long, crossing the valley. Here are the great works of the Standard Oil Company, controlling that trade, and several of the petroleum magnates have their palaces in the city. Old Moses Cleaveland, a shrewd but unsatisfied Puritan of the town of Windham, Connecticut, became the agent of the Connecticut Land Company, who brought out the first colony in 1796 that landed at Conneaut.
They explored the lake shore, and selecting as a good location the mouth of Cuyahoga River, Moses wrote back to his former home that they had found a spot "on the bank of Lake Erie which was called by my name, and I believe the child is now born that may live to see that place as large as old Windham." In little
over a century the town has grown far beyond his wildest dreams, although it did not begin to expand until the era of canals and railways, and it was not so long ago that the people in grateful memory erected a bronze statue of the founder. One of the local antiquaries, delving into the records, has found why various original settlers made their homes at Cleveland. He learned that one man, on his way farther West, was laid up with the ague and had to stop; another ran out of money and could get no farther; another had been to St. Louis and wanted to get back home, but saw a chance to make money in ferrying people across the river; another had $200 over, and started a bank; while yet another thought he could make a living by manufacturing ox-yokes, and he stayed. This earnest investigator continues: "A man with an agricultural eye would look at the soil and kick his toe into it, and then would shake his head and declare that it would not grow white beans—but he knew not what this soil would bring forth; his hope and trust was in beans, he wanted to know them more, and wanted potatoes, corn, oats and cabbage, and he knew not the future of Euclid Avenue." On either side of the deep valley of the Flats stretch upon the plateau the long avenues of Cleveland, with miles of pleasant residences, surrounded by lawns and gardens, each house isolated in green, and the whole appearing like a vast rural village more than a city. This pleasant plan of construction had its origin in the New England ideas of the people. Yet the city also has a numerous population of Germans, and it is recorded that one of the early landowners wrote, in explaining his project of settlement: "If I make the contract for thirty thousand acres, I expect with all speed to send you fifteen or twenty families of prancing Dutchmen."
These Teutons came and multiplied, for the original Puritan stock can hardly be responsible for the vineyards of the neighborhood, the music and dancing, and the public gardens along the pleasant lake shore, where the crowds go, when work is over, to enjoy recreation and watch the gorgeous summer sunsets across the bosom of the lake which are the glory of Cleveland. Upon the plateau, the centre of the city, is the Monumental Park, where stand the statue of Moses
Cleaveland, the founder, who died in 1806, and a fine Soldiers' Monument, with also a statue of Commodore Perry. This Park is an attractive enclosure of about ten acres, having fountains, gardens, monuments and a little lake, and it is intersected at right angles by two broad streets, and surrounded by important buildings. One of the streets is the chief business highway, Superior Street, and the other leads down to the edge of the bluff on the lake shore, where the steep slope is made into a pleasure-ground, with more flower-beds and fountains and a pleasant outlook over the water, although at its immediate base is a labyrinth of railroads and an ample supply of smoke from the numerous locomotives. A long breakwater protects the harbor entrance, and out under the lake is bored the water-works tunnel. There extends far to the eastward, from a corner of the Monumental Park, Cleveland's famous street—Euclid Avenue. The people regard it as the handsomest highway in America, in the combined magnificence of houses and grounds. It is a level avenue of about one hundred and fifty feet width, with a central roadway and stone footwalks on either hand, shaded by rows of grand overarching elms, and bordered on both sides by well-kept lawns. This is the public highway, every part being kept scrupulously neat, while a light railing marks the boundary between the street and the private grounds. For a long distance this noble avenue is bordered by stately residences, each surrounded by ample gardens, the stretch of grass, flowers and foliage extending back from one hundred to four hundred feet between the street and the buildings. Embowered in trees, and with all the delights of garden and lawn seen in every direction, this grand avenue makes a delightful driveway and promenade.
Upon it live the multi-millionaires of Cleveland, the finest residences being upon the northern side, where they have invested part of the profits of their railways, mills, mines, oil wells and refineries in adorning their homes and ornamenting their city. This splendid boulevard, in one way, is a reproduction of the Parisian Avenue of the Champs Elysées and its gardens, but with more attractions in the surroundings of its bordering rows of palaces. Here live the men who vie with those of
Chicago in controlling the commerce of the lakes and the affairs of the Northwest. Plenty of room and an abundance of income are necessary to provide each man, in the heart of the city, with two to ten acres of lawns and gardens around his house, but it is done here with eminent success. About four miles out is the beautiful Wade Park, opposite which are the handsome buildings of the Western Reserve University, having, with its adjunct institutions, a thousand students. Beyond this, the avenue ends at the attractive Lake View Cemetery, where, on the highest part of the elevated plateau, with a grand outlook over Lake Erie, is the grave of the assassinated President Garfield. His imposing memorial rises to a height of one hundred and sixty-five feet.
CLEVELAND TO CHICAGO.
Thirty-five miles southwest of Cleveland, and some distance inland from Lake Erie, is Oberlin, where, in a fertile and prosperous district, is the leading educational foundation of Northern Ohio—Oberlin College—named in memory of the noted French philanthropist, and established in 1833 by the descendants of the Puritan colonists, to carry out their idea of thorough equality in education. It admits students without distinction of sex or color, and has about thirteen hundred, almost equally divided between the sexes, occupying a cluster of commodious buildings. To the westward is the beautiful ravine of Black River, which gets out to the lake by falling over a rocky ledge in two streams, and on the peninsula formed by its forks is the town of Elyria. Maria Ely was the wife of the founder of the settlement, who named it after her in this peculiar reversible way. This romantic stream bounds the "Fire Lands" of the Western Reserve, a tract of nearly eight hundred square miles abutting on the lake shore, which Connecticut set apart for colonization by her people, who had been sufferers from destructive fires in the towns of New London, Fairfield and Norwalk on Long Island Sound.
They secured this wilderness in the early part of the nineteenth century, and their chief town is Sandusky, with twenty-five thousand
population. Here lived most of the Eries, the Indian tribe of the Cat, who fished in Sandusky Bay, its upper waters being an archipelago of little green islands abounding with water fowl. They were known to the adjoining tribes as the Neutral Nation, for they maintained two villages of refuge on Sandusky River, between the warlike Indians of the east and the west, and whoever entered their boundaries was safe from pursuit, the sanctuary being rigidly observed. The early French missionaries who found them in the seventeenth century speak of these anomalous villages among the savages as having then been long in existence. The name of Sandusky is a corruption of a Wyandot word meaning "cold-water pools," the French having originally rendered it as Sandosquet. The shores are low, but there is a good harbor and much trade, and here is located the Ohio State Fish Hatchery. The railroads are laid among the savannahs and lagoons, and one of the suburban stations has been not inaptly named Venice. There are extensive vineyards on the flat and sunny shores of the bay, and this is one of the most prolific grape districts in the State. Sandusky Bay is a broad sheet of water, in places six miles wide, and about twenty miles long. Sandusky has a large timber trade, being noted for the manufacture of hard woods. Out beyond the bold peninsula, protruding into the lake at the entrance to the bay, is a group of islands spreading over the southwestern waters of Lake Erie, of which Kelly's Island is the chief, an archipelago formed largely from the detritus washed out of the Detroit, Maumee and various other rivers flowing into the head of the lake. Here the Erie Indians had a fortified stronghold, whose outlines can still be traced. The most noted of the group is Put-in-Bay Island, now a popular watering-place, which got its name from Commodore Perry, who put in there with the captured British fleet at the naval battle of Lake Erie, September 10, 1813.
It was from this place, just after his victory, that he sent the historic despatch, giving him fame, "We have met the enemy and they are ours." The killed of both fleets were buried side by side near the beach on the island, the place being marked by a mound. The lovely sheet of water of Put-in-Bay glistens in front,
having the towns of villa-crowned Gibraltar Island upon its surface. Vineyards and roses abound, these islands, like the adjacent shores, being noted for their wines. The Maumee River, coming up from Fort Wayne, flows into the head of Lake Erie, the largest stream on its southern coast. It comes from the southwest through the region of the Black Swamp, a vast district, originally morass and forest, which has been drained to make a most fertile country. This "miserable bog," as the original settlers denounced it, when they were jolted over the rude corduroy roads that sustained them upon the quaking morass, has since become the prolific garden and magnificent forest described by the modern tourist. The Maumee Valley was an almost continual battle-ground with the Indians when Mad Anthony Wayne commanded on that frontier, he being called by them "the Wind," because "he drives and tears everything before him." For a quarter of a century border warfare raged along this river, then known as the Miami of the Lakes, and its chief settlement, Toledo, passed its infancy in a baptism of blood and fire. It was at the battle of Fallen Timbers, fought in 1794, almost on the site of Toledo, that Wayne gave his laconic and noted field orders. General William Henry Harrison, then his aide, told Wayne just before the battle he was afraid he would get into the fight and forget to give the necessary field orders. Wayne replied: "Perhaps I may, and if I do, recollect that the standing order for the day is, charge the rascals with the bayonets." Toledo is built on the flat surface on both sides of the Maumee River and Bay, which make it a good harbor, stretching six miles down to Lake Erie. There are a hundred thousand population here, and this energetic reproduction of the ancient Spanish city has named its chief newspaper the Toledo Blade.
The city has extensive railway connections and a large trade in lumber and grain, coal and ores, and does much manufacturing, it being well served with natural gas. A dozen grain elevators line the river banks, and the factory smokes overhang the broad low-lying city like a pall. To the westward, crossing the rich lands of the reclaimed swamp, is the Indiana boundary, that State being here a broad and level prairie,