Sample Size Calculations For Clustered And Longitudinal Outcomes In Clinical Research Chul Ahn


Accurate sample size calculation ensures that clinical studies have
adequate power to detect clinically meaningful effects. This results in
the efficient use of resources and avoids an overpowered study that
exposes a disproportionate number of patients to experimental treatments.
Sample Size Calculations for Clustered and Longitudinal Outcomes in
Clinical Research explains how to determine sample size for studies with
correlated outcomes, which are widely implemented in medical,
epidemiological, and behavioral studies.
The book focuses on issues specific to the two types of correlated
outcomes: longitudinal and clustered. For clustered studies, the authors
provide sample size formulas that accommodate variable cluster sizes and
within-cluster correlation. For longitudinal studies, they present sample
size formulas that account for within-subject correlation among repeated
measurements and various missing data patterns. For multiple levels of
clustering, the level at which to perform randomization becomes a design
parameter. The authors show how this can greatly impact trial
administration, analysis, and sample size requirements.
Addressing the overarching theme of sample size determination for
correlated outcomes, this book provides a useful resource for
biostatisticians, clinical investigators, epidemiologists, and social
scientists whose research involves trials with correlated outcomes. Each
chapter is self-contained so readers can explore topics relevant to
their research projects without having to refer to other chapters.
Chul Ahn
Moonseong Heo
Song Zhang


Editor-in-Chief
Shein-Chung Chow, Ph.D., Professor, Department of Biostatistics and Bioinformatics,
Duke University School of Medicine, Durham, North Carolina
Series Editors
Byron Jones, Biometrical Fellow, Statistical Methodology, Integrated Information Sciences,
Novartis Pharma AG, Basel, Switzerland
Jen-pei Liu, Professor, Division of Biometry, Department of Agronomy,
National Taiwan University, Taipei, Taiwan
Karl E. Peace, Georgia Cancer Coalition, Distinguished Cancer Scholar, Senior Research Scientist
and Professor of Biostatistics, Jiann-Ping Hsu College of Public Health,
Georgia Southern University, Statesboro, Georgia
Bruce W. Turnbull, Professor, School of Operations Research and Industrial Engineering,
Cornell University, Ithaca, New York
Published Titles
Adaptive Design Methods in
Clinical Trials, Second Edition
Shein-Chung Chow and Mark Chang
Adaptive Design Theory and
Implementation Using SAS and R,
Second Edition
Mark Chang
Advanced Bayesian Methods for Medical
Test Accuracy
Lyle D. Broemeling
Advances in Clinical Trial Biostatistics
Nancy L. Geller
Applied Meta-Analysis with R
Ding-Geng (Din) Chen and Karl E. Peace
Basic Statistics and Pharmaceutical
Statistical Applications, Second Edition
James E. De Muth
Bayesian Adaptive Methods for
Clinical Trials
Scott M. Berry, Bradley P. Carlin,
J. Jack Lee, and Peter Muller
Bayesian Analysis Made Simple: An Excel
GUI for WinBUGS
Phil Woodward
Bayesian Methods for Measures of
Agreement
Lyle D. Broemeling
Bayesian Methods in Epidemiology
Lyle D. Broemeling
Bayesian Methods in Health Economics
Gianluca Baio
Bayesian Missing Data Problems: EM,
Data Augmentation and Noniterative
Computation
Ming T. Tan, Guo-Liang Tian,
and Kai Wang Ng
Bayesian Modeling in Bioinformatics
Dipak K. Dey, Samiran Ghosh,
and Bani K. Mallick
Benefit-Risk Assessment in
Pharmaceutical Research and
Development
Andreas Sashegyi, James Felli, and
Rebecca Noel
Biosimilars: Design and Analysis of
Follow-on Biologics
Shein-Chung Chow
Biostatistics: A Computing Approach
Stewart J. Anderson
Causal Analysis in Biomedicine and
Epidemiology: Based on Minimal
Sufficient Causation
Mikel Aickin
Clinical and Statistical Considerations
in Personalized Medicine
Claudio Carini, Sandeep Menon,
and Mark Chang
Clinical Trial Data Analysis using R
Ding-Geng (Din) Chen and Karl E. Peace

Clinical Trial Methodology
Karl E. Peace and Ding-Geng (Din) Chen
Computational Methods in Biomedical
Research
Ravindra Khattree and Dayanand N. Naik
Computational Pharmacokinetics
Anders Källén
Confidence Intervals for Proportions and
Related Measures of Effect Size
Robert G. Newcombe
Controversial Statistical Issues in
Clinical Trials
Shein-Chung Chow
Data and Safety Monitoring Committees
in Clinical Trials
Jay Herson
Design and Analysis of Animal Studies in
Pharmaceutical Development
Shein-Chung Chow and Jen-pei Liu
Design and Analysis of Bioavailability and
Bioequivalence Studies, Third Edition
Shein-Chung Chow and Jen-pei Liu
Design and Analysis of Bridging Studies
Jen-pei Liu, Shein-Chung Chow,
and Chin-Fu Hsiao
Design and Analysis of Clinical Trials with
Time-to-Event Endpoints
Karl E. Peace
Design and Analysis of Non-Inferiority
Trials
Mark D. Rothmann, Brian L. Wiens,
and Ivan S. F. Chan
Difference Equations with Public Health
Applications
Lemuel A. Moyé and Asha Seth Kapadia
DNA Methylation Microarrays:
Experimental Design and Statistical
Analysis
Sun-Chong Wang and Arturas Petronis
DNA Microarrays and Related Genomics
Techniques: Design, Analysis, and
Interpretation of Experiments
David B. Allison, Grier P. Page,
T. Mark Beasley, and Jode W. Edwards
Dose Finding by the Continual
Reassessment Method
Ying Kuen Cheung
Elementary Bayesian Biostatistics
Lemuel A. Moyé
Frailty Models in Survival Analysis
Andreas Wienke
Generalized Linear Models: A Bayesian
Perspective
Dipak K. Dey, Sujit K. Ghosh,
and Bani K. Mallick
Handbook of Regression and Modeling:
Applications for the Clinical and
Pharmaceutical Industries
Daryl S. Paulson
Inference Principles for Biostatisticians
Ian C. Marschner
Interval-Censored Time-to-Event Data:
Methods and Applications
Ding-Geng (Din) Chen, Jianguo Sun,
and Karl E. Peace
Joint Models for Longitudinal and Time-
to-Event Data: With Applications in R
Dimitris Rizopoulos
Measures of Interobserver Agreement
and Reliability, Second Edition
Mohamed M. Shoukri
Medical Biostatistics, Third Edition
A. Indrayan
Meta-Analysis in Medicine and Health
Policy
Dalene Stangl and Donald A. Berry
Mixed Effects Models for the Population
Approach: Models, Tasks, Methods and
Tools
Marc Lavielle
Monte Carlo Simulation for the
Pharmaceutical Industry: Concepts,
Algorithms, and Case Studies
Mark Chang
Multiple Testing Problems in
Pharmaceutical Statistics
Alex Dmitrienko, Ajit C. Tamhane,
and Frank Bretz

Noninferiority Testing in Clinical Trials:
Issues and Challenges
Tie-Hua Ng
Optimal Design for Nonlinear Response
Models
Valerii V. Fedorov and Sergei L. Leonov
Patient-Reported Outcomes:
Measurement, Implementation and
Interpretation
Joseph C. Cappelleri, Kelly H. Zou,
Andrew G. Bushmakin, Jose Ma. J. Alvir,
Demissie Alemayehu, and Tara Symonds
Quantitative Evaluation of Safety in Drug
Development: Design, Analysis and
Reporting
Qi Jiang and H. Amy Xia
Randomized Clinical Trials of
Nonpharmacological Treatments
Isabelle Boutron, Philippe Ravaud, and
David Moher
Randomized Phase II Cancer Clinical
Trials
Sin-Ho Jung
Sample Size Calculations for Clustered
and Longitudinal Outcomes in Clinical
Research
Chul Ahn, Moonseong Heo, and
Song Zhang
Sample Size Calculations in Clinical
Research, Second Edition
Shein-Chung Chow, Jun Shao
and Hansheng Wang
Statistical Analysis of Human Growth
and Development
Yin Bun Cheung
Statistical Design and Analysis of
Stability Studies
Shein-Chung Chow
Statistical Evaluation of Diagnostic
Performance: Topics in ROC Analysis
Kelly H. Zou, Aiyi Liu, Andriy Bandos,
Lucila Ohno-Machado, and Howard Rockette
Statistical Methods for Clinical Trials
Mark X. Norleans
Statistical Methods in Drug Combination
Studies
Wei Zhao and Harry Yang
Statistics in Drug Research:
Methodologies and Recent
Developments
Shein-Chung Chow and Jun Shao
Statistics in the Pharmaceutical Industry,
Third Edition
Ralph Buncher and Jia-Yeong Tsay
Survival Analysis in Medicine and
Genetics
Jialiang Li and Shuangge Ma
Theory of Drug Development
Eric B. Holmgren
Translational Medicine: Strategies and
Statistical Methods
Dennis Cosmatos and Shein-Chung Chow

Chul Ahn
University of Texas Southwestern Medical Center
Dallas, Texas, USA
Moonseong Heo
Albert Einstein College of Medicine
Bronx, New York, USA
Song Zhang
University of Texas Southwestern Medical Center
Dallas, Texas, USA
Sample Size Calculations
for Clustered and
Longitudinal Outcomes
in Clinical Research

CRC Press
Taylor & Francis Group
6000 Broken Sound Parkway NW, Suite 300
Boca Raton, FL 33487-2742
© 2015 by Taylor & Francis Group, LLC
CRC Press is an imprint of Taylor & Francis Group, an Informa business
No claim to original U.S. Government works
Version Date: 20141029
International Standard Book Number-13: 978-1-4665-5627-0 (eBook - PDF)
This book contains information obtained from authentic and highly regarded sources. Reasonable
efforts have been made to publish reliable data and information, but the author and publisher cannot
assume responsibility for the validity of all materials or the consequences of their use. The authors and
publishers have attempted to trace the copyright holders of all material reproduced in this publication
and apologize to copyright holders if permission to publish in this form has not been obtained. If any
copyright material has not been acknowledged please write and let us know so we may rectify in any
future reprint.
Except as permitted under U.S. Copyright Law, no part of this book may be reprinted, reproduced,
transmitted, or utilized in any form by any electronic, mechanical, or other means, now known or
hereafter invented, including photocopying, microfilming, and recording, or in any information
storage or retrieval system, without written permission from the publishers.
For permission to photocopy or use material electronically from this work, please access
www.copyright.com (http://www.copyright.com/) or contact the Copyright Clearance Center, Inc. (CCC), 222
Rosewood Drive, Danvers, MA 01923, 978-750-8400. CCC is a not-for-profit organization that
provides licenses and registration for a variety of users. For organizations that have been granted a
photocopy license by the CCC, a separate system of payment has been arranged.
Trademark Notice: Product or corporate names may be trademarks or registered trademarks, and are
used only for identification and explanation without intent to infringe.
Visit the Taylor & Francis Web site at
http://www.taylorandfrancis.com
and the CRC Press Web site at
http://www.crcpress.com

Contents
Preface
List of Figures
List of Tables
1 Sample Size Determination for Independent Outcomes
1.1 Introduction
1.2 Precision Analysis
1.3 Power Analysis
1.4 Further Readings
2 Sample Size Determination for Clustered Outcomes
2.1 Introduction
2.2 One–Sample Clustered Continuous Outcomes
2.3 One–Sample Clustered Binary Outcomes
2.4 Two–Sample Clustered Continuous Outcomes
2.5 Two–Sample Clustered Binary Outcomes
2.6 Stratified Cluster Randomization for Binary Outcomes
2.7 Nonparametric Approach for One–Sample Clustered Binary Outcomes
3 Sample Size Determination for Repeated Measurement
Outcomes Using Summary Statistics
3.1 Introduction
3.2 Information Needed for Sample Size Estimation
3.3 Summary Statistics
4 Sample Size Determination for Correlated Outcome
Measurements Using GEE
4.1 Motivation
4.2 Review of GEE
4.3 Compare the Slope for a Continuous Outcome
4.4 Test the TAD for a Continuous Outcome
4.5 Compare the Slope for a Binary Outcome
4.6 Test the TAD for a Binary Outcome
4.7 Compare the Slope for a Count Outcome
4.8 Test the TAD for a Count Outcome
5 Sample Size Determination for Correlated Outcomes from
Two-Level Randomized Clinical Trials
5.1 Introduction
5.2 Statistical Models for Continuous Outcomes
5.3 Testing Main Effects
5.4 Two-Level Longitudinal Designs: Testing Slope Differences
5.5 Cross-Sectional Factorial Designs: Interactions between Treatments
5.6 Longitudinal Factorial Designs: Treatment Effects on Slopes
5.7 Sample Sizes for Binary Outcomes
6 Sample Size Determination for Correlated Outcomes from
Three-Level Randomized Clinical Trials
6.1 Introduction
6.2 Statistical Model for Continuous Outcomes
6.3 Testing Main Effects
6.4 Testing Slope Differences
6.5 Cross-Sectional Factorial Designs: Interactions between Treatments
6.6 Longitudinal Factorial Designs: Treatment Effects on Slopes
6.7 Sample Sizes for Binary Outcomes
Index

Preface
One of the most common questions statisticians encounter when working
with clinical investigators is “How many subjects do I need for this study?”
Clinicians are often surprised to find out that the required sample size depends
on a number of factors. Obtaining information on these factors for sample size
calculation is not trivial, and often involves preliminary studies, literature
review, and, more than occasionally, educated guesses. The validity of clinical
research is judged not by the results but by how it is designed and conducted.
Accurate sample size calculation ensures that a study has adequate power to
detect clinically meaningful effects. It also avoids the waste of resources and
the risk of exposing an excessive number of patients to experimental treatments
that come with an overpowered study.
In this book we focus on sample size determination for studies with
correlated outcomes, which are widely implemented in medical, epidemiological,
and behavioral studies. Correlated outcomes are usually categorized into two
types: clustered and longitudinal. The former arises from trials where
randomization is performed at the level of some aggregates (e.g., clinics) of
research subjects (e.g., patients). The latter arises when the outcome is
measured at multiple time points during follow-up from each subject. A key
difference between these two types is that for a clustered design, subjects
within a cluster are considered exchangeable, while for a longitudinal design,
the multiple measurements from each subject are distinguished by their unique
time stamps.
Designing a randomized trial with correlated outcomes poses special
challenges and opportunities for researchers. Appropriately accounting for
correlation with different structures requires more sophisticated methodologies
for analysis and sample size calculation. In practice, researchers are also
likely to encounter correlated outcomes with a hierarchical structure.
For example, multiple levels of nested clustering (e.g., patients nested in clinics
and clinics nested in hospital systems) can occur, and such designs become
more complicated if longitudinal measurements are obtained from each
subject. Missing data leads to the challenge of “partially” observed data for
clinical trials with correlated outcomes, and its impact on the sample size
requirement depends on many factors: the number of longitudinal measurements,
the structure and strength of correlation, and the distribution of missing data.
On the other hand, researchers enjoy some additional flexibility in designing
randomized trials with correlated outcomes. When multiple levels of clustering
are involved, the level at which to perform randomization becomes a
design parameter, which can greatly impact trial administration, analysis, and
sample size requirements. This issue is explored in Chapters 5 and 6. Another
example is that in longitudinal studies, to a certain extent, researchers can
compensate for a lack of unique subjects by increasing the number of
measurements from each subject, and vice versa. This feature has profound
implications for the design of clinical trials where the cost of recruiting an
additional subject is drastically different from the cost of obtaining an
additional measurement from an existing subject. It requires researchers to
explore the trade-off between the number of subjects and the number of
measurements per subject in order to achieve the optimal power under a given
financial constraint. We explore this topic in Chapters 3 and 4.
The outline of this book is as follows. In Chapter 1 we review sample size
determination for independent outcomes. Advanced readers who are already
familiar with sample size problems can skip this chapter. In Chapter 2 we
explore sample size determination for variants of clustered trials, including
one- and two-sample trials, continuous and binary outcomes, stratified cluster
design, and nonparametric approaches. In Chapter 3 we review sample size
methods based on summary statistics (such as individually estimated means
or slopes) obtained from longitudinal outcomes. In Chapter 4 we present
sample size determination based on GEE approaches for various types of
correlated outcomes, including continuous, binary, and count. The impact of
missing data, correlation structures, and financial constraints is investigated.
In Chapter 5 we present sample size determination based on mixed-effects model
approaches for randomized clinical trials with two-level data structures.
Longitudinal and cross-sectional factorial designs are explored. In Chapter 6 we
further extend the mixed-effects model sample size approaches to scenarios
where three-level data structures are involved in randomized trials.
We hope this book will serve as a useful resource for biostatisticians,
clinical investigators, epidemiologists, and social scientists whose research
involves randomized trials with correlated outcomes. While the chapters jointly
address the overarching theme of sample size determination for correlated
outcomes under such settings, each is written in a self-contained manner so that
readers can explore specific topics relevant to their research projects without
having to refer to other chapters.
We give special thanks to Dr. Mimi Y. Kim for her enthusiastic support
in providing critical reviews and suggestions, examples, edits, and corrections
throughout the chapters. Without her input, this book would not have been in
its present form. We also thank Acquisitions Editor David Grubbs for
providing the opportunity to work on this book, and Production Manager Suzanne
Lassandro for her outstanding support in publishing this book. In addition,
we are grateful for the support of the University of Texas Southwestern Medical
Center and the Albert Einstein College of Medicine.
Chul Ahn, PhD
Moonseong Heo, PhD
Song Zhang, PhD

List of Figures
1.1 Sample size estimation for a one–sided test in a one–sample problem
4.1 Numerical study to explore the relationship between s_t^2 and ρ, under
the scenario of complete data and various values of θ from the damped
exponential family. θ = 1 corresponds to AR(1) and θ = 0 corresponds to CS.
The measurement times are normalized such that tm − t1 = 1. Hence ρ1m = ρ
under all values of θ.
4.2 Numerical study to explore the relationship between s_t^2 and ρ, under
the scenario of incomplete data and various values of θ from the damped
exponential family. θ = 1 corresponds to AR(1) and θ = 0 corresponds to CS.
IM and MM represent the independent and monotone missing pattern,
respectively. The measurement times are normalized such that tm − t1 = 1.
Hence ρ1m = ρ under all values of θ.
4.3 A numerical study to explore n{m+1}/n{m} under missing data and
different correlation structures. The vertical axis is n{m+1}/n{m}.
“Complete” indicates the scenario of complete data. “IM” and “MM” indicate
the independent and monotone missing patterns, respectively, with marginal
observant probabilities computed by δj = 1 − 0.3 ∗ (j − 1)/(m − 1).
4.4 Different trends in the marginal observant probabilities. δ1
approximately follows a linear trend. δ2 is relatively steady initially but
drops quickly afterward. δ3 drops quickly from the beginning but plateaus.
5.1 Geometrical representations of fixed parameters in model (5.12) for a
parallel-arm longitudinal cluster randomized trial.
5.2 Geometrical representations of fixed parameters in model (5.31) for a
2-by-2 factorial longitudinal cluster randomized trial.

List of Tables
2.1 Proportion of infection (yi/mi) from n = 29 subjects (clusters)
2.2 Distribution of the number of infected sites (mi)
2.3 Stepped wedge design, where C represents control and I represents
intervention
4.1 Sample sizes under various scenarios
5.1 Sample size and power for detecting a main effect δ(2) in model (5.3)
when randomizations occur at the second level (two-sided significance
level α = 0.05)
[...] model (5.8) when randomizations occur at the first level (two-sided
significance level α = 0.05)
5.3 Sample size and power for detecting an effect δ(f) on slope differences
in a fixed-slope model (5.12) with rτ = 0 when randomizations occur at the
second level (two-sided significance level α = 0.05)
[...] differences in a random-slope model (5.4.5) with rτ = 0.1 when
randomizations occur at the second level (two-sided significance
level α = 0.05)
5.5 Sample size and power for detecting a main effect δ(e) at the end of
study in a fixed-slope model (5.22) when randomizations occur at the second
level (two-sided significance level α = 0.05)
5.6 Sample size and power for detecting a two-way interaction XZ effect
δXZ(2) in model (5.25) for a 2-by-2 factorial design [...]
[...] XZ effect δXZ(1) in model (5.28) for a 2-by-2 factorial design when
randomizations occur at the first level (two-sided [...]
5.8 Sample size and statistical power for detecting a three-way interaction
XZT effect δXZT in model (5.31) for a 2-by-2 factorial design when
randomizations occur at the second level (two-sided significance
level α = 0.05)
5.9 Sample size and statistical power for detecting a main effect |p1 − p0|
on binary outcome in model with m = 2 (5.34) [...]
[...] |p1 − p0| on binary outcome in model with m = 1 (5.34) when
randomizations occur at the first level (two-sided significance
level α = 0.05)
[...] model (6.4) when randomizations occur at the third level with
ρ2 = 0.05 (two-sided significance level α = 0.05)
[...] model (6.9) when randomizations occur at the second level [...]
[...] model (6.13) when randomizations occur at the first level [...]
[...] differences in a three-level fixed-slope model (6.17) with rτ = 0 when
randomizations occur at the third level (two-sided [...]
6.5 Sample size and power for detecting an effect δ(r) on slope differences
in a three-level random-slope model (6.22) with rτ = 0.1 when randomizations
occur at the third level (two-sided significance level α = 0.05)
6.6 Sample size and power for detecting a main effect δ(e) at the end of
study in a three-level fixed-slope model (6.28) when randomizations occur at
the third level (two-sided significance level α = 0.05)
[...] XZ effect δXZ(3) in model with m = 3 (6.31) for a 2-by-2 factorial
design when randomizations occur at the third level [...]
[...] XZ effect δXZ(2) in model with m = 2 (6.31) for a 2-by-2 factorial
design when randomizations occur at the second level with ρ2 = 0.05
(two-sided significance level α = 0.05)
[...] XZ effect δXZ(1) in model with m = 1 (6.31) for a 2-by-2 factorial
design when randomizations occur at the first level [...]
6.10 Sample size and power for detecting a three-way interaction XZT effect
δXZT in model (6.38) for a 2-by-2 factorial design when randomizations occur
at the third level (two-sided [...]
[...] randomizations occur at third level (two-sided significance
level α = 0.05)
[...] randomizations occur at second level (two-sided significance
level α = 0.05)
[...] |p1 − p0| on binary outcome in model with m = 1 (6.41) when
randomizations occur at first level (two-sided significance level α = 0.05)

1
Sample Size Determination for Independent
Outcomes
1.1 Introduction
One of the most common questions a statistician is asked by clinical
investigators is “How many subjects do I need?” Researchers are often
surprised to find out that the required sample size depends on a number of
factors and that they have to provide information to a statistician before they
can get an answer. Clinical research is judged to be valid not by the results
but by how it is designed and conducted. The cliché “do it right or do it over”
is particularly apt in clinical research.
One of the most important aspects of clinical research design is sample
size estimation. In planning a clinical trial, it is necessary to determine the
number of subjects to be recruited in order to achieve
sufficient power to detect the hypothesized effect. The ICH E9 guidance [1]
states: “The number of subjects in a clinical trial should always be large
enough to provide a reliable answer to the questions addressed. This number
is usually determined by the primary objective of the trial. If the sample size is
determined on some other basis, then this should be made clear and justified.
For example, a trial sized on the basis of safety questions or requirements
or important secondary objectives may need larger or smaller numbers of
subjects than a trial sized on the basis of the primary efficacy question.”
Sample size in clinical trials must be carefully estimated if the results are to
be credible. If the number of subjects is too small, even a well–conducted
trial will have little chance of detecting the hypothesized effect. Ideally, the
sample size should be large enough to have a high probability of detecting
a clinically important difference between treatment groups and of showing it
to be statistically significant if such a difference really exists. If the number
of subjects is too large, the clinical trial may yield statistical significance
for an effect of little clinical importance. Conversely, if the number of
subjects is too small, the clinical trial may fail to reach statistical
significance even when a large, clinically important difference exists.
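The dependence of power on sample size described above can be illustrated with a minimal sketch. It assumes a two-sided two-sample z-test for a difference in means with a known common standard deviation and equal group sizes. The function name `power_two_sample_z` and the numerical inputs are illustrative assumptions, not taken from the text.

```python
from math import sqrt
from statistics import NormalDist


def power_two_sample_z(n_per_group, delta, sigma, alpha=0.05):
    """Approximate power of a two-sided two-sample z-test.

    Assumptions (illustrative only): equal group sizes, a known common
    standard deviation `sigma`, and a true mean difference `delta`.
    """
    z = NormalDist()  # standard normal distribution
    z_crit = z.inv_cdf(1 - alpha / 2)  # two-sided critical value
    se = sigma * sqrt(2.0 / n_per_group)  # standard error of the mean difference
    shift = delta / se  # standardized true difference
    # Probability that the test statistic falls in either rejection region.
    return z.cdf(shift - z_crit) + z.cdf(-shift - z_crit)


if __name__ == "__main__":
    # A moderate effect (delta = 0.5 SD) with a small and a larger trial.
    for n in (10, 64):
        print(n, round(power_two_sample_z(n, delta=0.5, sigma=1.0), 3))
    # A very small effect (delta = 0.05 SD) with a very large trial.
    print(10000, round(power_two_sample_z(10000, delta=0.05, sigma=1.0), 3))
```

Under these assumptions, 10 subjects per group give a power of roughly 0.2 for a half-standard-deviation difference, while 64 per group give roughly 0.8. With 10,000 per group, even a difference of 0.05 standard deviations is detected with high probability, which is the sense in which an oversized trial can declare a clinically unimportant effect significant.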
When designing a study, an investigator should consider constraints such
as time, cost, and the number of available subjects. However,
these constraints should not dictate the sample size. There is no reason to
carry out a study that is too small, only to come up with results that are
inconclusive, since the investigator will then need to carry out another study
to confirm or refute the initial results. Selecting an appropriate sample size is
a crucial step in the design of a study. A study with an insufficient sample size
may not have sufficient statistical power to detect meaningful effects and may
not produce reliable answers to important research questions. Krzywinski and
Altman [2] note that the ability to detect an experimental effect is weakened in
studies that do not have sufficient power. Choosing an appropriate sample
size increases the chance of detecting a clinically meaningful effect and ensures
that the study is both ethical and cost-effective.
Sample size is usually estimated by precision analysis or power analysis.
In precision analysis, sample size is determined by the standard error or the
margin of error at a fixed significance level. Precision analysis offers a
simple way to estimate the sample size [3]. In power analysis,
sample size is estimated to achieve a desired power for detecting a clinically
or scientifically meaningful difference at a fixed type I error rate. Power
analysis is the most commonly used method for sample size estimation in clinical
research. The sample size calculation requires assumptions that typically
cannot be tested until the data have been collected from the trial. Sample size
calculations are thus inherently hypothetical.
1.2 Precision Analysis
Sample size estimation is needed for a study in which the goal is to estimate
an unknown parameter with a certain degree of precision. Thus, some key
decisions in planning a study are “How precise will the parameter estimate
be if I select a particular sample size?” and “How large a sample size do I
need to attain a desirable level of precision?” What we are essentially saying
is that we want the confidence interval to be of a certain width, in which the
100(1−α)% confidence level reflects the probability of including the true (but
unknown) value of the parameter. Since the precision is determined by the
width of the confidence interval, the goal of precision analysis is to determine
the sample size that allows the confidence interval to be within a pre-specified
width. The narrower the confidence interval is, the more precise the parameter
inference is. Confidence interval estimation provides a convenient alternative
to significance testing in most situations. The confidence interval approach
is equivalent to the method of hypothesis testing. That is, if the confidence
interval does not include the parameter value under the null hypothesis, the
null hypothesis is rejected at a two–sided significance level of α. For example,
consider the hypothesis of no difference between means (µ1 and µ2). The
method of hypothesis testing rejects the hypothesis H0 : µ1 − µ2 = 0 at
the two–sided significance level of α if and only if the 100(1 − α)% confidence
interval for the mean difference (µ1−µ2) does not include the value zero. Thus,
the significance test can be performed with the confidence interval approach.
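The equivalence between the two-sided test and the confidence interval can be checked numerically. The sketch below assumes a known standard error for the estimated mean difference; the function names and the input values are hypothetical and serve only as an illustration.

```python
from statistics import NormalDist


def z_test_rejects(diff, se, alpha=0.05):
    """Two-sided z-test of H0: mu1 - mu2 = 0 for an estimated difference `diff`
    with known standard error `se`."""
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    return abs(diff / se) > z_crit


def ci_excludes_zero(diff, se, alpha=0.05):
    """True when the 100(1 - alpha)% confidence interval for mu1 - mu2 excludes 0."""
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    lower, upper = diff - z_crit * se, diff + z_crit * se
    return lower > 0 or upper < 0


# Hypothetical estimated differences, all with standard error 2.0.
checks = [(d, z_test_rejects(d, 2.0), ci_excludes_zero(d, 2.0))
          for d in (-6.0, -3.0, 0.0, 3.0, 6.0)]
for diff, rejected, excluded in checks:
    print(diff, rejected, excluded)
```

In every row the two decisions agree, which is the equivalence described above.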
1.2.1 Continuous Outcomes
Suppose that Y1, . . . , Yn are independent and identically distributed normal
random variables with mean µ and variance $\sigma^2$. The parameter µ can be
estimated by the sample mean $\bar{y} = \sum_{i=1}^{n} Y_i / n$. When $\sigma^2$
is known, the 100(1 − α)% confidence interval is
$$\bar{y} \pm z_{1-\alpha/2} \frac{\sigma}{\sqrt{n}},$$
where z1−α/2 is the 100(1−α/2)th percentile of the standard normal distribu-
tion. Note that the sample size estimate based on precision analysis depends
on the type I error rate, not on the type II error rate. The maximum half
width of the confidence interval is called the maximum error of an estimate
of the unknown parameter. Suppose that the maximum error of µ is δ. Then,
the required minimum sample size is the smallest integer that is greater than
or equal to n solved from the following equation:
$$z_{1-\alpha/2} \frac{\sigma}{\sqrt{n}} = \delta.$$
Thus, the required sample size is the smallest integer that is greater than or
equal to n:
$$n = \frac{z_{1-\alpha/2}^2 \sigma^2}{\delta^2}. \tag{1.1}$$
From Equation (1.1), we can obtain the required sample size once the
maximum error or the width of the 100(1 − α)% confidence interval of µ is
specified.
1.2.1.1 Example
Suppose that a clinical investigator is interested in estimating how much re-
duction will be made on the fasting serum–cholesterol level with administra-
tion of a new cholesterol–lowering drug for 6 months among recent Hispanic
immigrants with a given degree of precision. Suppose that the standard de-
viation (σ) for reduction in cholesterol level equals 40 mg/dl. We would like
to estimate the minimum sample size needed to estimate the reduction in
fasting serum–cholesterol level if we require that the 95% confidence interval
for reduction in cholesterol level is no wider than 20 mg/dl. The 100(1 − α)%
confidence interval for true reduction in fasting serum–cholesterol level is
$$\bar{y} \pm z_{1-\alpha/2} \frac{\sigma}{\sqrt{n}},$$
where ȳ is the mean change in fasting serum–cholesterol level after administration
of a drug, and $z_{1-\alpha/2}$ is the 100(1 − α/2)th percentile of the standard
normal distribution. The width of a 95% confidence interval is
$$2 \cdot z_{1-\alpha/2} \frac{\sigma}{\sqrt{n}} = 2 \cdot 1.96 \cdot \frac{40}{\sqrt{n}}.$$
We want the width of the 95% confidence interval to be no wider than 20
mg/dl. The required sample size is the smallest integer satisfying
$$n \ge \frac{4 \cdot (1.96)^2 (40)^2}{(20)^2} = 61.5.$$
In order for a 95% confidence interval of reduction
in cholesterol level to be no wider than 20 mg/dl, we need at least 62 subjects
when the standard deviation for reduction in cholesterol level equals 40
mg/dl.
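The calculation in this example can be reproduced with a few lines of Python. This is a minimal sketch of Equation (1.1); the function name `n_for_mean_precision` is an assumption made for illustration, and the maximum error is taken as half of the 20 mg/dl interval width.

```python
from math import ceil
from statistics import NormalDist


def n_for_mean_precision(sigma, max_error, alpha=0.05):
    """Smallest n with z_{1-alpha/2} * sigma / sqrt(n) <= max_error (Equation 1.1)."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return ceil((z * sigma / max_error) ** 2)


# Cholesterol example: sigma = 40 mg/dl, interval width 20 mg/dl, so delta = 10 mg/dl.
n_cholesterol = n_for_mean_precision(sigma=40, max_error=10)
print(n_cholesterol)
```

Halving the maximum error in this sketch roughly quadruples the required sample size, since n is proportional to 1/δ².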
1.2.2 Binary Outcomes
The study goal may be based on finding a suitably narrow confidence interval
for the statistic of interest at a given significance level (α), where the
significance level is usually considered as the maximum probability of type I error
that can be tolerated. We may want to know how many subjects are required
for the 100(1 − α)% confidence interval to be a certain width.
Suppose that Y1, . . . , Yn are independent and identically distributed
Bernoulli random variables with mean p = E(Yi), (i = 1, . . . , n). The parameter
p can be estimated by the sample mean $\hat{p} = \sum_{i=1}^{n} Y_i / n$. For large n, p̂ is
asymptotically normal with mean p and variance p(1−p)/n. The 100(1−α)%
confidence interval for p is
$$\hat{p} \pm z_{1-\alpha/2} \sqrt{\frac{\hat{p}(1 - \hat{p})}{n}}.$$
Suppose that the maximum error of p is δ. Then, the sample size can be
estimated by
$$z_{1-\alpha/2} \sqrt{\frac{\hat{p}(1 - \hat{p})}{n}} = \delta.$$
Thus, the required sample size is
$$n = \frac{z_{1-\alpha/2}^2 \, \hat{p}(1 - \hat{p})}{\delta^2}. \tag{1.2}$$
We can estimate the sample size from Equation (1.2) once the maximum error
or the width of the 100(1 − α)% confidence interval for p is specified. There
are a number of alternative ways to estimate the confidence interval for a
binomial proportion [4].
1.2.2.1 Example
Suppose that a clinical investigator is interested in conducting a clinical trial
with a new cancer drug to estimate the response rate with a maximum error
of 20%. In oncology, the response rate (RR) is generally defined as the
proportion of patients whose tumor completely disappears (termed a complete
response, CR) or shrinks more than 50% after treatment (termed a partial
response, PR). In simpler terms, RR = PR + CR. An investigator expects the
response rate of a new cancer drug to be 30%. How many patients are needed
to achieve a maximum error of 20%? Let p̂ be the estimate of the response
rate. The maximum error of the response rate is $z_{1-\alpha/2}\sqrt{\hat{p}(1 - \hat{p})/n}$. With
the guessed value of p̂ = 0.3, a maximum error of p is $z_{1-\alpha/2}\sqrt{0.3 \cdot 0.7/n}$.
Thus, we need $z_{1-\alpha/2}\sqrt{0.3 \cdot 0.7/n} \le 0.2$, or n ≥ 21. That is, we need at least
21 subjects to obtain a maximum error ≤ 20%. When we do not know the
value of p, a conservative approach is to use p̂ that yields the maximum error.
The maximum error of p occurs when p̂ = 0.5. So, a conservative maximum
error of p is $z_{1-\alpha/2}\sqrt{0.5 \cdot 0.5/n} = z_{1-\alpha/2} \cdot 0.5/\sqrt{n}$. Thus, $1.96 \cdot 0.5/\sqrt{n} \le 0.2$
at a 5% significance level. Therefore, the required sample size is n = 25. An
investigator should recruit at least 25 subjects to achieve a maximum error of
20% in the response rate estimation.
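Both calculations in this example follow from Equation (1.2). The sketch below is one way to reproduce them; the function name `n_for_proportion_precision` is hypothetical, and a 5% significance level is assumed as in the example.

```python
from math import ceil
from statistics import NormalDist


def n_for_proportion_precision(p_guess, max_error, alpha=0.05):
    """Smallest n with z_{1-alpha/2} * sqrt(p(1-p)/n) <= max_error (Equation 1.2)."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return ceil(z ** 2 * p_guess * (1 - p_guess) / max_error ** 2)


n_expected = n_for_proportion_precision(p_guess=0.3, max_error=0.2)      # guessed rate of 30%
n_conservative = n_for_proportion_precision(p_guess=0.5, max_error=0.2)  # worst case p = 0.5
print(n_expected, n_conservative)
```

The conservative choice p̂ = 0.5 always gives the largest n because p(1 − p) is maximized at 0.5.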
The larger the sample size, the more precise the estimate of the parameter
will be if all the other factors are equal. An investigator should specify the
degree of precision that the study aims for. A trial costs more and takes longer
as its size increases. In order to estimate the sample size using precision
analysis, we need to decide how large the maximum error of the unknown
parameter is or how wide the confidence interval for the unknown parameter
is, and we need to know the formula for the relevant maximum error.
1.3 Power Analysis
Power analysis uses two types of errors (type I and II errors) for sample size
estimation while precision analysis uses only one type of error (type I error)
for sample size estimation. Power analysis tests the null hypothesis at a pre-
determined level of significance with a desired power.
1.3.1 Information Needed for Power Analysis
A clinical trial that is conducted without attention to sample size or power
information takes the risks of either failing to detect clinically meaningful
differences (i.e., type II error) or using an unnecessarily excessive number of
subjects for a study. Either case fails to adhere to the Ethical Guidelines of
the American Statistical Association, which say, “Avoid the use of excessive
or inadequate number of research subjects by making informed recommendations
for study size” [5]. The sample size estimate is important for economic
and ethical reasons [6]. An oversized clinical trial exposes more subjects than
necessary to a potentially harmful trial and uses more resources than
necessary. An undersized clinical trial exposes the subjects to a
potentially harmful trial and leads to a waste of resources without producing
useful results. The sample size estimate will allow the estimation of total cost
of the proposed study. While the exact final number that will be used for anal-
ysis will be unknown due to missing information such as lack of demographic
information and clinical information, it is still desirable to determine a target
sample size based on the proposed study design. In this section, we describe
the general information needed to estimate the sample size for the trial.
1. Choose the primary endpoint
The primary endpoint should be chosen so that the primary objective of the
trial can be assessed, and the primary endpoint is generally used for sample
size estimation. The primary endpoint measures the outcome that will answer the
primary question being asked by a trial. Suppose that the primary hypothesis
is to test whether the new cancer drug yields longer overall survival than the
standard cancer drug. In this case, the primary endpoint is overall survival.
The sample size for a trial is determined by the power needed to detect a
clinically meaningful difference in overall survival at a given significance level.
The secondary hypothesis is to investigate other relevant questions from the
same trial. For example, the secondary hypothesis is to test whether the new
cancer drug produces better quality of life than the standard cancer drug, or
whether the new cancer drug yields longer progression–free survival than the
standard cancer drug.
The sample size calculation depends on the type of primary endpoint. The
variable type of the primary outcome must be defined before sample size and
power calculations can be conducted. The variable type may be continuous,
categorical, ordinal, or survival. Categorical variables may have only two cat-
egories or more than two categories.
• A quantitative (or continuous) outcome representing a specific measure (e.g.,
total cholesterol, quality of life, or blood pressure). Mean and median can
be used to compare the primary endpoint between treatment groups.
• A binary outcome indicating occurrence of an event (e.g., the occurrence of
myocardial infarction, or the occurrence of recurrent disease). Odds ratio,
risk difference, and risk ratio can be used to compare the primary endpoint
between treatment groups.
• Survival outcome for the time to occurrence of an event of interest (e.g., the
time from study entry to death, or time to progression). A Kaplan–Meier
survival curve is often used to graphically display the time to the event, and
log–rank test or Cox regression analysis is frequently used to test if there is
a significant difference in the treatment effect between treatment groups.
2. Determine the hypothesis of interest
The primary purpose of a clinical trial is to address a scientific hypothesis,
which is usually related to the evaluation of the efficacy and safety of a drug
product. To address a hypothesis, different statistical methods are used
depending on the type of question to be answered. Most often the hypothesis is
related to the effect of one treatment as compared to another. For example,
one trial could compare the effectiveness of a new drug to that of a standard
drug. Yet the specific comparison to be performed will depend on the hypoth-
esis to be addressed. Let µ1 and µ2 be the mean responses of a new drug and
a standard drug, respectively.
• A superiority test is designed to detect a meaningful difference in mean
response between a standard drug and a new drug [7]. The primary objective
is to show that the mean response of a new drug is different from that of a
standard drug.
H0 : µ1 = µ2 versus H1 : µ1 ≠ µ2
The null hypothesis (H0) says that the two drugs are not different with
respect to the mean response (µ1 = µ2). The alternative hypothesis (H1)
says that the two drugs are different with respect to the mean response
(µ1 ≠ µ2). The statistical test is a two–sided test since there are two chances
of rejecting the null hypothesis (µ1 > µ2 or µ1 < µ2) with each side allocated
an equal amount of the type I error of α/2.
If the alternative hypothesis is µ1 > µ2 or µ1 < µ2 instead of µ1 ≠ µ2, then
the statistical test is referred to as a one–sided test since there is only one
chance of rejecting the null hypothesis with one side allocated the type I
error of α.
• An equivalence test is designed to confirm the absence of a meaningful dif-
ference between a standard drug and a new drug. The primary objective is
to show that the mean responses to two drugs differ by an amount that is
clinically unimportant. This is usually demonstrated by showing that the
absolute difference in mean responses between drugs is likely to lie within
an equivalence margin (∆) of clinically acceptable differences.
H0 : |µ1 − µ2| ≥ ∆ versus H1 : |µ1 − µ2| < ∆
The null hypothesis (H0) says that the two drugs are different with respect
to the mean response (|µ1 − µ2| ≥ ∆). The alternative hypothesis (H1)
says that the two drugs are not different with respect to the mean response
(|µ1 − µ2| < ∆). In an equivalence test, an investigator wants to test if
the difference between a new drug and a standard drug is of no clinical
importance. This is to test for equivalence of two drugs.
The null hypothesis is expressed as a union (µ1 − µ2 ≥ ∆ or µ1 − µ2 ≤ −∆)
and the alternative hypothesis (H1) as an intersection (−∆ < µ1 − µ2 <
∆). Each component of the null hypothesis needs to be rejected to conclude
equivalence.

• A non–inferiority test is designed to show that a new drug is not less effective
than a standard drug by more than ∆, the margin of non–inferiority. The
null and alternative hypotheses can be specified as:
H0 : µ1 − µ2 ≤ −∆ versus H1 : µ1 − µ2 > −∆
The null hypothesis (H0) says that a new drug is inferior to a standard drug
with respect to the mean response. The alternative hypothesis (H1) says
that a new drug is non–inferior to a standard drug with respect to the mean
response. That is, the alternative hypothesis of a non–inferiority trial states
that a standard drug may indeed be more effective than a new drug, but
no more than ∆. In phase III clinical trials that compare a new drug with
a standard drug, non–inferiority trials are more common than equivalence
trials since it is only the non–inferiority limit that is usually of interest. This
is to test for non–inferiority of the new drug.
Choice of hypothesis depends on which scientific question an investigator is
trying to answer. All the above hypothesis tests are useful in the development
of drugs. In comparison studies with a standard drug, a non–inferiority trial is
used to demonstrate that a new drug provides at least the same benefit to the
subject as a standard drug. Non–inferiority trials are commonly used when a
new drug is easier to administer, less expensive, and less toxic than a standard
drug. Equivalence trials are used to show that a new drug is identical (within
an acceptable range) to a standard drug. This is used in the registration and
approval of biosimilar drugs that are shown to be equivalent to their branded
reference drugs [8]. Most equivalence trials are bioequivalence trials that aim
to compare a generic drug with the original branded reference drug.
3. Determine ∆
Sample size calculation depends on the hypothesis of interest. For a superiority
test, the necessary sample size depends on the clinically meaningful difference
(∆). In superiority trials, fewer subjects will be needed for a larger value of
∆ while more subjects will be needed for a smaller value of ∆. For instance,
we can detect a 40% difference in efficacy with a modest number of subjects.
However, a larger number of subjects will be needed to reliably detect a 10%
difference in efficacy. Because sample size is inversely related to the square of
∆, even a slightly misspecified difference can lead to a large change in the
sample size. Clinically meaningful differences are commonly specified using
one of two approaches. One is to select the drug effect deemed important to
detect, and the other is to calculate the sample size according to the best
guess concerning the true effect of the drug [9].
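As a rough illustration of the inverse-square relationship, assume that the sample size has the form n(∆) = c/∆², where the constant c (a symbol introduced here only for illustration) collects the variance and the normal percentiles. Under that assumption, halving ∆ quadruples the sample size, and detecting a 10% difference requires about 16 times as many subjects as detecting a 40% difference, all else being equal:

```latex
\[
\frac{n(\Delta/2)}{n(\Delta)} = \frac{c/(\Delta/2)^2}{c/\Delta^2} = 4,
\qquad
\frac{n(0.10)}{n(0.40)} = \left(\frac{0.40}{0.10}\right)^2 = 16.
\]
```

For binary outcomes the variance also changes with the assumed proportions, so the ratio of 16 should be read as an approximation.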
For an equivalence test, the required sample size depends on the margin of
clinical equivalence. In an equivalence test, the equivalence margin of clinically
acceptable difference (∆) depends on the disease being studied. For example,
an absolute difference of 1% is often used as the clinically meaningful
difference in thrombolytic trials while a 20% difference is considered as clinically
meaningful in most other situations including migraine headache [10].
Bioequivalence trials aim to show the equivalent pharmacokinetic profile through
the most commonly used pharmacokinetic variables such as area under the
curve (AUC) and maximum concentration (Cmax). Average bioequivalence is
widely used for comparison of a generic drug with the original branded drug.
The 80/125 rule is currently used as regulation for the assessment of average
bioequivalence [11]. For average bioequivalence, the FDA [11] recommends
that the geometric means ratio between the test drug and the reference drug
is within 80% and 125% for the bioavailability measures (AUC and Cmax).
For a non–inferiority test, the necessary sample size depends on the up-
per bound for non–inferiority. Setting the non–inferiority margin is a major
issue in designing a non–inferiority trial. The Food and Drug Administration
[12] and the European Medicines Agency [13] issued guidances on the choice
of non–inferiority margin. The choice of the non–inferiority margin needs to
take account of both statistical reasoning and clinical judgement. An appro-
priate selection of non–inferiority margin should provide assurance that a new
drug has a clinically relevant superiority over placebo, and a new drug is not
substantially inferior to a standard drug, which results in a tighter margin.
The clinically or scientifically meaningful margin (∆) needs to be specified
to estimate the number of subjects for the trial since the purpose of the sample
size estimation is to provide sufficient power to reject the null hypothesis when
the alternative hypothesis is true.
In this book, we restrict the sample size estimation to a superiority test,
which is most commonly used in clinical trials. Julious [7, 14, 15] and Chow
et al. [3] provided general sample size formulas for equivalence trials and non–
inferiority trials.
4. Determine the variance of the primary endpoint
The variance of the primary endpoint is usually unknown in advance. In cross-
sectional studies, the variance or the standard deviation is generally obtained
from either previous studies or pilot studies. However, for correlated outcomes
such as clustered outcomes or repeated measurement outcomes, the variance of
the primary endpoint generally needs to be estimated utilizing various sources
of information such as the missing proportion, the correlation among measurements,
and the number of measurements. A detailed description of the estimation
of the variance for correlated outcomes will be given in later chapters. A large
variance will lead to a large sample size for a study. That is, as the variance
increases, the sample size increases.
5. Choose type I error and power
Type I error (α) is the probability of rejecting the null hypothesis when the
null hypothesis is actually true. Type II error (β) is the probability of not
rejecting the null hypothesis when it is actually false. The aim of the sample
size calculation is to estimate the minimal sample size required to meet the
objectives of the study for a fixed probability of type I error to achieve a desired
power, which is defined as 1 − β. The power is the probability of rejecting the
null hypothesis when it is actually false. A two–sided type I error of 5% is
commonly used to reflect a 95% confidence interval for an unknown parameter,
and this is familiar to most investigators as the conventional benchmark of
5%. As α decreases, the sample size increases. For example, a study with an α
level of 0.01 requires a larger sample size than a study with an α level of 0.05.
Typically, the sample size is computed to provide a fixed level of power
under a specified alternative hypothesis. The alternative hypothesis usually
represents a minimal clinically or scientifically meaningful difference in efficacy
between treatment groups. Power (1 − β) is an important consideration in
sample size determination. Low power can cause a true difference in a clinical
outcome between study groups to go undetected. However, too much power
may make results statistically significant when results do not show a clinically
meaningful difference.
When there is a large difference such as a 100% real difference in thera-
peutic efficacy between a standard drug and a new drug, it is unlikely to be
missed by most studies. That is, type II error (β) is small when there is a large
difference in therapeutic efficacy. However, type II error is a common problem
in studies that aim to distinguish between a standard drug and a new drug
that may differ in therapeutic efficacy by only a small amount such as 1% or
5%. The number of subjects must be drastically increased to reduce type II
error when the aim is to discriminate a small difference between a standard
drug and a new drug. Otherwise, there is a high chance of incorrectly over-
looking small differences in therapeutic efficacy with an insufficient number of
subjects. Type II error (β) of 10% or 20% is commonly used for sample size
estimation. That is, the power (1 − β) of 80% or 90% is widely used for the
design of the study. The higher the power, the lower the risk of type II
error. The power increases as the sample size increases. A sufficient sample
size ensures that the study is able to reliably detect a true difference and is not
underpowered.
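The effect of α and power on the sample size can be made concrete. In the normal-approximation formulas for a two-sided test, the sample size is proportional to (z1−α/2 + z1−β)²; the sketch below tabulates that factor for common choices. The function name `z_multiplier` is an assumption made for illustration.

```python
from statistics import NormalDist


def z_multiplier(alpha, power, two_sided=True):
    """Return (z_{1-alpha/2} + z_{1-beta})^2, or (z_{1-alpha} + z_{1-beta})^2 if one-sided."""
    tail = alpha / 2 if two_sided else alpha
    z_alpha = NormalDist().inv_cdf(1 - tail)
    z_beta = NormalDist().inv_cdf(power)  # z_{1-beta}, since power = 1 - beta
    return (z_alpha + z_beta) ** 2


for alpha in (0.05, 0.01):
    for power in (0.80, 0.90):
        factor = z_multiplier(alpha, power)
        print(f"alpha={alpha:.2f} power={power:.2f} factor={factor:.2f}")
```

With the variance and ∆ held fixed, moving from 80% to 90% power at a two-sided α of 0.05 raises the factor from about 7.85 to about 10.51, that is, roughly a one-third increase in sample size.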
6. Select a statistical method for data analysis
A statistical method for sample size estimation should adequately align with
the statistical method for data analysis [16]. For example, an investigator
would like to test whether there is a significant difference in total cholesterol
levels between those who take a new drug and those who take a standard drug.
The investigator plans to analyze the data using a two–sample t–test. In this
case, a sample size calculation based on a two-group chi–square test with
dichotomization of total cholesterol levels would be inappropriate since the
statistical method used for power analysis is different from that to be used
for data analysis. Discrepancy between the statistical method for sample size
estimation and the statistical method for data analysis can lead to a sample
size that is too large or too small. The statistical method used for sample size
calculation should be the same as that used for data analysis.
1.3.2 One–Sample Test for Means
We illustrate the sample size calculation using a one–sided test through an
example. Suppose that the total cholesterol levels for male college students are
normally distributed with a mean (µ) of 180 mg/dl and a standard deviation
(σ) of 80 mg/dl. Suppose that an investigator would like to examine whether
the mean total cholesterol level of the physically inactive male college students
is higher than 180 mg/dl using a one–sided 5% significance level (α). That is,
an investigator would like to test the hypotheses: H0 : µ = µ0 = 180 mg/dl
(or µ ≤ 180 mg/dl) versus H1 : µ > 180 mg/dl assuming that the standard
deviation of the total cholesterol level is the same as that of male college
students. An investigator wants to risk a 10% chance (90% power) of failing
to reject the null hypothesis when the true mean (µ1) of the total cholesterol
level is as large as 210 mg/dl. How many subjects are needed to detect a 30
mg/dl difference in total cholesterol level from the population mean of 180
mg/dl at a one–sided 5% significance level and a power of 90%?
For α = 0.05, we would reject the null hypothesis (H0) if the average total
cholesterol level is greater than the critical value (C) in Figure 1.1, where
$C = \mu_0 + z_{1-\alpha} \cdot \sigma/\sqrt{n} = 180 + 1.645 \cdot 80/\sqrt{n}$. If the true mean is 210 mg/dl
with a power of 90% (β = 0.1), we would not reject the null hypothesis when
the sample average is less than $C = \mu_1 + z_{\beta} \cdot \sigma/\sqrt{n} = 210 - 1.282 \cdot 80/\sqrt{n}$.
The sample size (n) can be estimated by setting two equations equal to each
other:
$$180 + 1.645 \cdot 80/\sqrt{n} = 210 - 1.282 \cdot 80/\sqrt{n}.$$
Therefore, the required number of subjects is
$$n = \frac{(1.645 + 1.282)^2 \cdot 80^2}{(180 - 210)^2} \approx 60.9,$$
which is rounded up to 61 subjects.
In general, the estimated sample size for a one–sided test for testing H0 :
µ = µ0 versus H1 : µ = µ1 > µ0 with a significance level of α and a power of 1 − β
is the smallest integer that is larger than or equal to n satisfying the following
equation
$$n = \frac{(z_{1-\alpha} + z_{1-\beta})^2 \sigma^2}{(\mu_0 - \mu_1)^2}. \tag{1.3}$$
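Equation (1.3) is straightforward to evaluate in code. The following sketch applies it to the cholesterol example above; the function name `n_one_sample_mean_one_sided` is hypothetical, and the percentiles are computed exactly rather than rounded to 1.645 and 1.282.

```python
from math import ceil
from statistics import NormalDist


def n_one_sample_mean_one_sided(mu0, mu1, sigma, alpha=0.05, power=0.90):
    """Sample size from Equation (1.3) for a one-sided, one-sample test of a mean."""
    z_alpha = NormalDist().inv_cdf(1 - alpha)  # z_{1-alpha}
    z_beta = NormalDist().inv_cdf(power)       # z_{1-beta}
    return ceil((z_alpha + z_beta) ** 2 * sigma ** 2 / (mu0 - mu1) ** 2)


# Cholesterol example: mu0 = 180, mu1 = 210, sigma = 80 mg/dl.
n_inactive = n_one_sample_mean_one_sided(mu0=180, mu1=210, sigma=80)
print(n_inactive)
```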
We will show how the sample size can be estimated for a two–sided one–
sample test. Let n be the number of subjects. Let Yi denote the response for
subject i, (i = 1, . . . , n), and ȳ be the sample mean. We assume that the Yi's are
independent and normally distributed random variables with mean µ and
variance $\sigma^2$. Suppose that we want to test the hypotheses H0 : µ = µ0 versus
H1 : µ = µ1 ≠ µ0.

FIGURE 1.1
Sample size estimation for a one–sided test in a one–sample problem
When $\sigma^2$ is known, we reject the null hypothesis at the significance level
α if
$$\left| \frac{\bar{y} - \mu_0}{\sigma/\sqrt{n}} \right| > z_{1-\alpha/2},$$
where z1−α/2 is the 100(1 − α/2)th percentile of the standard normal distri-
bution. Under the alternative hypothesis (H1 : µ = µ1), the power is given
by
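$$\Phi\left( \frac{\sqrt{n}(\mu_1 - \mu_0)}{\sigma} - z_{1-\alpha/2} \right) + \Phi\left( -\frac{\sqrt{n}(\mu_1 - \mu_0)}{\sigma} - z_{1-\alpha/2} \right),$$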
where Φ is the cumulative standard normal distribution function. By ignor-
ing the small value of the second term in the above equation, the power is
approximated by the first term. Thus, the sample size required to achieve the
power of 1 − β can be obtained by solving the following equation
$$\frac{\sqrt{n}(\mu_1 - \mu_0)}{\sigma} - z_{1-\alpha/2} = z_{1-\beta}.$$
The required sample size is the smallest integer that is larger than or equal
to n satisfying the following equation
$$n = \frac{(z_{1-\alpha/2} + z_{1-\beta})^2 \sigma^2}{(\mu_1 - \mu_0)^2}. \tag{1.4}$$
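To see why the second term of the power expression can usually be ignored, both terms can be evaluated separately. The sketch below does this for hypothetical inputs (n = 50, µ0 = 0, µ1 = 5, σ = 10) that are not taken from the text; the function name `power_terms_two_sided` is also an assumption.

```python
from math import sqrt
from statistics import NormalDist


def power_terms_two_sided(n, mu0, mu1, sigma, alpha=0.05):
    """Return the two terms of the two-sided power expression for a one-sample z-test."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    shift = sqrt(n) * (mu1 - mu0) / sigma
    main_term = std_normal.cdf(shift - z_crit)    # rejection in the direction of mu1
    small_term = std_normal.cdf(-shift - z_crit)  # rejection in the opposite direction
    return main_term, small_term


main_term, small_term = power_terms_two_sided(n=50, mu0=0, mu1=5, sigma=10)
print(f"main term = {main_term:.4f}, second term = {small_term:.2e}")
```

For these inputs the second term is many orders of magnitude smaller than the first, so the power is essentially the first term. When µ1 < µ0 the roles of the two terms are reversed.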

If the population variance $\sigma^2$ is unknown, $\sigma^2$ can be estimated by the
sample variance $s^2 = \sum_{i=1}^{n} (y_i - \bar{y})^2/(n - 1)$, which is an unbiased estimator
of $\sigma^2$. For large n, we reject the null hypothesis H0 : µ = µ0 at the significance
level α if
$$\left| \frac{\bar{y} - \mu_0}{s/\sqrt{n}} \right| > z_{1-\alpha/2}.$$
Therefore, the sample size estimates for a one–sided test and a two–sided test
can be obtained by replacing $\sigma^2$ by $s^2$ in Equations (1.3) and (1.4).
1.3.2.1 Example
Consider the design of a single-arm psychiatric study that evaluates the effect
of a test drug on cognitive functioning of children with mental retardation
before and after administration of a test drug. A pilot study shows that the
mean difference in cognitive functioning before and after taking a test drug
was 6 with a standard deviation equal to 9. We would like to estimate the
sample size needed to detect the mean difference of 6 in cognitive functioning
to achieve 80% power at a two–sided 5% significance level assuming a stan-
dard deviation of 9. Let µ denote the mean difference in cognitive functioning
between pre- and post-drug administration. The null hypothesis H0 : µ = 0
is to be tested against the alternative hypothesis H1 : µ = 6. From
Equation (1.4), $n = (1.960 + 0.842)^2 \cdot 9^2/6^2 = 17.7$. Therefore, a sample size of
18 subjects is needed to detect a change in mean difference of 6 in cognitive
functioning, assuming a standard deviation of 9 using a normal approximation
with a two–sided significance level of 5% and a power of 80%.
1.3.2.2 Example
Concerning the effect of a test drug on systolic blood pressure before and
after the treatment, a pilot study shows that the mean change in systolic blood
pressure after a 4–month administration of a test drug was 15 mm Hg with a
standard deviation of 40 mm Hg. We would like to estimate the sample size
needed to detect a change of 15 mm Hg in systolic blood pressure to achieve 80% power at
a two–sided 5% significance level assuming the standard deviation of 40 mm
Hg. From Equation (1.4), $n = (1.960 + 0.842)^2 \cdot 40^2/15^2 = 55.8$. Therefore, a
sample size of 56 subjects will have 80% power to detect a change in mean
of 15 mm Hg in systolic blood pressure, assuming a standard deviation of 40
mm Hg at a two–sided 5% significance level.
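The two examples above can be reproduced with a short function based on Equation (1.4). This is a sketch under the normal approximation used in the text; the function name `n_one_sample_mean_two_sided` is hypothetical.

```python
from math import ceil
from statistics import NormalDist


def n_one_sample_mean_two_sided(delta, sigma, alpha=0.05, power=0.80):
    """Sample size from Equation (1.4), where delta = mu1 - mu0."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # z_{1-alpha/2}
    z_beta = NormalDist().inv_cdf(power)           # z_{1-beta}
    return ceil((z_alpha + z_beta) ** 2 * sigma ** 2 / delta ** 2)


n_cognitive = n_one_sample_mean_two_sided(delta=6, sigma=9)         # cognitive functioning example
n_blood_pressure = n_one_sample_mean_two_sided(delta=15, sigma=40)  # systolic blood pressure example
print(n_cognitive, n_blood_pressure)
```

Because the normal approximation replaces the t distribution, a calculation based on the t–test would typically give a slightly larger sample size for small studies such as the first example.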
1.3.3 One–Sample Test for Proportions
Let Yi denote a binary response variable of the ith subject with p = E(Yi),
(i = 1, . . . , n), where n is the number of subjects in the trial. For example, Yi
can denote the response or non–response in cancer clinical trials, where Yi = 0
denotes non–response, and Yi = 1 denotes response, which includes either
complete response or partial response. The response rate can be estimated by
the observed proportion $\hat{p} = \sum_{i=1}^{n} Y_i/n$, where n is the number of subjects.
We illustrate the sample size calculation using the one–sided test. Suppose we
wish to test the null hypothesis H0 : p = p0 versus the alternative hypothesis
H1 : p = p1 > p0 at the one–sided significance level of α. Under the null
hypothesis, the test statistic
$$Z = \frac{\hat{p} - p_0}{\sqrt{\hat{p}(1 - \hat{p})/n}}$$
approximately has a standard normal distribution for large n. We reject the
null hypothesis at a significance level α if the test statistic Z is greater than
$z_{1-\alpha}$.
For α = 0.05, we would reject the null hypothesis (H0) if the average
response rate is greater than the critical value (C), where
$C = p_0 + z_{1-\alpha}\sqrt{p_0(1 - p_0)/n}$. If the alternative hypothesis is true, that is, if the true
response rate is p1, we would not reject the null hypothesis if the response
rate is less than $C = p_1 + z_{\beta}\sqrt{p_1(1 - p_1)/n}$.
By setting the two equations equal, we get
$$p_0 + z_{1-\alpha}\sqrt{p_0(1 - p_0)/n} = p_1 + z_{\beta}\sqrt{p_1(1 - p_1)/n}.$$
The required sample size to test H0 : p = p0 versus H1 : p = p1 > p0 at a
one–sided significance level of α and a power of 1 − β is
$$n = \frac{\left( z_{1-\alpha}\sqrt{p_0(1 - p_0)} + z_{1-\beta}\sqrt{p_1(1 - p_1)} \right)^2}{(p_1 - p_0)^2}.$$
The sample size for a two–sided test H0 : p = p0 versus H1 : p = p1 for p1 ≠ p0
can be obtained by replacing $z_{1-\alpha}$ by $z_{1-\alpha/2}$ as shown in a one–sample test
for means:
$$n = \frac{\left( z_{1-\alpha/2}\sqrt{p_0(1 - p_0)} + z_{1-\beta}\sqrt{p_1(1 - p_1)} \right)^2}{(p_1 - p_0)^2}. \tag{1.5}$$
1.3.3.1 Example
Consider the design of a single-arm oncology clinical trial that evaluates if a
new molecular therapy has at least a 40% response rate. Let p be the response
rate of a new molecular therapy. We would like to estimate the sample size
needed to test the null hypothesis H0 : p = p0 = 0.20 against the alternative
hypothesis H1 : p = p1 ≠ p0. The trial is designed based on a two–sided test
that achieves 80% power at p = p1 = 0.40 with a two–sided 5% significance
level. From Equation (1.5),
$$n = \frac{\left( 1.96\sqrt{0.2(1 - 0.2)} + 0.842\sqrt{0.4(1 - 0.4)} \right)^2}{(0.4 - 0.2)^2} = 35.8.$$
The required number of subjects is 36 to detect the difference between the
null hypothesis proportion of 0.2 and the alternative proportion of 0.4 at a
two–sided significance level of 5% and a power of 80%.
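A short sketch of Equation (1.5) reproduces this result. The function name `n_one_sample_proportion` and its `two_sided` switch are assumptions made for illustration; setting `two_sided=False` gives the one–sided formula derived above.

```python
from math import ceil, sqrt
from statistics import NormalDist


def n_one_sample_proportion(p0, p1, alpha=0.05, power=0.80, two_sided=True):
    """Sample size for a one-sample test of a proportion (Equation 1.5 when two-sided)."""
    tail = alpha / 2 if two_sided else alpha
    z_alpha = NormalDist().inv_cdf(1 - tail)
    z_beta = NormalDist().inv_cdf(power)  # z_{1-beta}
    numerator = (z_alpha * sqrt(p0 * (1 - p0)) + z_beta * sqrt(p1 * (1 - p1))) ** 2
    return ceil(numerator / (p1 - p0) ** 2)


# Oncology example: null response rate 0.20, alternative response rate 0.40.
n_response = n_one_sample_proportion(p0=0.20, p1=0.40)
print(n_response)
```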

1.3.4 Two–Sample Test for Means
Suppose that Y1i, (i = 1, ..., n1) and Y2i, (i = 1, ..., n2) represent observations
from groups 1 and 2, and Y1i and Y2i are independent and normally distributed
with means µ1 and µ2 and variances $\sigma_1^2$ and $\sigma_2^2$, respectively. Let’s consider
a one–sided test. Suppose that we want to test the hypotheses H0 : µ1 = µ2
versus H1 : µ1 > µ2.
Let $\bar{y}_1$ and $\bar{y}_2$ be the sample means of $Y_{1i}$ and $Y_{2i}$.
Assume that the variances $\sigma_1^2$ and $\sigma_2^2$ are known, and
$n_1 = n_2 = n$. Then, the Z–test statistic can be written as
\[ Z = \frac{\bar{y}_1 - \bar{y}_2}{\sqrt{\sigma_1^2/n + \sigma_2^2/n}}. \]
Under the null hypothesis (H0), the test statistic Z is normally distributed
with mean 0 and variance 1. Thus, we reject the null hypothesis if
$Z > z_{1-\alpha}$. Under the alternative hypothesis (H1), let
$\mu_1 - \mu_2 = \Delta$, which is the clinically meaningful difference to be
detected. Then, under the alternative hypothesis (H1), the expected value of
$(\bar{y}_1 - \bar{y}_2)$ is ∆, and Z follows the normal distribution with mean
$\mu^*$ and variance 1, where
$\mu^* = \Delta/\sqrt{\sigma_1^2/n + \sigma_2^2/n}$.
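Under the null hypothesis (H0),
\[ P\{Z > z_{1-\alpha} \mid H_0\} \le \alpha. \]
Similarly, under the alternative hypothesis (H1),
\[ P\{Z > z_{1-\alpha} \mid H_1\} \ge 1 - \beta. \]
That is,
\[ P\left\{ \frac{\bar{y}_1 - \bar{y}_2}{\sqrt{\sigma_1^2/n + \sigma_2^2/n}} > z_{1-\alpha} \,\middle|\, H_1 \right\} \ge 1 - \beta. \]
Under the alternative hypothesis, the expected value of $(\bar{y}_1 - \bar{y}_2)$ is ∆. Thus,
\[ P\left\{ \frac{(\bar{y}_1 - \bar{y}_2) - \Delta}{\sqrt{\sigma_1^2/n + \sigma_2^2/n}} > z_{1-\alpha} - \frac{\Delta}{\sqrt{\sigma_1^2/n + \sigma_2^2/n}} \,\middle|\, H_1 \right\} \ge 1 - \beta. \]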
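Because the standardized statistic on the left–hand side is standard normal
under H1, and by the symmetry of the normal distribution, the power is exactly
1 − β when
\[ z_{1-\alpha} - \frac{\Delta}{\sqrt{\sigma_1^2/n + \sigma_2^2/n}} = z_{\beta} = -z_{1-\beta}. \]
Simple manipulation yields the required sample size per group, assuming
equal allocation of subjects to each group:
\[ n = \frac{(\sigma_1^2 + \sigma_2^2)(z_{1-\alpha} + z_{1-\beta})^2}{\Delta^2}. \]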

If $\sigma_1^2 = \sigma_2^2 = \sigma^2$, then the required sample size per group is
\[ n = \frac{2\sigma^2(z_{1-\alpha} + z_{1-\beta})^2}{\Delta^2}. \tag{1.6} \]
In some randomized clinical trials, more subjects are assigned to the treat-
ment group than to the control group to encourage participation of subjects
in a trial due to their higher chance of being randomized to the treatment
group than the control group. Let n1 = n be the number of subjects in the
control group and n2 = kn be the number of subjects in the treatment group.
Then, the required number of subjects in the control group is
\[ n_1 = n = \left(1 + \frac{1}{k}\right)\frac{\sigma^2(z_{1-\alpha} + z_{1-\beta})^2}{\Delta^2}. \tag{1.7} \]
The total sample size for the trial is n1 + n2 = (1 + k)n. Relative to a trial
with an equal number of subjects in each group, the total sample size required
to maintain the same power and type I error rate is multiplied by
(2 + k + 1/k)/4. For example, a trial that randomizes subjects in a 2:1 ratio
requires a 12.5% larger total sample size in order to maintain the same power
as a trial with a 1:1 randomization.
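The factor (2 + k + 1/k)/4 follows from Equation (1.7): the total under k:1 allocation is proportional to (1 + k)(1 + 1/k), while the total under 1:1 allocation is proportional to 4. A small sketch, with a function name chosen here purely for illustration, tabulates the factor for a few allocation ratios:

```python
def relative_total_sample_size(k: float) -> float:
    """Total sample size under k:1 allocation relative to 1:1 allocation."""
    if k <= 0:
        raise ValueError("allocation ratio k must be positive")
    # (1 + k) * (1 + 1/k) expands to 2 + k + 1/k; the 1:1 design gives 2 * 2 = 4.
    return (1 + k) * (1 + 1 / k) / 4


for k in (1, 2, 3):
    factor = relative_total_sample_size(k)
    print(f"{k}:1 allocation -> relative total sample size {factor:.3f}")
```

The factor is symmetric in k and 1/k, so a 1:2 allocation costs the same as a 2:1 allocation.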
The sample size needed to detect the difference in means between two
groups with a two–sided test can be obtained by replacing $z_{1-\alpha}$ by
$z_{1-\alpha/2}$, as shown for the one–sample test for means:
\[ n_1 = n = \left(1 + \frac{1}{k}\right)\frac{\sigma^2(z_{1-\alpha/2} + z_{1-\beta})^2}{\Delta^2}. \tag{1.8} \]
If the population variance $\sigma^2$ is unknown, $\sigma^2$ can be estimated by
the pooled sample variance
\[ s^2 = \frac{\sum_{i=1}^{n_1}(y_{1i} - \bar{y}_1)^2 + \sum_{i=1}^{n_2}(y_{2i} - \bar{y}_2)^2}{n_1 + n_2 - 2}, \]
which is an unbiased estimator of $\sigma^2$. For large $n_1$ and $n_2$, we reject
the null hypothesis $H_0: \mu_1 = \mu_2$ against the alternative hypothesis
$H_1: \mu_1 \ne \mu_2$ at the significance level α if the absolute value of the
test statistic
\[ Z = \frac{\bar{y}_1 - \bar{y}_2}{s\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}} \]
is greater than $z_{1-\alpha/2}$.
If $n_1 = n$ and $n_2 = kn$, the Z test statistic becomes
\[ Z = \frac{\bar{y}_1 - \bar{y}_2}{s\sqrt{\frac{k+1}{kn}}}. \]
Therefore, the sample size estimates for a one–sided test and a two–sided
test can be obtained by replacing $\sigma^2$ by $s^2$ in Equations (1.7) and (1.8).
1.3.4.1 Example
In a prior randomized clinical trial [17] investigating the effect of propranolol
versus no propranolol in geriatric patients with New York Heart Association
functional class II or III congestive heart failure (CHF), the changes in mean
left ventricular ejection fraction (LVEF) from baseline to 1 year after
treatment were 6% and 2% for the propranolol and no propranolol groups,
respectively. We will conduct a two–arm randomized clinical trial with a placebo
and a new beta blocker drug to investigate whether patients taking the new drug
improve LVEF after 1 year significantly more than patients taking placebo. We
assume an increase in LVEF similar to that in the prior study (∆ = 6% − 2% = 4%)
and a common standard deviation of 8% in changes in LVEF from baseline to 1 year
after treatment. How many subjects are needed to test the superiority of the new
drug over placebo in improving LVEF with a two–sided 5% significance level and
80% power? The required sample size is
\[ n = \frac{2\sigma^2(z_{1-\alpha/2} + z_{1-\beta})^2}{\Delta^2} = \frac{2 \cdot 8^2 \cdot (1.960 + 0.842)^2}{4^2} = 62.8. \]
The required sample size is 63 subjects per group.
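The same calculation can be scripted so that the allocation ratio k of Equations (1.7) and (1.8) is also covered. This is a minimal sketch under the assumptions of this section (known common variance, normal approximation); the function name `two_sample_means_n` and its parameters are illustrative only.

```python
from math import ceil
from statistics import NormalDist


def two_sample_means_n(delta, sigma, k=1.0, alpha=0.05, power=0.80, two_sided=True):
    """Unrounded control-group size n1 for comparing two means.

    With n2 = k * n1 this is Equation (1.8) (two-sided) or Equation (1.7)
    (one-sided). For k = 1 it reduces to 2 * sigma^2 * (z + z)^2 / delta^2.
    """
    z = NormalDist().inv_cdf  # standard normal quantile function
    z_alpha = z(1 - alpha / 2) if two_sided else z(1 - alpha)
    z_beta = z(power)  # z_{1-beta}
    return (1 + 1 / k) * sigma**2 * (z_alpha + z_beta) ** 2 / delta**2


# Example 1.3.4.1: difference of 4% in LVEF change, common SD of 8%,
# two-sided 5% level, 80% power, equal allocation.
n_raw = two_sample_means_n(delta=4.0, sigma=8.0)
n_per_group = ceil(n_raw)
print(f"unrounded n = {n_raw:.1f}, subjects per group = {n_per_group}")
```

Passing `k=2` gives the control-group size for a 2:1 design; the treatment group then needs twice that number.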
1.3.5 Two–Sample Test for Proportions
In a randomized clinical trial, subjects are randomly assigned to one of two
treatment groups. Let $Y_{ij}$ be the binary random variable ($Y_{ij} = 1$ for
response, 0 for no response) of the jth subject in the ith treatment,
$j = 1, \ldots, n_i$, and $i = 1, 2$. We assume that the $Y_{ij}$'s are independent
and identically distributed with $E(Y_{ij}) = p_i$ for a fixed i. The response rate
$p_i$ is usually estimated by the observed proportion in the ith treatment group:
\[ \hat{p}_i = \sum_{j=1}^{n_i} Y_{ij}/n_i. \]
Let p1 and p2 be the response rates of control and treatment arms, respec-
tively. The sample sizes are n1 and n2 in each treatment group, respectively.
Suppose that an investigator wants to test whether there is a difference in
the response rates between control and treatment arms. The null (H0) and
alternative (H1) hypotheses are:
H0: The response rates are equal (p1 = p2).
H1: The response rates are different (p1 ≠ p2).
We reject the null hypothesis $H_0: p_1 = p_2$ at the significance level of α if
\[ \frac{|\hat{p}_1 - \hat{p}_2|}{\sqrt{\hat{p}_1(1 - \hat{p}_1)/n_1 + \hat{p}_2(1 - \hat{p}_2)/n_2}} > z_{1-\alpha/2}. \]
Under the alternative hypothesis, the power of the test is approximated
by
\[ \Phi\left( \frac{|p_1 - p_2|}{\sqrt{p_1(1 - p_1)/n_1 + p_2(1 - p_2)/n_2}} - z_{1-\alpha/2} \right). \]

The sample size estimate needed to achieve a power of 1 − β can be obtained
by solving the following equation:
\[ \frac{|p_1 - p_2|}{\sqrt{p_1(1 - p_1)/n_1 + p_2(1 - p_2)/n_2}} - z_{1-\alpha/2} = z_{1-\beta}. \]
When $n_2 = k \cdot n_1$, $n_1$ can be written as
\[ n_1 = \frac{(z_{1-\alpha/2} + z_{1-\beta})^2}{(p_1 - p_2)^2}\left[p_1(1 - p_1) + p_2(1 - p_2)/k\right]. \]
Under equal allocation, $n_1 = n_2 = n$, the required sample size per group is
\[ n_1 = n_2 = n = \frac{(z_{1-\alpha/2} + z_{1-\beta})^2}{(p_1 - p_2)^2}\left[p_1(1 - p_1) + p_2(1 - p_2)\right]. \]
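The two formulas above differ only in the allocation ratio k, so one function covers both. The sketch below is illustrative: the function name and the response rates 0.20 and 0.40 are assumptions made here for demonstration and are not an example from the text.

```python
from math import ceil
from statistics import NormalDist


def two_sample_proportions_n1(p1, p2, k=1.0, alpha=0.05, power=0.80):
    """Unrounded control-group size n1 for a two-sided test of p1 = p2.

    The treatment group has n2 = k * n1 subjects; k = 1 gives the
    equal-allocation formula.
    """
    z = NormalDist().inv_cdf  # standard normal quantile function
    z_sum = z(1 - alpha / 2) + z(power)
    variance_term = p1 * (1 - p1) + p2 * (1 - p2) / k
    return z_sum**2 * variance_term / (p1 - p2) ** 2


# Hypothetical inputs: control response rate 0.20, treatment response rate 0.40.
n_equal = two_sample_proportions_n1(0.20, 0.40)         # 1:1 allocation
n1_two_to_one = two_sample_proportions_n1(0.20, 0.40, k=2.0)  # n2 = 2 * n1
print(ceil(n_equal), ceil(n1_two_to_one), ceil(2 * n1_two_to_one))
```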
1.4 Further Readings
Sample size calculation is an important issue in the experimental design of
biomedical research. The sample size formulas presented in this chapter are
based on asymptotic approximation and superiority trials. Closed–form sam-
ple size estimates for independent outcomes can be obtained using normal
approximation for equivalence trials, cross–over trials, non–inferiority trials,
and bioequivalence trials [14]. In some clinical trials such as phase II cancer
clinical trials [18], sample sizes are usually small. Therefore, the sample size
calculation based on asymptotic approximation would not be appropriate for
clinical trials with a small number of subjects. The small sample sizes for
typical phase II clinical trials imply the need for the use of exact statistical
methods in sample size estimation [19]. Chow et al. [3] provided procedures
for sample size estimation for proportions based on exact tests for small sam-
ples. Even though the closed–form formulas cannot be obtained for sample
size estimates based on exact tests, the sample size estimates can be obtained
numerically.
The tests for proportions using normal approximation to the binomial
outcome are equivalent to the usual chi–square tests since $Z^2 = \chi^2$. The
p–values for the two tests are equal. For example, the critical value of the
chi–square with 1 degree of freedom is $\chi^2_{0.05} = 3.841$ at the α = 0.05
level, which is equal to the square of the two–sided critical value
$Z_{\alpha/2} = Z_{0.025} = 1.96$. If one wishes to use a two–sided chi–square
test, one should use a two–sided sample size or power determination by using
$Z_{\alpha/2}$ instead of $Z_{\alpha}$ [20]. Others [21, 22, 23] have used the
arcsine transformation of proportions, $A(p) = 2\arcsin(\sqrt{p})$, to stabilize
the variance in the sense that the variance formula of $A(\hat{p})$ is free of
the proportion p. Given a proportion $\hat{p}$ with $E(\hat{p}) = p$, $A(\hat{p})$
is asymptotically normal with mean A(p) and variance 1/n, where n is the sample
size. Since the variance of $A(\hat{p})$ does not depend on the expectation, the
sample size and power calculation becomes simplified.
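The variance 1/n quoted for the arcsine transformation can be checked with a first–order (delta method) argument. The following sketch assumes that the proportion is computed from n independent binary observations, so that its variance is p(1 − p)/n; that assumption is made here for illustration.

```latex
A(p) = 2\arcsin\sqrt{p}, \qquad
A'(p) = \frac{2}{\sqrt{1-p}} \cdot \frac{1}{2\sqrt{p}} = \frac{1}{\sqrt{p(1-p)}},
\qquad
\operatorname{Var}\{A(\hat{p})\} \approx \{A'(p)\}^2 \operatorname{Var}(\hat{p})
  = \frac{1}{p(1-p)} \cdot \frac{p(1-p)}{n} = \frac{1}{n}.
```

The factor p(1 − p) cancels, which is why the transformed scale has a variance that does not involve p.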
Pre– and post–intervention studies have been widely used in medical and
social behavioral studies [24, 25, 26, 27, 28]. In pre–post studies, each
subject contributes a pair of dependent observations: one observation at
pre–intervention and the other at post–intervention. The paired t–test has
been used to detect the intervention effect on a continuous outcome, while
McNemar's test [29] has been the most widely used approach to detect the
intervention effect on a binary outcome in pre–post studies. The paired t–test
can be conducted by applying the one–sample t–test to the differences between
pre–test and post–test observations. The sample size needed to detect a
difference between a pair of continuous outcomes from pre–post tests can be
estimated by using the sample size formula for a one–sample test for means in
Equation (1.4). However, unlike for paired continuous outcomes, the sample size
formulas for independent outcomes presented in this chapter cannot be used to
estimate the sample size needed to detect a difference between a pair of binary
observations from pre–post studies. Sample size determination for studies
involving a pair of binary observations from pre–post studies will be discussed
in Chapter 4.
Clustered data often arise in medical and behavioral studies such as den-
tal, ophthalmologic, radiologic, and community intervention studies in which
data are obtained from multiple units of each cluster. In radiologic studies, as
many as 60 lesions may be observed through positron emission tomography
(PET) in one patient since PET offers the possibility of imaging the whole
body [30]. Sample size estimation for clustered outcomes should be done in-
corporating the dependence of within–cluster observations. Here, the unit of
data collection is a cluster (subject), and the unit of data analysis is a lesion
within a cluster. Two major problems arise in a sample size calculation for
clustered data. One is that the number of units in each cluster, called cluster
size, tends to vary cluster by cluster with a certain distribution. The other
is that observations within each cluster are correlated. The sample size esti-
mate needs to incorporate the variable cluster size and the correlation among
observations within a cluster.
Controlled clinical trials often employ a parallel–groups repeated measures
design in which subjects are randomly assigned between treatment groups,
evaluated at baseline, and then evaluated at intervals across a treatment pe-
riod of fixed total duration. The repeated measurements are usually equally
spaced, although not necessarily so. The hypothesis of primary interest in
short–term efficacy trials concerns the difference in the rates of changes or
the time–averaged responses between treatment groups [31]. Major problems
in the sample size estimation of repeated measurement data are missing data
and the correlation among repeated observations within a subject. As in the
sample size estimate of clustered outcomes, sample size should be estimated
incorporating the correlation among repeated measurements within each
subject and the missing data mechanisms for studies with repeated
measurements. Here, a sample size means the number of subjects.
In the subsequent chapters, sample size estimates will be provided using
large sample approximation for correlated outcomes such as clustered out-
comes and repeated measurement outcomes. There are many complexities in
estimating sample size. For example, different sample size formulas are appro-
priate for different types of study designs, with computations more complex
for studies that recruit study subjects at multiple centers. Sample size de-
terminations also have to take into account that some subjects will be lost
to follow-up or otherwise drop out of a study. Certain manipulations, such
as increased precision of measurements or repeating measurements at various
time points, can be used to maximize power for a given sample size.
Bibliography
[1] ICH. Statistical Principles for Clinical Trials. Tripartite International
Conference on Harmonized Guidelines, E9, 1998.
[2] M. Krzywinski and N. Altman. Points of significance: Power and sample
size. Nature Methods, 10:1139–1140, 2013.
[3] S. C. Chow, J. Shao, and H. Wang. Sample Size Calculations in Clinical
Research. Chapman & Hall/CRC, 2008.
[4] R. G. Newcombe. Two sided confidence intervals for the single propor-
tion: Comparison of seven methods. Statistics in Medicine, 17:857–872,
1998.
[5] ASA. Ethical guidelines for statistical practice: Executive summary. Am-
stat News, April:12–15, 1999.
[6] R. V. Lenth. Some practical guidelines for effective sample size determi-
nation. American Statistician, 55(3):187–193, 2001.
[7] S. A. Julious. Tutorial in biostatistics: Sample size for clinical trials.
Statistics in Medicine, 23:1921–1986, 2004.
[8] S. C. Chow. Biosimilars: Design and Analysis of Follow-on Biologics.
Chapman & Hall/CRC, 2013.
[9] J. Wittes. Sample size calculations for randomized clinical trials. Epi-
demiologic Reviews, 24(1):39–53, 1984.
[10] J. S. Lee. Understanding equivalence trials (and why we should care).
Canadian Association of Emergency Physicians, 2(3):194–196, 2000.

[11] FDA. Guidance for Industry Bioavailability and Bioequivalence Studies
for Orally Administered Drug Products General Considerations. Center
for Drug Evaluation and Research, the U.S. Food and Drug Administra-
tion, Rockville, MD., 2003.
[12] FDA. Guideline for Industry on Non-Inferiority Clinical Trials. Center
for Drug Evaluation and Research and Center for Biologics Evaluation
and Research, Food and Drug Administration, Rockville, MD, 2010.
[13] EMEA. Guidelines on the Choice of the Non-Inferiority Margin. Euro-
pean Medicines Agency CHMP/EWP/2158/99, London, UK, 2005.
[14] S. A. Julious. Sample Sizes for Clinical Trials. Chapman & Hall/CRC,
2009.
[15] S. A. Julious and M. J. Campbell. Tutorial in biostatistics: Sample size
for parallel group clinical trials with binary data. Statistics in Medicine,
31:2904–2936, 2010.
[16] K. E. Muller, L. M. Lavange, S. L. Ramey, and C. T. Ramey. Power calcu-
lations for general linear multivariate models including repeated measures
applications. Journal of American Statistical Association, 87(420):1209–
1226, 1992.
[17] W. S. Aronow and C. Ahn. Postprandial hypotension in 499 elderly
persons in a long-term health care facility. Journal of the American
Geriatrics Society, 42(9):930–932, 1994.
[18] S. Piantadosi. Clinical Trials: A Methodologic Perspective, (2nd ed.).
John Wiley & Sons, Inc., 2005.
[19] R. P. A'Hern. Sample size tables for exact single–stage phase II designs.
Statistics in Medicine, 20:859–866, 2001.
[20] J. M. Lachin. Introduction to sample size determination and power anal-
ysis for clinical trials. Controlled Clinical Trials, 2:93–113, 1981.
[21] R. D. Sokal and F. J. Rohlf. Biometry: The Principles and Practice of
Statistics in Biometric Research. San Francisco: Freeman, 1969.
[22] S. H. Jung and C. Ahn. Estimation of response probability in correlated
binary data: A new approach. Drug Information Journal, 34:599–604,
2000.
[23] S. H. Jung, S. H. Kang, and C. Ahn. Sample size calculations for clustered
binary data. Statistics in Medicine, 20:1971–1982, 2001.
[24] M. C. Rossi, C. Perozzi, C. Consorti, T. Almonti, P. Foglini, N. Giostra,
P. Nanni, S. Talevi, D. Bartolomei, and G. Vespasiani. An interactive
diary for diet management (DAI): A new telemedicine system able to
promote body weight reduction, nutritional education, and consumption
of fresh local produce. Diabetes Technology and Therapeutics, 12(8):641–
647, 2010.
[25] A. Wajnberg, K. H. Wang, M. Aniff, and H. V. Kunins. Hospitalizations
and skilled nursing facility admissions before and after the implementa-
tion of a home-based primary care program. Journal of the American
Geriatrics Society, 58(6):1144–1147, 2010.
[26] E. J. Knudtson, L. B. Lorenz, V. J. Skaggs, J. D. Peck, J. R. Good-
man, and A. A. Elimian. The effect of digital cervical examination on
group B streptococcal culture. American Journal of Obstetrics and Gynecology,
202(1):58.e1–4, 2010.
[27] T. Zieschang, I. Dutzi, E. Müller, U. Hestermann, K. Grunendahl, A. K.
Braun, D. Huger, D. Kopf, N. Specht-Leible, and P. Oster. Improving
care for patients with dementia hospitalized for acute somatic illness in a
specialized care unit: a feasibility study. International Psychogeriatrics,
22(1):139–146, 2010.
[28] A. M. Spleen, B. C. Kluhsman, A. D. Clark, M. B. Dignan, E. J.
Lengerich, and The ACTION Health Cancer Task Force. An increase in
HPV–related knowledge and vaccination intent among parental and non–
parental caregivers of adolescent girls, age 9–17 years, in Appalachian
Pennsylvania. Journal of Cancer Education, 27(2):312–319, 2012.
[29] Q. McNemar. Note on the sampling error of the difference between cor-
related proportions or percentages. Psychometrika, 12(2):153–157, 1947.
[30] M. Gonen, K. S. Panageas, and S. M. Larson. Statistical issues in analysis
of diagnostic imaging experiments with multiple observations per patient.
Radiology, 221:763–767, 2001.
[31] P. J. Diggle, P. Heagerty, K. Y. Liang, and S. L. Zeger. Analysis of
longitudinal data (2nd ed.). Oxford University Press, 2002.

2 Sample Size Determination for Clustered Outcomes
2.1 Introduction
Clustered data arise in many fields of application. We frequently make
observations from multiple sites of each subject (called a cluster).
Observations from the same subject are correlated, although those
from different subjects are independent. In periodontal studies that observe
each tooth, each patient usually contributes data from more than one tooth
to the studies. In this case, a patient corresponds to a cluster, and a tooth
corresponds to a site.
The degree of similarity or correlation is typically measured by the
intracluster correlation coefficient (ρ). If one simply ignores the clustering
effect and analyzes clustered data using standard statistical methods developed
for the analysis of independent observations, one may underestimate the true
p-value and inflate the type I error rate of such tests since the correlation
among observations within a cluster tends to be positive [1, 2]. Therefore,
clustered data should be analyzed using statistical methods that take into
account the dependence of within–cluster observations. If one fails to take
into account the clustered nature of the study design during the planning stage
of the study, one will obtain a smaller sample size estimate and lower
statistical power than planned. However, one will obtain a larger sample size
estimate and higher statistical power than planned in some studies such as
split–mouth trials [3, 4, 5], in which the two treatments are randomly assigned
to two segments of a subject's mouth. In split–mouth trials, both intervention
and control treatments are applied in each subject.
The intracluster correlation coefficient (ρ) is defined by
$\rho = \sigma_B^2/(\sigma_B^2 + \sigma_W^2)$, where $\sigma_B^2$ is the
between–cluster variance and $\sigma_W^2$ is the within–cluster variance. As the
within–cluster variance ($\sigma_W^2$) approaches 0, ρ approaches 1.
Let n be the number of clusters and m be the number of observations in each
cluster. When ρ = 1, all responses within a cluster are identical. The effective
sample size (ESS) is reduced to the number of clusters (n) when ρ = 1 since
all responses within a cluster are identical. A very small value of ρ implies that
the within–cluster variance ($\sigma_W^2$) is much larger than the
between–cluster variance ($\sigma_B^2$). When ρ = 0, there is no correlation
among observations within a cluster. The effective sample size is the total
number of observations across all
clusters (nm) when ρ = 0. To get the effective sample size, the total number
of observations (the number of observations per cluster (m) times the number
of clusters (n)) is divided by a correction factor [1 + (m − 1)ρ] that includes ρ
and the number of observations per cluster (m). That is, the effective sample
size is nm/[1 + (m − 1)ρ]. The correction factor, [1 + (m − 1)ρ], is called the
design effect or the variance inflation factor [6].
In the TOSS (trial of cilostazol in symptomatic intracranial arterial stenosis)
clinical trial [7], investigators examined the effect of cilostazol on the
progression of intracranial arterial stenosis, a narrowing of an artery inside
the brain that can lead to stroke. Cilostazol is a medication for the treatment
of intermittent claudication, a condition caused by narrowing of the
arteries that supply blood to the legs. One hundred thirty–six subjects were
randomly allocated to receive either cilostazol or placebo with an equal prob-
ability. Three arteries (two middle cerebral arteries and one basilar artery)
were evaluated for the progression of intracranial stenosis in both cilostazol
and placebo groups.
The number of arteries evaluated in each treatment group is 204 (= 3
arteries/subject × 68 subjects). If observations in three arteries are independent
(ρ = 0), then the effective number of observations is 204. If the observations in
three arteries are completely dependent (ρ = 1), then the effective number of
observations is 68. If ρ takes the value between 0 and 1, the effective number
of observations is 204/[1 + (m − 1)ρ], where m = 3. The effective number of
observations in each treatment group is nm/[1 + (m − 1)ρ] when 0 ≤ ρ ≤ 1.
As a special case, the effective number of observations is nm when ρ = 0, and
n when ρ = 1.
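The effective number of observations in the TOSS setting can be tabulated directly from nm/[1 + (m − 1)ρ]. The sketch below uses n = 68 subjects per group and m = 3 arteries as stated above; the function name and the intermediate value ρ = 0.5 are illustrative choices, not values reported by the trial.

```python
def effective_sample_size(n_clusters: int, cluster_size: int, rho: float) -> float:
    """Total observations divided by the design effect 1 + (m - 1) * rho."""
    if not 0.0 <= rho <= 1.0:
        raise ValueError("rho is assumed to lie in [0, 1]")
    design_effect = 1.0 + (cluster_size - 1) * rho
    return n_clusters * cluster_size / design_effect


# 68 subjects per treatment group, 3 arteries evaluated per subject.
for rho in (0.0, 0.5, 1.0):
    ess = effective_sample_size(68, 3, rho)
    print(f"rho = {rho:.1f}: effective number of observations = {ess:.0f}")
```

The two boundary cases recover nm = 204 (independence) and n = 68 (complete dependence).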
2.2 One–Sample Clustered Continuous Outcomes
Clustered continuous outcomes occur frequently in biomedical studies. Examples
include size of tumors in cancer patients, and pocket probing depth and
clinical attachment level in teeth of subjects undergoing root planing under
local anesthetic.
2.2.1 Equal Cluster Size
We assume that the number of observations in each cluster (m) is small com-
pared to the number of clusters (n) so that asymptotic theories can be ap-
plied to n for sample size estimation. Let Yij be a random variable of the
jth (j = 1, . . . , m) observation in the ith (i = 1, . . . , n) cluster, where Yij
is assumed to be normally distributed with mean $E(Y_{ij}) = \mu$ and common
variance $V(Y_{ij}) = \sigma^2$. We assume a pairwise common intracluster
correlation coefficient, $\rho = \mathrm{corr}(Y_{ij}, Y_{ij'})$ for $j \ne j'$.
Let $y_i = \sum_{j=1}^{m} Y_{ij}$ denote the sum of responses in the ith cluster,
and $\bar{y}_i$ be the mean response computed over m observations in the ith
cluster. The total number of observations is nm. The mean of $Y_{ij}$ computed
over all observations is written as
\[ \bar{y} = \frac{\sum_{i=1}^{n}\sum_{j=1}^{m} Y_{ij}}{nm}, \]
where $\bar{y}$ estimates the population mean µ.
The degree of dependence within clusters is measured by the intracluster
correlation coefficient (ρ), which can be estimated by the analysis of variance
(ANOVA) estimate [8] as
\[ \hat{\rho} = \frac{MSC - MSW}{MSC + (m - 1)MSW}, \]
where
\[ MSC = m\sum_{i=1}^{n}\frac{(\bar{y}_i - \bar{y})^2}{n - 1}, \qquad
   MSW = \sum_{i=1}^{n}\sum_{j=1}^{m}\frac{(Y_{ij} - \bar{y}_i)^2}{n(m - 1)}. \]
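The ANOVA estimate can be computed from raw data in a few lines. The following is a minimal sketch for equal cluster sizes that follows the MSC and MSW definitions above; the function name and the toy data sets are made up for illustration and do not come from any study cited here.

```python
def anova_icc(clusters):
    """ANOVA estimate of the intracluster correlation for equal cluster sizes.

    clusters: list of n lists, each holding the m observations of one cluster.
    """
    n = len(clusters)
    m = len(clusters[0])
    if n < 2 or m < 2 or any(len(c) != m for c in clusters):
        raise ValueError("need at least 2 clusters of equal size m >= 2")
    cluster_means = [sum(c) / m for c in clusters]
    grand_mean = sum(cluster_means) / n  # equals the overall mean when sizes are equal
    # Between-cluster mean square.
    msc = m * sum((cm - grand_mean) ** 2 for cm in cluster_means) / (n - 1)
    # Within-cluster mean square.
    msw = sum(
        (y - cm) ** 2 for c, cm in zip(clusters, cluster_means) for y in c
    ) / (n * (m - 1))
    return (msc - msw) / (msc + (m - 1) * msw)


# Toy data: identical responses within each cluster give an estimate of 1.
print(anova_icc([[1, 1], [2, 2], [3, 3]]))
# Toy data in which between- and within-cluster mean squares are equal give 0.
print(anova_icc([[1, 3], [2, 4], [3, 5]]))
```

The estimate can be negative when MSW exceeds MSC; how to handle that case (for example, truncating at 0) is a design choice not addressed in this sketch.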
The overall mean $\bar{y}$ has a normal distribution with mean µ and variance
V, where
\[ V = \frac{\sum_{i=1}^{n} m\{1 + (m - 1)\hat{\rho}\}\sigma^2}{(nm)^2}
     = \frac{\{1 + (m - 1)\hat{\rho}\}\sigma^2}{nm}. \]
We test the null hypothesis $H_0: \mu = \mu_0$ versus the alternative hypothesis
$H_1: \mu = \mu_1$ for $\mu_0 \ne \mu_1$. The test statistic
$Z = (\bar{y} - \mu_0)/\sqrt{V}$ is asymptotically normal with mean 0 and
variance 1. We reject $H_0: \mu = \mu_0$ if the absolute value of Z is larger
than $z_{1-\alpha/2}$, the 100(1 − α/2)th percentile of the standard normal
distribution.
We are interested in estimating the sample size n with a power of 1 − β for
the projected alternative hypothesis $H_1: \mu = \mu_1$. The sample size (n)
needed to achieve a power of 1 − β can be obtained by solving the following
equation:
\[ \frac{|\mu_1 - \mu_0|}{\sqrt{V}} = z_{1-\alpha/2} + z_{1-\beta}. \]
The required number of clusters is
\[ n = \frac{(z_{1-\alpha/2} + z_{1-\beta})^2}{(\mu_1 - \mu_0)^2} \cdot \frac{\{1 + (m - 1)\hat{\rho}\}}{m} \cdot \sigma^2. \tag{2.1} \]

The total number of observations is
\[ n \cdot m = \frac{(z_{1-\alpha/2} + z_{1-\beta})^2\{1 + (m - 1)\hat{\rho}\}\sigma^2}{(\mu_1 - \mu_0)^2}. \]
When the cluster size is 1 (m = 1), the required number of observations is
\[ n_1 = \frac{(z_{1-\alpha/2} + z_{1-\beta})^2\sigma^2}{(\mu_1 - \mu_0)^2}. \]
When the cluster size is m (m > 1), the variance is inflated by a factor of
$\{1 + (m - 1)\hat{\rho}\}$ compared with the variance under m = 1. The factor
$\{1 + (m - 1)\hat{\rho}\}$ is called the variance inflation factor or design
effect. That is, the total number of observations can be computed by
multiplying $n_1$ by the design effect $\{1 + (m - 1)\hat{\rho}\}$.
2.2.2 Unequal Cluster Size
Cluster sizes are often unequal in cluster randomized studies. When the cluster
sizes are not constant, one approach is to replace the cluster size (m) by an
advance estimate of the average cluster size, which is referred to as the
average cluster size method [9, 10]. The average cluster size method is likely
to underestimate the actual required sample size [11]. Another approach is to
replace the cluster size (m) by the largest expected cluster size in the sample,
which is called the maximum cluster size method [10]. Here, we provide
the sample size estimate under variable cluster size.
Let n be the number of clusters in a clinical trial, and mi be the cluster size
in the ith cluster (i = 1, . . . , n). The number of observations in the ith cluster,
mi, may vary at random with a certain distribution. Here, we estimate the
sample size using the information on varying cluster sizes. We assume that the
cluster sizes (mi, i = 1, . . . n) are independent and identically distributed, and
the cluster sizes (mi’s) are small compared to n so that asymptotic theories
can be applied to n for sample size estimation. Let $Y_{ij}$ be a random
variable of the jth observation ($j = 1, \ldots, m_i$) in the ith cluster, where
$Y_{ij}$ is assumed to be normally distributed with mean µ and variance
$\sigma^2$. We assume a pairwise common intracluster correlation coefficient,
$\rho = \mathrm{corr}(Y_{ij}, Y_{ij'})$ for $j \ne j'$. The correlation is
assumed not to vary with the number of observations per cluster.
Let $y_i = \sum_{j=1}^{m_i} Y_{ij}$ denote the sum of responses in the ith
cluster, and $\bar{y}_i = \sum_{j=1}^{m_i} Y_{ij}/m_i$ be the mean response
computed over $m_i$ responses in the ith cluster. Then, the mean of $Y_{ij}$
computed over all clusters is written as
\[ \bar{y} = \frac{\sum_{i=1}^{n} m_i\bar{y}_i}{\sum_{i=1}^{n} m_i}, \]
where $\bar{y}$ estimates the population mean µ. The mean cluster size is
$\bar{m} = \sum_{i=1}^{n} m_i/n$.

The degree of dependence within clusters is measured by the intracluster
correlation coefficient (ρ), which can be estimated by analysis of variance
(ANOVA) estimate [8].
It can be shown that, conditional on the empirical distribution of the $m_i$'s,
the overall mean ($\bar{y}$) has a normal distribution with mean µ and variance
V, where
\[ V = \frac{\sum_{i=1}^{n} m_i\{1 + (m_i - 1)\hat{\rho}\}\sigma^2}{\left(\sum_{i=1}^{n} m_i\right)^2}. \]
Based on the asymptotic result, we can reject $H_0: \mu = \mu_0$ if the absolute
value of the test statistic $Z = (\bar{y} - \mu_0)/\sqrt{V}$ is larger than
$z_{1-\alpha/2}$, the 100(1 − α/2)th percentile of the standard normal
distribution.
We are interested in estimating the sample size n with a power of 1 − β for
the projected alternative hypothesis $H_1: \mu = \mu_1$. Since the $m_i$'s are
independent and identically distributed random variables, by the law of large
numbers, as $n \to \infty$,
\[ nV \to \frac{E[m\{1 + (m - 1)\hat{\rho}\}]\sigma^2}{\{E(m)\}^2}, \]
where m is the random variable associated with the cluster size and E(·) is
the expectation with respect to the distribution of the cluster size.
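The sample size needed to achieve a power of 1 − β can be obtained by
solving the following equation:
\[ \frac{|\mu_1 - \mu_0|}{\sqrt{V}} = z_{1-\alpha/2} + z_{1-\beta}. \]
This leads to
\[ n = \frac{(z_{1-\alpha/2} + z_{1-\beta})^2\sigma^2}{(\mu_1 - \mu_0)^2} \cdot \frac{E[m\{1 + (m - 1)\hat{\rho}\}]}{\{E(m)\}^2}. \]
Let $E(m) = \theta$, $V(m) = \tau^2$, and $\gamma = \tau/\theta$, where γ is the
coefficient of variation of the cluster size. Then, we can write
\[ n = \frac{(z_{1-\alpha/2} + z_{1-\beta})^2\sigma^2}{(\mu_1 - \mu_0)^2}\left\{(1 - \hat{\rho})\frac{1}{\theta} + \hat{\rho} + \hat{\rho}\gamma^2\right\}. \tag{2.2} \]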
The sample size formula (2.2) provides the sample size estimate by accounting
for variability in cluster size. When cluster sizes are equal across all clusters,
then the sample size formula (2.2) is the same as the sample size formula (2.1)
with γ = 0.
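The step from the expectation form of n to formula (2.2) uses only the first two moments of the cluster size. A short derivation, treating the estimated intracluster correlation as a fixed constant (an assumption made here for the algebra), is:

```latex
\begin{aligned}
E[m\{1 + (m - 1)\hat{\rho}\}]
  &= (1 - \hat{\rho})E(m) + \hat{\rho}E(m^2)
   = (1 - \hat{\rho})\theta + \hat{\rho}(\tau^2 + \theta^2), \\
\frac{E[m\{1 + (m - 1)\hat{\rho}\}]}{\{E(m)\}^2}
  &= \frac{1 - \hat{\rho}}{\theta} + \hat{\rho}\,\frac{\tau^2}{\theta^2} + \hat{\rho}
   = (1 - \hat{\rho})\frac{1}{\theta} + \hat{\rho} + \hat{\rho}\gamma^2 .
\end{aligned}
```

Setting γ = 0 and θ = m gives {1 + (m − 1)ρ̂}/m, which is the cluster-size factor in formula (2.1).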
Let $(w_1, \ldots, w_n)$ be a set of weights assigned to clusters with
$w_i \ge 0$ and $\sum_{i=1}^{n} w_i = 1$. The overall mean can be expressed as
$\bar{y} = \sum_{i=1}^{n} w_i\bar{y}_i$. The overall mean ($\bar{y}$) is an
unbiased estimate of µ. The above sample size estimate is based on equal
weights to observations by letting $w_i = m_i/\sum_{i=1}^{n} m_i$. Sample size
can also be estimated by an estimator that assigns equal weights ($w_i = 1/n$)
to each cluster or an estimator that minimizes the variance of the overall mean
($\bar{y}$). These weighting schemes will be described in detail for clustered
binary outcomes.

2.2.2.1 Example
Reports have established the effectiveness of minimally invasive periodontal
surgery (MIPS) in treating osseous defects [12, 13]. Since these papers were
published, new devices (including a videoscope and ultrasonic tips) have been
incorporated to enhance the effectiveness of the procedure. Haffajee et al. [14]
computed the intracluster correlation coefficients of periodontal measurements
for five groups of treated periodontal disease subjects and one group of un-
treated subjects with periodontal disease. The median intracluster correlation
coefficient (ρ) is 0.067 for clinical attachment level change. Harrel et al. [12]
showed clinical attachment loss (CAL) gains of 4.05 mm following application
of minimally invasive periodontal surgery (MIPS) in 16 subjects presenting
multiple sites with deep pockets associated with different morphologies, in-
cluding furcation involvements.
An investigator is proposing a prospective cohort study to evaluate the
effectiveness of the MIPS using these new devices. He expects CAL gains of
3.0 mm with a standard deviation of 3.5 mm over the 1–year study period.
The investigator will evaluate three sites in each subject and would like to
estimate the sample size needed to detect a mean difference of 1.05 mm in
clinical attachment loss (CAL) gains over the 1–year study period with 80%
power at a two–sided 5% significance level. We estimate the sample size (n)
to test the null hypothesis $H_0: \mu = 4.05$ versus the alternative hypothesis
$H_1: \mu = 3.0$ with a two–sided 5% significance level and 80% power assuming
three sites per subject (m = 3) and ρ = 0.067. From Equation (2.1) with the
fixed number of sites per subject (m = 3), the required sample size for testing
$H_0: \mu = 4.05$ versus $H_1: \mu = 3.0$ is
\[ n = \frac{(1.96 + 0.842)^2}{(4.05 - 3.0)^2} \cdot \frac{\{1 + (3 - 1)0.067\}}{3} \cdot 3.5^2 = 33. \]
Suppose that the number of sites examined per subject varies among subjects
with a mean of 3 and a standard deviation of 2. Then, from Equation
(2.2) with a variable number of sites per subject (θ = 3 and γ = 2/3), the
required sample size is
\[ n = \frac{(1.96 + 0.842)^2\, 3.5^2}{(4.05 - 3.0)^2}\left\{(1 - 0.067)/3 + 0.067 + 0.067(2/3)^2\right\} = 36. \]
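Both results in this example can be checked numerically. The sketch below implements Equations (2.1) and (2.2) as two small functions; the function names are illustrative, and exact normal quantiles are used in place of the rounded values 1.96 and 0.842.

```python
from math import ceil
from statistics import NormalDist


def _base_term(mu0, mu1, sigma, alpha, power):
    """(z_{1-alpha/2} + z_{1-beta})^2 * sigma^2 / (mu1 - mu0)^2."""
    z = NormalDist().inv_cdf  # standard normal quantile function
    return (z(1 - alpha / 2) + z(power)) ** 2 * sigma**2 / (mu1 - mu0) ** 2


def clusters_equal_size(mu0, mu1, sigma, m, rho, alpha=0.05, power=0.80):
    """Number of clusters from Equation (2.1), fixed cluster size m."""
    return _base_term(mu0, mu1, sigma, alpha, power) * (1 + (m - 1) * rho) / m


def clusters_variable_size(mu0, mu1, sigma, theta, gamma, rho, alpha=0.05, power=0.80):
    """Number of clusters from Equation (2.2).

    theta is the mean cluster size and gamma its coefficient of variation.
    """
    factor = (1 - rho) / theta + rho + rho * gamma**2
    return _base_term(mu0, mu1, sigma, alpha, power) * factor


# Example 2.2.2.1: H0: mu = 4.05 vs H1: mu = 3.0, SD 3.5 mm, rho = 0.067.
n_fixed = clusters_equal_size(4.05, 3.0, 3.5, m=3, rho=0.067)
n_variable = clusters_variable_size(4.05, 3.0, 3.5, theta=3, gamma=2 / 3, rho=0.067)
print(ceil(n_fixed), ceil(n_variable))
```

With γ = 0 the second function returns the same value as the first, which mirrors the statement that formula (2.2) reduces to formula (2.1) for constant cluster sizes.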
2.3 One–Sample Clustered Binary Outcomes
Clustered binary outcomes occur frequently in medical and behavioral studies.
Examples include the presence of cavities in one or more teeth, the presence
of arthritic pain in one or more joints, the presence of infection in one or two
eyes, and the occurrence of lymph node metastases in cancer patients.

2.3.1 Equal Cluster Size
We assume that cluster sizes are equal across clusters. Let n be the total
number of clusters in an experiment and m be the number of observations in
each cluster. Let Yij be the binary random variable of the jth (j = 1, . . . , m)
observation in the ith (i = 1, . . . , n) cluster, which is coded as 1 for response
and 0 for non–response.
We assume that observations within a cluster are exchangeable in the sense
that, given m, $Y_{i1}, \ldots, Y_{im}$ have a common marginal response
probability $P(Y_{ij} = 1) = p$ ($0 < p < 1$) and a common pairwise intracluster
correlation coefficient $\rho = \mathrm{corr}(Y_{ij}, Y_{ij'})$ for $j \ne j'$.
Let $y_i = \sum_{j=1}^{m} Y_{ij}$ denote the total number of responses in the ith
cluster. Under the exchangeability assumption, we have $E(y_i) = mp$ and
$\mathrm{var}(y_i) = mp(1 - p)\{1 + (m - 1)\rho\}$. The proportion of responses in
the ith cluster is estimated by $\hat{p}_i = y_i/m$ with $E(\hat{p}_i) = p$. An
unbiased estimate of p is $\hat{p} = \sum_{i=1}^{n} \hat{p}_i/n$.
For large n, $\sqrt{n}(\hat{p} - p)$ is approximately normal with mean 0 and
variance
\[ \hat{\sigma}^2 = \hat{p}(1 - \hat{p})\frac{\{1 + (m - 1)\hat{\rho}\}}{m}, \]
where ρ̂ can be obtained by the ANOVA method. The ANOVA method suitable
for continuous variables can also be used to estimate the intracluster correlation
coefficient for binary outcomes. Ridout et al. [15] conducted simulation studies
to investigate the performance of various estimators of the intracluster correlation
coefficient for clustered binary data under a common intracluster correlation,
ρ = corr(Yij, Yij′) for j ≠ j′. Their simulation studies showed that the
ANOVA estimator performed well for clustered binary data. The ANOVA
estimator of the intracluster correlation coefficient can be written as
\[
\hat{\rho} = \frac{MSC - MSW}{MSC + (m - 1)MSW},
\]
where $MSC = \sum_{i=1}^{n} m(\hat{p}_i - \hat{p})^2/(n - 1)$ and
$MSW = \sum_{i=1}^{n} y_i(1 - \hat{p}_i)/\{n(m - 1)\}$.
Suppose that we wish to test the null hypothesis $H_0 : p = p_0$ versus
$H_1 : p = p_1$ for $p_0 \neq p_1$ at a two–sided significance level of α. Under the null
hypothesis, the test statistic
\[
Z = \frac{\sqrt{n}(\hat{p} - p_0)}{\hat{\sigma}}
\]
is asymptotically normal with mean 0 and variance 1. We reject $H_0 : p = p_0$
if the absolute value of the test statistic Z is larger than $z_{1-\alpha/2}$, the
100(1 − α/2)th percentile of the standard normal distribution. We are interested in
calculating the sample size n against the alternative hypothesis $H_1 : p = p_1$
with a two–sided significance level of α and power of 1 − β. The required
sample size can be obtained by solving
$\sqrt{n}\,|p_0 - p_1|/\hat{\sigma} = z_{1-\alpha/2} + z_{1-\beta}$. The
required number of clusters is
\[
n = \frac{\hat{\sigma}^2 (z_{1-\alpha/2} + z_{1-\beta})^2}{(p_0 - p_1)^2}
  = \frac{p_1(1 - p_1)(z_{1-\alpha/2} + z_{1-\beta})^2}{(p_0 - p_1)^2} \cdot \frac{1 + (m - 1)\hat{\rho}}{m}.
\]
When the cluster size is 1 (m = 1), the required sample size becomes
\[
n_1 = \frac{p_1(1 - p_1)(z_{1-\alpha/2} + z_{1-\beta})^2}{(p_0 - p_1)^2}.
\]
When the cluster size is m (m > 1), the total number of observations (nm) is
$\{1 + (m - 1)\hat{\rho}\}$ times the required number of observations under m = 1. The
factor $\{1 + (m - 1)\hat{\rho}\}$ is called the variance inflation factor or design effect.
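The relationship between n, n1, and the design effect can be illustrated with a short sketch of the equal cluster size formula above. The planning values (p0 = 0.6, p1 = 0.7, m = 5, ρ = 0.2) are hypothetical and chosen only for illustration, and the function name is not from the text.

```python
import math
from statistics import NormalDist


def clusters_equal_size(p0, p1, m, rho, alpha=0.05, power=0.80):
    """Unrounded number of clusters for testing H0: p = p0 against
    H1: p = p1 with m observations in every cluster."""
    z = NormalDist().inv_cdf
    z_sq = (z(1 - alpha / 2) + z(power)) ** 2
    n1 = p1 * (1 - p1) * z_sq / (p0 - p1) ** 2  # sample size when m = 1
    design_effect = 1 + (m - 1) * rho           # variance inflation factor
    return n1 * design_effect / m


# Hypothetical planning values, for illustration only.
p0, p1, m, rho = 0.6, 0.7, 5, 0.2
n1 = clusters_equal_size(p0, p1, m=1, rho=rho)
n = clusters_equal_size(p0, p1, m=m, rho=rho)

print("clusters needed:", math.ceil(n))
print("total observations relative to m = 1:", n * m / n1)
```

The printed ratio reproduces the design effect 1 + (m − 1)ρ, which shows that clustering raises the total number of observations while lowering the number of clusters.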
2.3.2 Unequal Cluster Size
Let n be the total number of clusters in an experiment and mi be the number
of observations in the ith (i = 1, . . . , n) cluster. The number of observations
per cluster may vary at random with a certain distribution. Let Yij be the
binary random variable of the jth (j = 1, . . . , mi) observation in the ith
cluster, which is coded as 1 for response and 0 for non–response.
We assume that observations within a cluster are exchangeable with
P(Yij = 1) = p (0 < p < 1) and corr(Yij, Yij′) = ρ for j ≠ j′, as in the equal cluster
size case. The intracluster correlation is assumed not to vary with the number of
observations per cluster.
Let $y_i = \sum_{j=1}^{m_i} Y_{ij}$ denote the total number of responses in the ith cluster.
The proportion of responses in the ith cluster is estimated by $\hat{p}_i = y_i/m_i$
with $E(\hat{p}_i) = p$. Under the exchangeability assumption, we have $E(y_i) = m_i p$
and $\mathrm{var}(y_i) = m_i p(1 - p)\{1 + (m_i - 1)\rho\}$. Let $(w_1, \ldots, w_n)$ be a set of weights
assigned to clusters with $w_i \geq 0$ and $\sum_{i=1}^{n} w_i = 1$. An unbiased estimate of p is
$\hat{p} = \sum_{i=1}^{n} w_i \hat{p}_i$. Three weighting schemes have been proposed for parametric
and nonparametric sample size estimation for one–sample clustered binary
data [16, 17]: equal weights to observations, equal weights to clusters, and
minimum variance weights that minimize the variance of the weighted estimator.
Cochran [18] and Donner and Klar [11] used the estimator
$\hat{p}_u = \sum_i y_i / \sum_i m_i$, which assigns equal weights to observations with
$w_i = m_i / \sum_{i'=1}^{n} m_{i'}$. Lee [19] and Lee and Dubin [20] used the estimator
$\hat{p}_c = \sum_i \hat{p}_i / n$, which assigns equal weights to clusters with $w_i = 1/n$. Ahn [21] showed
in a simulation study that assigning equal weights to clusters is preferred to
assigning equal weights to observations when the intracluster correlation
is 0.6 or greater. Jung et al. [16] also showed
that the sample size under equal weights to observations ($n_u$) is usually smaller
than that under equal weights to clusters ($n_c$) for small ρ, while $n_c$ is generally
smaller than $n_u$ for large ρ. If observations within a cluster are highly
dependent, then making another observation from the same cluster will not
add much information. In this case, the method assigning equal weights to
clusters is preferred to the method assigning equal weights to observations. If
all clusters have an equal number of observations, then these two weighting
methods are identical.
Jung and Ahn [22] proposed a minimum variance estimator, $\hat{p}_m$, that minimizes
the variance of $\hat{p} = \sum_{i=1}^{n} w_i \hat{p}_i$. The variance of the estimator ($\hat{p}_m$) is
minimized with weights
\[
w_i = \frac{m_i\{1 + (m_i - 1)\hat{\rho}\}^{-1}}{\sum_{i'=1}^{n} m_{i'}\{1 + (m_{i'} - 1)\hat{\rho}\}^{-1}},
\]
where ρ̂ can be obtained by the ANOVA method. The ANOVA estimator of
the intracluster correlation coefficient can be written as
\[
\hat{\rho} = \frac{MSC - MSW}{MSC + (m_A - 1)MSW},
\]
where $MSC = \sum_{i=1}^{n} m_i(\hat{p}_i - \hat{p})^2/(n - 1)$,
$MSW = \sum_{i=1}^{n} y_i(1 - \hat{p}_i)/(M - n)$,
$m_A = (M - \sum_{i=1}^{n} m_i^2/M)/(n - 1)$, and $M = \sum_{i=1}^{n} m_i$. Note that
$\hat{p}_m = \hat{p}_u$ if ρ = 0 and $\hat{p}_m = \hat{p}_c$ if ρ = 1. If cluster sizes are equal across all clusters
($m_i = m$), then $\hat{p}_m = \hat{p}_u = \hat{p}_c$.
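The three weighting schemes described above, and the two limiting cases ρ = 0 and ρ = 1, can be sketched as follows. The function names and the toy cluster data are hypothetical and serve only to illustrate the weights; they are not from the text.

```python
def weights(sizes, scheme, rho=0.0):
    """Cluster weights w_i (summing to 1) for scheme 'u', 'c' or 'm'."""
    if scheme == "u":      # equal weights to observations: w_i proportional to m_i
        raw = [float(m) for m in sizes]
    elif scheme == "c":    # equal weights to clusters: w_i = 1/n
        raw = [1.0 for _ in sizes]
    elif scheme == "m":    # minimum variance: w_i proportional to m_i / {1 + (m_i - 1) rho}
        raw = [m / (1 + (m - 1) * rho) for m in sizes]
    else:
        raise ValueError("scheme must be 'u', 'c' or 'm'")
    total = sum(raw)
    return [r / total for r in raw]


def weighted_proportion(successes, sizes, w):
    """Weighted estimate: sum of w_i * (y_i / m_i)."""
    return sum(wi * y / m for wi, y, m in zip(w, successes, sizes))


# Toy data, for illustration only: three clusters of unequal size.
sizes = [2, 4, 6]
successes = [1, 3, 2]

w_u = weights(sizes, "u")
w_c = weights(sizes, "c")
w_m0 = weights(sizes, "m", rho=0.0)   # coincides with w_u
w_m1 = weights(sizes, "m", rho=1.0)   # coincides with w_c

p_u = weighted_proportion(successes, sizes, w_u)
p_c = weighted_proportion(successes, sizes, w_c)
print(p_u, p_c)
```

With ρ = 0 each observation carries independent information, so weighting by cluster size is optimal; with ρ = 1 every cluster effectively contributes a single observation, so all clusters receive the same weight.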
We would like to test the null hypothesis $H_0 : p = p_0$ versus the alternative
hypothesis $H_1 : p = p_1$ for $p_0 \neq p_1$. The test statistic
\[
Z_w = \frac{\sqrt{n}(\hat{p}_w - p_0)}{\hat{\sigma}_w}
\]
is asymptotically normal with mean 0 and variance 1, where w = u, c, and m.
Hence, we reject $H_0$ if the absolute value of $Z_w$ is larger than $z_{1-\alpha/2}$, which
is the 100(1 − α/2)th percentile of the standard normal distribution.
Jung et al. [16] provided the sample size formulas needed to test the null
hypothesis H0 : p = p0 versus the alternative hypothesis H1 : p = p1 with a
power of 1−β using three weighting schemes of equal weights to observations,
equal weights to clusters, and minimum variance weights.
2.3.2.1 Equal Weights to Observations
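Under equal weights to observations with $w_i = m_i / \sum_{i'=1}^{n} m_{i'}$, the variance of
$\sqrt{n}(\hat{p}_u - p_0)$ is
\[
\hat{\sigma}_u^2 = V\{\sqrt{n}(\hat{p}_u - p_0)\}
  = \hat{p}_u(1 - \hat{p}_u) \, \frac{n \sum_i m_i\{1 + (m_i - 1)\hat{\rho}\}}{(\sum_i m_i)^2}.
\]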
The test statistic
\[
Z_u = \frac{\sqrt{n}(\hat{p}_u - p_0)}{\hat{\sigma}_u}
\]
has a standard normal distribution with mean 0 and variance 1 for large n.
Under the alternative hypothesis ($H_1 : p = p_1$), $\hat{\sigma}_u^2$ converges to $\sigma_u^2$, where
\[
\sigma_u^2 = p_1(1 - p_1) \, \frac{E(m) + [E(m^2) - E(m)]\hat{\rho}}{[E(m)]^2},
\]
and $E(m)$ and $E(m^2)$ are computed using the probability distribution of cluster
sizes. The required sample size to test $H_0 : p = p_0$ versus $H_1 : p = p_1$ at a
two–sided significance level of α and a power of 1 − β is
\[
n_u = \frac{p_1(1 - p_1)(z_{1-\alpha/2} + z_{1-\beta})^2}{(p_0 - p_1)^2} \cdot \frac{E(m) + [E(m^2) - E(m)]\hat{\rho}}{[E(m)]^2}.
\]
2.3.2.2 Equal Weights to Clusters
Under equal weights to clusters with $w_i = 1/n$, the variance of
$\sqrt{n}(\hat{p}_c - p_0)$ is
\[
\hat{\sigma}_c^2 = V\{\sqrt{n}(\hat{p}_c - p_0)\}
  = \hat{p}_c(1 - \hat{p}_c) \, \frac{1}{n} \sum_i \frac{1 + (m_i - 1)\hat{\rho}}{m_i}.
\]
The test statistic
\[
Z_c = \frac{\sqrt{n}(\hat{p}_c - p_0)}{\hat{\sigma}_c}
\]
is asymptotically normal with mean 0 and variance 1. Under the alternative
hypothesis ($H_1 : p = p_1$), $\hat{\sigma}_c^2$ converges to $\sigma_c^2$, where
\[
\sigma_c^2 = p_1(1 - p_1)\{E(1/m) + \{1 - E(1/m)\}\hat{\rho}\},
\]
and $E(1/m)$ is computed using the probability distribution of cluster sizes.
The required sample size with the power of 1 − β for the alternative hypothesis
$H_1 : p = p_1$ is
\[
n_c = \frac{p_1(1 - p_1)(z_{1-\alpha/2} + z_{1-\beta})^2}{(p_0 - p_1)^2} \, \{E(1/m) + \{1 - E(1/m)\}\hat{\rho}\}.
\]
2.3.2.3 Minimum Variance Weights
The variance of the estimator $\hat{p} = \sum_{i=1}^{n} w_i \hat{p}_i$ is minimized when the weight
$w_i$ is inversely proportional to the variance of $\hat{p}_i$, $V(\hat{p}_i) = V(y_i)/m_i^2$ [22]. The
weight that minimizes the variance of the estimator is
\[
w_i = \frac{m_i\{1 + (m_i - 1)\hat{\rho}\}^{-1}}{\sum_{i'=1}^{n} m_{i'}\{1 + (m_{i'} - 1)\hat{\rho}\}^{-1}},
\]
where ρ̂ can be obtained by the ANOVA method. The variance of
$\sqrt{n}(\hat{p}_m - p_0)$ is consistently estimated by
\[
\hat{\sigma}_m^2 = \frac{\hat{p}_m(1 - \hat{p}_m)}{n^{-1} \sum_i m_i\{1 + (m_i - 1)\hat{\rho}\}^{-1}}.
\]
The test statistic
\[
Z_m = \frac{\sqrt{n}(\hat{p}_m - p_0)}{\hat{\sigma}_m}
\]
has a standard normal distribution with mean 0 and variance 1 for large n.
Under the alternative hypothesis ($H_1 : p = p_1$), $\hat{\sigma}_m^2$ converges to $\sigma_m^2$, where
\[
\sigma_m^2 = p_1(1 - p_1) \, \frac{1}{E[m\{1 + (m - 1)\hat{\rho}\}^{-1}]}.
\]
The required sample size against the alternative hypothesis $H_1 : p = p_1$
for a two–sided significance level of α and power of 1 − β is
\[
n_m = \frac{p_1(1 - p_1)(z_{1-\alpha/2} + z_{1-\beta})^2}{(p_0 - p_1)^2} \cdot \frac{1}{E[m\{1 + (m - 1)\hat{\rho}\}^{-1}]}.
\]
The sample size ($n_m$) under the minimum variance estimator is always smaller
than or equal to $n_u$ and $n_c$.
2.3.2.4 Example
We use the data of Hujoel et al. [23] as pilot data to illustrate sample
size calculation for clustered binary outcomes. An enzymatic diagnostic test
was used to determine whether a site was infected by two specific organisms,
Treponema denticola and Bacteroides gingivalis. Each subject had a different
number of infected sites, as determined by the gold standard (an antibody
assay against the two organisms).
In a sample of 29 subjects, the number of true positive test results (yi)
and the number of infected sites (mi) are given in Table 2.1.
In the example of an enzymatic diagnostic test in Table 2.1, the ANOVA
estimate (ρ̂) of the intracluster correlation coefficient is 0.20.
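The ANOVA estimate can be reproduced from the pilot data in Table 2.1. The sketch below implements the unequal cluster size formulas for MSC, MSW, and m_A given earlier. It assumes that the overall proportion p̂ inside MSC is the pooled proportion Σyi/Σmi, which the text does not state explicitly; the function name is hypothetical.

```python
def anova_icc(successes, sizes):
    """ANOVA estimator of the intracluster correlation for clustered
    binary data with possibly unequal cluster sizes."""
    n = len(sizes)
    M = sum(sizes)
    # Assumption: the overall proportion is the pooled proportion.
    p_hat = sum(successes) / M
    p_i = [y / m for y, m in zip(successes, sizes)]
    msc = sum(m * (p - p_hat) ** 2 for m, p in zip(sizes, p_i)) / (n - 1)
    msw = sum(y * (1 - p) for y, p in zip(successes, p_i)) / (M - n)
    m_a = (M - sum(m * m for m in sizes) / M) / (n - 1)
    return (msc - msw) / (msc + (m_a - 1) * msw)


# (y_i, m_i) pairs transcribed from Table 2.1.
table_2_1 = [
    (3, 6), (2, 6), (2, 4), (5, 6), (4, 5), (5, 5), (4, 6), (3, 4),
    (2, 4), (3, 4), (5, 5), (4, 4), (6, 6), (3, 3), (5, 6), (1, 2),
    (4, 6), (0, 4), (5, 6), (4, 5), (4, 6), (0, 6), (4, 5), (3, 5),
    (0, 2), (2, 6), (2, 4), (5, 5), (4, 6),
]
y = [pair[0] for pair in table_2_1]
m = [pair[1] for pair in table_2_1]

rho_hat = anova_icc(y, m)
print(round(rho_hat, 2))
```

Under the stated assumption, the result should agree with the reported value of 0.20 after rounding to two decimal places.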
Suppose that we would like to estimate the sample size based on the hy-
pothesis H0 : p0 = .6 versus H1 : p1 = .7 using a two–sided significance level
of 5% and a power of 80%. Table 2.2 shows the distribution of the number of
infected sites (mi).
Using the observed relative frequencies from Table 2.2, we obtain $E(m) = 4.897$,
$E(1/m) = 0.224$, $E(m^2) = 25.379$, and $E[m\{1 + (m - 1)\hat{\rho}\}^{-1}] = 2.704$.
Therefore, the required sample sizes are $n_u = 62$, $n_c = 63$, and $n_m = 61$.
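The expectations and the three sample sizes in this example can be recomputed directly from the relative frequencies in Table 2.2. The sketch below is one way to do so, assuming ρ̂ = 0.20 as reported above and rounding each sample size up to the next integer; the variable names are hypothetical.

```python
import math
from statistics import NormalDist

# Relative frequencies f(m) of the cluster size m, from Table 2.2.
f = {2: 2 / 29, 3: 1 / 29, 4: 7 / 29, 5: 7 / 29, 6: 12 / 29}
rho, p0, p1 = 0.20, 0.6, 0.7
alpha, power = 0.05, 0.80


def expect(g):
    """Expectation of g(m) under the observed cluster size distribution."""
    return sum(freq * g(m) for m, freq in f.items())


E_m = expect(lambda m: m)
E_m2 = expect(lambda m: m ** 2)
E_inv = expect(lambda m: 1 / m)
E_mv = expect(lambda m: m / (1 + (m - 1) * rho))

z = NormalDist().inv_cdf
base = p1 * (1 - p1) * (z(1 - alpha / 2) + z(power)) ** 2 / (p0 - p1) ** 2

n_u = math.ceil(base * (E_m + (E_m2 - E_m) * rho) / E_m ** 2)  # equal weights to observations
n_c = math.ceil(base * (E_inv + (1 - E_inv) * rho))            # equal weights to clusters
n_m = math.ceil(base / E_mv)                                   # minimum variance weights

print(n_u, n_c, n_m)
```

In this example the three schemes differ by only one or two clusters, with the minimum variance weights giving the smallest requirement.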
TABLE 2.1
Proportion of infection (yi/mi) from n = 29 subjects (clusters)
3/6, 2/6, 2/4, 5/6, 4/5, 5/5, 4/6, 3/4, 2/4, 3/4, 5/5, 4/4, 6/6, 3/3, 5/6,
1/2, 4/6, 0/4, 5/6, 4/5, 4/6, 0/6, 4/5, 3/5, 0/2, 2/6, 2/4, 5/5, 4/6.
TABLE 2.2
Distribution of the number of infected sites (mi)
m                          2      3      4      5      6
Relative frequency, f(m)   2/29   1/29   7/29   7/29   12/29


drawbridge are carved the Spanish arms and an inscription recording
the completion of the fort in 1756, when Ferdinand VI. was King of
Spain and Don Hereda Governor of Florida. It mounted one hundred
of the small guns of those days, and the interior is a square parade
ground, surrounded by large casemates. Upon each side of the
casemate opposite the sally-port is a niche for holy water, and at the
farther end the Chapel. Dungeons and subterranean passages
abound, of which ghostly tales are told. This fort is the most
interesting relic of the ancient city, a picturesque place, with charms
even in its dilapidation.
There are other quaint structures in this curious old town. A gray
gateway about ten feet wide, flanked by tall square towers, marks
the northern entrance to the city, the ditch from the fort passing in
front of it. In one of the streets is the palace of the Spanish
Governors, since changed into a post-office. The official centre of the
city is a public square, the Plaza de la Constitucion, having a
monument commemorating the Spanish Liberal Constitution of 1812,
and also a Confederate Soldiers' Monument. This square fronts on
the sea-wall, and alongside it and stretching westward is the
Alameda, known as King Street, leading to the group of grand hotels
recently constructed in Spanish and Moorish style, which have made
modern St. Augustine so famous. These are the Ponce de Leon, the
Alcazar and the Cordova, with the Casino, adjoined by spacious and
beautiful gardens. These buildings reproduce all types of the
Hispano-Moorish architecture, with many suggestions from the
Alhambra. The Ponce de Leon, the largest, is three hundred and
eighty by five hundred and twenty feet, enclosing an open court,
and its towers rise above the red-tiled roofs to a height of one
hundred and sixty-five feet, the adornments in colors being very
effective. To the southward of the town, adjoining the barracks, is
the military cemetery, where a monument and three white pyramids
tell the horrid story of the Dade massacre during the Seminole War.
Major Dade, a gallant officer, and one hundred and seven men, were
ambushed and massacred by eight hundred Indians in December,
1835, and their remains afterwards brought here and interred under
the pyramids. Opposite the barracks is what is claimed to be the
oldest house in the United States, occupied by Franciscan monks
from 1565 to 1580, and afterwards a dwelling. It has been restored,
and contains a collection of historical relics.
St. Augustine has had a chequered history. In 1586, Queen
Elizabeth's naval hero, Sir Francis Drake, sailing all over the world to
fight Spaniards, attacked and plundered the town and burnt the
greater part of it. Then for nearly a century the Indians, pirates,
French, English and neighboring Georgians and Carolinians made
matters lively for the harried inhabitants. In 1763 the British came
into possession, but they ceded it back to Spain twenty years later,
the town then containing about three hundred householders and
nine hundred negroes. It became American in 1821, and was an
important military post during the subsequent Seminole War, which
continued several years. It was early captured by the Union forces
during the Civil War, and was a valuable stronghold for them. This
curious old town has many traditions that tell of war and massacre
and the horrible cruelties of the Spanish Inquisition, the remains of
cages in which prisoners were starved to death being shown in the
fort. Its best modern story, however, is told of the escape of Coa-
coo-chee, the Seminole chief, whose adventurous spirit and savage
nature gained him the name of the Wild Cat. The ending of the
Seminole War was the signing of a treaty by the older chiefs
agreeing to remove west of the Mississippi. Coa-coo-chee, with other
younger chiefs, opposed this and renewed the conflict. He was
ultimately captured and taken to Fort Marion. Feigning sickness, he
was removed into a casemate giving him air, there being an aperture
two feet high by nine inches wide in the wall about thirteen feet
above the floor, and under it a platform five feet high. Here, while
still feigning illness, he became attenuated by voluntary abstinence
from food, and finally one night squeezed himself through the
aperture and dropped to the bottom of the moat, which was dry.
Eluding all the guards, he escaped and rejoined his people. The
flight caused a great sensation, and there was hot pursuit. After
some time he was recaptured, and being taken before General
Worth, was used to compel the remnant of the tribe to remove to
the West. Worth told him if his people were not at Tampa in twenty
days he would be killed, and he was ordered to notify them by
Indian runners. He hesitated, but afterwards yielded, and the
runners were given twenty twigs, one to be broken each day, so
they might know when the last one was broken his life would pay
the penalty. In seventeen days the task was accomplished. The tribe
came to Tampa, and the captive was released, accompanying his
warriors to the far West. This ended most of the Indian troubles in
Florida, but some descendants of the Seminoles still exist in the
remote fastnesses of the everglades.
THE FLORIDA EAST COAST.
All along the Atlantic shore of Florida south of St. Augustine are
popular winter resorts, their broad and attractive beaches, fine
climate and prolific tropical vegetation being among the charms that
bring visitors. Ormond is between the ocean front and the pleasant
Halifax River, its picturesque tributary, the Tomoka, being a favorite
resort for picnic parties. A few miles south on the Halifax River is
Daytona, known as the Fountain City, and having its suburb, the
City Beautiful, on the opposite bank. New Smyrna, settled by
Minorcan indigo planters in the eighteenth century, is on the
northern arm of Indian River. Here are found some of the ancient
Indian shell mounds that are frequent in Florida, and also the orange
groves that make this region famous. Inland about thirty miles are a
group of pretty lakes, and in the pines at Lake Helen is located the
Southern Cassadaga, or Spiritualists' Assembly. For more than a
hundred and fifty miles the noted Indian River stretches down the
coast of Florida. It is a long and narrow lagoon, parallel with the
ocean, and is part of the series of lagoons found on the eastern
coast almost continuously for more than three hundred miles from
St. Augustine south to Biscayne Bay, and varying in width from
about fifty yards to six or more miles. They are shallow waters,
rarely over twelve feet deep, and are entered by very shallow inlets
from the sea. The Indian River shores, stretching down to Jupiter
Inlet, are lined with luxuriant vegetation, and the water is at times
highly phosphorescent. Upon the western shore are most of the
celebrated Indian River orange groves whose product is so highly
prized. At Titusville, the head of navigation, where there are about a
thousand people, the river is about six miles across at its widest part.
Twenty miles below, at Rockledge, it narrows to about a mile in
width, washing against the perpendicular sides of a continuous
enclosing ledge of coquina rock, with pleasant overhanging trees.
Here comes in, around an island, its eastern arm, the Banana River,
and to the many orange groves are added plantations of the luscious
pineapple. Various limpid streams flow out from the everglade region
at the westward, and Fort Pierce is the trading station for that
district, to which the remnant of the Seminoles come to exchange
alligator hides, bird plumage and snake skins for various supplies,
not forgetting fire-water. Below this is the wide estuary of St. Lucie
River and the Jupiter River, with the lighthouse on the ocean's edge
at Jupiter Inlet, the mouth of Indian River.
Seventeen miles below this Inlet is Palm Beach, a noted resort,
situated upon the narrow strip of land between the long and narrow
lagoon of Lake Worth and the Atlantic Ocean. Here are the vast
Hotel Royal Poinciana and the Palm Beach Inn, with their cocoanut
groves, which also fringe for miles the pleasant shores of Lake
Worth. Prolific vegetation and every charm that can add to this
American Riviera bring a crowded winter population. The Poinciana
is a tree bearing gorgeous flowers, and the two magnificent hotels,
surrounded by an extensive tropical paradise, are connected by a
wide avenue of palms a half-mile long, one house facing the lake
and the other the ocean. There is not a horse in the settlement, and
only one mule, whose duty is to haul a light summer car between
the houses. The vehicles of Palm Beach are said to be confined to
bicycles, wheel-chairs and jinrickshas. Off to the westward the
distant horizon is bounded by the mysterious region of the
everglades. Far down the coast the railway terminates at Miami, the
southernmost railway station in the United States, a little town on
Miami River, where it enters the broad expanse of Biscayne Bay,
which is separated from the Atlantic by the first of the long chain of
Florida keys. Here are many fruit and vegetable plantations, and the
town, which is a railway terminal and steamship port for lines to
Nassau, Key West and Havana, is growing. Nassau is but one
hundred and seventy-five miles distant in the Bahamas, off the
Southern Florida coast, and has become a favorite American winter
tourist resort.
ASCENDING ST. JOHN'S RIVER.
The St. John's is the great river of Florida, rising in the region of
lakes, swamps and savannahs in the lower peninsula, and flowing
northward four hundred miles to Jacksonville, then turning eastward
to the ocean. It comes through a low and level region, with mostly a
sluggish current; is bordered by dense foliage, and in its northern
portion is a series of lagoons varying in width from one to six miles.
The river is navigable fully two hundred miles above Jacksonville.
The earlier portion of the journey is monotonous, the shores being
distant and the landings made at long piers jutting out over the
shallows from the villages and plantations. At Mandarin is the orange
grove which was formerly the winter home of Harriet Beecher
Stowe; Magnolia amid the pines is a resort for consumptives; and
nearby is Green Cove Springs, having a large sulphur spring of
medicinal virtue. In all directions stretch the pine forests; and the
river water, while clear and sparkling in the sunlight, is colored a
dark amber from the swamps whence it comes. The original Indian
name of this river was We-la-ka, or a chain of lakes, the literal
meaning, in the figurative idea of the savage, being the water has
its own way. It broadens into various bays, and at one of these,
about seventy-five miles south of Jacksonville, is the chief town of
the upper river, Palatka, having about thirty-five hundred inhabitants
and a much greater winter population. It is largely a Yankee town,
shipping oranges and early vegetables to the North; and across the
river, just above, is one of the leading orange plantations of Florida,
that of Colonel Hart, a Vermonter who came here dying of consumption,
but lived to become, in his time, the leading fruit-grower of the
State. Above Palatka the river is narrower, excepting where it may
broaden into a lake; the foliage is greener, the shores more swampy,
the wild-fowl more frequent, and the cypress tree more general. The
young cypress knees can be seen starting up along the swampy
edge of the shore, looking like so many champagne bottles set to
cool in the water. The river also becomes quite crooked, and here is
an ancient Spanish and Indian settlement, well named Welaka,
opposite which flows in the weird Ocklawaha River, the haunt of the
alligator and renowned as the crookedest stream on the continent.

[Illustration: On the Ocklawaha]
GOING DOWN THE OCKLAWAHA.
The Ocklawaha, the dark, crooked water, comes from the south, by
tortuous windings, through various lakes and swamps, and then
turns east and southeast to flow into St. John's River, after a course
of over three hundred miles. It rises in Lake Apopka, down the
Peninsula, elevated about a hundred feet above the sea, the second
largest of the Florida Lakes, and covering one hundred and fifty
square miles. This lake has wooded highlands to the westward,
dignified by the title of Apopka Mountains, which rise probably one
hundred and twenty feet above its surface. To the northward is a
group of lakes—Griffin, Yale, Eustis, Dora, Harris and others—having
clear amber waters and low shores, which are all united by the
Ocklawaha, the stream finally flowing northward out of Lake Griffin.
This is a region of extensive settlement, mainly by Northern people.
The mouth of the Ocklawaha is sixty-five miles from Lake Eustis in a
straight line, but the river goes two hundred and thirty miles to get
there. To the northward of this lake district is the thriving town of
Ocala, with five thousand people, in a region of good agriculture and
having large phosphate beds, the settlement having been originally
started as a military post during the Seminole War. About five miles
east of Ocala is the famous Silver Spring, which is believed to have
been the fountain of perpetual youth, for which Juan Ponce de
Leon vainly searched. It is the largest and most beautiful of the
many Florida springs, having wonderfully clear waters, and covers
about three acres. The waters can be plainly seen pouring upwards
through fissures in the rocky bottom, like an inverted Niagara, eighty
feet beneath the surface. It has an enormous outflow, and a swift
brook runs from it, a hundred feet wide, for some eight miles to the
Ocklawaha.
This strange stream is hardly a river in the ordinary sense, having
fixed banks and a well-defined channel, but is rather a tortuous but
navigable passage through a succession of lagoons and cypress
swamps. Above the Silver Spring outlet, only the smallest boats of
light draft can get through the crooked channel. This outlet is thirty
miles in a direct line from the mouth of the river at the St. John's,
but the Ocklawaha goes one hundred and nine miles thither. The
swampy border of the stream is rarely more than a mile broad, and
beyond it are the higher pine lands. Through this curious channel,
amid the thick cypress forests and dense jungle of undergrowth, the
wayward and crooked river meanders. The swampy bottom in which
it has its course is so low-lying as to be undrainable and cannot be
improved, so that it will probably always remain as now, a refuge for
the sub-tropical animals, birds, reptiles and insects of Florida, which
abound in its inmost recesses. Here flourishes the alligator, coming
out to sun himself at mid-day on the logs and warm grassy lagoons
at the edge of the stream, in just the kinds of places one would
expect to find him. Yet the alligator is said to be a coward, rarely
attacking, unless his retreat to water in which to hide himself is cut
off. He thus becomes more a curiosity than a foe. These reptiles are
hatched from eggs which the female deposits during the spring, in
large numbers, in muddy places, where she digs out a spacious
cavity, fills it with several hundred eggs, and covering them thickly
with mud, leaves nature to do the rest. After a long incubation the
little fellows come out and make a bee-line for the nearest water.
The big alligators of the neighborhood have many breakfasts on the
newly-born little ones, but some manage to grow up, after several
years, to maturity, and exhibit themselves along this remarkable
river.
It is almost impossible to conceive of the concentrated crookedness
of the Ocklawaha and the difficulties of passage. It is navigated by
stout and narrow flat-bottomed boats of light draft, constructed so
as to quickly turn sharp corners, bump the shores and run on logs
without injury. The river turns constantly at short intervals and
doubles upon itself in almost every mile, while the huge cypress
trees often compress the water way so that a wider boat could not
get through. There are many beautiful views in its course displaying
the noble ranks of cypress trees rising as the stream bends along its
bordering edge of swamps. Occasionally a comparatively straight
river reach opens like the aisle of a grand building with the moss-
hung cypress columns in long and sombre rows on either hand. At
rare intervals fast land comes down to the stream bank, where there
is some cultivation attempted for oranges and vegetables. Terrapin,
turtles and water-fowl abound. When the passenger boat, after
bumping and swinging around the corners, much like a ponderous
teetotum, halts for a moment at a landing in this swampy fastness,
half-clad negroes usually appear, offering for sale partly-grown baby
alligators, which are the prolific crop of the district. Various Turkey
bends, Hell's half-acres, Log Jams, Bone Yards and Double S
Bends are passed, and at one place is the Cypress Gate, where
three large trees are in the way, and by chopping off parts of their
roots, a passage about twenty feet wide had been secured to let the
boats through. There are said to be two thousand bends in one
hundred miles of this stream, and many of them are like corrugated
circles, by which the narrow water way, in a mile or two of its
course, manages to twist back to within a few feet of where it
started. At night, to aid the navigation, the lurid glare of huge pine-
knot torches, fitfully blazing, gives the scene a weird and unnatural
aspect. The monotonous sameness of cypress trunks, sombre moss
and twisting stream for many hours finally becomes very tiresome,
but it is nevertheless a most remarkable journey of the strangest
character possible in this country to sail down the Ocklawaha.
LOWER FLORIDA AND THE SEMINOLES.
South of the mouth of the Ocklawaha the St. John's River broadens
into Lake George, the largest of its many lakes, a pretty sheet of
water six to nine miles wide and twelve miles long. Volusia, the site
of an ancient Spanish mission, is at the head of this lake, and the
discharge from the swift but narrow stream above has made sand
bars, so that jetties are constructed to deepen the channel. For a
long distance the upper river is narrow and tortuous, with numerous
islands and swamps, the dark coffee-colored water disclosing its
origin; but the Blue Spring in one place is unique, sending out an
ample and rich blue current to mix with the amber. Then Lake
Monroe is reached, ten miles long and five miles wide, the head of
navigation, by the regular lines of steamers, one hundred and
seventy miles above Jacksonville. Here are two flourishing towns,
Enterprise on the northern shore and Sanford on the southern, both
popular winter resorts, and the latter having two thousand people.
The St. John's extends above Lake Monroe, a crooked, narrow,
shallow stream, two hundred and fourteen miles farther
southeastward to its source. The region through which it there
passes is mostly a prairie with herds of cattle and much game, and is
only sparsely settled. The upper river approaches the seacoast,
being in one place but three miles from the lagoons bordering the
Atlantic. To the southward of Lake Monroe are the winter resorts of
Winter Park and Orlando, the latter a town of three thousand
population. There are numerous lakes in this district, and then
leaving the St. John's valley and crossing the watershed southward
through the pine forests, the Okeechobee waters are reached, which
flow down to that lake. This region was the home of a part of the
Seminole Indians, and Tohopekaliga was their chief, whom they
revered so highly that they named their largest lake in his honor. The
Kissimmee River flows southward through this lake, and then
traverses a succession of lakes and swamps to Lake Okeechobee,
about two hundred miles southward by the water-line. Kissimmee
City is on Lake Tohopekaliga, and extensive drainage operations have
been conducted here and to the southward, reclaiming a large extent
of valuable lands, and lowering the water-level in all these lakes and
attendant swamps.
From Lake Tohopekaliga through the tortuous water route to Lake
Okeechobee, and thence by the Caloosahatchie westward to the Gulf
of Mexico, is a winding channel of four hundred and sixty miles,
though in a direct line the distance is but one hundred and fifty
miles. Okeechobee, the word meaning the large water, covers
about twelve hundred and fifty square miles, and almost all about it
are the everglades or grass water, the shores being generally a
swampy jungle. This district for many miles is a mass of waving
sedge grass eight to ten feet high above the water, and inaccessible
excepting through narrow, winding and generally hidden channels.
In one locality a few tall lone pines stand like sentinels upon Arpeika
Island, formerly the home of the bravest and most dreaded of the
Seminoles, and still occupied by some of their descendants. The
name of the Seminole means the separatist or runaway Indians,
they having centuries ago separated from the Creeks in Georgia and
gone southward into Florida. From the days of De Soto to the time of
their deportation in the nineteenth century the Spanish, British,
French and Americans made war with these Seminole Indians.
Gradually they were pressed southward through Florida. Their final
refuge was the green islands and hummocks of the everglades, and
they then clung to their last homes with the tenacity of despair. The
greater part of this region is an unexplored mystery; the deep
silence that can be actually felt, everywhere pervades; and once lost
within the labyrinth, the adventurer is doomed unless rescued. Only
the Indians knew its concealed and devious paths. On Arpeika Island
the Cacique of the Caribs is said to have ruled centuries ago, until
forced south out of Florida by the Seminoles. It was at times a
refuge for the buccaneer with his plunder and a shrine for the
missionary martyr who planted the Cross and was murdered beside
it. This island was the last retreat of the Seminoles in the desultory
war from 1835 to 1843, when they defied the Government, which,
during eight years, spent $50,000,000 upon expeditions sent against
them. Then the attempt to remove all of them was abandoned, and
the remnant have since rested in peace, living by hunting and a little
trading with the coast settlements. The names of the noted chiefs of
this great race—Osceola, Tallahassee, Tohopekaliga, Coa-coo-chee
and others—are preserved in the lakes, streams and towns of
Florida. Most of the deported tribe were sent to the Indian Territory.
There may be three or four hundred of them still in the everglades,
peaceful, it is true, yet haughty and suspicious, and sturdily rejecting
all efforts to educate or civilize them. They celebrate their great
feast, the Green Corn Dance, in late June; and they have
unwavering faith in the belief that the time will yet come when all
their prized everglade land will be theirs again, and the glory of the
past redeemed, if not in this world, then in the next one, beyond the
Big Sleep.
WESTERN FLORIDA.
Westward from Jacksonville, a railway runs through the pine forests
until it reaches the rushing Suwanee River, draining the Okifenokee
swamp out to the Gulf, just north of Cedar Key. This stream is best
known from the minstrel song, long so popular, of the "Old Folks at
Home." Beyond it the land rises into the rolling country of Middle
Florida, the undulating surface sometimes reaching four hundred
feet elevation, and presenting fertile soil and pleasant scenery, with a
less tropical vegetation than the Peninsula of Florida. Here is
Tallahassee, the capital of the State, one hundred and sixty-five
miles from Jacksonville, a beautiful town of four thousand
population, almost embedded in flowering plants, shrubbery and
evergreens, and familiarly known from these beauties as the Floral
City, the gardens being especially attractive in the season of roses.
The Capitol and Court-house and West Florida Seminary, set on a
hill, are the chief public buildings. In the suburbs, at Monticello, lived
Prince Achille Murat, a son of the King of Naples, who died in 1847,
and his grave is in the Episcopal Cemetery. There are several lakes
near the town, one of them the curious Lake Miccosukie, which
contracts into a creek, finally disappearing underground. The noted
Wakulla Spring, an immense limestone basin of great depth and
volume of water, with wonderful transparency, is fifteen miles
southward.
Some distance to the westward the Flint and Chattahoochee Rivers
join to form the Appalachicola River, flowing down to the Gulf at
Appalachicola, a somewhat decadent port from loss of trade, its
exports being principally lumber and cotton. The shallowness of most
of these Gulf harbors, which readily silt up, destroys their usefulness
as ports for deep-draft shipping. The route farther westward skirts
the Gulf Coast, crosses Escambia Bay and reaches Pensacola, on its
spacious harbor, ten miles within the Gulf. This is the chief Western
Florida port, with fifteen thousand people, having a Navy Yard and
much trade in lumber, cotton, coal and grain, a large elevator for the
latter being erected in 1898. The Spaniards made this a frontier post
in 1696, and the remains of their forts, San Miguel and San
Bernardo, can be seen behind the town, while near the outer edge of
the harbor is the old-time Spanish defensive battery, Fort San Carlos
de Barrancas. The harbor entrance is now defended by Fort Pickens
and Fort McRae. Pensacola Bay was the scene of one of the first
spirited naval combats of the Civil War, when the Union forces early
in 1862 recaptured the Navy Yard and defenses. The name of
Pensacola was originally given by the Choctaws to the bearded
Europeans who first settled there, and signifies "the hair people."
THE FLORIDA GULF COAST.
The coast of Florida on the Gulf of Mexico has various attractive
places, reached by a convenient railway system. Homosassa is a
popular resort about fifty miles southwestward from Ocala. A short
distance in the interior is the locality where the Seminoles surprised
and massacred Major Dade and his men in December, 1835, only
three soldiers escaping alive to tell the horrid tale. The operations
against these Indians were then mainly conducted from the military
post of Tampa, and thither were taken for deportation the portions of
the tribe that were afterwards captured, or who surrendered under
the treaty. When Ferdinand de Soto entered this magnificent harbor
on his voyage of discovery and gold hunting, he called it Espiritu
Sancto Bay. It is from six to fifteen miles wide, and stretches nearly
forty miles into the land, being dotted with islands, its waters
swarming with sea-fowl, turtles and fish, deer abounding in the
interior and on some of the islands, and there being abundant
anchorage for the largest vessels. This is the great Florida harbor
and the chief winter resort on the western coast. It was the main
port of rendezvous and embarkation for the American forces in the
Spanish War of 1898. The head of the harbor divides into Old Tampa
and Hillsborough Bays, and on the latter and at the mouth of
Hillsborough River is the city, numbering about twenty-five thousand
inhabitants. The great hotels are surrounded by groves with orange
and lemon trees abounding, and everything is invoked that can add
to the tourist attractions. The special industry of the resident
population is cigar-making. Port Tampa is out upon the Peninsula
between the two bays, several miles below the city, and a long
railway trestle leads from the shore for a mile to deep water. Upon
the outer end of this long wharf is Tampa Inn, built on a mass of
piles, much like some of the constructions in Venice. The guests can
almost catch fish out of the bedroom windows, and while eating
breakfast can watch the pelican go fishing in the neighboring waters,
for this queer-looking bird, with the duck and gull, is everywhere
seen in these attractive regions. An outer line of keys defends Tampa
harbor from the storms of the Gulf. There are many popular resorts
on the islands and shores of Tampa Bay, and regular lines of
steamers are run to the West India ports, Mobile and New Orleans.
All the surroundings are attractive, and a pleased visitor writes of the
place: "Conditions hereabouts exhilarate the men; a perpetual sun
and ocean breeze are balm to the invalid and an inspiration to a
robust health. The landscape affords uncommon diversion, and the
sea its royal sport with rod and gaff."
Farther down the coast is Charlotte Harbor, also deeply indented and
sheltered from the sea by various outlying islands. It is eight to ten
miles long and extends twenty-five miles into the land, having
valuable oyster-beds and fisheries, and its port is Punta Gorda.
Below this is the projecting shore of Punta Rassa, where the outlet of
Lake Okeechobee, the Caloosahatchie River, flows to the sea, having
the military post of Fort Myers, another popular resort, a short
distance inland, upon its bank. The Gulf Coast now trends to the
southeast, with various bays, in one of which, with Cape Romano as
the guarding headland, is the archipelago of the ten thousand
islands, while below is Cape Sable, the southwestern extremity of
Florida. To the southward, distant from the shore, is the long line
of Florida Keys, the name coming from the Spanish word cayo, an
island. This remarkable coral formation marks the northern limit of
the Gulf Stream, where it flows swiftly out to round the extremity of
the Peninsula and begin its northern course through the Atlantic
Ocean. Although well lighted and charted, the Straits of Florida along
these reefs are dangerous to navigate and need special pilots.
Nowhere rising more than eight to twelve feet above the sea, the
Keys thus low-lying are luxuriantly covered with tropical vegetation.
From the Dry Tortugas at the west, around to Sand's Key at the
entrance to Biscayne Bay, off the Atlantic Coast, about two hundred
miles, is a continuous reef of coral, upon the whole extent of which
the little builder is still industriously working. The reef is occasionally
broken by channels of varying depth, and within the outer line are
many habitable islands. The whole space inside this reef is slowly
filling up, just as all the Keys are also slowly growing through
accretions from floating substances becoming entangled in the
myriad roots of the mangroves. The present Florida Reef is a good
example of the way in which a large part of the Peninsula was
formed. No less than seven old coral reefs have been found to exist
south of Lake Okeechobee, and the present one at the very edge of
the deep water of the Gulf Stream is probably the last that can be
formed, as the little coral-builder cannot live at a greater depth than
sixty feet. The Gulf Stream current is so swift and deep along the
outer reef that there is no longer a foundation on which to build.
The Gulf Stream is the best known of all the great ocean currents.
The northeast and southeast trade-winds, constantly blowing, drive a
great mass of water from the Atlantic Ocean westward through the
passages between the Windward Islands into the Caribbean Sea. There
it is contracted by the converging shores of the Yucatan
Peninsula and the Island of Cuba, so that it pours between them into
the Gulf of Mexico, raising its surface considerably above the level of
the Atlantic. These currents then move towards the Florida
Peninsula, and pass around the Florida Reef and out into the
Atlantic. It is estimated by the Coast Survey that the hourly flow of
the Gulf Stream past the reef is nearly ninety thousand million tons
of water, the speed at the surface of the axis of the stream being
over three and one-half miles an hour. To convey the
immensity of this flow, it is stated that if a single hour's flow
of water were evaporated, carrying the salt thus produced would
require one hundred times the number of ocean-going vessels now
afloat. The Gulf Stream water is of high temperature, great clearness
and a deep blue color; and when it meets the greener waters of the
Atlantic to the northward, the line of distinction is often very well
defined. At the exit to the Atlantic below Jupiter Inlet the stream is
forty-eight miles wide to Little Bahama Bank, and its depth over four
hundred fathoms.

There are numerous harbors of refuge among the Florida Keys, and
that at Key West is the best. This is a coral island seven miles long
and one to two miles broad, but nowhere elevated more than eleven
feet above the sea. Its name, by a free translation, comes from the
original Spanish name of Cayo Hueso, or the Bone Island, given
because the early mariners found human bones upon it. Here are
twenty thousand people, mostly Cubans and settlers from the
Bahamas, the chief industry being cigar-making, while catching fish
and turtles and gathering sponges also give much employment.
There are no springs on the island, and the inhabitants are
dependent on rain or distillation for water. The air is pure and the
climate healthy, the trees and shrubbery, with the residences
embowered in perennial flowers, giving the city a picturesque
appearance. Key West has a good harbor, and as it commands the
gateway to and from the Gulf near the western extremity of the
Florida coral reef, it is strongly defended, the prominent work being
Fort Taylor, constructed on an artificial island within the main harbor
entrance. The little Sand Key, seven miles to the southwest, is the
southernmost point of the United States. Forty miles to the westward
is the group of ten small, low and barren islands known as the Dry
Tortugas, from the Spanish tortuga, a tortoise. Upon the farthest
one, Loggerhead Key, stands the great guiding light for the Florida
Reef, of which this is the western extremity, the tower rising one
hundred and fifty feet. Fort Jefferson is on Garden Key, where there
is a harbor, and in it were confined various political prisoners during
the Civil War, among them some who were concerned in the
conspiracy to assassinate President Lincoln.
Here, with the encircling waters of the Gulf all around us, terminates
this visit to the Sunny South. As we have progressed, the gradual
blending of the temperate into the torrid zone, with the changing
vegetation, has reminded us of Bayard Taylor's words:

There, in the wondering airs of the Tropics,
Shivers the Aspen, still dreaming of cold:
There stretches the Oak from the loftiest ledges,
His arms to the far-away lands of his brothers,
And the Pine tree looks down on his rival, the Palm.
And as the journey down the Florida Peninsula has displayed some of
the most magnificent winter resorts of the American Riviera, with
their wealth of tropical foliage, fruits and flowers, and their seductive
and balmy climate, this too has reminded us of Cardinal Damiani's
glimpse of the Joys of Heaven:
Stormy winter, burning summer, rage within these regions never,
But perpetual bloom of roses and unfading spring forever;
Lilies gleam, the crocus glows, and dropping balms their scents
deliver.
Along this famous peninsula the sea rolls with ceaseless beat upon
some of the most gorgeous beaches of the American coast. To the
glories of tropical vegetation and the charms of the climate, Florida
thus adds the magnificence of its unrivalled marine environment.
Everywhere upon these pleasant coasts—
The bridegroom, Sea,
Is toying with his wedded bride,—the Shore.
He decorates her shining brow with shells,
And then retires to see how fine she looks,
Then, proud, runs up to kiss her.

VI.
TRAVERSING THE PRAIRIE LAND.
The Northwest Territory—Beaver River—Fort McIntosh—Mahoning Valley—
Steubenville—Youngstown—Canton—Massillon—Columbus—Scioto River—Wayne
Defeats the Miamis—Sandusky River—Findlay—Natural Gas Fields—Fort Wayne—
Maumee River—The Little Turtle—Old Tippecanoe—Tecumseh—Battle of
Tippecanoe—Harrison Defeats the Prophet—Tecumseh Slain in Canada—
Indianapolis—Wabash River—Terre Haute—Illinois River—Springfield—Lincoln's
Home and Tomb—Peoria—The Great West—Lake Erie—Tribe of the Cat—
Conneaut—The Western Reserve—Ashtabula—Mentor—Cleveland—Cuyahoga
River—Moses Cleaveland—Euclid Avenue—Oberlin—Elyria—The Fire Lands—
Sandusky—Put-in-Bay Island—Perry's Victory—Maumee River—Toledo—South
Bend—Chicago—The Pottawatomies—Fort Dearborn—Chicago Fire—Lake
Michigan—Chicago River—Drainage Canal—Lockport—Water Supply—Fine
Buildings, Streets and Parks—University of Chicago—Libraries—Federal Steel
Company—Great Business Establishments—Union Stock Yards—The Hog—The
Board of Trade—Speculative Activity—George M. Pullman—The Sleeping Car—
The Pioneer—Town of Pullman—Agricultural Wealth of the Prairies—The Corn
Crop—Whittier's Corn Song.
THE NORTHWEST TERRITORY.
Beyond the Allegheny ranges, which are gradually broken down into
their lower foothills, and then to an almost monotonous level, the
expansive prairie lands stretch towards the setting sun. From their
prolific agriculture has come much of the wealth and prosperity of
the United States. The rivers flowing out of the mountains seek the
Mississippi Valley, thus reaching the sea through the Great Father of
Waters. Among these rivers is the Ohio, and at its confluence with
the Beaver, near the western border of Pennsylvania, was, in the
early days, the Revolutionary outpost of Fort McIntosh, a defensive
work against the Indians. All about is a region of coal and gas,
extending across the boundary into the Mahoning district of Ohio,
the Mahoning River being an affluent of the Beaver. Numerous
railroads serve its many towns of furnaces and forges. To the
southward is Steubenville on the Ohio, and to the northward
Youngstown on the Mahoning, both busy manufacturing centres.
Salem and Alliance are also prominent, and some distance northwest
is Canton, a city of thirty thousand people, in a fertile grain district,
the home of President William McKinley. Massillon, upon the pleasant
Tuscarawas River, in one of the most productive Ohio coal-fields,
preserves the memory of the noted French missionary priest, Jean
Baptiste Massillon, for all this region was first traversed, and opened
to civilization, by the French religious explorers from Canada who
went out to convert the Indians.
In the centre of the State of Ohio is the capital, Columbus, built on
the banks of the Scioto River, a tributary of the Ohio flowing
southward and two hundred miles long. This river receives the
Olentangy or Whetstone River at Columbus, in a region of great
fertility, which is in fact the characteristic of the whole Scioto Valley.
The Ohio capital, which has a population of one hundred and twenty
thousand, large commerce and many important manufacturing
establishments, dates from 1812, and became the seat of the State
Government in 1816. The large expenditures of public money upon
numerous public institutions, all having fine buildings, the wide, tree-
shaded streets, and the many attractive residences, have made it
one of the finest cities in the United States. Broad Street, one
hundred and twenty feet wide, beautifully shaded with maples and
elms, extends for seven miles. The Capitol occupies a large park
surrounded with elms, and is an impressive Doric building of gray
limestone, three hundred and four feet long and one hundred and
eighty-four feet wide, the rotunda being one hundred and fifty-seven
feet high. There are fine parks on the north, south and east of the
city, the latter containing the spacious grounds of the Agricultural
Society. Almost all the Ohio State buildings, devoted to its
benevolence, justice or business, have been concentrated in
Columbus, adding to its attractions, and it is also the seat of the
Ohio State University with one thousand students. Railroads radiate
in all directions, adding to its commercial importance.
In going westward, the region we are traversing beyond the
Pennsylvania boundary gradually changes from coal and iron to a
rich agricultural section. As we move away from the influence of the
Allegheny ranges, the hills become gentler, and the rolling surface is
more and more subdued, until it is smoothed out into an almost level
prairie, heavily timbered where not yet cleared for cultivation. This
was the Northwest Territory, first explored by the French, who were
led by the Sieur de la Salle in his original discoveries in the
seventeenth century. The French held it until the conquest of
Canada, when that Dominion and the whole country west to the
Mississippi River came under the British flag by the treaty of 1763.
After the Revolution, the various older Atlantic seaboard States
claiming the region, ceded sovereignty to the United States
Government, and then its history was chequered by Indian wars until
General Wayne conducted an expedition against the Miamis and
defeated them in 1794, after which the Northwest Territory was
organized, and the State of Ohio taken out of it and admitted to the
Union in 1803, its first capital being Chillicothe. It was removed to
Zanesville for a couple of years, but finally located at Columbus.
Beyond the Scioto the watershed is crossed, by which the waters of
the Ohio are left behind and the valley of Sandusky River is reached,
a tributary of Lake Erie. Here is Bucyrus, in another prolific natural
gas region, the centre of which is Findlay. At this town, in 1887, the
inhabitants, who had then had just one year of natural gas
development, spent three days in exuberant festivity, to show their
appreciation of the wonderful discovery. They had thirty-one gas
wells pouring out ninety millions of cubic feet in a day, all piped into
town and feeding thirty thousand glaring natural gas torches of
enormous power, which blew their roaring flames as an
accompaniment to the oratory of John Sherman and Joseph B.
Foraker, who were then respectively Senator and Governor of Ohio.
The soldiers and firemen paraded, and a multitude of brass bands
tried to drown the Niagara of gas which was heard roaring five miles
away, while the country at night was illuminated for twenty miles
around. But the wells have since diminished their flow, although the
gas still exists; while another field with a prolific yield is in Fairfield
County, a short distance southeast of Columbus. Over the State
boundary in Indiana is yet another great gas-field covering five
thousand square miles in a dozen counties, with probably two
thousand wells and a yield which has reached three thousand
millions of cubic feet in a day. This gas supplies many cities and
towns, including Chicago, and it is one of the greatest gas-fields
known. In the same region there are also large petroleum deposits.
Not far beyond the State boundary is Fort Wayne, the leading city of
Northern Indiana, having forty thousand population, an important
railway centre, and prominent also in manufactures. It stands in a
fertile agricultural district, and being located at the highest part of
the gentle elevation, beyond the Sandusky Valley, diverting the
waters east and west, it is appropriately called the Summit City.
Here the Maumee River is formed by the confluence of the two
streams St. Joseph and St. Mary, and flows through the prairie
towards the northeast, to make the head of Lake Erie. The French,
under La Salle, in the eighteenth century established a fur-trading
post here, and erected Fort Miami, and in 1760 the British
penetrated to this then remote region and also built a fort. During
the Revolution this country was abandoned to the Indians, but when
General Wayne defeated the Miamis in 1794 he thought the place
would make a good frontier outpost to hold the savages in check,
and he then constructed a strong work, to which he gave the name
of Fort Wayne. Around this post the town afterwards grew, being
greatly prospered by the Wabash and Erie Canal, and by the various
railways subsequently constructed in all directions. All this prairie
region was the hunting-ground of the Miamis, whose domain
extended westward to Lake Michigan, and southward along the
valley of the Miami River to the Ohio. They were a warlike and
powerful tribe, and their adherence to the English during the
Revolution provoked almost constant hostilities with the settlers who
afterwards came across the mountains to colonize the Northwest
Territory. Under the leadership of their renowned chief
Mishekonequah, or the Little Turtle, they defeated repeated
expeditions sent against them, until finally beaten by Wayne.
Subsequently they dwindled in importance, and when removed
farther west, about 1848, they numbered barely two hundred and
fifty persons.
OLD TIPPECANOE.
Some distance westward is the Tippecanoe River, a stream flowing
southwest into the Wabash, and thence into the Ohio. The word
"Tippecanoe" is said to mean "the great clearing," and on this river
was fought the noted battle by "Old Tippecanoe," General William
Henry Harrison, against the combined forces of the Shawnees,
Miamis and several other tribes, which resulted in their complete
defeat. They were united under Elskwatawa, or the Prophet, the
brother of the famous Tecumseh. These two chieftains were
Shawnees, and they preached a crusade by which they gathered all
the northwestern tribes in a concerted movement to resist the
steady encroachments of the whites. The brother, who was a
medicine man, in 1805 set up as an inspired prophet, denouncing
the use of liquors, and of all food, manners and customs introduced
by the hated palefaces, and confidently predicted they would
ultimately be driven from the land. For years both chiefs travelled
over the country stirring up the Indians. General Harrison, who was
the Governor of the Northwest Territory, gathered his forces together
and advanced up the Wabash against the Prophet's town of
Tippecanoe, when the Indians, hoping to surprise him, suddenly
attacked his camp, but he being prepared, they were signally
defeated, thus giving Harrison his popular title of "Old Tippecanoe,"
which had much to do with electing him President in 1840. Some
time after this defeat the War of 1812 broke out, when Tecumseh
espoused the English cause, went to Canada with his warriors, and
was made a brigadier-general. He was killed there in the battle of
the Thames, in Ontario Province, and, it is said, had a premonition of
death, for, laying aside his general's uniform, he put on a hunting-
dress and fought desperately until he was slain. Tecumseh was the
most famous Indian chief of his time, and the honor of killing him
was claimed by several who fought in the battle, so that the problem
of "Who killed Tecumseh?" was long discussed throughout the
country.
The State of Indiana was admitted into the Union in 1816, and in its
centre, built upon a broad plain, on the east branch of White River, is
its capital and largest city, Indianapolis, having two hundred
thousand population. This is a great railway centre, having lines
radiating in all directions, and it also has extensive manufactures and
a large trade in live stock. The city plan, with wide streets crossing at
right angles, and four diagonal avenues radiating from a circular
central square, makes it very attractive; and the residential quarter,
displaying tasteful houses, ornate grounds and shady streets, is
regarded as one of the most beautiful in the country. The State
Capitol, in a spacious park, is a Doric building with colonnade,
central tower and dome, and in an enclosure on its eastern front is
erected one of the finest Soldiers' and Sailors' Monuments existing,
rising two hundred and eighty-five feet, out-topping everything
around, having been designed and largely constructed in Europe.
There are also many prominent public buildings throughout the city.
Indianapolis, first settled in 1819, had but a small population until
the railways centred there, the Capitol being removed from Corydon
in 1825. The Wabash River, to which reference has been made,
receives White River, and is one of the largest affluents of the Ohio,
about five hundred and fifty miles long, being navigable over half
that length. It rises in the State of Ohio, flows across Indiana, and,
turning southward, makes for a long distance the Illinois boundary.
Its chief city is Terre Haute, "the High Ground," about seventy miles
west of Indianapolis, another prominent railroad centre, having forty-
five thousand people, with extensive manufactures. It is surrounded
by valuable coal-fields, is built upon an elevated plateau, and, like all
these prairie cities, is noted for its many broad and well-shaded
streets. It was founded in 1816.
THE GREAT WEST.
Progressing westward, the timbered prairie gradually changes to the
grass-covered prairie, spreading everywhere a great ocean of
fertility. Across the Wabash is the Prairie State of Illinois, its name
coming from its principal river, which the Indians named after
themselves. The word is a French adaptation of the Indian name
"Illini," meaning "the superior men," the earliest explorers and
settlers having been French, the first comers on the Illinois River
being Father Marquette and La Salle. At the beginning of the
eighteenth century their little settlements were flourishing, and the
most glowing accounts were sent home, describing the region, which
they called New France, on account of its beauty, attractiveness
and prodigious fertility, as a new Paradise. There were many years of
Indian conflicts and hostility, but after peace was restored and a
stable government established, population flowed in, and Illinois was
admitted as a State to the Union in 1818. The capital was
established at Springfield in 1837, an attractive city of about thirty
thousand inhabitants, built on a prairie a few miles south of
Sangamon River, a tributary of the Illinois, and from its floral
development and the adornment of its gardens and shade trees,
Springfield is popularly known as the Flower City. There is a
magnificent State Capitol with high surmounting dome, patterned
somewhat after the Federal Capitol at Washington. Springfield has
coal-mines which add to its prosperity, but its great fame is
connected with Abraham Lincoln. He lived in Springfield, and the
house he occupied when elected President has been acquired by the
State and is on public exhibition. After his assassination in 1865, his
remains were brought from Washington to Springfield, and interred
in the picturesque Oak Ridge Cemetery, in the northern suburbs,
where a magnificent monument was erected to his memory and
dedicated in 1874. About sixty miles north of Springfield, the Illinois
River expands into Peoria Lake, and here came La Salle down the
river in 1680, and at the foot of the lake established a trading-post
and fort, one of the earliest in that region. When more than a
century had elapsed, a little town grew there which is now the busy
industrial city of Peoria, famous for its whiskey and glucose, and
turning out products that annually approximate a hundred millions,
furnishing vast traffic for numerous railroads. It is the chief city of
the corn belt, and is served by all the prominent trunk railway
lines.
Like the pioneers of a hundred years ago, we have left the Atlantic
seaboard, crossed the Allegheny Mountains and entered the
expansive Northwest Territory, which in the first half of the
nineteenth century was the Mecca of the colonist and frontiersman.
This was then the region of the Great West, though that has since
moved far beyond the Mississippi. Its agricultural wealth made the
prosperity of the country for many decades, and its prodigious
development was hardly realized until put to the test of the Civil War,
when it poured out the men and officers, and had the staying
qualities so largely contributing to the result of that great conflict.
Gradually overspread by a network of railways, the numerous cross-
roads have expanded everywhere into towns and cities, almost all
patterned alike, and all of them centres of rich farming districts.
Coal, oil and gas have come to minister to its manufacturing wants,
and thus growing into mature Commonwealths, this prolific region in
the later decades has been itself, in turn, contributing largely to the
tide of migration flowing to the present Great Northwest, a
thousand miles or more beyond. It presents a rich agricultural
picture, but little scenic attractiveness. Everywhere an almost dead
level, the numerous railways cross and recross the surface in all
directions at grade, and are easily built, it being only necessary to
dig a shallow ditch on either side, throw the earth in the centre, and
lay the ties and rails. Nature has made the prairie as smooth as a
lake, so that hardly any grading is necessary, and the region of
expansive green viewed out of the car window has been aptly
described as having "a face but no features," when one looks afar
over an ocean of waving verdure.
LAKE ERIE.
This vast prairie extends northward to and beyond the Great Lakes,
and it is recorded that in the early history of the proposed legislation
for the Northwest Territory, Congress gravely selected as the
names of the States which were to be created out of it such
ponderous conglomerates as "Metropotamia," "Assenisipia,"
"Pelisipia" and "Polypotamia," titles which happily were long ago
permitted to pass into oblivion. Northward, in Ohio, the region
stretches to Lake Erie, the most southern and the smallest of the
group of Great Lakes above Niagara. It is regarded as the least
attractive lake, having neither romances nor much scenery. Yet, from
its favorable position, it carries an enormous commerce. It is elliptical
in form, about two hundred and forty miles long and sixty miles
broad, the surface being five hundred and sixty-five feet above the
ocean level. It is a very shallow lake, the depth rarely exceeding one
hundred and twenty feet, excepting at the lower end, while the other
lakes are much deeper, and in describing this difference of level it is
said that the surplus waters poured from the vast basins of Superior,
Michigan and Huron, flow across the plate of Erie into the deep bowl
of Ontario. This shallowness causes it to be easily disturbed, so that
it is the most dangerous of these fresh-water seas, and it has few
harbors, and those very poor, especially upon the southern shore.
The bottom of the lake is a light, clayey sediment, rapidly
accumulated from the wearing away of the shores, largely composed
of clay strata. The loosely-aggregated products of these
disintegrated strata are frequently seen along its coast, forming cliffs
extending back into elevated plateaus, through which the rivers cut
deep channels. Their mouths are clogged by sand-bars, and
dredging and breakwaters have made the harbors on the southern
shore, around which have grown the chief towns—Dunkirk, Erie,
Ashtabula, Cleveland, Sandusky and Toledo. The name of Lake Erie
comes from the Indian tribe of the Cat, whom the French called
the Chats, because their early explorers, penetrating to the shores
of the lake, found them abounding in wild cats, and thus they gave
the same name to the cats and the savages. In their own parlance,
these Indians were the Eries, and in the seventeenth century they
numbered about two thousand warriors. In 1656 the Iroquois
attacked and almost annihilated them.
The Lake Erie ports in the Buckeye State of Ohio, so called from
the buckeye tree, are chiefly harbors for shipping coal and receiving
ores from the upper lakes, their railroads leading to the great
industrial centres to the southward. Near the eastern boundary of
Ohio is Conneaut, on the bank of a wide and deep ravine, formed by
a small river, broadening into a bay at the shore of the lake, the
name meaning many fish. Here landed in 1796 the first settlers
from Connecticut, who entered the Western Reserve, as all this
region was then called. On July 4th of that year, celebrating the
national anniversary, they pledged each other in tin cups of lake
water, accompanied by a salute of fowling-pieces, and the next day
began building the first house on the Reserve, constructed of logs,
and long known as Stow Castle. Conneaut is consequently known
as the Plymouth of the Western Reserve, as here began the
settlements made by the Puritan New England migration to Ohio. On
deep ravines making their harbors are Ashtabula, an enormous
entrepôt for ores, and a few miles farther westward, Painesville, on
Grand River, named for Thomas Paine. Beyond is Mentor, the home
of the martyred President Garfield, whose large white house stands
near the railway. All along here, the southern shore of Lake Erie is a
broad terrace at eighty to one hundred feet elevation above the
water, while farther inland is another and considerably higher
plateau. Each sharp declivity facing northward seems at one time to
have been the actual shore of the lake when its surface before the
waters receded was much higher than now. The outer plateau,
having once been the overflowed lake bed, is level, excepting where
the crooked but attractive streams have deeply cut their winding
ravines down through it to reach Lake Erie.
THE CITY OF CLEVELAND.
Thus we come to Cleveland, the second city in Ohio, having four
hundred thousand people, and extensive manufacturing industries. It
is the capital of the Western Reserve and the chief city of Northern
Ohio, its commanding position upon a high bluff, falling off
precipitously to the edge of the water, giving it the most attractive
situation on the shore of Lake Erie. Shade trees embower it,
including many elms planted by the early settlers, who learned to
love them in New England, and hence it delights in the popular title
of the Forest City. Were not the streets so wide, the profusion of
foliage might make Cleveland seem like a town in the woods. The
little Cuyahoga River, its name meaning the crooked stream, flows
with wayward course down a deeply washed and winding ravine,
making a valley in the centre of the city, known as the Flats, and
this, with the tributary ravines of some smaller streams, is packed
with factories and foundries, oil refineries and lumber mills, their
chimneys keeping the business section constantly under a cloud of
smoke. Railways run in all directions over these flats and through the
ravines, while, high above, the city has built a stone viaduct nearly a
half-mile long, crossing the valley. Here are the great works of the
Standard Oil Company, controlling that trade, and several of the
petroleum magnates have their palaces in the city.
Old Moses Cleaveland, a shrewd but unsatisfied Puritan of the town
of Windham, Connecticut, became the agent of the Connecticut Land
Company, who brought out the first colony in 1796 that landed at
Conneaut. They explored the lake shore, and selecting as a good
location the mouth of Cuyahoga River, Moses wrote back to his
former home that they had found a spot "on the bank of Lake Erie
which was called by my name, and I believe the child is now born
that may live to see that place as large as old Windham." In little
over a century the town has grown far beyond his wildest dreams,
although it did not begin to expand until the era of canals and
railways, and it was not so long ago that the people in grateful
memory erected a bronze statue of the founder. One of the local
antiquaries, delving into the records, has found why various original
settlers made their homes at Cleveland. He learned that one man,
on his way farther West, was laid up with the ague and had to stop;
another ran out of money and could get no farther; another had
been to St. Louis and wanted to get back home, but saw a chance to
make money in ferrying people across the river; another had $200
over, and started a bank; while yet another thought he could make a
living by manufacturing ox-yokes, and he stayed. This earnest
investigator continues: "A man with an agricultural eye would look at
the soil and kick his toe into it, and then would shake his head and
declare that it would not grow white beans; but he knew not what
this soil would bring forth; his hope and trust was in beans, he
wanted to know them more, and wanted potatoes, corn, oats and
cabbage, and he knew not the future of Euclid Avenue."
On either side of the deep valley of the Flats stretch upon the
plateau the long avenues of Cleveland, with miles of pleasant
residences, surrounded by lawns and gardens, each house isolated in
green, and the whole appearing like a vast rural village more than a
city. This pleasant plan of construction had its origin in the New
England ideas of the people. Yet the city also has a numerous
population of Germans, and it is recorded that one of the early
landowners wrote, in explaining his project of settlement: "If I make
the contract for thirty thousand acres, I expect with all speed to send
you fifteen or twenty families of prancing Dutchmen." These Teutons
came and multiplied, for the original Puritan stock can hardly be
responsible for the vineyards of the neighborhood, the music and
dancing, and the public gardens along the pleasant lake shore,
where the crowds go, when work is over, to enjoy recreation and
watch the gorgeous summer sunsets across the bosom of the lake
which are the glory of Cleveland. Upon the plateau, the centre of the
city, is the Monumental Park, where stand the statue of Moses
Cleaveland, the founder, who died in 1806, and a fine Soldiers'
Monument, with also a statue of Commodore Perry. This Park is an
attractive enclosure of about ten acres, having fountains, gardens,
monuments and a little lake, and it is intersected at right angles by
two broad streets, and surrounded by important buildings. One of
the streets is the chief business highway, Superior Street, and the
other leads down to the edge of the bluff on the lake shore, where
the steep slope is made into a pleasure-ground, with more flower-
beds and fountains and a pleasant outlook over the water, although
at its immediate base is a labyrinth of railroads and an ample supply
of smoke from the numerous locomotives. A long breakwater
protects the harbor entrance, and out under the lake is bored the
water-works tunnel.
There extends far to the eastward, from a corner of the Monumental
Park, Cleveland's famous street—Euclid Avenue. The people regard it
as the handsomest highway in America, in the combined
magnificence of houses and grounds. It is a level avenue of about
one hundred and fifty feet width, with a central roadway and stone
footwalks on either hand, shaded by rows of grand overarching elms,
and bordered on both sides by well-kept lawns. This is the public
highway, every part being kept scrupulously neat, while a light railing
marks the boundary between the street and the private grounds. For
a long distance this noble avenue is bordered by stately residences,
each surrounded by ample gardens, the stretch of grass, flowers and
foliage extending back from one hundred to four hundred feet
between the street and the buildings. Embowered in trees, and with
all the delights of garden and lawn seen in every direction, this grand
avenue makes a delightful driveway and promenade. Upon it live the
multi-millionaires of Cleveland, the finest residences being upon the
northern side, where they have invested part of the profits of their
railways, mills, mines, oil wells and refineries in adorning their homes
and ornamenting their city. This splendid boulevard, in one way, is a
reproduction of the Parisian Avenue of the Champs Elysées and its
gardens, but with more attractions in the surroundings of its
bordering rows of palaces. Here live the men who vie with those of
Chicago in controlling the commerce of the lakes and the affairs of
the Northwest. Plenty of room and an abundance of income are
necessary to provide each man, in the heart of the city, with two to
ten acres of lawns and gardens around his house, but it is done here
with eminent success. About four miles out is the beautiful Wade
Park, opposite which are the handsome buildings of the Western
Reserve University, having, with its adjunct institutions, a thousand
students. Beyond this, the avenue ends at the attractive Lake View
Cemetery, where, on the highest part of the elevated plateau, with a
grand outlook over Lake Erie, is the grave of the assassinated
President Garfield. His imposing memorial rises to a height of one
hundred and sixty-five feet.
CLEVELAND TO CHICAGO.
Thirty-five miles southwest of Cleveland, and some distance inland
from Lake Erie, is Oberlin, where, in a fertile and prosperous district,
is the leading educational foundation of Northern Ohio—Oberlin
College—named in memory of the noted French philanthropist, and
established in 1833 by the descendants of the Puritan colonists, to
carry out their idea of thorough equality in education. It admits
students without distinction of sex or color, and has about thirteen
hundred, almost equally divided between the sexes, occupying a
cluster of commodious buildings. To the westward is the beautiful
ravine of Black River, which gets out to the lake by falling over a
rocky ledge in two streams, and on the peninsula formed by its forks
is the town of Elyria. Maria Ely was the wife of the founder of the
settlement, who named it after her in this peculiar reversible way.
This romantic stream bounds the Fire Lands of the Western
Reserve, a tract of nearly eight hundred square miles abutting on the
lake shore, which Connecticut set apart for colonization by her
people, who had been sufferers from destructive fires in the towns of
New London, Fairfield and Norwalk on Long Island Sound. They
secured this wilderness in the early part of the nineteenth century,
and their chief town is Sandusky, with twenty-five thousand
population. Here lived most of the Eries, the Indian tribe of the
Cat, who fished in Sandusky Bay, its upper waters being an
archipelago of little green islands abounding with water fowl. They
were known to the adjoining tribes as the Neutral Nation, for they
maintained two villages of refuge on Sandusky River, between the
warlike Indians of the east and the west, and whoever entered their
boundaries was safe from pursuit, the sanctuary being rigidly
observed. The early French missionaries who found them in the
seventeenth century speak of these anomalous villages among the
savages as having then been long in existence.
The name of Sandusky is a corruption of a Wyandot word meaning
cold-water pools, the French having originally rendered it as
Sandosquet. The shores are low, but there is a good harbor and
much trade, and here is located the Ohio State Fish Hatchery. The
railroads are laid among the savannahs and lagoons, and one of the
suburban stations has been not inaptly named Venice. There are
extensive vineyards on the flat and sunny shores of the bay, and this
is one of the most prolific grape districts in the State. Sandusky Bay
is a broad sheet of water, in places six miles wide, and about twenty
miles long. Sandusky has a large timber trade, being noted for the
manufacture of hard woods. Out beyond the bold peninsula,
protruding into the lake at the entrance to the bay, is a group of
islands spreading over the southwestern waters of Lake Erie, of
which Kelly's Island is the chief, an archipelago formed largely from
the detritus washed out of the Detroit, Maumee and various other
rivers flowing into the head of the lake. Here the Erie Indians had a
fortified stronghold, whose outlines can still be traced. The most
noted of the group is Put-in-Bay Island, now a popular watering-
place, which got its name from Commodore Perry, who put in there
with the captured British fleet at the naval battle of Lake Erie,
September 10, 1813. It was from this place, just after his victory,
that he sent the historic despatch, giving him fame, "We have met
the enemy and they are ours." The killed of both fleets were buried
side by side near the beach on the island, the place being marked by
a mound. The lovely sheet of water of Put-in-Bay glistens in front,
having the towns of villa-crowned Gibraltar Island upon its surface.
Vineyards and roses abound, these islands, like the adjacent shores,
being noted for their wines.
The Maumee River, coming up from Fort Wayne, flows into the head
of Lake Erie, the largest stream on its southern coast. It comes from
the southwest through the region of the Black Swamp, a vast
district, originally morass and forest, which has been drained to
make a most fertile country. This miserable bog, as the original
settlers denounced it, when they were jolted over the rude corduroy
roads that sustained them upon the quaking morass, has since
become the prolific garden and magnificent forest described by
the modern tourist. The Maumee Valley was an almost continual
battle-ground with the Indians when Mad Anthony Wayne
commanded on that frontier, he being called by them "the Wind,
because he drives and tears everything before him." For a quarter
of a century border warfare raged along this river, then known as the
Miami of the Lakes, and its chief settlement, Toledo, passed its
infancy in a baptism of blood and fire. It was at the battle of Fallen
Timbers, fought in 1794, almost on the site of Toledo, that Wayne
gave his laconic and noted field orders. General William Henry
Harrison, then his aide, told Wayne just before the battle he was
afraid he would get into the fight and forget to give the necessary
field orders. Wayne replied: "Perhaps I may, and if I do, recollect
that the standing order for the day is, charge the rascals with the
bayonets." Toledo is built on the flat surface on both sides of the
Maumee River and Bay, which make it a good harbor, stretching six
miles down to Lake Erie. There are a hundred thousand population
here, and this energetic reproduction of the ancient Spanish city has
named its chief newspaper the Toledo Blade. The city has extensive
railway connections and a large trade in lumber and grain, coal and
ores, and does much manufacturing, it being well served with
natural gas. A dozen grain elevators line the river banks, and the
factory smokes overhang the broad low-lying city like a pall. To the
westward, crossing the rich lands of the reclaimed swamp, is the
Indiana boundary, that State being here a broad and level prairie,
