INTRODUCTION TO
BIOSTATISTICS FOR CLINICAL
RESEARCH
Jordan J. Elm, PhD
Department of Public Health Sciences
Medical University of South Carolina
NIH StrokeNet Professional Development Seminar – August 2020
CONFLICT OF INTEREST / DISCLAIMER
I am contact PI of the StrokeNet National Data
Management Center (NDMC) in Charleston, SC.
Other grants from NIH
OBJECTIVES
Provide an introduction to basics of biostatistics as applied
to clinical research
 Estimation and Hypothesis Testing
 Basic Overview of Common Analyses
 Sample Size Considerations
 Important topics (in brief)
ESTIMATION AND HYPOTHESIS TESTING
POPULATION:
A population is the entire group that we wish to study.
Notes:
Populations are generally very large. Frequently viewed
as infinite.
Can also be called study population, reference population
or target population.
A POPULATION HAS PARAMETERS:
The population has characteristics that we want (need)
to know:
a) Proportion (p) who experience DLTs
b) Proportion who will respond favorably to an
intervention
c) Mean (μ) hematoma expansion volume on DWI
These characteristics are called parameters.
99.99% of the time population parameters are unknown!
A SAMPLE HAS STATISTICS:
A sample is a representative group drawn from the
population.
We use statistics to make estimates about population
parameters by using analogous values computed from
a sample.
 Proportion of sample who experience DLTs.
 Proportion of sample who respond.
 Sample mean volume.
These sample summary values (descriptive values) are
called statistics.
PARAMETERS VS STATISTICS:
The distinction between statistics and parameters is
essential to the understanding of statistical inference.
 We use different symbols to represent each
 Parameters are constants, while sample statistics are
random variables.
 The values of parameters do not change from sample
to sample, whereas statistics change whenever the
population is resampled.
STATISTICAL INFERENCE:
Statistical inference is inference about a population from a
random sample drawn from it.
It includes:
 Point estimation
 Interval estimation
 Hypothesis testing
ESTIMATION
Point estimates provide a single estimate of the
parameter (e.g. mean, proportion, odds ratio, RR).
Interval estimates (Confidence Intervals) provide a range
of values that seeks to capture the parameter.
"We can be 95% confident that the proportion of ischemic
stroke patients who have a 90 day mRS < 2 is between
5.1% and 15.3%."
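As a hedged illustration of point and interval estimation, the sketch below computes a normal-approximation (Wald) 95% confidence interval for a proportion. The counts (10 of 98 patients) are hypothetical and are not the data behind the interval quoted above; the function name `wald_ci` is also only illustrative.

```python
from math import sqrt
from statistics import NormalDist


def wald_ci(events, n, level=0.95):
    """Normal-approximation (Wald) confidence interval for a proportion."""
    p_hat = events / n                                # point estimate
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)     # about 1.96 for 95%
    se = sqrt(p_hat * (1 - p_hat) / n)                # standard error of p_hat
    return p_hat, max(0.0, p_hat - z * se), min(1.0, p_hat + z * se)


# Hypothetical sample: 10 of 98 patients have the outcome of interest
p_hat, lower, upper = wald_ci(10, 98)
print(f"point estimate {p_hat:.3f}, 95% CI ({lower:.3f}, {upper:.3f})")
```

The Wald interval is the simplest choice; for small samples or proportions near 0 or 1, other intervals (for example Wilson) are usually preferred.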
HYPOTHESIS TESTING:
Hypothesis testing provides a framework for drawing
conclusions on an objective basis rather than on a
subjective basis by simply looking at the data.
“There is enough statistical evidence to conclude that the
mean normal body temperature of adults is lower than 98.6
degrees F."
H0
HA
COURT ROOM ANALOGY
In the US court system, we assume that the accused is
innocent until proven guilty.
Two competing hypotheses
Null H0: Defendant is not guilty (innocent)
Alternative HA: Defendant is guilty
The jury examines the evidence.**
If there is enough evidence, we reject
the null.
**In statistics, the data are the evidence.
COURT ROOM EXAMPLE:
The jury then makes a decision based on the available evidence
(data):
If the jury finds sufficient evidence — beyond a reasonable
doubt — the jury rejects the null hypothesis and deems the
defendant guilty. We behave as if the defendant is guilty.
If there is insufficient evidence, then the jury does not reject the
null hypothesis. We behave as if the defendant is innocent.
In statistics, we always make one of two decisions. We either
"reject the null hypothesis" or we "fail to reject the null
hypothesis."
https://online.stat.psu.edu/statprogram/reviews/statistical-concepts/hypothesis-testing
ERRORS IN HYPOTHESIS TESTING:
When testing a hypothesis, 1 of 2 decisions can be made:
 Reject H0
 Fail to reject H0
                             Truth
Decision                     H0 true                    H0 false
Fail to reject (accept) H0   OK                         ERROR: Type II error “β”
Reject H0                    ERROR: Type I error “α”    OK
TYPE I ERROR:
The probability of a type I error is the probability of
rejecting the null hypothesis when it is true.
We generally use α to denote the probability of a type I
error:
α = P(reject H0 | H0 true)
This is called the significance level of a test.
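One way to make the significance level concrete is a small simulation: generate many studies in which H0 really is true and count how often a two-sided z-test rejects. The setup below (normal data, known standard deviation of 1, n = 30, 20,000 simulated studies, fixed seed) is entirely an assumption chosen for illustration.

```python
import random
from math import sqrt
from statistics import NormalDist


def rejection_rate_under_null(n=30, alpha=0.05, n_trials=20000, seed=1):
    """Share of simulated studies that reject H0 when H0 is in fact true."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)   # two-sided critical value
    rejections = 0
    for _ in range(n_trials):
        # Data are generated with true mean 0 and sd 1, so H0: mean = 0 holds
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        z = (sum(sample) / n) * sqrt(n)            # z statistic with known sd = 1
        if abs(z) > z_crit:
            rejections += 1
    return rejections / n_trials


rate = rejection_rate_under_null()
print(f"empirical type I error rate: {rate:.3f}")
```

The empirical rejection rate should land close to the chosen α, which is what "significance level" means operationally.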
STATISTICAL SIGNIFICANCE
Hypothesis testing provides a framework for making
decisions on an objective basis rather than on a subjective
basis by simply looking at the data.
p-value: the probability of observing data at least as
extreme as that which you have actually observed,
assuming that the null hypothesis is true.
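The definition can be sketched with the body-temperature example. The summary values below (sample mean 98.25 °F, standard deviation 0.73, n = 130) are hypothetical, and a normal approximation is used in place of a t-test for simplicity; with a sample this large the two would give nearly the same answer.

```python
from math import sqrt
from statistics import NormalDist


def one_sided_p_value(sample_mean, sample_sd, n, null_mean):
    """P(Z <= observed z) under H0, for the alternative 'true mean < null_mean'."""
    z = (sample_mean - null_mean) / (sample_sd / sqrt(n))
    return z, NormalDist().cdf(z)


# Hypothetical summary statistics; H0: mean = 98.6, HA: mean < 98.6
z, p = one_sided_p_value(98.25, 0.73, 130, 98.6)
print(f"z = {z:.2f}, one-sided p-value = {p:.2g}")
```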
NORMAL PROBABILITY CURVE:
TYPE II ERROR AND POWER:
Why should we be concerned about power?
The power of a test tells us how likely we are to find
a significant difference given that the alternative
hypothesis is true, i.e. given that the true mean μ is
different from μ0.
If the power is too low, then we have little chance of
finding a significant difference even if the true mean
is not equal to μ0.
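A minimal sketch of a power calculation, assuming a two-sided one-sample z-test with known standard deviation. The true shift of 5 units and the standard deviation of 10 are illustrative values, not taken from the slides.

```python
from math import sqrt
from statistics import NormalDist


def power_one_sample_z(delta, sigma, n, alpha=0.05):
    """Power of a two-sided one-sample z-test when the true mean is mu0 + delta."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    shift = delta * sqrt(n) / sigma        # mean of the z statistic under HA
    # Probability that the z statistic falls in either rejection region
    return std_normal.cdf(-z_crit + shift) + std_normal.cdf(-z_crit - shift)


for n in (10, 32, 100):
    print(n, round(power_one_sample_z(delta=5, sigma=10, n=n), 3))
```

With these assumed values, power rises with n and reaches roughly 80% at about 32 subjects.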
CHOOSING α CAREFULLY:
Because α is chosen by the investigator, it is under the
investigator's control and is known.
Thus when you reject H0, you know the probability of
a Type I error.
α is chosen a priori (usually set at two-sided 0.05 or
0.01, but could be 0.10 if well justified)
So why not make α very, very small?
This may be the solution in some cases; however, a
reduction in the α level without increasing your
sample size will always increase the probability of
a Type II error.
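The trade-off can be seen numerically. The sketch below holds the sample size fixed and tabulates the Type II error probability for several significance levels, again assuming a two-sided one-sample z-test with illustrative values (shift 5, standard deviation 10, n = 32).

```python
from math import sqrt
from statistics import NormalDist


def type_ii_error(alpha, delta=5.0, sigma=10.0, n=32):
    """Type II error probability (1 - power) for a two-sided z-test at fixed n."""
    nd = NormalDist()
    z_crit = nd.inv_cdf(1 - alpha / 2)
    shift = delta * sqrt(n) / sigma
    power = nd.cdf(-z_crit + shift) + nd.cdf(-z_crit - shift)
    return 1 - power


# Smaller significance level, same n: the Type II error probability grows
betas = {alpha: type_ii_error(alpha) for alpha in (0.10, 0.05, 0.01, 0.001)}
for alpha, beta in betas.items():
    print(f"alpha = {alpha:<5}  beta = {beta:.3f}")
```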
α AND β AND STATISTICAL
CONCLUSIONS:
If we reject H0 we may have made a Type I error, and if
we fail to reject we may have made a Type II error.
Because we have these two types of error and one is
potentially possible in any decision, we NEVER say that
we have proved that H0 is true or that H0 is false.
Proof implies that there is no possibility for error.
Instead we say that the data support or fail to support the
alternative hypothesis (i.e. we reject or fail to reject H0, respectively).
STATISTICAL VS CLINICAL SIGNIFICANCE:
The investigator must distinguish between results that
are statistically significant and results that are clinically
significant.
Very small differences can become statistically
significant. However, very small differences may not
have clinical meaning.
Statistical significance does not imply clinical significance.
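To illustrate, the sketch below tests the same small difference at three sample sizes. The numbers are hypothetical: a 0.5 mmHg difference in mean systolic blood pressure with a common standard deviation of 15 mmHg, analysed with a two-sample z-test.

```python
from math import sqrt
from statistics import NormalDist


def two_sample_z_p_value(diff, sigma, n_per_group):
    """Two-sided p-value for a difference in means with common known sd."""
    se = sigma * sqrt(2 / n_per_group)     # standard error of the difference
    z = diff / se
    return 2 * (1 - NormalDist().cdf(abs(z)))


# Same clinically negligible difference, increasingly large trials
for n in (100, 10_000, 50_000):
    print(f"n per group = {n:>6}: p = {two_sample_z_p_value(0.5, 15, n):.2g}")
```

The difference never changes, yet the p-value moves from clearly non-significant to very small as n grows.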
BRIEF OVERVIEW OF COMMON ANALYSES
Analysis depends on type of measurement:
 Continuous measurement (temperature in °F) or a Rating
Scale (e.g. NIHSS 0, 1, 2, …, 42)
 Nominal (unordered categories) or Ordinal (low, medium, high; mRS 0, 1, 2, 3,
4, 5, 6)
 Binary (yes/no)
 Time to event (yes/no over varying follow-up)
CLINICAL TRIAL
Estimate treatment effect
 Continuous/Interval Measure (Blood Pressure, Rating Scale)
    Difference between means (averages)
 Binary Proportion (Adverse Event, mRS<2)
    Odds ratio (OR) [{p1 / (1 – p1)} / {p0 / (1 – p0)}]
    Absolute risk reduction [p1 – p0]
    Relative risk (RR) [p1 / p0]
    Relative risk reduction (RRR) [1 – (p1 / p0)]
 Time to Event (death, recurrent stroke)
    Hazard ratio (HR) (similar to relative risk)
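The binary-outcome measures listed above can be computed directly from the two group proportions. In this sketch p1 is taken to be the event proportion in the treated group and p0 the proportion in the control group; the values 0.10 and 0.20 are hypothetical.

```python
def effect_measures(p1, p0):
    """Effect measures for a binary outcome (p1 = treated, p0 = control)."""
    return {
        "odds_ratio": (p1 / (1 - p1)) / (p0 / (1 - p0)),
        "risk_difference": p1 - p0,            # the slide's [p1 - p0]
        "relative_risk": p1 / p0,
        "relative_risk_reduction": 1 - p1 / p0,
    }


# Hypothetical trial: 10% event rate on treatment vs 20% on control
measures = effect_measures(p1=0.10, p0=0.20)
for name, value in measures.items():
    print(f"{name}: {value:.3f}")
```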
WHAT IS AN ODDS RATIO?
… LET'S START WITH THE “ODDS”
The probability that an event will occur is the fraction of
times you expect to see that event in many
trials. Probabilities always range between 0 and 1.
The odds are defined as the probability that the event will
occur divided by the probability that the event will not
occur.
If the horse runs 100 races and wins 80, the probability of
winning is 80/100 = 0.80 or 80%, and the odds of
winning are 80/20 = 4 to 1.
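The conversion between probability and odds in the horse-racing example can be sketched as follows; the helper names are illustrative only.

```python
def odds(probability):
    """Odds = P(event) / P(no event)."""
    return probability / (1 - probability)


def probability_from_odds(odds_value):
    """Inverse conversion: probability = odds / (1 + odds)."""
    return odds_value / (1 + odds_value)


win_probability = 80 / 100
win_odds = odds(win_probability)           # approximately 4, i.e. "4 to 1"
print(round(win_odds, 6), round(probability_from_odds(win_odds), 6))
```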
ANALYTIC APPROACH
[Diagram: the exposure odds (exposed vs. non-exposed) among the diseased (cases) and the exposure odds among the non-diseased (controls) are compared to give the odds ratio.]
MEASURE RISK
             Cases    Controls   Total
Exposed      a        b          a + b
Unexposed    c        d          c + d
Total        a + c    b + d
Odds Ratio: (a/c) ÷ (b/d) ≈ Relative Risk
EXAMPLE
                                    Movement Disorder Cases   Spousal Controls   Total
Fragile X Gene Carriers (Exposed)   14                        7                  23
Non carriers (Unexposed)            338                       267                605
Total                               355                       273
Odds Ratio: (a/c) ÷ (b/d) ≈ Relative Risk
OR: (14/338) ÷ (7/267) = 1.6
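The odds ratio in this example can be reproduced from the four cell counts. The confidence interval in the sketch is an addition not shown on the slide: it uses the standard log-odds-ratio (Woolf) approximation and is included only to show how uncertainty around the estimate might be expressed.

```python
from math import exp, log, sqrt
from statistics import NormalDist


def odds_ratio_with_ci(a, b, c, d, level=0.95):
    """Odds ratio (a/c) / (b/d) with a Woolf (log-scale) confidence interval."""
    estimate = (a / c) / (b / d)
    se_log = sqrt(1 / a + 1 / b + 1 / c + 1 / d)    # SE of log(OR)
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    lower = exp(log(estimate) - z * se_log)
    upper = exp(log(estimate) + z * se_log)
    return estimate, lower, upper


# Cell counts from the example: exposed cases/controls, unexposed cases/controls
or_estimate, or_lower, or_upper = odds_ratio_with_ci(a=14, b=7, c=338, d=267)
print(f"OR = {or_estimate:.2f}, 95% CI ({or_lower:.2f}, {or_upper:.2f})")
```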
FIXED COHORT ANALYSIS
              Disease +   Disease -   Risk
Exposure +    a = 40      b = 160     a/(a+b) = 0.2
Exposure -    c = 40      d = 760     c/(c+d) = 0.05
Relative Risk = [a/(a+b)] / [c/(c+d)] = 0.2/0.05 = 4
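A short sketch of the fixed cohort calculation, assuming the cell layout implied by the stated risks (40 diseased and 160 non-diseased among the exposed, 40 diseased and 760 non-diseased among the unexposed).

```python
def relative_risk(a, b, c, d):
    """Risks in exposed (a, b) and unexposed (c, d) groups, and their ratio."""
    risk_exposed = a / (a + b)
    risk_unexposed = c / (c + d)
    return risk_exposed, risk_unexposed, risk_exposed / risk_unexposed


risk_exp, risk_unexp, rr = relative_risk(a=40, b=160, c=40, d=760)
print(f"risk exposed = {risk_exp:.2f}, risk unexposed = {risk_unexp:.2f}, RR = {rr:.1f}")
```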
DYNAMIC COHORT ANALYSIS
              Disease +   Disease -   Time at risk         Risk per 100 Person-Years
Exposure +    40          160         1800 Person-Years    2.2
Exposure -    40          760         3600 Person-Years    1.1
Relative Risk = (a per 100 P-Y) / (c per 100 P-Y) = 2.2/1.1 = 2
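When follow-up differs between subjects, events are divided by person-time rather than by the number of subjects. The sketch below reproduces the calculation from the event counts and person-years given above.

```python
def rate_per_100_person_years(events, person_years):
    """Incidence rate expressed per 100 person-years of follow-up."""
    return 100 * events / person_years


rate_exposed = rate_per_100_person_years(40, 1800)      # about 2.2
rate_unexposed = rate_per_100_person_years(40, 3600)    # about 1.1
rate_ratio = rate_exposed / rate_unexposed
print(f"{rate_exposed:.1f} vs {rate_unexposed:.1f} per 100 P-Y, ratio = {rate_ratio:.1f}")
```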
TIME TO EVENT (OR SURVIVAL) ANALYSIS
We can also compare the time to event between treatment groups (or between
exposed and unexposed groups).
This is known as survival analysis, even though the event or outcome might not
always be “death”. This is the standard name for an analysis that takes into
account time to event.
Common summaries:
 Proportion surviving at a specific time point (e.g. 2 years)
 Median survival time: e.g. half of the patients in the treatment group survived
at least 2246 days, compared with 906 days in the control group
 Cox proportional hazards model → HR
This method is useful when disease onset may take some time, as with recurring
cancer or recurrent stroke events in stroke prevention trials. Realistically, we
need to stop the study after a certain amount of follow-up, but we know that many
people would eventually have had the event (e.g. cancer) had we followed them for
longer. These people are said to be “censored” at the end of the study (we know
they had not had the event as of the end of the study, but we do not know their
true time to event).
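A minimal sketch of how censoring enters a survival estimate, using the Kaplan-Meier product-limit method on a tiny hypothetical dataset. Censored subjects count in the number at risk until their censoring time but never contribute an event.

```python
def kaplan_meier(times, events):
    """Return [(time, survival probability)] at each distinct event time.

    times  : follow-up time for each subject
    events : 1 if the event was observed at that time, 0 if censored
    """
    survival = 1.0
    curve = []
    for t in sorted(set(times)):
        at_risk = sum(1 for ti in times if ti >= t)
        observed = sum(1 for ti, ei in zip(times, events) if ti == t and ei == 1)
        if observed > 0:
            survival *= 1 - observed / at_risk    # product-limit step
            curve.append((t, survival))
    return curve


# Hypothetical follow-up times in days; 0 marks a censored subject
times = [5, 8, 8, 12, 15, 20, 20, 25]
events = [1, 1, 0, 1, 0, 1, 0, 0]
curve = kaplan_meier(times, events)
for t, s in curve:
    print(f"day {t}: estimated survival {s:.3f}")
```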
KAPLAN-MEIER PLOT
OF TIME TO DEATH FOR CLINICAL SUBTYPE
Lo R. Neurology 2009
SAMPLE SIZE
WHY WORRY ABOUT POWER/SAMPLE SIZE?
Provides assurance that the trial has a reasonable
probability of being conclusive
Allows one to determine the sample size necessary, so that
resources are efficiently allocated
Ethical Issues
 Study too large implies some subjects needlessly
exposed, resources needlessly spent
 Study too small implies potential for misleading
conclusions, unnecessary experimentation
SAMPLE SIZE CALCULATIONS
 α (Type I error)
 β (Type II error)
 σ² (variance of outcome)
 Δ (clinically relevant difference)
$$\text{sample size} \propto \frac{\left(Z_{1-\beta} + Z_{1-\alpha/2}\right)^2 \,(\text{variance})}{(\text{effect size})^2}$$
VARIABILITY
Is the outcome continuous or categorical?
Continuous
 Need estimate of standard deviation/variance
 based on relevant clinical literature or a range of plausible values
Dichotomous
 Need estimate of control proportion
MINIMUM SCIENTIFICALLY IMPORTANT DIFFERENCE
The smallest difference
that would change
clinical practice
“Larger the difference,
smaller the sample size”
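A rough sketch of how the difference enters a sample size calculation for comparing two means. The formula used, n per group = 2 (z for 1 − α/2 + z for power)² σ² / Δ², is the usual normal-approximation formula for two equal-sized groups and is an assumption here, as are the illustrative values σ = 20 and Δ = 10.

```python
from math import ceil
from statistics import NormalDist


def n_per_group_means(delta, sigma, alpha=0.05, power=0.80):
    """Approximate n per group for a two-group comparison of means (equal n)."""
    nd = NormalDist()
    z_alpha = nd.inv_cdf(1 - alpha / 2)    # two-sided significance level
    z_beta = nd.inv_cdf(power)             # power = 1 - Type II error probability
    return 2 * (z_alpha + z_beta) ** 2 * sigma ** 2 / delta ** 2


n_base = n_per_group_means(delta=10, sigma=20)
n_half_delta = n_per_group_means(delta=5, sigma=20)
print(ceil(n_base), ceil(n_half_delta))
```

Halving the difference to be detected multiplies the required n by four, while the standard deviation enters through its square as well.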
VARIABILITY
“Larger the difference, smaller
the sample size” ignores
contribution of variability
[Figure, two panels: n per group (0 to 140) versus common standard deviation (0 to 15), and power (%) versus common standard deviation (0 to 15). Two group t-test of equal means (equal n's), 80% power, MCID 5 units.]
N PER GROUP BY CONTROL GROUP % GOOD OUTCOME FOR
VARIOUS Δ
[Figure: N (0 to 1800) versus control % (5% to 95%), one curve for each Δ of 5%, 10%, 15%, and 20%.]
Assume 80% power with 2-sided alpha=0.05
Quadrupling of N for Δ of 5% vs 10%
For binary case, N is maximized when one group has response of
around 50%
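Both observations can be checked with a simple sketch for two proportions. The unpooled normal-approximation formula below is one common choice and is an assumption; the curves in the original figure may have been produced with a slightly different method.

```python
from statistics import NormalDist


def n_per_group_proportions(p0, p1, alpha=0.05, power=0.80):
    """Approximate n per group to compare two proportions (unpooled variance)."""
    nd = NormalDist()
    z_sum = nd.inv_cdf(1 - alpha / 2) + nd.inv_cdf(power)
    variance_sum = p0 * (1 - p0) + p1 * (1 - p1)
    return z_sum ** 2 * variance_sum / (p1 - p0) ** 2


# Control proportion 30%: difference of 10 points vs 5 points
n_delta_10 = n_per_group_proportions(0.30, 0.40)
n_delta_5 = n_per_group_proportions(0.30, 0.35)
print(round(n_delta_10), round(n_delta_5), round(n_delta_5 / n_delta_10, 2))

# Same 10-point difference, proportions near 50% vs far from 50%
n_near_half = n_per_group_proportions(0.45, 0.55)
n_far_from_half = n_per_group_proportions(0.10, 0.20)
print(round(n_near_half), round(n_far_from_half))
```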
ADDITIONAL FACTORS TO CONSIDER FOR
TIME-TO-EVENT ANALYSIS
 Number of events of interest
 Study duration and follow-up period
 Subject accrual and lost-to-follow-up rates
 Proportion of censoring
Good reference: Lachin, Controlled Clinical Trials 2:93-113, 1981
SAMPLE SIZE ISSUES: MULTIPLICITY
May 6-7, 2010 DESIGN OF EARLY PHASE CLINICAL TRIALS 41
CAUSES OF MULTIPLICITY
 Multiple treatments (e.g., 2 doses + control)
 Multiple outcomes (e.g., efficacy + safety)
 Repeated measures (e.g., Day 1, 7, 30, 90)
 Subgroup analyses (e.g., mild, mod, severe cases)
 Multiple looks (i.e., interim analyses)
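Why multiplicity matters for error rates can be shown with a short calculation. The sketch assumes k independent tests each run at level α, and uses the Bonferroni correction as one simple (conservative) adjustment; the slides do not prescribe a specific method.

```python
def familywise_error_rate(alpha, k):
    """P(at least one false positive) across k independent tests at level alpha."""
    return 1 - (1 - alpha) ** k


def bonferroni_alpha(alpha, k):
    """Per-test significance level under the Bonferroni correction."""
    return alpha / k


k = 4  # e.g. outcomes assessed at Day 1, 7, 30, 90
unadjusted = familywise_error_rate(0.05, k)
adjusted = familywise_error_rate(bonferroni_alpha(0.05, k), k)
print(f"unadjusted: {unadjusted:.3f}, Bonferroni-adjusted: {adjusted:.3f}")
```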
SAMPLE SIZE ISSUES:
ADJUSTMENTS FOR POTENTIAL MISSING
OUTCOME DATA AND NONCOMPLIANCE
INTENT-TO-TREAT (ITT) PRINCIPLE
 Comparison of treatment policies
 Subjects’ data are analyzed in the group to which they were
randomized regardless of their compliance with the protocol
 Preservation of the benefits of randomization
 Most Phase II/III studies analyzed according to the ITT
principle
WERE ALL PARTICIPANTS ANALYZED IN THE GROUPS TO
WHICH THEY WERE RANDOMIZED?
“Excluding randomized participants or observed outcomes
from analysis and subgrouping on the basis of outcome or
response variables can lead to biased results of unknown
magnitude or direction”
Friedman LM, Furberg CD, DeMets DL. Fundamentals of Clinical Trials, 3rd Edition. New
York: Springer-Verlag, 1998, p. 284.
MISSING OUTCOME DATA
 Subject became lost-to-follow-up
 Subject withdrew consent
 Subject died
 No other reason should exist for missing
outcome data!
NONCOMPLIANCE (PROTOCOL VIOLATIONS)
 Subject became lost-to-follow-up
 Subject withdrew consent
 Subject had not met eligibility criteria
 Subject/investigator did not comply with
treatment regimen
 Crossover in treatment allocation
ANALYSIS EXCLUDING MISSING OUTCOME/
NONCOMPLIANCE CASES
If d x 100% of subjects are anticipated not to
complete the protocol, and their outcomes are
unknown or not imputed, then divide the
calculated N by (1-d) to get the adjusted
(inflated) N
EXAMPLES
 If 10% of recruited subjects are anticipated
to drop out or become ineligible during a
run-in period, then required N = (estimated
N) / 0.90.
 If a per-protocol analysis is planned and 5% of
subjects are expected to drop out during
follow-up, then required N = (estimated
N) / 0.95
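The adjustment is a one-line calculation. In this sketch the calculated N of 200 is hypothetical, and the result is rounded up because a fractional subject cannot be enrolled.

```python
from math import ceil


def inflate_for_dropout(n, dropout_rate):
    """Divide the calculated N by (1 - d) and round up to a whole subject."""
    return ceil(n / (1 - dropout_rate))


calculated_n = 200  # hypothetical N from the power calculation
n_with_10pct_dropout = inflate_for_dropout(calculated_n, 0.10)
n_with_5pct_dropout = inflate_for_dropout(calculated_n, 0.05)
print(n_with_10pct_dropout, n_with_5pct_dropout)
```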
ADJUSTMENT FOR ITT ANALYSIS
 If r1 x 100% of the patients are expected to “switch”
from intervention to control and r2 x 100% of the
patients are expected to “switch” from control to
intervention, then multiply the calculated N by the
inflation factor: IF = 1/(1 - r1 - r2)^2
 The IF compensates for the dilution of the
treatment effect, i.e., the observed
difference may be smaller than what was assumed
before the study began.
ITT EXAMPLE
Suppose for a study using weight change outcome:
Tx Grp   N    Est μ    Drop out   σ
A        63   30 lbs   15%        20
B        63   20 lbs   25%        20
So, Δ = μA – μB = 10 with planned total N=126 and power of 80%
ITT EXAMPLE (CONT’D)
With the drop in/out, the observed Δ = Δ’:
Δ’ = [(30 x 0.85) + (20 x 0.15)] - [(30 x 0.25) + (20 x 0.75)] = 6
< original planned Δ of 10
IF = 1/(1 - r1 - r2)^2 = 1/(1 - 0.15 - 0.25)^2 = 2.78
New N under ITT: N’ = 126 x 2.78 = 350
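The arithmetic of this example can be checked with a short sketch. The function names are illustrative; the inputs are the group means, switch rates, and planned N from the example.

```python
def diluted_difference(mu_a, mu_b, r1, r2):
    """Expected observed difference when r1 of group A and r2 of group B switch."""
    observed_a = mu_a * (1 - r1) + mu_b * r1
    observed_b = mu_a * r2 + mu_b * (1 - r2)
    return observed_a - observed_b


def itt_inflation_factor(r1, r2):
    """Inflation factor 1 / (1 - r1 - r2)^2 for the planned sample size."""
    return 1 / (1 - r1 - r2) ** 2


delta_observed = diluted_difference(mu_a=30, mu_b=20, r1=0.15, r2=0.25)
inflation = itt_inflation_factor(0.15, 0.25)
new_n = round(126 * inflation)
print(round(delta_observed, 2), round(inflation, 2), new_n)
```

The diluted difference equals the planned difference times (1 − r1 − r2), which is why the sample size must grow by the inverse square of that factor.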
DISCUSSION 1
8/17/2020 53
If you claim to conduct an intention-to-treat analysis and a
randomized subject stops taking the assigned treatment
due to an adverse event, do you follow that person
according to the protocol or do you do their final
assessments at that point and remove them from the study?
STATISTICAL CONSIDERATIONS
Were the Groups Comparable
at the Start of the Study?
Were All Participants Accounted
for at the end of Follow-up?
How complete was the follow-
up?
 Impute Missing data
HANDLING MISSING DATA
Impute missing data
 Single point imputation (LOCF, worst case, best case,
mean imputation)
 Multiple imputation (using a modelling approach,
repeatedly impute the missing cases, e.g. 20 times,
perform the test on each imputed dataset, and summarize
the findings across imputed datasets)
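One widely used way to summarize findings across imputed datasets is Rubin's rules, sketched below. The slides do not name a pooling method, so this choice is an assumption, and the per-dataset estimates and variances are hypothetical stand-ins for results from five imputed datasets.

```python
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Pool per-imputation estimates and their variances using Rubin's rules."""
    m = len(estimates)
    pooled_estimate = mean(estimates)
    within = mean(variances)               # average within-imputation variance
    between = variance(estimates)          # sample variance across imputations
    total_variance = within + (1 + 1 / m) * between
    return pooled_estimate, total_variance


# Hypothetical treatment-effect estimates from 5 imputed datasets
estimates = [4.8, 5.1, 5.4, 4.9, 5.3]
variances = [1.0, 1.1, 0.9, 1.0, 1.0]
pooled, total_var = pool_rubin(estimates, variances)
print(f"pooled estimate = {pooled:.2f}, total variance = {total_var:.3f}")
```

The between-imputation term is what lets the pooled variance reflect uncertainty about the missing values, which single point imputation ignores.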
PRE-SPECIFIED STATISTICAL ANALYSIS PLAN
Avoidance of Statistician Bias
Sample Size/Power/Study Design should be in agreement.
State error rates, approach to deal with multiplicity.
Randomization plan
Baseline comparisons
Missing data
Analysis Samples, ITT/Per Protocol
Plans for Interim Analyses
Pre-specify model building approach and baseline
covariates/confounders to be adjusted
Prioritization of outcomes
 Primary vs. secondary vs. exploratory outcomes (Standard
definitions)

  • 1. INTRODUCTION TO BIOSTATISTICS FOR CLINICAL RESEARCH Jordan J. Elm, PhD Department of Public Health Sciences Medical University of South Carolina NIH StrokeNet Professional Development Seminar – August 2020
  • 2. CONFLICT OF INTEREST / DISCLAIMER I am contact PI of the StrokeNet National Data Management Center (NDMC) in Charleston, SC. Other grants from NIH
  • 3. OBJECTIVES Provide an introduction to basics of biostatistics as applied to clinical research  Estimation and Hypothesis Testing  Basic Overview of Common Analyses  Sample Size Considerations  Important topics (in brief)
  • 5. POPULATION: A population is the entire group that we wish to study. Notes: Populations are generally very large. Frequently viewed as infinite. Can also be called study population, reference population or target population. 5
  • 6. A POPULATION HAS PARAMETERS: The population has characteristics that we want (need) to know: a) Proportion (p) who experience DLTs b) Proportion who will respond favorably to an intervention c) Mean () hematoma expansion volume on DWI These characteristics are called parameters. 99.99% of the time population parameters are unknown! 6
  • 7. A SAMPLE HAS STATISTICS: A sample is a representative group drawn from the population. We use statistics to make estimates about population parameters by using analogous values computed from a sample.  Proportion of sample who experience DLTs.  Proportion of sample who respond.  Sample mean volume. These sample summary values (descriptive values) are called statistics. 7
  • 8. PARAMETERS VS STATISTICS: The distinction between statistics and parameters is essential to the understanding of statistical inference.  We use different symbols to represent each  Parameters are constants, while sample statistics are random variables.  The values of parameters do not change from sample to sample, whereas, statistics change whenever the population is resampled. 8
  • 9. STATISTICAL INFERENCE: Statistical inference is inference about a population from a random sample drawn from it. It includes:  Point estimation  Interval estimation  Hypothesis testing 9
  • 10. ESTIMATION Point estimates provide a single estimate of the parameter (e.g. mean, proportion, odds ratio, RR). Interval estimates (Confidence Intervals) provide a range of values that seeks to capture the parameter. "We can be 95% confident that the proportion of ischemic stroke patients who have a 90 day mRS < 2 is between 5.1% and 15.3%." 10
  • 11. HYPOTHESIS TESTING: Hypothesis testing provides a framework for drawing conclusions on an objective basis rather than on a subjective basis by simply looking at the data. “There is enough statistical evidence to conclude that the mean normal body temperature of adults is lower than 98.6 degrees F." 11 H0 HA
  • 12. COURT ROOM ANALOGY In the US court system, we assume that the accused is innocent until proven guilty. Two competing hypotheses Null H0: Defendant is not guilty (innocent) Alternative HA: Defendant is guilty The jury examines the evidence.** If there is enough evidence, we reject the null. **In statistics, the data are the evidence. 12
  • 13. COURT ROOM EXAMPLE: The jury then makes a decision based on the available evidence (data): If the jury finds sufficient evidence — beyond a reasonable doubt — the jury rejects the null hypothesis and deems the defendant guilty. We behave as if the defendant is guilty. If there is insufficient evidence, then the jury does not reject the null hypothesis. We behave as if the defendant is innocent. In statistics, we always make one of two decisions. We either "reject the null hypothesis" or we "fail to reject the null hypothesis." 13 https://online.stat.psu.edu/statprogram/reviews/statistical-concepts/hypothesis-testing
  • 14. ERRORS IN HYPOTHESIS TESTING: When testing a hypothesis, 1 of 2 decisions can be made:  Reject H0  Fail to reject H0 14 Truth H0 true H0 false Decision Fail to Reject (Accept) H0 OK ERROR Type II error “” Reject H0 ERROR Type I error “” OK
  • 15. TYPE I ERROR: The probability of a type I error is the probability of rejecting the null hypothesis when it is true. We generally use  to denote probability of a type one error: =P(reject H0 | H0 true) This is called the significance level of a test. 15
  • 16. STATISTICAL SIGNIFICANCE Hypothesis testing provides a framework for making decisions on an objective basis rather than on a subjective basis by simply looking at the data. p-value probability of observing data at least as extreme as that which you have actually observed, assuming that the null hypothesis is true.
  • 18. TYPE II ERROR AND POWER: Why should we be concerned about power? The power of a test tells us how likely we are to find a significant difference given that the alternative hypothesis is true, i.e. given that the true mean  is different from 0. If the power is too low, then we have little chance of finding a significant difference even if the true mean is not equal to 0. 18
  • 19. CHOOSING  CAREFULLY: Because  is chosen by the investigator, it is under his control and is known. Thus when you reject H0, you know the probability of a Type I error.  is chosen a priori (usually set at two-sided 0.05 or 0.01, but could be 0.10 if well justified) So why not make  very, very small? This may be the solution in some cases, however, reduction in the  level without increasing your sample size will always increases the probability of a Type II error. 19
  • 20.  AND  AND STATISTICAL CONCLUSIONS: If we reject H0 we may have made a Type I error, and if we fail to reject we may have made a Type II error. Because we have these two types of error and one is potentially possible in any decision, we NEVER say that we have proved that H0 is true or that H0 is false. Proof implies that there is no possibility for error. Instead we say that the data support or fail to support the null hypothesis (i.e. reject or fail to reject H0, respectively.) 20
  • 21. STATISTICAL VS CLINICAL SIGNIFICANCE: The investigator must distinguish between results that are statistically significant and results that are clinically significant. Very small differences can become statistically significant. However, very small differences may not have clinical meaning. Statistical significance does not imply clinical significance. 21
  • 22. BRIEF OVERVIEW OF COMMON ANALYSES Analysis depends on type of measurement:  Continuous measurement (0F temperature) or a Rating Scale (e.g. NIHSS 0, 1, 2, ….24)  Nominal (low, medium, high) or Ordinal (mRS 0, 1, 2, 3, 4, 5, 6)  Binary (yes/no)  Time to event (yes/no over varying follow-up)
  • 23. CLINICAL TRIAL Estimate treatment effect  Continuous/Interval Measure (Blood Pressure, Rating Scale)  Differences between means (averages)  Binary Proportion (Adverse Event, mRS<2)  Odds ratio (OR) [{p1 / (1 – p1)} / {p0 / (1 – p0)}]  Absolute risk reduction [p1 – p0]  Relative risk (RR) [p1 / p0]  Relative risk reduction (RRR) [1 – (p1 / p0)]  Time to Event (death, recurrent stroke)  Hazard ratio (HR) (similar to relative risk)
  • 24. WHAT IS AN ODDS RATIO? ….LETS START WITH THE “ODDS” The probability that an event will occur is the fraction of times you expect to see that event in many trials. Probabilities always range between 0 and 1. The odds are defined as the probability that the event will occur divided by the probability that the event will not occur. If the horse runs 100 races and wins 80, the probability of winning is 80/100 = 0.80 or 80%, and the odds of winning are 80/20 = 4 to 1.
  • 26. MEASURE RISK a b c d Cases Controls Exposed Unexposed a + b c + d a + c b + d Odds Ratio: a/c ÷ b/d ≈ Relative Risk
  • 27. EXAMPLE 14 7 338 267 Movement Disorder Cases Spousal Controls Fragile X Gene Carriers (Exposed) Non carriers Unexposed 23 605 355 273 Odds Ratio: a/c ÷ b/d ≈ Relative Risk OR: 14/338 ÷ 7/267 = 1.6
  • 28. FIXED COHORT ANALYSIS Risk=a/(a+b) Disease Risk=c/(c+d) Relative Risk = a/(a+b)=0.2/0.05=4 c/(c+d) Exposure + - + - 40 40 40 160 760
  • 29. DYNAMIC COHORT ANALYSIS Risk=a/100 Person-Years Disease Risk=c/100 Person-Years Relative Risk = a/(100 P-Y)=2.2/1.1=2 c/(100 P-Y) Exposure + - + - 40 40 40 160 760 Time at risk 1800 Person-Years 3600 Person-Years
  • 30. TIME TO EVENT (OR SURVIVAL) ANALYSIS We can also compare the time to event between treatment groups (or exposed and unexposed) groups. This is known as a survival analysis, even though the event or outcome might not always be “death”. This is the standard name for an analysis that takes into account time to event. Proportion surviving at a specific time point (2 years) Median survival: half of the patients in the treatment group have survived for 2246 days (median survival rate) compared to 906 days in the control group. Cox proportional hazard model)  HR This method is good when disease onset may take some time. Recurring cancer or prevention trials in Stroke…. Recurrent stroke events …realistically we need to stop the study after a certain amount of follow-up, but we know that many people would have eventually gotten cancer had we followed them up for longer. These people are said to be “censored” at the end of the study (we know they didn’t have cancer as of the end of the study, but we don’t know their true time to cancer).
  • 31. KAPLAN-MEIER PLOT OF TIME TO DEATH FOR CLINICAL SUBTYPE Lo R. Neurology 2009
  • 33. WHY WORRY ABOUT POWER/SAMPLE SIZE? Provides assurance that the trial has a reasonable probability of being conclusive Allows one to determine the sample size necessary, so that resources are efficiently allocated Ethical Issues  Study too large implies some subjects needlessly exposed, resources needlessly spent  Study too small implies potential for misleading conclusions, unnecessary experimentation
  • 34. SAMPLE SIZE CALCULATIONS   (Type I error)   (Type II error)   (variance of outcome)  Δ (clinically relevant difference) 34 2 1 1- /2 2 ( Z ) (variance) sample size (effect size) Z     
  • 35. VARIABILITY Is the outcome continuous or categorical? Continuous  Need estimate of standard deviation/variance  based on relevant clinical literature or a range of plausible values Dichotomous  Need estimate of control proportion
  • 36. MINIMUM SCIENTIFICALLY IMPORTANT DIFFERENCE the smallest difference which would change in clinical practice “Larger the difference, smaller the sample size”
  • 37. VARIABILITY “Larger the difference, smaller the sample size” ignores contribution of variability
  • 38. Common standard deviation 0 5 10 15 n per group 0 20 40 60 80 100 120 140 Two group t-test of equal means (equal n's) 80% power, MCID 5 units Common standard deviation 0 5 10 15 Power ( % ) 10 20 30 40 50 60 70 80 90 Two group t-test of equal means (equal n's) 80% power, MCID 5 units
  • 39. N PER GROUP BY CONTROL GROUP % GOOD OUTCOME FOR VARIOUS  0 200 400 600 800 1000 1200 1400 1600 1800 5% 10% 20% 30% 40% 50% 60% 70% 80% 90% 95% Control % N 5% 10% 15% 20% Assume 80% power with 2-sided alpha=0.05 Quadrupling of N for  of 5% vs 10% For binary case, N is maximized when one group has response of around 50%
  • 40. ADDITIONAL FACTORS TO CONSIDER FOR TIME-TO-EVENT ANALYSIS  Number of events of interest  Study duration and follow-up period  Subject accrual and lost-to-follow-up rates  Proportion of censoring Good reference: Lachin, Controlled Clinical Trials 2:93-113, 1981
  • 41. SAMPLE SIZE ISSUES: MULTIPLICITY May 6-7, 2010 DESIGN OF EARLY PHASE CLINICAL TRIALS
  • 42. CAUSES OF MULTIPLICITY  Multiple treatments (e.g., 2 doses + control)  Multiple outcomes (e.g., efficacy + safety)  Repeated measures (e.g., Day 1, 7, 30, 90)  Subgroup analyses (e.g., mild, mod, severe cases)  Multiple looks (i.e., interim analyses)
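To illustrate why multiplicity matters, the sketch below computes the chance of at least one false positive across k tests. It assumes the tests are independent, which is a simplification. The Bonferroni adjustment shown is one common correction and is not named on the slides; both function names are hypothetical.

```python
def familywise_error_rate(alpha, k):
    """Probability of at least one Type I error across k independent tests."""
    return 1 - (1 - alpha) ** k


def bonferroni_alpha(alpha, k):
    """Per-test significance level under a Bonferroni adjustment (an assumption here)."""
    return alpha / k


for k in (1, 2, 5, 10):
    print(k, round(familywise_error_rate(0.05, k), 3), bonferroni_alpha(0.05, k))
```

A smaller per-test alpha raises $Z_{1-\alpha/2}$ in the sample size formula, which is why multiplicity is treated as a sample size issue.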
  • 43. SAMPLE SIZE ISSUES: ADJUSTMENTS FOR POTENTIAL MISSING OUTCOME DATA AND NONCOMPLIANCE
  • 44. INTENT-TO-TREAT (ITT) PRINCIPLE  Comparison of treatment policies  Subjects’ data are analyzed in the group to which they were randomized regardless of their compliance with the protocol  Preservation of the benefits of randomization  Most Phase II/III studies analyzed according to the ITT principle
  • 45. WERE ALL PARTICIPANTS ANALYZED IN THE GROUPS TO WHICH THEY WERE RANDOMIZED? “Excluding randomized participants or observed outcomes from analysis and subgrouping on the basis of outcome or response variables can lead to biased results of unknown magnitude or direction” Friedman LM, Furberg CD, DeMets DL. Fundamentals of Clinical Trials, 3rd Edition. New York: Springer-Verlag, 1998, p. 284.
  • 46. MISSING OUTCOME DATA  Subject became lost-to-follow-up  Subject withdrew consent  Subject died  No other reason should exist for missing outcome data!
  • 47. NONCOMPLIANCE (PROTOCOL VIOLATIONS)  Subject became lost-to-follow-up  Subject withdrew consent  Subject had not met eligibility criteria  Subject/investigator did not comply with treatment regimen  Crossover in treatment allocation
  • 48. ANALYSIS EXCLUDING MISSING OUTCOME/ NONCOMPLIANCE CASES If d x 100% of subjects are anticipated not to complete the protocol, and their outcomes are unknown or not imputed, then divide the calculated N by (1-d) to get the adjusted (inflated) N
  • 49. EXAMPLES  If 10% of recruited subjects are anticipated to drop out or become ineligible during a run-in period, then required N = (estimated N) / 0.90.  If a per-protocol analysis is planned and 5% of subjects are expected to drop out during follow-up, then required N = (estimated N) / 0.95.
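The adjustment on slides 48 and 49 can be sketched in a few lines. The estimated N of 126 is borrowed from the ITT example on slide 51 purely for illustration, and `inflate_for_dropout` is a hypothetical name.

```python
from math import ceil


def inflate_for_dropout(n_calculated, d):
    """Inflate a calculated N so that a fraction d of non-completers still leaves n_calculated."""
    if not 0 <= d < 1:
        raise ValueError("d must be a proportion in [0, 1)")
    # The small tolerance guards against floating-point noise before rounding up.
    return ceil(n_calculated / (1 - d) - 1e-9)


print(inflate_for_dropout(126, 0.10))  # 10% run-in loss: 126 / 0.90
print(inflate_for_dropout(126, 0.05))  # 5% dropout during follow-up: 126 / 0.95
```

Rounding up rather than to the nearest integer keeps the adjusted N from falling short of the target.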
  • 50. ADJUSTMENT FOR ITT ANALYSIS  If r1 x 100% of the patients are expected to “switch” from intervention to control and r2 x 100% of the patients are expected to “switch” from control to intervention, then multiply the calculated N by the inflation factor: $IF = 1/(1 - r_1 - r_2)^2$  The IF compensates for the dilution of the treatment effect, i.e., the actual difference may be smaller than what was estimated before study initiation.
  • 51. ITT EXAMPLE Suppose for a study using weight change outcome: Tx Grp A: N = 63, Est μ = 30 lbs, Drop out = 15%, σ = 20; Tx Grp B: N = 63, Est μ = 20 lbs, Drop out = 25%, σ = 20. So, Δ = μA – μB = 10 with planned total N=126 and power of 80%
  • 52. ITT EXAMPLE (CONT’D) With the drop in/out, the observed Δ = Δ’: Δ’ = [(30x0.85)+(20x0.15)] - [(30x0.25)+(20x0.75)] = 6 < original planned Δ of 10. $IF = 1/(1 - r_1 - r_2)^2 = 1/(1 - 0.15 - 0.25)^2 = 2.78$. New N under ITT: N’ = 126 x 2.78 = 350
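The arithmetic on slides 51 and 52 can be reproduced with the sketch below. It assumes, as the slides do, that a subject who switches arms takes on the mean of the other arm; the function names are hypothetical.

```python
def diluted_difference(mu_a, mu_b, r1, r2):
    """Expected observed difference when r1 of arm A and r2 of arm B switch arms."""
    observed_a = mu_a * (1 - r1) + mu_b * r1  # arm A: stayers plus switchers to B
    observed_b = mu_a * r2 + mu_b * (1 - r2)  # arm B: switchers to A plus stayers
    return observed_a - observed_b


def itt_inflation_factor(r1, r2):
    """Inflation factor IF = 1 / (1 - r1 - r2)^2 from slide 50."""
    return 1 / (1 - r1 - r2) ** 2


delta_prime = diluted_difference(30, 20, 0.15, 0.25)
inflation = itt_inflation_factor(0.15, 0.25)
print(round(delta_prime, 2), round(inflation, 2), round(126 * inflation))
```

The diluted difference equals Δ x (1 - r1 - r2), and because sample size scales with 1/Δ², the required N grows by the square of that shrinkage.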
  • 53. DISCUSSION 1 8/17/2020 53 If you claim to conduct an intention-to-treat analysis and a randomized subject stops taking the assigned treatment due to an adverse event, do you follow that person according to the protocol or do you do their final assessments at that point and remove them from the study?
  • 54. STATISTICAL CONSIDERATIONS Were the Groups Comparable at the Start of the Study? Were All Participants Accounted for at the end of Follow-up? How complete was the follow-up?  Impute Missing data
  • 55. HANDLING MISSING DATA Impute missing data  Single-point imputation (LOCF, worst case, best case, mean imputation)  Multiple imputation (using a modelling approach, repeatedly impute the missing cases (e.g., 20 times), perform the test on each imputed dataset, and summarize the findings across imputed datasets)
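The final step of multiple imputation, summarizing across imputed datasets, is commonly done with Rubin's rules. The slide does not name a pooling method, so treating Rubin's rules as the method is an assumption. The sketch below covers only the pooling step, and the input numbers are made up for illustration.

```python
from statistics import mean, variance


def pool_rubin(estimates, within_variances):
    """Pool per-dataset estimates and their variances using Rubin's rules.

    Returns the pooled estimate and its total variance.
    """
    m = len(estimates)
    if m < 2 or m != len(within_variances):
        raise ValueError("need at least two imputations with matching variances")
    pooled_estimate = mean(estimates)
    within = mean(within_variances)   # average within-imputation variance
    between = variance(estimates)     # between-imputation variance (divisor m - 1)
    total_variance = within + (1 + 1 / m) * between
    return pooled_estimate, total_variance


# Hypothetical treatment-effect estimates from 5 imputed datasets
estimate, total_var = pool_rubin([9.8, 10.4, 10.1, 9.6, 10.3], [4.0, 4.2, 3.9, 4.1, 4.0])
print(round(estimate, 2), round(total_var, 3))
```

The between-imputation term is what single-point imputation ignores; leaving it out understates the uncertainty caused by the missing data.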
  • 56. PRE-SPECIFIED STATISTICAL ANALYSIS PLAN Avoidance of statistician bias. Sample size/power/study design should be in agreement. State error rates and the approach to deal with multiplicity. Randomization plan. Baseline comparisons. Missing data. Analysis samples, ITT/per protocol. Plans for interim analyses. Pre-specify model building approach and baseline covariates/confounders to be adjusted. Prioritization of outcomes  Primary vs. secondary vs. exploratory outcomes (standard definitions)