Concepts in Biostatistics
Anne Eaton
Department of Epidemiology and Biostatistics
Memorial Sloan-Kettering Cancer Center
February 14, 2012
Outline of Talk
– Basic statistics concepts
– Types of variables
– Descriptive statistics
• Measures of location and dispersion
– Two variables
• Correlation between two variables
• Bivariate analysis (two-sample vs. paired)
– Multivariate analysis (multivariate normal regression, logistic regression)
– Survival analysis
– Clinical trial design
– Sample size
– Intent-to-treat analysis
– Missing data
Referenced Datasets
• I will be anchoring most of the concepts to two datasets
throughout the lectures
Dataset 1: Multiple myeloma patients (Krall, J. M. et al., Biometrics, 31, 49–57; 1975)
– 65 patients treated with alkylating agents
– variables: BUN, HGB, platelets, age, WBC, fractures, plasma cells in bone marrow, proteinuria, serum calcium, death status
Dataset 2: Metastatic renal cancer patients
– 789 first-line mRCC clinical trial patients at MSKCC
– selected variables: treatment, corrected calcium, HGB, year of treatment, LDH, KPS, death status
Basic Concepts
Descriptive Analysis
What is Statistics?
Descriptive Statistics: summarizing and presenting data using
numerical or graphical methods.
- What are the clinical characteristics of the 65 multiple
myeloma patients?
Inferential Statistics: making estimates, predictions or other
generalizations about the population.
- What can we say about the clinical characteristics of the
general population of multiple myeloma patients treated with
alkylating agents?
Statistical Inference and
Hypothesis Testing
Population, N: multiple myeloma patients treated with alkylating agents
- µ = avg. platelet count
- θ = prop. died w/in 1 yr
Sample, n: 65 patients
- x = avg. platelet count
- y = prop. died w/in 1 yr
We use the sample of n patients to make inference about the population by:
- estimating parameters
- testing hypotheses.
Variable Types (Distributions)
• Continuous (always numeric)
– Age, Tumor size
• Count
– # of lesions, # prior therapies, # of surgeries
• Categorical
– Nominal (special case is binary)
• responder vs. nonresponder, gender, treatment
• Special case: PFS, death
– Ordinal
• Age categories (20-30 yrs, 31-40 yrs, 41-50 yrs)
• Tumor size categories (small, medium, large)
• Comorbidity score (none, mild, moderate, severe)
Statistical method depends on distribution of the outcome as well as
the hypothesis of research interest and other methodological
issues.
Summarizing Data: Univariate Analysis
• Continuous variables
- Location parameters identify where most of the data points lie
  - Mean: average, affected by outliers
  - Median: value at which 50% of data points are higher and 50% are lower, less affected by outliers
  - Mode: the most frequently occurring value
- Dispersion parameters measure the variability (spread) of the data
  - Variance: approximately the average squared distance of the data points from the mean; a measure of how closely the values cluster together. Standard deviation is the square root of the variance.
  - Range: distance from the lowest to the highest value
Example: Descriptive statistics for HGB and BUN

Variable   N    Mean         Median       Mode         Minimum     Maximum      Variance    Std Dev
HGB        65   10.2015385   10.2000000   10.2000000   4.9000000   14.6000000   6.5410913   2.5575557
bun2       65    4.2432506    3.7516660    3.7516660   2.1775491    9.3511563   2.3514505   1.5334440
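The location and dispersion measures defined above can be computed with Python's standard library. This is a minimal sketch: the `hgb` values are hypothetical and are not the myeloma data, so the results will not match the table.

```python
import statistics

# Hypothetical hemoglobin values (g/dL); NOT the actual myeloma dataset.
hgb = [4.9, 8.0, 9.4, 10.2, 10.2, 11.0, 12.5, 14.6]

# Location
mean = statistics.mean(hgb)
median = statistics.median(hgb)
mode = statistics.mode(hgb)

# Dispersion
variance = statistics.variance(hgb)  # sample variance (divides by n - 1)
std_dev = statistics.stdev(hgb)      # square root of the sample variance
value_range = max(hgb) - min(hgb)

print(f"mean={mean:.2f} median={median:.2f} mode={mode:.2f}")
print(f"variance={variance:.2f} std_dev={std_dev:.2f} range={value_range:.2f}")
```

Replacing the largest value with an extreme outlier would move the mean but leave the median unchanged, which is why the median is preferred for skewed data.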
Summarizing Data: Univariate Analysis
• Count variables
- Mean, median, range
- E.g. Number of metastatic sites
  1 site: 98, 2 sites: 129, 3 sites: 104, >=4 sites: 442
  Mean: 5.2  Median: 6  Range: 1 to 9
• Categorical
- Total count of patients in each category, proportion
- E.g. Fractures at baseline (yes=1 or no=0)

Frac   Frequency   Percent   Cumulative Frequency   Cumulative Percent
0      16          24.62     16                     24.62
1      49          75.38     65                     100.00
Two Variables: Correlations and
Associations
Are two variables correlated/associated?
• Basic idea in many correlative studies is to examine whether
there is a relationship between two variables
-Are baseline values of HGB and LDH correlated in the mRCC
dataset?
-Is presence of a fracture associated with abnormal platelets at
diagnosis?
• Caution with terminology: predictor implies causation
while correlate implies association. Whether one can
determine causation depends on the experimental design.
• Identifying correlations may be the primary purpose, but it also helps in multivariate modeling: we don’t want two highly correlated variables in the same multivariate analysis.
Experimental Design
• Randomized study: patients are randomly assigned to a treatment arm. Can infer causation.
• Non-randomized study: a patient’s treatment may be influenced by any number of factors, including their health status at baseline, their personal preference, their doctor’s preference, and what treatment was available at the time and location they were being treated. Cannot infer causation.
• Randomization balances the groups, as long as the number of patients is large enough.
• In a non-randomized study, we can use multivariate models to adjust for outside factors and measure the effect of treatment “corrected for” the other differences between the groups
• However, the groups may still be unbalanced in ways we can’t measure
• Randomization is crucial for causal inference!
[Figure] From “Randomized Trial of Estrogen Plus Progestin for Secondary Prevention of Coronary Heart Disease in Postmenopausal Women”, JAMA 1998: 280(7)
Two Variables: Correlations and Associations
Example 1: HGB by LDH
[Figure: two panels, “HGB by LDH before transformation” and “HGB by LDH after transformation”]
Higher LDH values associated with lower HGB values.
Two Variables: Correlations and Associations
Example 2: Fracture by Platelets
Presence of fractures (1=yes, 0=no) by platelet count (1=normal, 0=abnormal)

Table of Frac by Platelet
(each cell: Frequency / Percent / Row Pct / Col Pct)

Frac    Platelet=0                  Platelet=1                   Total
0       1 / 1.54 / 6.25 / 11.11     15 / 23.08 / 93.75 / 26.79   16 (24.62)
1       8 / 12.31 / 16.33 / 88.89   41 / 63.08 / 83.67 / 73.21   49 (75.38)
Total   9 (13.85)                   56 (86.15)                   65 (100.00)

Among those who had a fracture, 8/49 had abnormal platelets, while among those who did not have a fracture, 1/16 had abnormal platelets (row percents).
[Figure: platelet status (N, AbN) by fracture status (None, Yes)]
Quantifying Association or Correlation
Bivariate Analysis
• Purpose is to examine the relationship between two variables,
(two covariates, an outcome with a covariate)
- Are two variables associated or independent?
• Concepts important in quantifying this relationship:
- Distributional assumptions
- Null hypothesis, Alternative hypothesis
- Test statistic
- P-value / confidence interval
- Type I error
- One sided vs two-sided tests
T-test: Comparing Two Means
- Is age associated with whether a patient presented with fractures at diagnosis in the myeloma dataset?
- A two-sided hypothesis test is given by:
H0: µf = µnf (null hypothesis)
H1: µf ≠ µnf (alternative hypothesis)
- Calculate the test statistic:
$$t = \frac{\bar{y}_f - \bar{y}_{nf}}{s/\sqrt{n}}$$
where the denominator is the standard error of the difference in means (for the pooled two-sample test, $s_p\sqrt{1/n_f + 1/n_{nf}}$)
- Reject null hypothesis if t < –tα/2 or t > tα/2
- tα/2 is the critical value and α (Type I error) is usually set at .05
- The two-sided p-value is P(|T| >= |t|)
Type I error
• Alpha: the probability of detecting a difference when a difference does not actually exist.
– Also called the Type I error rate
– Usually set at 5 or 10%
– ‘Detecting a difference under the null hypothesis’
T-test: Output from SAS

Statistics
Variable   Frac         N    Lower CL Mean   Mean     Upper CL Mean   Lower CL Std Dev   Std Dev   Upper CL Std Dev   Std Err
Age        0            16   56.31           62.313   68.315          8.3214             11.265    17.434             2.8162
Age        1            49   56.567          59.449   62.331          8.3671             10.033    12.535             1.4333
Age        Diff (1-2)        -3.086          2.8635   8.8131          8.8075             10.34     12.523             2.9772

T-Tests
Variable   Method          Variances   DF     t Value   Pr > |t|
Age        Pooled          Equal       63     0.96      0.3398
Age        Satterthwaite   Unequal     23.3   0.91      0.3741

Equality of Variances
Variable   Method     Num DF   Den DF   F Value   Pr > F
Age        Folded F   15       48       1.26      0.5270

No evidence that the variances differ (p = .53), so use the pooled t-test.
Conclusion: p-value = .34. No evidence of a difference; cannot reject the null.
[Figure: t distribution with the central ‘do not reject’ region marked]
T = .96, df = 63
Critical values are at approximately -2 and 2.
Since .96 is in the ‘do not reject’ region, we cannot conclude there is a difference in age by presence of fractures at diagnosis.
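The pooled t statistic can be recomputed from the summary statistics in the SAS output. The sketch below assumes the equal-variance (pooled) form of the test; the function name `pooled_t` is illustrative, and the critical value of 2 is the approximation quoted above rather than an exact quantile.

```python
import math

def pooled_t(mean1, sd1, n1, mean2, sd2, n2):
    """Return (t, df, standard error) for the equal-variance two-sample t-test."""
    df = n1 + n2 - 2
    # Pooled variance: weighted average of the two sample variances
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    std_err = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / std_err, df, std_err

# Age summary statistics from the SAS output (Frac = 0 vs. Frac = 1)
t, df, std_err = pooled_t(62.313, 11.265, 16, 59.449, 10.033, 49)

CRITICAL_VALUE = 2.0  # approximate two-sided 5% critical value for df = 63
reject_null = abs(t) > CRITICAL_VALUE

print(f"t = {t:.2f}, df = {df}, std err = {std_err:.4f}, reject H0: {reject_null}")
```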
Interpreting p-values
P-value: the probability of observing a result at least as extreme as the one observed, by chance alone, if the null hypothesis is true.
• If p-value is less than the α-level (typically 0.05) chosen prior to
the study, then the null hypothesis is rejected.
• Commonly misinterpreted as the probability that the null
hypothesis is true.
Chi-Square test: Output from SAS
H0: Fractures and platelets are independent.
Ha: Fractures and platelets are associated.

Table of Frac by Platelet
(each cell: Frequency / Percent / Row Pct / Col Pct)

Frac    Platelet=0                  Platelet=1                   Total
0       1 / 1.54 / 6.25 / 11.11     15 / 23.08 / 93.75 / 26.79   16 (24.62)
1       8 / 12.31 / 16.33 / 88.89   41 / 63.08 / 83.67 / 73.21   49 (75.38)
Total   9 (13.85)                   56 (86.15)                   65 (100.00)

Statistic                     DF   Value     Prob
Chi-Square                    1    1.0266    0.3109
Likelihood Ratio Chi-Square   1    1.1852    0.2763
Continuity Adj. Chi-Square    1    0.3557    0.5509
Mantel-Haenszel Chi-Square    1    1.0109    0.3147
Phi Coefficient                    -0.1257
Contingency Coefficient            0.1247
Cramer's V                         -0.1257
WARNING: 25% of the cells have expected counts less than 5. Chi-Square may not be a valid test.

Fisher's Exact Test
Cell (1,1) Frequency (F)   1
Left-sided Pr <= F         0.2900
Right-sided Pr >= F        0.9357
Table Probability (P)      0.2257
Two-sided Pr <= P          0.4326

P=.43, cannot reject the null. There is no evidence of a difference in the presence of fractures by platelet count status.
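The Pearson chi-square statistic and the two-sided Fisher's exact p-value reported by SAS can be reproduced from the four cell counts. This is a sketch using only the standard library; the two-sided rule used here (sum the probabilities of all tables no more likely than the observed one) is the one implied by the "Two-sided Pr <= P" label, and the variable names are illustrative.

```python
from math import comb

# Observed counts: rows = fracture (0 = none, 1 = yes),
# columns = platelets (0 = abnormal, 1 = normal)
table = [[1, 15],
         [8, 41]]

row_totals = [sum(row) for row in table]
col_totals = [sum(col) for col in zip(*table)]
n = sum(row_totals)

# Pearson chi-square: sum over cells of (observed - expected)^2 / expected
chi_square = 0.0
min_expected = float("inf")
for i, row in enumerate(table):
    for j, observed in enumerate(row):
        expected = row_totals[i] * col_totals[j] / n
        min_expected = min(min_expected, expected)
        chi_square += (observed - expected) ** 2 / expected

def table_probability(k):
    """Hypergeometric probability that cell (1,1) equals k, margins fixed."""
    return (comb(row_totals[0], k) * comb(row_totals[1], col_totals[0] - k)
            / comb(n, col_totals[0]))

# Fisher's exact test, two-sided: add up every table whose probability
# does not exceed that of the observed table.
p_observed = table_probability(table[0][0])
k_min = max(0, col_totals[0] - row_totals[1])
k_max = min(row_totals[0], col_totals[0])
fisher_two_sided = sum(
    p for p in (table_probability(k) for k in range(k_min, k_max + 1))
    if p <= p_observed * (1 + 1e-9)  # small tolerance for floating point
)

print(f"chi-square = {chi_square:.4f}, smallest expected count = {min_expected:.2f}")
print(f"table probability = {p_observed:.4f}, Fisher two-sided p = {fisher_two_sided:.4f}")
```

The smallest expected count falls below 5, which is what triggers the SAS warning and motivates reporting Fisher's exact test instead of the chi-square p-value.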
Note on p-values: Multiple testing
There is often a search for a ‘significant finding’, a p-value less
than .05. This search comes at a cost.
Since each test you do has a 5% chance of a “significant”
(p<0.05) finding by chance alone, the more tests you do, the
more likely you are to find a spurious association.
So instead of comparing each p-value to 0.05, we use a stricter cutoff. This ensures that the family-wise error rate (the probability of any significant finding given there are no true associations) is at most alpha=0.05.
The Bonferroni adjustment is a common method. You compare each p-value to (alpha)/K where K is the number of tests you are doing.
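A short calculation shows how quickly the family-wise error rate grows and what the Bonferroni cutoff does to it. The sketch assumes the K tests are independent and that every null hypothesis is true; K = 10 is a hypothetical choice.

```python
def family_wise_error_rate(alpha, k):
    """P(at least one false positive) across k independent tests, all nulls true."""
    return 1 - (1 - alpha) ** k

ALPHA = 0.05
K = 10  # hypothetical number of tests

unadjusted = family_wise_error_rate(ALPHA, K)           # each test at 0.05
bonferroni_cutoff = ALPHA / K                           # stricter per-test cutoff
adjusted = family_wise_error_rate(bonferroni_cutoff, K)

print(f"FWER with cutoff {ALPHA}: {unadjusted:.3f}")
print(f"FWER with cutoff {bonferroni_cutoff}: {adjusted:.3f}")
```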
Summary of Commonly Used Tests

                                        Difference, two independent samples         Difference, paired data                     Difference between three or more independent samples
                                        (e.g. two arms of a trial)                  (e.g. before and after on same patient)     (e.g. three arm trial)
Binary or nominal variables             Pearson's Chi-Square, Fisher's Exact test   McNemar's test                              Pearson's Chi-Square
Quantitative, normality assumed         Two sample T-test                           Paired T-test                               ANOVA (Analysis of Variance)
Non-normal data, non-parametric tests   Mann-Whitney                                Wilcoxon signed rank                        Kruskal-Wallis

Important Notes:
• This is not an exhaustive list; many variations and areas are beyond the scope of this talk
- Depends on your research question and data
• There will be times when your research question will require analysis that is not listed above (e.g. survival analysis, repeated measures, longitudinal data, cluster analysis, inter-rater agreement, factor analysis, ROCs)
Multivariate Analysis
Multivariate Analysis
• Interested in more than one covariate
– Simultaneous effect of 2 or 3 covariates on the
outcome
– Effect of one covariate, adjusted for others (e.g.
confounding variables)
– Want to include interaction
• Continuous outcome: multivariate normal
regression
• Binary outcome: logistic regression
Multivariate Normal Analysis
• Outcome is normal (continuous)
• Covariates can be continuous or categorical
• Simple linear regression models a linear relationship (association) between the outcome and a single covariate.
• Multivariate normal regression (multiple linear regression) models the relationship between the outcome and several covariates.
• A sample interpretation might be: after adjusting for saturated fat in diet, a one-year increase in age was associated with a 0.1-mg/dL increase in cholesterol.
Logistic Regression
• Outcome is binary (0 vs 1)
• Covariates can be continuous or categorical
• Parameter coefficients have a useful interpretation:
log odds ratios
• A sample interpretation might be, after adjusting for
age, patients with a stage 2 tumor had twice the odds
of being treated with chemotherapy compared to
patients with a stage 1 tumor
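To make the log odds ratio interpretation concrete, the sketch below exponentiates a coefficient to get an odds ratio and converts log odds to probabilities. Both numbers are hypothetical: `beta_stage2` is chosen so the odds ratio is 2, and the baseline odds of 1:4 for stage 1 patients is invented for illustration.

```python
import math

# Hypothetical fitted coefficient for "stage 2 vs. stage 1" (a log odds ratio)
beta_stage2 = math.log(2.0)
odds_ratio = math.exp(beta_stage2)  # exponentiate to get the odds ratio

def probability_from_log_odds(log_odds):
    """Inverse of the logit: convert log odds to a probability."""
    return 1 / (1 + math.exp(-log_odds))

# Hypothetical baseline: odds of chemotherapy for a stage 1 patient are 1:4
baseline_log_odds = math.log(0.25)
p_stage1 = probability_from_log_odds(baseline_log_odds)
p_stage2 = probability_from_log_odds(baseline_log_odds + beta_stage2)

print(f"odds ratio = {odds_ratio:.2f}")
print(f"P(chemo | stage 1) = {p_stage1:.3f}, P(chemo | stage 2) = {p_stage2:.3f}")
```

Doubling the odds moves the probability from 0.20 to about 0.33, not to 0.40: an odds ratio of 2 is not the same as twice the probability.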
Interpretation of a multivariate model
If a covariate is significant in a multivariate model we
can say, “After adjusting for X, Y and Z, A has a
significant effect on B” or, “A is independently
associated with B.”
The number of variables you can adjust for is limited by your sample size. As a rule of thumb for linear regression, you need about 10-15 patients per variable.
Survival Analysis
Survival Analysis
- Survival analysis is a group of statistical methods
designed to analyze time to an event.
- Examples of events could be:
- Recurrence or progression
- Death or death due to disease
- Disease onset (AIDS in HIV patients)
Two Common Goals of Survival Analysis
1) Evaluate time to event (descriptive)
-What is the median survival time from diagnosis
among patients in the multiple myeloma dataset?
2) Examine effect of certain factors (e.g.
clinicopathologic variables, biomarkers) on the time
to event.
- What are important prognostic factors for
survival in the multiple myeloma patient dataset?
Why we need survival analysis methods
• Able to account for censoring
– Subject does not experience event of interest
– Incomplete follow-up
• Lost to followup
• Withdrawal
• Death (when death is not the event of interest)
Example of Right Censoring
Data
• When the clock starts
– E.g. Diagnosis date, end of therapy
• Did the patient experience the event? (binary)
– E.g. Death, death due to disease, progression, infection
• Last date of follow-up, Date of event
• Covariates
– Assessed at or before the clock starts
– Assessed after the clock starts (adds complexity to
analysis)
Kaplan Meier Estimates with 95% CI
Overall Survival, Kidney Cancer example
• Number at risk decreases over
time
• Tick marks represent when a
patient was censored
• Drops in the curve represent when a patient experiences the event.
• CI gets wider at the end of the
curve (number at risk is small)
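The behavior described in these bullets follows from how the Kaplan-Meier estimate is built: at each event time the survival probability is multiplied by (1 - events / number at risk), while censored patients only reduce the number at risk. Here is a minimal sketch with hypothetical follow-up data (not the kidney cancer dataset); `kaplan_meier` is an illustrative helper, and confidence intervals are omitted.

```python
def kaplan_meier(times, events):
    """Return [(time, survival)] at each distinct event time.

    events[i] is 1 if the event was observed at times[i], 0 if censored.
    """
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(set(times)):
        n_events = sum(1 for ti, ei in zip(times, events) if ti == t and ei == 1)
        n_leaving = sum(1 for ti in times if ti == t)  # events plus censored
        if n_events > 0:
            # The curve drops only when an event occurs
            survival *= 1 - n_events / at_risk
            curve.append((t, survival))
        # Censored patients leave the risk set without causing a drop
        at_risk -= n_leaving
    return curve

# Hypothetical follow-up times in months (1 = event, 0 = censored)
times = [2, 3, 3, 5, 8, 8, 9, 12]
events = [1, 1, 0, 1, 0, 1, 1, 0]

curve = kaplan_meier(times, events)
for t, s in curve:
    print(f"month {t}: S(t) = {s:.3f}")
```

Later drops are larger because each event is divided by a smaller number at risk, which is also why the confidence interval widens toward the end of the curve.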
Log-rank test to compare survival curves for 2 or more groups
[Figure: survival curves by platelets (Red = Normal, Blue = Abnormal); x-axis: Months; log-rank p-value = .07]
[Figure: survival curves by fractures (Red = yes, Blue = none); x-axis: Months; log-rank p-value = .33]
Clinical trial design
• As opposed to observational studies, clinical
trials involve an intervention that’s assigned
by the investigator
• Clinical trials are highly regulated to make
sure approved drugs are safe and effective
• Endpoint and alpha must be specified
beforehand
Phase I trial
• First-in-humans trial
• Goal is to determine the MTD (maximum tolerated dose)
• You have to define beforehand what is
considered a dose-limiting toxicity
• Standard design is called 3+3; patients are
enrolled in cohorts of size 3
Phase I trial
True risk of toxicity       .10   .20   .30   .40   .50
Probability of escalation   .91   .71   .49   .31   .17
This design has the property that the more toxic a drug is, the less likely the dose will be escalated for the next cohort of patients.
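The escalation probabilities in the table can be reproduced with a short calculation. The slides do not spell out the escalation rule, so this sketch assumes the usual 3+3 rule: escalate if 0 of 3 patients have a dose-limiting toxicity (DLT), or if exactly 1 of 3 does and none of 3 additional patients at the same dose do.

```python
def prob_escalate(p):
    """Probability of escalating from a dose whose true DLT risk is p,
    under the assumed standard 3+3 rule."""
    no_dlt_in_3 = (1 - p) ** 3            # 0 of 3 patients with a DLT
    one_dlt_in_3 = 3 * p * (1 - p) ** 2   # exactly 1 of 3 with a DLT
    # Escalate right away, or expand to 3 more patients and see no DLT
    return no_dlt_in_3 + one_dlt_in_3 * no_dlt_in_3

risks = [0.10, 0.20, 0.30, 0.40, 0.50]
escalation = [round(prob_escalate(p), 2) for p in risks]

for p, e in zip(risks, escalation):
    print(f"true risk {p:.2f} -> probability of escalation {e:.2f}")
```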
Phase II trial
• This trial looks at a drug’s efficacy
• Endpoint is often response rate
• Other possible endpoints are survival or
progression-free survival
• Trial may be randomized if no good historical
data is available for comparison
• Simon’s two stage design is common
– Endpoint is response rate
– Allows for early stopping if drug isn’t promising
Phase II trial
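Sample size justification
The sample size needed for a T-test is:
$$n = \frac{(z_{1-\alpha/2} + z_{1-\beta})^2 \, \sigma^2}{(\mu_1 - \mu_0)^2}$$
(for a two-sample comparison, the size of each group carries an additional factor of 2)
- $z_{1-\alpha/2}$ is a function of α that gets bigger as α gets smaller
- $z_{1-\beta}$ is a function of β that gets bigger as β gets smaller
- $\mu_1 - \mu_0$ is the difference between the group means
- $\sigma^2$ is the variance of the variable you are measuring (dispersion)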
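As a rough illustration of how these quantities combine, the sketch below evaluates the normal-approximation formula n = (z_{1-α/2} + z_{1-β})² σ² / (µ1 − µ0)². The inputs (σ = 1, difference = 0.5) are hypothetical, and the `design_factor` argument is an assumption added here so the same function covers the per-group size of a two-sample comparison (factor 2).

```python
import math
from statistics import NormalDist

def sample_size(alpha, power, sigma, delta, design_factor=1):
    """Normal-approximation sample size for a two-sided test.

    design_factor = 1 gives the one-sample form; 2 gives the size of each
    group in a two-sample comparison with equal group sizes.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)  # power = 1 - beta
    n = design_factor * (z_alpha + z_beta) ** 2 * sigma ** 2 / delta ** 2
    return math.ceil(n)

# Hypothetical inputs: alpha = 0.05, 80% power, sigma = 1, difference = 0.5
n_one_sample = sample_size(0.05, 0.80, sigma=1.0, delta=0.5)
n_per_group = sample_size(0.05, 0.80, sigma=1.0, delta=0.5, design_factor=2)
n_higher_power = sample_size(0.05, 0.90, sigma=1.0, delta=0.5)

print(n_one_sample, n_per_group, n_higher_power)
```

Raising the power, shrinking the detectable difference, or increasing σ each increases the required n.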
Power and Alpha
• Power: the probability of detecting a difference, given that the difference actually exists (usually 80-90%).
– Type II error rate (β) = 1 - Power
• Alpha: the probability of detecting a difference when a difference does not actually exist (usually 5-10%).
– Also called the Type I error rate
Power, alpha and sample size are
all related
• There aren’t simple formulas for other tests but the
general patterns are the same
• Lower error rate -> larger sample size
• Smaller detectable difference -> larger sample size
• More variability -> larger sample size
• A significant finding may not be scientifically
significant or clinically significant. Large sample
sizes have power to detect even small differences,
differences that may not be useful clinically.
Power tables
For a proportion, the variability gets bigger as
the true proportion increases from 0.1 to 0.5.
From MSK protocol 10-115, Association of smoking, lung
inflammation and lung metastases from breast cancer
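The statement about variability follows from the variance of a sample proportion, p(1 − p)/n, which is largest at p = 0.5. A small sketch, with a hypothetical sample size of 100 (the values in the protocol's power table are not reproduced here):

```python
import math

def proportion_std_error(p, n):
    """Standard error of a sample proportion: sqrt(p * (1 - p) / n)."""
    return math.sqrt(p * (1 - p) / n)

N = 100  # hypothetical sample size
proportions = [0.1, 0.2, 0.3, 0.4, 0.5]
std_errors = [proportion_std_error(p, N) for p in proportions]

for p, se in zip(proportions, std_errors):
    print(f"p = {p:.1f}: standard error = {se:.4f}")
```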
Phase III trial
• This is the definitive trial that shows a drug is
superior to an older drug or whatever’s the
standard of care
• Large sample size (thousands) and
randomized
• Often blinded
Intent-to-treat analysis
• Randomization only works if you analyze the data “as
randomized” (also known as intent-to-treat)
• If analysis is not done in this way p-value can’t be
trusted
• The patients who deviate from the protocol may be
different from those who remain on protocol
• It’s good to randomize as late as possible so you
minimize the number of patients who are randomized
but don’t complete therapy or assessments
Evaluable patients
• For non-randomized studies, the protocol
should specify at what point a patient will be
considered “evaluable”
• If we can’t ascertain the outcome on an evaluable patient, we
have to assume the worst in order to be conservative and
control type I error
Missing data
• Complete case analysis looks at just the
patients whose data is complete.
• Are the data missing at random?
• The less the better, but if more than 10% of
data is missing for a certain covariate,
reviewers may be skeptical.
Thank you

More Related Content

PPT
Session1b.ppt
PPT
25_Anderson_Biostatistics_and_Epidemiology.ppt
PPT
Statistics basics for oncologist kiran
PDF
INFERENTIAL STATISTICS.pdf
PDF
inferentialstatistics-210411214248.pdf
PPTX
Inferential statistics
PPTX
Company Induction process and Onboarding
PPTX
Bio-Statistics in Bio-Medical research
Session1b.ppt
25_Anderson_Biostatistics_and_Epidemiology.ppt
Statistics basics for oncologist kiran
INFERENTIAL STATISTICS.pdf
inferentialstatistics-210411214248.pdf
Inferential statistics
Company Induction process and Onboarding
Bio-Statistics in Bio-Medical research

Similar to Concepts in Biostatistics Presentation.pdf (20)

PPTX
Statistics.pptx
PDF
PPTX
univariate and bivariate analysis in spss
PPTX
Statistical methods for research scholars (cd)
PPTX
Test of significance
PPT
Categorical data analysis which part of the generalized linear model
PPT
23-Statistical_tests_(chi-square,Fishers___&Macnemars))(UG1435-36).ppt
PDF
Biostatistics clinical research & trials
PDF
Choosing appropriate statistical test RSS6 2104
PPT
PARAMETRIC TEST in Public health dentistry.ppt
PPT
Biostatistics
PPTX
Introduction to biostatistics
PPTX
Inferential statistics nominal data
PPT
Biostatics introduction
PPTX
Basic of Biostatistics The Second Part.pptx
PPTX
Biostatistics.pptx
PPTX
#1 Seminar - Biostatistics in public health dentistry.pptx
PPTX
FORTUNE EFFIONG_COMPARATIVE ANALYSIS.pptx
PPT
Categorical-data-afghvvghgfhg.analysis.ppt
PPTX
TEST OF SIGNIFICANCE.pptx
Statistics.pptx
univariate and bivariate analysis in spss
Statistical methods for research scholars (cd)
Test of significance
Categorical data analysis which part of the generalized linear model
23-Statistical_tests_(chi-square,Fishers___&Macnemars))(UG1435-36).ppt
Biostatistics clinical research & trials
Choosing appropriate statistical test RSS6 2104
PARAMETRIC TEST in Public health dentistry.ppt
Biostatistics
Introduction to biostatistics
Inferential statistics nominal data
Biostatics introduction
Basic of Biostatistics The Second Part.pptx
Biostatistics.pptx
#1 Seminar - Biostatistics in public health dentistry.pptx
FORTUNE EFFIONG_COMPARATIVE ANALYSIS.pptx
Categorical-data-afghvvghgfhg.analysis.ppt
TEST OF SIGNIFICANCE.pptx
Ad

Recently uploaded (20)

PDF
Origin of periodic table-Mendeleev’s Periodic-Modern Periodic table
PPTX
Pharmacology of Heart Failure /Pharmacotherapy of CHF
PDF
Classroom Observation Tools for Teachers
PDF
Supply Chain Operations Speaking Notes -ICLT Program
PPTX
The Healthy Child – Unit II | Child Health Nursing I | B.Sc Nursing 5th Semester
PDF
2.FourierTransform-ShortQuestionswithAnswers.pdf
PPTX
Institutional Correction lecture only . . .
PPTX
PPT- ENG7_QUARTER1_LESSON1_WEEK1. IMAGERY -DESCRIPTIONS pptx.pptx
PPTX
Cell Types and Its function , kingdom of life
PDF
Saundersa Comprehensive Review for the NCLEX-RN Examination.pdf
PDF
BÀI TẬP BỔ TRỢ 4 KỸ NĂNG TIẾNG ANH 9 GLOBAL SUCCESS - CẢ NĂM - BÁM SÁT FORM Đ...
PPTX
Cell Structure & Organelles in detailed.
PDF
Basic Mud Logging Guide for educational purpose
PDF
Complications of Minimal Access Surgery at WLH
PDF
O5-L3 Freight Transport Ops (International) V1.pdf
PDF
TR - Agricultural Crops Production NC III.pdf
PPTX
Introduction to Child Health Nursing – Unit I | Child Health Nursing I | B.Sc...
PDF
01-Introduction-to-Information-Management.pdf
PDF
Microbial disease of the cardiovascular and lymphatic systems
PDF
Physiotherapy_for_Respiratory_and_Cardiac_Problems WEBBER.pdf
Origin of periodic table-Mendeleev’s Periodic-Modern Periodic table
Pharmacology of Heart Failure /Pharmacotherapy of CHF
Classroom Observation Tools for Teachers
Supply Chain Operations Speaking Notes -ICLT Program
The Healthy Child – Unit II | Child Health Nursing I | B.Sc Nursing 5th Semester
2.FourierTransform-ShortQuestionswithAnswers.pdf
Institutional Correction lecture only . . .
PPT- ENG7_QUARTER1_LESSON1_WEEK1. IMAGERY -DESCRIPTIONS pptx.pptx
Cell Types and Its function , kingdom of life
Saundersa Comprehensive Review for the NCLEX-RN Examination.pdf
BÀI TẬP BỔ TRỢ 4 KỸ NĂNG TIẾNG ANH 9 GLOBAL SUCCESS - CẢ NĂM - BÁM SÁT FORM Đ...
Cell Structure & Organelles in detailed.
Basic Mud Logging Guide for educational purpose
Complications of Minimal Access Surgery at WLH
O5-L3 Freight Transport Ops (International) V1.pdf
TR - Agricultural Crops Production NC III.pdf
Introduction to Child Health Nursing – Unit I | Child Health Nursing I | B.Sc...
01-Introduction-to-Information-Management.pdf
Microbial disease of the cardiovascular and lymphatic systems
Physiotherapy_for_Respiratory_and_Cardiac_Problems WEBBER.pdf
Ad

Concepts in Biostatistics Presentation.pdf

  • 1. Concepts in Biostatistics Anne Eaton Department of Epidemiology and Biostatistics Memorial Sloan-Kettering Cancer Center February 14, 2012
  • 2. Outline of Talk – Basic statistics concepts – Types of variables – Descriptive statistics • Measures of location and dispersion – Two variables • Correlation between two variables • Bivariate analysis (two-sample vs. paired) – Multivariate analysis (multivariate normal regression, logistic regression) – Survival analysis – Clinical trial design – Sample size – Intent-to-treat analysis – Missing data
  • 3. Referenced Datasets • I will be anchoring most of the concepts to two datasets throughout the lectures Dataset 1: Multiple myeloma patients (Krall, J. M. et al, Biometrics, 31, 49–57; 1975.)  65 patients treated with alkylating agents  variables: BUN, HGB, platelets, age, WBC, fractures, plasma cells in bone marrow, proteinuria, serum calcium, death status Dataset 2: Metastatic renal cancer patients  789 first-line mRCC clinical trial patients at MSKCC  selected variables: treatment, corrected calcium, HGB, year of trt, LDH, KPS, death status
  • 5. What is Statistics? Descriptive Statistics: summarizing and presenting data using numerical or graphical methods. - What are the clinical characteristics of the 65 multiple myeloma patients? Inferential Statistics: making estimates, predictions or other generalizations about the population. - What can we say about the clinical characteristics of the general population of multiple myeloma patients treated with alkylating agents?
  • 6. Statistical Inference and Hypothesis Testing Population, N Sample, n Multiple myeloma patients trted with alkylating agents =avg. platelet count =prop. died w/in 1 yr x y We use the sample of n patients to make inference about the population by: - estimating parameters - testing hypotheses. =avg. platelet count =prop. died w/in 1 yr µ θ 65 patients
  • 7. Variable Types (Distributions) • Continuous (always numeric) – Age, Tumor size • Count – # of lesions, # prior therapies, # of surgeries • Categorical – Nominal (special case is binary) • responder vs. nonresponder, gender, treatment • Special case: PFS, death – Ordinal • Age categories (20-30 yrs, 31-40 yrs, 41-50 yrs) • Tumor size categories (small, medium, large) • Comorbidity score (none, mild, moderate, severe) Statistical method depends on distribution of the outcome as well as the hypothesis of research interest and other methodological issues.
  • 8. Summarizing Data: Univariate Analysis •Continuous variables -Location parameters identify the location where most of the datapoints lie -Mean: average, affected by outliers -Median: value at which 50% of data points are higher and 50% are lower, not so affected by outliers -Mode: value with the most datapoints -Dispersion parameters measure the variability, spread, dispersion, variation of the data. -Variance: approximately, the squared average distance from the mean of all the data points, measure of how close the values cluster together. Standard deviation is the squareroot of the variance. - Range: distance from lower to highest value
  • 9. Variable N Mean Median Mode Minimum Maximum Variance Std Dev HGB bun2 65 65 10.2015385 4.2432506 10.2000000 3.7516660 10.2000000 3.7516660 4.9000000 2.1775491 14.6000000 9.3511563 6.5410913 2.3514505 2.5575557 1.5334440 Example: Descriptive statistics for HGB and BUN
  • 10. Summarizing Data: Univariate Analysis •Count variables - Mean, Median, range -E.g. Number of metastatic sites 1 site: 98 , 2 sites: 129, 3 sites: 104, >=4 sites: 442 Mean: 5.2 Median: 6 Range: 1 to 9 •Categorical -Total count of patients by each category, Proportion -E.g. Fractures at baseline (yes=1 or no=0) Frac Frequency Percent Cumulative Frequency Cumulative Percent 0 16 24.62 16 24.62 1 49 75.38 65 100.00
  • 11. Two Variables: Correlations and Associations
  • 12. Are two variables correlated/associated? • Basic idea in many correlative studies is to examine whether there is a relationship between two variables -Are baseline values of HGB and LDH correlated in the mRCC dataset? -Is presence of a fracture associated with abnormal platelets at diagnosis? • Caution with terminology: predictor implies causation while correlate implies association. Whether one can determine causation depends on the experimental design. • Identifying correlations may be of primary purpose, but also will help in multivariate modeling---don’t want to have two highly correlated variables in a multivariate analysis.
  • 13. Experimental Design • Randomized study: patients are randomly assigned to a treatment arm. Can infer causation. Non-randomized study: a patient’s treatment may be influenced by any number of factors including their health status at baseline, their personal preference, their doctor’s preference, what treatment was available at the time and location they were being treated. Cannot infer causation. Randomization balances the groups, as long as the number of patients is large enough. • In a non-randomized study, we can use multivariate models to adjust for outside factors and measure the effect of treatment “corrected for” the other differences between the groups • However, the groups may still be unbalanced in ways we can’t measure • Randomization is crucial for inference!
  • 14. From “Randomized Trial of Estrogen Plus Progestin for Secondary Prevention of Coronary Heart Disease in Postmenopausal Women” JAMA 1998: 280(7)
  • 15. HGB by LDH before transformation HGB by LDH after transformation Higher LDH values associated with lower HGB values. Two Variables: Correlations and Associations Example 1: HGB by LDH
  • 16. Table of Frac by Platelet Frac Platelet Total Frequency Percent Row Pct Col Pct 0 1 0 1 1.54 6.25 11.11 15 23.08 93.75 26.79 16 24.62 1 8 12.31 16.33 88.89 41 63.08 83.67 73.21 49 75.38 Total 9 13.85 56 86.15 65 100.00 Presence of fractures (1=yes, 0=no) by Platelet count (1=normal, 0=abnormal) Among those who had a fracture, 8/49 had abnormal platelets, while among those who did not have a fracture, 1/16 had abnormal platelets. (row percents) Two Variables: Correlations and Associations Example 2: Fracture by Platelets N AbN None Yes
  • 17. Quantifying Association or Correlation Bivariate Analysis • Purpose is to examine the relationship between two variables, (two covariates, an outcome with a covariate) - Are two variables associated or independent? • Concepts important in quantifying this relationship: - Distributional assumptions - Null hypothesis, Alternative hypothesis - Test statistic - P-value / confidence interval - Type I error - One sided vs two-sided tests
  • 18. - Is age associated with whether a patient presented with fractures at diagnosis in the myeloma dataset? - A two-sided hypothesis test is given by: H0: µf = µnf (null hypothesis) H1: µf ≠ µnf (alternative hypothesis) -Calculate the test statistic: -Reject null hypothesis if t < –tα/2 or t > tα/2 - tα is the critical value and α (Type I error) is usually set at .05 -The p-value is p(T>=t) T-test: Comparing Two Means / f nf y y t s n − =
  • 19. Type I error • Alpha: detecting a difference when a difference does not actually exist. – Also called Type I error – Usually set at 5 or 10% – ‘Detecting a difference under the null hypothesis’
  • 20. Statistics Variable Frac N Lower CL Mean Mean Upper CL Mean Lower CL Std Dev Std Dev Upper CL Std Dev Std Err Age 0 16 56.31 62.313 68.315 8.3214 11.265 17.434 2.8162 Age 1 49 56.567 59.449 62.331 8.3671 10.033 12.535 1.4333 Age Diff (1-2) -3.086 2.8635 8.8131 8.8075 10.34 12.523 2.9772 T-Tests Variable Method Variances DF t Value Pr > |t| Age Pooled Equal 63 0.96 0.3398 Age Satterthwaite Unequal 23.3 0.91 0.3741 Equality of Variances Variable Method Num DF Den DF F Value Pr > F Age Folded F 15 48 1.26 0.5270 Variances are equal, use pooled t-test Conclusion: p-value = .34. No difference, cannot reject null T-test: Output from SAS
  • 21. Do not reject Do not reject T= .96, df=63 Critical values are at -2 and 2 approximately. Since .96 is in the ‘do not reject region’, we cannot conclude there is a difference in age by presence of fractures at diagnosis.
  • 22. Interpreting p-values P-value: the probability that an observed result is due to chance alone if the null hypothesis is true. • If p-value is less than the α-level (typically 0.05) chosen prior to the study, then the null hypothesis is rejected. • Commonly misinterpreted as the probability that the null hypothesis is true.
  • 23. Table of Frac by Platelet Frac Platelet Total Frequency Percent Row Pct Col Pct 0 1 0 1 1.54 6.25 11.11 15 23.08 93.75 26.79 16 24.62 1 8 12.31 16.33 88.89 41 63.08 83.67 73.21 49 75.38 Total 9 13.85 56 86.15 65 100.00 N AbN None Yes Statistic DF Value Prob Chi-Square 1 1.0266 0.3109 Likelihood Ratio Chi-Square 1 1.1852 0.2763 Continuity Adj. Chi-Square 1 0.3557 0.5509 Mantel-Haenszel Chi-Square 1 1.0109 0.3147 Phi Coefficient -0.1257 Contingency Coefficient 0.1247 Cramer's V -0.1257 WARNING: 25% of the cells have expected counts less than 5. Chi-Square may not be a valid test. Fisher's Exact Test Cell (1,1) Frequency (F) 1 Left-sided Pr <= F 0.2900 Right-sided Pr >= F 0.9357 Table Probability (P) 0.2257 Two-sided Pr <= P 0.4326 P=.43, cannot reject the null. Conclude there is no difference in the presence of fractions by platelet count status. Chi-Square test: Output from SAS H0: Fractures and platelets are independent. Ha: Fractures and platelets are associated.
  • 24. Note on p-values: Multiple testing There is often a search for a ‘significant finding’, a p-value less than .05. This search comes at a cost. Since each test you do has a 5% chance of a “significant” (p<0.05) finding by chance alone, the more tests you do, the more likely you are to find a spurious association. So instead of comparing each p-value to 0.05, we use a more strict cutoff. This ensures that the family-wise error rate (the probability of any significant finding given there are no true associations) is less than alpha=0.05. The Bonferroni adjustment is the most common method. You compare each p-value to (alpha)/K where K is the number of tests you are doing.
  • 25. Summary of Commonly Used Tests
    Data type | Difference, two independent samples (e.g. two arms of a trial) | Difference, paired data (e.g. before and after on same patient) | Difference between three or more independent samples (e.g. three-arm trial)
    Binary or nominal variables | Pearson's Chi-Square, Fisher's Exact test | McNemar's test | Pearson's Chi-Square
    Quantitative, normality assumed | Two-sample T-test | Paired T-test | ANOVA (Analysis of Variance)
    Non-normal data, non-parametric tests | Mann-Whitney | Wilcoxon signed rank | Kruskal-Wallis
    Important Notes:
    • This is not an exhaustive list; there are many variations and areas beyond the scope of this talk - Depends on your research question and data
    • There will be times when your research question will require analysis that is not listed above (e.g. survival analysis, repeated measures, longitudinal data, cluster analysis, inter-rater agreement, factor analysis, ROCs)
  • 27. Multivariate Analysis • Interested in more than one covariate – Simultaneous effect of 2 or 3 covariates on the outcome – Effect of one covariate, adjusted for others (e.g. confounding variables) – Want to include interaction • Continuous outcome: multivariate normal regression • Binary outcome: logistic regression
  • 28. Multivariate Normal Analysis • Outcome is normal (continuous) • Covariates can be normal or categorical • Simple linear regression models a linear relationship (association) between the outcome and a single covariate. • Multivariate normal regression models the relationship between the outcome and several covariates. • A sample interpretation might be, after adjusting for saturated fat in diet, a one-year increase in age was associated with a 0.1-mg/dL increase in cholesterol
  • 29. Logistic Regression • Outcome is binary (0 vs 1) • Covariates can be normal or categorical • Parameter coefficients have a useful interpretation: log odds ratios • A sample interpretation might be, after adjusting for age, patients with a stage 2 tumor had twice the odds of being treated with chemotherapy compared to patients with a stage 1 tumor
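The link between a logistic regression coefficient and an odds ratio can be shown with a few lines of arithmetic. This is a sketch with hypothetical treatment probabilities chosen so that the odds ratio is 2, matching the sample interpretation; the numbers do not come from either dataset.

```python
from math import exp, log


def odds(p):
    """Convert a probability into odds."""
    return p / (1 - p)


# Hypothetical probabilities of receiving chemotherapy, by tumor stage.
p_stage1 = 0.20    # odds = 0.25
p_stage2 = 1 / 3   # odds = 0.50

odds_ratio = odds(p_stage2) / odds(p_stage1)

# A logistic regression coefficient for stage is the log of this odds ratio,
# so exponentiating the coefficient recovers the odds ratio.
coefficient = log(odds_ratio)
recovered = exp(coefficient)

print(f"odds ratio = {odds_ratio:.2f}, coefficient = {coefficient:.3f}")
```

Twice the odds is not twice the probability: here the probability rises from 0.20 to about 0.33.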
  • 30. Interpretation of a multivariate model If a covariate is significant in a multivariate model we can say, “After adjusting for X, Y and Z, A has a significant effect on B” or, “A is independently associated with B.” The number of variables you can adjust for is limited by your sample size. For linear regression, a common rule of thumb is 10-15 patients per variable.
  • 32. Survival Analysis - Survival analysis is a group of statistical methods designed to analyze time to an event. - Examples of events could be: - Recurrence or progression - Death or death due to disease - Disease onset (AIDS in HIV patients)
  • 33. Two Common Goals of Survival Analysis 1) Evaluate time to event (descriptive) -What is the median survival time from diagnosis among patients in the multiple myeloma dataset? 2) Examine effect of certain factors (e.g. clinicopathologic variables, biomarkers) on the time to event. - What are important prognostic factors for survival in the multiple myeloma patient dataset?
  • 34. Why we need survival analysis methods • Able to account for censoring – Subject does not experience event of interest – Incomplete follow-up • Lost to follow-up • Withdrawal • Death (Figure: Example of right censoring)
  • 35. Data • When the clock starts – E.g. Diagnosis date, end of therapy • Did the patient experience the event? (binary) – E.g. Death, death due to disease, progression, infection • Last date of follow-up, Date of event • Covariates – Assessed at or before the clock starts – Assessed after the clock starts (adds complexity to analysis)
  • 36. Kaplan-Meier Estimates with 95% CI Overall Survival, Kidney Cancer example • Number at risk decreases over time • Tick marks represent when a patient was censored • Drops in the curve represent when a patient experiences the event. • CI gets wider at the end of the curve (number at risk is small)
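The mechanics behind the curve can be sketched in a few lines: at each time where an event occurs, the running survival probability is multiplied by (1 - events / number at risk), and censored patients simply leave the risk set. The data below are a made-up toy example, not the kidney cancer dataset, and confidence intervals are omitted.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier estimate.

    times  : follow-up time for each patient
    events : 1 if the event was observed at that time, 0 if censored
    Returns a list of (event_time, survival_probability) pairs.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        # Gather everyone whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            # The curve drops only at times where an event occurs.
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        # Events and censored patients both leave the risk set.
        at_risk -= removed
    return curve


# Toy data: five patients, follow-up in months, two of them censored.
curve = kaplan_meier([1, 2, 2, 3, 4], [1, 1, 0, 1, 0])
for month, s in curve:
    print(f"month {month}: S(t) = {s:.2f}")
```

With only two patients still at risk at month 3, a single event halves the estimate, which is the same reason the confidence interval widens at the end of the curve.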
  • 37. Log-rank test to compare survival curves for 2 or more groups (Figure, two panels, x-axis: Months) Platelets: Red = Normal, Blue = Abnormal; log-rank p-value=.07. Fractures: Red = yes, Blue = none; log-rank p-value=.33.
  • 38. Clinical trial design • As opposed to observational studies, clinical trials involve an intervention that’s assigned by the investigator • Clinical trials are highly regulated to make sure approved drugs are safe and effective • Endpoint and alpha must be specified beforehand
  • 39. Phase I trial • First-in-humans trial • Goal is to determine the maximum tolerated dose (MTD) • You have to define beforehand what is considered a dose-limiting toxicity • Standard design is called 3+3; patients are enrolled in cohorts of size 3
  • 41. Phase I trial
    True risk of toxicity     | .10 | .20 | .30 | .40 | .50
    Probability of escalation | .91 | .71 | .49 | .31 | .17
    This design has the property that the more toxic a drug is, the less likely the dose will be escalated for the next cohort of patients.
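The escalation probabilities in this table can be reproduced with a short calculation. The sketch assumes the usual 3+3 rule, which is not spelled out in this part of the transcript: escalate if 0 of 3 patients have a dose-limiting toxicity (DLT), or if exactly 1 of 3 does and none of 3 additional patients at the same dose do.

```python
def prob_escalate(p):
    """Probability of escalating from a dose whose true DLT risk is p,
    under the assumed 3+3 rule."""
    none_of_first_three = (1 - p) ** 3
    one_of_first_three = 3 * p * (1 - p) ** 2
    none_of_next_three = (1 - p) ** 3
    return none_of_first_three + one_of_first_three * none_of_next_three


risks = (0.10, 0.20, 0.30, 0.40, 0.50)
escalation = {risk: prob_escalate(risk) for risk in risks}
for risk, prob in escalation.items():
    print(f"true risk {risk:.2f}: P(escalate) = {prob:.2f}")
```

Under that assumption the computed values round to the figures on the slide.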
  • 42. Phase II trial • This trial looks at a drug’s efficacy • Endpoint is often response rate • Other possible endpoints are survival or progression-free survival • Trial may be randomized if no good historical data is available for comparison • Simon’s two stage design is common – Endpoint is response rate – Allows for early stopping if drug isn’t promising
  • 44. Sample size justification The sample size needed for a T-test is determined by four quantities: • $z_{1-\alpha/2}$ is a function of α that gets bigger as α gets smaller • $z_{1-\beta}$ is a function of β that gets bigger as β gets smaller • $\mu_1 - \mu_0$ is the difference between the group means • $\sigma$ is the standard deviation of the variable you are measuring (its dispersion; the variance is $\sigma^2$)
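As an illustration of how these quantities combine, the sketch below uses the standard normal-approximation formula for comparing two equal-sized groups with a two-sided test, n per group = 2 (z_{1-α/2} + z_{1-β})² σ² / (µ1 − µ0)². This is a textbook formula and an assumption here; the exact formula shown on the slide is not preserved in the transcript, and the function name is illustrative.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(delta, sigma, alpha=0.05, power=0.80):
    """Approximate patients needed per group to detect a mean difference
    `delta` when the outcome has standard deviation `sigma`."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # grows as alpha shrinks
    z_beta = NormalDist().inv_cdf(power)           # power = 1 - beta
    return ceil(2 * ((z_alpha + z_beta) * sigma / delta) ** 2)


# Detect a difference of half a standard deviation.
n_80 = n_per_group(delta=0.5, sigma=1.0)               # 80% power
n_90 = n_per_group(delta=0.5, sigma=1.0, power=0.90)   # 90% power
n_small = n_per_group(delta=0.25, sigma=1.0)           # smaller difference

print(n_80, n_90, n_small)
```

Halving the detectable difference roughly quadruples the required sample size, because the difference enters the formula squared.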
  • 45. Power and Alpha • Power: the probability of detecting a difference, given that the difference actually exists (typically 80-90%). – Type II error rate (β) = 1-Power • Alpha: the probability of detecting a difference when a difference does not actually exist (typically 5-10%). – Also called the Type I error rate
  • 46. Power, alpha and sample size are all related • There aren’t simple formulas for other tests but the general patterns are the same • Lower error rate -> larger sample size • Smaller detectable difference -> larger sample size • More variability -> larger sample size • A significant finding may not be scientifically significant or clinically significant. Large sample sizes have power to detect even small differences, differences that may not be useful clinically.
  • 48. Power tables For a proportion, the variability gets bigger as the true proportion increases from 0.1 to 0.5. From MSK protocol 10-115, Association of smoking, lung inflammation and lung metastases from breast cancer
  • 49. Phase III trial • This is the definitive trial that shows a drug is superior to an older drug or whatever’s the standard of care • Large sample size (thousands) and randomized • Often blinded
  • 50. Intent-to-treat analysis • Randomization only works if you analyze the data “as randomized” (also known as intent-to-treat) • If the analysis is not done in this way, the p-value can’t be trusted • The patients who deviate from the protocol may be different from those who remain on protocol • It’s good to randomize as late as possible so you minimize the number of patients who are randomized but don’t complete therapy or assessments
  • 51. Evaluable patients • For non-randomized studies, the protocol should specify at what point a patient will be considered “evaluable” • If we can’t ascertain the outcome on an evaluable patient, we have to assume the worst in order to be conservative and control type I error
  • 52. Missing data • Complete case analysis looks at just the patients whose data are complete. • Are the data missing at random? • The less missing data the better; if more than 10% of data is missing for a certain covariate, reviewers may be skeptical.