Clinical Data Analysis in R

Clinical data Analysis using R
A case study

Dataset
• Diastolic blood pressure (DBP) was measured (mmHg) in the
supine position at baseline (DBP1) before randomization
and monthly thereafter for 4 months (DBP2, DBP3, DBP4
and DBP5).
• Patients' age and sex were recorded at baseline and represent
potential covariates.
• The primary objective is to test whether treatment A (new drug)
is effective in lowering DBP compared with B (placebo),
and to describe changes in DBP across the times at which it
was measured.

Statistical Models for Treatment
Comparisons
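A) Student's t-test: tests the null hypothesis that the means of the two
treatment groups are the same
H0 : μ1 = μ2
The test statistic is constructed as:
$$ t = \frac{\bar{y}_1 - \bar{y}_2}{s\sqrt{1/n_1 + 1/n_2}}, \qquad s^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2} $$
• ȳi are the treatment group means of the observed data, si² and ni are the
sample variance and sample size of group i, and s is the pooled standard
deviation. Under the null hypothesis, this t-statistic has a Student's
t-distribution with n1 + n2 - 2 degrees of freedom.
• The 100(1 - α)% confidence interval (CI) for μ1 - μ2 is:
$$ (\bar{y}_1 - \bar{y}_2) \pm t_{1-\alpha/2,\; n_1+n_2-2}\; s\sqrt{1/n_1 + 1/n_2} $$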
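
The pooled t-statistic and its confidence interval can be checked by hand against t.test(). The following is a minimal sketch on simulated data; the vectors y1 and y2 and their parameters are hypothetical and are not the trial data.

```r
# Simulated DBP changes for two hypothetical groups (NOT the trial data)
set.seed(1)
y1 <- rnorm(20, mean = -15, sd = 3)
y2 <- rnorm(20, mean = -5,  sd = 3)
n1 <- length(y1)
n2 <- length(y2)

# Pooled standard deviation and standard error of the mean difference
s  <- sqrt(((n1 - 1) * var(y1) + (n2 - 1) * var(y2)) / (n1 + n2 - 2))
se <- s * sqrt(1 / n1 + 1 / n2)

# t statistic, two-sided p-value and 95% CI on n1 + n2 - 2 df
df      <- n1 + n2 - 2
t.stat  <- (mean(y1) - mean(y2)) / se
p.value <- 2 * pt(-abs(t.stat), df)
ci      <- (mean(y1) - mean(y2)) + c(-1, 1) * qt(0.975, df) * se

# The built-in equal-variance test should reproduce the same numbers
fit <- t.test(y1, y2, var.equal = TRUE)
print(c(by.hand = t.stat, t.test = unname(fit$statistic)))
print(rbind(by.hand = ci, t.test = as.numeric(fit$conf.int)))
```

Agreement between the two rows is expected because t.test() with var.equal = TRUE uses the same pooled-variance formula.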

Parameter Violations
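• Unequal variances: Welch's t-test, the default in R's t.test()
(var.equal=FALSE). The statistic is referred to a t-distribution with
ν degrees of freedom, calculated as
$$ \nu = \frac{\left(s_1^2/n_1 + s_2^2/n_2\right)^2}{\dfrac{(s_1^2/n_1)^2}{n_1 - 1} + \dfrac{(s_2^2/n_2)^2}{n_2 - 1}} $$
where s_i² and n_i are the sample variance and sample size of group i.
• Non-normal data:
Mann-Whitney-Wilcoxon (MWW) U-test (also called the Wilcoxon rank-sum test, or
Wilcoxon-Mann-Whitney test). In R, use wilcox.test().
• Bootstrap resampling:
Iteratively resample the data with replacement, calculate the value of the statistic
for each sample obtained, and generate the resampling distribution. In R, use
library(bootstrap).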
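
As an illustration of the first two remedies, the sketch below computes the Welch (Satterthwaite) degrees of freedom by hand and compares them with what t.test() reports, then runs the rank-sum test on the same vectors. The data are simulated with deliberately unequal spreads; y1, y2 and their parameters are assumptions for the example only.

```r
# Simulated groups with unequal variances (hypothetical, NOT the trial data)
set.seed(2)
y1 <- rnorm(20, mean = -15, sd = 4)
y2 <- rnorm(20, mean = -5,  sd = 2)
n1 <- length(y1)
n2 <- length(y2)

# Per-group variance of the mean: s_i^2 / n_i
v1 <- var(y1) / n1
v2 <- var(y2) / n2

# Welch-Satterthwaite degrees of freedom
nu <- (v1 + v2)^2 / (v1^2 / (n1 - 1) + v2^2 / (n2 - 1))

# t.test() defaults to var.equal = FALSE, i.e. the Welch test
welch <- t.test(y1, y2)
print(c(by.hand = nu, t.test = unname(welch$parameter)))

# Rank-based alternative that does not assume normality
mww <- wilcox.test(y1, y2)
print(mww$p.value)
```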

One-Way Analysis of Variance
(ANOVA)
• For comparisons involving more than two treatment groups,
F-tests derived from ANOVA are used.
Note: If the null hypothesis is not rejected, the analysis ends and it is concluded that there is
insufficient evidence that the treatment group means differ. However, if the null
hypothesis is rejected, the next logical step is to investigate which levels differ by using
multiple comparisons. We use Tukey's honest significant difference (HSD).
• The ANOVA procedure is implemented in the R system as aov() and
Tukey’s HSD procedure as TukeyHSD() .
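
The two-step workflow (overall F-test, then pairwise comparisons) can be sketched as follows. The three-group data frame d is simulated purely for illustration; its group labels and means are assumptions, not part of the DBP study.

```r
# Simulated outcome for three hypothetical groups (NOT the trial data)
set.seed(3)
d <- data.frame(
  group = factor(rep(c("A", "B", "C"), each = 15)),
  y     = c(rnorm(15, mean = 100, sd = 5),
            rnorm(15, mean = 105, sd = 5),
            rnorm(15, mean = 115, sd = 5))
)

# Step 1: overall F-test of H0: all group means are equal
fit <- aov(y ~ group, data = d)
print(summary(fit))

# Step 2: Tukey's HSD for all pairwise differences, with
# family-wise 95% confidence intervals and adjusted p-values
tuk <- TukeyHSD(fit)
print(tuk$group)
```

Each row of tuk$group is one pairwise comparison; an interval that excludes zero corresponds to an adjusted p-value below 0.05.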

Data Analysis of Diastolic Pressure data in R
> dat = read.csv("dbpdata.csv", header=TRUE)
# change from baseline to month 4 (DBP5 - DBP1)
> dat$diff = dat$DBP5-dat$DBP1
> boxplot(diff~TRT, dat, xlab="Treatment", ylab="DBP Changes")

Perform t.test
> t.test(diff~TRT, dat, var.equal=T)
Two Sample t-test
data: diff by TRT
t = -12.1504, df = 38, p-value = 1.169e-14
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-12.132758 -8.667242
sample estimates:
mean in group A mean in group B
-15.2 -4.8
> t.test(diff~TRT, dat, var.equal=F)
Welch Two Sample t-test
data: diff by TRT
t = -12.1504, df = 36.522, p-value = 2.149e-14
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-12.135063 -8.664937
sample estimates:
mean in group A mean in group B
-15.2 -4.8

More tests
> var.test(diff~TRT, dat)
F test to compare two variances
data: diff by TRT
F = 1.5036, num df = 19, denom df = 19, p-value = 0.3819
alternative hypothesis: true ratio of variances is not equal to 1
95 percent confidence interval:
0.595142 3.798764
sample estimates:
ratio of variances
1.503597
> wilcox.test(diff~TRT, dat)
Wilcoxon rank sum test with continuity correction
data: diff by TRT
W = 0, p-value = 6.286e-08
alternative hypothesis: true location shift is not equal to 0

One-sided t-test
# data from treatment A
> diff.A = dat[dat$TRT=="A",]$diff
# data from treatment B
> diff.B = dat[dat$TRT=="B",]$diff
# call t.test for one-sided test
> t.test(diff.A, diff.B,alternative="less")
Welch Two Sample t-test
data: diff.A and diff.B
t = -12.1504, df = 36.522, p-value = 1.074e-14
alternative hypothesis: true difference in means is less than 0
95 percent confidence interval:
-Inf -8.955466
sample estimates:
mean of x mean of y
-15.2 -4.8
The one-sided test rejects the null hypothesis (p = 1.074e-14): the mean DBP reduction under A is significantly larger than under B, i.e., there is evidence that A is more effective.

Bootstrapping
> library(bootstrap)
> mean.diff = function(bn,dat)
+ diff(tapply(dat[bn,]$diff, dat[bn,]$TRT,mean))
> nboot = 1000
> boot.mean = bootstrap(1:dim(dat)[1], nboot, mean.diff,dat)
> x = boot.mean$thetastar
> x.quantile = quantile(x, c(0.025,0.5, 0.975))
> print(x.quantile)
2.5% 50% 97.5%
8.79144 10.38121 12.06272
> hist(boot.mean$thetastar,
+ xlab="Mean Differences", main="")
> abline(v=x.quantile,lwd=2, lty=c(4,1,4))
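
The same percentile bootstrap can be written with base R only, which makes the resampling step explicit. This is a sketch under stated assumptions: the data frame toy is simulated to stand in for dat (same TRT and diff column names, invented values), and the statistic is computed as B minus A to match the sign of the output above.

```r
# Simulated stand-in for the trial data (hypothetical values)
set.seed(4)
toy <- data.frame(
  TRT  = rep(c("A", "B"), each = 20),
  diff = c(rnorm(20, mean = -15, sd = 3), rnorm(20, mean = -5, sd = 3))
)

nboot <- 1000

# Resample rows with replacement and recompute mean(B) - mean(A) each time
boot.stat <- replicate(nboot, {
  idx <- sample(nrow(toy), replace = TRUE)
  m   <- tapply(toy$diff[idx], toy$TRT[idx], mean)
  unname(m["B"] - m["A"])
})

# Percentile interval and median of the resampling distribution
x.quantile <- quantile(boot.stat, c(0.025, 0.5, 0.975))
print(x.quantile)

# Observed difference for comparison
observed <- mean(toy$diff[toy$TRT == "B"]) - mean(toy$diff[toy$TRT == "A"])
print(observed)
```

Writing the difference as an explicit subtraction also avoids confusion between the base function diff() and the data column named diff.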

One-Way ANOVA for Time Changes
• The treatment period in the DBP trial was
four months with DBP measured at months 1,
2, 3, and 4 post baseline.
> aggregate(dat[,3:7], list(TRT=dat$TRT), mean)
TRT DBP1 DBP2 DBP3 DBP4 DBP5
1 A 116.55 113.5 110.70 106.25 101.35
2 B 116.75 115.2 114.05 112.45 111.95

DBP changes differ between treatments: one-way
ANOVA to test for change over time.
H0 : μ1 = μ2 = μ3 = μ4 = μ5
Ha : Not all means are equal
> Dat = reshape(dat, direction="long",
+ varying=c("DBP1","DBP2","DBP3","DBP4","DBP5"),
+ idvar = c("Subject","TRT","Age","Sex","diff"),sep="")
> colnames(Dat) =
+ c("Subject","TRT","Age","Sex","diff","Time","DBP")
> Dat$Time = as.factor(Dat$Time)
> head(Dat)
Subject TRT Age Sex diff Time DBP
1.A.43.F.-9.1 1 A 43 F -9 1 114
2.A.51.M.-15.1 2 A 51 M -15 1 116
3.A.48.F.-21.1 3 A 48 F -21 1 119
4.A.42.F.-14.1 4 A 42 F -14 1 115
5.A.49.M.-11.1 5 A 49 M -11 1 116
6.A.47.M.-15.1 6 A 47 M -15 1 117
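
For readers unfamiliar with reshape(), the wide-to-long step can be tried on a small table. The sketch below uses a four-subject toy data frame with invented values and names the stacked column and the time index directly through v.names and timevar, which is an alternative to renaming columns afterwards; it is not the call used above.

```r
# Toy wide-format data: one row per subject (values invented for illustration)
wide <- data.frame(
  Subject = 1:4,
  TRT     = c("A", "A", "B", "B"),
  DBP1    = c(114, 116, 118, 115),
  DBP2    = c(112, 113, 117, 115),
  DBP3    = c(110, 109, 116, 113)
)

# Stack DBP1..DBP3 into a single DBP column indexed by Time
long <- reshape(wide, direction = "long",
                varying = c("DBP1", "DBP2", "DBP3"),
                v.names = "DBP",
                timevar = "Time",
                times   = 1:3,
                idvar   = "Subject")

long$Time <- factor(long$Time)   # treat time as a categorical factor for aov()
rownames(long) <- NULL           # drop the composite row names
print(long)
```

With a single idvar the generated row names are short, unlike the long composite names in the head(Dat) output above, which arise from listing every constant column in idvar.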

One Way ANOVA
> # one-way ANOVA to test the null hypothesis that the means of DBP at all five
> # times of measurement are equal
> # test treatment "A"
> datA = Dat[Dat$TRT=="A",]
> test.A = aov(DBP~Time, datA)
> summary(test.A)
Df Sum Sq Mean Sq F value Pr(>F)
Time 4 2879.7 719.9 127 <2e-16 ***
Residuals 95 538.5 5.7
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
> # test treatment "B"
> datB = Dat[Dat$TRT=="B",]
> test.B = aov(DBP~Time, datB)
> summary(test.B)
Df Sum Sq Mean Sq F value Pr(>F)
Time 4 311.6 77.89 17.63 7.5e-11 ***
Residuals 95 419.8 4.42
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Tukey HSD test
> TukeyHSD(test.A)
Tukey multiple comparisons of means
95% family-wise confidence level
Fit: aov(formula = DBP ~ Time, data = datA)
$Time
diff lwr upr p adj
2-1 -3.05 -5.143586 -0.9564144 0.0009687
3-1 -5.85 -7.943586 -3.7564144 0.0000000
4-1 -10.30 -12.393586 -8.2064144 0.0000000
5-1 -15.20 -17.293586 -13.1064144 0.0000000
3-2 -2.80 -4.893586 -0.7064144 0.0030529
4-2 -7.25 -9.343586 -5.1564144 0.0000000
5-2 -12.15 -14.243586 -10.0564144 0.0000000
4-3 -4.45 -6.543586 -2.3564144 0.0000005
5-3 -9.35 -11.443586 -7.2564144 0.0000000
5-4 -4.90 -6.993586 -2.8064144 0.0000000
> TukeyHSD(test.B)
Tukey multiple comparisons of means
95% family-wise confidence level
Fit: aov(formula = DBP ~ Time, data = datB)
$Time
diff lwr upr p adj
2-1 -1.55 -3.398584 0.2985843 0.1440046
3-1 -2.70 -4.548584 -0.8514157 0.0009333
4-1 -4.30 -6.148584 -2.4514157 0.0000000
5-1 -4.80 -6.648584 -2.9514157 0.0000000
3-2 -1.15 -2.998584 0.6985843 0.4207789
4-2 -2.75 -4.598584 -0.9014157 0.0007122
5-2 -3.25 -5.098584 -1.4014157 0.0000400
4-3 -1.60 -3.448584 0.2485843 0.1223788
5-3 -2.10 -3.948584 -0.2514157 0.0176793
5-4 -0.50 -2.348584 1.3485843 0.9433857

Two-Way ANOVA for Interaction
> mod2 = aov(DBP~ TRT*Time, Dat)
> summary(mod2)
Df Sum Sq Mean Sq F value Pr(>F)
TRT 1 972.4 972.4 192.81 <2e-16 ***
Time 4 2514.1 628.5 124.62 <2e-16 ***
TRT:Time 4 677.1 169.3 33.56 <2e-16 ***
Residuals 190 958.2 5.0
> par(mfrow=c(2,1),mar=c(5,3,1,1))
> with(Dat,interaction.plot(Time,TRT,DBP,las=1,legend=T))
> with(Dat,interaction.plot(TRT,Time,DBP,las=1,legend=T))
By the end of the trial, mean DBP under the new drug
(treatment A) decreased from 116.55 to 101.35 mmHg,
whereas mean DBP under placebo decreased from 116.75 to
111.95 mmHg.

Multiple comparisons
> TukeyHSD(aov(DBP ~ TRT*Time,Dat))
Treatment-by-time combinations that are not significantly different include:
• For Treatment A at Time 1 (i.e., A1), the Placebo at
Time points 1 and 2 (i.e., B1 and B2)
• For Treatment A at Time 3 (i.e., A3), the Placebo
at Time points 4 and 5 (i.e., B4 and B5)
• For Placebo B at Time 2 (i.e., B2), the Placebo at
Time point 3 (i.e., B3)
Exercise: find out how many comparisons are not significant ...
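
One way to count the non-significant pairs is to filter the TRT:Time component of the TukeyHSD() result on its adjusted p-values. The sketch below shows the mechanics on simulated data; the data frame sim, its slopes and its noise level are assumptions made for illustration, so the count it prints is not the answer for the DBP data.

```r
# Simulated long-format data: 2 treatments x 5 times x 20 subjects (hypothetical)
set.seed(6)
sim <- expand.grid(Subject = 1:20,
                   Time    = factor(1:5),
                   TRT     = factor(c("A", "B")))
slope   <- ifelse(sim$TRT == "A", -4, -1)   # assumed monthly change per arm
sim$DBP <- 117 + slope * (as.integer(sim$Time) - 1) +
           rnorm(nrow(sim), sd = 2.3)

# Two-way ANOVA with interaction, then Tukey's HSD on all terms
fit <- aov(DBP ~ TRT * Time, data = sim)
tuk <- TukeyHSD(fit)

# Pairwise comparisons among the 10 treatment-by-time cells
cells <- tuk$`TRT:Time`
print(nrow(cells))   # choose(10, 2) = 45 comparisons

# Keep the pairs whose adjusted p-value exceeds 0.05
not.sig <- cells[cells[, "p adj"] > 0.05, , drop = FALSE]
print(nrow(not.sig))
print(rownames(not.sig))
```

Applied to the fitted model for Dat, the same filter would list the pairs named in the bullets above along with any others that are not significant.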

