Analysis of two samples

BIOL209: Two Samples
Paul Gardner
April 3, 2017

Two samples
Some of the most frequently used (and simplest) models
include:
Covered in BIOL209:
comparing 2 variances (Fisher’s F test: var.test)
comparing 2 sample means with normal distribution (Student’s
t test: t.test)
comparing 2 means with non-normal distribution (Wilcoxon’s
test: wilcox.test)
comparing 2 proportions (the binomial test: prop.test) NB.
I may not cover this...
comparing 2 variables (“Pearson’s” or “Spearman’s rank”
correlation: cor.test)
Covered in BIOL309:
testing for independence in contingency tables using χ2
(chisq.test)
testing small samples for correlation with Fisher’s exact
test (fisher.test)

R. A. Fisher
Who is this Fisher?
R. A. Fisher (1890-1962) was a statistician and geneticist who
contributed to mathematics and genetic measures of
evolutionary selection
Developed analysis of variance (ANOVA) – See Distinguished
Professor David Schiel’s lectures
Fisher’s F test
Fisher’s exact test
Fisher’s method for combining P-values
and lots more...
https://en.wikipedia.org/wiki/Ronald_Fisher

Fisher’s F test
Comparing 2 variances ($s_1^2$ & $s_2^2$)
Sometimes we are interested in comparing the variances of two
samples
This can happen when, for example, we want to compare the
results of a treatment group and a control
If $s_1^2 \ge s_2^2$, then:
$F = \frac{s_1^2}{s_2^2}$
[Figure: histograms of two samples, panels "Broad" and "Narrow"; x-axis: x, y-axis: Freq.]

Fisher’s F test
$F = \frac{s_1^2}{s_2^2}$, $s_1^2 > s_2^2$
$F \ge 1$
What is the null (H0) for this test?
How will we know if there is a significant difference between
the variances? I.e. what is the critical value of F?
What are the assumptions of this test?
normal & independent
David Schiel will talk a LOT more about this test for ANOVAs
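One way to approach the critical-value question is to ask R for the relevant quantile of the F distribution. The sketch below assumes, purely for illustration, two samples of 10 observations each (9 degrees of freedom apiece) and α = 0.05; the variable names are hypothetical.

```r
# Critical values of F for alpha = 0.05, assuming (for illustration)
# two samples of n = 10, i.e. 9 degrees of freedom each.
alpha <- 0.05
df1 <- 9   # numerator d.f. (sample with the larger variance)
df2 <- 9   # denominator d.f.

# One-tailed test: we predicted in advance which variance is larger.
crit.one <- qf(1 - alpha, df1, df2)

# Two-tailed test: no prior expectation, so alpha is split across both tails.
crit.two <- qf(1 - alpha / 2, df1, df2)

c(one.tailed = crit.one, two.tailed = crit.two)
```

An observed F ratio larger than the relevant critical value would lead us to reject the null hypothesis of equal variances.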

An example
Ozone concentrations were collected in 2 market gardens
gardenB gardenC
1 5 3
2 5 3
3 6 2
4 7 1
5 4 10
6 4 4
7 3 3
8 5 11
9 6 3
10 5 10
[Figure: histograms of ozone concentration, panels "Garden B" and "Garden C", with mean, median and mode marked; x-axis: [ozone] (pphm), y-axis: Frequency]
oz<-read.csv("f.test.data.csv", header=T)
par(mfrow=c(1,2), cex=2.5)
hist(oz$gardenB, col="cornflowerblue", main="Garden B", xlab="[ozone] (pphm)")
hist(oz$gardenC, col="salmon", main="Garden C", xlab="[ozone] (pphm)")

An example
Calculate F
In this example, we have no idea which variance is likely to be
larger (often not the case for control vs expt situations)
Therefore, we use a two-tailed test ($p = 1 - \frac{\alpha}{2}$)
var(oz$gardenB)
[1] 1.333333
var(oz$gardenC)
[1] 14.22222
(F.ratio <- var(oz$gardenC)/var(oz$gardenB))
[1] 10.66667
#we double the probability for the two-tailed test:
2*(1-pf(F.ratio, 9, 9))
[1] 0.001624199
The probability of obtaining an F ratio as large as this, or
larger, if the variances were the same (i.e. by chance), is less
than 0.002.
NB. This is not the probability that the null is true!
The null is assumed to be true in carrying out the test
(same with the other statistical tests).
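The file f.test.data.csv is not included here, so the following is a minimal sketch of the same calculation with the ozone readings typed in from the table above. The names `df.num`, `df.den` and `p.two` are introduced only for this illustration.

```r
# Ozone readings (pphm) typed in from the gardenB/gardenC table,
# so the example runs without f.test.data.csv.
gardenB <- c(5, 5, 6, 7, 4, 4, 3, 5, 6, 5)
gardenC <- c(3, 3, 2, 1, 10, 4, 3, 11, 3, 10)

# Garden C has the larger variance, so it goes on top (F >= 1).
F.ratio <- var(gardenC) / var(gardenB)

# Degrees of freedom are n - 1 for each sample.
df.num <- length(gardenC) - 1
df.den <- length(gardenB) - 1

# Two-tailed p value: double the upper-tail probability.
p.two <- 2 * pf(F.ratio, df.num, df.den, lower.tail = FALSE)
c(F = F.ratio, p = p.two)
```

Using `lower.tail = FALSE` is equivalent to `1 - pf(...)` but avoids loss of precision when the tail probability is very small.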

RECALL: p values and significance
p is the probability of obtaining a test statistic at least as extreme as
the one observed, by chance, when the null hypothesis is true
say this 50 times, write on the bathroom wall, remember it!
a result is said to be significant (or unlikely to be due to
chance) if p is low
≤ 5% (or α ≤ 0.05)
the significance threshold may change depending upon the field
e.g. $p \le 3 \times 10^{-7}$ is often used in high-energy physics,
$p \le 10^{-10}$ is often used in genomics

An example
There is a faster way of running F tests:
var.test(oz$gardenC, oz$gardenB, alternative = "two.sided")
F test to compare two variances
data: oz$gardenC and oz$gardenB
F = 10.667, num df = 9, denom df = 9, p-value = 0.001624
alternative hypothesis: true ratio of variances is not equal to 1
95 percent confidence interval:
2.649449 42.943938
sample estimates:
ratio of variances
10.66667
NB. We shouldn’t use Student’s t test on these data, as the
variances are significantly different and the means are the same
(5 pphm)

Comparing two means
We now know how to test whether our variances are significantly
different.
Suppose we wish to see whether the means of our samples are
significantly different.
As usual, we want to:
1. Compute a test statistic
2. Ask how likely that test statistic is if our null hypothesis is true
3. Compare the statistic to a critical value
R is particularly handy, since statistical tables and
formulae for many probability distributions have been built into
the package
What are some measures we could use to compare two means?

Examples of comparing two means
Genetic variation and associations with disease (or the
severity)
Treatment vs control groups (e.g. with/without fertiliser or
pesticide or greenhouse or ...)
Comparing regions, outcomes, social systems, time-periods, ...
So fundamental that it’s difficult to think of a field where this
wouldn’t be useful!
[Figure: Probability Densities for 2 Normal Distributions (mu=0.0, sigma=1.0 and mu=1.0, sigma=1.0); x-axis: x, y-axis: Prob.]

We’ll consider two tests (maybe three)
Student’s t test for when our samples are independent,
variances are constant, and the data is normally distributed
(parametric)
Wilcoxon rank-sum test for when our samples are
independent, but the data is not necessarily normally
distributed (nonparametric)
Kolmogorov-Smirnov test for when our samples are
independent, but the data is not necessarily normally
distributed (nonparametric)

Student’s t Test
“Student” is a pseudonym for W. S. Gosset who first
published the approach in 1908
Prevented from using his own name by his employer, the
Guinness Brewing Company
The test statistic is the number of standard errors by which
the sample means are separated
does this remind you of z scores?

Student’s t Test
We write:
$t = \frac{\bar{x}_A - \bar{x}_B}{SE_{diff}}$
I hope you recall that: $SE_{\bar{x}} = \sqrt{\frac{s^2}{n}}$,
A trick called the variance sum law can be used to show that:
$SE_{diff} = \sqrt{\frac{s_A^2}{n_A} + \frac{s_B^2}{n_B}}$

Estimating the variance of a difference between 2
independent samples
If $n_A = n_B$:
$\sum [(x_A - \bar{x}_A) - (x_B - \bar{x}_B)]^2$
With a little algebra we can show that:
$\sigma^2_{\bar{x}_A - \bar{x}_B} = \sigma_A^2 + \sigma_B^2$
NB. Only true if the samples are not correlated

Let’s try an example
The ozone concentrations in market gardens B vs C
were inappropriate for Student’s t; what about A vs B?
(oz<-read.csv("t.test.data.csv", header=T))
gardenA gardenB
1 3 5
2 4 5
3 4 6
4 3 7
5 2 4
6 3 4
7 1 3
8 3 5
9 5 6
10 2 5
#check means:
apply(oz, 2, mean)
gardenA gardenB
3 5
#check variances:
apply(oz, 2, var)
gardenA gardenB
1.333333 1.333333
[Figure: histograms of ozone concentration, panels "Garden A" and "Garden B", with mean, median and mode marked; x-axis: [ozone] (pphm), y-axis: Frequency]

Degrees of freedom, t statistic
d.f. = $n_A + n_B - 2$ ????
d.f. = 10 + 10 − 2 = 18
Since we’re not testing (or don’t know) in advance which
garden has the higher mean ozone concentration, this is a
two-tailed test (if we did, then we’d use a one-tailed test).
Therefore we can work out the critical value of Student’s t,
with α = 0.05:
qt(0.975,18)
[1] 2.100922

More data exploration
Boxplots are a great way to examine data
The notch option is handy:
?boxplot
notch: if notch is TRUE, a notch is drawn in each side of the
boxes. If the notches of two plots do not overlap this is
strong evidence that the two medians differ (Chambers _et
al_, 1983, p. 62). See boxplot.stats for the calculations
used.
attach(oz)
ozone<-c(gardenA,gardenB)
label <- factor( c(rep("A",10), rep("B", 10)) )
boxplot(ozone~label, notch=T, xlab="[ozone] (pphm)", ylab="Garden", col="cornflowerblue", horizontal = T)
[Figure: notched horizontal boxplots for gardens A and B; x-axis: [ozone] (pphm), y-axis: Garden]

More data exploration
The notches of the two plots do not overlap
Therefore the medians are significantly different at the 5%
level
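The notch limits can also be inspected numerically rather than read off the plot. The sketch below uses `boxplot.stats`, the function named in the `?boxplot` excerpt, with the garden readings typed in so that it runs without t.test.data.csv; `notchA` and `notchB` are names chosen for this illustration.

```r
# Ozone readings (pphm) typed in from the gardenA/gardenB table.
gardenA <- c(3, 4, 4, 3, 2, 3, 1, 3, 5, 2)
gardenB <- c(5, 5, 6, 7, 4, 4, 3, 5, 6, 5)

# boxplot.stats() returns the lower and upper notch limits in $conf.
notchA <- boxplot.stats(gardenA)$conf
notchB <- boxplot.stats(gardenB)$conf
rbind(A = notchA, B = notchB)

# TRUE when the upper notch of A lies below the lower notch of B,
# i.e. when the two notches do not overlap.
notchA[2] < notchB[1]
```

For these data the notches only just fail to overlap, so the numerical check is a useful complement to the picture.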
s2A <- var(gardenA)
s2B <- var(gardenB)
s2A/s2B
[1] 1
( mean(gardenA) - mean(gardenB) )/sqrt( s2A/10 + s2B/10 )
[1] -3.872983
The absolute value of the test statistic is greater than the critical
value (2.100922). Therefore we can reject the null hypothesis.

The easy way...
t.test(gardenA, gardenB)
Welch Two Sample t-test
data: gardenA and gardenB
t = -3.873, df = 18, p-value = 0.001115
alternative hypothesis: true difference in means is not equal to 0
-3.0849115 -0.9150885
sample estimates:
mean of x mean of y
3 5
You might describe this result like so:
Ozone concentration was significantly higher in garden B
(mean = 5.0 pphm) than in garden A (mean = 3.0 pphm;
t = 3.873, p = 0.0011 (2 tailed), d.f. = 18)
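The p value reported by t.test can be recovered by hand from the t statistic and the degrees of freedom. This is a minimal sketch with the data typed in from the table, using d.f. = 18 as on the earlier slide; `se.diff`, `t.stat` and `p.two` are illustrative names.

```r
# Ozone readings (pphm) typed in from the gardenA/gardenB table.
gardenA <- c(3, 4, 4, 3, 2, 3, 1, 3, 5, 2)
gardenB <- c(5, 5, 6, 7, 4, 4, 3, 5, 6, 5)
nA <- length(gardenA)
nB <- length(gardenB)

# Standard error of the difference between the two means.
se.diff <- sqrt(var(gardenA) / nA + var(gardenB) / nB)

# Number of standard errors separating the sample means.
t.stat <- (mean(gardenA) - mean(gardenB)) / se.diff

# Two-tailed p value: probability in both tails beyond |t|.
df <- nA + nB - 2
p.two <- 2 * pt(-abs(t.stat), df)
c(t = t.stat, df = df, p = p.two)
```

The Welch d.f. printed by t.test coincides with nA + nB − 2 here because the two variances and sample sizes are equal; in general the two can differ.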

Wilcoxon Rank-Sum Test
A non-parametric alternative to Student’s t test
The test statistic is computed by:
1. Place both samples into a single array, with their sample
names attached (e.g. “A” and “B”)
2. Then sort the list, keeping the names attached
3. Assign a rank to each value (ties receive an averaged rank)
4. Sum the rank for each sample
5. Significance is determined by the size of the smaller sum of
ranks
[Figure: histograms of skewed samples, panels "Right skew" and "Left skew", with mean, median and mode marked; x-axis: x, y-axis: Freq.]

Wilcoxon Rank-Sum Test
What is the null for this test?
What is the minimum value for this statistic?
Hint: $\sum_{i=1}^{n} i = \frac{1}{2} n(n + 1)$

An example: back to the ozone contaminated gardens
ozone
[1] 3 4 4 3 2 3 1 3 5 2 5 5 6 7 4 4 3 5 6 5
label
[1] A A A A A A A A A A B B B B B B B B B B
Levels: A B
(combined.ranks <- rank(ozone))
[1] 6.0 10.5 10.5 6.0 2.5 6.0 1.0 6.0 15.0 2.5 15.0 15.0 18.5 20.0 10.5
[16] 10.5 6.0 15.0 18.5 15.0
#tapply: Apply a function to each cell of an array, grouped by factors!
tapply(combined.ranks, label, sum)
A B
66 144
We can look up the smaller of the two values (66) in tables of
Wilcoxon rank sums and reject the null if 66 is smaller than
the tabled value at the appropriate significance
In this case, the critical value is 78
Therefore we can again reject the null hypothesis.
The sample means are significantly different.

Critical values table
http://www.real-statistics.com/statistics-tables/wilcoxon-rank-sum-table-independent-samples/

The easy way...
wilcox.test(gardenA, gardenB)
Wilcoxon rank sum test with continuity correction
data: gardenA and gardenB
W = 11, p-value = 0.002988
alternative hypothesis: true location shift is not equal to 0
Warning message:
In wilcox.test.default(gardenA, gardenB) :
cannot compute exact p-value with ties
The function wilcox.test approximates a z value, which it
then uses to compute a p value
In this case, p = 0.002988, which is much less than 0.05, so
we can reject the null
The warning means that p values cannot be computed exactly
(doesn’t matter in most cases)
Why is W = 11 & not 66? $\frac{n}{2}(n + 1)$ is subtracted in R, where n is the
size of the first sample: 66 − 55 = 11 (noted
in the documentation for wilcox.test)
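The relationship between the rank sum of 66 and W = 11 can be checked directly. A minimal sketch, with the garden data typed in and `rank.sum.A` introduced as an illustrative name:

```r
# Ozone readings (pphm) typed in from the gardenA/gardenB table.
gardenA <- c(3, 4, 4, 3, 2, 3, 1, 3, 5, 2)
gardenB <- c(5, 5, 6, 7, 4, 4, 3, 5, 6, 5)

# Rank the pooled data; tied values receive averaged ranks.
ranks <- rank(c(gardenA, gardenB))

# Sum of the ranks belonging to the first sample (garden A).
nA <- length(gardenA)
rank.sum.A <- sum(ranks[seq_len(nA)])

# Subtract the smallest possible rank sum for a sample of size nA.
W <- rank.sum.A - nA * (nA + 1) / 2
c(rank.sum = rank.sum.A, W = W)
```

The subtracted term is the rank sum that garden A would have if it held the nA smallest values, so W = 0 corresponds to complete separation of the two samples.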

t vs W
t.test(gardenA, gardenB)
t = -3.873, df = 18, p-value = 0.001115
wilcox.test(gardenA, gardenB)
W = 11, p-value = 0.002988
The non-parametric test is much more appropriate than the t
test when the distribution is not normal
However, the non-parametric test is about 95% as powerful
when the distribution is normal (i.e. increased chance of
falsely accepting the null)
Wilcoxon is more powerful than t in the presence of
outliers/skew
Typically, as here, the t test will give the lower p value
Wilcoxon tests are generally more conservative!
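The relative power of the two tests on normal data can be explored with a small simulation. This is only a rough sketch: the seed, the number of simulations, the sample size and the size of the shift are all assumptions chosen for illustration, and the estimated powers will vary with them.

```r
set.seed(1)      # arbitrary seed, for reproducibility only
n.sim <- 500     # number of simulated experiments (assumed)
n <- 10          # observations per sample, as in the garden data
shift <- 1.5     # assumed true difference in means, in s.d. units

p.t <- numeric(n.sim)
p.w <- numeric(n.sim)
for (i in seq_len(n.sim)) {
  a <- rnorm(n, mean = 0)
  b <- rnorm(n, mean = shift)
  p.t[i] <- t.test(a, b)$p.value
  p.w[i] <- wilcox.test(a, b)$p.value
}

# Power = proportion of simulations in which the null is rejected at 5%.
power <- c(t = mean(p.t < 0.05), wilcoxon = mean(p.w < 0.05))
power
```

Replacing `rnorm` with a skewed generator such as `rexp` is one way to see how the comparison changes when the normality assumption fails.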

Tests on Paired Samples
Sometimes 2-sample data comes from paired observations
E.g. an individual’s behaviour in the morning vs afternoon,
stream health before vs after contamination, well-being before
vs after a drug treatment, ...
Recall the variance of a difference:
$\sigma^2_{\bar{x}_A - \bar{x}_B} \propto \sum [(x_A - \bar{x}_A) - (x_B - \bar{x}_B)]^2$
$= \sum (x_A - \bar{x}_A)^2 + \sum (x_B - \bar{x}_B)^2 - 2 \sum (x_A - \bar{x}_A)(x_B - \bar{x}_B)$
The covariance of A & B is given by the third term
If the covariance is positive then the variance of the difference
is reduced!
This can make it easier to detect significant differences
between the means!
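The effect of a positive covariance can be seen with simulated data. The sketch below is illustrative only: the samples are random numbers, and `b.indep`, `b.paired` and `var.diff` are hypothetical names, not part of the lecture material.

```r
set.seed(1)                          # arbitrary seed
n <- 50
a <- rnorm(n)
b.indep  <- rnorm(n)                 # generated independently of a
b.paired <- a + rnorm(n, sd = 0.5)   # positively correlated with a

# Variance of a difference: var(A) + var(B) - 2 * cov(A, B)
var.diff <- function(x, y) var(x) + var(y) - 2 * cov(x, y)

# The identity holds exactly for sample variances and covariances.
all.equal(var(a - b.paired), var.diff(a, b.paired))

# Independent samples: covariance near zero, so little is subtracted.
c(cov = cov(a, b.indep), var.of.diff = var(a - b.indep))

# Paired samples: positive covariance shrinks the variance of the difference.
c(cov = cov(a, b.paired), var.of.diff = var(a - b.paired))
```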

An example
Kick samples of aquatic invertebrates from 16 rivers
1 sample upstream from a sewage outfall, 1 downstream
(streams<-read.csv("streams.csv", header=T))
down up
1 20 23
2 15 16
3 10 10
4 5 4
5 20 22
6 15 15
7 10 12
8 5 7
9 20 21
10 15 16
11 10 11
12 5 5
13 20 22
14 15 14
15 10 10
16 5 6
attach(streams)
[Figure: histograms of invertebrate counts, panels "Down" and "Up", with mean, median and mode marked; x-axis: # invertebrates, y-axis: Frequency]

An example: t.test
If we ignore the fact that the samples are paired (p is rubbish):
t.test(down, up)
Welch Two Sample t-test
data: down and up
t = -0.40876, df = 29.755, p-value = 0.6856
-5.248256 3.498256
sample estimates:
mean of x mean of y
12.500 13.375

An example: paired t.test
The picture changes completely if we account for the fact that
samples are paired!
The moral of the story:
if you can do a paired t test, then you should always do a
paired test!
In general, if you have information on blocking or spatial
correlation then you should incorporate this information into
your analysis
t.test(down, up, paired=T)
Paired t-test
data: down and up
t = -3.0502, df = 15, p-value = 0.0081
-1.4864388 -0.2635612
sample estimates:
mean of the differences
-0.875

An example: t on the differences
Same result:
Halved degrees of freedom, but this is compensated for by
reducing the error variance
Blocking always helps!
t.test(up-down)
One Sample t-test
data: up - down
t = 3.0502, df = 15, p-value = 0.0081
alternative hypothesis: true mean is not equal to 0
0.2635612 1.4864388
sample estimates:
mean of x
0.875
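The three analyses of the stream data can be compared side by side. Since streams.csv is not included here, this sketch types the counts in from the table; the names `p.unpaired`, `p.paired` and `p.diff` are introduced only for this illustration.

```r
# Invertebrate counts typed in from the streams table (16 rivers).
down <- c(20, 15, 10, 5, 20, 15, 10, 5, 20, 15, 10, 5, 20, 15, 10, 5)
up   <- c(23, 16, 10, 4, 22, 15, 12, 7, 21, 16, 11, 5, 22, 14, 10, 6)

# Ignoring the pairing (Welch two-sample t test).
p.unpaired <- t.test(down, up)$p.value

# Respecting the pairing.
p.paired <- t.test(down, up, paired = TRUE)$p.value

# One-sample t test on the within-river differences.
p.diff <- t.test(up - down)$p.value

round(c(unpaired = p.unpaired, paired = p.paired, differences = p.diff), 4)
```

The paired test and the one-sample test on the differences give the same p value; only the sign of t changes with the order of subtraction.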

Pop test: Q1
Given:
$s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$
Show that:
$s^2 = \frac{1}{n - 1} \times \left[ \sum x^2 - \frac{(\sum x)^2}{n} \right]$

Pop test: Q2
A colleague has collected some very important independent
data on the effects of a drug on tumour growth.
She has computed the following statistics for you:
Control:
$n_C = 10$
$\sum x_C^2 = 100{,}334$
$\sum x_C = 998$
Drugged:
$n_D = 10$
$\sum x_D^2 = 68{,}811$
$\sum x_D = 823$
What is the t statistic for comparing these two samples?
Recall:
$t = \frac{\bar{x}_A - \bar{x}_B}{SE_{diff}}$
$SE_{diff} = \sqrt{\frac{s_A^2}{n_A} + \frac{s_B^2}{n_B}}$
The critical value of t is 1.73 for α = 0.05 and d.f. = 18. Is
the result significant?

Pop test: Q3
You’re given the data for 3 independent samples. The
corresponding histograms are shown below.
1. How would you compare sample 1 & 2?
2. How would you compare sample 2 & 3?
Justify your answers.
[Figure: histograms of three samples, panels "Sample 1", "Sample 2" and "Sample 3", with mean, median and mode marked; x-axis: x, y-axis: Freq.]

Further reading
Chapter 6 of Crawley (2015) Statistics: An introduction using
R.
Excluding binomial, χ2, contingency tables & Fisher’s exact
tests

The End
