Non parametric methods

Non-Parametric Methods
Peter T. Donnan
Professor of Epidemiology and Biostatistics
Statistics for Health Research

Objectives of Presentation
• Introduction
• Ranks & Median
• Paired Wilcoxon Signed Rank
• Mann-Whitney test (or Wilcoxon Rank Sum test)
• Spearman’s Rank Correlation Coefficient
• Others…

What are non-parametric tests?
• ‘Parametric’ tests involve estimating parameters such as the mean, and assume that the distribution of sample means is Normal
• Often data do not follow a Normal distribution, e.g. number of cigarettes smoked, cost to the NHS, etc.
• Such data often have positively skewed distributions

A positively skewed distribution
[Figure: histogram of units of alcohol per week (x-axis, 0 to 50) against frequency (y-axis, 0 to 20). Mean = 8.03, Std. Dev. = 12.952, N = 30]

What are non-parametric tests?
• ‘Non-parametric’ (NP) tests were developed for these situations, where fewer assumptions have to be made
• Sometimes called distribution-free tests
• NP tests STILL have assumptions, but these are less stringent
• NP tests can be applied to Normal data, but parametric tests have greater power IF their assumptions are met

Ranks
• The practical difference between parametric and NP methods is that NP methods use the ranks of values rather than the actual values
• E.g.
1, 2, 3, 4, 5, 7, 13, 22, 38, 45 - actual
1, 2, 3, 4, 5, 6, 7, 8, 9, 10 - rank
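The ranking step can be sketched in a few lines of Python. This is only an illustration: the helper name `average_ranks` is an assumption of this sketch, and it follows the usual convention (used in the worked examples of this deck) that tied values share the mean of the ranks they occupy.

```python
def average_ranks(values):
    """Return the rank of each value (1 = smallest); tied values share the mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value is tied with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # mean of the 1-based positions start+1 .. end+1
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


actual = [1, 2, 3, 4, 5, 7, 13, 22, 38, 45]   # values from the slide
slide_ranks = average_ranks(actual)
tied_ranks = average_ranks([3, 1, 3, 7])       # hypothetical data with one tie
print(slide_ranks)
print(tied_ranks)
```

Note that the largest value (45) gets rank 10 however extreme it is, which is why rank-based methods are less sensitive to skewed data.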

Median
• The median is the value above and below which 50% of the data lie
• If the data are ranked in order, it is the middle value
• In symmetric distributions the mean and median are the same
• In skewed distributions the median is the more appropriate summary

Median
• BPs:
135, 138, 140, 140, 141, 142, 143
Median = 140
• No. of cigarettes smoked:
0, 1, 2, 2, 2, 3, 5, 5, 8, 10
Median = 2.5
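Both medians can be checked with Python's standard `statistics` module. The variable names are chosen for this sketch only. With an even number of values, `median` averages the two middle values, which is how 2.5 arises for the cigarette data.

```python
from statistics import median

blood_pressures = [135, 138, 140, 140, 141, 142, 143]   # 7 values: middle one is the 4th
cigarettes = [0, 1, 2, 2, 2, 3, 5, 5, 8, 10]            # 10 values: average the 5th and 6th

bp_median = median(blood_pressures)
cig_median = median(cigarettes)
print(bp_median, cig_median)  # 140 2.5
```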

T-test
• The t-test is used to test whether the mean of a sample is significantly different from a hypothesised population mean
• The t-test relies on the sample being drawn from a Normally distributed population
• If the sample is not Normal, then use the Wilcoxon Signed Rank Test as an alternative

Wilcoxon tests
• Frank Wilcoxon was a chemist in the USA who wanted to develop a test similar to the t-test but without the requirement of a Normal distribution
• Presented his paper in 1945
• Wilcoxon Signed Rank: NP counterpart of the paired t-test
• Wilcoxon Rank Sum: NP counterpart of the independent t-test

Wilcoxon Signed Rank Test
• NP test relating to the median as the measure of central tendency
• The absolute differences between the data and the hypothesised median are calculated and ranked
• The ranks for the negative and the positive differences are then summed separately (W- and W+ respectively)
• The minimum of these is the test statistic, W

Normal Approximation
• As the number of ranks (n) becomes larger, the distribution of W becomes approximately Normal
• Generally this holds if n > 20
• Mean of W = n(n+1)/4
• Variance of W = n(n+1)(2n+1)/24
• Z = (W - mean of W) / SD(W)
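The three formulas above translate directly into code. The following is a minimal sketch, not a full implementation: the function name `signed_rank_z` and the input values are hypothetical, and no correction for ties or continuity is applied, so statistical packages may report slightly different figures.

```python
import math
from statistics import NormalDist


def signed_rank_z(w, n):
    """Normal approximation for a Wilcoxon signed rank statistic w based on n ranks."""
    mean_w = n * (n + 1) / 4
    var_w = n * (n + 1) * (2 * n + 1) / 24
    z = (w - mean_w) / math.sqrt(var_w)
    # Two-sided p-value from the standard Normal distribution.
    p = 2 * NormalDist().cdf(-abs(z))
    return mean_w, var_w, z, p


# Hypothetical inputs chosen only for illustration: W = 80 from n = 25 ranks.
mean_w, var_w, z, p = signed_rank_z(80, 25)
print(f"mean = {mean_w}, variance = {var_w}, z = {z:.3f}, p = {p:.3f}")
```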

Assumptions
• Population should be approximately symmetrical but need not be Normal
• Results must be classified as either greater than or less than the median, i.e. exclude results equal to the median
• Can be used for small or large samples

Paired samples t-test
• Disadvantage: Assumes the data are a random sample from a population which is Normally distributed
• Advantage: Uses all the detail of the available data, and if the data are Normally distributed it is the most powerful test

The Wilcoxon Signed Rank Test for Paired Comparisons
• Disadvantage: Only the sign (+ or -) and the rank of each change are analysed, not its actual size
• Advantage: Easy to carry out, and the data can be analysed without assuming a Normally distributed population

Paired And Not Paired Comparisons
• If you have the same sample measured on two separate occasions then this is a paired comparison
• Two independent samples are not a paired comparison
• Different samples which are ‘matched’ by age and gender are paired

The Wilcoxon Signed Rank Test for Paired Comparisons
• Similar calculation to the one-sample Wilcoxon Signed Rank test, except that the differences between the paired results are ranked
• Example using SPSS:
A group of 10 patients with chronic anxiety receive sessions of cognitive therapy. Quality of Life scores are measured before and after therapy.

QoL Score
Before  After  Diff  Rank  -/+
6       9      3     5.5   +
5       12     7     10    +
3       9      6     9     +
4       9      5     8     +
2       3      1     4     +
1       1      0     3     tied
3       2      -1    2     -
8       12     4     7     +
6       9      3     5.5   +
12      10     -2    1     -
Example
Negative differences = 2
Positive differences = 7
1 tied
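The rank sums for the QoL example can be worked through by hand or with a short script. The sketch below follows the definition given earlier in the deck: drop the zero difference, rank the absolute differences (tied magnitudes share the mean rank), then sum the ranks by sign. The figures 2 and 7 quoted with the table match the numbers of negative and positive differences, and the Rank column in the table appears to order the signed differences, so the rank sums computed here are different quantities. Variable names are this sketch's own.

```python
before = [6, 5, 3, 4, 2, 1, 3, 8, 6, 12]
after = [9, 12, 9, 9, 3, 1, 2, 12, 9, 10]

# Differences (after - before); zero differences (ties) are excluded.
diffs = [a - b for a, b in zip(after, before) if a != b]

# Rank the absolute differences, giving tied magnitudes their mean rank.
magnitudes = sorted(abs(d) for d in diffs)


def mean_rank(m):
    first = magnitudes.index(m) + 1          # 1-based position of the first occurrence
    last = first + magnitudes.count(m) - 1   # position of the last occurrence
    return (first + last) / 2


w_minus = sum(mean_rank(abs(d)) for d in diffs if d < 0)
w_plus = sum(mean_rank(abs(d)) for d in diffs if d > 0)
w = min(w_minus, w_plus)                     # test statistic W
n_negative = sum(d < 0 for d in diffs)
n_positive = sum(d > 0 for d in diffs)
print(w_minus, w_plus, w, n_negative, n_positive)
```

With n = 9 non-zero differences the two rank sums must add up to 9 × 10 / 2 = 45, which is a quick check on the arithmetic.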

Example
[SPSS Output: p < 0.05]

Mann-Whitney test ≡ Wilcoxon Rank Sum
• Used when we want to compare two unrelated or INDEPENDENT groups
• For Normally distributed data you would use the unpaired (independent) samples t-test
• The assumptions of the t-test were:
1. The distribution of the measure in each group is approximately Normal
2. The variances are similar
[Image: HB Mann]

Example (1)
The following data show the number of alcohol units per week collected in a survey:
Men (n=13): 0, 0, 1, 5, 10, 30, 45, 5, 5, 1, 0, 0, 0
Women (n=14): 0, 0, 0, 0, 1, 5, 4, 1, 0, 0, 3, 20, 0, 0
Is the amount greater in men compared to women?

How would you test whether the distributions in both groups are approximately Normally distributed?
• Plot histograms
• Stem and leaf plot
• Box-plot
• Q-Q or P-P plot

Boxplots of alcohol units per week by gender
[Figure: boxplots of units of alcohol per week (y-axis, 0 to 50) for Male and Female; outlying points are labelled 25, 6 and 7]

Are those distributions symmetrical?
Definitely not!
They are both highly skewed, so not Normal. If the data are still not Normal after transformation, then use a non-parametric test: Mann-Whitney.
The boxplots suggest perhaps that males tend to have a higher intake than women.
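A quick numerical check of the skewness is to compare the mean with the median in each group: in a positively skewed distribution the mean is pulled well above the median. The sketch below uses the survey data from Example (1); the `summary` dictionary is just a convenience of this illustration.

```python
from statistics import mean, median

men = [0, 0, 1, 5, 10, 30, 45, 5, 5, 1, 0, 0, 0]
women = [0, 0, 0, 0, 1, 5, 4, 1, 0, 0, 3, 20, 0, 0]

# (mean rounded to 2 decimal places, median) for each group
summary = {
    name: (round(mean(group), 2), median(group))
    for name, group in (("men", men), ("women", women))
}
print(summary)
```

In both groups the mean is several times larger than the median, consistent with the boxplots.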

Mann-Whitney on SPSS
[SPSS output: Mann-Whitney test and its Normal approximation, both not significant (NS)]
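For readers without SPSS, the Mann-Whitney U statistic for the alcohol data can be sketched by counting pairs: for every (man, woman) pair, score 1 if the man's value is larger and 0.5 if the two are tied. This pair-counting form is equivalent to the rank sum form (U = rank sum minus n(n+1)/2 for that group). The function name `u_statistic` is an assumption of this sketch, and no p-value is computed here.

```python
men = [0, 0, 1, 5, 10, 30, 45, 5, 5, 1, 0, 0, 0]
women = [0, 0, 0, 0, 1, 5, 4, 1, 0, 0, 3, 20, 0, 0]


def u_statistic(x, y):
    """Count pairs where the x value exceeds the y value; a tied pair counts as half."""
    return sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in x for b in y)


u_men = u_statistic(men, women)
u_women = u_statistic(women, men)
u = min(u_men, u_women)   # the smaller value is conventionally reported
print(u_men, u_women, u)
```

The two U values always add up to the number of pairs (13 × 14 = 182 here), so a U for men well above 91 points in the direction of higher intake in men, even though the test itself is not significant.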

Spearman Rank Correlation
• Method for investigating the relationship between 2 measured variables
• Non-parametric equivalent to Pearson correlation
• Used when variables are either non-Normal or measured on an ordinal scale

Example
A researcher wishes to assess whether the distance to general practice influences the time to diagnosis of colorectal cancer.
The null hypothesis would be that distance is not associated with time to diagnosis. Data were collected for 7 patients.

Distance from GP and time to diagnosis
Distance (km)  Time to diagnosis (weeks)
5              6
2              4
4              3
8              4
20             5
45             5
10             4

Distance from GP and time to diagnosis
Distance (km)  Time (weeks)  Rank for distance  Rank for time  Difference in ranks  d²
2              4             1                  3              -2                   4
4              3             2                  1              1                    1
5              6             3                  7              -4                   16
8              4             4                  3              1                    1
10             4             5                  3              2                    4
20             5             6                  5.5            0.5                  0.25
45             5             7                  5.5            1.5                  2.25
Total                                                          0                    ∑d² = 28.5

Example
The formula for Spearman’s rank correlation is:
$$r_s = 1 - \frac{6 \sum d^2}{n(n^2 - 1)}$$
where n is the number of pairs
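The calculation for the distance and time data can be sketched as follows. Two versions are shown because the times contain ties: the simple formula above gives about 0.49, while the Pearson correlation of the ranks (which allows for ties) gives about 0.468, the value quoted from SPSS. That SPSS uses the tie-adjusted form is an inference from this agreement, not something stated in the slides. Helper and variable names are this sketch's own.

```python
import math

distance = [2, 4, 5, 8, 10, 20, 45]   # km
time = [4, 3, 6, 4, 4, 5, 5]          # weeks


def average_ranks(values):
    """Rank values from 1 (smallest); tied values share the mean of their positions."""
    ordered = sorted(values)
    return [ordered.index(v) + (ordered.count(v) + 1) / 2 for v in values]


rank_d = average_ranks(distance)
rank_t = average_ranks(time)
n = len(distance)

# Simple formula: r_s = 1 - 6 * sum(d^2) / (n * (n^2 - 1)).
sum_d2 = sum((a - b) ** 2 for a, b in zip(rank_d, rank_t))
rs_formula = 1 - 6 * sum_d2 / (n * (n ** 2 - 1))

# Pearson correlation of the two sets of ranks, which allows for ties.
mean_rank = (n + 1) / 2
cov = sum((a - mean_rank) * (b - mean_rank) for a, b in zip(rank_d, rank_t))
ss_d = sum((a - mean_rank) ** 2 for a in rank_d)
ss_t = sum((b - mean_rank) ** 2 for b in rank_t)
rs_ranks = cov / math.sqrt(ss_d * ss_t)

print(sum_d2, round(rs_formula, 3), round(rs_ranks, 3))
```

When there are no ties the two versions give the same answer, so the simple formula is fine for hand calculation in that case.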

Spearman’s in SPSS

Example
In our example, r_s = 0.468
In SPSS we can see that this value is not significant, i.e. p = 0.29
Therefore there is no significant relationship between the distance to a GP and the time to diagnosis, but note that the correlation is quite high!

• Correlations lie between -1 and +1
• A correlation coefficient close to zero indicates weak or no correlation
• Whether an r_s value is significant depends on the sample size; a significant value tells you that it is unlikely these results have arisen by chance
• Correlation does NOT measure causality, only association

Chi-squared test
• Used when comparing 2 or more groups of categorical or nominal data (as opposed to measured data)
• Already covered!
• In SPSS the Chi-squared test is a test of observed vs. expected in a single categorical variable

More than 2 groups
• So far we have been comparing 2 groups
• If we have 3 or more independent groups and the data are not Normal, we need an NP equivalent to ANOVA
• If independent samples, use Kruskal-Wallis
• If related samples, use Friedman
• Same assumptions as before

Parametric / Non-parametric
Parametric Tests               Non-parametric Tests
Single sample t-test           Wilcoxon signed rank test
Paired sample t-test           Paired Wilcoxon signed rank test
2 independent samples t-test   Mann-Whitney test (Note: sometimes called Wilcoxon Rank Sum test!)
One-way Analysis of Variance   Kruskal-Wallis
Pearson’s correlation          Spearman Rank
Repeated Measures              Friedman

Summary
Non-parametric
• Non-parametric methods have fewer assumptions than parametric tests
• So they are useful when these assumptions are not met
• Often used when the sample size is small and it is difficult to tell if the data are Normally distributed
• Non-parametric methods are a ragbag of tests developed over time with no consistent framework
• Read in datasets LDL, etc. and carry out appropriate non-parametric tests

