3. parametric assumptions

Steve Saffhill
Research Methods in Sport & Exercise
Basic Statistics

Refresher
• Data handling assessment – VLE
• Basic Stats – 2 types:
• Descriptives (figures & Tables)
• Inferentials (accept/reject hypotheses)
• We test hypotheses with stats
e.g., H0: There will be no significant correlation between attendance % and exam %
• We set an alpha (0.05) and compare it to the p (probability) value SPSS gives us to make
inferences about what we have found (95% confidence)
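
The comparison of p against alpha can be sketched in a few lines of Python. This is only an illustration: the function name `decide` and the two example p values are made up, and treating p exactly equal to alpha as "not significant" is an assumption.

```python
ALPHA = 0.05  # alpha level set before the analysis, as on the slide


def decide(p_value, alpha=ALPHA):
    """Return the decision about H0 for a given p value (hypothetical helper)."""
    if p_value < alpha:
        return "reject H0"
    return "fail to reject H0"


# Made-up p values, purely for illustration
print(decide(0.03))
print(decide(0.20))
```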

Parametric Assumptions for Deciding on
Inferential Statistics
• The aim of research is to make factual descriptive statements about a
group of people.
• E.g., the ingestion of creatine monohydrate over a 6 week period
before competition will enhance power output by 10%.
• However, it would be nonsense to measure every single person who might
be related to this statement.
• We therefore look at a selected sample of participants and make an
educated guess about the whole of the population.
• But first you need to test if the selected sample is representative of that
population

• Karl Gauss found that if you take a survey of most groups of
people they will behave in certain set ways.
• It is irrelevant how you measure them, most people are
average.
• There are usually some extremes at either end, but most
people behave in an average, predictable way.
• This is what he called normal distribution.

• If any group of students take an exam or write a
piece of coursework, most of them will score around
the average (40% - 60%).
• There are always a few who score highly...
• But, to compensate, there are always a few who do
badly...

Normal Distribution Curve
[Figure: bell-shaped curve showing score (%) in bands from <20 to >80 along the x-axis and frequency (number of students) on the y-axis.]

• Karl Gauss showed how most groups of people will behave in
this predictable way.
• It is therefore not necessary to measure a whole population
to make a true statement about it. A sample will be sufficient.
• That is, provided certain criteria are met, the results can
then be extrapolated to the population.
• These criteria are what we call parametric properties.

(Parametric tests = most common inferential tests for you!)

What are these criteria?
• These criteria must be explored before
running ANY inferential statistics!
• The descriptive statistics (last week) allow us to
determine if the criteria have been met!
• If we run the wrong inferential test there is a
risk of errors (type I or type II error)!

Errors in statistics
• Type I (false-positive result) occurs if the null
hypothesis is rejected when it is actually true
(e.g., the effects of the training are interpreted as being significantly
different when they are not).
• Type II (false-negative result) occurs if the null
hypothesis is not rejected when it is actually false
(e.g., the effects of the training are interpreted as being equal when
they are actually different).
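
The two error types depend on just two facts: whether the null hypothesis is really true, and whether we rejected it. A minimal sketch of that logic (the function name `classify_outcome` is hypothetical; in real research we never know for certain whether the null is true):

```python
def classify_outcome(null_is_true, null_rejected):
    """Label a decision about the null hypothesis (illustrative only)."""
    if null_is_true and null_rejected:
        return "Type I error (false positive)"
    if not null_is_true and not null_rejected:
        return "Type II error (false negative)"
    return "correct decision"


# Training really has no effect, but we concluded that it does
print(classify_outcome(null_is_true=True, null_rejected=True))
# Training really has an effect, but we concluded that it does not
print(classify_outcome(null_is_true=False, null_rejected=False))
```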

So What Exactly Are these Criteria that help Us Choose the
Correct Inferential Test?
• Called “the parametric assumptions”
1. Random sampling – the sample must be randomly selected
2. Level of data being used – must be interval or ratio (high
level data).
3. Normal Distribution – the data must be normally distributed (2
checks: 3a skewness and 3b kurtosis)
4. Equal variance in scores – the variance of one
variable should be no more than twice as big as the variance of the other
variable.

1. Random Sampling
• Suppose I wish to make a statement about plyometric training enhancing sprint
times for all athletes.
• If I just measured a top athletics team, their sprint times might very well
improve due to the best nutrition, coaching and training facilities. Plyometric
training might have nothing to do with their success.
• I need to look at an unbiased sample, which I have selected by chance (i.e.,
randomly).
• If we have used a random sample, we have met this 1st criterion for using
parametric inferential statistics and we then check the other 3 criteria
• If not, we can’t run a parametric test, so we run a non-parametric test (we still
check the other 3 criteria for our report)
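
Selecting a sample by chance can be sketched with Python's standard `random` module. The list of 200 athlete labels, the sample size of 20 and the helper name `draw_random_sample` are all made up for illustration.

```python
import random


def draw_random_sample(population, sample_size, seed=None):
    """Pick sample_size members by chance, without replacement."""
    rng = random.Random(seed)  # a seed only makes the example repeatable
    return rng.sample(population, sample_size)


# Hypothetical sampling frame: every athlete we could measure
athletes = [f"athlete_{i}" for i in range(1, 201)]
sample = draw_random_sample(athletes, 20, seed=1)
print(sample)
```

Every athlete in the list has the same chance of being picked, which is what makes the sample unbiased in the sense described above.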

2. High Level Data
• Low level data is called this because it is quite crude, which makes it
hard to make good judgements about a population
from a sample.
• Simply putting athletes in rank order does not tell us much
about them either.
• Supposing I told you I had been in a friendly tennis
tournament and that I had come third.
• You might not be impressed……..
Until I told you that Roger Federer had come first and Kim Clijsters
second

• The person who came fourth was actually the
cleaner who I had persuaded to join us to make up
four so the tournament could take place!!!
• You might again revise your opinion then!!!
• This is why ordinal or ranking data is low level.

• Nominal/categorical - categories
• E.g., male/female; rugby/football/cricket players;
netball/hockey/tennis/badminton players.
• Ordinal - rank order
• E.g., 1st, 2nd, 3rd.
• High level data is much better to use. If I gave you a
set of high level data, you could tell me a great deal
about the group.

• Interval - equal distances between numbers but no true zero, so zero and
minus values are meaningful (e.g., rating scales, temperature in Fahrenheit)
• E.g., 28ºF, 0ºF.
• Many psychological questionnaires are treated as interval data (e.g., IQ,
GEQ, CSAI-2).
• Ratio - equal distances between numbers with a true zero
(zero means nothing, minus has no meaning): 0 kg = no
kilograms, and a negative weight is meaningless.
• A good deal of physiological data is ratio
(e.g., height, weight, time in minutes and seconds).

So...(after checking random sampling)
• If you have interval/ratio data, the 2nd parametric
criterion is met and we then check
the other assumptions (no. 3 & 4).
• If not (i.e., we have ordinal/nominal data), we cannot run a
parametric test and we must run a non-parametric test (but
we still check the other 2 assumptions for our report)

3. Normal distribution
• Think back to the beginning of the lecture to Karl Gauss
and how he found that if you take a survey of most
groups of people, they will behave in certain set ways.
• It is irrelevant how you measure them, most people are
average…some extremes but mainly average
• This is what he called normal distribution.

[Figure: Normal Distribution Curve. Bell-shaped curve showing score (%) in bands from <20 to >80 along the x-axis and frequency (number of students) on the y-axis.]

• We must check whether the data actually fit the pattern
of normal distribution; this is the 3rd parametric criterion
• It may be that if the data were plotted on a graph, they would
not fit the normal pattern.
• It would then be dangerous to assume that the population
is normally distributed and run a test that
relies on that assumption, as this could lead to a type I or II error!
Normal distribution

• As the graphs we are working with are 2-D, the two
ways in which the data can shift away from normality
are along the horizontal axis and along the vertical axis.
• Along the horizontal axis, the scores may bunch towards the left
extreme or the right extreme. This is known as positive skewness
(bunched to the left, tail to the right) or negative skewness (the reverse).
[Figure: two histograms of heart rate (BPM, in bands from <50 to >100) against frequency: "Resting Heart Rates of Athletes 10 Minutes After Exercise" and "Resting Heart Rates of Sedentary Non-Athletes 10 Minutes After Exercise".]

Or, the shift may be a very peaked or very flat curve
[Figure: two curves of heart rate (BPM, in bands from <50 to >100) against frequency, one very peaked and one very flat.]
Peaked = Leptokurtic
Flat = Platykurtic
Mesokurtic = normal

Normal distribution Curve
• Can be visualised in SPSS when you run the descriptives:
[Figure: SPSS histogram of AGE, x-axis from 21.0 to 29.0, frequency from 0 to 6. Mean = 25.4, Std. Dev = 2.46, N = 22.]

• While a graph is very helpful in deciding if a sample is normally distributed, it
does not actually tell the researcher how skewed or kurtotic the data are.
• Numbers are therefore needed to determine skewness and
kurtosis accurately.
• These TWO numbers are calculated from your descriptive statistics. For the first, divide the
skewness figure by the skewness std. error.
• The answer is the skewness Z score (Z skew).
• Then repeat, dividing the kurtosis figure by the kurtosis std. error (Z kurt).
• If both figures are between -1.96 and 1.96, the variable is said not to be skewed or kurtotic
= Normally Distributed!!
• These numbers are called Z scores.
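
For anyone curious where the skewness, kurtosis and standard error figures come from, here is a rough Python sketch. It assumes the usual small-sample formulas for skewness, excess kurtosis and their standard errors, which should agree with the SPSS descriptives output, but do check against your own printout. The function names and the example scores are made up.

```python
import math


def skewness_and_kurtosis(values):
    """Adjusted sample skewness and excess kurtosis (assumed formulas)."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in values) / (n - 1))
    z3 = sum(((x - mean) / sd) ** 3 for x in values)
    z4 = sum(((x - mean) / sd) ** 4 for x in values)
    skew = n / ((n - 1) * (n - 2)) * z3
    kurt = (n * (n + 1) / ((n - 1) * (n - 2) * (n - 3))) * z4 \
        - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3))
    return skew, kurt


def standard_errors(n):
    """Standard errors of skewness and kurtosis; they depend only on n."""
    se_skew = math.sqrt(6 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3)))
    se_kurt = 2 * se_skew * math.sqrt((n * n - 1) / ((n - 3) * (n + 5)))
    return se_skew, se_kurt


scores = [1, 2, 3, 4, 5]  # made-up, perfectly symmetrical data
skew, kurt = skewness_and_kurtosis(scores)
se_skew, se_kurt = standard_errors(len(scores))
z_skew, z_kurt = skew / se_skew, kurt / se_kurt
print(z_skew, z_kurt)
```

Because the standard errors depend only on the sample size, small samples have large standard errors, so the same skewness figure gives a smaller Z score in a small sample than in a large one.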

Z Scores
• The information you need to work out these Z scores are provided by your
descriptive statistics.
Descriptives: Anxiety

                                         Statistic   Std. Error
Mean                                       1.50        .073
95% Confidence Interval   Lower Bound      1.35
for Mean                  Upper Bound      1.65
5% Trimmed Mean                            1.50
Median                                     1.50
Variance                                    .255
Std. Deviation                              .505
Minimum                                    1
Maximum                                    2
Range                                      1
Interquartile Range                        1
Skewness                                    .000       .343
Kurtosis                                  -2.089       .674

• Divide .000 by .343 to give you your Z skewness score = 0
• Divide -2.089 by .674 to give you your Z kurtosis score = -3.10
• Both the Z skewness and Z kurtosis scores need to fall between -1.96 and 1.96 for that variable to be
deemed normally distributed. Here Z kurtosis falls outside that range, so Anxiety would not be deemed normally distributed.
• You would need to test this for ALL your variables!!!
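
The two divisions for the Anxiety variable can be checked with a short Python sketch. The statistics and standard errors are the ones in the SPSS output above; the helper names `z_score` and `within_limits` are made up for illustration.

```python
Z_LIMIT = 1.96  # cut-off used on the slides


def z_score(statistic, std_error):
    """Divide a skewness or kurtosis figure by its standard error."""
    return statistic / std_error


def within_limits(z, limit=Z_LIMIT):
    """True if the Z score falls between -1.96 and 1.96."""
    return -limit < z < limit


z_skew = z_score(0.000, 0.343)   # skewness / std. error of skewness
z_kurt = z_score(-2.089, 0.674)  # kurtosis / std. error of kurtosis
anxiety_is_normal = within_limits(z_skew) and within_limits(z_kurt)
print(round(z_skew, 2), round(z_kurt, 2), anxiety_is_normal)
```

Both Z scores have to pass for the variable to count as normally distributed, so a single failing score is enough to fail the 3rd assumption.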

So.....
• If all of our Z scores (skewness and kurtosis)
are between -1.96 and 1.96 we meet the 3rd
assumption and run a parametric test (we still
check the 4th assumption but it is least important)
• If not then we must run a non-parametric test
(we still check the final 4th assumption).

4. Equal (Homogeneity) Variance
• This criterion looks at two characteristics of a sample to see if
their variances are reasonably similar or very different.
• You take the smaller variance, double it and see whether it is
smaller or larger than the larger variance.
• If the smaller variance doubled is now larger than the bigger
variance, the two data sets are known as homogeneous and
this criterion is met.
• If the smaller variance doubled is still smaller than the larger
variance, the two data sets are known as heterogeneous and
this criterion is not met.

• For example, suppose our data sets have the variances 15 and
28.
• You take the smaller variance and double it: 15 x 2 = 30.
• Is 30 bigger or smaller than the bigger variance?
• 30 is bigger than 28, so the two variables are showing
homogeneity of variance. We say yes, we have met this
criterion and can run a parametric test (pending rules 1-3)!
• By the same rule, 15 and 32 are heterogeneous (30 is smaller than 32), so we
have not met this criterion (what we do next
depends on rules 1-3).
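
The doubling rule is easy to sketch in Python. The function name `variances_homogeneous` is made up, and treating an exact tie (double the smaller variance equals the larger) as homogeneous is an assumption, since the slides do not cover that case.

```python
def variances_homogeneous(var_a, var_b):
    """Doubling rule: is twice the smaller variance at least the larger one?"""
    smaller, larger = sorted((var_a, var_b))
    return 2 * smaller >= larger  # tie counted as homogeneous (assumption)


print(variances_homogeneous(15, 28))  # 30 vs 28
print(variances_homogeneous(15, 32))  # 30 vs 32
```

Sorting the two values first means it does not matter which variance is passed in first.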

The final parametric
assumption….
Descriptives: TIME by GROUP

                                        elite                     amateur
                                 Statistic  Std. Error     Statistic  Std. Error
Mean                              11.8010     .30010        15.5680    1.38841
95% CI for Mean, Lower Bound      11.1221                   12.4272
95% CI for Mean, Upper Bound      12.4799                   18.7088
5% Trimmed Mean                   11.7350                   15.0956
Median                            11.7600                   13.9050
Variance                            .901                    19.277
Std. Deviation                      .94900                   4.39053
Minimum                           10.76                     12.56
Maximum                           14.03                     27.08
Range                              3.27                     14.52
Interquartile Range                1.2200                    3.6800
Skewness                           1.448      .687           2.388      .687
Kurtosis                           2.890     1.334           6.132     1.334
Some inferential tests (e.g., independent t-test) double check the variance in the
actual SPSS output (e.g., Levene’s test of equality of variance), as the standard version of the
test assumes that the two groups are of equal variance!!!
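
As a worked check, the elite and amateur figures from the TIME descriptives can be run through the normality and variance rules in one go. This is only a sketch: the numbers are copied from the SPSS output above, while the dictionary layout and variable names are made up.

```python
Z_LIMIT = 1.96

# (statistic, std. error) pairs and variances copied from the TIME output
groups = {
    "elite": {"variance": 0.901, "skew": (1.448, 0.687), "kurt": (2.890, 1.334)},
    "amateur": {"variance": 19.277, "skew": (2.388, 0.687), "kurt": (6.132, 1.334)},
}

normal = {}
for name, stats in groups.items():
    z_skew = stats["skew"][0] / stats["skew"][1]
    z_kurt = stats["kurt"][0] / stats["kurt"][1]
    normal[name] = abs(z_skew) < Z_LIMIT and abs(z_kurt) < Z_LIMIT
    print(name, round(z_skew, 2), round(z_kurt, 2), normal[name])

# Doubling rule for the 4th assumption
smaller, larger = sorted(g["variance"] for g in groups.values())
equal_variance = 2 * smaller >= larger
print("equal variance:", equal_variance)
```

On these figures neither group passes the Z score check, and the amateur variance is far more than double the elite variance.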

Levene’s Test
Independent Samples Test: CS1

                                          Equal variances    Equal variances
                                          assumed            not assumed
Levene's Test for Equality of Variances
  F                                         8.311
  Sig.                                       .004
t-test for Equality of Means
  t                                         1.704              1.720
  df                                      193                185.961
  Sig. (2-tailed)                            .090               .087
  Mean Difference                            .2542              .2542
  Std. Error Difference                      .14919             .14778
  95% CI of the Difference, Lower           -.04009            -.03737
  95% CI of the Difference, Upper            .54843             .54571
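• If the Levene’s Sig. value is less than .05 then the variances are not equal (read the “Equal variances not assumed” row)!
• If it is more than .05 then the variances are equal (read the “Equal variances assumed” row)!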
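
Reading the output above can be sketched as a two-step decision: first use Levene's Sig. to pick the row, then read the t-test from that row. The figures are those in the CS1 output; the function name `row_to_report` and the handling of a Sig. value of exactly .05 are assumptions.

```python
ALPHA = 0.05
levene_sig = 0.004  # Sig. column of Levene's test in the output

# t-test results for each row of the output
rows = {
    "equal variances assumed": {"t": 1.704, "df": 193, "sig_2_tailed": 0.090},
    "equal variances not assumed": {"t": 1.720, "df": 185.961, "sig_2_tailed": 0.087},
}


def row_to_report(levene_p, alpha=ALPHA):
    """Pick which row of the t-test output to read (illustrative helper)."""
    if levene_p < alpha:
        return "equal variances not assumed"
    return "equal variances assumed"


chosen = row_to_report(levene_sig)
result = rows[chosen]
significant = result["sig_2_tailed"] < ALPHA
print(chosen, result, "significant:", significant)
```

Here Levene's Sig. of .004 is below .05, so the second row is read, and its Sig. (2-tailed) of .087 is above .05.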

• These 4 assumptions are listed in order of decreasing
importance.
• If you do not meet #1 then use non-parametric
inferential tests
• Some can be violated but you must justify
doing so with supporting evidence!
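
Putting the four checks together, one possible reading of the decision process is sketched below. This is an interpretation rather than a fixed rule: the slides describe the 4th assumption as the least important, so here a failure on it alone only triggers a reminder to justify the choice. The function name and return strings are made up.

```python
def choose_test_family(random_sample, interval_or_ratio,
                       normally_distributed, equal_variance):
    """Suggest parametric or non-parametric tests from the four checks."""
    # Assumptions 1 to 3: failing any of them points to a non-parametric test
    if not (random_sample and interval_or_ratio and normally_distributed):
        return "non-parametric"
    # Assumption 4 is treated as the least important (one reading of the slides)
    if not equal_variance:
        return "parametric (justify the unequal variances)"
    return "parametric"


print(choose_test_family(True, True, True, True))
print(choose_test_family(True, True, False, True))
print(choose_test_family(True, True, True, False))
```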

Some Assumptions Can be Re-Run after Re-Checking the Data
• For example, if your data are not normally
distributed & you have lots of cases then you
could check & remove the outliers/extremes!
• Remember to justify it!! (Small v Large sample)
• You must then re-check all the assumptions

[Figure: boxplot of TIME (0 to 30) by GROUP (elite, N = 10; amateur, N = 10), with flagged cases 18 and 6 annotated "Extreme Score" and "Outlier Score".]
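
A rough way to flag outliers and extreme scores outside SPSS is sketched below. It assumes the common boxplot convention (more than 1.5 box lengths beyond the box = outlier, more than 3 = extreme). The quartile method of Python's `statistics.quantiles` may differ slightly from the one SPSS uses, and the times are made-up data.

```python
import statistics


def flag_unusual_scores(values):
    """Return (position, value, label) for outliers and extreme scores."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1  # the "box length" of the boxplot
    flagged = []
    for position, value in enumerate(values):
        distance = max(q1 - value, value - q3)  # distance beyond the box
        if distance > 3 * iqr:
            flagged.append((position, value, "extreme"))
        elif distance > 1.5 * iqr:
            flagged.append((position, value, "outlier"))
    return flagged


# Hypothetical sprint times, not taken from the slides
times = [10, 11, 12, 12, 13, 13, 14, 14, 15, 15, 16, 22, 40]
flagged = flag_unusual_scores(times)
print(flagged)
```

Any case removed on this basis still needs to be justified in the report, and the assumptions re-checked on the reduced data set.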

You MUST go through this
process every time you analyse
data!
Or…risk running the wrong tests &
getting a type I or II error!
