Steve Saffhill
Research Methods in Sport & Exercise
Basic Statistics
Refresher
• Data handling assessment – VLE
• Basic Stats – 2 types:
• Descriptives (figures & Tables)
• Inferentials (accept/reject hypotheses)
• We test hypotheses with stats
e.g., H0: There will be no significant correlation between attendance % and exam %
• We set an alpha (0.05) and compare it to the p (probability) value SPSS gives us to make
inferences about what we have found (95% confidence)
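The alpha/p comparison above can be sketched in a few lines. This is only an illustration: the function name and the two p values are made up, standing in for whatever appears in the "Sig." column of your own SPSS output.

```python
def reject_null(p_value: float, alpha: float = 0.05) -> bool:
    """Return True when the result is significant, i.e. p is below alpha."""
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("p_value must be between 0 and 1")
    return p_value < alpha


# Hypothetical p values, as if read from the "Sig." column of SPSS output.
for p in (0.012, 0.090):
    verdict = "reject H0" if reject_null(p) else "fail to reject H0"
    print(f"p = {p:.3f}: {verdict}")
```

Note the sketch treats p exactly equal to alpha as not significant; that boundary choice is an assumption, not something the slides specify.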
Parametric Assumptions for Deciding on
Inferential Statistics
• The aim of research is to make factual descriptive statements about a
group of people.
• E.g., the ingestion of creatine monohydrate over a 6 week period
before competition will enhance power output by 10%.
• However, it would be nonsense to measure every single person who might
be related to this statement.
• We therefore look at a selected sample of participants and make an
educated guess about the whole of the population.
• But first you need to test if the selected sample is representative of that
population
• Karl Gauss found that if you take a survey of most groups of
people they will behave in certain set ways.
• It is irrelevant how you measure them, most people are
average.
• There are usually some extremes at either end, but most
people behave in an average, predictable way.
• This is what he called normal distribution.
• If any group of students takes an exam or writes a
piece of coursework, most of them will score around
the average (40% - 60%).
• There are always a few who score highly...
• But, to compensate, there are always a few who do
badly...
[Figure: Normal Distribution Curve. Bell-shaped curve showing marks along the x-axis (Score (%), bands from <20 to >80) and number of students on the y-axis (Frequency, 0 to 18).]
• Karl Gauss showed how most groups of people will behave in
this predictable way.
• It is therefore not necessary to measure a whole population
to make a true statement about it. A sample will be sufficient.
• That is, providing certain criteria are met, the results can
then be extrapolated to the population.
• These criteria are what we call parametric properties
(parametric tests = the most common inferential tests for you!)
What are these criteria?
• These criteria must be explored before
running ANY inferential statistics!
• The descriptive statistics (last week) allow us to
determine if the criteria have been met!
• If we run the wrong inferential test there is a
risk of errors (type I or type II error)!
Errors in statistics
• Type I = (false-positive result) occurs if the null
hypothesis is rejected when it is actually true
(e.g., the effects of the training are interpreted as being significantly
different when they are not).
• Type II = (false-negative result) occurs if the null
hypothesis is retained (not rejected) when it is actually false
(e.g., the effects of the training are interpreted as being equal when
they are actually significantly different).
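The two error types are just the two "wrong" cells of a 2 x 2 table (what is really true v what we decided). A minimal sketch, with a hypothetical function name, that labels each combination:

```python
def classify_decision(null_is_true: bool, null_rejected: bool) -> str:
    """Label the outcome of a hypothesis test, given the (unknown) truth."""
    if null_is_true and null_rejected:
        return "Type I error (false positive)"
    if not null_is_true and not null_rejected:
        return "Type II error (false negative)"
    return "correct decision"


# Walk through all four cells of the truth/decision table.
for truth in (True, False):
    for rejected in (True, False):
        print(f"H0 true={truth}, H0 rejected={rejected}: "
              f"{classify_decision(truth, rejected)}")
```

In real research we never know the first argument for certain, which is why we try to keep the risk of both errors low by choosing the right test.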
So What Exactly Are these Criteria that help Us Choose the
Correct Inferential Test?
• Called “the parametric assumptions”
1. Random sampling – must be randomly sampled
2. Level of data being used – must be interval or ratio (high
level data).
3. Normal Distribution – Must be normally distributed (2
checks: 3a and 3b)
4. Equal variance in scores – The larger variance should be
less than twice as big as the smaller variance.
1. Random Sampling
• Suppose I wish to make a statement about plyometric training enhancing sprint
times for all athletes.
• If I just measured a top athletics team, their sprint times might very well
improve due to the best nutrition, coaching and training facilities. Plyometric
training might have nothing to do with their success.
• I need to look at an unbiased sample, which I have selected by chance (i.e.,
randomly).
• If we have used a random sample we have met this 1st criterion for using
parametric inferential statistics and we then check the other 3 criteria
• If not: we can’t run a parametric test so we run a non-parametric test (we still
check the other 3 criteria for our report)
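Selecting "by chance" can be sketched with Python's standard library. Everything here is hypothetical: the list of 200 athlete IDs stands in for a real sampling frame, and the seed is only there so the draw can be repeated.

```python
import random


def draw_random_sample(sampling_frame, sample_size, seed=None):
    """Draw a simple random sample without replacement from the frame."""
    rng = random.Random(seed)  # a fixed seed makes the draw reproducible
    return rng.sample(list(sampling_frame), sample_size)


# Hypothetical sampling frame: every athlete we could possibly measure.
athletes = [f"athlete_{i:03d}" for i in range(1, 201)]
sample = draw_random_sample(athletes, 20, seed=1)
print(sample[:5])
```

Every athlete in the frame has the same chance of being picked, which is what stops a "top athletics team" bias creeping in.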
2. High Level Data
• Low level data is called this because it is quite crude and
therefore hard to make good judgements about a population
from a sample.
• Putting athletes in a specific order is not constructive either in
terms of learning information about them.
• Supposing I told you I had been in a friendly tennis
tournament and that I had come third.
• You might not be impressed...
until I told you that Roger Federer had come first and Kim Clijsters
second.
• The person who came fourth was actually the
cleaner who I had persuaded to join us to make up
four so the tournament could take place!!!
• You might again revise your opinion then!!!
• This is why ordinal or ranking data is low level.
Nominal/categorical - categories
• E.g., male/female; rugby players/football/cricket
players; netball players/hockey
players/tennis/badminton players.
Ordinal - rank order
• E.g., 1st, 2nd, 3rd.
• High level data is much better to use. If I gave you a
set of high level data, you could tell me a great deal
about the group.
• Interval - equal distances between numbers, but zero is arbitrary
(e.g., rating scales, Fahrenheit), so zero and minus values are
meaningful
• E.g., 28ºF, 0ºF.
• Many psychological questionnaires are treated as interval data (e.g., IQ,
GEQ, CSAI-2).
• Ratio - equal distances between numbers and a true zero
(zero means nothing, minus has no meaning): 0 kg = no
kilograms, a minus number of kg is meaningless.
• A good deal of physiological data is ratio
(e.g., height, weight, time in minutes and seconds).
So...(after checking random sampling)
• If you have interval/ratio data we can accept the 2nd
rule/criterion of the parametric assumptions and we then check
the other assumptions (no. 3 & 4).
• If not (i.e., we have ordinal/nominal data) we cannot run a
parametric test and we must run a non-parametric test (but
we still check the other 2 assumptions for our report)
3. Normal distribution
• Think back to the beginning of the lecture to Karl Gauss
and how he found that if you take a survey of most
groups of people, they will behave in certain set ways.
• It is irrelevant how you measure them, most people are
average…some extremes but mainly average
• This is what he called normal distribution.
[Figure: Normal Distribution Curve. Bell-shaped curve showing marks along the x-axis (Score (%), bands from <20 to >80) and number of students on the y-axis (Frequency, 0 to 20).]
• We must check whether the data actually fits the pattern
of normal distribution: this is the 3rd parametric rule/criterion
• It may be that if the data were plotted on a graph, it would
not fit the normal pattern.
• It would then be dangerous to assume that the population
is normally distributed and run a test that relies on that
assumption, as this could lead to a type I or II error!
Normal distribution
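• As the graphs we are working with are 2-D, the two
ways in which the data can shift away from normality
are along the horizontal axis and along the vertical axis.
• On the horizontal axis, the scores may bunch up towards the left
extreme or the right extreme. This is known as positive skewness
(bunched left, long tail to the right) or negative skewness
(bunched right, long tail to the left).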
[Figure: Resting Heart Rates of Athletes 10 Minutes After Exercise. Histogram of Frequency (0 to 12) against Heart Rate (BPM), bands from <50 to >100.]
[Figure: Resting Heart Rates of Sedentary Non-Athletes 10 Minutes After Exercise. Histogram of Frequency (0 to 12) against Heart Rate (BPM), bands from <50 to >100.]
Or, the shift may be a very peaked or very flat curve
[Figure: two histograms of Heart Rate (BPM), bands from <50 to >100, one very peaked and one very flat.]
Peaked = Leptokurtic
Flat = Platykurtic
Mesokurtic = normal
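To see what the skewness and kurtosis figures actually measure, here is a minimal sketch that works them out from raw scores. It assumes the bias-adjusted sample formulas commonly reported by statistics packages (kurtosis is "excess" kurtosis, so a normal curve scores about 0); the function name and the heart-rate data are hypothetical.

```python
import math


def skewness_and_kurtosis(values):
    """Return (skewness, excess kurtosis) using bias-adjusted sample formulas."""
    n = len(values)
    if n < 4:
        raise ValueError("need at least 4 values")
    mean = sum(values) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in values) / (n - 1))
    if sd == 0:
        raise ValueError("all values are identical")
    z = [(x - mean) / sd for x in values]  # standardised scores
    skew = n / ((n - 1) * (n - 2)) * sum(v ** 3 for v in z)
    kurt = (n * (n + 1) / ((n - 1) * (n - 2) * (n - 3)) * sum(v ** 4 for v in z)
            - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3)))
    return skew, kurt


# Hypothetical heart rates (BPM): most scores low, one long tail to the right.
heart_rates = [52, 54, 55, 56, 58, 60, 61, 63, 78, 95]
skew, kurt = skewness_and_kurtosis(heart_rates)
print(f"skewness = {skew:.3f}, kurtosis = {kurt:.3f}")
```

A positive skewness means the tail points right, a negative one means it points left; a positive kurtosis suggests a peaked (leptokurtic) curve and a negative one a flat (platykurtic) curve.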
Normal distribution Curve
• Can be visualised in SPSS when you run the descriptives:
[Figure: SPSS histogram of AGE (x-axis 21.0 to 29.0, frequency 0 to 6). Std. Dev = 2.46, Mean = 25.4, N = 22.]
• While a graph is very helpful in deciding if a sample is normally distributed, it
does not actually tell the researcher how skewed or kurtotic the data is.
• A number is therefore needed to judge skewness and kurtosis accurately.
• You need TWO numbers, both worked out from your descriptive statistics.
• Divide the skewness statistic by its std. error: the answer is Z skew.
• Then divide the kurtosis statistic by its std. error: the answer is Z kurt.
• If both figures are between -1.96 and 1.96 the variable is said not to be skewed or kurtotic
= Normally Distributed!!
• These numbers are called Z scores.
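SPSS prints the two standard errors for you, but it can help to know that they depend only on the sample size. A minimal sketch, assuming the usual formulas for the standard error of skewness and kurtosis (the function names are illustrative, not SPSS functions):

```python
import math


def se_skewness(n: int) -> float:
    """Standard error of skewness for a sample of size n (n > 2)."""
    return math.sqrt(6 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3)))


def se_kurtosis(n: int) -> float:
    """Standard error of kurtosis for a sample of size n (n > 3)."""
    return 2 * se_skewness(n) * math.sqrt((n * n - 1) / ((n - 3) * (n + 5)))


for n in (10, 22, 100):
    print(f"n = {n}: SE skew = {se_skewness(n):.3f}, SE kurt = {se_kurtosis(n):.3f}")
```

For n = 10 this gives roughly .687 and 1.334, which agrees with the Std. Error column in the elite/amateur Descriptives output later in these slides. Bigger samples give smaller standard errors, so the same skewness figure produces a bigger Z score.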
Z Scores
• The information you need to work out these Z scores are provided by your
descriptive statistics.
Descriptives: Anxiety

                                    Statistic   Std. Error
Mean                                  1.50        .073
95% Confidence    Lower Bound         1.35
Interval for Mean Upper Bound         1.65
5% Trimmed Mean                       1.50
Median                                1.50
Variance                              .255
Std. Deviation                        .505
Minimum                               1
Maximum                               2
Range                                 1
Interquartile Range                   1
Skewness                              .000        .343
Kurtosis                            -2.089        .674

Divide .000 by .343 to give you your Z skewness score = 0
Divide -2.089 by .674 to give you your Z kurtosis score = -3.10
Both the Z skewness and Z kurtosis scores need to
fall between -1.96 and 1.96 for that variable to be
deemed normally distributed (here Z kurtosis falls outside, so Anxiety is not)
You would need to test this for ALL your variables!!!
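The Z score check can be sketched as a small helper. The numbers are the Anxiety skewness and kurtosis figures from the Descriptives output above; the function names are hypothetical, and treating exactly ±1.96 as a fail is an assumption.

```python
Z_CRITICAL = 1.96  # cut-off used in these slides


def z_score(statistic: float, std_error: float) -> float:
    """Divide a skewness or kurtosis statistic by its standard error."""
    return statistic / std_error


def looks_normal(skew, se_skew, kurt, se_kurt, critical=Z_CRITICAL) -> bool:
    """True only if BOTH Z scores fall inside -critical..+critical."""
    z_skew = z_score(skew, se_skew)
    z_kurt = z_score(kurt, se_kurt)
    return abs(z_skew) < critical and abs(z_kurt) < critical


# Anxiety: skewness .000 (SE .343), kurtosis -2.089 (SE .674)
z_skew = z_score(0.000, 0.343)
z_kurt = z_score(-2.089, 0.674)
print(f"Z skew = {z_skew:.2f}, Z kurt = {z_kurt:.2f}")
print("normally distributed?", looks_normal(0.000, 0.343, -2.089, 0.674))
```

You would call `looks_normal` once per variable, because every variable has to pass both checks.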
So.....
• If all of our Z scores (skewness and kurtosis)
are between -1.96 and 1.96 we meet the 3rd
assumption and run a parametric test (we still
check the 4th assumption but it is least important)
• If not then we must run a non-parametric test
(we still check the final 4th assumption).
4. Equal (Homogeneity) Variance
• This criterion looks at two characteristics of a sample to see if
their variances are reasonably similar or very different.
• You take the smallest variance, double it and see whether it is
smaller or larger than the larger variance.
• If the smallest variance doubled is larger than the biggest
variance, the two data sets are known as homogeneous and
this criterion is accepted.
• If the smallest variance doubled is still smaller than the larger
variance, the two data sets are known as heterogeneous and
this criterion is not accepted.
• For example, suppose our data sets have the variances 15 and
28.
• You take the smaller variance and double it. 15 x 2 = 30.
• Is 30 bigger or smaller than the biggest variance?
• 30 is bigger than 28 so the two variables are showing
homogeneity of variance – so we say yes, we’ve accepted this
criterion and run a parametric test (pending rules 1-3)!
• By the same rule, 15 and 32 are heterogeneous (15 x 2 = 30, which is
smaller than 32) – therefore we wouldn’t be able to accept this
criterion (what we run depends on rules 1-3).
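The doubling rule is easy to sketch in code. The function name is hypothetical, and the case where the doubled variance exactly equals the larger one is not covered by the slides, so the sketch treats it as a fail.

```python
def variances_homogeneous(var_a: float, var_b: float) -> bool:
    """Doubling rule: smaller variance x 2 must be larger than the bigger one."""
    if var_a < 0 or var_b < 0:
        raise ValueError("variances cannot be negative")
    smaller, larger = sorted((var_a, var_b))
    return smaller * 2 > larger


for pair in ((15, 28), (15, 32)):
    label = "homogeneous" if variances_homogeneous(*pair) else "heterogeneous"
    print(f"variances {pair[0]} and {pair[1]}: {label}")
```

It does not matter which way round you pass the two variances, because the function sorts them first.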
The final parametric
assumption….
Descriptives: TIME by GROUP

                                         elite                      amateur
                                    Statistic  Std. Error     Statistic  Std. Error
Mean                                 11.8010     .30010        15.5680    1.38841
95% Confidence    Lower Bound        11.1221                   12.4272
Interval for Mean Upper Bound        12.4799                   18.7088
5% Trimmed Mean                      11.7350                   15.0956
Median                               11.7600                   13.9050
Variance                               .901                    19.277
Std. Deviation                         .94900                   4.39053
Minimum                              10.76                     12.56
Maximum                              14.03                     27.08
Range                                 3.27                     14.52
Interquartile Range                   1.2200                    3.6800
Skewness                              1.448      .687           2.388     .687
Kurtosis                              2.890     1.334           6.132    1.334
Some inferential tests (e.g., the independent t-test) double-check the variance in the
actual SPSS output (e.g., Levene’s test for equality of variances), because the standard
version of the test assumes that the two groups are of equal variance!!!
Levene’s Test
Independent Samples Test (CS1)

                               Levene's Test for
                               Equality of Variances   t-test for Equality of Means
                                                                                                                95% Confidence Interval
                                                                          Sig.         Mean        Std. Error   of the Difference
                                 F        Sig.          t        df       (2-tailed)   Difference  Difference   Lower      Upper
Equal variances assumed         8.311     .004         1.704    193        .090         .2542       .14919      -.04009    .54843
Equal variances not assumed                            1.720    185.961    .087         .2542       .14778      -.03737    .54571

• If Levene’s Sig. is less than .05 then there is no equal variance (read the “Equal variances not assumed” row)!
• If Levene’s Sig. is more than .05 then there is equal variance (read the “Equal variances assumed” row)!
• Here Sig. = .004, which is less than .05, so equal variance cannot be assumed.
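Reading the Levene's result is a simple cut-off decision. A minimal sketch, with a hypothetical function name, that tells you which row of the SPSS independent t-test output to report; the .004 is the Levene's Sig. value from the output above.

```python
def t_test_row(levene_sig: float, alpha: float = 0.05) -> str:
    """Pick the row of the independent t-test output to read."""
    if levene_sig < alpha:
        # Significant Levene's test: the two variances differ.
        return "Equal variances not assumed"
    return "Equal variances assumed"


print(t_test_row(0.004))  # Levene's Sig. from the CS1 output
```

The t, df and Sig. (2-tailed) figures you report should then come from that row only.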
• These 4 assumptions are of progressive
importance.
• If you do not meet #1 then use non-parametric
inferential tests
• Some can be violated but you must justify
doing so with supporting evidence!
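The whole checking process can be sketched as one function. This is a deliberately strict simplification: it sends you to a non-parametric test as soon as any assumption fails, whereas in practice some violations can be justified with supporting evidence. The function and argument names are hypothetical.

```python
def choose_test_family(random_sample: bool, data_level: str,
                       normal: bool, equal_variance: bool):
    """Return ('parametric' or 'non-parametric', list of failed assumptions)."""
    failed = []
    if not random_sample:
        failed.append("1. random sampling")
    if data_level not in ("interval", "ratio"):
        failed.append("2. level of data")
    if not normal:
        failed.append("3. normal distribution")
    if not equal_variance:
        failed.append("4. equal variance")
    family = "parametric" if not failed else "non-parametric"
    return family, failed


family, failed = choose_test_family(True, "ratio", normal=False, equal_variance=True)
print(family, failed)
```

Returning the list of failed assumptions as well as the verdict mirrors the advice to keep checking all four for your report, even after one has failed.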
Some Assumptions Can be Re-Run after Re-Checking
the Data
• For example if your data is not normally
distributed & you have lots of cases then you
could check & remove the outliers/extremes!
• Remember to justify it!! (Small v Large sample)
• You must then re-check all assumptions again
[Figure: SPSS boxplot of TIME (0 to 30) by GROUP (elite, N = 10; amateur, N = 10). Cases 18 and 6 are flagged, with annotations "Extreme Score" and "Outlier Score".]
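Outliers and extremes can also be flagged by number rather than by eye. A minimal sketch, assuming the common boxplot convention that a score more than 1.5 box lengths (interquartile ranges) beyond the box is an outlier and more than 3 box lengths is an extreme. The quartile method used here may differ slightly from the one SPSS uses, and the sprint times are made up.

```python
import statistics


def flag_outliers(values):
    """Return (outliers, extremes) using the 1.5 x IQR and 3 x IQR fences."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    outliers, extremes = [], []
    for x in values:
        distance = max(q1 - x, x - q3)  # how far x lies outside the box
        if distance > 3 * iqr:
            extremes.append(x)
        elif distance > 1.5 * iqr:
            outliers.append(x)
    return outliers, extremes


# Hypothetical sprint times (seconds) for one group of 10 athletes.
times = [12.6, 12.9, 13.1, 13.5, 13.8, 14.0, 14.4, 14.9, 18.0, 27.1]
outliers, extremes = flag_outliers(times)
print("outliers:", outliers, "extremes:", extremes)
```

If you do remove any flagged scores, remember to justify it and then re-check all four assumptions on the reduced data set.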
You MUST go through this
process every time you analyse
data!
Or…risk running the wrong tests &
getting a type I or II error!

3. parametric assumptions


Editor's Notes

  • #4: Just imagine measuring every single athlete in Britain! It would be a life’s work for one study!!
  • #6: (miss the coursework deadline, totally misread the question, don’t make it to the exam). They tend to score low marks.