INTRODUCTION TO STATISTICS
April 25th 2013
James.Hall@Education.ox.ac.uk
Coral.Milburn-Curtis@Education.ox.ac.uk
STRUCTURE OF THIS MORNING
• 9.30-10.15: Introduction & basic concepts
15 minute break
• 10.30-11.15: Introduction to SPSS
15 minute break
• 11.30-12.30: Two worked examples
PURPOSE OF THIS MORNING
• To “set the scene” for people to go away and learn statistics for themselves
  - With the help of a textbook at the appropriate level:
    1. Pallant. SPSS Survival Manual
    2. Field. Discovering Statistics Using SPSS
    3. Tabachnick and Fidell. Using Multivariate Statistics
• At the end of this morning, you should be able to:
  - Understand that there are four areas of knowledge required to produce statistics successfully in the social sciences
  - Understand basic statistical terminology
  - Have an introductory understanding of SPSS
SESSION ONE: INTRODUCTION AND BASIC CONCEPTS
9.30-10.15
CONTENTS
1. Background
2. Essential Ideas (no maths though)
   • Including Descriptive Statistics
3. Inferential Statistics
4. Things that you can easily do in Microsoft Excel
1. BACKGROUND
WHERE YOU FIND STATISTICS WITHIN JOURNAL ARTICLES & DISSERTATIONS: EVERYWHERE!
• Literature Review
  - Identifying the statistical weaknesses of past research in order to pinpoint the gap that your study will address
• Method
  - Participants: description of participants, including numbers and their background (can include representativeness)
  - Materials/Measures: presentation and full description of all measures
  - Design: description of the design of the study and presentation of variables
  - Procedure: description of what was undertaken, including manipulation of variables
• Results
  - Descriptive statistics first, then inferential. The aim is the same as in a literature review or essay: to tell a coherent story
• Discussion
  - The strengths and weaknesses of this research compared with previous studies; suggestions for future research
FOUR KEY AREAS OF KNOWLEDGE TO ACQUIRE SHOULD YOU NEED TO DO ANY OF THIS
1. Data Management
   • Data entry (into suitable software)
   • Variable creation
   • Data cleaning & variable modification
Producing Statistics (areas 2 and 3):
2. Descriptive
   • Documenting missing data
   • Measures of central tendency
   • Range & dispersion of scores
3. Inferential
   • Tests ...of difference and ...of association
   • Models
4. Presentation Skills
   • Ability to write up/report statistics
   • Ability to generate suitable tables & graphs
(NOTE)
• All four areas of knowledge are needed should you ever use statistics in your own research
• However, learning how to carry out statistics (areas 2 and 3 on the previous slide) can be quicker and easier than learning data management (1) and learning how to present/write up statistics (4)
• Further, textbooks and online courses commonly skip areas 1 and 4
SOURCES OF KNOWLEDGE SHOULD YOU NEED TO DO ANY OF THIS
• Textbooks:
  - BASICS->INTERMEDIATE: Field, A. (2009) Discovering Statistics Using SPSS (and sex, drugs and rock’n’roll). 3rd ed. London: Sage Publications.
  - INTERMEDIATE->ADVANCED: Tabachnick, B.G. & Fidell, L.S. (2013) Using Multivariate Statistics. 6th ed. Boston: Allyn and Bacon.
• Websites:
  - Andy Field’s website: http://www.statisticshell.com/
    (Warning: he has a very quirky sense of humour & there is bad language)
  - The site includes videos of his statistics lectures, including how to write up statistics; these can also be found here: http://www.youtube.com/watch?v=vekCPvF016A
COMPUTER PACKAGES SHOULD YOU NEED TO DO ANY OF THIS
• Microsoft Excel will only take you so far...
• Perhaps the most common statistical software package is SPSS
  - The University of Oxford has a site licence for it, meaning that you can have it installed on your own machine
  - It’s also on the machines in the Department’s computer room
• Excel is a spreadsheet programme; SPSS is a statistics package that holds your data in a database-style file (one row per case, one column per variable). Don’t confuse the two
• Other statistical software packages (that do the basics well) include Stata and SAS
2. ESSENTIAL IDEAS
LEVELS OF MEASUREMENT
• Measurement is the representation of information with numbers
• There are different levels of complexity in how we use numbers in measurement. From simplest to most complex, three levels are commonly used:
  1. “Discrete”/“Nominal”/“Categorical”: e.g. east=1, west=2
  2. “Ordinal”: e.g. small=1, medium=2, large=3
  3. “Continuous”:
     a. “Interval”: equal distances between values, but zero is an arbitrary point, e.g. temperature in degrees Celsius
     b. “Ratio”: a special type of interval data, one where zero represents nothing (none of the quantity), e.g. income or age in years
LEVELS OF MEASUREMENT
• One “level of measurement” is special, however: it is both categorical and ordinal at the same time:
  - “Dichotomous”/“Binary”: discrete data with only two categories (usually coded 0 and 1), e.g. employed/not employed
• Being able to identify a variable’s level of measurement is the most important first thing to learn:
  - It informs which “measure of central tendency” and which measure of “dispersion” you should report
  - And it informs which inferential “test” or “model” you should carry out, and how you should go about this
CLASS EXERCISE
• Talk to your neighbour: which level of measurement best describes each of the following measures?
  - Telephone numbers
  - Gender
  - Participants’ scores on a self-report anxiety question: Strongly Agree (5), Agree (4), Neither Agree nor Disagree (3), Disagree (2), Strongly Disagree (1)
  - Height
  - University rankings
DESCRIPTIVE STATISTICS
• Once we have identified each variable’s level of measurement, we can describe the variable with descriptive statistics
• Measures of central tendency:
  - For discrete data: mode
  - For ordinal data: median (though you can report the mode as well)
  - For continuous data: mean (though you can report the median and mode as well)
• Measures of dispersion:
  - For ordinal data: inter-quartile range
  - For continuous data: standard deviation
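As a rough illustration (the session itself uses SPSS), the pairing of level of measurement with measure of central tendency can be sketched with Python’s standard `statistics` module. The example data and the function name `central_tendency` are hypothetical; “nominal” stands in for the slides’ “discrete” level.

```python
from statistics import mean, median, mode

# Hypothetical example data, not taken from the slides
region = ["east", "west", "east", "east", "west"]  # nominal: categories only
size = [1, 2, 2, 3, 3, 3, 3]                        # ordinal: small=1, medium=2, large=3
age = [21, 25, 30, 34, 40]                          # continuous (ratio): age in years

def central_tendency(values, level):
    """Return the measure of central tendency recommended for a level of measurement."""
    if level == "nominal":
        return mode(values)    # the most frequent category
    if level == "ordinal":
        return median(values)  # the middle value once scores are ordered
    if level == "continuous":
        return mean(values)    # the arithmetic average
    raise ValueError(f"unknown level of measurement: {level}")

print(central_tendency(region, "nominal"))   # east
print(central_tendency(size, "ordinal"))     # 3
print(central_tendency(age, "continuous"))   # 30
```

Note that a mean of the region codes (east=1, west=2) could be computed but would be meaningless, which is why the level of measurement has to be identified first.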
INTER-QUARTILE RANGE
• The median is the middle value of a set of ordered scores. It is the “second quartile” (“Q2”)
  - If we divided the ordered scores into four equal-sized groups, the second quartile would fall in the middle, as the median does
• The “inter-quartile range” (IQR) is the range covering the middle 50% of scores around the median
  - From the first quartile (“Q1”) to the third (“Q3”)
  - We get it by simply subtracting Q1 from Q3: IQR = Q3 - Q1
[Figure: ordered scores split into quarters, with Q1, the median (Q2), Q3 and Q4 marked.]
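A minimal sketch of quartiles and the IQR, using Python’s `statistics.quantiles` (Python 3.8 or later) on hypothetical scores. Packages disagree on exactly how to interpolate between scores when locating quartiles, so the two `method` options below give different Q1 and Q3 for the same data; SPSS or Excel may match either, or neither. The helper name `iqr` is hypothetical.

```python
from statistics import median, quantiles

# Hypothetical ordinal scores (e.g. a 1-9 rating), listed in order for readability
scores = [1, 2, 3, 4, 5, 6, 7, 8, 9]

def iqr(values, method="inclusive"):
    """Return (Q1, Q2, Q3, IQR); quartile conventions differ between packages."""
    q1, q2, q3 = quantiles(values, n=4, method=method)
    return q1, q2, q3, q3 - q1

print(median(scores))                   # 5, the second quartile
print(iqr(scores))                      # (3.0, 5.0, 7.0, 4.0)
print(iqr(scores, method="exclusive"))  # (2.5, 5.0, 7.5, 5.0)
```

Whichever convention is used, report it consistently and state which software produced the figures.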
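GRAPHING ORDINAL DATA
This is a box plot:
[Figure: a box plot. The line inside the box is the middle value, AKA the median; the box is the range encompassing the middle 50% of values, AKA the “inter-quartile range”; the outer lines mark the range encompassing the middle 90% of values; points beyond them are “outliers”.]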
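STANDARD DEVIATION
• When we calculate a mean, we understand that not everyone actually has this score
  - Some scores are closer to the mean, some are further away
• But how close are people’s scores to the mean, on average?
  - This is what the standard deviation (SD) tells us: roughly, the typical distance of scores from the mean (strictly, the square root of the average squared distance from the mean)
[Figure: a distribution with the mean marked, and points +1 standard deviation and -1 standard deviation either side of it.]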
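The calculation behind the standard deviation can be sketched in a few lines of Python. The scores and the helper `sample_sd` are hypothetical. The sketch shows both the population formula (dividing by n) and the sample formula (dividing by n - 1); for sample data, statistics packages typically report the n - 1 version.

```python
import math
from statistics import mean, pstdev, stdev

scores = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical scores

def sample_sd(values):
    """Square root of the summed squared distances from the mean, divided by n - 1."""
    m = mean(values)
    squared_distances = [(x - m) ** 2 for x in values]
    return math.sqrt(sum(squared_distances) / (len(values) - 1))

print(mean(scores))                 # 5
print(pstdev(scores))               # 2.0 (population formula, divides by n)
print(round(sample_sd(scores), 3))  # 2.138 (sample formula, divides by n - 1)
print(round(stdev(scores), 3))      # 2.138, the library's sample SD agrees
```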
THE NORMAL DISTRIBUTION
• This is a special “distribution” of continuous data
• Many real-life continuous variables are approximately normally distributed
• 95% of the cases in a normally distributed continuous variable fall in the blue area (mean ±1.96 standard deviations [SDs])
[Figure: the bell-shaped normal curve, with the horizontal axis marked at the mean, ±1 SD, ±1.96 SDs and ±3 SDs; the area between -1.96 and +1.96 SDs is shaded blue.]
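The 95% figure can be checked against the standard normal distribution using Python’s `statistics.NormalDist` (Python 3.8 or later). This sketch, with a hypothetical helper `proportion_within`, computes the share of cases lying within k SDs of the mean.

```python
from statistics import NormalDist

z = NormalDist(mu=0, sigma=1)  # the standard normal distribution

def proportion_within(k):
    """Proportion of a normal distribution lying within k SDs of the mean."""
    return z.cdf(k) - z.cdf(-k)

print(round(proportion_within(1), 4))     # 0.6827
print(round(proportion_within(1.96), 4))  # 0.95
print(round(proportion_within(3), 4))     # 0.9973
print(round(z.inv_cdf(0.975), 2))         # 1.96
```

The last line works backwards: leaving 2.5% in each tail puts the cut-off at about 1.96 SDs, which is where that number comes from.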
GRAPHING CONTINUOUS DATA
This is a histogram:
[Figure: an SPSS histogram of a continuous measure, with a fitted “normal curve” overlaid.]
• If you request a histogram of continuous data, SPSS groups the scores into arbitrary “bins”(!)
• Don’t rely on this fitted “normal curve” to establish the “normality” of a continuous measure
A LITTLE MORE ON DESCRIBING CONTINUOUS DATA
• Two more descriptive statistics are “skewness” and “kurtosis”
• A rule of thumb for assessing whether a continuous measure is “normally distributed”:
  - Divide each of these statistics by its “standard error” (good statistics software will calculate all these values for you)
  - Resulting values outside the range -2 to +2 suggest non-normality
  - Quotable source: http://web.ipac.caltech.edu/staff/fmasci/home/statistics_refs/SkewStatSignif.pdf
[Figure: example distributions labelled “zero”, “-ve” and “+ve”.]
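A sketch of this rule of thumb in Python. It assumes the bias-adjusted skewness and (excess) kurtosis estimates, which are the values Excel’s SKEW and KURT functions return, together with the standard-error formulas commonly reported alongside them; check your own software’s documentation before relying on an exact match. The function name `skew_kurt_z` and the data are hypothetical.

```python
import math

def skew_kurt_z(values):
    """Return (skewness / SE, kurtosis / SE) using bias-adjusted estimates."""
    n = len(values)
    if n < 4:
        raise ValueError("need at least 4 scores")
    m = sum(values) / n
    # Central moments of the sample
    m2 = sum((x - m) ** 2 for x in values) / n
    m3 = sum((x - m) ** 3 for x in values) / n
    m4 = sum((x - m) ** 4 for x in values) / n
    if m2 == 0:
        raise ValueError("all scores are identical")
    # Adjusted skewness (G1) and adjusted excess kurtosis (G2)
    g1 = m3 / m2 ** 1.5
    g2 = m4 / m2 ** 2 - 3
    skew = math.sqrt(n * (n - 1)) / (n - 2) * g1
    kurt = ((n + 1) * g2 + 6) * (n - 1) / ((n - 2) * (n - 3))
    # Standard errors of skewness and kurtosis
    se_skew = math.sqrt(6 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3)))
    se_kurt = 2 * se_skew * math.sqrt((n * n - 1) / ((n - 3) * (n + 5)))
    return skew / se_skew, kurt / se_kurt

# Hypothetical scores with one extreme value
z_skew, z_kurt = skew_kurt_z([3, 4, 4, 5, 5, 5, 6, 6, 7, 15])
for name, value in (("skewness", z_skew), ("kurtosis", z_kurt)):
    verdict = "suggests non-normality" if abs(value) > 2 else "within -2 to +2"
    print(f"{name}: {value:.2f} ({verdict})")
```

With only a handful of scores the standard errors are large, so this rule of thumb flags only fairly extreme shapes in small samples.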
3. INFERENTIAL STATISTICS
TYPES
• “Bivariate” and “multivariate”
  - In other words: “two variables” and “multiple variables”
• “Tests”
  - ...of the difference between groups or time-points
  - ...of the association between two or more measures
• “Models”
  - Miniature representations of reality (supposedly!)
  - How well a model represents reality can be judged in lots of ways
HYPOTHESIS TESTING
• We state a hypothesis (H1) about how we believe two or more measures are related to one another, and then try to disprove its opposite: its “null hypothesis” (H0)
  - Because H1 is the alternative to the H0 we are trying to disprove, its full name is the “alternative hypothesis”
• We gather a “sample” of data to do this, but then infer/generalise conclusions back to the real-world “population” from which we believe our sample came
• We estimate the accuracy of these generalisations with probabilities
  - Specifically, the probability (p) of getting results at least as extreme as ours if the null hypothesis were true in the real world
  - We want our results to be very unlikely under the null hypothesis, so we look for low probabilities
  - By convention, we reject the null hypothesis when this probability is below 5% (often described as 95% confidence)
  - Moving from percentage to proportion: we look for p<0.05
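To make “p<0.05” concrete, here is a sketch that turns a test statistic into a two-tailed p-value and a decision. It assumes the statistic follows a standard normal distribution when H0 is true (a large-sample z-test); other tests (t, F, chi-square) use other distributions, which Python’s standard library does not provide. `two_tailed_p` is a hypothetical helper.

```python
from statistics import NormalDist

def two_tailed_p(z_score):
    """P(a result at least this extreme, in either direction | H0 true), for a z statistic."""
    return 2 * (1 - NormalDist().cdf(abs(z_score)))

alpha = 0.05  # the conventional 5% significance level
for z_score in (1.0, 2.0, 2.5):
    p = two_tailed_p(z_score)
    decision = "reject H0" if p < alpha else "do not reject H0"
    print(f"z = {z_score}: p = {p:.3f} -> {decision}")
```

With these inputs, p is roughly 0.32, 0.046 and 0.012, so only the last two fall below 0.05.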
EFFECT SIZE
• Though we look for probabilities below 5% (“p<0.05”):
  - The likelihood of finding one is also affected by how many people we consider: more people considered = more chance of p<0.05
  - This means that p<0.05 on its own is not a reliable enough measure of how large a statistical (probabilistic) effect is
• We also need a measure that is not affected by the number of people we consider
  - Such quantities are termed “effect sizes”, as they estimate the “size of a statistical effect”
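The point that p depends on sample size while the effect size does not can be illustrated with a correlation. This sketch holds a hypothetical correlation of r = 0.2 fixed and approximates the p-value for H0: r = 0 using Fisher’s z transformation, an approximation swapped in here because Python’s standard library has no t distribution. The helper `p_for_correlation` is hypothetical.

```python
import math
from statistics import NormalDist

def p_for_correlation(r, n):
    """Approximate two-tailed p-value for H0: r = 0, via Fisher's z transformation."""
    z_score = math.atanh(r) * math.sqrt(n - 3)
    return 2 * (1 - NormalDist().cdf(abs(z_score)))

r = 0.2  # the same (hypothetical) effect size in every sample
for n in (30, 100, 300):
    p = p_for_correlation(r, n)
    print(f"n = {n}: r = {r}, p = {p:.4f}, p < 0.05: {p < 0.05}")
```

The same r = 0.2 is not “significant” with 30 people (p is about 0.29) but is with 300 (p is well below 0.001), even though the size of the effect has not changed.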
INFERENTIAL STATISTICS - TESTS
COMMON EFFECT SIZES FOR STATISTICAL TESTS

Test                          Statistic   Effect size   ‘Small’   ‘Medium’   ‘Large’
Chi-square                    χ²          φc²           .01       .09        .25
Pearson’s correlation         r           r             ±.1       ±.3        ±.5
t-test (related/unrelated)    t           d             .2        .5         .8
One-way ANOVA                 F           η²            .01       .06        .14

where:
• $\varphi_c^2 = \chi^2 / [N(k-1)]$, with k = the smaller of the number of rows or the number of columns
• $d = 2t / \sqrt{df}$, with df = the degrees of freedom (strictly, this conversion is derived for unrelated groups)
• $\eta^2 = SS_{effect} / SS_{total}$

• Good news: there are lots of online calculators that will calculate these for you! Just try searching for “effect size calculator”
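The table’s formulas translate directly into code. A minimal sketch follows; the function names and example inputs are hypothetical, and `label` simply applies the table’s ‘small’/‘medium’/‘large’ thresholds, which are conventions rather than hard rules.

```python
import math

def cramers_phi_squared(chi2, n, n_rows, n_cols):
    """phi_c squared = chi-square / [N(k - 1)], k = smaller of rows or columns."""
    k = min(n_rows, n_cols)
    return chi2 / (n * (k - 1))

def cohens_d_from_t(t, df):
    """d = 2t / sqrt(df), the conversion given in the table."""
    return 2 * t / math.sqrt(df)

def eta_squared(ss_effect, ss_total):
    """eta squared = SS_effect / SS_total (share of total variation explained)."""
    return ss_effect / ss_total

def label(value, small, medium, large):
    """Apply the table's conventional thresholds to the absolute effect size."""
    size = abs(value)
    if size >= large:
        return "large"
    if size >= medium:
        return "medium"
    if size >= small:
        return "small"
    return "below 'small'"

# Hypothetical results from three analyses
phi2 = cramers_phi_squared(chi2=10.0, n=200, n_rows=2, n_cols=3)
d = cohens_d_from_t(t=2.5, df=38)
eta2 = eta_squared(ss_effect=12.0, ss_total=100.0)
print(round(phi2, 3), label(phi2, .01, .09, .25))  # 0.05 small
print(round(d, 2), label(d, .2, .5, .8))            # 0.81 large
print(round(eta2, 2), label(eta2, .01, .06, .14))   # 0.12 medium
```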
PARAMETRIC DATA
• A property of continuous data
• Whether data are parametric strongly dictates which inferential test to carry out
• Three assumptions:
  1. The continuous measure is “normally distributed”
     - We can test this (see the skewness and kurtosis rule of thumb in “A Little More on Describing Continuous Data”!)
  2. When comparing groups of scores with one another, all groups should have (roughly) the same “standard deviation”
     - We can test this (good software packages will do it automatically)
  3. Each score was gathered independently of the others
     - We can’t test this statistically; it concerns how we gathered our data
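For assumption 2, SPSS reports Levene’s test for equality of variances. The sketch below computes the original, mean-centred Levene statistic W, which is a one-way ANOVA on each score’s absolute distance from its own group mean. It stops at W: turning W into a p-value needs the F distribution with (k - 1, N - k) degrees of freedom, which Python’s standard library does not include. The function name and data are hypothetical, and some packages also offer a median-centred variant (Brown-Forsythe).

```python
from statistics import mean

def levene_w(*groups):
    """Mean-centred Levene statistic W for two or more groups of scores."""
    if len(groups) < 2:
        raise ValueError("need at least two groups")
    # Absolute distance of each score from its own group mean
    group_means = [mean(g) for g in groups]
    deviations = [[abs(x - m) for x in g] for g, m in zip(groups, group_means)]
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    dev_means = [mean(d) for d in deviations]
    grand_mean = mean([z for d in deviations for z in d])
    # One-way ANOVA on the deviations: between-group vs within-group variation
    between = sum(len(d) * (dm - grand_mean) ** 2 for d, dm in zip(deviations, dev_means))
    within = sum((z - dm) ** 2 for d, dm in zip(deviations, dev_means) for z in d)
    if within == 0:
        raise ValueError("no within-group variation in the deviations")
    return (n_total - k) / (k - 1) * between / within

same_spread = levene_w([1, 2, 3], [11, 12, 13])    # identical spreads
different_spread = levene_w([4, 5, 6], [0, 5, 10])  # second group far more spread out
print(same_spread, round(different_spread, 3))      # 0.0 2.462
```

A W near zero means the groups’ spreads are similar; larger values point towards unequal standard deviations.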
INFERENTIAL STATISTICS - MODELS
Reality: the underlying population
   ↓ (with unavoidable “error”)
Drawn sample: the data available to us
   ↓ (with unavoidable “residual” aspects left unaccounted for)
Statistical model: our version of reality, created with our sample data
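STATISTICAL REGRESSION
• The most common “model” in inferential statistics
• At its simplest, it “models” how much one measure (“y”) changes with another (“x”):
  - We say that we “regress y on x”
  - $y = mx + C$
  - $y = b_1x + b_0 \; [+\, e]$, where b1 is the slope (m), b0 the intercept (C) and e the error, or “residual”
[Figure: a fitted straight line on axes x and y, marking its slope m and intercept C.]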
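The slope and intercept of the simplest regression model can be estimated by ordinary least squares in a few lines. This is a sketch with hypothetical data and a hypothetical `fit_line` helper; Python 3.10 and later also offer `statistics.linear_regression`, which returns the same two estimates.

```python
from statistics import mean

def fit_line(x, y):
    """Ordinary least squares estimates (b1, b0) for y = b1*x + b0 + e."""
    x_bar, y_bar = mean(x), mean(y)
    # Slope: how much y changes, on average, per one-unit change in x
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx
    # Intercept: the fitted line always passes through (x_bar, y_bar)
    b0 = y_bar - b1 * x_bar
    return b1, b0

# Hypothetical data: hours of revision (x) and test score (y)
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
b1, b0 = fit_line(x, y)
# The residuals (e) are what the model leaves unaccounted for
residuals = [yi - (b1 * xi + b0) for xi, yi in zip(x, y)]
print(round(b1, 2), round(b0, 2))  # 1.99 0.05
```

With least squares the residuals always sum to (almost exactly) zero; what matters is how large they are.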
4. THINGS THAT YOU CAN EASILY DO IN MICROSOFT EXCEL
THINGS YOU CAN EASILY DO IN EXCEL
1. Simple aspects of data management
   • E.g. looking at your data & simple manipulation of data
2. Generate descriptive statistics
   • Get measures of central tendency
   • ...& dispersion
3. Carry out simple inferential statistics
   • E.g. correlations (at a push)
4. Create tables and simple figures
   • Perhaps Excel’s most useful purpose. You can even copy and paste SPSS output into Excel and simplify it to a level suitable for a report/dissertation/paper
• You really can’t rely on Excel for anything beyond the basics, however...
  - ...It will struggle to give you the statistics necessary for any solely quantitative dissertation
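Excel’s CORREL function returns Pearson’s correlation coefficient. For readers who want to see what it computes, here is a minimal Python sketch of the same formula; the data and the `pearson_r` name are hypothetical.

```python
import math
from statistics import mean

def pearson_r(x, y):
    """Pearson's correlation: covariation of x and y scaled by their spreads (-1 to +1)."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    sxx = sum((a - x_bar) ** 2 for a in x)
    syy = sum((b - y_bar) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))  # 1.0, a perfect positive association
print(pearson_r([1, 2, 3, 4], [8, 6, 4, 2]))  # -1.0, a perfect negative association
```

The coefficient is only the effect size; its p-value still has to come from a proper statistics package (or an approximation), which is where Excel starts to struggle.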
