SlideShare a Scribd company logo
Quantitative Data
Analysis
Saad Chahine, PhD
datum |ˈdātəm, ˈdatəm|
noun (pl. data |ˈdātə, ˈdatə| )
1 a piece of information. See also data.
• an assumption or premise from which
inferences may be drawn. See sense
datum.
2 a fixed starting point of a scale or
operation.
ORIGIN mid 18th cent.: from Latin,
literally ‘something given,’ neuter past
participle of dare ‘give.’
data |ˈdatə, ˈdātə|
noun [ treated as sing. or pl. ]
facts and statistics collected together for
reference or analysis. See also datum.
• Computing the quantities, characters, or
symbols on which operations are
performed by a computer, being stored
and transmitted in the form of electrical
signals and recorded on magnetic,
optical, or mechanical recording media.
• Philosophy things known or assumed as
facts, making the basis of reasoning or
calculation.
ORIGIN mid 17th cent. (as a term in
philosophy): from Latin, plural of datum.
Categorical Data
• Eye Colours
• Male/Female
• Cultural Groups
• Self-Reported Interests
• Nominal Scale
ref: p 8-10: Field, 2013
Ordinal Data
• Clerks/Residents
• PGY1/PGY2/PGY3/
PGY4
• 1st, 2nd, 3rd, 4th
• Ordinal scale
1 2 3 4 5
1 2 3 4 5
1 2 3 4 5
1 2 3 4 5
ref: p 8-10: Field, 2013
Interval Data
• Likert scale
• Time of day (12 hour
clock)
• Age categories
• Equal-interval scale
• Zero is not meaningful
1 2 3 4 5
1 2 3 4 5
1 2 3 4 5
1 2 3 4 5
ref: p 8-10: Field, 2013
Ratio Data
• Height, Weight
• Age
• 24 hour clock
• “twice as long”
• Zero is meaningful -
represents absence
of quality
0
0
0
0
1 2 3 4
1 2 3 4
1 2 3 5
1 2 3 4
a continuum exists
underneath the scale (Field,
2013, p.11)
Statistical Distribution
Data is graphed to
illustrate how the
individual points are
distributed amongst each
other.
*** Its important to know
what your data looks like
so you can figure out
what analysis to run***
• Uniform distribution
• Normal distribution
• Skewed distribution
• Peaked distribution
• Bi-modal distribution
Uniform Distribution"Uniform Distribution PDF SVG" by IkamusumeFan - Own work. Licensed under CC BY-SA 3.0 via Wikimedia Commons -
http://commons.wikimedia.org/wiki/File:Uniform_Distribution_PDF_SVG.svg#/media/File:Uniform_Distribution_PDF_SVG.svg
Normal Distribution"Standard deviation diagram" by Mwtoews. Licensed under CC BY 2.5 via Wikimedia Commons -
http://commons.wikimedia.org/wiki/File:Standard_deviation_diagram.svg#/media/File:Standard_deviation_diagram.svg
Skewness"Negative and positive skew diagrams (English)" by Rodolfo Hermans (Godot) at en.wikipedia. - Own work; transferred from en.wikipedia by Rodolfo Hermans (Godot)..
Licensed under CC BY-SA 3.0 via Wikimedia Commons -
http://commons.wikimedia.org/wiki/File:Negative_and_positive_skew_diagrams_(English).svg#/media/File:Negative_and_positive_skew_diagrams_(English).svg
Peaked Distribution (Kurtosis)
"Pearson type VII distribution PDF". Licensed under Public Domain via Wikimedia Commons -
http://commons.wikimedia.org/wiki/File:Pearson_type_VII_distribution_PDF.png#/media/File:Pearson_type_
VII_distribution_PDF.png
Bi-Modal Distribution
"BimodalAnts" by Qwfp (talk) - I created this work entirely by myself.. Licensed under CC BY-SA 3.0
via Wikipedia - http://en.wikipedia.org/wiki/File:BimodalAnts.png#/media/File:BimodalAnts.png
Quick General Descriptions
If you want to describe a population or a group of
people using one or two numbers you could say:
• On average, students in Grade 8 in Ontario scored
570 on an international test of reading (mean)
• At Western University, the most frequent eye colour
amongst undergraduate students is brown (mode)
• In a residency rotation of 10 students, the weekly
time spent on case review was 5 hours (median)
Data Type
Measure of Central
Tendency
Categorical Data
• Mean and median are meaningless
• More meaningful to use mode
Ordinal Data
• Mean is meaningless
• Median is the most representative
• Mode is also useful
Interval & Ratio Data
• Mean is the most useful when data
are normally distributed
• Median and mode may be more
useful depending on sample size
and skewness
Descriptive and
Inferential Statistics
Descriptive statistics describe the sample or population usually by
providing values of range, maximum, minimum, central tendency,
variance (sum of individual differences from the mean)
Inferential statistics are often used when you do not have access to
the entire population and want to make an inference about this
population
Often we are looking to draw a pattern and describe our data results in sophisticated and
rigorous ways. In order to draw inference about a larger population - large samples are needed to
make a conjecture with some degree of confidence.
If this is the focus of your research, power calculations are needed (beyond this course). Two useful applications:
(1) G*Power http://www.gpower.hhu.de/en.html
(2) The ‘pwr’ package for R http://cran.r-project.org/web/packages/pwr/pwr.pdf
Lets Look At Some Data
After doing a great deal of reading, the Dean of a well
known Canadian medical school believed that in
general, students in medical programs have an
average IQ of 135
• This is a conjecture about an entire population of
undergraduate medical students
Testing a Conjecture
Null Hypothesis - Ho: µ=135
Alternative Hypothesis - HA: µ≠135
We test for the conjecture or hypothesis by making it
the null
‘µ’ stands for the
mean
Role of Software
• Computer programs such as SPSS, SAS, R,
STATA, etc…have built in algorithms to carry out
what you might do by hand
• It is important to initially do this by hand to
understand what it means to reject, or fail to reject
the null hypothesis
Significance Level
• Because we are not dealing with absolutes and we are making a
prediction about a population — it is not exact
• We need to select a criterion or significance level by which we
can either reject or accept the null hypothesis (Ho: µ=135)
• The default in most software programs is to set the criterion or
significance level at .05
• It is also referred to as p-value or alpha (α)
At what point is the difference between the sample mean and 135
not due to chance?
What is a z-score?
• It is a standard score - all
scores can be converted to a
z-score for comparison
purposes
• We can compare two
measures one on a scale of
500, another on a scale
of 10 by converting them to a
standard score (z-score).
• We can compare an
individual person to the
population by using z-score.
if a z-score =0 it
is equal to the
mean
if a z-score =+2 it is
2SD above the
mean
if a z-score =-2 it is
2SD below the
mean
What if you scored 118 on an
intelligence test?
We know this test is scaled to have
an average of 100 and an SD of 15
(its typical to scale tests from raw
scores)
z = (118-100)/15
z=18/15
z=1.2
1.2 z-score is about here — this score is
above the average
However is it high average or exceptional? —
We would need to establish a criteria
z-score of 1.96 for 5% two tailed (p-value)
In order to be statistically significant an individual would
have to have a z-score of 1.96 or higher — this equates
to scaled score of ~130 or higher
Lets Go Back to The
Data
At what point is the difference between the sample
mean and 135 not due to chance?
We need to randomly draw a sample of 10 students
115, 140, 133, 125, 120, 126, 136, 124, 132, 129
Mean = 128
ID IQ Score mean
1001 115 128
1002 140 128
1003 133 128
1004 125 128
1005 120 128
1006 126 128
1007 136 128
1008 124 128
1009 132 128
1010 129 128
What do we know so far?
1. We have a sample of 10
students
2. The average score is 128
3. The max score is 140
4. The min score is 115
5. The range is 25 (140-115)
In order to draw a comparison to
the Dean's conjecture of 135 we
have to be confident in our
decision that while some students
are higher than 135 some are not.
We need to conduct a statistical
analysis to come to a conclusion.
First we need a way of
summarizing the distribution of 10
students.
ID IQ Score mean Deviations
1001 115 128 13
1002 140 128 -12
1003 133 128 -5
1004 125 128 3
1005 120 128 8
1006 126 128 2
1007 136 128 -8
1008 124 128 4
1009 132 128 -4
1010 129 128 -1
A simple way to summarize
deviations is to add them up and
take an average
Try this: Add all the deviations
together.
ID IQ Score mean Deviations
Squared
Deviations
1001 115 128 13 169
1002 140 128 -12 144
1003 133 128 -5 25
1004 125 128 3 9
1005 120 128 8 64
1006 126 128 2 4
1007 136 128 -8 64
1008 124 128 4 16
1009 132 128 -4 16
1010 129 128 -1 1
look no negatives
Sum of Squared Deviations = 512
Sample Variance= (Sum of Square
Deviations) / (Sample -1)
Sample Variance = 56.89
Standard Deviation = Sqrt (Sample
Variance)
Standard Deviation = 7.54
standard deviation : “provides a
sort of average of the differences
of all scores from the mean” Brown, J. D.
(1988). Understanding research in second language learning: A teacher's guide to
statistics and research design. London: Cambridge University Press.
128 + 7.54
Mean = 128
128 - 7.54
128 + (2*7.54)128 - (2*7.54)
Distribution of Students
Sampled
What About Error?
• We want to be confident and accurate in our description of the
data
• The more data points we have the more accurate our estimation -
thus SD is sample size dependent
• Is there a way to examine the SD to understand more about how
much fluctuation exists in the data?
• Standard Error (SE) of the mean is a
way of understanding how much error
exists in the mean - it is also used to
calculate confidence intervals and other
statistical procedures
standard error |ˈstændərd ˈɛrər|
nounStatistics
a measure of the statistical accuracy
of an estimate, equal to the standard
deviation of the theoretical
distribution of a large population of
such estimates.
What we know about our
data so far…
1. We have a sample of 10 Students
2. The average score is 128
3. The max score is 140
4. The min score is 115
5. The range is 25 (140-115)
6. Sum of Squared Deviations = 512
7. Sample Variance = 56.89
8. Standard Deviation = 7.54
9. Standard Error = 2.39 {7.54/Sqrt(10)}
Confidence Intervals (95%)
Lower Bound = mean - (1.96 * SE)
Upper Bound = mean + (1.96 * SE)
Lower Bound = 123.32
Upper Bound = 132.68
When do we need to run
additional stats?
128
132.86
123.32
135
The dean’s hypothesis falls
outside the confidence
intervals
What if we wanted to compare the IQ’s of
male and female med students.
Are these two groups different from each
other or similar?
128
135
‘t-tests’ are Used to Make
Decisions
While z-scores are used to study individuals, t-tests are used to study
how a group compares to a hypothesized score, another group, or the
same group at a different time point.
t-statistic = (sample average – hypothesis)/standard error
t = (128 - 135)/2.39
t= -2.935
• If we are using SPSS this would be very quick and the output will tell
us if this t-value is statistically significant at p<0.05.
• Since we are not we have to look up this value in a table in the back
of just about any stats book, and yes it is statistically significant.
“The hypothesis that the mean IQ
of the population is 135 was
rejected, t= -2.935, df=9, p≤ .05.”
Why is this important
• Most statistical applications that you will come across
are parametric based — that is, they use the mean
and a normal distribution
• Its more important to understand the conceptual
meaning of what a software program is doing than
the actual formulas and calculations
• Its also important when reading research studies to
have a solid conceptual understanding of what the
different statistical procedures do behind the scenes
–John Tukey (1915-2000)
“The combination of some data and an aching
desire for an answer does not ensure that a
reasonable answer can be extracted from a
given body of data”

More Related Content

PDF
Descriptive statistics
PDF
孔令傑 / 給工程師的統計學及資料分析 123 (2016/9/4)
PDF
Collaboration with Statistician? 矩陣視覺化於探索式資料分析
PPTX
Classification
PPT
Mba ii rm unit-4.1 data analysis & presentation a
PDF
sigir2020
PDF
data science course
PDF
Exploratory data analysis data visualization
Descriptive statistics
孔令傑 / 給工程師的統計學及資料分析 123 (2016/9/4)
Collaboration with Statistician? 矩陣視覺化於探索式資料分析
Classification
Mba ii rm unit-4.1 data analysis & presentation a
sigir2020
data science course
Exploratory data analysis data visualization

What's hot (13)

PPTX
Lect9 Decision tree
ODP
Visualiation of quantitative information
PPT
Multi-criteria Decision Analysis for Customization of Estimation by Analogy M...
PDF
Statistical Distributions
PDF
sigir2018tutorial
PDF
Research Method for Business chapter 12
PPTX
Results, Discussion, APA Editing, and Defense
PPT
Aed1222 lesson 4
PDF
Research Method EMBA chapter 11
DOCX
Assigment 1
PPT
www1.cs.columbia.edu
PDF
An Overview of Naïve Bayes Classifier
PPTX
Introduction To Data Science Using R
Lect9 Decision tree
Visualiation of quantitative information
Multi-criteria Decision Analysis for Customization of Estimation by Analogy M...
Statistical Distributions
sigir2018tutorial
Research Method for Business chapter 12
Results, Discussion, APA Editing, and Defense
Aed1222 lesson 4
Research Method EMBA chapter 11
Assigment 1
www1.cs.columbia.edu
An Overview of Naïve Bayes Classifier
Introduction To Data Science Using R
Ad

Similar to Quant Data Analysis (20)

PPTX
Standard deviation
 
PPT
Analysing & interpreting data.ppt
PPTX
m2_2_variation_z_scores.pptx
PPTX
Presentation research- chapter 10-11 istiqlal
PPTX
Data in science
PPTX
PyGotham 2016
PPT
statistics
PDF
Ch17 lab r_verdu103: Entry level statistics exercise (descriptives)
DOCX
The Importance of Probability in Data Science.docx
PPTX
Mat 255 chapter 3 notes
PPT
research methodology and research methodology.ppt
PPTX
Exploratory Data Analysis Unit 1 ppt presentation.pptx
PPT
Lect 2 basic ppt
PDF
76a15ed521b7679e372aab35412ab78ab583436a-1602816156135.pdf
PDF
Statistics and permeability engineering reports
PDF
Data Science & AI Road Map by Python & Computer science tutor in Malaysia
PPTX
Section 6 - Chapter 1 - Introduction to Statistics Part I
PPTX
Lec 3.pptx
PPT
Introduction to Statistics53004300.ppt
PDF
Datascience Introduction WebSci Summer School 2014
Standard deviation
 
Analysing & interpreting data.ppt
m2_2_variation_z_scores.pptx
Presentation research- chapter 10-11 istiqlal
Data in science
PyGotham 2016
statistics
Ch17 lab r_verdu103: Entry level statistics exercise (descriptives)
The Importance of Probability in Data Science.docx
Mat 255 chapter 3 notes
research methodology and research methodology.ppt
Exploratory Data Analysis Unit 1 ppt presentation.pptx
Lect 2 basic ppt
76a15ed521b7679e372aab35412ab78ab583436a-1602816156135.pdf
Statistics and permeability engineering reports
Data Science & AI Road Map by Python & Computer science tutor in Malaysia
Section 6 - Chapter 1 - Introduction to Statistics Part I
Lec 3.pptx
Introduction to Statistics53004300.ppt
Datascience Introduction WebSci Summer School 2014
Ad

More from Saad Chahine (16)

PPTX
Csse 2014 hmm presentation_ta_ed
PPTX
R Intro Workshop
PPTX
GEDU 6170 Reviews
PPTX
Gedu 6170 Mixed Methods Research
PPTX
Gedu 6170 tiles abstracts and intro
PPTX
Quantitative & Qualitative GEDU 6170
PPTX
Research Literacy GEDU6170 MSVU
PPT
Relationship of Competency and Global Ratings in OSCEs
PPT
A Novel Approach for Developing Rating Scales for a Practice Ready OSCE
PPT
CERA saad chahine 2013 fuzzy clusters
PPT
Ncme april 30 2013 saad chahine
PPTX
Chahine Understanding Common Study Results
PPTX
Chahine Hypothesis Testing,
PPTX
Saad Chahine ICSEI 2012
PPTX
Saad Chahine Thesis
PPTX
Saad Chahine Thesis Presentation Final
Csse 2014 hmm presentation_ta_ed
R Intro Workshop
GEDU 6170 Reviews
Gedu 6170 Mixed Methods Research
Gedu 6170 tiles abstracts and intro
Quantitative & Qualitative GEDU 6170
Research Literacy GEDU6170 MSVU
Relationship of Competency and Global Ratings in OSCEs
A Novel Approach for Developing Rating Scales for a Practice Ready OSCE
CERA saad chahine 2013 fuzzy clusters
Ncme april 30 2013 saad chahine
Chahine Understanding Common Study Results
Chahine Hypothesis Testing,
Saad Chahine ICSEI 2012
Saad Chahine Thesis
Saad Chahine Thesis Presentation Final

Recently uploaded (20)

PDF
TRAFFIC-MANAGEMENT-AND-ACCIDENT-INVESTIGATION-WITH-DRIVING-PDF-FILE.pdf
PPT
Quality review (1)_presentation of this 21
PPTX
STUDY DESIGN details- Lt Col Maksud (21).pptx
PDF
Launch Your Data Science Career in Kochi – 2025
PDF
.pdf is not working space design for the following data for the following dat...
PPTX
Introduction-to-Cloud-ComputingFinal.pptx
PPTX
Computer network topology notes for revision
PPTX
Introduction to Knowledge Engineering Part 1
PPTX
Introduction to Basics of Ethical Hacking and Penetration Testing -Unit No. 1...
PPTX
1_Introduction to advance data techniques.pptx
PPTX
Supervised vs unsupervised machine learning algorithms
PDF
168300704-gasification-ppt.pdfhghhhsjsjhsuxush
PDF
Foundation of Data Science unit number two notes
PPTX
mbdjdhjjodule 5-1 rhfhhfjtjjhafbrhfnfbbfnb
PDF
Mega Projects Data Mega Projects Data
PPT
Miokarditis (Inflamasi pada Otot Jantung)
PPTX
Business Ppt On Nestle.pptx huunnnhhgfvu
PPTX
DISORDERS OF THE LIVER, GALLBLADDER AND PANCREASE (1).pptx
PPTX
The THESIS FINAL-DEFENSE-PRESENTATION.pptx
PPTX
MODULE 8 - DISASTER risk PREPAREDNESS.pptx
TRAFFIC-MANAGEMENT-AND-ACCIDENT-INVESTIGATION-WITH-DRIVING-PDF-FILE.pdf
Quality review (1)_presentation of this 21
STUDY DESIGN details- Lt Col Maksud (21).pptx
Launch Your Data Science Career in Kochi – 2025
.pdf is not working space design for the following data for the following dat...
Introduction-to-Cloud-ComputingFinal.pptx
Computer network topology notes for revision
Introduction to Knowledge Engineering Part 1
Introduction to Basics of Ethical Hacking and Penetration Testing -Unit No. 1...
1_Introduction to advance data techniques.pptx
Supervised vs unsupervised machine learning algorithms
168300704-gasification-ppt.pdfhghhhsjsjhsuxush
Foundation of Data Science unit number two notes
mbdjdhjjodule 5-1 rhfhhfjtjjhafbrhfnfbbfnb
Mega Projects Data Mega Projects Data
Miokarditis (Inflamasi pada Otot Jantung)
Business Ppt On Nestle.pptx huunnnhhgfvu
DISORDERS OF THE LIVER, GALLBLADDER AND PANCREASE (1).pptx
The THESIS FINAL-DEFENSE-PRESENTATION.pptx
MODULE 8 - DISASTER risk PREPAREDNESS.pptx

Quant Data Analysis

  • 2. datum |ˈdātəm, ˈdatəm| noun (pl. data |ˈdātə, ˈdatə| ) 1 a piece of information. See also data. • an assumption or premise from which inferences may be drawn. See sense datum. 2 a fixed starting point of a scale or operation. ORIGIN mid 18th cent.: from Latin, literally ‘something given,’ neuter past participle of dare ‘give.’ data |ˈdatə, ˈdātə| noun [ treated as sing. or pl. ] facts and statistics collected together for reference or analysis. See also datum. • Computing the quantities, characters, or symbols on which operations are performed by a computer, being stored and transmitted in the form of electrical signals and recorded on magnetic, optical, or mechanical recording media. • Philosophy things known or assumed as facts, making the basis of reasoning or calculation. ORIGIN mid 17th cent. (as a term in philosophy): from Latin, plural of datum.
  • 3. Categorical Data • Eye Colours • Male/Female • Cultural Groups • Self-Reported Interests • Nominal Scale ref: p 8-10: Field, 2013
  • 4. Ordinal Data • Clerks/Residents • PGY1/PGY2/PGY3/ PGY4 • 1st, 2nd, 3rd, 4th • Ordinal scale 1 2 3 4 5 1 2 3 4 5 1 2 3 4 5 1 2 3 4 5 ref: p 8-10: Field, 2013
  • 5. Interval Data • Likert scale • Time of day (12 hour clock) • Age categories • Equal-interval scale • Zero is not meaningful 1 2 3 4 5 1 2 3 4 5 1 2 3 4 5 1 2 3 4 5 ref: p 8-10: Field, 2013
  • 6. Ratio Data • Height, Weight • Age • 24 hour clock • “twice as long” • Zero is meaningful - represents absence of quality 0 0 0 0 1 2 3 4 1 2 3 4 1 2 3 5 1 2 3 4 a continuum exists underneath the scale (Field, 2013, p.11)
  • 7. Statistical Distribution Data is graphed to illustrate how the individual points are distributed amongst each other. *** Its important to know what your data looks like so you can figure out what analysis to run*** • Uniform distribution • Normal distribution • Skewed distribution • Peaked distribution • Bi-modal distribution
  • 8. Uniform Distribution"Uniform Distribution PDF SVG" by IkamusumeFan - Own work. Licensed under CC BY-SA 3.0 via Wikimedia Commons - http://commons.wikimedia.org/wiki/File:Uniform_Distribution_PDF_SVG.svg#/media/File:Uniform_Distribution_PDF_SVG.svg
  • 9. Normal Distribution"Standard deviation diagram" by Mwtoews. Licensed under CC BY 2.5 via Wikimedia Commons - http://commons.wikimedia.org/wiki/File:Standard_deviation_diagram.svg#/media/File:Standard_deviation_diagram.svg
  • 10. Skewness"Negative and positive skew diagrams (English)" by Rodolfo Hermans (Godot) at en.wikipedia. - Own work; transferred from en.wikipedia by Rodolfo Hermans (Godot).. Licensed under CC BY-SA 3.0 via Wikimedia Commons - http://commons.wikimedia.org/wiki/File:Negative_and_positive_skew_diagrams_(English).svg#/media/File:Negative_and_positive_skew_diagrams_(English).svg
  • 11. Peaked Distribution (Kurtosis) "Pearson type VII distribution PDF". Licensed under Public Domain via Wikimedia Commons - http://commons.wikimedia.org/wiki/File:Pearson_type_VII_distribution_PDF.png#/media/File:Pearson_type_ VII_distribution_PDF.png
  • 12. Bi-Modal Distribution "BimodalAnts" by Qwfp (talk) - I created this work entirely by myself.. Licensed under CC BY-SA 3.0 via Wikipedia - http://en.wikipedia.org/wiki/File:BimodalAnts.png#/media/File:BimodalAnts.png
  • 13. Quick General Descriptions If you want to describe a population or a group of people using one or two numbers you could say: • On average, students in Grade 8 in Ontario scored 570 on an international test of reading (mean) • At Western University, the most frequent eye colour amongst undergraduate students is brown (mode) • In a residency rotation of 10 students, the weekly time spent on case review was 5 hours (median)
  • 14. Data Type Measure of Central Tendency Categorical Data • Mean and median are meaningless • More meaningful to use mode Ordinal Data • Mean is meaningless • Median is the most representative • Mode is also useful Interval & Ratio Data • Mean is the most useful when data are normally distributed • Median and mode may be more useful depending on sample size and skewness
  • 15. Descriptive and Inferential Statistics Descriptive statistics describe the sample or population usually by providing values of range, maximum, minimum, central tendency, variance (sum of individual differences from the mean) Inferential statistics are often used when you do not have access to the entire population and want to make an inference about this population Often we are looking to draw a pattern and describe our data results in sophisticated and rigorous ways. In order to draw inference about a larger population - large samples are needed to make a conjecture with some degree of confidence. If this is the focus of your research, power calculations are needed (beyond this course). Two useful applications: (1) G*Power http://www.gpower.hhu.de/en.html (2) The ‘pwr’ package for R http://cran.r-project.org/web/packages/pwr/pwr.pdf
  • 16. Lets Look At Some Data After doing a great deal of reading, the Dean of a well known Canadian medical school believed that in general, students in medical programs have an average IQ of 135 • This is a conjecture about an entire population of undergraduate medical students
  • 17. Testing a Conjecture Null Hypothesis - Ho: µ=135 Alternative Hypothesis - HA: µ≠135 We test for the conjecture or hypothesis by making it the null ‘µ’ stands for the mean
  • 18. Role of Software • Computer programs such as SPSS, SAS, R, STATA, etc…have built in algorithms to carry out what you might do by hand • It is important to initially do this by hand to understand what it means to reject, or fail to reject the null hypothesis
  • 19. Significance Level • Because we are not dealing with absolutes and we are making a prediction about a population — it is not exact • We need to select a criterion or significance level by which we can either reject or accept the null hypothesis (Ho: µ=135) • The default in most software programs is to set the criterion or significance level at .05 • It is also referred to as p-value or alpha (α) At what point is the difference between the sample mean and 135 not due to chance?
  • 20. What is a z-score? • It is a standard score - all scores can be converted to a z-score for comparison purposes • We can compare two measures one on a scale of 500, another on a scale of 10 by converting them to a standard score (z-score). • We can compare an individual person to the population by using z-score. if a z-score =0 it is equal to the mean if a z-score =+2 it is 2SD above the mean if a z-score =-2 it is 2SD below the mean
  • 21. What if you scored 118 on an intelligence test? We know this test is scaled to have an average of 100 and an SD of 15 (its typical to scale tests from raw scores) z = (118-100)/15 z=18/15 z=1.2 1.2 z-score is about here — this score is above the average However is it high average or exceptional? — We would need to establish a criteria z-score of 1.96 for 5% two tailed (p-value) In order to be statistically significant an individual would have to have a z-score of 1.96 or higher — this equates to scaled score of ~130 or higher
  • 22. Lets Go Back to The Data At what point is the difference between the sample mean and 135 not due to chance? We need to randomly draw a sample of 10 students 115, 140, 133, 125, 120, 126, 136, 124, 132, 129 Mean = 128
  • 23. ID IQ Score mean 1001 115 128 1002 140 128 1003 133 128 1004 125 128 1005 120 128 1006 126 128 1007 136 128 1008 124 128 1009 132 128 1010 129 128 What do we know so far? 1. We have a sample of 10 students 2. The average score is 128 3. The max score is 140 4. The min score is 115 5. The range is 25 (140-115) In order to draw a comparison to the Dean's conjecture of 135 we have to be confident in our decision that while some students are higher than 135 some are not. We need to conduct a statistical analysis to come to a conclusion. First we need a way of summarizing the distribution of 10 students.
  • 24. ID IQ Score mean Deviations 1001 115 128 13 1002 140 128 -12 1003 133 128 -5 1004 125 128 3 1005 120 128 8 1006 126 128 2 1007 136 128 -8 1008 124 128 4 1009 132 128 -4 1010 129 128 -1 A simple way to summarize deviations is to add them up and take an average Try this: Add all the deviations together.
  • 25. ID IQ Score mean Deviations Squared Deviations 1001 115 128 13 169 1002 140 128 -12 144 1003 133 128 -5 25 1004 125 128 3 9 1005 120 128 8 64 1006 126 128 2 4 1007 136 128 -8 64 1008 124 128 4 16 1009 132 128 -4 16 1010 129 128 -1 1 look no negatives Sum of Squared Deviations = 512 Sample Variance= (Sum of Square Deviations) / (Sample -1) Sample Variance = 56.89 Standard Deviation = Sqrt (Sample Variance) Standard Deviation = 7.54 standard deviation : “provides a sort of average of the differences of all scores from the mean” Brown, J. D. (1988). Understanding research in second language learning: A teacher's guide to statistics and research design. London: Cambridge University Press.
  • 26. 128 + 7.54 Mean = 128 128 - 7.54 128 + (2*7.54)128 - (2*7.54) Distribution of Students Sampled
  • 27. What About Error? • We want to be confident and accurate in our description of the data • The more data points we have the more accurate our estimation - thus SD is sample size dependent • Is there a way to examine the SD to understand more about how much fluctuation exists in the data? • Standard Error (SE) of the mean is a way of understanding how much error exists in the mean - it is also used to calculate confidence intervals and other statistical procedures standard error |ˈstændərd ˈɛrər| nounStatistics a measure of the statistical accuracy of an estimate, equal to the standard deviation of the theoretical distribution of a large population of such estimates.
  • 28. What we know about our data so far… 1. We have a sample of 10 Students 2. The average score is 128 3. The max score is 140 4. The min score is 115 5. The range is 25 (140-115) 6. Sum of Squared Deviations = 512 7. Sample Variance = 56.89 8. Standard Deviation = 7.54 9. Standard Error = 2.39 {7.54/Sqrt(10)} Confidence Intervals (95%) Lower Bound = mean - (1.96 * SE) Upper Bound = mean + (1.96 * SE) Lower Bound = 123.32 Upper Bound = 132.68
  • 29. When do we need to run additional stats? 128 132.86 123.32 135 The dean’s hypothesis falls outside the confidence intervals What if we wanted to compare the IQ’s of male and female med students. Are these two groups different from each other or similar? 128 135
  • 30. ‘t-tests’ are Used to Make Decisions While z-scores are used to study individuals, t-tests are used to study how a group compares to a hypothesized score, another group, or the same group at a different time point. t-statistic = (sample average – hypothesis)/standard error t = (128 - 135)/2.39 t= -2.935 • If we are using SPSS this would be very quick and the output will tell us if this t-value is statistically significant at p<0.05. • Since we are not we have to look up this value in a table in the back of just about any stats book, and yes it is statistically significant. “The hypothesis that the mean IQ of the population is 135 was rejected, t= -2.935, df=9, p≤ .05.”
  • 31. Why is this important • Most statistical applications that you will come across are parametric based — that is, they use the mean and a normal distribution • Its more important to understand the conceptual meaning of what a software program is doing than the actual formulas and calculations • Its also important when reading research studies to have a solid conceptual understanding of what the different statistical procedures do behind the scenes
  • 32. –John Tukey (1915-2000) “The combination of some data and an aching desire for an answer does not ensure that a reasonable answer can be extracted from a given body of data”