Quantitative analysis
A brief introduction
Petri Lankoski, 2018
You should be familiar with the following:
• Mean (medelvärde), for a normal distribution
• Median (median)
• Mode (typvärde)
• Line chart (linjediagram)
• Bar chart (stapeldiagram)
Is the Die Loaded?
1st throw: 1
2nd throw: 1
3rd throw: 4
4th throw: 1
5th throw: 2
6th throw: 1
We cannot say for certain, but we can estimate how likely or unlikely the observed sequence is.
In the long run we expect to see equal numbers of 1s, 2s, 3s, 4s, 5s and 6s.
The chance of getting a 1 is 1/6, but as a first throw a 1 is as likely as any other result. We do not have enough information to say anything more about it.
Six throws are probably still too few to evaluate the die, so we would need to roll more…
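The "long run" claim can be checked with a quick simulation. This is a minimal Python sketch, not part of the original slides: the helper name `face_shares`, the seed and the numbers of rolls are arbitrary assumptions. It rolls a simulated fair die and reports the share of rolls that landed on each face.

```python
import random
from collections import Counter

def face_shares(n_rolls, seed=1):
    """Roll a simulated fair six-sided die n_rolls times and return the
    proportion of rolls that landed on each face 1..6."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    counts = Counter(rng.randint(1, 6) for _ in range(n_rolls))
    return {face: counts[face] / n_rolls for face in range(1, 7)}

# Few rolls give lumpy shares; many rolls settle near 1/6 for every face.
for n in (6, 60, 60000):
    shares = face_shares(n)
    print(n, {face: round(share, 3) for face, share in shares.items()})
```

With six rolls the shares typically swing far from 1/6, which is why six throws say little about the die.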
Is the Die Loaded?
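We roll the following sequence: 1 1 4 1 2 1 3 6 1 1 1 5
Testing this sequence against the expected one (every face equally likely) indicates that the die is loaded
• But a fair die would still produce a deviation this large about 1% of the time, so we may be wrong
We roll the following sequence: 2 6 2 6 6 4 6 5 4 1 3 4 4 6 5 3 5 3 2 5
• The numbers of 6s and 1s do not match the expected numbers
• But a fair die would produce a deviation at least this large about 70% of the time, so these rolls give no grounds to claim that the die is loaded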
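The speaker notes run this chi-squared goodness-of-fit test in R with `chisq.test`. The same test can be sketched in plain Python. The helper names `chi2_sf` and `die_fairness_test` are hypothetical, and the p-value comes from a series expansion of the chi-squared distribution (adequate for small statistics like these) rather than from a statistics library.

```python
import math
from collections import Counter

def chi2_sf(x, df):
    """Upper-tail probability P(X >= x) of a chi-squared distribution with df
    degrees of freedom, via the series for the regularized lower incomplete
    gamma function P(df/2, x/2)."""
    if x <= 0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    term = total = 1.0 / a
    n = 1
    while term > 1e-15 * total:  # add series terms until they stop mattering
        term *= z / (a + n)
        total += term
        n += 1
    lower = total * math.exp(a * math.log(z) - z - math.lgamma(a))
    return 1.0 - lower

def die_fairness_test(rolls, faces=6):
    """Chi-squared test of observed face counts against a fair die.
    Returns (X-squared statistic, degrees of freedom, p-value)."""
    counts = Counter(rolls)
    expected = len(rolls) / faces  # a fair die: same expected count per face
    stat = sum((counts[f] - expected) ** 2 / expected for f in range(1, faces + 1))
    df = faces - 1
    return stat, df, chi2_sf(stat, df)

first = [1, 1, 4, 1, 2, 1, 3, 6, 1, 1, 1, 5]
second = [2, 6, 2, 6, 6, 4, 6, 5, 4, 1, 3, 4, 4, 6, 5, 3, 5, 3, 2, 5]
print(die_fairness_test(first))   # X-squared = 15, df = 5, p ≈ 0.0104
print(die_fairness_test(second))  # X-squared = 2.8, df = 5, p ≈ 0.73
```

For the first sequence this reproduces the R output in the speaker notes (X-squared = 15, df = 5, p-value = 0.01036); the second sequence gives the roughly 70% figure on the slide.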
Boxplot
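[Figure: boxplot annotated with the median, the box spanning the interquartile range (IQR, the middle 50% of the data), and whiskers reaching at most 1.5 * IQR beyond the box]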
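The quantities a boxplot draws can be computed directly. This Python sketch assumes Tukey's convention (whiskers end at the most extreme data points within 1.5 * IQR of the box, points beyond are outliers); the helper name and example data are hypothetical, and quartile rules differ slightly between tools.

```python
import statistics

def boxplot_summary(data):
    """Numbers behind a Tukey-style boxplot: quartiles, IQR, whisker ends
    and outliers."""
    # 'inclusive' interpolation; other software may compute quartiles slightly differently
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1  # the box holds the middle 50% of the data
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = [x for x in data if low_fence <= x <= high_fence]
    return {
        "median": median,
        "q1": q1,
        "q3": q3,
        "iqr": iqr,
        "whiskers": (min(inside), max(inside)),
        "outliers": sorted(x for x in data if x < low_fence or x > high_fence),
    }

print(boxplot_summary([1, 2, 3, 4, 5, 6, 7, 8, 100]))
```

Here the value 100 lies far beyond 1.5 * IQR above the box, so it would be drawn as a separate point.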
Density and violin plot
A violin plot is a form of density plot
[Figure: density plot and data points]
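A density plot is usually a kernel density estimate, and a violin plot mirrors such a curve on both sides of an axis. A minimal Python sketch of a Gaussian kernel density estimate follows. The helper name, the hypothetical normally distributed sample and the bandwidth choice (Silverman's rule of thumb) are assumptions; plotting tools may pick the bandwidth differently.

```python
import math
import random
import statistics

def gaussian_kde(data, bandwidth=None):
    """Return a function estimating the probability density of `data` by
    averaging Gaussian bumps centred on the observations."""
    n = len(data)
    if bandwidth is None:
        # Silverman's rule of thumb (an assumed default; tools differ)
        q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")
        sd = statistics.stdev(data)
        iqr = q3 - q1
        spread = min(sd, iqr / 1.34) if iqr > 0 else sd
        bandwidth = 0.9 * spread * n ** -0.2
    scale = 1.0 / (n * bandwidth * math.sqrt(2 * math.pi))

    def density(x):
        # Sum of normal curves, one per observation, scaled to integrate to 1
        return scale * sum(math.exp(-0.5 * ((x - xi) / bandwidth) ** 2) for xi in data)

    return density

rng = random.Random(42)  # hypothetical sample, not data from the slides
sample = [rng.gauss(0, 1) for _ in range(200)]
f = gaussian_kde(sample)
for x in (-2, -1, 0, 1, 2):
    print(x, round(f(x), 3))
```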
Scatter plot
[Figure: scatter plot of Variable 1 (horizontal axis, -2 to 2) against Variable 2 (vertical axis, -3 to 3)]
A scatter plot shows the values of two variables
• For example, how each participant answered two questions
Random sampling: predicting election results
- It is not practically possible to ask everyone how they will vote
- Instead, we pick a random sample of people and ask them
However, we know that there is uncertainty here: if we randomly sample again, we might get something else
Sample 1: A: 37.6%, B: 12.3%, C: 33.1%, D: 5.2%, …
Sample 2: A: 36.9%, B: 13.0%, C: 32.7%, D: 6.1%, …
Sample 3: A: 38.7%, B: 11.0%, C: 31.7%, D: 6.3%, …
We can estimate the uncertainty, but we need to make some assumptions
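Why a second random sample gives different numbers can be seen by simulating a poll. This is a minimal Python sketch under stated assumptions: the true support for A (37%) and the sample size (1000) are hypothetical, and each respondent is treated as independently supporting A with that probability.

```python
import random

def poll(population_support, sample_size, rng):
    """Ask `sample_size` randomly chosen voters whether they support A and
    return the share who do. Each voter supports A with probability
    `population_support`, which stands in for the unknown true value."""
    yes = sum(rng.random() < population_support for _ in range(sample_size))
    return yes / sample_size

rng = random.Random(2018)
TRUE_SUPPORT = 0.37  # hypothetical population value, unknown in a real poll
SAMPLE_SIZE = 1000   # hypothetical sample size
estimates = [poll(TRUE_SUPPORT, SAMPLE_SIZE, rng) for _ in range(3)]
print([f"{e:.1%}" for e in estimates])
```

Each run of `poll` plays the role of one "We get:" result: the estimates scatter around the true value without matching it exactly.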
Normal distribution
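[Figure: normal distribution with the horizontal axis marked at -2𝜎, -1𝜎, 0𝜎, 1𝜎 and 2𝜎; 68.3% of the data lies between -1𝜎 and 1𝜎 and 95.4% between -2𝜎 and 2𝜎]
𝜎 = standard deviation
• describes the width of the distribution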
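The percentages in the figure follow from the normal distribution itself: the share within k standard deviations of the mean is erf(k / √2). A short Python check (the helper name is hypothetical) also shows where the 1.96 used for 95% comes from.

```python
import math

def share_within(k):
    """Share of a normal distribution lying within k standard deviations
    of the mean: P(|X - mu| < k * sigma) = erf(k / sqrt(2))."""
    return math.erf(k / math.sqrt(2))

for k in (1, 1.96, 2):
    print(k, f"{share_within(k):.1%}")  # 68.3%, 95.0%, 95.4%
```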
Back to polling
[Figure: distribution of support for A with the horizontal axis marked at -1.96𝜎, 0𝜎 and 1.96𝜎; marked values 36.1%, 38.7% and 37.6%]
95% of the population distribution lies within ±1.96𝜎; the sample distribution behaves similarly.
We cannot know where in the population distribution our observations fall (red vertical lines).
We do not know the true population value (black vertical line).
However, with 95% certainty, what we observed falls in the area between -1.96𝜎 and 1.96𝜎.
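The "95% certainty" statement can be checked by repeating a simulated poll many times and counting how often the observed support lands within ±1.96𝜎 of the true value. In this Python sketch the true support (37%), the sample size (500) and the number of polls (1000) are hypothetical choices, and 𝜎 is the standard error of a proportion, √(p(1 − p)/N).

```python
import math
import random

def share_of_polls_within(true_support, sample_size, n_polls, seed=0):
    """Run many simulated polls and return the share whose observed support
    falls within +/-1.96 standard errors of the true support."""
    rng = random.Random(seed)
    se = math.sqrt(true_support * (1 - true_support) / sample_size)
    hits = 0
    for _ in range(n_polls):
        observed = sum(rng.random() < true_support for _ in range(sample_size)) / sample_size
        if abs(observed - true_support) <= 1.96 * se:
            hits += 1
    return hits / n_polls

print(share_of_polls_within(0.37, 500, 1000))  # roughly 0.95
```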
Random sampling
Instead of uncertainty, we usually speak of confidence.
A confidence interval (CI), usually 95%, is a function of the sample size and the probability of someone choosing a candidate:
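$$0.376 \pm 1.96\,\sigma_A, \qquad \sigma_A = \sqrt{\frac{0.376\,(1 - 0.376)}{N}}$$
where 0.376 is the observed support for A, 1.96 gives the 95% level and N is the sample size.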
We can work backwards from the sample distribution and estimate the uncertainty in what we observed when polling
• When we poll next time, with 95% certainty what we observe falls in the area between -1.96𝜎 and 1.96𝜎
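The interval formula for candidate A translates directly into code. This Python sketch implements the normal-approximation (Wald) interval for a proportion; the slides do not give N, so N = 1000 here is a hypothetical sample size.

```python
import math

def proportion_ci(p_hat, n, z=1.96):
    """Normal-approximation (Wald) confidence interval for a proportion:
    p_hat +/- z * sqrt(p_hat * (1 - p_hat) / n). z = 1.96 gives 95%."""
    sigma = math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - z * sigma, p_hat + z * sigma

low, high = proportion_ci(0.376, 1000)  # N = 1000 is a hypothetical sample size
print(f"{low:.1%} to {high:.1%}")  # 34.6% to 40.6%
```

Because N sits under the square root, quadrupling the sample size halves the width of the interval.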
Are two means different, t-test?
[Figure: distributions of groups A and B, with the difference ∆ between their means]
We have two sample means A and B
Their difference is ∆=B-A
The mean is calculated from the sampled values
$\text{Mean}(A) = \frac{\sum a}{n}$ (for normally distributed variables)
To infer whether there is a difference between groups A and B at the population level (the populations from which A and B were sampled), we need to account for uncertainty. Again, the population mean and the sample mean can be different.
Are two means different, t-test?
[Figure: distributions of groups A and B, with the difference ∆ between their means]
We have two sample means A and B
Their difference is ∆=B-A
The t statistic describes the difference in a way that takes into account the variance (𝜎²) and the sample size
p is the probability of observing data at least as extreme as ours if the null hypothesis is true; the null hypothesis of the t-test is that the means are not different.
p depends on the t-value and the sample size; a larger absolute t-value means a lower p.
p = 0.05 means that, if there really were no difference, data deviating at least this much from the expected would be observed 5% of the time. p < 0.05 is a typical criterion for a statistically significant result.
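How the t statistic combines the difference ∆ = B − A with variance and sample size can be sketched in Python. This computes Welch's t statistic (a t-test variant that does not assume equal variances) and its Welch–Satterthwaite degrees of freedom; the helper name and data are hypothetical. The p-value would then come from the t distribution with those degrees of freedom, for example via `scipy.stats.ttest_ind(a, b, equal_var=False)` if SciPy is available; that call is not part of this sketch.

```python
import math
import statistics

def welch_t(a, b):
    """Welch's t statistic for delta = mean(b) - mean(a), and its
    Welch-Satterthwaite degrees of freedom."""
    va = statistics.variance(a) / len(a)  # squared standard error of mean(a)
    vb = statistics.variance(b) / len(b)  # squared standard error of mean(b)
    delta = statistics.mean(b) - statistics.mean(a)
    t = delta / math.sqrt(va + vb)  # larger when delta is big relative to noise
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df

group_a = [1, 2, 3, 4, 5]  # hypothetical data
group_b = [3, 4, 5, 6, 7]
print(welch_t(group_a, group_b))  # (2.0, 8.0)
```

The standard deviation is the square root of the variance, so `statistics.stdev(a) ** 2` would equal `statistics.variance(a)` up to rounding.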
Are three means different, one-way ANOVA?
• One-way ANOVA is similar to the t-test, but compares more than two groups
• The F-statistic compares the variation between the group means with the variation within the groups, taking the sample size into account
• p is the probability of observing data at least as extreme as ours if the null hypothesis is true; the null hypothesis of ANOVA is that the means are not different
• A significant result (p<0.05) tells us that at least one mean differs from the others
• But not which
• Post hoc comparisons are needed to determine which group differs from which
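The F-statistic can be computed by hand as the ratio of between-group to within-group mean squares. This Python sketch uses a hypothetical helper and toy data; a p-value would come from the F distribution, for example via `scipy.stats.f_oneway` if SciPy is available, which is not used here.

```python
import statistics

def one_way_anova_f(*groups):
    """One-way ANOVA F statistic: between-group mean square divided by
    within-group mean square. Returns (F, df_between, df_within)."""
    all_values = [x for g in groups for x in g]
    grand_mean = statistics.mean(all_values)
    group_means = [statistics.mean(g) for g in groups]
    k, n = len(groups), len(all_values)
    # Spread of the group means around the grand mean
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    # Spread of the values around their own group mean
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)
    df_between, df_within = k - 1, n - k
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within

print(one_way_anova_f([1, 2, 3], [4, 5, 6], [7, 8, 9]))  # (27.0, 2, 6)
```

A large F only says that some mean differs; which one still needs post hoc comparisons.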
Correlation
Correlation (r) describes the strength (and direction) of the association between two variables
p is the probability of observing a correlation at least this strong if the null hypothesis (that there is no relation between the two variables) were true
Correlation does not tell whether v1 causes v2 or vice versa
• There is a strong correlation between ice cream sales and drownings
• Neither is causing the other
• A third variable, temperature, is related to both
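The ice cream example can be reproduced with simulated data. This Python sketch computes Pearson's r by hand; the data are entirely hypothetical: temperature drives both ice cream sales and drownings, and neither of those two affects the other, yet their correlation comes out strong.

```python
import math
import random

def pearson_r(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical simulation: temperature is the only cause of both variables.
rng = random.Random(7)
temperature = [rng.uniform(0, 30) for _ in range(500)]
ice_cream = [2.0 * t + rng.gauss(0, 5) for t in temperature]
drownings = [0.3 * t + rng.gauss(0, 1.5) for t in temperature]
print(round(pearson_r(ice_cream, drownings), 2))  # strong positive r
```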

Editor's Notes

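  • Note on "Is the Die Loaded?": see https://stats.stackexchange.com/questions/3194/how-can-i-test-the-fairness-of-a-d20/3735#3735
    chisq.test(table(c(1,1,4,1,2,1,3,6,1,1,1,5)), p = rep(1/6, 6))
    Output: Chi-squared test for given probabilities, data: table(c(1, 1, 4, 1, 2, 1, 3, 6, 1, 1, 1, 5)), X-squared = 15, df = 5, p-value = 0.01036 (R also warns that the chi-squared approximation may be incorrect, because every expected count is 2, below 5)
    Note that we cannot test whether the die is unbiased. We can only test whether it behaves unexpectedly enough.
    rolls <- sample(1:6, 20, replace = TRUE)  # 20 rolls of a d6
    chisq.test(table(factor(rolls, levels = 1:6)), p = rep(1/6, 6))  # factor() keeps faces that never came up, so the counts match the 6 probabilities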
  • Note on "Random sampling: predicting election results": Polling is done via random sampling, e.g. from a telephone directory. However, people who own a phone and people who vote are not the same population, so the poll results are systematically off. There are techniques to counter this sampling bias, especially in the case of voting, where it is possible to compare the election results with the poll results.
  • Note on "Normal distribution" and "Back to polling": 𝜎 = standard deviation, describes the width of the distribution. Black vertical line: population value. Red vertical lines: sample values.
  • Note on the t-test (variance 𝜎²): The standard deviation is the square root of the variance.