“The human understanding, on account of its
own nature, readily supposes a greater order
and uniformity in things than it finds. And ...
it devises parallels and correspondences and
relations which are not there.”
—Francis Bacon, 1620
Wednesday, 10 November 2010
Is what we see really there?
October 2010
Hadley Wickham, Dianne Cook,
Heike Hofmann, Andreas Buja
Graphical inference
for infovis
Which one of these plots is not like the others?
Which of these plots just doesn’t belong?
7 of those plots were plots of random
(null) data. 1 plot was the real data.
If you correctly picked the true
plot from the null plots then we
have evidence that it really is
different.
In fact, we have rigorous statistical
evidence that there is a difference, just
using Sesame Street skills!
1. The statistical justice system
2. Line up protocol
3. Rorschach protocol
4. Future work
http://www.flickr.com/photos/joegratz/117048243
The statistical justice system
Hypothesis testing?
H0: null hypothesis          Defence
Ha: alternative hypothesis   Prosecution
Null distribution            Innocents
Reject the null              Guilty
Fail to reject the null      Not guilty
p-value                      Probability that a truly innocent dataset would look as guilty as the suspect
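The p-value analogy can be made concrete with a small simulation. The sketch below is only an illustration: the two groups of measurements are made up, the test statistic (absolute difference in group means) and the permutation null are assumptions chosen for simplicity, and none of the names come from the talk.

```r
# Hypothetical data: two groups of measurements (made up for illustration).
set.seed(2010)
group <- rep(c("a", "b"), each = 20)
value <- c(rnorm(20, mean = 0), rnorm(20, mean = 1))

# Test statistic: absolute difference in group means ("how guilty it looks").
mean_diff <- function(value, group) {
  abs(mean(value[group == "a"]) - mean(value[group == "b"]))
}

# The suspect: the statistic computed on the real data.
observed <- mean_diff(value, group)

# The innocents: statistics from datasets whose labels are shuffled,
# so any difference between groups is due to chance alone.
null_stats <- replicate(2000, mean_diff(value, sample(group)))

# p-value: share of innocent datasets that look at least as guilty as the
# suspect (the +1 counts the observed data as one of the permutations).
p_value <- (sum(null_stats >= observed) + 1) / (length(null_stats) + 1)
p_value
```

A small p-value means that few innocent datasets look as guilty as the suspect, which is the courtroom reading of "reject the null".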
Line up
[Figure: five tag clouds, each showing the words believe, case, closely, descendants, few, long, modified, variations, very and view once per edition]
Five tag clouds of selected words from the 1st (red) and 6th (blue)
editions of Darwin’s “Origin of Species”. Four of the tag clouds were
generated under the null hypothesis of no difference between editions,
and one is the true data. Can you spot it?
Protocol
Generate n-1 decoys
(null datasets)
Plot the decoys + the real data
(randomly positioned)
Show to an impartial observer.
Can they spot the real data?
If so, you have evidence for true difference
(p-value = 1/n)
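The steps of the protocol can be sketched in a few lines of base R. This is a minimal illustration, not the nullabor implementation: make_lineup and permute_y are hypothetical helper names, the data are simulated, and permutation is assumed to be a suitable way to generate decoys.

```r
# Build a lineup: n - 1 decoys plus the real data, randomly positioned.
# `null_gen` is a hypothetical user-supplied function that takes the real
# data frame and returns one null dataset of the same shape.
make_lineup <- function(real, null_gen, n = 20) {
  pos <- sample.int(n, 1)                   # secret position of the real data
  panels <- lapply(seq_len(n), function(i) {
    d <- if (i == pos) real else null_gen(real)
    d$.sample <- i                          # panel id, used for faceting
    d
  })
  list(data = do.call(rbind, panels), pos = pos, p_value = 1 / n)
}

# Example null generator: permuting y breaks any association between x and y.
permute_y <- function(d) {
  d$y <- sample(d$y)
  d
}

# Simulated "real" data with a weak linear relationship.
set.seed(1)
real <- data.frame(x = 1:30)
real$y <- 0.1 * real$x + rnorm(30)

lu <- make_lineup(real, permute_y, n = 8)

# Plot lu$data with one panel per .sample and show it to the observer.
# Reveal lu$pos only afterwards; a correct pick gives p-value 1/n = 0.125.
```

Keeping the position inside the returned object, rather than printing it, makes it easier for the person building the lineup to stay blind to it as well.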
E. L. Scott, C. D. Shane, and M. D. Swanson. Comparison of the synthetic and actual distribution of galaxies on a
photographic plate. Astrophysical Journal, 119:91–112, Jan. 1954.
A. M. Noll. Human or machine: A subjective comparison of Piet Mondrian’s “composition with lines” (1917) and a computer-
generated picture. The Psychological Record, 16:1–10, 1966.
vs. classical tests
Of course, if we know what we’re looking
for, we can always develop an algorithm
or numerical test.
The advantage of visual inference is that
it works for very general tasks, including
when you don’t know exactly what you’re
looking for.
Power of the test
[Figure: power curves for the theoretical test and the visual test, with lower and upper confidence limits. Panels: sigma = 12 and sigma = 5, by sample size = 100 and sample size = 300. x-axis from -15 to 15; y-axis: power, from 0.0 to 1.0]
Recent work shows that power is only
a little worse than that of the classical test.
Plot             Task
Choropleth map   Is there a spatial trend?
Treemap          Is the distribution in higher level categories the same?
Scatterplot      Are the two variables independent?
Time series      Is there a trend in mean or variability?
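Each task in the table implies a null hypothesis, and each null hypothesis suggests a way to generate decoys. The sketch below covers two of the rows. It assumes that permutation is an appropriate null-generating mechanism (for the time series this ignores any autocorrelation), and the function names, column names and data are hypothetical.

```r
# Scatterplot: "Are the two variables independent?"
# Under independence any pairing of x and y is equally likely, so permute y.
null_scatter <- function(d) {
  d$y <- sample(d$y)
  d
}

# Time series: "Is there a trend in mean or variability?"
# Under no trend (and assuming exchangeable observations) the order of the
# values carries no information, so shuffle the values across time points.
null_series <- function(d) {
  d$value <- sample(d$value)
  d
}

# Simulated series with an upward trend in the mean.
set.seed(42)
series <- data.frame(time = 1:50, value = 0.2 * (1:50) + rnorm(50))
decoy <- null_series(series)

cor(series$time, series$value)  # strong trend in the real data
cor(decoy$time, decoy$value)    # shuffling is expected to weaken the trend
```

Both generators keep the marginal distributions intact and destroy only the structure that the task asks about, which is what lets a decoy pass as an "innocent" plot.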
Once we’ve seen the plot,
we’re no longer impartial
Code
# Support package written in R
# http://github.com/ggobi/nullabor
# Provides reference implementation of ideas
library(nullabor)
library(ggplot2)
qplot(angle * 180 / pi, r, data = threept) %+%
lineup(null_model(r ~ poly(angle, 2)), n = 10) +
facet_wrap(~ .sample, ncol = 5)
Rorschach
We’re surprisingly bad at appreciating the
amount of variation in random data.
Showing only null plots is a good way to
calibrate our intuition.
We also plan on using these plots as an
empirical tool to understand what features
people pick up on. Anecdotally,
undergrads focus too much on outliers.
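A Rorschach display can be sketched in base R by plotting nothing but null data. The example below is an illustration under stated assumptions: the null is taken to be uniform on [0, 1], and the nine panels of 1000 observations and the bin width are arbitrary choices, not values from the talk.

```r
# Rorschach display: every panel is null data, so any "pattern" is chance.
set.seed(10)
n_panels <- 9
n_obs <- 1000

# Assumed null for illustration: values uniform on [0, 1].
nulls <- replicate(n_panels, runif(n_obs), simplify = FALSE)

# Shared bins so the panels are directly comparable.
breaks <- seq(0, 1, length.out = 21)

# Draw the panels in a 3 x 3 grid.
old_par <- par(mfrow = c(3, 3), mar = c(2, 2, 2, 1))
for (i in seq_len(n_panels)) {
  hist(nulls[[i]], breaks = breaks, main = paste("Sample", i),
       xlab = "", ylab = "")
}
par(old_par)

# Bin counts per panel (one column per panel): the range of counts shows
# how much a flat distribution varies from sample to sample.
counts <- sapply(nulls, function(x) hist(x, breaks = breaks, plot = FALSE)$counts)
apply(counts, 2, range)
```

Looking at several such displays before analysing real data gives a feel for how uneven a flat distribution can look by chance.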
[Figure: nine histograms (panels 1 to 9). x-axis: result, from 0.0 to 1.0; y-axis: count, from 0 to 100]
Future work
How can visual inference be integrated
into visualisation software at a
fundamental level?
How does training impact results? How do
novices vs. experts differ?
What patterns do people pick up on?
What are the alternatives that people
respond to?
Questions?
This work is licensed under the Creative
Commons Attribution-Noncommercial 3.0 United
States License. To view a copy of this license,
visit http://creativecommons.org/licenses/by-nc/3.0/us/
or send a letter to Creative Commons,
171 Second Street, Suite 300, San Francisco,
California, 94105, USA.