OBJECTIVES OF STATISTICS AND VISUALIZATION
• It's a revolution. We're really just getting under way. But
the march of quantification, made possible by enormous
new sources of data, will sweep through academia,
business and government. There is no area that is going
to be untouched. Gary King
All of us now are being blasted by information design. It's
being poured into our eyes through the Web, and we're all
visualizers now; we're all demanding a visual aspect to our
information...And if you're navigating a dense information
jungle, coming across a beautiful graphic or a lovely data
visualization, it's a relief, it's like coming across a clearing
in the jungle.
David McCandless
There is more data available to us now than we can possibly process. Every
minute, Internet users add the following to the big data pool (i):
204,166,667 email messages sent
More than 2,000,000 Google searches
684,478 pieces of content added on Facebook
$272,070 spent by consumers via online shopping
More than 100,000 tweets on Twitter
47,000 app downloads from Apple
34,722 "likes" on Facebook for different brands and organizations
27,778 new posts on Tumblr blogs
3,600 new photos on Instagram
3,125 new photos on Flickr
2,083 check-ins on Foursquare
571 new websites created
347 new blog posts published on Wordpress
217 new mobile web users
48 hours of new video on YouTube
http://www.ted.com/read/ted-studies/statistics/introductory-essay
Data, data everywhere: how much sense
does it make?
What can good data presentation do?
How the Recession Reshaped the
Economy, in 255 Charts
'How the Recession Reshaped the Economy, in 255 Charts' is sharing the
'pièce de résistance de 2014' prize so far and probably stands up well against
anything else produced in recent memory. It is a modern masterpiece: a
supremely conceived and executed interactive graphic produced by Jeremy
Ashkenas and Alicia Parlapiano for The Upshot (more later), showing how job
numbers have fared across 255 different industries in the past 10 years.
Pigeonholing it is hard: it is a small-multiple, scatterplotted area-chart
extravaganza. Some of the tool-tip interactivity detail is extraordinary, which
makes the single complaint, that the colour scale fails on the red-green
issue, a slightly surprising oversight, although the position and direction of
the charts provide the main encoding.
http://www.nytimes.com/interactive/2014/06/05/upshot/how-the-recession-reshaped-the-economy-in-255-charts.html?&_r=1&abt=0002&abg=1
http://postgraphics.tumblr.com/
Is it data or the visualization that we like?
Introduction to data visualization 1
Visualizing uncertainty
improving-visualisation.org
http://www.visualisingdata.com/index.php/2015/02/references-visualising-uncertainty/
http://improving-visualisation.org/
3D London maps, overlaid with air quality data
http://www.londonair.org.uk/london/asp/virtualmaps.asp?view=maps
http://www.newscientist.com/data/images/archive/2731/27311601.jpg
A modification of the basic histogram in order to display information for two variables.
In this case, the example shows the distribution of some reaction times by gender.
http://addictedtor.free.fr/graphiques/RGraphGallery.php?graph=136
Source:
P. J. Rousseeuw, I. Ruts, and J. W. Tukey (1999). The bagplot: A bivariate boxplot. The American Statistician, 53:382-387.
http://addictedtor.free.fr/graphiques/RGraphGallery.php?graph=112
A generalisation of the boxplot, this visualisation relies on the median (red star)
and the representation of 50% of the observations (dark blue area).
http://www.informationisbeautiful.net/
Guardian US interactive team (News)
http://www.guardian.co.uk/world/interactive/2012/may/08/gay-rights-united-states
Textual cross references found in the Bible.
Chris Harrison's Website: page 18 (Local Area Agreement)
http://www.chrisharrison.net/projects/bibleviz/index.html
Representation of all solar, lunar and
interplanetary missions.
http://books.nationalgeographic.com/map/map-day/2008/09/18
Obama's economic stimulus plan
What is statistics?
(and why is it so cool?)
• The study of the collection, organization, analysis, and
interpretation of data
• Why bother? Its principles provide a framework for
  • data collection
  • design of experiments and observational studies
  • drawing inferences about populations as a whole and about
    future events
• Short story: Statistics is the science of using data to prove
a point (hopefully forming a correct conclusion).
3 Major Overarching Topics in the Course
1) Describing Data
- Graphically
- Numerically
2) Gathering Data
- Experimental Design and Randomization
- Observational Data and Random Sampling
- Randomness and Probability Models
3) Inferring from Data
- showing statistical significance and p-values
- estimation (confidence intervals) and hypothesis testing
Histograms can be sensitive to the
definition of the bins (IPS, Ex 1.4)
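The sensitivity to bin definition can be seen in a minimal sketch. The values below are made up for illustration (they are not the IPS Example 1.4 data), and `bin_counts` and `show` are hypothetical helpers written for this example. The same ten observations look like a flat block, two separated clusters, or a lopsided shape depending only on the bin width and the bin origin.

```python
def bin_counts(values, start, width, n_bins):
    """Count values falling in n_bins equal-width bins [edge, edge + width)."""
    counts = [0] * n_bins
    for v in values:
        k = int((v - start) // width)  # index of the bin containing v
        if 0 <= k < n_bins:
            counts[k] += 1
    return counts


def show(label, counts):
    """Print a crude text histogram, one row of '#' per bin."""
    print(label)
    for c in counts:
        print("  " + "#" * c)


# Illustrative values only; not the IPS Example 1.4 data.
data = [1, 2, 2, 3, 3, 7, 8, 8, 9, 9]

wide = bin_counts(data, start=0, width=5, n_bins=2)      # two wide bins
narrow = bin_counts(data, start=0, width=2, n_bins=5)    # five narrow bins
aligned = bin_counts(data, start=0, width=4, n_bins=3)   # width 4, origin 0
shifted = bin_counts(data, start=-1, width=4, n_bins=3)  # width 4, origin -1

show("width 5, origin 0", wide)
show("width 2, origin 0", narrow)
show("width 4, origin 0", aligned)
show("width 4, origin -1", shifted)
```

With wide bins the gap between the two clusters disappears; with narrow bins it shows up as an empty bin. Shifting the origin while keeping the width fixed changes the apparent shape again.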
Effect of Shape on Mean and
Median
• In a right-skewed distribution, the mean is typically
greater than the median
• In a left-skewed distribution, the mean is typically less
than the median
• In a symmetric distribution, the mean is
approximately (sometimes exactly) equal to the
median
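A small check of these three cases, using Python's standard `statistics` module. The data sets are invented for illustration. The pattern is a rule of thumb for typical single-peaked data, not a guarantee for every distribution.

```python
from statistics import mean, median

# Illustrative data sets (assumptions for this sketch).
right_skewed = [1, 2, 2, 3, 3, 4, 20]        # long tail to the right
left_skewed = [-20, -4, -3, -3, -2, -2, -1]  # mirror image: long tail to the left
symmetric = [1, 2, 3, 4, 5, 6, 7]

for name, values in [
    ("right-skewed", right_skewed),
    ("left-skewed", left_skewed),
    ("symmetric", symmetric),
]:
    print(f"{name:>12}: mean = {mean(values)}, median = {median(values)}")
```

The single large value 20 pulls the mean up to 5 while the median stays at 3, because the median depends only on the ordering of the values.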
Measuring Spread (Variability) in Data
Two common methods
1. Variance and standard deviation
• Measure spread about the mean
• Most often used, but sensitive to extreme values,
especially in skewed distributions
2. Quantiles and percentiles
• Median
• Quartiles and more general percentiles
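The two approaches can be compared in a short sketch. It assumes the sample (n − 1) standard deviation and the default quartile rule of Python's `statistics.quantiles`; the slides do not say which conventions the course uses, and the data are invented. Replacing one value with an extreme one inflates the standard deviation but leaves the interquartile range unchanged.

```python
from statistics import quantiles, stdev


def iqr(values):
    """Interquartile range (Q3 - Q1) using the module's default quartile rule."""
    q1, _, q3 = quantiles(values, n=4)
    return q3 - q1


# Illustrative data (assumption): identical except for the largest value.
base = [10, 12, 13, 15, 16, 18, 20]
with_outlier = [10, 12, 13, 15, 16, 18, 200]

for name, values in [("base", base), ("with outlier", with_outlier)]:
    print(f"{name:>12}: SD = {stdev(values):.2f}, IQR = {iqr(values):.2f}")
```

This is why quartile-based summaries are often preferred for strongly skewed data, while the standard deviation pairs naturally with the mean for roughly symmetric data.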
Another Plot Type – Box plots
• Box plots are designed to show clearly the center, spread
(especially IQR), and outliers
• They are based on the five-number summary
• Minimum, Q1, Median, Q3, Maximum
• Easiest to explain with an example, using the tuition data.
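The five-number summary behind a box plot can be sketched as follows. The data are invented (this is not the tuition data set), `five_number_summary` is a hypothetical helper, and the 1.5 × IQR rule for flagging outliers is a common convention assumed here rather than something stated on the slide. Quartile conventions differ between software packages, so other tools may report slightly different Q1 and Q3.

```python
from statistics import quantiles


def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum)."""
    q1, med, q3 = quantiles(values, n=4, method="inclusive")
    return min(values), q1, med, q3, max(values)


# Illustrative data only (assumption for this sketch).
data = [2, 4, 4, 5, 6, 7, 8, 9, 30]

lowest, q1, med, q3, highest = five_number_summary(data)
iqr = q3 - q1

# Common convention (assumed): points beyond 1.5 * IQR from the box are outliers.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [v for v in data if v < lower_fence or v > upper_fence]

print("five-number summary:", (lowest, q1, med, q3, highest))
print("IQR:", iqr, "fences:", (lower_fence, upper_fence))
print("outliers:", outliers)
```

In the plot, the box spans Q1 to Q3 with a line at the median, and points outside the fences are drawn individually.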
Box plot of Per Capita Income
From Stata Documentation
. summarize income, detail

                          income
      Percentiles      Smallest
 1%      26535           26535
 5%      27935           27897
10%      29515           27935       Obs                 51
25%      31891           29108       Sum of Wgt.         51

50%      34257                       Mean          35470.67
                       Largest       Std. Dev.     5734.745
75%      38712           45877
90%      42392           46344       Variance      3.29e+07
95%      46344           49852       Skewness      1.275147
99%      55755           55755       Kurtosis       5.07413

[Box plot of income; axis from 20,000 to 60,000]
Effects of linear transformation
y_i = a + b × x_i
• mean (Y) = a + b × mean (X)
• median (Y) = a + b × median (X)
• variance (Y) = b² × variance (X)
• SD (Y) = |b| × SD (X)
• IQR (Y) = |b| × IQR (X)
In this notation, x_i and y_i are used for individual
observations, and X and Y are used for the collections of
x_i's and y_i's
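These rules can be checked numerically. The sketch below uses invented data and coefficients, with a negative b chosen on purpose to show why the spread measures scale by |b| rather than b. It uses the population variance and standard deviation from Python's `statistics` module; the same relations hold for the sample versions.

```python
import math
from statistics import mean, median, pstdev, pvariance, quantiles


def iqr(values):
    """Interquartile range using the statistics module's default quartiles."""
    q1, _, q3 = quantiles(values, n=4)
    return q3 - q1


# Illustrative data and coefficients (assumptions for this sketch).
x = [31, 33, 34, 36, 38, 41, 45, 52]
a, b = 10, -3  # negative b shows why SD and IQR use |b|
y = [a + b * xi for xi in x]

checks = {
    "mean": math.isclose(mean(y), a + b * mean(x)),
    "median": math.isclose(median(y), a + b * median(x)),
    "variance": math.isclose(pvariance(y), b**2 * pvariance(x)),
    "sd": math.isclose(pstdev(y), abs(b) * pstdev(x)),
    "iqr": math.isclose(iqr(y), abs(b) * iqr(x)),
}

for rule, holds in checks.items():
    print(f"{rule:>8}: {'holds' if holds else 'fails'}")
```

The shift a moves the measures of center but drops out of every measure of spread, which is why changing units (for example, Celsius to Fahrenheit) only rescales the spread.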
Density Curves
• Always on or above the horizontal axis (never negative)
• Area under the curve equals 1
• Median is the point where 50% of the area is to the left
and 50% is to the right
• Mean is the “balance point” of the curve
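The properties of a density curve can be illustrated numerically. The sketch below assumes an exponential density with rate 1, chosen only because its exact median (ln 2) and mean (1) are known; the slides do not single out a particular curve. It approximates the area, the median, and the balance point with the trapezoid rule on a fine grid, truncating the right tail at 40 where the remaining area is negligible.

```python
import math


def density(x):
    """Exponential density with rate 1 (an illustrative choice)."""
    return math.exp(-x) if x >= 0 else 0.0


upper = 40.0          # truncation point for the right tail
n_slices = 40_000
step = upper / n_slices
grid = [i * step for i in range(n_slices + 1)]

area = 0.0            # running area under the curve
moment = 0.0          # running sum of (position * slice area)
median_est = None
for left, right in zip(grid, grid[1:]):
    slice_area = 0.5 * (density(left) + density(right)) * step
    area += slice_area
    moment += 0.5 * (left + right) * slice_area
    if median_est is None and area >= 0.5:
        median_est = right  # first grid point with half the area to its left

mean_est = moment / area  # balance point of the curve

print(f"area   = {area:.4f}")
print(f"median = {median_est:.3f}  (exact: {math.log(2):.3f})")
print(f"mean   = {mean_est:.3f}  (exact: 1.000)")
```

Because this curve is right-skewed, the balance point lies to the right of the median, consistent with the earlier slide on the effect of shape on the mean and median.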



Editor's Notes

  • #6: There are good ways and bad ways of presenting data.
  • #25: An interactive dashboard that explores how the handling of gay rights issues varies by state and follows regional trends across the US