TODAY!
Review of Descriptive Statistics
•  Mean, Median, Mode, Standard Deviation
•  Frequency tables and histograms
•  The Normal Distribution (AKA: Gaussian, Bell-curve)
BREAK
Descriptive Statistics with Excel
•  Find Mean, Median, Mode, and Standard Deviation for two different
example distributions
•  Create histograms for the two different distributions
•  Compare them with the normal distribution
Descriptive Statistics with Tableau
•  Create histogram for an example distribution
•  Compare it to the normal distribution
BREAK
Introduction to Processing
•  Create histogram for an example distribution
•  Compare it to the normal distribution.
But First:
-  For those of you interested in tracking search trends, check out Google Trends
https://www.google.ca/trends/
-  For those of you interested in tracking your site traffic, look at Google Analytics
http://www.google.com/analytics/
-  Unfinished from last class: Geographical Representations using Tableau
Descriptive Statistics Explained Using Cows
Farmer   Number of Cows
John     1
Bob      2
Sue      2
Mary     3
Jim      4
Sally    5
Fran     5
Pat      6
Bill     18
Tom      20

1) Find the average number of cows per farmer.

$$\text{mean} = \frac{\text{total cows}}{\text{number of farmers}} = \frac{1+2+2+3+4+5+5+6+18+20}{10} = \frac{66}{10} = 6.6$$

Average (mean) is 6.6 cows per farmer
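The same arithmetic can be checked in a few lines of Python. This is only a sketch: the list name `cows` is an illustrative choice, and the values are copied from the farmer table above.

```python
from statistics import mean

# Number of cows owned by each of the ten farmers (from the table above).
cows = [1, 2, 2, 3, 4, 5, 5, 6, 18, 20]

total_cows = sum(cows)            # 66
number_of_farmers = len(cows)     # 10
average = total_cows / number_of_farmers

print(average)      # 6.6
print(mean(cows))   # 6.6, the standard library agrees
```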
2) Find the median number of cows per farmer.
The median is the value that separates the lower
half of the distribution from the higher half of the
distribution.
Half of all farmers have more cows than the
median, and half of the farmers have fewer cows
than the median.
The median of this distribution is 4.5, the average of the two middle values (4 and 5).
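With an even number of farmers there is no single middle value, so the median is taken as the average of the two middle ones. A minimal Python sketch of that rule, assuming the same ten values as in the farmer table (the variable names are illustrative):

```python
from statistics import median

cows = [1, 2, 2, 3, 4, 5, 5, 6, 18, 20]

ordered = sorted(cows)
n = len(ordered)
middle = n // 2

if n % 2 == 0:
    # Even count: average the two values on either side of the midpoint.
    manual_median = (ordered[middle - 1] + ordered[middle]) / 2
else:
    # Odd count: the middle value itself.
    manual_median = ordered[middle]

print(manual_median)   # 4.5
print(median(cows))    # 4.5
```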
3) Find the mode number of cows per farmer.
The mode is the value that appears most often in
a set of data.
In a probability distribution, it is the most probable
value in the distribution.
A distribution that has two modes is said to be
bimodal.
The modes of this distribution are 2 and 5
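One way to find every mode is to count how often each value occurs and keep the values that share the highest count. The sketch below assumes Python 3.8 or later for `statistics.multimode`; the other names are illustrative.

```python
from collections import Counter
from statistics import multimode

cows = [1, 2, 2, 3, 4, 5, 5, 6, 18, 20]

counts = Counter(cows)            # value -> number of farmers with that value
highest = max(counts.values())    # the largest count (2 in this data)
modes = sorted(value for value, count in counts.items() if count == highest)

print(modes)             # [2, 5]
print(multimode(cows))   # [2, 5]
```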
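4) Find the standard deviation in the number of cows per farmer.

$$\text{std} = \sqrt{\frac{\sum(\text{cows} - \text{mean})^2}{\text{number of farmers} - 1}}$$

$$\text{std} = \sqrt{\frac{(1-6.6)^2+(2-6.6)^2+(2-6.6)^2+(3-6.6)^2+(4-6.6)^2+(5-6.6)^2+(5-6.6)^2+(6-6.6)^2+(18-6.6)^2+(20-6.6)^2}{9}} = \sqrt{\frac{408.4}{9}} \approx 6.7$$

The standard deviation of this distribution is 6.7
This is the sample standard deviation, which divides by one less than the number of farmers. Dividing by the number of farmers (10) instead gives about 6.4.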
The Standard Deviation is a measure of
how spread out numbers are.
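The value 6.7 quoted above corresponds to the sample standard deviation, which divides the summed squared differences by one less than the number of farmers. Dividing by the number of farmers gives the population standard deviation, about 6.4. A short Python sketch showing both, with illustrative variable names:

```python
import math
from statistics import mean, pstdev, stdev

cows = [1, 2, 2, 3, 4, 5, 5, 6, 18, 20]

average = mean(cows)
squared_differences = sum((n - average) ** 2 for n in cows)

sample_std = math.sqrt(squared_differences / (len(cows) - 1))   # divide by n - 1
population_std = math.sqrt(squared_differences / len(cows))     # divide by n

print(round(sample_std, 1))        # 6.7
print(round(population_std, 1))    # 6.4
print(round(stdev(cows), 1))       # 6.7, statistics.stdev uses n - 1
print(round(pstdev(cows), 1))      # 6.4, statistics.pstdev uses n
```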
Frequency Table
Number of Cows   Number of Farmers
1                1
2                2
3                1
4                1
5                2
6                1
7                0
8                0
9                0
10               0
11               0
12               0
13               0
14               0
15               0
16               0
17               0
18               1
19               0
20               1
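A frequency table like this one can be built by counting how many farmers have each number of cows, including the values nobody has. The following is a sketch using Python's `collections.Counter`; the dictionary name `frequency` is an illustrative choice.

```python
from collections import Counter

cows = [1, 2, 2, 3, 4, 5, 5, 6, 18, 20]

counts = Counter(cows)   # missing values count as 0 when looked up

# One row per possible number of cows, from 1 up to the largest herd.
frequency = {n: counts[n] for n in range(1, max(cows) + 1)}

print("Number of Cows  Number of Farmers")
for n, farmers in frequency.items():
    print(f"{n:>14}  {farmers:>17}")
```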
Histograms
[Figure: Histogram of our cow distribution. x-axis: Number of Cows (1 to 20); y-axis: Number of Farmers (1 to 3). The mean (average), the median, and the standard deviation (marked twice) are labelled on the plot.]
Note: the number of farmers having a given number of cows is proportional to the
probability of a farmer having a given number of cows.
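To make the note concrete, each count can be divided by the total number of farmers to get the probability that a randomly chosen farmer has that many cows. The sketch below also prints a rough text histogram; the `#` bars and the variable names are illustrative choices, not part of the original slides.

```python
from collections import Counter

cows = [1, 2, 2, 3, 4, 5, 5, 6, 18, 20]

counts = Counter(cows)
total_farmers = len(cows)

# Probability of each herd size = number of farmers with it / all farmers.
probabilities = {n: counts[n] / total_farmers for n in range(1, max(cows) + 1)}

for n, probability in probabilities.items():
    bar = "#" * counts[n]   # one mark per farmer
    print(f"{n:>2} | {bar:<3} {probability:.1f}")
```

The bars have the same shape whether they are read as counts or as probabilities; only the scale on the vertical axis changes.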
The Normal Distribution
Also known as the Gaussian function, and as the Bell Curve.
This is the most commonly occurring distribution.
It is a special case in which the mean, median, and mode are all the same.
It describes random variations around an average measure (random
fluctuations about the mean)
Examples: heights of men, heights of women, motion of particles in the
air, grade distributions for a course, …
$$P(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}$$
σ = standard deviation
µ = mean
x = number of cows
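A minimal Python sketch of the normal curve, using the standard Gaussian density formula with µ and σ as defined above. The function name `normal_pdf` is an illustrative choice; `statistics.NormalDist` (Python 3.8 or later) is used only as a cross-check.

```python
import math
from statistics import NormalDist


def normal_pdf(x, mu, sigma):
    """Height of the normal curve at x for mean mu and standard deviation sigma."""
    exponent = -((x - mu) ** 2) / (2 * sigma ** 2)
    return math.exp(exponent) / (sigma * math.sqrt(2 * math.pi))


# The curve peaks at the mean and is symmetric around it.
peak = normal_pdf(0, mu=0, sigma=1)
left = normal_pdf(-1, mu=0, sigma=1)
right = normal_pdf(1, mu=0, sigma=1)

print(round(peak, 4))                 # 0.3989
print(math.isclose(left, right))      # True
print(math.isclose(normal_pdf(1.5, 2, 3), NormalDist(2, 3).pdf(1.5)))   # True
```

Because the curve is symmetric with a single peak at µ, the mean, the median, and the mode all fall at the same point.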
[Figure: the normal curve P(x) plotted against x, with its peak marked.]
At the peak, x = µ. This is the average (the mean). This is also the median. This is also the mode.
Back to cows
[Figure: histogram of the cow distribution (x-axis: Number of Cows, 1 to 20; y-axis: Number of Farmers, 1 to 3) with the mean (average), the median, and the standard deviation marked, overlaid with the normal distribution for Mean = 6.6 cows and Standard Deviation = 6.7 cows. The curve shows the probability of having this many cows.]
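One rough way to compare the data with this curve numerically is to multiply the curve's height at each number of cows by the number of farmers, which gives the count the normal distribution would predict for that value. This is a sketch under that assumption (one-cow-wide bins), using `statistics.NormalDist` from Python 3.8 or later; the column layout is illustrative.

```python
from collections import Counter
from statistics import NormalDist

cows = [1, 2, 2, 3, 4, 5, 5, 6, 18, 20]

counts = Counter(cows)
curve = NormalDist(mu=6.6, sigma=6.7)   # values quoted on the slide

print("cows  farmers  expected")
for n in range(1, max(cows) + 1):
    # Predicted number of farmers with n cows if the data were normal.
    expected = len(cows) * curve.pdf(n)
    print(f"{n:>4}  {counts[n]:>7}  {expected:>8.2f}")

peak_expected = len(cows) * curve.pdf(6.6)
print(round(peak_expected, 2))   # 0.6
```

Even at its peak the curve predicts well under one farmer per value, while the data has two farmers each at 2 and 5 cows, so the fit is poor when the two large herds are included.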
Now – Let’s remove the outliers to see what happens
With Bill (18 cows) and Tom (20 cows) removed, the frequency table becomes:

Number of Cows   Number of Farmers
1                1
2                2
3                1
4                1
5                2
6                1

[Figure: histogram of the remaining distribution. x-axis: Number of Cows (1 to 7); y-axis: Number of Farmers (1 to 3).]

Mean = 3.5
Median = 3.5
Standard Deviation = 1.8
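These figures can be reproduced by dropping the two largest herds and recomputing. In the sketch below, treating Bill and Tom as the outliers follows the reduced frequency table on the slide; the dictionary and variable names are illustrative.

```python
from statistics import mean, median, stdev

herd = {
    "John": 1, "Bob": 2, "Sue": 2, "Mary": 3, "Jim": 4,
    "Sally": 5, "Fran": 5, "Pat": 6, "Bill": 18, "Tom": 20,
}

# Assumption: the outliers are the two farmers missing from the reduced table.
outliers = {"Bill", "Tom"}
remaining = [count for farmer, count in herd.items() if farmer not in outliers]

print(mean(remaining))              # 3.5
print(median(remaining))            # 3.5
print(round(stdev(remaining), 1))   # 1.8 (sample standard deviation)
```

With the outliers gone, the mean and the median coincide, which is what the normal distribution expects.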
[Figure: the same histogram (x-axis: Number of Cows, 1 to 7; y-axis: Number of Farmers, 1 to 3) overlaid with the normal distribution for these numbers: Mean = 3.5 (also the Median), Standard Deviation σ = 1.8. The mean and σ (marked twice) are labelled on the plot. The curve shows the probability of having this many cows.]
Next, we will look at larger distributions and use Excel and Tableau to manipulate
the data, draw a histogram, and compare our data to the normal distribution.
For our next example, we will look at the distribution of class grades. Please
download the example data from Canvas.
If you are running Excel on Windows, the histogram feature is already built in:
http://www.excel-easy.com/examples/histogram.html
But if you are running Excel on a Mac, you will need this add-on:
http://www.analystsoft.com/en/products/statplusmacle/
Here are additional instructions:
http://www.gilsmethod.com/how-to-enable-statsplus-in-excel-2011-for-mac
During the break, please:
-  download the data
-  download and install the Excel add-on (if needed)
-  download and install Tableau Public (if you haven’t already)
During the second break:
- download Processing, and download my example code.

Introduction to Data Visualization
