Statistics
Mean, Median, Mode, Standard
Deviation, Normal and Sampling
Distribution, and Z-Score
Portland Data Science Group
Created by Andrew Ferlitsch
Community Outreach Officer
July, 2017
Mean
• The mean is the average of a set of samples or a
population distribution.
Symbol for mean: µ (mu)
\( \mu = \frac{1}{n} \sum_{i=1}^{n} x_i \)
1. Sum (add) up all the samples.
2. Divide the summation by the number of samples.
Example:
Samples = { 1, 2, 2.5, 2.5, 3, 3, 3.5 }
µ = ( 1 + 2 + 2.5 + 2.5 + 3 + 3 + 3.5 ) / 7 = 17.5 / 7 = 2.5
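As a minimal sketch, the mean of the example samples can be computed in Python. The variable names are illustrative and do not come from the slides.

```python
# Samples from the example on this slide.
samples = [1, 2, 2.5, 2.5, 3, 3, 3.5]

# Sum (add) up all the samples, then divide by the number of samples.
mean = sum(samples) / len(samples)
print(mean)  # 2.5
```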
Median
• The median is the mid-point in a sorted (frequency) distribution of
samples (population).
• Odd Number of Samples – is the sample at the midpoint (center)
• Even Number of Samples – is the average of the two samples at
the midpoint (center)
Seven Samples = { 1, 2, 2.5, 2.5, 3, 3, 3.5 }
Median = 2.5 (the 4th sample, at the midpoint)
Eight Samples = { 1, 2, 2.5, 2.5, 3, 3, 3.5, 4 }
Median = ( 2.5 + 3 ) / 2 = 2.75 (the average of the 4th and 5th samples, at the midpoint)
Symbol for median
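The odd and even cases can be sketched in a few lines of Python. The function name `median` is an illustrative choice; Python's standard library also provides `statistics.median`, which should give the same results here.

```python
def median(samples):
    """Return the midpoint of the sorted samples."""
    ordered = sorted(samples)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # Odd number of samples: the sample at the midpoint.
        return ordered[mid]
    # Even number of samples: the average of the two samples at the midpoint.
    return (ordered[mid - 1] + ordered[mid]) / 2

print(median([1, 2, 2.5, 2.5, 3, 3, 3.5]))     # 2.5
print(median([1, 2, 2.5, 2.5, 3, 3, 3.5, 4]))  # 2.75
```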
Discrete vs. Continuous
• The values of a population can be classified as either discrete or
continuous values.
• Discrete – the values in a sample (population) are discrete if the
selected values are from a finite set of values. Examples: a fixed set
of values for a categorical variable (US states), or a finite set of
numbers (a person’s age in years as a whole number).
• Continuous – the values in a sample (population) are continuous
if the selected values are from an infinite set of values. Examples:
real values such as the dollar value in a checking account,
or a person’s age as a real number (not rounded).
Ex., Age = 0, 1, 2 … 99
Checking = { $1, $10, $1046.37, $2,000,300.12, etc … }
Mode
• The mode is the value that occurs most frequently in a set of
samples (population distribution).
On a bar chart, it is the tallest bar.
• For discrete samples, it is the value that occurs most frequently.
• For continuous samples, it is the range that occurs most frequently,
where the values are grouped into ranges.
Samples = { 1, 2, 2, 2, 3, 3, 4, 5, 7 }
Mode = 2 (the discrete value that occurs most frequently)
Steps for continuous samples:
1. Select a Range Size (e.g., 10)
2. Partition the values into sequential ranges of that size (e.g., boundaries at 10, 20, 30)
3. Assign each sample to the range that it is within.
4. Select the range with the largest number of samples.
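The discrete case and the four steps for continuous samples can be sketched as follows. The function names and the continuous sample values are hypothetical, chosen only for illustration; when two values or ranges tie, this sketch simply returns the one encountered first.

```python
from collections import Counter

def discrete_mode(samples):
    """Return the value that occurs most frequently."""
    return Counter(samples).most_common(1)[0][0]

def binned_mode(samples, range_size):
    """Return the (low, high) range that holds the most samples."""
    # Steps 2 and 3: assign each sample to the range it falls within.
    counts = Counter(int(x // range_size) for x in samples)
    # Step 4: select the range with the largest number of samples.
    index = counts.most_common(1)[0][0]
    return (index * range_size, (index + 1) * range_size)

print(discrete_mode([1, 2, 2, 2, 3, 3, 4, 5, 7]))            # 2
# Hypothetical continuous samples, grouped with a range size of 10.
print(binned_mode([3.2, 11.5, 14.0, 18.9, 25.1, 27.3], 10))  # (10, 20)
```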
Standard Deviation
• The standard deviation is a measure that is used to quantify the
amount of variation or dispersion of a set of samples (population).
Symbol for standard deviation: σ (sigma)
\( \sigma = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (\mu - x_i)^2} \)
1. Sum (add) up the squared difference between the mean and each sample.
2. Divide the summation by the number of samples.
3. Take the square root of the result.
Example:
Seven Samples = { 1, 2, 2.5, 2.5, 3, 3, 3.5 } , µ = 2.5
Sum of squared differences = (2.5 – 1)² + (2.5 – 2)² + (2.5 – 2.5)² + (2.5 – 2.5)² + (2.5 – 3)² + (2.5 – 3)² + (2.5 – 3.5)²
= 2.25 + 0.25 + 0 + 0 + 0.25 + 0.25 + 1 = 4
\( \sigma = \sqrt{\frac{1}{7} \cdot 4} = \sqrt{\frac{4}{7}} \approx 0.76 \)
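As a quick check of the arithmetic, here is a minimal Python sketch of the population standard deviation for the seven samples. The sum of squared differences is 4, and the square root of 4/7 evaluates to about 0.76. The standard library's `statistics.pstdev` should return the same value.

```python
import math

samples = [1, 2, 2.5, 2.5, 3, 3, 3.5]
mean = sum(samples) / len(samples)

# Sum (add) up the squared difference between the mean and each sample.
squared_diffs = sum((mean - x) ** 2 for x in samples)

# Divide by the number of samples, then take the square root.
std_dev = math.sqrt(squared_diffs / len(samples))
print(squared_diffs, round(std_dev, 2))  # 4.0 0.76
```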
Normal Distribution
• The normal (Gaussian) distribution is a distribution that is
used in probability to model the expected random distribution of samples
in a population.
• Based on distributions of naturally occurring things.
• About 68% of the samples should be within 1 standard deviation of the mean.
• About 95% of the samples should be within 2 standard deviations of the mean.
• About 99.7% of the samples should be within 3 standard deviations of the mean.
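These percentages can be reproduced from the error function, since the fraction of a normal distribution within k standard deviations of the mean is erf(k / √2). The helper name `within` is an illustrative choice. The results come out to roughly 68.3%, 95.4%, and 99.7%.

```python
import math

def within(k):
    """Fraction of a normal distribution within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(k, round(within(k) * 100, 1))
```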
Population vs. Sample
Population (can be any distribution)
Parameters: µ (mean), σ (std. dev), N (size)
Random Sample (drawn from the population)
Statistics: x̅ (mean), s (std. dev), n (size)
Probability: when the population is known, one can calculate the
probability that a sample is in the population.
Sampling Distribution
• Random samples are drawn from the population. Each sample has a mean ( x̅, x̅, x̅, … ).
• The distribution of the means of randomly chosen samples from
a population is called a sampling distribution.
• Mean of the sampling distribution: \( \mu_{\bar{x}} = \mu \)
• Std. dev of the sampling distribution: \( \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} \), where n is the size of each sample.
Central Limit Theorem
• As the number of samples increases, the plot of the sample means will
approach a normal distribution.
• The mean of a sampling distribution will approach the mean of the
population.
• The central limit theorem only specifies that the central part of a distribution of
averages will approach a normal distribution as the number of trials goes to infinity.
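A small simulation can illustrate these properties. This is only a sketch under stated assumptions: the uniform population, the sample size of 25, the 2,000 samples, and the fixed seed are all hypothetical choices, not values from the slides.

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable

# Hypothetical population: uniform, so clearly not normal.
population = [random.uniform(0, 100) for _ in range(100_000)]
mu = statistics.fmean(population)
sigma = statistics.pstdev(population)

n = 25  # size of each random sample
sample_means = [
    statistics.fmean(random.sample(population, n)) for _ in range(2_000)
]

# The mean of the sample means should be close to mu, and their
# standard deviation should be close to sigma / sqrt(n).
print(round(mu, 2), round(statistics.fmean(sample_means), 2))
print(round(sigma / n ** 0.5, 2), round(statistics.pstdev(sample_means), 2))
```

Plotting a histogram of `sample_means` would show the bell shape emerging even though the population itself is flat.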
Z-Score
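• The Z-Score is the number of standard deviations a value lies from the mean
in a normal distribution.
\( Z = \frac{\bar{x} - \mu}{\sigma_{\bar{x}}} \)
Examples on the curve: Z-Score = -2 (two standard deviations below µ), Z-Score = 2 (two standard deviations above µ), or an arbitrary Z-Score (e.g., 1.5).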
Standard Normal Probabilities
• The probability that a sample with a given Z-Score will fall within an area
of a normal distribution can be looked up in the Standard Normal
Probabilities Table - http://www.stat.ufl.edu/~athienit/Tables/Ztable.pdf
• At the mean µ (Z-Score = 0), there is a 50% probability that the sample falls into the area of the distribution to the left of µ.
• The probability of the sample falling within the area of the distribution increases with the Z-Score (the number of standard deviations).
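Instead of a table lookup, the same cumulative probability can be computed from the error function. The function name `standard_normal_cdf` is an illustrative choice; it returns the area of the standard normal distribution to the left of a given Z-Score.

```python
import math

def standard_normal_cdf(z):
    """Probability that a standard normal value is less than or equal to z."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

print(standard_normal_cdf(0))              # 0.5
print(round(standard_normal_cdf(1.9), 4))  # 0.9713
print(round(standard_normal_cdf(3), 4))    # 0.9987
```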
Robot Example
• Warehouse of Boxes: Mean Weight of 50 lbs, Standard Deviation of 10 lbs.
• Pallet of Boxes: Need to move pallet of 10 boxes of unknown weight.
• Robot: Has lift limit of 560 lbs.
• Question: What is the probability the Robot can lift this pallet?
Population: Weight Distribution of Boxes
µ (mean) = 50 lbs
σ (std. dev) = 10 lbs
Pallet of 10 Boxes: Weight of Boxes Unknown
Calculate Std. Dev. of Pallet:
\( \mu_{\bar{x}} = \mu = 50 \) (mean)
\( \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{10}{\sqrt{10}} = 3.16 \) (std. dev)
Maximum mean weight of 10 boxes robot can lift:
\( \bar{x}_{max} \) = 560 lbs / 10 boxes = 56
\( Z = \frac{\bar{x}_{max} - \mu}{\sigma_{\bar{x}}} = \frac{6}{3.16} = 1.9 \)
Standard Normal Probability of 1.9 = 97.13 %
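The robot calculation can be sketched end to end in Python. The variable names are illustrative. Because the code does not round the Z-Score to 1.9 before the lookup, the probability comes out marginally lower (about 97.1%) than the table value quoted on the slide.

```python
import math

mu, sigma = 50.0, 10.0   # population mean and std. dev of box weight (lbs)
n = 10                   # boxes on the pallet
lift_limit = 560.0       # robot lift limit (lbs)

sigma_xbar = sigma / math.sqrt(n)   # std. dev of the mean weight of 10 boxes
xbar_max = lift_limit / n           # maximum mean weight the robot can lift
z = (xbar_max - mu) / sigma_xbar

# Standard normal probability of z (area to the left of z).
probability = 0.5 * (1 + math.erf(z / math.sqrt(2)))
print(round(sigma_xbar, 2), round(z, 1), round(probability, 3))  # 3.16 1.9 0.971
```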
Null Hypothesis
• The Null Hypothesis H0 is the opposite of what one is trying to prove, i.e., nothing changed.
H0 = The mean price of a transaction has not increased (e.g., µ ≤ $25)
H1 = The mean price of a transaction has increased (e.g., µ > $25)
• To Prove the Alternate Hypothesis H1 :
• Disprove the Null Hypothesis
• Within a Level of Statistical Significance
• Example: Transaction History has µ = $25 with σ = $5
Transaction Sample has x̅ = $26.50
Transaction Sample Size = 10
Calculate Std. Dev. of Transaction: \( \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{5}{\sqrt{10}} = 1.58 \)
Z-Score of Transaction: \( Z = \frac{\bar{x} - \mu}{\sigma_{\bar{x}}} = \frac{26.5 - 25}{1.58} = 0.95 \)
Standard Normal Probability of 0.95 = 82.89 % (Confidence Level)
Transaction Sample Size = 100
Calculate Std. Dev. of Transaction: \( \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{5}{\sqrt{100}} = 0.5 \)
Z-Score of Transaction: \( Z = \frac{\bar{x} - \mu}{\sigma_{\bar{x}}} = \frac{26.5 - 25}{0.5} = 3 \)
Standard Normal Probability of 3 = 99.87 % (Confidence Level)
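The effect of sample size on the Z-Score can be sketched as follows. The function name `z_and_probability` is hypothetical. For a sample size of 10 the Z-Score is about 0.95 with a probability of roughly 83%, and for a sample size of 100 the Z-Score is 3 with a probability of about 99.87%.

```python
import math

def z_and_probability(sample_mean, mu, sigma, n):
    """Return the Z-Score of a sample mean and its standard normal probability."""
    sigma_xbar = sigma / math.sqrt(n)       # std. dev of the sample mean
    z = (sample_mean - mu) / sigma_xbar
    return z, 0.5 * (1 + math.erf(z / math.sqrt(2)))

# Same sample mean, two different sample sizes.
for n in (10, 100):
    z, p = z_and_probability(26.50, 25.0, 5.0, n)
    print(n, round(z, 2), round(p, 4))
```

The same $1.50 difference is far more convincing with 100 transactions than with 10, because the standard deviation of the sample mean shrinks as n grows.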
Box (and Whisker) Plot
• A method used to visualize the spread of data.
• Split the data into quartiles (quarters).
• A box is drawn between the 1st and 3rd quartiles, around the middle two quarters of the data.
• The whiskers are drawn out to the end points (lowest and highest values).
Parts of the plot, along the axis of data values (x):
• Whisker: extends to the lowest value
• 1st quartile (median of lower half)
• 2nd quartile (median)
• 3rd quartile (median of upper half)
• Box (IQR): spans the 1st to the 3rd quartile
• Whisker: extends to the highest value
Steps:
1. Calculate the median of the entire dataset, splitting the dataset into halves.
2. Calculate the median of the upper and lower halves of the dataset, splitting them into quarters.
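The two steps can be sketched in Python. This assumes one common convention that the slides do not specify: when the dataset has an odd number of values, the overall median is left out of both halves. The function names are illustrative, and the dataset needs at least two values.

```python
def median(values):
    """Return the midpoint of the sorted values."""
    ordered = sorted(values)
    mid = len(ordered) // 2
    if len(ordered) % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

def quartiles(values):
    """Return (Q1, Q2, Q3) using the median-of-halves method."""
    ordered = sorted(values)
    half = len(ordered) // 2
    lower = ordered[:half]                  # lower half of the dataset
    upper = ordered[len(ordered) - half:]   # upper half of the dataset
    return median(lower), median(ordered), median(upper)

data = [1, 2, 2.5, 2.5, 3, 3, 3.5, 4]
q1, q2, q3 = quartiles(data)
print(q1, q2, q3, q3 - q1)  # 2.25 2.75 3.25 1.0
```

The last printed value is the interquartile range (IQR), which is the width of the box.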
Box (and Whisker) Plot - Outliers
• A variation of a box plot to show outliers.
• The whiskers are replaced with an inner and outer fence at
1.5 x IQR (inner) and 3 x IQR (outer).
• Values between 1.5 and 3 IQR are suspected outliers (white).
• Values outside of 3 IQR are outliers (black).
Parts of the plot, along the axis of data values (x): Outlier, Outer Fence (3 IQR), Suspected Outliers, Inner Fence (1.5 IQR), Box (IQR), Inner Fence (1.5 IQR), Outlier.
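The fence rules can be sketched as a small classifier. This assumes the usual convention, not stated on the slide, that the fences are measured outward from the 1st and 3rd quartiles. The function name `classify`, the data values, and the quartile values are hypothetical.

```python
def classify(values, q1, q3):
    """Label each value as 'normal', 'suspected outlier', or 'outlier'."""
    iqr = q3 - q1
    inner = (q1 - 1.5 * iqr, q3 + 1.5 * iqr)   # inner fences (1.5 IQR)
    outer = (q1 - 3.0 * iqr, q3 + 3.0 * iqr)   # outer fences (3 IQR)
    labels = {}
    for x in values:
        if x < outer[0] or x > outer[1]:
            labels[x] = "outlier"
        elif x < inner[0] or x > inner[1]:
            labels[x] = "suspected outlier"
        else:
            labels[x] = "normal"
    return labels

# Hypothetical data with Q1 = 2.25 and Q3 = 3.25 (IQR = 1.0), so the inner
# fences are at 0.75 and 4.75 and the outer fences are at -0.75 and 6.25.
labels = classify([0.5, 2.5, 3.0, 5.0, 7.0], q1=2.25, q3=3.25)
print(labels)
```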

Statistics - Basics
