9/3/2013
Lab #1 Basic Statistics
EVEN 3321
• Definition of STATISTICS
• 1: a branch of mathematics dealing with the collection, analysis,
interpretation, and presentation of masses of numerical data
• 2: a collection of quantitative data
• Origin of STATISTICS: German Statistik (study of political facts and figures),
from New Latin statisticus (of politics), from Latin status (state)
• First Known Use: 1770
• Rhymes with STATISTICS: ballistics, ekistics, linguistics, logistics, patristics,
stylistics
• http://www.merriam-webster.com/dictionary/statistics
Statistics
Why is this important?
Environmental Sampling
∗ Need to know relationships
between quantities
∗ Parameters (examples):
pH
Conductivity
Particle concentration
Amount of a chemical or other
material in air, water, soil
Bacteria counts
Instrumentation
∗ pH Meter
∗ Micro-balance
∗ Gas Chromatography
∗ Ozone monitor
∗ ICPMS
∗ TOC
Morning Session of FE Exam
Engineering Probability and Statistics Topic Area
The following subtopics are covered in the Engineering Probability and
Statistics portion of the FE Examination:
A. Measures of central tendencies and dispersions (e.g., mean, mode, standard
deviation)
B. Probability distributions (e.g., discrete, continuous, normal, binomial)
C. Conditional probabilities
D. Estimation (e.g., point, confidence intervals) for a single mean
E. Regression and curve fitting
F. Expected value (weighted average) in decision-making
G. Hypothesis testing
The Engineering Probability and Statistics portion covers approximately 7% of
the morning session test content.
Reference: http://www.feexam.org/ProbStats.html
FE Exam
• “Sample” versus “population”
• Random variables
• Population mean (μ), variance (σ²) & standard
deviation (σ), kurtosis, skewness
• Estimated from a sample by: sample mean (ȳ), variance (s²), and
standard deviation (s)
• Frequency distribution/histogram (relates to skewness)
• Boxplots
• Precision and accuracy, Confidence interval
• Linear regression
Some Key Ideas
• It is impossible to determine the concentrations of a
given pollutant at every possible location at a site.
• Statistical methods allow us to use a small number of
samples to make inferences about the entire site.
• A sample is a subset of all the possible measurements
that could be taken from a given site.
–Multivariate data sets have several data values
generated for each location and time.
–As opposed to univariate data sets.
• The hypothetical set of all possible values is referred to
as the population.
Key Ideas: continued
• Number of samples collected is the sample size (n).
• A random variable is a variable whose value depends on the outcome of a random process.
• Experimental observations are considered random
variables.
• Experimental errors
Key Ideas continued
∗ Experimental measurements are always imperfect:
∗ Measured value = true value ± error
∗ The error is a combined measure of the inherent variation
of the phenomenon we are observing and the numerous
factors that interfere with the measurement.
∗ Any quantitative result should be reported with an
accompanying estimate of its error.
∗ Systematic errors (or determinate errors) can be traced to
their source (e.g., improper sampling or analytical
methods).
∗ Random errors (or indeterminate errors) are random
fluctuations and cannot be identified or corrected for.
Experimental Errors
Example: Population versus Sample
[Figure: two plots of ozone concentration (ppb). Panel 1: April 2013, x-axis 1 to 30. Panel 2: 24 Hours, x-axis 1 to 24.]
• Accuracy is the degree of agreement of a measured
value with the true or expected value.
• Precision is the degree of mutual agreement among
individual measurements (y1, y2, …, yn) made under the
same conditions.
• Precision measures the variation among measurements
and may be expressed as sample standard deviation (s):
Accuracy and Precision
$$ s = \sqrt{\frac{\sum_{i=1}^{n} (y_i - \bar{y})^2}{n - 1}} $$
Accuracy and Precision
Example: Five analysts were each given five samples that were
prepared to have a known concentration of 8.0 mg/L. The results are
summarized in the figure below.
Accuracy and Precision
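The distinction between accuracy and precision in the five-analyst example can be sketched numerically. The replicate values below are hypothetical (the figure's data are not reproduced in the text); accuracy is summarized as the bias (mean minus the known 8.0 mg/L) and precision as the sample standard deviation s.

```python
from statistics import mean, stdev

TRUE_VALUE = 8.0  # mg/L, the known concentration from the example

# Hypothetical replicate results (mg/L); NOT the data from the slide's figure.
results = {
    "Analyst A": [8.0, 8.1, 7.9, 8.0, 8.0],  # accurate and precise
    "Analyst B": [8.6, 8.5, 8.6, 8.7, 8.6],  # precise but biased (inaccurate)
    "Analyst C": [7.0, 9.1, 8.3, 7.4, 8.2],  # unbiased on average but imprecise
}


def bias_and_precision(values, true_value):
    """Return (bias, s): mean error and sample standard deviation (n - 1)."""
    return mean(values) - true_value, stdev(values)


for name, values in results.items():
    bias, s = bias_and_precision(values, TRUE_VALUE)
    print(f"{name}: bias = {bias:+.2f} mg/L, s = {s:.2f} mg/L")
```

A small bias with a large s (Analyst C here) indicates random error dominates; a large bias with a small s (Analyst B) points to a systematic error.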
• A random variable y is characterized by:
• A set of possible values.
• An associated set of relative likelihoods (this is called a
probability distribution).
• Random variables can be discrete or continuous.
e.g., a die toss is a discrete random variable.
e.g., ozone conc. is a continuous random variable.
• Experimental observations are considered random
variables.
Random Variables
• When we sample the environment, the sample values are
known, but not the population values.
• For a sample size n, the number of times a specific value
occurs is called the frequency.
• The frequency divided by the sample size n is the relative
frequency.
• The relative frequency is an estimate of the probability
that the given value occurs in the population.
• If we compute the relative frequencies for each possible
value of a random variable, we have an estimate of the
probability distribution of the random variable (see next
slide).
Frequency Distribution
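As a minimal sketch of frequency and relative frequency for a discrete random variable, the snippet below tallies a hypothetical set of ten die tosses (illustrative values only, not data from the lecture).

```python
from collections import Counter

# Hypothetical discrete sample: outcomes of 10 die tosses.
tosses = [3, 1, 6, 3, 2, 6, 3, 5, 1, 3]

n = len(tosses)              # sample size
frequency = Counter(tosses)  # number of times each value occurs

# Relative frequency = frequency / n, an estimate of each value's probability.
relative_frequency = {
    value: count / n for value, count in sorted(frequency.items())
}

for value, rel in relative_frequency.items():
    print(f"value {value}: frequency {frequency[value]}, relative frequency {rel:.1f}")
```

The relative frequencies sum to 1, which is what lets them serve as an estimate of the probability distribution.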
• For continuous random variables, we can group the
measured values into intervals (or “bins”).
• Plotting the number of values measured in each interval
gives a frequency histogram (see next slide).
• Plotting the total number of measured values in or below
a given interval gives a cumulative frequency distribution
(see next slide).
• To obtain the relative frequency, the number of measured
values falling within a given interval is divided by the
sample size n.
• The shape of a histogram can allow us to infer the
distribution of the population.
Continuous Frequency Distributions
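The binning steps for a continuous variable can be sketched as follows. The ozone readings, the bin width of 10 ppb, and the starting edge of 20 ppb are all assumptions chosen for illustration.

```python
# Hypothetical hourly ozone readings in ppb (not the April 2013 data set).
ozone_ppb = [22, 35, 41, 28, 35, 47, 31, 38, 52, 35, 29, 44]

bin_width = 10  # assumed bin width, ppb
low = 20        # assumed lower edge of the first bin, ppb
n_bins = 4      # bins: [20,30), [30,40), [40,50), [50,60)

# Frequency histogram: count the values falling in each interval.
counts = [0] * n_bins
for value in ozone_ppb:
    index = (value - low) // bin_width
    counts[index] += 1

# Cumulative frequency: number of values in or below each interval.
cumulative = []
running = 0
for c in counts:
    running += c
    cumulative.append(running)

# Relative frequency: count in each interval divided by the sample size n.
n = len(ozone_ppb)
relative = [c / n for c in counts]

for i in range(n_bins):
    lo_edge = low + i * bin_width
    print(f"[{lo_edge}, {lo_edge + bin_width}) ppb: "
          f"count {counts[i]}, cumulative {cumulative[i]}, relative {relative[i]:.2f}")
```

The choice of bin width affects the apparent shape of the histogram, so it is worth trying more than one width before inferring a distribution.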
Histograms
Normal (Gaussian) and skewed
Histograms (cont.)
Bimodal and Uniform
∗ In general, we do not know the mean and standard
deviation of the underlying population.
∗ The population mean and standard deviation can be estimated by the
sample mean ȳ and the sample standard deviation s:
∗ Note that in environmental monitoring, the standard
deviation s for the sample depends on the amount of
sample collected.
Sample Mean and Standard Deviation
$$ \bar{y} = \frac{1}{n} \sum_{i=1}^{n} y_i $$
$$ s = \sqrt{\frac{\sum_{i=1}^{n} (y_i - \bar{y})^2}{n - 1}} $$
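As a quick worked check of the sample mean and sample standard deviation formulas, assume four hypothetical measurements y = 2, 4, 4, 6 mg/L (illustrative values, not data from the lecture):

$$ \bar{y} = \frac{2 + 4 + 4 + 6}{4} = 4 $$
$$ s = \sqrt{\frac{(2-4)^2 + (4-4)^2 + (4-4)^2 + (6-4)^2}{4 - 1}} = \sqrt{\frac{8}{3}} \approx 1.63 $$

The result would be reported as 4.0 ± 1.6 mg/L (mean ± one sample standard deviation, n = 4).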
In many situations, environmental data involves working with a small sample set.
Dividing by (n-1) rather than n is known as Bessel's correction; it makes the sample variance an unbiased estimate of the population variance.
http://en.wikipedia.org/wiki/Bessel%27s_correction
Another way of looking at it:
The POPULATION VARIANCE (σ²) is a PARAMETER of the population.
The SAMPLE VARIANCE (s²) is a STATISTIC of the sample.
We use the sample statistic to estimate the population parameter.
The sample variance s² is an estimate of the population variance σ².
Note: Excel 2010 has two functions for standard deviation: one for a population (=STDEV.P(range))
and one for a sample (=STDEV.S(range)).
Short video:
https://www.khanacademy.org/math/probability/descriptive-statistics/variance_std_deviation/v/review-and-intuition-why-we-divide-by-n-1-for-the-unbiased-sample-variance
A note about (n-1)
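The effect of dividing by n versus n-1 can be seen on a small data set. This is a minimal sketch with hypothetical values; `statistics.pstdev` and `statistics.stdev` from the Python standard library play the same roles as the population and sample standard deviation functions in Excel.

```python
import math
from statistics import pstdev, stdev

# Hypothetical small sample (mg/L); not data from the lecture.
sample = [7.8, 8.1, 8.4, 7.9, 8.3]

n = len(sample)
y_bar = sum(sample) / n
sum_sq = sum((y - y_bar) ** 2 for y in sample)  # sum of squared deviations

sd_population = math.sqrt(sum_sq / n)        # divide by n (population formula)
sd_sample = math.sqrt(sum_sq / (n - 1))      # divide by n - 1 (Bessel's correction)

print(f"divide by n:     {sd_population:.4f}  (library pstdev: {pstdev(sample):.4f})")
print(f"divide by n - 1: {sd_sample:.4f}  (library stdev:  {stdev(sample):.4f})")
```

The n-1 version is always the larger of the two, and the gap shrinks as n grows, which is why the distinction matters most for the small sample sets common in environmental work.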
• Most random variables have two important characteristic
values: the mean (μ) and the variance (σ²).
• The square root of the variance is the standard deviation (σ).
• The mean is also called the expected value of the random
variable.
• The mean represents the balance point of the distribution on a graph.
• The variance & standard deviation both quantify how
much the possible values disperse away from the mean.
• For a normal distribution, about 68% of values lie within µ ±
σ, 95% within µ ± 2σ, and 99.7% within µ ± 3σ.
Mean, Variance, Standard Deviation
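The 68%, 95%, and 99.7% figures for the normal distribution can be checked with the error function, since the probability of falling within k standard deviations of the mean is erf(k/√2). The helper name `prob_within` is just an illustrative choice.

```python
import math


def prob_within(k):
    """P(mu - k*sigma < Y < mu + k*sigma) for a normally distributed Y."""
    return math.erf(k / math.sqrt(2))


for k in (1, 2, 3):
    print(f"within {k} sigma: {100 * prob_within(k):.1f}%")
```

The values are rounded on the slide: ±2σ covers slightly more than 95%, and an exact 95% interval corresponds to about ±1.96σ.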
Mean, Median, Mode
∗ The coefficient of variation (CV) is a simplistic test to determine whether the
data can be characterized by a normal distribution. The
CV is the standard deviation divided by
the mean. The closer the ratio is to zero, the better the
possibility that the data have a normal distribution. A
number greater than unity indicates a non-normal
distribution.
∗ Skewness is a measure of symmetry or lack of it and can be
zero (symmetric), negative, or positive.
∗ Kurtosis is a measure of whether the data are peaked or flat relative to a
normal distribution.
Coefficient of Variation, Skewness, Kurtosis
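The three shape statistics on this slide (standard deviation divided by the mean, skewness, and kurtosis) can be sketched with simple moment formulas. The function name and data are hypothetical, and the skewness and kurtosis here use plain central moments, so they will differ slightly from the sample-adjusted values Excel reports.

```python
import math


def shape_statistics(data):
    """Return (cv, skewness, excess_kurtosis) using simple moment formulas."""
    n = len(data)
    mean = sum(data) / n
    m2 = sum((y - mean) ** 2 for y in data) / n  # second central moment
    m3 = sum((y - mean) ** 3 for y in data) / n  # third central moment
    m4 = sum((y - mean) ** 4 for y in data) / n  # fourth central moment
    s = math.sqrt(m2 * n / (n - 1))              # sample standard deviation
    cv = s / mean                                # standard deviation / mean
    skewness = m3 / m2 ** 1.5                    # 0 for symmetric data
    excess_kurtosis = m4 / m2 ** 2 - 3.0         # 0 for a normal distribution
    return cv, skewness, excess_kurtosis


# Hypothetical right-skewed sample (one large value pulls the tail right).
skewed = [1, 2, 2, 3, 3, 3, 4, 4, 12]
cv, skew, kurt = shape_statistics(skewed)
print(f"CV = {cv:.2f}, skewness = {skew:.2f}, excess kurtosis = {kurt:.2f}")
```

A positive skewness means the long tail is on the high side; a negative excess kurtosis means the data are flatter than a normal distribution.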
Skewness/Kurtosis
Box-and-Whisker Plot
Normal Distribution at 68%, 95%, 99%
The α value is the probability that a random variable will
fall in the upper or lower tail of a probability
distribution.
For example, α = 0.05 implies that there is a 0.95
probability that a random variable will not fall in the
upper or lower tail of the probability distribution.
Statistical tables of probability distributions (e.g.,
normal and Student's t) list probabilities that a random
variable will fall in the upper tail only.
α Values for Probability Distributions
• We typically want to determine a confidence interval
for which we are 90% confident that a random
variable will not fall in either tail.
• In this case, we use an α/2 = 0.05.
• Similarly, to determine 95% and 99% confidence
intervals, we would use α/2 = 0.025 and 0.005,
respectively.
α values and confidence intervals
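$$ \text{CI} = \bar{y} \pm t_{\alpha/2,\,n-1} \frac{s}{\sqrt{n}} $$
$$ \text{CI} = \bar{y} \pm (t_{\alpha/2,\,n-1})(s_{\bar{y}}) $$
where s_ȳ = s/√n is the standard error of the mean.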
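A 95% confidence interval for a mean (sample mean ± t × s/√n) can be sketched using the Excel summary statistics that appear later in this deck (n = 14). The critical value 2.160 is read from a Student's t table for α/2 = 0.025 and 13 degrees of freedom; it is hard-coded here as an assumption because the Python standard library has no t quantile function.

```python
import math

# Summary statistics from the Excel "Column1" output later in this deck.
n = 14
y_bar = 74.92857143
s = 18.75946647

# Two-tailed critical value for alpha/2 = 0.025 and n - 1 = 13 degrees of
# freedom, taken from a Student's t table (hard-coded assumption).
t_crit = 2.160

standard_error = s / math.sqrt(n)      # s / sqrt(n)
half_width = t_crit * standard_error   # the "+/-" part of the interval
lower = y_bar - half_width
upper = y_bar + half_width

print(f"standard error = {standard_error:.4f}")
print(f"95% CI: {y_bar:.2f} +/- {half_width:.2f}  ->  ({lower:.2f}, {upper:.2f})")
```

The half-width comes out close to the "Confidence Level(95.0%)" entry in the Excel output, which suggests that entry is the ± term of this interval.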
Regression analysis (dependency) – an analysis focused
on the degree to which one variable (the dependent
variable) is dependent upon one or more other
variables (the independent variables).
(examples: ozone vs. temperature, bacteria counts
versus chlorination treatment)
Correlation analysis – neither variable is identified as
more important than the other, but the investigator is
interested in their interdependence or joint behavior
NOTE: Correlation or association is not causation.
Linear Regression
Linear Regression Examples
• Slope-intercept form of a line: y = mx + b, where m is the slope and b is the intercept
• The coefficient of determination, R², is used in the context of statistical models whose main
purpose is the prediction of future outcomes on the basis of other related
information. It is the proportion of variability in a data set that is accounted for by the
statistical model. It provides a measure of how well future outcomes are likely to be
predicted by the model.
R² does NOT tell whether:
– the independent variables are a true cause of the changes in the
dependent variable
– omitted-variable bias exists
– the correct regression was used
– the most appropriate set of independent variables has been chosen
– there is collinearity present in the data
– the model might be improved by using transformed versions of the
existing set of independent variables
R², Slope Equation
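A least-squares fit of y = mx + b and its coefficient of determination can be sketched in a few lines. The function name `linear_fit` and the temperature and ozone values are hypothetical, chosen only to echo the ozone vs. temperature example.

```python
def linear_fit(x, y):
    """Least-squares slope m, intercept b, and R^2 for the line y = m*x + b."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    m = s_xy / s_xx                 # slope
    b = y_bar - m * x_bar           # intercept
    # R^2 = 1 - (residual sum of squares) / (total sum of squares)
    ss_res = sum((yi - (m * xi + b)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    r_squared = 1.0 - ss_res / ss_tot
    return m, b, r_squared


# Hypothetical data: daily maximum temperature (deg C) and ozone (ppb).
temperature = [18, 21, 24, 27, 30, 33]
ozone = [25, 31, 34, 42, 45, 53]

m, b, r2 = linear_fit(temperature, ozone)
print(f"ozone = {m:.2f} * temperature + {b:.2f}, R^2 = {r2:.3f}")
```

A high R² here says only that a straight line describes these points well; it does not show that temperature causes the ozone change.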
Statistics Excel 2010
Summary Statistics
http://academic.brooklyn.cuny.edu/economic/friedman/descstatexcel.htm
Column1
Mean 74.92857143
Standard Error 5.013678308
Median 78.5
Mode 80
Standard Deviation 18.75946647
Sample Variance 351.9175824
Kurtosis 1.923164749
Skewness -1.31355395
Range 71
Minimum 29
Maximum 100
Sum 1049
Count 14
Confidence Level(95.0%) 10.83139138
Ozone April 2013
Histogram and Summary Statistics
Mean 35.48948
Median 35
Mode 35
Standard Dev 10.72231
Sample Variance 114.968
Kurtosis -0.20548
Skewness 0.146677
Minimum 2
Maximum 68
Sum 25304
Count 713
April 2013 Ozone
Box-Whisker
Population size: 713
Median: 35
Minimum: 2
Maximum: 68
First quartile: 28
Third quartile: 43
Interquartile Range: 15
Outliers: 2 5 5 5 6 8 10 11 11 68 65 64 62 62 61
61 60 59 58 58 58
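The interquartile range and outlier fences for a box-and-whisker plot can be sketched from the quartiles reported above. The multiplier k = 1.5 is Tukey's common convention and is an assumption here; the slide does not state which rule its plotting tool used.

```python
# Quartiles reported for the April 2013 ozone data (ppb).
q1, q3 = 28, 43
iqr = q3 - q1  # interquartile range

# Assumed convention: flag values more than k * IQR beyond the quartiles.
k = 1.5
lower_fence = q1 - k * iqr
upper_fence = q3 + k * iqr


def is_outlier(value):
    """True if the value lies outside the fences."""
    return value < lower_fence or value > upper_fence


print(f"IQR = {iqr} ppb, fences = ({lower_fence}, {upper_fence}) ppb")
for reading in (2, 35, 68):
    print(f"{reading} ppb -> outlier: {is_outlier(reading)}")
```

With k = 1.5 the fences fall at 5.5 and 65.5 ppb, yet the outlier list above includes values such as 11 and 58 ppb, so the tool that produced it presumably applied a different criterion.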
∗ Access TCEQ web site data.
∗ Importing files into Excel and Matlab.
∗ Using Excel for statistical work, Matlab for statistics.
Plotting histograms.
∗ Read the papers posted on Blackboard: Statistics for
Analysis of Experimental Data, Errors and Limitation
Associated with Regression, and Why we divide by n-1.
∗ Lab will be assigned.
Lab Thursday
Video
https://www.khanacademy.org/math/probability
Statistics Handbook
http://www.itl.nist.gov/div898/handbook/index.htm
Elementary Statistics
https://www.udacity.com/course/st095
Self Study/Supplemental
