Probability and Statistics

Roozbeh Sanaei
Probability and Statistics
Introduction to
The Jelly-Bean Jar Game, Debra James

https://www.robbimack.com/blog-entries/2016/11/25/the-jelly-bean-jar-game
Roozbeh Sanaei
Probability
• Obtaining likelihood of an event given parameters of the models
• Knowing the proportion of each color  the probability of drawing a red jellybean.
• One correct answer
Statistics
• Obtaining parameters of the model given a sample
• Sampling from the jar  proportion of red jellybeans
• No single correct answer, depending on assumptions
Probability vs Statistics
1

Roozbeh Sanaei
Bayesian
• Probability is an abstract concept that measures a state of knowledge or a degree of
belief in each proposition.
• Bayes rule holds in any valid probability space
Frequentist
• Probability measures the frequency of various outcomes of an experiment.
• Frequentist definition of probability is a special case
Frequentist vs. Bayesian
2

https://www.onlinemathlearning.com/shading-venn-diagrams.html
Roozbeh Sanaei
Venn Diagrams
3

Roozbeh Sanaei 4
Inclusion-exclusion principle
A B 𝐴 ∪ 𝐵 = 𝐴 + 𝐵 − 𝐴 ∩ 𝐵
Rule of Sum
If there are n possible ways to do something, and m possible ways to do another
thing, and the two things can't both be done, then there are n + m total possible ways
to do one of the things.

Roozbeh Sanaei
Combination and Permutation
Permutation
Number of ways for selection
and of k items from n items in
which order matters.
Combination
Number of ways for selection of k
items from n items in which order
does not matter.
5

Roozbeh Sanaei
Rules of Probability
A A’
𝑃(𝐴′
) = 1 − 𝑃(𝐴)
A B 𝑃 𝐴 ∪ 𝐵 = 𝑃 𝐴 + 𝑃(𝐵)
A B 𝑃 𝐴 ∪ 𝐵 = 𝑃 𝐴 + 𝑃 𝐵 − 𝑃(𝐴 ∩ 𝐵)
𝑃 𝐴 =
𝑆(𝐴)
𝑆(𝑈)
= 𝛼 𝑆(𝐴)
6

Roozbeh Sanaei 7
Conditional Probability
P(A) P(B)
𝑃(𝐴
∩
𝐵)
𝑃 𝐴 𝐵 =
𝑃 𝐴 ∩ 𝐵
𝑃 𝐵
𝑃 𝐴 ∩ 𝐵 = 𝑃 𝐴 𝐵 𝑃 𝐵

Roozbeh Sanaei 8
Law of Total Probability
B1
B2
𝑃 𝐴 = 𝑃 𝐴 𝐵1 𝑃 𝐵1 + 𝑃 𝐴 𝐵2 𝑃 𝐵2 + 𝑃 𝐴 𝐵3 𝑃 𝐵3
B3
A

Roozbeh Sanaei
Kolmogorov axioms
1. P A ≥ 0
2. P S = 1
3. If A⋂𝐵 = ∅ → P A ∪ 𝐵 = P A + P(B)
∀ 𝐴(𝑒𝑣𝑒𝑛𝑡) ⊂ 𝑆(𝑠𝑎𝑚𝑝𝑙𝑒 𝑆𝑝𝑎𝑐𝑒)
Supplemented by two definitions
P 𝐴 𝐵 =
𝑃(𝐴 ∩ 𝐵)
𝑃(𝐵)
if A and B are independent 𝑃 𝐴 ∩ 𝐵 = 𝑃 𝐴 𝑃(𝐵)
9

Roozbeh Sanaei 10
𝑃 𝐵 𝐴 =
𝑃 𝐴 𝐵 𝑃 𝐵
𝑃(𝐴)
Bayes Theorem
P(A) P(B)
𝑃(𝐴
∩
𝐵)
𝑃 𝐴 ∩ 𝐵 = 𝑃 𝐵 𝐴 𝑃 𝐴 = 𝑃 𝐴 𝐵 𝑃 𝐵

Roozbeh Sanaei 11
Terminology
Experiment: a repeatable procedure with well-defined possible outcomes.
Toss the coin twice, report if it lands heads or tails each time.
Sample space: the set of all possible outcomes. We usually denote the sample space by Ω, sometimes by S.
Ω = {HH,HT,TH,TT}.
Event: a subset of the sample space.
Σ ={HH, HT}
Probability function: a function giving the probability for each outcome.
Each outcome is equally likely with probability ¼
Random variable: A random variable is a function from the sample space to the real numbers. 𝑋: 𝑆 → ℝ
We can define a random variable X whose value is the number of observed heads. The value of X will be one of 0,1,2

Roozbeh Sanaei 12
Probability functions
𝑃 𝑋 = 𝑥 = 𝑃𝑋 𝑥 = 𝑝(𝑥)
Probability Mass Function Probability Density Function
𝑃 𝑎 < 𝑋 < 𝑏
𝑥
=
𝑎
𝑥
=
𝑏
𝑃
𝑋
≤
𝑎
Cumulative distribution function Percent Point Function
https://www.itl.nist.gov/div898/handbook/eda/section3/eda362.htm
𝑃
𝑋
=
𝑎
𝑑𝑃
𝑥
≤
𝑎
/𝑑𝑥
𝑃
−1
𝑋
≤
𝑎

Roozbeh Sanaei 13
Properties of and
𝑝𝑑𝑓(𝑥) ≥ 0
−∞
+∞
𝑝𝑑𝑓 𝑥 = 1
𝑝𝑑𝑓 𝑐𝑑𝑓
𝑐𝑑𝑓(𝑥) ≥ 0
0 < 𝑐𝑑𝑓 𝑥 < 1
𝑖𝑓 𝑎 > 𝑏 ⇒ 𝑐𝑑𝑓 𝑎 > 𝑐𝑑𝑓(𝑏)
lim
𝑥⟶∞
𝑐𝑑𝑓(𝑥) = 1
lim
𝑥⟶−∞
𝑐𝑑𝑓(𝑥) = 0
𝑃 𝑎 ≤ 𝑥 ≤ 𝑏 = 𝑐𝑑𝑓 𝑏 − 𝑐𝑑𝑓(𝑎)
𝑐𝑑𝑓′
(𝑥) = 𝑝𝑑𝑓(𝑥)

http://www.math.wm.edu/~leemis/chart/UDR/UDR.html
Roozbeh Sanaei 14
Univariate
Distributions

Roozbeh Sanaei 15
Bernoulli Distribution
Models one trial in an experiment that can result in either success or failure.
We will write 𝑋~𝐵𝑒𝑟𝑛𝑜𝑢𝑙𝑙𝑖 𝑝 or 𝐵𝑒𝑟, which is read “𝑋 follows a Bernoulli distribution with
parameter 𝑝“ or “𝑋 is drawn from a Bernoulli distribution with parameter 𝑝".
𝑃 𝑋 = 𝑥 =
1 − 𝑝 𝑖𝑓 𝑥 = 0
𝑝 𝑖𝑓 𝑥 = 1
𝑃 𝑋 ≤ 𝑘 =
0 𝑖𝑓 𝑘 < 0
𝑞 𝑖𝑓 0 ≤ 𝑘 < 1
1 𝑘 ≥ 1

Roozbeh Sanaei 16
Binomial Distribution
𝑃 𝑋 = 𝑘 =
#𝑜𝑓 𝑎𝑟𝑟𝑎𝑛𝑔𝑒𝑚𝑒𝑛𝑡𝑠
𝑤𝑖𝑡ℎ 𝑘 𝑛𝑢𝑚𝑏𝑒𝑟 𝑜𝑓 𝑠𝑢𝑐𝑐𝑒𝑠𝑠
×
𝑝𝑟𝑜𝑏𝑎𝑏𝑙𝑖𝑡𝑦 𝑜𝑓𝑎𝑛 𝑎𝑟𝑟𝑎𝑛𝑔𝑒𝑚𝑒𝑛𝑡
𝑤𝑖𝑡ℎ 𝑘 𝑛𝑢𝑚𝑏𝑒𝑟 𝑜𝑓 𝑠𝑢𝑐𝑐𝑒𝑠𝑠
Models the number of successes in 𝑛 independent 𝐵𝑒𝑟𝑛𝑜𝑢𝑙𝑙𝑖(𝑝) trials
𝑃 𝑋 = 𝑘 =
𝑛
𝑘
𝑝𝑘
𝑞𝑛−𝑘
𝑃 𝑋 ≤ 𝑘 = 𝐼𝑞 𝑛 − 𝑘, 1 + 𝑘

Roozbeh Sanaei 17
Geometric Distribution
Models the number of tails before the first head in a sequence of coin flips
𝑃 𝑋 = 𝑘 = (1 − 𝑝)𝑘
𝑝 𝑃 𝑋 ≤ 𝑘 = 1 − (1 − 𝑝)𝑘+1

Roozbeh Sanaei 18
Uniform Distribution
Models the situation where all the outcomes between certain bounds are equally likely
𝑃 𝑋 = 𝑥 =
1
𝑏 − 𝑎
𝑖𝑓 𝑥 ∈ 𝑎, 𝑏
0 𝑂𝑡ℎ𝑒𝑟𝑤𝑖𝑠𝑒
𝑃 𝑋 ≤ 𝑥 =
0 𝑖𝑓 𝑥 < 𝑎
𝑥 − 𝑎
𝑏 − 𝑎
𝑖𝑓 𝑥 ∈ 𝑎, 𝑏
1 𝑖𝑓 𝑥 > 𝑏

Roozbeh Sanaei 19
Poisson Distribution
The Poisson distribution is a limiting case of the binomial distribution when 𝑛 ⟶ ∞ while
𝜆 = 𝑛 × 𝑝 remains constant
It means running the Bernoulli trials faster and faster rate but with a smaller and smaller
success probability
Poisson distribution is memoryless: 𝑃 𝑇 > 𝑡 + 𝑠 𝑇 > 𝑠 = 𝑃(𝑇 > 𝑠)
𝑃 𝑋 = 𝑥 = lim
𝑛→∞
𝑛
𝑘
𝑝𝑘𝑞𝑛−𝑘 =
𝜆𝑘𝑒−𝜆
𝑘!
𝑃 𝑋 ≤ 𝑎 = 𝑒−𝜆
𝑖=0
𝑘
𝜆𝑘
𝑘!

Roozbeh Sanaei
Exponential Distribution
While the Poisson distribution deals with the number of occurrences in a fixed period of time,
the exponential distribution deals with the time between occurrences of successive events
as time flows by continuously.
𝑃 𝑁𝑡+𝜏 − 𝑁𝑡 = 0 = 𝑃 𝑁𝜏 = 0
𝑃 𝑁𝑡+𝜏 − 𝑁𝑡 > 0 = 1 − 𝑃 𝑁𝑡+𝜏 − 𝑁𝑡 = 0
𝑃 𝑁𝜏 = 0 = 𝑒−𝜆𝜏
𝑃 𝑁𝑡+𝜏 − 𝑁𝑡 > 0 = 1 − 𝑒−𝜆𝜏
𝑃 𝑁𝑡+𝜏 − 𝑁𝑡 > 0 𝑝𝑟𝑜𝑏𝑎𝑏𝑖𝑙𝑖𝑡𝑦 𝑜𝑓 𝑎 𝑛𝑒𝑤 𝑎𝑟𝑟𝑖𝑣𝑎𝑙 𝑎𝑡 𝑖𝑛𝑡𝑒𝑟𝑣𝑎𝑙 𝜏
𝑃𝑟𝑜𝑏𝑎𝑏𝑖𝑙𝑖𝑡𝑦 𝑜𝑓 𝑛𝑜 𝑎𝑟𝑟𝑖𝑣𝑎𝑙 𝑎𝑡 𝑖𝑛𝑡𝑒𝑟𝑣𝑎𝑙 𝜏
𝑀𝑒𝑚𝑜𝑟𝑦𝑙𝑒𝑠𝑠 𝑃𝑟𝑜𝑝𝑒𝑟𝑡𝑦
𝑃𝑜𝑖𝑠𝑠𝑜𝑛 𝑑𝑖𝑠𝑡𝑟𝑖𝑏𝑢𝑡𝑖𝑜𝑛
20

Roozbeh Sanaei
Expected Value and Variance
Expected Value Variance
Mean, Average
𝐸 𝑋 , 𝜇 𝑉𝑎𝑟 𝑋 , 𝜎2
𝐸 𝑋 =
𝑗=1
𝑛
𝑥𝑗 𝑝 𝑥𝑗 𝐸 𝑋 − 𝜇 2
=
𝑗=1
𝑛
𝑥𝑗 − 𝜇
2
𝑝 𝑥𝑗
𝐸 𝑎𝑋 + 𝑏 = 𝑎 𝐸 𝑋 + 𝑏 𝑉𝑎𝑟 𝑎𝑋 + 𝑏 = 𝑎2
𝑉𝑎𝑟 𝑋
𝐸 𝑋 + 𝑌 = 𝐸 𝑋 + 𝐸(𝑌) 𝑉𝑎𝑟 𝑋 + 𝑌 = 𝑉𝑎𝑟 𝑋 + 𝑉𝑎𝑟(𝑌)
𝐸 ℎ(𝑋) =
𝑗
ℎ 𝑥𝑗 𝑝(𝑥𝑗)
𝑖𝑓 𝑋 ⊥ 𝑌 → 𝐸 𝑋𝑌 = 𝐸 𝑋 𝐸 𝑌 𝑖𝑓 𝑋 ⊥ 𝑌 → 𝑉𝑎𝑟 𝑋𝑌
= 𝑉𝑎𝑟 𝑋 𝑉𝑎𝑟 𝑌 + 𝑉𝑎𝑟 𝑋 𝐸 𝑌 2
+ 𝑉𝑎𝑟 𝑌 𝐸 𝑋 2
21

Roozbeh Sanaei
Expected Value and Variance
Distribution range X pmf 𝑝(𝑥) Expected Value Variance
𝐵𝑒𝑟𝑛𝑜𝑢𝑙𝑙𝑖(𝑝) 0,1 𝑝 0 = 1 − 𝑝, 𝑝 1 = 𝑝 𝑝 𝑝(1 − 𝑝)
𝐵𝑖𝑜𝑛𝑜𝑚𝑖𝑎𝑙(𝑛, 𝑝) 0,1,…, n 𝑛
𝑘
𝑝𝑘
(1 − 𝑝)𝑛−𝑘 𝑛𝑝 𝑛 𝑝 (1 − 𝑝)
𝑈𝑛𝑖𝑓𝑜𝑟𝑚(𝑛)
1,2,…, n 1
𝑛
𝑛 + 1
2
𝑛2 − 1
12
𝐺𝑒𝑜𝑚𝑒𝑡𝑟𝑖𝑐(𝑝)
0,1,2,…. 𝑝(1 − 𝑝)𝑘 1 − 𝑝
𝑝
1 − 𝑝
𝑝2
22

Roozbeh Sanaei
Transformation of random variables
𝑋~𝑈𝑛𝑖𝑓𝑜𝑟𝑚 0,2 𝑟𝑎𝑛𝑔𝑒, 𝑝𝑑𝑓 and 𝑐𝑑𝑓 of Y = 𝑋2?
𝑃 𝑌 ≤ 𝑦 = 𝑃 𝑋2 ≤ 𝑦 = 𝑃 𝑋 ≤ 𝑦 =
𝑦
2
⇒ 𝑃 𝑌 = 𝑦 =
1
4 𝑦
𝑦 = 𝑥2
⇒ 𝑑𝑦 = 2𝑥𝑑𝑥 ⟹ 𝑑𝑥 =
𝑑𝑦
2 𝑦
⇒ 𝑃 𝑌 = 𝑦 =
1
4 𝑦
Distribution function technique:
Method of Transformation:
23

Roozbeh Sanaei
Joint pdf
𝑃 𝑋𝑎 ≤ 𝑥 ≤ 𝑋𝑏, 𝑌𝑎 ≤ 𝑦 ≤ 𝑌𝑏 =
𝑋𝑎
𝑋𝑏
𝑌𝑎
𝑌𝑏
𝑝𝑑𝑓 𝑥, 𝑦 𝑑𝑥𝑑𝑦
𝑃 𝑋𝑎 ≤ 𝑥 ≤ 𝑋𝑏 =
𝑋𝑎
𝑋𝑏
𝑝𝑑𝑓 𝑥, 𝑦 𝑑𝑥𝑑𝑦
𝑝𝑑𝑓 𝑋, 𝑌 = 𝑝𝑑𝑓𝑋(𝑋)𝑝𝑑𝑓𝑌(𝑌) 𝑝 𝑥𝑖, 𝑦𝑖 = 𝑝𝑋(𝑥𝑖)𝑝𝑌(𝑦𝑖) 𝑝𝑑𝑓 𝑋, 𝑌 = 𝑝𝑑𝑓𝑋(𝑋)𝑝𝑑𝑓𝑌(𝑌)
https://www.researchgate.net/publication/320182941_Bayesian_tracking_of_multiple_point_targets_using_Expectation_Maximization
Joint pdf
Marginal pdf
Independence
24

Roozbeh Sanaei
Covariance
𝐶𝑜𝑣 𝑥, 𝑦 = 𝐸((𝑋 − 𝜇𝑥)(𝑌 − 𝜇𝑦))
𝐶𝑜𝑣 𝑎𝑋 + 𝑏, 𝑐𝑌 + 𝑑 = 𝑎𝑐 𝐶𝑜𝑣(𝑋, 𝑌)
𝐶𝑜𝑣 𝑋1 + 𝑋2, 𝑌 = 𝐶𝑜𝑣 𝑋1, 𝑌 + 𝐶𝑜𝑣(𝑋2, 𝑌)
𝐶𝑜𝑣 𝑋, 𝑋 = 𝑉𝑎𝑟(𝑋)
𝐶𝑜𝑣 𝑋, 𝑌 = 𝐸 𝑋𝑌 − 𝜇𝑥𝜇𝑦
𝑉𝑎𝑟 𝑋, 𝑌 = 𝑉𝑎𝑟 𝑋 + 𝑉𝑎𝑟 𝑌 + 2𝐶𝑜𝑣 𝑋, 𝑌
𝐶𝑜𝑣 𝑋, 𝑌 =
𝑋𝑎
𝑋𝑏
𝑌𝑎
𝑌𝑏
𝑝𝑑𝑓 𝑥, 𝑦 𝑥 𝑦 𝑑𝑥 𝑑𝑦 − 𝜇𝑥𝜇𝑦
25

Moderate
Positive
Strong
Positive
Perfect Positive
Correlation
Roozbeh Sanaei
Pearson Correlation
https://stats.libretexts.org/Courses/Highline_College/Book%3A_Statistics_Using_Technology_(Kozak)/10%3A_Regression_and_Correlation/10.02%3A_Correlation
Negative No
Correlation Non-linear Correlation
26
𝐶𝑜𝑟 𝑋, 𝑌 = 𝜌 =
𝐶𝑜𝑣 𝑋, 𝑌
𝜎𝑋𝜎𝑌
1. 𝜌 is the covariance of the standardizations of X and Y
2. 𝜌 is dimensionless and −1 ≤ 𝜌 ≤ +1
𝜌 = +1 ⇔ 𝑌 = 𝑎𝑋 + 𝑏 𝑤𝑖𝑡ℎ 𝑎 > 0
𝜌 = −1 ⇔ 𝑌 = 𝑎𝑋 + 𝑏 𝑤𝑖𝑡ℎ 𝑎 < 0

Roozbeh Sanaei
Maximum Likelihood
27
100
60
𝑝60(1 − 𝑝)40

• In Maximum likelihood estimation we find the most optimal set of values for model parameters 𝜃𝑀𝐿
• In Bayesian learning, we assign a probability to each set of model parameters
• We update parameter probabilities form prior probabilities to posterior by multiplying them in
likelihood of data given each set of parameters
• Since probabilities no longer sum up to “1” after multiplication, we need to normalize them back by dividing
them to the sum of the obtained values.
Roozbeh Sanaei
Bayesian Learning
28
𝑝(𝜃) 𝑝 𝑋 𝜃
𝑝 𝑋 𝜃
https://psyarxiv.com/w5vbp/

Median
Roozbeh Sanaei
Median, Quantiles, Quartiles, Decile, Percentile
𝑃 𝑋 = 𝑚𝑒𝑑𝑖𝑎𝑛 = 0.5
Pth Quantile 𝑃 𝑋 = 𝑞𝑝 = 𝑝
Pth Percentile 𝑃 𝑋 = 𝑞𝑝/100 = 𝑝/100
Decile 𝑃 𝑋 = 𝑞𝑝/10 = 𝑝/10
https://prepnuggets.com/glossary/quantile/ 29

Roozbeh Sanaei
Law of large numbers
With high probability the density histogram of a
large number of samples from a distribution is a
good approximation of the graph of the
underlying pdf f(x)
∀𝑎∈ℝ lim
𝑛→∞
𝑃 𝑋𝑛 − 𝜇 < 𝑎 = 1
https://plotly.com/chart-studio-help/histogram/ 30

Roozbeh Sanaei
Central Limit Theorem
The central limit theorem states that if
you sufficiently select random samples
from a population with mean μ and
standard deviation 𝜎 , then the
distribution of the sample means will
be approximately normally distributed
with mean μ and standard
deviation
𝜎
𝑛
https://towardsdatascience.com/central-limit-theorem-a-real-life-application-f638657686e1 31

Roozbeh Sanaei
z-distribution
𝑧 =
𝑥 − 𝜇
𝜎
https://www.researchgate.net/publication/296695387_Developing_a_Geospatial_Protocol_for_Coral_Epizootiology 32
𝑧 =
𝑝 − 𝑝0
𝑝0 1 − 𝑝0
𝑛
𝑥 : Sample mean value
𝜇: population mean value
𝑝 : sample proportion
𝑝0: population proportion
𝑛: sample size

Roozbeh Sanaei
Two sample Z-distribution
33
https://en.wikipedia.org/wiki/Student%27s_t-distribution
𝑧 =
𝑥1 − 𝑥2 − Δ
𝜎1
2
𝑛1
+
𝜎2
2
𝑛2
𝑥1 , 𝑥2 ∶ means of two samples
Δ ∶ difference between the population means
𝜎1, 𝜎2: populations standard deviations
𝑛1, 𝑛2: sample sizes
t

Roozbeh Sanaei
p-value
P-value
P-value
P-value
(doubled)
𝑝 − 𝑣𝑎𝑙𝑢𝑒 = 𝑃 𝑂𝑏𝑡𝑎𝑖𝑛𝑖𝑛𝑔 𝑡ℎ𝑒 𝑠𝑎𝑚𝑝𝑙𝑒 𝑟𝑒𝑠𝑢𝑙𝑡 the null hypothesis true
≠ 𝑃 he null hypothesis true 𝑂𝑏𝑡𝑎𝑖𝑛𝑖𝑛𝑔 𝑡ℎ𝑒 𝑠𝑎𝑚𝑝𝑙𝑒 𝑟𝑒𝑠𝑢𝑙𝑡
https://www.fromthegenesis.com/difference-between-one-tail-test-and-two-tail-test/ 34

Roozbeh Sanaei
Type I and Type II error
35
https://www.scribbr.com/statistics/type-i-and-type-ii-errors/
Critical
Value
Rejection Region
Non-rejection Region

Roozbeh Sanaei
t-distribution
36
𝑡 =
𝑥 − 𝜇
𝑠/ 𝑛 − 1
𝑡 ∶ Student’s t-distribution
𝑥 : Sample mean
𝜇 ∶ population mean
𝑠 ∶ sample standard deviation
𝑛 ∶ sample size
𝜈 = 𝑛 − 1 : degrees of freedom t
P(t)

Roozbeh Sanaei
Two sample t-distribution (equal underlying population variances)
37
𝑡 =
𝑥1 − 𝑥2 − Δ
𝑠𝑝/
1
𝑛1
+
1
𝑛2
𝑥 1, 𝑥 2: sample means
𝑠1, 𝑠2: sample standard deviations
𝑠𝑝
2 =
( 𝑛1 − 1 𝑠1
2
+ ( 𝑛2 − 1 𝑠2
2
)
𝑛1 + 𝑛2 − 2
𝜈 = 𝑛1 + 𝑛2 − 2

Roozbeh Sanaei
Two sample t-distribution (different underlying population variances
38
𝑡 =
𝑥1 − 𝑥2 − Δ
𝑠𝑝/
1
𝑛1
+
1
𝑛2
𝑥 1, 𝑥 2: sample means
𝑠1, 𝑠2: sample standard deviations
𝜈 =
𝑠1
2
𝑛1
+
𝑠2
2
𝑛2
2
𝑠1
2
𝑛1
𝑛1 − 1
+
𝑠2
2
𝑛2
𝑛2 − 1
𝑠𝑝
2
=
𝑠1
2
𝑛1
+
𝑠2
2
𝑛2

Roozbeh Sanaei
Chi square distribution
39
Chi-squared distribution is distribution of sum of squares of k
independent, standard normal variables.
𝑄 =
𝑖=1
𝑟
𝑍𝑟

This enable us to evaluate the possibility of a set of variable being independently
drawn form for a set of normal distributions.
we can standardize the variables if we know their expected probability
Roozbeh Sanaei
Chi square distribution
40
𝔼 𝑛𝑗 = 𝑛 𝑝𝑗
𝑖=1
𝑟
𝑛𝑟 = 𝑁 so we only test r-1 variables
𝑖=1
𝑟
𝑛𝑗 − 𝔼 𝑛𝑗
2
𝔼 𝑛𝑗
=
𝑖=1
𝑟
𝑛𝑗 − 𝑛 𝑝𝑗
2
𝑛 𝑝𝑗
= 𝑋𝑟−1
2

Roozbeh Sanaei
Chi square test
41
A B C D total
White
collar
𝑜0,0
90
𝑜0,1
60
𝑜0,2
104
𝑜0,2
95 349
Blue
Collar
𝑜1,0
30
𝑜1,1
50
𝑜1,2
51
𝑜1,3
20 151
No
collar
𝑜2,0
30
𝑜1,1
40
𝑜1,2
45
𝑜1,3
35
150
Total 150
15
0
200 150 N = 650
𝑒𝑖,𝑗 = 𝑝𝑖𝑝𝑗𝑁 =
𝑖 𝑜𝑖,𝑗 𝑗 𝑜𝑖,𝑗
𝑁
𝜒2 =
(𝑜𝑖,𝑗 − 𝑒𝑖,𝑗)2
𝑒𝑖,𝑗
𝑘 = 𝑛𝑢𝑚𝑏𝑒𝑟 𝑜𝑓 𝑟𝑜𝑤𝑠 − 1
(𝑛𝑢𝑚𝑏𝑒𝑟 𝑜𝑓 𝑐𝑜𝑙𝑠 − 1)
Values remains independent in absence of
categorical variable dependence.
For measuring expectation values,
we assume independence between categories
𝑖, 𝑗

Roozbeh Sanaei
𝐹 − distribution
42
𝐹∗
=
𝑆1 𝑑1
𝑆2 𝑑2
Where 𝑆1 and 𝑆2 are independent random
variables with chi-square distributions with
respective degrees of freedom 𝑑1 and 𝑑2

Roozbeh Sanaei
𝐹 − test
43
The full model
𝑌𝑖 = 𝛽0 + 𝛽1𝑥 + 𝜀𝑖
The model thought to be most appropriate
for the data
The reduced model
𝑌𝑖 = 𝛽0 + 𝜀𝑖
The model described by the null hypothesis
https://online.stat.psu.edu/stat501/lesson/6/6.2

Roozbeh Sanaei
𝐹 − test
44
𝐹∗ =
𝑆𝑆𝐸 𝑅 − 𝑆𝑆𝐸(𝐹)
𝑞
/
𝑆𝑆𝐸(𝐹)
𝑛 − (𝑘 + 1)
𝑆𝑆𝐸 𝑅 : Error sum of squares of the reduced model
𝑆𝑆𝐸 𝐹 : Error sum of squares of the full model
𝑞 : number of restrictions
𝑛 : number of observations
𝑘 : number of independent variables

Roozbeh Sanaei
Coding Systems for Categorical Variables
45
Race x1 x2 x3
Hispanic 1 0 0
Asian 0 1 0
African 0 0 1
White 0 0 0
Dummy Coding
Race x1 x2 x3
Hispanic 1 0 0
Asian 0 1 0
African 0 0 1
White -1 -1 -1
SIMPLE effect contrast coding
https://stats.oarc.ucla.edu/spss/faq/coding-systems-for-categorical-variables-in-regression-analysis/

ANOVA Types
ANOVA Type
Dependent
variables
Independent
variable
One-way ANOVA 1 continuous
1 categorical
variable
Two-way ANOVA 1 continuous
2 or more
categorical
ANCOVA
1 continuous
1 categorical
variable and
1 continuous
One-way
MANOVA
2 or more
continuous
2 or more
categorical
Two-way
MANOVA
2 or more
continuous
1 categorical
variable and 1
continuous
46

𝑑1 = 𝐼 − 1
𝑑2 = 𝑛𝑇 − 𝐼
Roozbeh Sanaei
One Way ANOVA
47
Independent variable : Categorical Dependent variable : Quantitative
Brand of soda : Coke, Pepsi, Sprite, Fanta Price per 100ml
Levels or Treatments
𝑖=1
#𝑡𝑟𝑒𝑎𝑡𝑚𝑒𝑛𝑡𝑠
𝑗=1
𝑛𝑖
𝑦𝑖,𝑗 − 𝑦,
2
=
𝑖=1
𝑘
𝑛𝑖 𝑦𝑖,𝑗 − 𝑦,
2
+
𝑖=1
𝑘
𝑗=1
𝑛𝑖
𝑦𝑖,𝑗 − 𝑦𝑖,
2
Sum of Squares of
Treatments
Sum of Squares of
Errors
𝑆𝑆𝑇𝑟𝑒𝑎𝑡𝑚𝑒𝑛𝑡𝑠 𝑆𝑆𝐸𝑟𝑟𝑜𝑟
𝐹 =
𝑆𝑆𝑇𝑟𝑒𝑎𝑡𝑚𝑒𝑛𝑡𝑠 𝑑1
𝑆𝑆𝐸𝑟𝑟𝑜𝑟 𝑑2
=
𝑣𝑎𝑟𝑖𝑎𝑛𝑐𝑒 𝑏𝑒𝑡𝑤𝑒𝑒𝑛 𝑡𝑟𝑒𝑎𝑡𝑚𝑒𝑛𝑡𝑠
𝑣𝑎𝑟𝑖𝑎𝑛𝑐𝑒 𝑤𝑖𝑡ℎ𝑖𝑛 𝑡𝑟𝑒𝑎𝑡𝑚𝑒𝑛𝑡𝑠
=
𝑀𝑆𝑇𝑟𝑒𝑎𝑡𝑚𝑒𝑛𝑡𝑠
𝑀𝑆𝐸𝑟𝑟𝑜𝑟
𝐼 ∶ 𝑛𝑢𝑚𝑏𝑒𝑟 𝑜𝑓 𝑡𝑟𝑒𝑎𝑡𝑚𝑒𝑛𝑡𝑠
𝑛𝑇 ∶ 𝑡𝑜𝑡𝑎𝑙 𝑛𝑢𝑚𝑏𝑒𝑟 𝑜𝑓 𝑐𝑎𝑠𝑒𝑠

i: treatment
j: observation
Roozbeh Sanaei
One way ANOVA model and assumptions
48
𝑌𝑖,𝑗 = 𝜇 + 𝜏𝑖 + 𝜖𝑖,𝑗
Observation
Common effect Treatment effect
Random error effect
Independence. The dependent variable score for
each experimental unit is independent of the score
for any other unit.
Normality. In the population, dependent variable
scores are normally distributed within treatment
groups.
Equality of variance. In the population, the
variance of dependent variable scores in each
treatment group is equal. (Equality of variance is
also known as homogeneity of
variance or homoscedasticity.)
https://sites.ualberta.ca/~lkgray/uploads/7/3/6/2/7362679/slides_-_anova_assumptions.pdf
μ is always a fixed parameter
𝜖𝑖,𝑗 are assumed to be normally and
independently distributed,
with mean zero and variance 𝜎𝜖
2
SS𝐸 = 𝑖=1
#𝑡𝑟𝑒𝑎𝑡𝑚𝑒𝑛𝑡𝑠
𝑗=1
𝑛𝑖
𝑦𝑖,𝑗 − 𝜇𝑗
2
𝑑𝑆𝑆𝐸
𝑑𝜇𝑗
= 0 𝜇𝑗 = 𝑦𝑗

Roozbeh Sanaei
One way ANOVA model and assumptions
49
𝜏𝑖 are fixed parameters if the levels of the treatment are fixed and not a random
sample from a population of possible levels, It is also assumed that μ is chosen
so that
𝜏𝑖 = 0
Fixed effects model
Random effects model
𝜏𝑖 𝜖 𝑁𝐼𝐷(0, 𝜎𝜏)
F is the value of the statistic used to test whether 𝜎𝜏 = 0

𝑉𝑎𝑟 𝐿 = 𝜎2
𝑖
𝑐𝑖
2
𝑛𝑖
𝐿 =
𝑖
𝑐𝑖𝑌𝑖 ,
Roozbeh Sanaei
Linear combination of the factor level means
50
Mean Value for linear combination of factor levels
Estimating dependent variable on mixture of multiple factor levels 𝐿 = 𝑝1𝜇1 + 𝑝2𝜇2 + 𝑝3𝜇3
𝐿 =
𝑖
𝑐𝑖𝜇𝑖 𝑆𝐸 𝐿 = 𝑀𝑆𝐸
𝑖
𝑐𝑖
2
𝑛𝑖
Test 𝑯𝟎 ∶ 𝑳 = 𝑳𝟎 𝐚𝐠𝐚𝐢𝐧𝐬𝐭 𝑯𝟎 ∶ 𝑳 ≠ 𝑳𝟎
𝑡∗ =
𝐿 − 𝐿0
𝑆𝐸 𝐿
Pairwise Comparison: Comparing one level with another levels, 𝐿 = 𝜇1 − 𝜇2
Contrast: a linear combination of the factor level means such that the coefficients sum to zero, useful
when we compare one level against multiple levels, 𝐿 = 𝜇1 − 0.5 𝜇2 − 0.5 𝜇2

Roozbeh Sanaei
Multiple Comparisons Problem
51
In testing multiple linear combinations of factor level means,
Familywise Type I Error Rate (FWER) is the probability of making at least one error
among all tested linear combinations.
• Single test Type I Error : 𝐏(𝐑𝐞𝐣𝐞𝐜𝐭 𝐇𝟎|𝐂𝐨𝐧𝐟𝐢𝐫𝐦 𝐇𝟎)
• For q independent tests: 𝐅𝐖𝐄𝐑 = 𝟏 − (𝟏 − 𝛂)𝐪
FWER will depend on the number of tests and whether or not tests are independent of
one another

Roozbeh Sanaei
Bonferroni’s Correction
52
Boole’s inequality
𝐹𝑊𝐸𝑅 ≤
𝑘=1
𝑓
𝑃 𝑅𝑒𝑗𝑒𝑐𝑡 𝐻0𝑘 𝐶𝑜𝑛𝑓𝑖𝑟𝑚𝐻0𝑘 =
𝑘=1
𝑓
𝛼∗
= 𝑓 𝛼∗
Bonferroni’s Correction
𝛼∗
= 𝛼 𝑓
Major Strength
Applicable to many situations (no assumptions)
Major Strength
Overly Conservative
Sidak Correction
𝛼∗= 1 − (1 − 𝛼) (1 𝑚)

Roozbeh Sanaei
Holm’s Step-Down and Hochberg’s Step-Up Procedure
53
Holm-Bonferroni Equation
𝛼∗
= 𝛼 (𝑛 − 𝑟𝑎𝑛𝑘 + 1)
1. Order p−values in ascending way
2. Calculate critical 𝛼∗ value for each rank of the list
3. Scan forward and accept all hypothesis that exceed 𝛼∗
Holm’s Step-Down Procedure
Hochberg’s Step-Up Procedure
1. Order p−values in ascending way
2. Calculate critical 𝛼∗ value for each rank of the list
3. Scan backward and reject all hypothesis that surpass 𝛼∗

Roozbeh Sanaei
Studentized range distribution
54
1. Sample is of size n from each of 𝑘 populations with the same normal distribution 𝑁 𝜇, 𝜎2
2. 𝑦𝑚𝑖𝑛 is the smallest and 𝑦𝑚𝑎𝑥 is the largest of these sample means
3. 𝑠² is the pooled sample variance from these samples.
𝑞 =
𝑦𝑚𝑎𝑥 − 𝑦𝑚𝑖𝑛
𝑠 𝑛
Follows the studentized range distribution
Studentized range statistic

Roozbeh Sanaei
Tukey's range test
55
ANOVA only tests if independent variable level significantly changes the dependent variable, It does not
figure out between which pair of the levels cause this significant change lies.
In turkey range test we compare mean of dependent variable for each two levels against each other
using Studentized range statistic.
𝑉(𝐿) = 𝜎2
1
𝑛𝑗
+
1
𝑛𝑘
𝐿 = 𝜇𝑖 − 𝜇𝑗
𝑞∗ =
2𝐿
𝑉 (𝐿)
We reject the mean difference between two levels if it goes beyond than Studentized range distribution
for certain k=#levels and df = #experiments-k

Roozbeh Sanaei
Two Way ANOVA
56
Independent variables : Categorical Dependent variable : Quantitative
Fertilizer Type: A, B, C
Planting Density: Low, High
Levels or Treatments
Final crop yield in bushels per acre at harvest time
𝑆𝑆 𝐴 = 𝑟𝑏
𝑖=1
𝑎
𝑦𝑖, , − 𝑦,,
2
𝑆𝑆 𝐵 = 𝑟𝑎
𝑖=1
𝑏
𝑦,𝑗, − 𝑦,,
2
𝑆𝑆 𝐴𝐵 = 𝑟
𝑖=1
𝑎
𝑖=1
𝑏
𝑦𝑖,𝑗, − 𝑦𝑖,, − 𝑦,𝑗, + 𝑦,,
2
𝑆𝑆 𝐸 =
𝑖=1
𝑟
𝑖=1
𝑎
𝑗=1
𝑏
𝑦𝑖,𝑗,𝑘 − 𝑦𝑖,𝑗,
2
𝑆𝑆 𝐸 =
𝑖=1
𝑟
𝑖=1
𝑎
𝑗=1
𝑏
𝑦𝑖,𝑗,𝑘 − 𝑦,,
2
Source SS df Mean Square
Factor A SS(A) (a-1) SS(A)
𝑎 − 1
Factor B SS(B) (b-1) SS(B)
𝑎 − 1
Interaction SS(AB) (a-1)(b-1) SS(AB)
(𝑎 − 1)(𝑏 − 1)
Error SSE (N-ab) SSE
(𝑁 − 𝑎𝑏)
Total SS(Total) (N-1)

i: treatment
j: observation
Roozbeh Sanaei
Two Way ANOVA Model and Parameter Estimation
57
𝑌𝑖,𝑗 = 𝜇 + 𝜏𝑖 + 𝛽𝑗 + 𝛾𝑖,𝑗 + 𝜖𝑖,𝑗,𝑘
Observation
Common effect
i-th Treatment effect of A
Random error effect
https://sites.ualberta.ca/~lkgray/uploads/7/3/6/2/7362679/slides_-_anova_assumptions.pdf
j-th Treatment effect of B
Interaction between i-th Treatment effect of A
and j-th Treatment effect of B

Probability and Statistics

More Related Content

What's hot (20)

Similar to Probability and Statistics (20)

Recently uploaded (20)

Probability and Statistics