Introduction to Probability and Statistics
Roozbeh Sanaei
The Jelly-Bean Jar Game, Debra James
https://www.robbimack.com/blog-entries/2016/11/25/the-jelly-bean-jar-game
Probability vs Statistics
Probability
• Obtaining the likelihood of an event given the parameters of the model
• Knowing the proportion of each color → the probability of drawing a red jellybean
• One correct answer
Statistics
• Obtaining the parameters of the model given a sample
• Sampling from the jar → the proportion of red jellybeans
• No single correct answer; the answer depends on assumptions
Frequentist vs. Bayesian
Bayesian
• Probability is an abstract concept that measures a state of knowledge or a degree of belief in a proposition.
• Bayes' rule holds in any valid probability space.
Frequentist
• Probability measures the frequency of the various outcomes of an experiment.
• The frequentist definition of probability is a special case.
Venn Diagrams
https://www.onlinemathlearning.com/shading-venn-diagrams.html
Inclusion-exclusion principle
|A ∪ B| = |A| + |B| − |A ∩ B|
Rule of Sum
If there are n possible ways to do something, and m possible ways to do another thing, and the two things can't both be done, then there are n + m total possible ways to do one of the things.
Combination and Permutation
Permutation: the number of ways to select k items from n items when order matters: P(n, k) = n!/(n − k)!
Combination: the number of ways to select k items from n items when order does not matter: C(n, k) = n!/(k!(n − k)!)
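The two counts above are available directly in Python's standard library; a quick sketch, also checking that each combination corresponds to k! orderings:

```python
from math import comb, perm

# Permutations: order matters; combinations: order does not.
print(perm(5, 2))  # 20 ordered selections of 2 items from 5
print(comb(5, 2))  # 10 unordered selections of 2 items from 5

# Each combination of k items corresponds to k! orderings, so perm = comb * k!
assert perm(5, 2) == comb(5, 2) * 2
```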
Rules of Probability
Complement: P(A′) = 1 − P(A)
Disjoint events: P(A ∪ B) = P(A) + P(B)
General union: P(A ∪ B) = P(A) + P(B) − P(A ∩ B)
Equally likely outcomes: P(A) = S(A)/S(U) = α S(A), where S(·) counts outcomes and α = 1/S(U)
Conditional Probability
P(A | B) = P(A ∩ B) / P(B)
P(A ∩ B) = P(A | B) P(B)
Law of Total Probability
For a partition B1, B2, B3 of the sample space:
P(A) = P(A | B1) P(B1) + P(A | B2) P(B2) + P(A | B3) P(B3)
Kolmogorov axioms
For every event A ⊂ S (the sample space):
1. P(A) ≥ 0
2. P(S) = 1
3. If A ∩ B = ∅ then P(A ∪ B) = P(A) + P(B)
Supplemented by two definitions:
Conditional probability: P(A | B) = P(A ∩ B) / P(B)
Independence: A and B are independent iff P(A ∩ B) = P(A) P(B)
Bayes Theorem
P(A ∩ B) = P(B | A) P(A) = P(A | B) P(B)
P(B | A) = P(A | B) P(B) / P(A)
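A short numerical sketch of Bayes' theorem, using hypothetical disease-test numbers (sensitivity, specificity, and prevalence are all assumed for illustration):

```python
# Bayes' theorem on hypothetical numbers: a test with 99% sensitivity,
# 95% specificity, and 1% disease prevalence (all assumed for illustration).
p_d = 0.01                # P(disease)
p_pos_given_d = 0.99      # P(positive | disease)
p_pos_given_not_d = 0.05  # P(positive | no disease)

# Law of total probability gives P(positive)
p_pos = p_pos_given_d * p_d + p_pos_given_not_d * (1 - p_d)

# Bayes: P(disease | positive) = P(positive | disease) P(disease) / P(positive)
p_d_given_pos = p_pos_given_d * p_d / p_pos
print(round(p_d_given_pos, 3))  # 0.167
```

Despite the accurate test, a positive result only implies about a 17% chance of disease, because the prior P(disease) is small.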
Terminology
Experiment: a repeatable procedure with well-defined possible outcomes.
  Toss a coin twice; report whether it lands heads or tails each time.
Sample space: the set of all possible outcomes. We usually denote the sample space by Ω, sometimes by S.
  Ω = {HH, HT, TH, TT}
Event: a subset of the sample space.
  E = {HH, HT}
Probability function: a function giving the probability of each outcome.
  Each outcome is equally likely, with probability ¼.
Random variable: a function from the sample space to the real numbers, 𝑋: 𝑆 → ℝ.
  We can define a random variable X whose value is the number of observed heads; X takes one of the values 0, 1, 2.
Probability functions
Probability mass function (discrete): P(X = x) = P_X(x) = p(x)
Probability density function (continuous): P(a < X < b) = ∫ₐᵇ pdf(x) dx
Cumulative distribution function: cdf(a) = P(X ≤ a), so pdf(a) = d P(X ≤ a)/da
Percent point function (quantile function): the inverse of the cdf, cdf⁻¹(p)
https://www.itl.nist.gov/div898/handbook/eda/section3/eda362.htm
Properties of pdf and cdf
pdf(x) ≥ 0
∫₋∞⁺∞ pdf(x) dx = 1
0 ≤ cdf(x) ≤ 1
cdf is non-decreasing: a > b ⇒ cdf(a) ≥ cdf(b)
lim_{x→∞} cdf(x) = 1
lim_{x→−∞} cdf(x) = 0
P(a ≤ x ≤ b) = cdf(b) − cdf(a)
cdf′(x) = pdf(x)
http://www.math.wm.edu/~leemis/chart/UDR/UDR.html
Univariate Distributions
Bernoulli Distribution
Models one trial of an experiment that can result in either success or failure.
We write 𝑋~𝐵𝑒𝑟𝑛𝑜𝑢𝑙𝑙𝑖(𝑝) or 𝐵𝑒𝑟(𝑝), read "𝑋 follows a Bernoulli distribution with parameter 𝑝" or "𝑋 is drawn from a Bernoulli distribution with parameter 𝑝".
pmf: P(X = x) = 1 − p if x = 0;  p if x = 1
cdf: P(X ≤ k) = 0 if k < 0;  q = 1 − p if 0 ≤ k < 1;  1 if k ≥ 1
Binomial Distribution
Models the number of successes in 𝑛 independent 𝐵𝑒𝑟𝑛𝑜𝑢𝑙𝑙𝑖(𝑝) trials.
P(X = k) = (# of arrangements with k successes) × (probability of an arrangement with k successes)
P(X = k) = C(n, k) pᵏ qⁿ⁻ᵏ
P(X ≤ k) = I_q(n − k, 1 + k), where I is the regularized incomplete beta function
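The counting argument above translates directly into code; a minimal sketch of the binomial pmf and cdf (the cdf as a plain sum over the pmf rather than the incomplete beta form):

```python
from math import comb

# Binomial pmf from the counting argument: C(n, k) arrangements,
# each with probability p**k * (1-p)**(n-k).
def binom_pmf(k, n, p):
    return comb(n, k) * p**k * (1 - p)**(n - k)

# cdf as a direct sum of the pmf up to k
def binom_cdf(k, n, p):
    return sum(binom_pmf(i, n, p) for i in range(k + 1))

n, p = 10, 0.3
# The pmf over the whole range 0..n must sum to 1
assert abs(sum(binom_pmf(k, n, p) for k in range(n + 1)) - 1) < 1e-12
print(round(binom_pmf(3, 10, 0.3), 4))  # 0.2668
```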
Geometric Distribution
Models the number of tails before the first head in a sequence of coin flips.
P(X = k) = (1 − p)ᵏ p
P(X ≤ k) = 1 − (1 − p)ᵏ⁺¹
Uniform Distribution
Models the situation where all outcomes between certain bounds are equally likely.
pdf: f(x) = 1/(b − a) if x ∈ [a, b];  0 otherwise
cdf: P(X ≤ x) = 0 if x < a;  (x − a)/(b − a) if x ∈ [a, b];  1 if x > b
Poisson Distribution
The Poisson distribution is a limiting case of the binomial distribution when 𝑛 ⟶ ∞ while 𝜆 = 𝑛𝑝 remains constant.
This means running the Bernoulli trials at a faster and faster rate, but with a smaller and smaller success probability.
The waiting times of a Poisson process are memoryless: 𝑃(𝑇 > 𝑡 + 𝑠 | 𝑇 > 𝑠) = 𝑃(𝑇 > 𝑡)
P(X = k) = lim_{n→∞} C(n, k) pᵏ qⁿ⁻ᵏ = λᵏ e⁻ᵠ / k!, with λ in the exponent: λᵏ e⁻λ / k!
P(X ≤ k) = e⁻λ Σ_{i=0}^{k} λⁱ / i!
Exponential Distribution
While the Poisson distribution deals with the number of occurrences in a fixed period of time, the exponential distribution deals with the time between occurrences of successive events as time flows by continuously.
P(N_{t+τ} − N_t = 0) = P(N_τ = 0)  (memoryless property)
P(N_τ = 0) = e^{−λτ}  (Poisson distribution with k = 0: probability of no arrival in an interval of length τ)
P(N_{t+τ} − N_t > 0) = 1 − P(N_{t+τ} − N_t = 0) = 1 − e^{−λτ}  (probability of a new arrival in an interval of length τ)
Expected Value and Variance
Expected value (mean, average): E[X], μ      Variance: Var(X), σ²
E[X] = Σ_{j=1}^{n} x_j p(x_j)
Var(X) = E[(X − μ)²] = Σ_{j=1}^{n} (x_j − μ)² p(x_j)
E[aX + b] = a E[X] + b                       Var(aX + b) = a² Var(X)
E[X + Y] = E[X] + E[Y]                       if X ⊥ Y → Var(X + Y) = Var(X) + Var(Y)
E[h(X)] = Σ_j h(x_j) p(x_j)
if X ⊥ Y → E[XY] = E[X] E[Y]
if X ⊥ Y → Var(XY) = Var(X) Var(Y) + Var(X) E[Y]² + Var(Y) E[X]²
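The identities above can be verified numerically; a small sketch on an arbitrary three-point pmf (values and probabilities chosen purely for illustration):

```python
# Numerical check of the expectation/variance identities on a small pmf.
xs = [0, 1, 2]
ps = [0.2, 0.5, 0.3]

def E(f):  # E[f(X)] for the pmf above
    return sum(f(x) * p for x, p in zip(xs, ps))

mu = E(lambda x: x)                      # E[X]
var = E(lambda x: (x - mu) ** 2)         # Var(X) = E[(X - mu)^2]

# Check Var(aX + b) = a^2 Var(X)
a, b = 3, 5
mu_t = E(lambda x: a * x + b)
var_t = E(lambda x: (a * x + b - mu_t) ** 2)
assert abs(mu_t - (a * mu + b)) < 1e-12
assert abs(var_t - a * a * var) < 1e-12

print(round(mu, 2), round(var, 2))  # 1.1 0.49
```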
Expected Value and Variance
Distribution      Range of X    pmf p(x)                   Expected value   Variance
Bernoulli(p)      0, 1          p(0) = 1 − p, p(1) = p     p                p(1 − p)
Binomial(n, p)    0, 1, …, n    C(n, k) pᵏ (1 − p)ⁿ⁻ᵏ      np               np(1 − p)
Uniform(n)        1, 2, …, n    1/n                        (n + 1)/2        (n² − 1)/12
Geometric(p)      0, 1, 2, …    p(1 − p)ᵏ                  (1 − p)/p        (1 − p)/p²
Transformation of random variables
𝑋~𝑈𝑛𝑖𝑓𝑜𝑟𝑚(0, 2); what are the range, 𝑝𝑑𝑓 and 𝑐𝑑𝑓 of Y = 𝑋²?
Distribution function technique:
P(Y ≤ y) = P(X² ≤ y) = P(X ≤ √y) = √y / 2 for 0 ≤ y ≤ 4  ⇒  pdf_Y(y) = 1/(4√y)
Method of transformation:
y = x² ⇒ dy = 2x dx ⇒ dx = dy/(2√y)  ⇒  pdf_Y(y) = pdf_X(√y) · 1/(2√y) = 1/(4√y)
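The derivation can be sanity-checked by simulation; a sketch that compares the empirical cdf of Y = X² against the analytic √y/2:

```python
import random

# Monte Carlo check of the transformation above: for X ~ Uniform(0, 2)
# and Y = X^2, the cdf should be P(Y <= y) = sqrt(y)/2 on [0, 4].
random.seed(0)
n = 200_000
ys = [random.uniform(0, 2) ** 2 for _ in range(n)]

y0 = 1.0
empirical = sum(y <= y0 for y in ys) / n
analytic = y0 ** 0.5 / 2
print(abs(empirical - analytic) < 0.01)  # True
```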
Joint pdf
Joint pdf: P(Xₐ ≤ x ≤ X_b, Yₐ ≤ y ≤ Y_b) = ∫_{Xₐ}^{X_b} ∫_{Yₐ}^{Y_b} pdf(x, y) dy dx
Marginal pdf: pdf_X(x) = ∫ pdf(x, y) dy, so P(Xₐ ≤ x ≤ X_b) = ∫_{Xₐ}^{X_b} ∫ pdf(x, y) dy dx
Independence: pdf(X, Y) = pdf_X(X) pdf_Y(Y);  discrete case: p(xᵢ, yᵢ) = p_X(xᵢ) p_Y(yᵢ)
https://www.researchgate.net/publication/320182941_Bayesian_tracking_of_multiple_point_targets_using_Expectation_Maximization
Covariance
Cov(X, Y) = E[(X − μₓ)(Y − μ_y)] = E[XY] − μₓ μ_y
Cov(aX + b, cY + d) = ac Cov(X, Y)
Cov(X₁ + X₂, Y) = Cov(X₁, Y) + Cov(X₂, Y)
Cov(X, X) = Var(X)
Var(X + Y) = Var(X) + Var(Y) + 2 Cov(X, Y)
Cov(X, Y) = ∫∫ x y pdf(x, y) dx dy − μₓ μ_y
Pearson Correlation
Cor(X, Y) = ρ = Cov(X, Y) / (σ_X σ_Y)
1. 𝜌 is the covariance of the standardizations of X and Y
2. 𝜌 is dimensionless and −1 ≤ 𝜌 ≤ +1
𝜌 = +1 ⇔ 𝑌 = 𝑎𝑋 + 𝑏 with 𝑎 > 0
𝜌 = −1 ⇔ 𝑌 = 𝑎𝑋 + 𝑏 with 𝑎 < 0
[Figure: scatter plots of moderate positive, strong positive, perfect positive, negative, no, and non-linear correlation]
https://stats.libretexts.org/Courses/Highline_College/Book%3A_Statistics_Using_Technology_(Kozak)/10%3A_Regression_and_Correlation/10.02%3A_Correlation
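The definition translates directly into code; a minimal sketch that computes ρ from Cov(X, Y)/(σ_X σ_Y) and confirms the ±1 cases for exact linear relationships (data values chosen for illustration):

```python
# Pearson correlation computed from the definition rho = Cov(X, Y) / (sd_X sd_Y).
def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
    sx = (sum((x - mx) ** 2 for x in xs) / n) ** 0.5
    sy = (sum((y - my) ** 2 for y in ys) / n) ** 0.5
    return cov / (sx * sy)

xs = [1, 2, 3, 4, 5]
print(round(pearson(xs, [2 * x + 1 for x in xs]), 6))   # 1.0  (Y = aX + b, a > 0)
print(round(pearson(xs, [-2 * x + 1 for x in xs]), 6))  # -1.0 (Y = aX + b, a < 0)
```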
Maximum Likelihood
Observing 60 successes in 100 trials, the likelihood of p is
L(p) = C(100, 60) p⁶⁰ (1 − p)⁴⁰
Bayesian Learning
• In maximum likelihood estimation we find the single best set of values for the model parameters, 𝜃_ML.
• In Bayesian learning, we assign a probability to each set of model parameters.
• We update the parameter probabilities from prior to posterior by multiplying them by the likelihood of the data given each set of parameters: 𝑝(𝜃) → 𝑝(𝜃) 𝑝(𝑋|𝜃)
• Since the probabilities no longer sum to 1 after the multiplication, we normalize them by dividing by the sum of the obtained values.
https://psyarxiv.com/w5vbp/
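Both ideas can be sketched on the coin example (60 heads in 100 flips), using a simple grid of candidate p values — an illustrative sketch, not a full Bayesian treatment:

```python
from math import comb

# MLE picks the single p maximizing the likelihood; Bayesian learning keeps
# a normalized distribution over all candidate p values on the grid.
grid = [i / 100 for i in range(1, 100)]            # candidate p values
lik = [comb(100, 60) * p**60 * (1 - p)**40 for p in grid]

p_ml = grid[lik.index(max(lik))]                   # maximum likelihood estimate
print(p_ml)                                        # 0.6

prior = [1 / len(grid)] * len(grid)                # flat prior
post = [pr * l for pr, l in zip(prior, lik)]       # prior x likelihood
z = sum(post)
post = [v / z for v in post]                       # normalize back to sum 1
assert abs(sum(post) - 1) < 1e-9
```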
Median, Quantiles, Quartiles, Decile, Percentile
Median: P(X ≤ median) = 0.5
pth quantile: P(X ≤ q_p) = p
pth percentile: P(X ≤ q_{p/100}) = p/100
pth decile: P(X ≤ q_{p/10}) = p/10
https://prepnuggets.com/glossary/quantile/
Law of large numbers
With high probability, the density histogram of a large number of samples from a distribution is a good approximation of the graph of the underlying pdf f(x).
∀a > 0: lim_{n→∞} P(|X̄_n − μ| < a) = 1
https://plotly.com/chart-studio-help/histogram/
Central Limit Theorem
The central limit theorem states that if you take sufficiently large random samples from a population with mean μ and standard deviation 𝜎, then the distribution of the sample means will be approximately normally distributed with mean μ and standard deviation 𝜎/√𝑛.
https://towardsdatascience.com/central-limit-theorem-a-real-life-application-f638657686e1
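A quick simulation sketch of the theorem: means of samples from a decidedly non-normal distribution (Uniform(0, 1)) cluster around μ with spread close to σ/√n:

```python
import random
import statistics as st

# CLT sketch: means of n = 50 draws from Uniform(0, 1) should cluster
# around mu = 0.5 with standard deviation close to sigma/sqrt(n).
random.seed(1)
n, reps = 50, 5000
mu = 0.5                   # mean of Uniform(0, 1)
sigma = (1 / 12) ** 0.5    # standard deviation of Uniform(0, 1)

means = [st.fmean(random.random() for _ in range(n)) for _ in range(reps)]
print(abs(st.fmean(means) - mu) < 0.01)               # True
print(abs(st.stdev(means) - sigma / n**0.5) < 0.005)  # True
```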
z-distribution
One-sample z-statistic for a mean: z = (x̄ − μ) / (σ/√n)
One-sample z-statistic for a proportion: z = (p̂ − p₀) / √(p₀(1 − p₀)/n)
x̄: sample mean
μ: population mean
p̂: sample proportion
p₀: population proportion
n: sample size
https://www.researchgate.net/publication/296695387_Developing_a_Geospatial_Protocol_for_Coral_Epizootiology
Two sample Z-distribution
z = (x̄₁ − x̄₂ − Δ) / √(σ₁²/n₁ + σ₂²/n₂)
x̄₁, x̄₂: means of the two samples
Δ: difference between the population means
σ₁, σ₂: population standard deviations
n₁, n₂: sample sizes
https://en.wikipedia.org/wiki/Student%27s_t-distribution
p-value
p-value = P(obtaining the sample result | the null hypothesis is true)
        ≠ P(the null hypothesis is true | obtaining the sample result)
For a two-tailed test, the one-tailed p-value is doubled.
https://www.fromthegenesis.com/difference-between-one-tail-test-and-two-tail-test/
Type I and Type II error
[Figure: the critical value separates the rejection region from the non-rejection region]
https://www.scribbr.com/statistics/type-i-and-type-ii-errors/
t-distribution
t = (x̄ − μ) / (s/√(n − 1))
t: Student's t-statistic
x̄: sample mean
μ: population mean
s: sample standard deviation
n: sample size
ν = n − 1: degrees of freedom
https://en.wikipedia.org/wiki/Student%27s_t-distribution
Two sample t-distribution (equal underlying population variances)
t = (x̄₁ − x̄₂ − Δ) / (s_p √(1/n₁ + 1/n₂))
s_p² = ((n₁ − 1) s₁² + (n₂ − 1) s₂²) / (n₁ + n₂ − 2)
ν = n₁ + n₂ − 2
x̄₁, x̄₂: sample means
s₁, s₂: sample standard deviations
n₁, n₂: sample sizes
Δ: difference between the population means
https://en.wikipedia.org/wiki/Student%27s_t-distribution
Two sample t-distribution (different underlying population variances)
t = (x̄₁ − x̄₂ − Δ) / √(s₁²/n₁ + s₂²/n₂)
Welch–Satterthwaite degrees of freedom:
ν = (s₁²/n₁ + s₂²/n₂)² / [ (s₁²/n₁)²/(n₁ − 1) + (s₂²/n₂)²/(n₂ − 1) ]
x̄₁, x̄₂: sample means
s₁, s₂: sample standard deviations
n₁, n₂: sample sizes
Δ: difference between the population means
https://en.wikipedia.org/wiki/Student%27s_t-distribution
Chi square distribution
The chi-squared distribution with k degrees of freedom is the distribution of the sum of squares of k independent, standard normal variables:
Q = Σ_{i=1}^{k} Zᵢ²
This enables us to evaluate the possibility of a set of variables being independently drawn from a set of normal distributions.
We can standardize the variables if we know their expected probabilities.
Chi square distribution
𝔼[n_j] = n p_j
Σ_{j=1}^{r} n_j = n, so only r − 1 of the counts are free to vary
Σ_{j=1}^{r} (n_j − 𝔼[n_j])² / 𝔼[n_j] = Σ_{j=1}^{r} (n_j − n p_j)² / (n p_j) ~ χ²_{r−1}
Chi square test
Observed counts o_{i,j}:
              A     B     C     D     Total
White collar  90    60    104   95    349
Blue collar   30    50    51    20    151
No collar     30    40    45    35    150
Total         150   150   200   150   N = 650
To compute the expected counts we assume independence between the categories i and j:
e_{i,j} = pᵢ pⱼ N = (Σ_j o_{i,j})(Σ_i o_{i,j}) / N
χ² = Σ_{i,j} (o_{i,j} − e_{i,j})² / e_{i,j}
k = (number of rows − 1)(number of cols − 1) degrees of freedom
The observed counts remain close to the expected ones only in the absence of dependence between the categorical variables.
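The test statistic for the table above can be computed directly from the definitions, with expected counts from row and column totals:

```python
# Chi-square test of independence on the collar/city table above.
obs = [
    [90, 60, 104, 95],   # white collar
    [30, 50, 51, 20],    # blue collar
    [30, 40, 45, 35],    # no collar
]
row = [sum(r) for r in obs]            # row totals
col = [sum(c) for c in zip(*obs)]      # column totals
N = sum(row)                           # 650

# chi2 = sum over cells of (o - e)^2 / e, with e = row_total * col_total / N
chi2 = sum(
    (obs[i][j] - row[i] * col[j] / N) ** 2 / (row[i] * col[j] / N)
    for i in range(len(obs))
    for j in range(len(obs[0]))
)
df = (len(obs) - 1) * (len(obs[0]) - 1)
print(df, round(chi2, 2))  # df is 6; chi2 comes out near 24.6
```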
𝐹 − distribution
F* = (S₁/d₁) / (S₂/d₂)
where 𝑆₁ and 𝑆₂ are independent random variables with chi-square distributions with respective degrees of freedom 𝑑₁ and 𝑑₂.
𝐹 − test
The full model: 𝑌ᵢ = 𝛽₀ + 𝛽₁𝑥ᵢ + 𝜀ᵢ — the model thought to be most appropriate for the data
The reduced model: 𝑌ᵢ = 𝛽₀ + 𝜀ᵢ — the model described by the null hypothesis
https://online.stat.psu.edu/stat501/lesson/6/6.2
𝐹 − test
F* = [ (SSE(R) − SSE(F)) / q ] / [ SSE(F) / (n − (k + 1)) ]
SSE(R): error sum of squares of the reduced model
SSE(F): error sum of squares of the full model
q: number of restrictions
n: number of observations
k: number of independent variables
Coding Systems for Categorical Variables
Dummy coding:
Race      x1  x2  x3
Hispanic  1   0   0
Asian     0   1   0
African   0   0   1
White     0   0   0
SIMPLE effect contrast coding:
Race      x1  x2  x3
Hispanic  1   0   0
Asian     0   1   0
African   0   0   1
White     -1  -1  -1
https://stats.oarc.ucla.edu/spss/faq/coding-systems-for-categorical-variables-in-regression-analysis/
ANOVA Types
ANOVA type       Dependent variables     Independent variables
One-way ANOVA    1 continuous            1 categorical variable
Two-way ANOVA    1 continuous            2 or more categorical
ANCOVA           1 continuous            1 categorical variable and 1 continuous
One-way MANOVA   2 or more continuous    1 categorical variable
Two-way MANOVA   2 or more continuous    2 or more categorical
One Way ANOVA
Independent variable: categorical, its values called levels or treatments — e.g. brand of soda: Coke, Pepsi, Sprite, Fanta
Dependent variable: quantitative — e.g. price per 100 ml
Σ_{i=1}^{I} Σ_{j=1}^{nᵢ} (y_{i,j} − ȳ..)² = Σ_{i=1}^{I} nᵢ (ȳᵢ. − ȳ..)² + Σ_{i=1}^{I} Σ_{j=1}^{nᵢ} (y_{i,j} − ȳᵢ.)²
  Total sum of squares = sum of squares of treatments (SS_Treatments) + sum of squares of errors (SS_Error)
F = (SS_Treatments/d₁) / (SS_Error/d₂) = variance between treatments / variance within treatments = MS_Treatments / MS_Error
d₁ = I − 1,  d₂ = n_T − I
I: number of treatments;  n_T: total number of cases;  i: treatment;  j: observation
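The decomposition above can be sketched in a few lines; a minimal one-way ANOVA F computation on made-up price samples for three treatments:

```python
import statistics as st

# One-way ANOVA F statistic from the sum-of-squares decomposition above,
# on small made-up samples (three treatments, four observations each).
groups = [
    [1.10, 1.20, 1.15, 1.25],
    [1.40, 1.35, 1.50, 1.45],
    [1.00, 0.95, 1.05, 1.10],
]
all_y = [y for g in groups for y in g]
grand = st.fmean(all_y)

# Between-treatment and within-treatment sums of squares
ss_treat = sum(len(g) * (st.fmean(g) - grand) ** 2 for g in groups)
ss_err = sum((y - st.fmean(g)) ** 2 for g in groups for y in g)

d1 = len(groups) - 1              # I - 1
d2 = len(all_y) - len(groups)     # n_T - I
F = (ss_treat / d1) / (ss_err / d2)
print(d1, d2, round(F, 1))
```

A large F (between-treatment variance much bigger than within-treatment variance) is evidence that the treatment means differ.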
One way ANOVA model and assumptions
𝑌_{i,j} = 𝜇 + 𝜏ᵢ + 𝜖_{i,j}
observation = common effect + treatment effect + random error effect
Independence: the dependent variable score for each experimental unit is independent of the score for any other unit.
Normality: in the population, dependent variable scores are normally distributed within treatment groups.
Equality of variance: in the population, the variance of dependent variable scores in each treatment group is equal. (Equality of variance is also known as homogeneity of variance or homoscedasticity.)
μ is always a fixed parameter; the 𝜖_{i,j} are assumed to be normally and independently distributed, with mean zero and variance 𝜎_𝜖².
SSE = Σ_{i=1}^{I} Σ_{j=1}^{nᵢ} (y_{i,j} − 𝜇ᵢ)²;  setting dSSE/d𝜇ᵢ = 0 gives 𝜇̂ᵢ = ȳᵢ
https://sites.ualberta.ca/~lkgray/uploads/7/3/6/2/7362679/slides_-_anova_assumptions.pdf
One way ANOVA model and assumptions
Fixed effects model: the 𝜏ᵢ are fixed parameters if the levels of the treatment are fixed and not a random sample from a population of possible levels. It is also assumed that μ is chosen so that Σᵢ 𝜏ᵢ = 0.
Random effects model: 𝜏ᵢ ~ NID(0, 𝜎_𝜏²). F is the value of the statistic used to test whether 𝜎_𝜏 = 0.
Linear combination of the factor level means
Estimating the dependent variable for a mixture of multiple factor levels, e.g. L = p₁𝜇₁ + p₂𝜇₂ + p₃𝜇₃
L = Σᵢ cᵢ 𝜇ᵢ, estimated by L̂ = Σᵢ cᵢ Ȳᵢ
Var(L̂) = 𝜎² Σᵢ cᵢ²/nᵢ,  so  SE(L̂) = √(MSE Σᵢ cᵢ²/nᵢ)
Test H₀: L = L₀ against Hₐ: L ≠ L₀ with t* = (L̂ − L₀) / SE(L̂)
Pairwise comparison: comparing one level with another level, L = 𝜇₁ − 𝜇₂
Contrast: a linear combination of the factor level means whose coefficients sum to zero, useful when we compare one level against multiple levels, L = 𝜇₁ − 0.5 𝜇₂ − 0.5 𝜇₃
Multiple Comparisons Problem
In testing multiple linear combinations of factor level means, the familywise Type I error rate (FWER) is the probability of making at least one Type I error among all tested linear combinations.
• Single test Type I error: α = P(reject H₀ | H₀ is true)
• For q independent tests: FWER = 1 − (1 − α)^q
FWER depends on the number of tests and on whether or not the tests are independent of one another.
Bonferroni's Correction
Boole's inequality: FWER ≤ Σ_{k=1}^{f} P(reject H₀ₖ | H₀ₖ is true) = Σ_{k=1}^{f} α* = f α*
Bonferroni's correction: α* = α/f
Major strength: applicable to many situations (no assumptions)
Major weakness: overly conservative
Šidák correction: α* = 1 − (1 − α)^{1/m}
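A short numerical sketch of the problem and both corrections, for q independent tests at α = 0.05:

```python
# FWER under q independent tests, and the Bonferroni / Sidak adjusted
# per-test levels that bring it back down to alpha.
alpha, q = 0.05, 10

fwer_uncorrected = 1 - (1 - alpha) ** q
print(round(fwer_uncorrected, 3))         # 0.401: far above 0.05

bonferroni = alpha / q                    # 0.005
sidak = 1 - (1 - alpha) ** (1 / q)        # slightly larger than 0.005

# Sidak restores FWER = alpha exactly (for independent tests);
# Bonferroni is slightly conservative, giving FWER <= alpha.
assert abs(1 - (1 - sidak) ** q - alpha) < 1e-12
assert 1 - (1 - bonferroni) ** q <= alpha + 1e-12
```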
Holm's Step-Down and Hochberg's Step-Up Procedure
Holm–Bonferroni equation: α* = α / (n − rank + 1)
Holm's step-down procedure:
1. Order the p-values in ascending order
2. Calculate the critical α* value for each rank of the list
3. Scan forward, rejecting hypotheses while p < α*; once a p-value exceeds its α*, accept it and all remaining hypotheses
Hochberg's step-up procedure:
1. Order the p-values in ascending order
2. Calculate the critical α* value for each rank of the list
3. Scan backward from the largest p-value; once a p-value falls below its α*, reject it and all hypotheses with smaller p-values
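Holm's step-down procedure, as sketched above, in a few lines (the p-values are made up for illustration):

```python
# Holm's step-down procedure: test smallest p first at alpha/m, then
# alpha/(m-1), ...; stop (and accept the rest) at the first failure.
def holm(pvalues, alpha=0.05):
    order = sorted(range(len(pvalues)), key=lambda i: pvalues[i])
    reject = [False] * len(pvalues)
    for rank, i in enumerate(order):               # rank 0 = smallest p
        threshold = alpha / (len(pvalues) - rank)  # alpha / (n - rank)
        if pvalues[i] >= threshold:
            break                                  # accept this and all larger
        reject[i] = True
    return reject

ps = [0.001, 0.010, 0.030, 0.300]
print(holm(ps))  # [True, True, False, False]
```

The thresholds here are 0.05/4, 0.05/3, 0.05/2, 0.05; 0.030 fails against 0.05/2 = 0.025, so it and everything after it are accepted.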
Studentized range distribution
1. A sample of size n is taken from each of 𝑘 populations with the same normal distribution 𝑁(𝜇, 𝜎²)
2. 𝑦̄ₘᵢₙ is the smallest and 𝑦̄ₘₐₓ is the largest of these sample means
3. 𝑠² is the pooled sample variance of these samples
The studentized range statistic q = (𝑦̄ₘₐₓ − 𝑦̄ₘᵢₙ) / (𝑠/√𝑛) follows the studentized range distribution.
Tukey's range test
ANOVA only tests whether the level of the independent variable significantly changes the dependent variable; it does not identify between which pair of levels the significant change lies.
In Tukey's range test we compare the mean of the dependent variable for each pair of levels against each other using the studentized range statistic:
L = 𝜇ⱼ − 𝜇ₖ,  V(L̂) = 𝜎²(1/nⱼ + 1/nₖ),  q* = √2 L̂ / √V̂(L̂)
We reject the hypothesis of no mean difference between two levels if q* goes beyond the studentized range distribution's critical value for k = #levels and df = #observations − k.
Two Way ANOVA
Independent variables: categorical, with levels or treatments — fertilizer type: A, B, C; planting density: low, high
Dependent variable: quantitative — final crop yield in bushels per acre at harvest time
SS(A) = r b Σ_{i=1}^{a} (ȳᵢ.. − ȳ...)²
SS(B) = r a Σ_{j=1}^{b} (ȳ.ⱼ. − ȳ...)²
SS(AB) = r Σ_{i=1}^{a} Σ_{j=1}^{b} (ȳᵢⱼ. − ȳᵢ.. − ȳ.ⱼ. + ȳ...)²
SS(E) = Σ_{k=1}^{r} Σ_{i=1}^{a} Σ_{j=1}^{b} (y_{i,j,k} − ȳᵢⱼ.)²
SS(Total) = Σ_{k=1}^{r} Σ_{i=1}^{a} Σ_{j=1}^{b} (y_{i,j,k} − ȳ...)²
Source        SS         df           Mean square
Factor A      SS(A)      a − 1        SS(A)/(a − 1)
Factor B      SS(B)      b − 1        SS(B)/(b − 1)
Interaction   SS(AB)     (a−1)(b−1)   SS(AB)/((a − 1)(b − 1))
Error         SSE        N − ab       SSE/(N − ab)
Total         SS(Total)  N − 1
Two Way ANOVA Model and Parameter Estimation
𝑌_{i,j,k} = 𝜇 + 𝜏ᵢ + 𝛽ⱼ + 𝛾_{i,j} + 𝜖_{i,j,k}
observation = common effect + i-th treatment effect of A + j-th treatment effect of B + interaction between the i-th treatment effect of A and the j-th treatment effect of B + random error effect
https://sites.ualberta.ca/~lkgray/uploads/7/3/6/2/7362679/slides_-_anova_assumptions.pdf

More Related Content

PDF
Random Variable and Probability Distribution
PPTX
Normal Distribution – Introduction and Properties
PPTX
Random variables
PPTX
Statistics & probability
PPTX
Probability distribution for Dummies
PDF
Discrete probability distribution (complete)
PDF
Concept of probability
PPTX
law of large number and central limit theorem
Random Variable and Probability Distribution
Normal Distribution – Introduction and Properties
Random variables
Statistics & probability
Probability distribution for Dummies
Discrete probability distribution (complete)
Concept of probability
law of large number and central limit theorem

What's hot (20)

PPT
multiple regression
PPT
Probability Distribution
PPTX
Probability Distribution
PPTX
The Central Limit Theorem
PPTX
Pearson's correlation
PPTX
Lecture 4 The Normal Distribution.pptx
PPTX
Probability Distribution
PPT
Statistics-Measures of dispersions
PPT
Probability distribution
PPTX
Measures of-central-tendency
PPT
Chapter 4 - multiple regression
PPT
One way anova
PPT
probability
PPTX
Probability Distribution
PPTX
Central limit theorem
PPT
Lecture-3 Probability and probability distribution.ppt
PPTX
Statistical Estimation
PPT
Regression analysis ppt
PDF
Multicollinearity1
PPTX
Measures of central tendency ppt
multiple regression
Probability Distribution
Probability Distribution
The Central Limit Theorem
Pearson's correlation
Lecture 4 The Normal Distribution.pptx
Probability Distribution
Statistics-Measures of dispersions
Probability distribution
Measures of-central-tendency
Chapter 4 - multiple regression
One way anova
probability
Probability Distribution
Central limit theorem
Lecture-3 Probability and probability distribution.ppt
Statistical Estimation
Regression analysis ppt
Multicollinearity1
Measures of central tendency ppt
Ad

Similar to Probability and Statistics (20)

PPTX
Unit II PPT.pptx
PDF
PTSP PPT.pdf
PDF
Statistics (recap)
PDF
Actuarial Science Reference Sheet
PPTX
Basic statistics for algorithmic trading
PDF
STAB52 Introduction to probability (Summer 2025) Lecture 1
PDF
Statistics And Exploratory Data Analysis
PDF
Refresher probabilities-statistics
PPTX
Data Distribution &The Probability Distributions
PDF
Statistical inference: Probability and Distribution
PPTX
Probability distributionv1
PDF
Chapter 1 - Probability Distributions.pdf
PPTX
Econometrics 2.pptx
PDF
Different types of distributions
PPT
Discrete probability
PPTX
GENMATH 11 - COMPOSITION OF FUNCTION PPT
PDF
Prob distros
PDF
Lecture 1,2 maths presentation slides.pdf
PPTX
5. RV and Distributions.pptx
PPTX
Statistical Analysis with R- III
Unit II PPT.pptx
PTSP PPT.pdf
Statistics (recap)
Actuarial Science Reference Sheet
Basic statistics for algorithmic trading
STAB52 Introduction to probability (Summer 2025) Lecture 1
Statistics And Exploratory Data Analysis
Refresher probabilities-statistics
Data Distribution &The Probability Distributions
Statistical inference: Probability and Distribution
Probability distributionv1
Chapter 1 - Probability Distributions.pdf
Econometrics 2.pptx
Different types of distributions
Discrete probability
GENMATH 11 - COMPOSITION OF FUNCTION PPT
Prob distros
Lecture 1,2 maths presentation slides.pdf
5. RV and Distributions.pptx
Statistical Analysis with R- III
Ad

Recently uploaded (20)

PPTX
Managing Community Partner Relationships
PDF
Votre score augmente si vous choisissez une catégorie et que vous rédigez une...
PPTX
Business_Capability_Map_Collection__pptx
PPTX
Introduction to Inferential Statistics.pptx
PPTX
Leprosy and NLEP programme community medicine
DOCX
Factor Analysis Word Document Presentation
PDF
Optimise Shopper Experiences with a Strong Data Estate.pdf
PDF
OneRead_20250728_1808.pdfhdhddhshahwhwwjjaaja
PPTX
Steganography Project Steganography Project .pptx
PPTX
DS-40-Pre-Engagement and Kickoff deck - v8.0.pptx
PDF
Systems Analysis and Design, 12th Edition by Scott Tilley Test Bank.pdf
PPTX
FMIS 108 and AISlaudon_mis17_ppt_ch11.pptx
PPTX
A Complete Guide to Streamlining Business Processes
PPTX
SET 1 Compulsory MNH machine learning intro
PDF
Microsoft Core Cloud Services powerpoint
PDF
[EN] Industrial Machine Downtime Prediction
PPTX
Copy of 16 Timeline & Flowchart Templates – HubSpot.pptx
PPTX
QUANTUM_COMPUTING_AND_ITS_POTENTIAL_APPLICATIONS[2].pptx
PPTX
Pilar Kemerdekaan dan Identi Bangsa.pptx
PDF
Transcultural that can help you someday.
Managing Community Partner Relationships
Votre score augmente si vous choisissez une catégorie et que vous rédigez une...
Business_Capability_Map_Collection__pptx
Introduction to Inferential Statistics.pptx
Leprosy and NLEP programme community medicine
Factor Analysis Word Document Presentation
Optimise Shopper Experiences with a Strong Data Estate.pdf
OneRead_20250728_1808.pdfhdhddhshahwhwwjjaaja
Steganography Project Steganography Project .pptx
DS-40-Pre-Engagement and Kickoff deck - v8.0.pptx
Systems Analysis and Design, 12th Edition by Scott Tilley Test Bank.pdf
FMIS 108 and AISlaudon_mis17_ppt_ch11.pptx
A Complete Guide to Streamlining Business Processes
SET 1 Compulsory MNH machine learning intro
Microsoft Core Cloud Services powerpoint
[EN] Industrial Machine Downtime Prediction
Copy of 16 Timeline & Flowchart Templates – HubSpot.pptx
QUANTUM_COMPUTING_AND_ITS_POTENTIAL_APPLICATIONS[2].pptx
Pilar Kemerdekaan dan Identi Bangsa.pptx
Transcultural that can help you someday.

Probability and Statistics

  • 1. Roozbeh Sanaei Probability and Statistics Introduction to The Jelly-Bean Jar Game, Debra James
  • 2. https://www.robbimack.com/blog-entries/2016/11/25/the-jelly-bean-jar-game Roozbeh Sanaei Probability • Obtaining likelihood of an event given parameters of the models • Knowing the proportion of each color  the probability of drawing a red jellybean. • One correct answer Statistics • Obtaining parameters of the model given a sample • Sampling from the jar  proportion of red jellybeans • No single correct answer, depending on assumptions Probability vs Statistics 1
  • 3. Roozbeh Sanaei Bayesian • Probability is an abstract concept that measures a state of knowledge or a degree of belief in each proposition. • Bayes rule holds in any valid probability space Frequentist • Probability measures the frequency of various outcomes of an experiment. • Frequentist definition of probability is a special case Frequentist vs. Bayesian 2
  • 5. Roozbeh Sanaei 4 Inclusion-exclusion principle A B 𝐴 ∪ 𝐵 = 𝐴 + 𝐵 − 𝐴 ∩ 𝐵 Rule of Sum If there are n possible ways to do something, and m possible ways to do another thing, and the two things can't both be done, then there are n + m total possible ways to do one of the things.
  • 6. Roozbeh Sanaei Combination and Permutation Permutation Number of ways for selection and of k items from n items in which order matters. Combination Number of ways for selection of k items from n items in which order does not matter. 5
  • 7. Roozbeh Sanaei Rules of Probability A A’ 𝑃(𝐴′ ) = 1 − 𝑃(𝐴) A B 𝑃 𝐴 ∪ 𝐵 = 𝑃 𝐴 + 𝑃(𝐵) A B 𝑃 𝐴 ∪ 𝐵 = 𝑃 𝐴 + 𝑃 𝐵 − 𝑃(𝐴 ∩ 𝐵) 𝑃 𝐴 = 𝑆(𝐴) 𝑆(𝑈) = 𝛼 𝑆(𝐴) 6
  • 8. Roozbeh Sanaei 7 Conditional Probability P(A) P(B) 𝑃(𝐴 ∩ 𝐵) 𝑃 𝐴 𝐵 = 𝑃 𝐴 ∩ 𝐵 𝑃 𝐵 𝑃 𝐴 ∩ 𝐵 = 𝑃 𝐴 𝐵 𝑃 𝐵
  • 9. Roozbeh Sanaei 8 Law of Total Probability B1 B2 𝑃 𝐴 = 𝑃 𝐴 𝐵1 𝑃 𝐵1 + 𝑃 𝐴 𝐵2 𝑃 𝐵2 + 𝑃 𝐴 𝐵3 𝑃 𝐵3 B3 A
  • 10. Roozbeh Sanaei Kolmogorov axioms 1. P A ≥ 0 2. P S = 1 3. If A⋂𝐵 = ∅ → P A ∪ 𝐵 = P A + P(B) ∀ 𝐴(𝑒𝑣𝑒𝑛𝑡) ⊂ 𝑆(𝑠𝑎𝑚𝑝𝑙𝑒 𝑆𝑝𝑎𝑐𝑒) Supplemented by two definitions P 𝐴 𝐵 = 𝑃(𝐴 ∩ 𝐵) 𝑃(𝐵) if A and B are independent 𝑃 𝐴 ∩ 𝐵 = 𝑃 𝐴 𝑃(𝐵) 9
  • 11. Roozbeh Sanaei 10 𝑃 𝐵 𝐴 = 𝑃 𝐴 𝐵 𝑃 𝐵 𝑃(𝐴) Bayes Theorem P(A) P(B) 𝑃(𝐴 ∩ 𝐵) 𝑃 𝐴 ∩ 𝐵 = 𝑃 𝐵 𝐴 𝑃 𝐴 = 𝑃 𝐴 𝐵 𝑃 𝐵
  • 12. Roozbeh Sanaei 11 Terminology Experiment: a repeatable procedure with well-defined possible outcomes. Toss the coin twice, report if it lands heads or tails each time. Sample space: the set of all possible outcomes. We usually denote the sample space by Ω, sometimes by S. Ω = {HH,HT,TH,TT}. Event: a subset of the sample space. Σ ={HH, HT} Probability function: a function giving the probability for each outcome. Each outcome is equally likely with probability ¼ Random variable: A random variable is a function from the sample space to the real numbers. 𝑋: 𝑆 → ℝ We can define a random variable X whose value is the number of observed heads. The value of X will be one of 0,1,2
  • 13. Roozbeh Sanaei 12 Probability functions 𝑃 𝑋 = 𝑥 = 𝑃𝑋 𝑥 = 𝑝(𝑥) Probability Mass Function Probability Density Function 𝑃 𝑎 < 𝑋 < 𝑏 𝑥 = 𝑎 𝑥 = 𝑏 𝑃 𝑋 ≤ 𝑎 Cumulative distribution function Percent Point Function https://www.itl.nist.gov/div898/handbook/eda/section3/eda362.htm 𝑃 𝑋 = 𝑎 𝑑𝑃 𝑥 ≤ 𝑎 /𝑑𝑥 𝑃 −1 𝑋 ≤ 𝑎
  • 14. Roozbeh Sanaei 13 Properties of and 𝑝𝑑𝑓(𝑥) ≥ 0 −∞ +∞ 𝑝𝑑𝑓 𝑥 = 1 𝑝𝑑𝑓 𝑐𝑑𝑓 𝑐𝑑𝑓(𝑥) ≥ 0 0 < 𝑐𝑑𝑓 𝑥 < 1 𝑖𝑓 𝑎 > 𝑏 ⇒ 𝑐𝑑𝑓 𝑎 > 𝑐𝑑𝑓(𝑏) lim 𝑥⟶∞ 𝑐𝑑𝑓(𝑥) = 1 lim 𝑥⟶−∞ 𝑐𝑑𝑓(𝑥) = 0 𝑃 𝑎 ≤ 𝑥 ≤ 𝑏 = 𝑐𝑑𝑓 𝑏 − 𝑐𝑑𝑓(𝑎) 𝑐𝑑𝑓′ (𝑥) = 𝑝𝑑𝑓(𝑥)
  • 16. Roozbeh Sanaei 15 Bernoulli Distribution Models one trial in an experiment that can result in either success or failure. We will write 𝑋~𝐵𝑒𝑟𝑛𝑜𝑢𝑙𝑙𝑖 𝑝 or 𝐵𝑒𝑟, which is read “𝑋 follows a Bernoulli distribution with parameter 𝑝“ or “𝑋 is drawn from a Bernoulli distribution with parameter 𝑝". 𝑃 𝑋 = 𝑥 = 1 − 𝑝 𝑖𝑓 𝑥 = 0 𝑝 𝑖𝑓 𝑥 = 1 𝑃 𝑋 ≤ 𝑘 = 0 𝑖𝑓 𝑘 < 0 𝑞 𝑖𝑓 0 ≤ 𝑘 < 1 1 𝑘 ≥ 1
  • 17. Roozbeh Sanaei 16 Binomial Distribution 𝑃 𝑋 = 𝑘 = #𝑜𝑓 𝑎𝑟𝑟𝑎𝑛𝑔𝑒𝑚𝑒𝑛𝑡𝑠 𝑤𝑖𝑡ℎ 𝑘 𝑛𝑢𝑚𝑏𝑒𝑟 𝑜𝑓 𝑠𝑢𝑐𝑐𝑒𝑠𝑠 × 𝑝𝑟𝑜𝑏𝑎𝑏𝑙𝑖𝑡𝑦 𝑜𝑓𝑎𝑛 𝑎𝑟𝑟𝑎𝑛𝑔𝑒𝑚𝑒𝑛𝑡 𝑤𝑖𝑡ℎ 𝑘 𝑛𝑢𝑚𝑏𝑒𝑟 𝑜𝑓 𝑠𝑢𝑐𝑐𝑒𝑠𝑠 Models the number of successes in 𝑛 independent 𝐵𝑒𝑟𝑛𝑜𝑢𝑙𝑙𝑖(𝑝) trials 𝑃 𝑋 = 𝑘 = 𝑛 𝑘 𝑝𝑘 𝑞𝑛−𝑘 𝑃 𝑋 ≤ 𝑘 = 𝐼𝑞 𝑛 − 𝑘, 1 + 𝑘
  • 18. Roozbeh Sanaei 17 Geometric Distribution Models the number of tails before the first head in a sequence of coin flips 𝑃 𝑋 = 𝑘 = (1 − 𝑝)𝑘 𝑝 𝑃 𝑋 ≤ 𝑘 = 1 − (1 − 𝑝)𝑘+1
  • 19. Roozbeh Sanaei 18 Uniform Distribution Models the situation where all the outcomes between certain bounds are equally likely 𝑃 𝑋 = 𝑥 = 1 𝑏 − 𝑎 𝑖𝑓 𝑥 ∈ 𝑎, 𝑏 0 𝑂𝑡ℎ𝑒𝑟𝑤𝑖𝑠𝑒 𝑃 𝑋 ≤ 𝑥 = 0 𝑖𝑓 𝑥 < 𝑎 𝑥 − 𝑎 𝑏 − 𝑎 𝑖𝑓 𝑥 ∈ 𝑎, 𝑏 1 𝑖𝑓 𝑥 > 𝑏
  • 20. Roozbeh Sanaei 19 Poisson Distribution The Poisson distribution is a limiting case of the binomial distribution when 𝑛 ⟶ ∞ while 𝜆 = 𝑛 × 𝑝 remains constant It means running the Bernoulli trials faster and faster rate but with a smaller and smaller success probability Poisson distribution is memoryless: 𝑃 𝑇 > 𝑡 + 𝑠 𝑇 > 𝑠 = 𝑃(𝑇 > 𝑠) 𝑃 𝑋 = 𝑥 = lim 𝑛→∞ 𝑛 𝑘 𝑝𝑘𝑞𝑛−𝑘 = 𝜆𝑘𝑒−𝜆 𝑘! 𝑃 𝑋 ≤ 𝑎 = 𝑒−𝜆 𝑖=0 𝑘 𝜆𝑘 𝑘!
  • 21. Roozbeh Sanaei Exponential Distribution While the Poisson distribution deals with the number of occurrences in a fixed period of time, the exponential distribution deals with the time between occurrences of successive events as time flows by continuously. 𝑃 𝑁𝑡+𝜏 − 𝑁𝑡 = 0 = 𝑃 𝑁𝜏 = 0 𝑃 𝑁𝑡+𝜏 − 𝑁𝑡 > 0 = 1 − 𝑃 𝑁𝑡+𝜏 − 𝑁𝑡 = 0 𝑃 𝑁𝜏 = 0 = 𝑒−𝜆𝜏 𝑃 𝑁𝑡+𝜏 − 𝑁𝑡 > 0 = 1 − 𝑒−𝜆𝜏 𝑃 𝑁𝑡+𝜏 − 𝑁𝑡 > 0 𝑝𝑟𝑜𝑏𝑎𝑏𝑖𝑙𝑖𝑡𝑦 𝑜𝑓 𝑎 𝑛𝑒𝑤 𝑎𝑟𝑟𝑖𝑣𝑎𝑙 𝑎𝑡 𝑖𝑛𝑡𝑒𝑟𝑣𝑎𝑙 𝜏 𝑃𝑟𝑜𝑏𝑎𝑏𝑖𝑙𝑖𝑡𝑦 𝑜𝑓 𝑛𝑜 𝑎𝑟𝑟𝑖𝑣𝑎𝑙 𝑎𝑡 𝑖𝑛𝑡𝑒𝑟𝑣𝑎𝑙 𝜏 𝑀𝑒𝑚𝑜𝑟𝑦𝑙𝑒𝑠𝑠 𝑃𝑟𝑜𝑝𝑒𝑟𝑡𝑦 𝑃𝑜𝑖𝑠𝑠𝑜𝑛 𝑑𝑖𝑠𝑡𝑟𝑖𝑏𝑢𝑡𝑖𝑜𝑛 20
  • 22. Roozbeh Sanaei Expected Value and Variance Expected Value Variance Mean, Average 𝐸 𝑋 , 𝜇 𝑉𝑎𝑟 𝑋 , 𝜎2 𝐸 𝑋 = 𝑗=1 𝑛 𝑥𝑗 𝑝 𝑥𝑗 𝐸 𝑋 − 𝜇 2 = 𝑗=1 𝑛 𝑥𝑗 − 𝜇 2 𝑝 𝑥𝑗 𝐸 𝑎𝑋 + 𝑏 = 𝑎 𝐸 𝑋 + 𝑏 𝑉𝑎𝑟 𝑎𝑋 + 𝑏 = 𝑎2 𝑉𝑎𝑟 𝑋 𝐸 𝑋 + 𝑌 = 𝐸 𝑋 + 𝐸(𝑌) 𝑉𝑎𝑟 𝑋 + 𝑌 = 𝑉𝑎𝑟 𝑋 + 𝑉𝑎𝑟(𝑌) 𝐸 ℎ(𝑋) = 𝑗 ℎ 𝑥𝑗 𝑝(𝑥𝑗) 𝑖𝑓 𝑋 ⊥ 𝑌 → 𝐸 𝑋𝑌 = 𝐸 𝑋 𝐸 𝑌 𝑖𝑓 𝑋 ⊥ 𝑌 → 𝑉𝑎𝑟 𝑋𝑌 = 𝑉𝑎𝑟 𝑋 𝑉𝑎𝑟 𝑌 + 𝑉𝑎𝑟 𝑋 𝐸 𝑌 2 + 𝑉𝑎𝑟 𝑌 𝐸 𝑋 2 21
  • 23. Roozbeh Sanaei Expected Value and Variance Distribution range X pmf 𝑝(𝑥) Expected Value Variance 𝐵𝑒𝑟𝑛𝑜𝑢𝑙𝑙𝑖(𝑝) 0,1 𝑝 0 = 1 − 𝑝, 𝑝 1 = 𝑝 𝑝 𝑝(1 − 𝑝) 𝐵𝑖𝑜𝑛𝑜𝑚𝑖𝑎𝑙(𝑛, 𝑝) 0,1,…, n 𝑛 𝑘 𝑝𝑘 (1 − 𝑝)𝑛−𝑘 𝑛𝑝 𝑛 𝑝 (1 − 𝑝) 𝑈𝑛𝑖𝑓𝑜𝑟𝑚(𝑛) 1,2,…, n 1 𝑛 𝑛 + 1 2 𝑛2 − 1 12 𝐺𝑒𝑜𝑚𝑒𝑡𝑟𝑖𝑐(𝑝) 0,1,2,…. 𝑝(1 − 𝑝)𝑘 1 − 𝑝 𝑝 1 − 𝑝 𝑝2 22
  • 24. Roozbeh Sanaei Transformation of random variables 𝑋~𝑈𝑛𝑖𝑓𝑜𝑟𝑚 0,2 𝑟𝑎𝑛𝑔𝑒, 𝑝𝑑𝑓 and 𝑐𝑑𝑓 of Y = 𝑋2? 𝑃 𝑌 ≤ 𝑦 = 𝑃 𝑋2 ≤ 𝑦 = 𝑃 𝑋 ≤ 𝑦 = 𝑦 2 ⇒ 𝑃 𝑌 = 𝑦 = 1 4 𝑦 𝑦 = 𝑥2 ⇒ 𝑑𝑦 = 2𝑥𝑑𝑥 ⟹ 𝑑𝑥 = 𝑑𝑦 2 𝑦 ⇒ 𝑃 𝑌 = 𝑦 = 1 4 𝑦 Distribution function technique: Method of Transformation: 23
  • 25. Roozbeh Sanaei Joint pdf 𝑃 𝑋𝑎 ≤ 𝑥 ≤ 𝑋𝑏, 𝑌𝑎 ≤ 𝑦 ≤ 𝑌𝑏 = 𝑋𝑎 𝑋𝑏 𝑌𝑎 𝑌𝑏 𝑝𝑑𝑓 𝑥, 𝑦 𝑑𝑥𝑑𝑦 𝑃 𝑋𝑎 ≤ 𝑥 ≤ 𝑋𝑏 = 𝑋𝑎 𝑋𝑏 𝑝𝑑𝑓 𝑥, 𝑦 𝑑𝑥𝑑𝑦 𝑝𝑑𝑓 𝑋, 𝑌 = 𝑝𝑑𝑓𝑋(𝑋)𝑝𝑑𝑓𝑌(𝑌) 𝑝 𝑥𝑖, 𝑦𝑖 = 𝑝𝑋(𝑥𝑖)𝑝𝑌(𝑦𝑖) 𝑝𝑑𝑓 𝑋, 𝑌 = 𝑝𝑑𝑓𝑋(𝑋)𝑝𝑑𝑓𝑌(𝑌) https://www.researchgate.net/publication/320182941_Bayesian_tracking_of_multiple_point_targets_using_Expectation_Maximization Joint pdf Marginal pdf Independence 24
  • 26. Roozbeh Sanaei Covariance 𝐶𝑜𝑣 𝑥, 𝑦 = 𝐸((𝑋 − 𝜇𝑥)(𝑌 − 𝜇𝑦)) 𝐶𝑜𝑣 𝑎𝑋 + 𝑏, 𝑐𝑌 + 𝑑 = 𝑎𝑐 𝐶𝑜𝑣(𝑋, 𝑌) 𝐶𝑜𝑣 𝑋1 + 𝑋2, 𝑌 = 𝐶𝑜𝑣 𝑋1, 𝑌 + 𝐶𝑜𝑣(𝑋2, 𝑌) 𝐶𝑜𝑣 𝑋, 𝑋 = 𝑉𝑎𝑟(𝑋) 𝐶𝑜𝑣 𝑋, 𝑌 = 𝐸 𝑋𝑌 − 𝜇𝑥𝜇𝑦 𝑉𝑎𝑟 𝑋, 𝑌 = 𝑉𝑎𝑟 𝑋 + 𝑉𝑎𝑟 𝑌 + 2𝐶𝑜𝑣 𝑋, 𝑌 𝐶𝑜𝑣 𝑋, 𝑌 = 𝑋𝑎 𝑋𝑏 𝑌𝑎 𝑌𝑏 𝑝𝑑𝑓 𝑥, 𝑦 𝑥 𝑦 𝑑𝑥 𝑑𝑦 − 𝜇𝑥𝜇𝑦 25
  • 27. Moderate Positive Strong Positive Perfect Positive Correlation Roozbeh Sanaei Pearson Correlation https://stats.libretexts.org/Courses/Highline_College/Book%3A_Statistics_Using_Technology_(Kozak)/10%3A_Regression_and_Correlation/10.02%3A_Correlation Negative No Correlation Non-linear Correlation 26 𝐶𝑜𝑟 𝑋, 𝑌 = 𝜌 = 𝐶𝑜𝑣 𝑋, 𝑌 𝜎𝑋𝜎𝑌 1. 𝜌 is the covariance of the standardizations of X and Y 2. 𝜌 is dimensionless and −1 ≤ 𝜌 ≤ +1 𝜌 = +1 ⇔ 𝑌 = 𝑎𝑋 + 𝑏 𝑤𝑖𝑡ℎ 𝑎 > 0 𝜌 = −1 ⇔ 𝑌 = 𝑎𝑋 + 𝑏 𝑤𝑖𝑡ℎ 𝑎 < 0
  • 29. • In Maximum likelihood estimation we find the most optimal set of values for model parameters 𝜃𝑀𝐿 • In Bayesian learning, we assign a probability to each set of model parameters • We update parameter probabilities form prior probabilities to posterior by multiplying them in likelihood of data given each set of parameters • Since probabilities no longer sum up to “1” after multiplication, we need to normalize them back by dividing them to the sum of the obtained values. Roozbeh Sanaei Bayesian Learning 28 𝑝(𝜃) 𝑝 𝑋 𝜃 𝑝 𝑋 𝜃 https://psyarxiv.com/w5vbp/
  • 30. Roozbeh Sanaei Median, Quantiles, Quartiles, Decile, Percentile. Median: P(X ≤ median) = 0.5. p-th quantile: P(X ≤ q_p) = p. p-th percentile: P(X ≤ q_{p/100}) = p/100. p-th decile: P(X ≤ q_{p/10}) = p/10. https://prepnuggets.com/glossary/quantile/ 29
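The sample analogues of these quantities are in Python's standard library (a minimal sketch; the data set below is arbitrary):

```python
import statistics

data = [2, 4, 4, 5, 7, 9, 11, 12, 15]
print(statistics.median(data))           # the 0.5 quantile
print(statistics.quantiles(data, n=4))   # quartiles: the 0.25, 0.5, 0.75 cut points
print(statistics.quantiles(data, n=10))  # deciles
```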
  • 31. Roozbeh Sanaei Law of large numbers. With high probability, the density histogram of a large number of samples from a distribution is a good approximation of the graph of the underlying pdf f(x). Formally, for the sample mean X̄_n: ∀a > 0, lim_{n→∞} P(|X̄_n − μ| < a) = 1. https://plotly.com/chart-studio-help/histogram/ 30
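A minimal simulation of the law (sample size and seed are arbitrary): the mean of many fair-die rolls settles near the true mean of 3.5.

```python
import random

random.seed(2)
n = 100_000
rolls = [random.randint(1, 6) for _ in range(n)]  # fair six-sided die
mean = sum(rolls) / n
print(mean)  # close to the population mean 3.5
```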
  • 32. Roozbeh Sanaei Central Limit Theorem. The central limit theorem states that if you take sufficiently large random samples from a population with mean μ and standard deviation σ, then the distribution of the sample means will be approximately normal with mean μ and standard deviation σ/√n. https://towardsdatascience.com/central-limit-theorem-a-real-life-application-f638657686e1 31
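The σ/√n scaling can be checked by repeatedly sampling and looking at the spread of the sample means (a sketch; sample size, trial count, and seed are arbitrary):

```python
import random
import statistics

random.seed(3)
n, trials = 50, 5000          # sample size, number of repeated samples
sigma = (35 / 12) ** 0.5       # population std dev of a fair die

# Distribution of the sample mean of n die rolls
means = [sum(random.randint(1, 6) for _ in range(n)) / n for _ in range(trials)]
print(statistics.mean(means))   # close to mu = 3.5
print(statistics.stdev(means))  # close to sigma / sqrt(n)
print(sigma / n ** 0.5)
```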
  • 33. Roozbeh Sanaei z-distribution. For a sample mean: z = (x̄ − μ)/(σ/√n). For a sample proportion: z = (p̂ − p₀)/√(p₀(1 − p₀)/n). x̄: sample mean; μ: population mean; σ: population standard deviation; p̂: sample proportion; p₀: population proportion under the null; n: sample size. https://www.researchgate.net/publication/296695387_Developing_a_Geospatial_Protocol_for_Coral_Epizootiology 32
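For example, the proportion statistic with an observed p̂ = 0.56 in a sample of 400 against a null proportion of 0.5 (numbers chosen purely for illustration):

```python
import math

p_hat, p0, n = 0.56, 0.5, 400
# One-sample z statistic for a proportion
z = (p_hat - p0) / math.sqrt(p0 * (1 - p0) / n)
print(z)  # 0.06 / 0.025 = 2.4
```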
  • 34. Roozbeh Sanaei Two sample Z-distribution. z = (x̄₁ − x̄₂ − Δ) / √(σ₁²/n₁ + σ₂²/n₂). x̄₁, x̄₂: means of the two samples; Δ: difference between the population means; σ₁, σ₂: population standard deviations; n₁, n₂: sample sizes. https://en.wikipedia.org/wiki/Student%27s_t-distribution 33
  • 35. Roozbeh Sanaei p-value. p-value = P(obtaining the sample result | the null hypothesis is true), which is not the same as P(the null hypothesis is true | obtaining the sample result). [Figure: one-tailed p-value vs two-tailed p-value (doubled)] https://www.fromthegenesis.com/difference-between-one-tail-test-and-two-tail-test/ 34
  • 36. Roozbeh Sanaei Type I and Type II error 35 https://www.scribbr.com/statistics/type-i-and-type-ii-errors/ Critical Value Rejection Region Non-rejection Region
  • 37. Roozbeh Sanaei t-distribution. t = (x̄ − μ)/(s/√n). t: Student's t statistic; x̄: sample mean; μ: population mean; s: sample standard deviation; n: sample size; ν = n − 1: degrees of freedom. https://en.wikipedia.org/wiki/Student%27s_t-distribution 36
  • 38. Roozbeh Sanaei Two sample t-distribution (equal underlying population variances). t = (x̄₁ − x̄₂ − Δ) / (s_p·√(1/n₁ + 1/n₂)), where s_p² = ((n₁ − 1)s₁² + (n₂ − 1)s₂²)/(n₁ + n₂ − 2) and ν = n₁ + n₂ − 2. x̄₁, x̄₂: sample means; s₁, s₂: sample standard deviations; n₁, n₂: sample sizes; Δ: difference between the population means. https://en.wikipedia.org/wiki/Student%27s_t-distribution 37
  • 39. Roozbeh Sanaei Two sample t-distribution (different underlying population variances). t = (x̄₁ − x̄₂ − Δ) / √(s₁²/n₁ + s₂²/n₂), with s_p² = s₁²/n₁ + s₂²/n₂ and ν = (s₁²/n₁ + s₂²/n₂)² / [ (s₁²/n₁)²/(n₁ − 1) + (s₂²/n₂)²/(n₂ − 1) ]. x̄₁, x̄₂: sample means; s₁, s₂: sample standard deviations; n₁, n₂: sample sizes; Δ: difference between the population means. https://en.wikipedia.org/wiki/Student%27s_t-distribution 38
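The pooled statistic from the equal-variances slide can be computed directly from these formulas (a sketch; the two samples below are invented):

```python
import math
import statistics

def pooled_t(x1, x2, delta=0.0):
    # Two-sample t statistic with pooled variance, returns (t, degrees of freedom)
    n1, n2 = len(x1), len(x2)
    s1, s2 = statistics.stdev(x1), statistics.stdev(x2)
    sp2 = ((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / (n1 + n2 - 2)
    t = (statistics.mean(x1) - statistics.mean(x2) - delta) / math.sqrt(
        sp2 * (1 / n1 + 1 / n2)
    )
    return t, n1 + n2 - 2

x1 = [5.1, 4.9, 5.4, 5.0, 5.2]
x2 = [4.6, 4.8, 4.5, 4.9, 4.7]
t, df = pooled_t(x1, x2)
print(t, df)
```

The resulting t is then compared against the t-distribution with ν = n₁ + n₂ − 2 degrees of freedom.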
  • 40. Roozbeh Sanaei Chi square distribution. The chi-squared distribution is the distribution of the sum of squares of k independent standard normal variables: Q = Σ_{i=1}^{k} Z_i². 39
  • 41. Roozbeh Sanaei Chi square distribution. This enables us to evaluate how plausible it is that a set of observed counts was drawn from a given set of distributions; we can standardize the counts if we know their expected values: E[n_j] = n·p_j. Since Σ_{j=1}^{r} n_j = n, only r − 1 counts are free, so Σ_{j=1}^{r} (n_j − E[n_j])² / E[n_j] = Σ_{j=1}^{r} (n_j − n p_j)² / (n p_j) ~ χ²_{r−1}. 40
  • 42. Roozbeh Sanaei Chi square test 41
Observed counts o_{i,j} (columns A, B, C, D, Total):
White collar: 90, 60, 104, 95 | 349
Blue collar: 30, 50, 51, 20 | 151
No collar: 30, 40, 45, 35 | 150
Column totals: 150, 150, 200, 150 | N = 650
e_{i,j} = p_i p_j N = (Σ_j o_{i,j})(Σ_i o_{i,j}) / N; χ² = Σ_{i,j} (o_{i,j} − e_{i,j})² / e_{i,j}; k = (number of rows − 1)(number of cols − 1) degrees of freedom. To compute the expected values we assume independence between the categorical variables i and j; the observed counts stay close to these expectations only in the absence of dependence between the variables.
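The χ² statistic for the collar-by-city table on this slide can be computed directly from the formulas above:

```python
# Observed counts from the slide's contingency table
obs = [
    [90, 60, 104, 95],   # White collar
    [30, 50,  51, 20],   # Blue collar
    [30, 40,  45, 35],   # No collar
]
row_tot = [sum(r) for r in obs]
col_tot = [sum(c) for c in zip(*obs)]
N = sum(row_tot)

chi2 = 0.0
for i, row in enumerate(obs):
    for j, o in enumerate(row):
        e = row_tot[i] * col_tot[j] / N   # expected count under independence
        chi2 += (o - e) ** 2 / e

df = (len(obs) - 1) * (len(obs[0]) - 1)
print(chi2, df)  # about 24.57 with 6 degrees of freedom
```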
  • 43. Roozbeh Sanaei F-distribution. F* = (S₁/d₁) / (S₂/d₂), where S₁ and S₂ are independent random variables with chi-square distributions with respective degrees of freedom d₁ and d₂. 42
  • 44. Roozbeh Sanaei F-test. The full model: Y_i = β₀ + β₁x_i + ε_i, the model thought to be most appropriate for the data. The reduced model: Y_i = β₀ + ε_i, the model described by the null hypothesis. https://online.stat.psu.edu/stat501/lesson/6/6.2 43
  • 45. Roozbeh Sanaei F-test. F* = [(SSE(R) − SSE(F)) / q] / [SSE(F) / (n − (k + 1))]. SSE(R): error sum of squares of the reduced model; SSE(F): error sum of squares of the full model; q: number of restrictions; n: number of observations; k: number of independent variables. 44
  • 46. Roozbeh Sanaei Coding Systems for Categorical Variables 45
Dummy coding, Race (x1, x2, x3): Hispanic (1, 0, 0); Asian (0, 1, 0); African (0, 0, 1); White (0, 0, 0).
Simple effect (contrast) coding, Race (x1, x2, x3): Hispanic (1, 0, 0); Asian (0, 1, 0); African (0, 0, 1); White (−1, −1, −1).
https://stats.oarc.ucla.edu/spss/faq/coding-systems-for-categorical-variables-in-regression-analysis/
  • 47. ANOVA Types (type: dependent variables; independent variables):
One-way ANOVA: 1 continuous; 1 categorical.
Two-way ANOVA: 1 continuous; 2 or more categorical.
ANCOVA: 1 continuous; 1 categorical and 1 continuous.
One-way MANOVA: 2 or more continuous; 1 categorical.
Two-way MANOVA: 2 or more continuous; 2 or more categorical.
46
  • 48. Roozbeh Sanaei One Way ANOVA 47 Independent variable: categorical (brand of soda: Coke, Pepsi, Sprite, Fanta; the levels or treatments). Dependent variable: quantitative (price per 100 ml). Decomposition: Σ_{i=1}^{I} Σ_{j=1}^{n_i} (y_{i,j} − ȳ_{..})² = Σ_{i=1}^{I} n_i (ȳ_{i.} − ȳ_{..})² + Σ_{i=1}^{I} Σ_{j=1}^{n_i} (y_{i,j} − ȳ_{i.})², i.e. total = SS_Treatments + SS_Error. F = (SS_Treatments/d₁) / (SS_Error/d₂) = variance between treatments / variance within treatments = MS_Treatments / MS_Error, with d₁ = I − 1 and d₂ = n_T − I. I: number of treatments; n_T: total number of cases.
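The decomposition and the F ratio can be computed by hand for a small data set (a sketch; the three groups below are invented):

```python
import statistics

groups = {
    "A": [3.1, 2.9, 3.4, 3.0],
    "B": [3.8, 4.1, 3.9, 4.2],
    "C": [2.5, 2.7, 2.4, 2.6],
}
all_vals = [v for g in groups.values() for v in g]
grand = statistics.mean(all_vals)

# Between-treatment and within-treatment sums of squares
ss_treat = sum(len(g) * (statistics.mean(g) - grand) ** 2 for g in groups.values())
ss_err = sum((v - statistics.mean(g)) ** 2 for g in groups.values() for v in g)

I, nT = len(groups), len(all_vals)
F = (ss_treat / (I - 1)) / (ss_err / (nT - I))
print(F)  # compare against the F distribution with d1 = I-1, d2 = nT-I
```

Note that ss_treat + ss_err reproduces the total sum of squares exactly, as the decomposition requires.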
  • 49. Roozbeh Sanaei One way ANOVA model and assumptions 48. Y_{i,j} = μ + τ_i + ε_{i,j} (observation = common effect + treatment effect + random error; i: treatment, j: observation). Independence: the dependent variable score for each experimental unit is independent of the score for any other unit. Normality: in the population, dependent variable scores are normally distributed within treatment groups. Equality of variance: in the population, the variance of dependent variable scores in each treatment group is equal (also known as homogeneity of variance or homoscedasticity). μ is always a fixed parameter; the ε_{i,j} are assumed normally and independently distributed with mean zero and variance σ_ε². Minimizing SSE = Σ_i Σ_j (y_{i,j} − μ_i)² via dSSE/dμ_i = 0 gives μ̂_i = ȳ_{i.}. https://sites.ualberta.ca/~lkgray/uploads/7/3/6/2/7362679/slides_-_anova_assumptions.pdf
  • 50. Roozbeh Sanaei One way ANOVA model and assumptions 49. Fixed effects model: the τ_i are fixed parameters if the levels of the treatment are fixed rather than a random sample from a population of possible levels; it is also assumed that μ is chosen so that Σ_i τ_i = 0. Random effects model: τ_i ~ NID(0, σ_τ²); F is the value of the statistic used to test whether σ_τ² = 0.
  • 51. Roozbeh Sanaei Linear combination of the factor level means 50. Mean value for a linear combination of factor levels: L = Σ_i c_i μ_i (e.g., L = p₁μ₁ + p₂μ₂ + p₃μ₃ when estimating the dependent variable on a mixture of multiple factor levels). Estimate: L̂ = Σ_i c_i Ȳ_i; Var(L̂) = σ² Σ_i c_i²/n_i; SE(L̂) = √(MSE Σ_i c_i²/n_i). Test H₀: L = L₀ against Hₐ: L ≠ L₀ with t* = (L̂ − L₀)/SE(L̂). Pairwise comparison: comparing one level with another, L = μ₁ − μ₂. Contrast: a linear combination of the factor level means whose coefficients sum to zero, useful when comparing one level against multiple levels, e.g. L = μ₁ − 0.5μ₂ − 0.5μ₃.
  • 52. Roozbeh Sanaei Multiple Comparisons Problem 51. In testing multiple linear combinations of factor level means, the familywise Type I error rate (FWER) is the probability of making at least one Type I error among all tested linear combinations. • Single-test Type I error: P(reject H₀ | H₀ true) = α. • For q independent tests: FWER = 1 − (1 − α)^q. FWER depends on the number of tests and on whether or not the tests are independent of one another.
  • 53. Roozbeh Sanaei Bonferroni's Correction 52. Boole's inequality: FWER ≤ Σ_{k=1}^{f} P(reject H₀ₖ | H₀ₖ true) = Σ_{k=1}^{f} α* = f·α*. Bonferroni's correction: α* = α/f. Major strength: applicable to many situations (no assumptions). Major weakness: overly conservative. Šidák correction: α* = 1 − (1 − α)^{1/m}.
  • 54. Roozbeh Sanaei Holm's Step-Down and Hochberg's Step-Up Procedure 53. Holm-Bonferroni critical value: α* = α/(n − rank + 1). Holm's step-down procedure: 1. Order p-values ascending. 2. Compute the critical α* for each rank. 3. Scan forward, rejecting while p ≤ α*; at the first p-value that exceeds its α*, stop and accept it and all larger ones. Hochberg's step-up procedure: 1. Order p-values ascending. 2. Compute the critical α* for each rank. 3. Scan backward from the largest p-value; at the first one that falls at or below its α*, reject it and all smaller ones.
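Holm's step-down procedure can be sketched in a few lines (α and the p-values below are illustrative):

```python
def holm_reject(pvals, alpha=0.05):
    # Holm's step-down: compare sorted p-values to alpha/(m - rank),
    # stop at the first non-significant one, reject everything before it.
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    reject = [False] * m
    for rank, i in enumerate(order):        # rank 0 is the smallest p-value
        if pvals[i] <= alpha / (m - rank):  # same as alpha/(m - rank' + 1), 1-based
            reject[i] = True
        else:
            break                           # all remaining hypotheses are accepted
    return reject

print(holm_reject([0.001, 0.04, 0.03, 0.6]))
```

Here only the smallest p-value (0.001 ≤ 0.05/4) is rejected; 0.03 fails its threshold 0.05/3, so the scan stops.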
  • 55. Roozbeh Sanaei Studentized range distribution 54. 1. A sample of size n is drawn from each of k populations with the same normal distribution N(μ, σ²). 2. ȳ_min is the smallest and ȳ_max the largest of these sample means. 3. s² is the pooled sample variance from these samples. The studentized range statistic q = (ȳ_max − ȳ_min) / (s/√n) follows the studentized range distribution.
  • 56. Roozbeh Sanaei Tukey's range test 55. ANOVA only tests whether the level of the independent variable significantly changes the dependent variable; it does not identify between which pairs of levels the significant difference lies. In Tukey's range test we compare the mean of the dependent variable for each pair of levels using the studentized range statistic: L̂ = ȳ_i − ȳ_j, V(L̂) = σ²(1/n_i + 1/n_j) (estimated with MSE), q* = √2·L̂/√V̂(L̂). We declare the mean difference between two levels significant if q* goes beyond the critical value of the studentized range distribution with k = #levels and df = #observations − k.
  • 57. Roozbeh Sanaei Two Way ANOVA 56 Independent variables: categorical (fertilizer type: A, B, C; planting density: low, high; the levels or treatments). Dependent variable: quantitative (final crop yield in bushels per acre at harvest time).
SS(A) = rb Σ_{i=1}^{a} (ȳ_{i..} − ȳ_{...})²; SS(B) = ra Σ_{j=1}^{b} (ȳ_{.j.} − ȳ_{...})²; SS(AB) = r Σ_{i=1}^{a} Σ_{j=1}^{b} (ȳ_{ij.} − ȳ_{i..} − ȳ_{.j.} + ȳ_{...})²; SSE = Σ_{k=1}^{r} Σ_{i=1}^{a} Σ_{j=1}^{b} (y_{ijk} − ȳ_{ij.})²; SS(Total) = Σ_{k=1}^{r} Σ_{i=1}^{a} Σ_{j=1}^{b} (y_{ijk} − ȳ_{...})².
Source (SS, df, Mean Square): Factor A (SS(A), a−1, SS(A)/(a−1)); Factor B (SS(B), b−1, SS(B)/(b−1)); Interaction (SS(AB), (a−1)(b−1), SS(AB)/((a−1)(b−1))); Error (SSE, N−ab, SSE/(N−ab)); Total (SS(Total), N−1).
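For a balanced design, these sums of squares can be computed directly, and they partition SS(Total) exactly (a sketch; the 2×2 layout and its values are invented):

```python
import statistics

# data[i][j] holds the r replicates for level i of factor A, level j of factor B
data = [
    [[4.0, 4.4], [5.1, 4.9]],
    [[6.0, 5.8], [7.2, 7.0]],
]
a, b, r = len(data), len(data[0]), len(data[0][0])
grand = statistics.mean(v for row in data for cell in row for v in cell)

mean_A = [statistics.mean(v for cell in row for v in cell) for row in data]
mean_B = [statistics.mean(v for row in data for v in row[j]) for j in range(b)]
mean_AB = [[statistics.mean(cell) for cell in row] for row in data]

ss_A = r * b * sum((m - grand) ** 2 for m in mean_A)
ss_B = r * a * sum((m - grand) ** 2 for m in mean_B)
ss_AB = r * sum((mean_AB[i][j] - mean_A[i] - mean_B[j] + grand) ** 2
                for i in range(a) for j in range(b))
ss_E = sum((v - mean_AB[i][j]) ** 2
           for i in range(a) for j in range(b) for v in data[i][j])
ss_T = sum((v - grand) ** 2 for row in data for cell in row for v in cell)
print(ss_A + ss_B + ss_AB + ss_E, ss_T)  # the four pieces sum to SS(Total)
```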
  • 58. Roozbeh Sanaei Two Way ANOVA Model and Parameter Estimation 57. Y_{i,j,k} = μ + τ_i + β_j + γ_{i,j} + ε_{i,j,k}: observation = common effect + i-th treatment effect of A + j-th treatment effect of B + interaction between the i-th treatment effect of A and the j-th treatment effect of B + random error (i: level of A, j: level of B, k: observation). https://sites.ualberta.ca/~lkgray/uploads/7/3/6/2/7362679/slides_-_anova_assumptions.pdf