CMPUT 366 F20: Probability Theory
James Wright & Vadim Bulitko
October 15, 2020
Lecture Outline
Probability Theory
PM 8.1-8.2
Uncertainty
In both search and RL we assumed that the
agent knows its current state s
That is an abstraction/simplification
in real life agents may not know the entire
state with certainty
Agent’s knowledge is uncertain
agent must consider multiple hypotheses
agent must update beliefs about which
hypotheses are likely given observations
(Image credit: Stephen Hladky, 2009)
Example*
An AI robot has to decide between three actions:
drive without wearing a seatbelt
drive while wearing a seatbelt
stay home
If the robot knows with certainty that an accident will happen, it will just stay
home
If the robot knows with certainty that an accident will not happen, it will not
bother to wear a seatbelt
Wearing a seatbelt makes sense because the robot is uncertain about whether
driving will lead to an accident
* This is a hypothetical example with a robot. As a human in real life, please always follow appropriate laws and regulations on wearing seatbelts.
Measuring Uncertainty
Probability is a way of measuring/quantifying uncertainty
The agent assigns a number between 0 and 1 to hypotheses
0 means absolutely certain that statement is false
1 means absolutely certain that statement is true
intermediate values mean more or less certain
Probability is a measurement of uncertainty, not truth
a statement with probability 0.75 is not “mostly true”
rather, the agent believes it is more likely to be true than not
Subjective versus Objective: The Frequentist Perspective
Probabilities can be interpreted as objective statements about the world, or as
subjective statements about an agent’s beliefs
Objective view is called frequentist:
The probability of an event is the proportion of times it would happen in the long
run of repeated experiments
Every event has a single, true probability
Events that can only happen once do not have a well-defined probability
Subjective versus Objective: The Bayesian Perspective
Probabilities can be interpreted as objective statements about the world or as
subjective statements about an agent’s beliefs
Subjective view is called Bayesian
The probability of an event is a measure of an agent’s belief about its likelihood
Different agents can legitimately have different beliefs, so they can legitimately
assign different probabilities to the same event
But there is only one coherent way to update those beliefs in response to new data: Bayes’ rule, introduced later in this lecture
In this course, we will primarily take the Bayesian view
Example: Dice
Discuss:
Diane rolls a fair six-sided die and gets the number X
What is P(X = 5)?
Diane truthfully tells Oliver that she rolled an odd number
What should Oliver believe P(X = 5) is?
Diane truthfully tells Greta that she rolled a number greater than or equal to 5
What should Greta believe P(X = 5) is?
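For reference (this uses conditioning, defined later in the lecture): initially P(X = 5) = 1/6; once Oliver learns the roll is odd, only the outcomes {1, 3, 5} remain possible, so he should believe P(X = 5) = 1/3; once Greta learns X ≥ 5, only {5, 6} remain, so she should believe P(X = 5) = 1/2.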
Semantics: Possible Worlds
Random variables (e.g., X) take values from a set (domain)
A possible world ω is a complete assignment of values to all random variables
A probability measure is a function P : Ω → R:
∑_{ω∈Ω} P(ω) = 1
∀ω ∈ Ω [P(ω) ≥ 0]
Discuss for the fair six-sided die example:
What is the random variable?
What is its domain?
How many worlds are there?
What is the probability measure P?
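A minimal sketch of these semantics in Python, assuming the fair-die example (the dictionary encoding and names below are ours, not from the slides): there is one random variable X with domain {1, …, 6}, so there are six possible worlds, and P gives each world weight 1/6.

```python
from fractions import Fraction

# Possible worlds for one roll of a fair six-sided die: each world is a
# complete assignment to the single random variable X.
Omega = [1, 2, 3, 4, 5, 6]

# Probability measure: uniform weight 1/6 on every world.
P = {omega: Fraction(1, 6) for omega in Omega}

# The two axioms from the slide.
assert sum(P.values()) == 1             # weights sum to 1
assert all(p >= 0 for p in P.values())  # every weight is non-negative
print(P[5])  # P(X = 5) -> 1/6
```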
Propositions
A primitive proposition is an equality or an inequality (e.g., X = 2 or X ≥ 5)
A proposition is built up from other propositions using logical connectives (e.g.,
X = 1 ∨ X = 3 ∨ X = 5)
The probability of a proposition is the sum of the probabilities of the possible
worlds in which that proposition is true
P(α) = ∑_{ω∈Ω, ω⊨α} P(ω)
Example: for the fair die, P(X ≥ 5) = P(X = 5) + P(X = 6) = 1/6 + 1/6 = 1/3
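This definition translates directly into code. A sketch reusing the die worlds from above; representing a proposition as a Boolean function of worlds is our choice of encoding:

```python
from fractions import Fraction

Omega = [1, 2, 3, 4, 5, 6]
P = {omega: Fraction(1, 6) for omega in Omega}

def prob(alpha):
    """P(alpha): sum of P(omega) over the worlds where alpha holds."""
    return sum(P[w] for w in Omega if alpha(w))

print(prob(lambda x: x >= 5))          # 1/3, matching the slide
print(prob(lambda x: x in (1, 3, 5)))  # 1/2: the proposition X=1 ∨ X=3 ∨ X=5
```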
Basic Properties
P(α ∨ β) ≥ P(α)
P(α ∨ β) ≥ P(β)
P(α & β) ≤ P(α)
P(α & β) ≤ P(β)
P(¬α) = 1 − P(α)
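For instance, with the fair die: P(X = 1 ∨ X = 2) = 1/3 ≥ 1/6 = P(X = 1); P(X = 1 & X ≥ 2) = 0 ≤ 1/6; and P(¬(X = 1)) = 5/6 = 1 − 1/6.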
Joint Distributions
In our die example there was a single random variable X
We typically want to think about the interactions of multiple random variables
A joint distribution assigns a probability to each full assignment of values to
variables
P(X = 1, Y = 5) is equivalent to P(X = 1 & Y = 5)
the cumulative probability of all worlds in which X = 1 and Y = 5
Suppose Diane now throws her fair six-sided die twice. The result of the first
throw is X and the second throw is Y
Discuss:
What is P(X = 1, Y = 5)?
What is P(X = 1)?
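A sketch answering both questions by enumerating all 36 worlds (assuming, as fairness of the die suggests, that the two throws are independent, so every world has weight 1/36):

```python
from fractions import Fraction
from itertools import product

# Worlds are (X, Y) pairs: two independent throws of a fair die.
Omega = list(product(range(1, 7), repeat=2))
P = {w: Fraction(1, 36) for w in Omega}

def prob(alpha):
    return sum(P[w] for w in Omega if alpha(w))

print(prob(lambda w: w == (1, 5)))  # P(X = 1, Y = 5) = 1/36
print(prob(lambda w: w[0] == 1))    # P(X = 1)        = 1/6
```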
Another Joint-Distribution Example
What might a day be like in Edmonton?
Two random variables:
Weather with domain {clear, snowing}
Temperature with domain {mild, cold, very_cold}
Joint distribution P(Weather, Temperature) → shown as a table on the original slide (one consistent reconstruction below)
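The table itself was lost in this text. The values below are one consistent reconstruction: the clear row matches the worked example on the Chain Rule slide (where P(mild|clear) = 0.2/(0.2 + 0.3 + 0.25)), while the assignment of 0.30 and 0.25 to cold and very_cold, and the entire snowing row, are assumptions for illustration only.

                     mild    cold    very_cold
Weather = clear      0.20    0.30    0.25
Weather = snowing    0.05    0.10    0.10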
Marginalization
Marginalization is using a joint distribution
P(X1, . . . , Xm, . . . , Xn) to compute a distribution
over a smaller number of variables P(X1, . . . , Xm)
The smaller distribution is called the marginal
distribution of its variables
We compute the marginal distribution by
summing out the other variables, for instance:
P(X, Y) = ∑_z P(X, Y, Z = z)
What is the marginal distribution of Weather?
What is P(Weather = clear)?
What is P(Weather = snowing)?
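A sketch of summing out Temperature from the assumed table above (the dictionary encoding is ours), which answers all three questions:

```python
from fractions import Fraction as F

# Assumed joint distribution P(Weather, Temperature); see the table above.
joint = {
    ("clear", "mild"): F("0.20"), ("clear", "cold"): F("0.30"),
    ("clear", "very_cold"): F("0.25"), ("snowing", "mild"): F("0.05"),
    ("snowing", "cold"): F("0.10"), ("snowing", "very_cold"): F("0.10"),
}

# Marginal distribution of Weather: sum out Temperature.
marginal = {}
for (weather, temp), p in joint.items():
    marginal[weather] = marginal.get(weather, 0) + p

print(marginal["clear"])    # 3/4
print(marginal["snowing"])  # 1/4
```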
Conditional Probability
Agents need to be able to update their beliefs based on new observations
This process is called conditioning
We write P(h|e) to denote the probability of hypothesis h given that we have
observed evidence e
P(h|e) is the probability of h conditional on e
Semantics of Conditional Probability
Evidence e lets us rule out all of the worlds that
are incompatible with e
For instance, if the agent observes that the
weather is clear, it should no longer assign any
probability to the worlds in which it is snowing
We need to normalize the probabilities of the
remaining worlds to ensure that the probabilities
of possible worlds sum to 1
Modify the joint-distribution table from the earlier slide given the evidence that the weather is clear
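A sketch of this rule-out-and-renormalize procedure on the assumed table (again, the snowing row and the encoding are ours):

```python
from fractions import Fraction as F

joint = {
    ("clear", "mild"): F("0.20"), ("clear", "cold"): F("0.30"),
    ("clear", "very_cold"): F("0.25"), ("snowing", "mild"): F("0.05"),
    ("snowing", "cold"): F("0.10"), ("snowing", "very_cold"): F("0.10"),
}

# Observe Weather = clear: rule out incompatible worlds, then renormalize.
surviving = {w: p for w, p in joint.items() if w[0] == "clear"}
total = sum(surviving.values())  # P(clear) = 3/4
posterior = {w: p / total for w, p in surviving.items()}

print(posterior[("clear", "mild")])  # 4/15 ≈ 0.267, i.e. P(mild | clear)
```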
Chain Rule
Conditional probability is defined as
P(h|e) = P(h, e) / P(e)
which is exactly the sum of probabilities of all worlds in which h & e are true
divided by the sum of probabilities of all worlds in which e is true
in the weather example, P(mild|clear) = 0.2 / (0.2 + 0.3 + 0.25) ≈ 0.267
From there we have P(h, e) = P(h|e)P(e)
More generally, we have the chain rule:
P(α1, . . . , αn) = P(α1)P(α2|α1) · · · P(αn|α1, . . . , αn−1) = ∏_{i=1}^{n} P(αi|α1, . . . , αi−1)
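A quick numeric check of the chain rule on the two-throw die example (a sketch, reusing the earlier world encoding): P(X = 1, Y = 5) should equal P(X = 1) · P(Y = 5 | X = 1).

```python
from fractions import Fraction
from itertools import product

Omega = list(product(range(1, 7), repeat=2))  # two fair-die throws
P = {w: Fraction(1, 36) for w in Omega}

def prob(alpha):
    return sum(P[w] for w in Omega if alpha(w))

def cond_prob(h, e):
    """P(h | e) = P(h, e) / P(e)."""
    return prob(lambda w: h(w) and e(w)) / prob(e)

x_is_1 = lambda w: w[0] == 1
y_is_5 = lambda w: w[1] == 5

# Chain rule: P(X=1, Y=5) = P(X=1) * P(Y=5 | X=1) = 1/6 * 1/6 = 1/36
assert prob(lambda w: x_is_1(w) and y_is_5(w)) == \
       prob(x_is_1) * cond_prob(y_is_5, x_is_1)
```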
Bayes’ Rule
We have P(h, e) = P(h|e)P(e) = P(e|h)P(h)
From here we have Bayes’ rule
P(h|e) = P(e|h)P(h) / P(e)
P(e) is the probability of the evidence
P(h) is the prior probability of a hypothesis h
P(e|h) is the likelihood, which is often easier to compute than
P(h|e), the posterior
Discuss why P(wet|rain) is easier to compute than P(rain|wet)
wet is the evidence e
rain is the hypothesis h
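A sketch of the update with made-up numbers (the prior and likelihoods below are assumptions, not from the lecture): it is easy to state how likely the ground is to be wet if it rains, and Bayes’ rule then turns that likelihood into the posterior P(rain|wet).

```python
from fractions import Fraction as F

# All numbers below are assumptions, chosen only to illustrate the formula.
p_rain = F("0.1")            # prior P(h)
p_wet_given_rain = F("0.9")  # likelihood P(e | h)
p_wet_given_dry = F("0.05")  # likelihood P(e | not h)

# P(e): marginalize the evidence over both hypotheses.
p_wet = p_wet_given_rain * p_rain + p_wet_given_dry * (1 - p_rain)

# Bayes' rule: P(h | e) = P(e | h) P(h) / P(e)
p_rain_given_wet = p_wet_given_rain * p_rain / p_wet
print(p_rain_given_wet)  # 2/3
```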
Expected Value
The expected value of a random variable X is the weighted average of that
variable over the domain, weighted by the probability of each value:
E[X] = ∑_x P(X = x) · x
The conditional expected value of a variable X conditioned on proposition y is
its expected value weighted by the conditional probability:
E[X|y] = ∑_x P(X = x|y) · x
Discuss
What is the expected value of a roll of a fair six-sided die?
What is its conditional expected value conditioned on the roll being even?
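A sketch computing both discussion answers for the fair die (the encoding is the same one used earlier):

```python
from fractions import Fraction

Omega = [1, 2, 3, 4, 5, 6]
P = {x: Fraction(1, 6) for x in Omega}

# E[X]: probability-weighted average over the domain.
ev = sum(p * x for x, p in P.items())
print(ev)  # 7/2, i.e. 3.5

# E[X | X is even]: condition on the even worlds, then take the
# average weighted by the conditional probabilities.
even = {x: p for x, p in P.items() if x % 2 == 0}
total = sum(even.values())
ev_even = sum((p / total) * x for x, p in even.items())
print(ev_even)  # 4
```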
Expected Value Examples: E[X] = 3 (figure slide; plots not reproduced here)
Summary
Probability is a numerical measure of uncertainty
Formal semantics:
positive weights, sum up to 1 over possible worlds
probability of proposition is total weight of worlds in which the proposition is true
Conditional probability updates the agent’s beliefs based on evidence
Expected value of a variable is its probability-weighted average over possible
worlds
