Probability
SYMSYS1
TODD DAVIES
Lecture Outline
1. Basic Probability Theory
2. Conditional Probability
3. Independence
4. Philosophical Foundations
5. Subjective Probability Elicitation
6. Heuristics and Biases in Human Probability Judgment
Why study probability?
1. Basic Probability Theory
Sample spaces
DEFINITION 1.1. Let S = {s₁, s₂, ..., sₙ} be a finite set of possible outcomes in
a context C. S is a (finite) sample space for C iff exactly one outcome
among the elements of S is or will be true in C.
EXAMPLE 1.2: Let C be the particular flipping of a coin. Then…
▪ S = {Heads, Tails} is a sample space for C.
▪ Another sample space for C is S' = {Heads is observed, Tails is
observed, Cannot observe whether the coin is heads or tails}.
▪ Yet another is S'' = {Heads is observed and someone coughs, Heads is
observed and no one coughs, Tails is observed whether someone
coughs or not}.
Event spaces
DEFINITION 1.3. Let S be a sample space, and ∅ ≠ E ⊆ 2^S (E is a
nonempty subset of the power set of S, i.e., it is a set of subsets of S). Then
E is an event space (or algebra of events) on S iff for every A,B ∈ E:
(a) S \ A = Aᶜ ∈ E (the S-complement of A is in E)
and
(b) A∪B ∈ E (the union of A and B is in E).
We call the elements of E consisting of single elements of S atomic events.
EXAMPLE 1.4: If S = {Heads, Tails}, then E = {∅, {Heads}, {Tails}, {Heads,
Tails}} is an event space on S. The atomic events are {Heads} and {Tails}.
Probability measures
DEFINITION 1.5. Let S be a sample space and E an event space on S.
Then a function P: E → [0,1] is a (finitely additive) probability measure on E
iff for every A,B ∈ E:
(a) P(S) = 1
and
(b) If A∩B = ∅ (the intersection of A and B is empty, in which case we
say that A and B are disjoint events), then P(A∪B) = P(A) + P(B)
(additivity).
The triple <S,E,P> is called a (finitely additive) probability space.
[Venn diagram: disjoint events A and B within sample space S]
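To make the axioms concrete, here is a minimal Python sketch (ours, not from the slides) that represents a finite probability space by its atomic probabilities and checks Definition 1.5 on the coin example:

```python
from fractions import Fraction

# Atomic probabilities for a two-outcome sample space (a toy example).
p_atom = {"Heads": Fraction(1, 2), "Tails": Fraction(1, 2)}
S = set(p_atom)

def P(event):
    """Probability of an event, i.e., a subset of S."""
    return sum(p_atom[s] for s in event)

A, B = {"Heads"}, {"Tails"}
assert P(S) == 1                # Definition 1.5(a)
assert A & B == set()           # A and B are disjoint
assert P(A | B) == P(A) + P(B)  # Definition 1.5(b): additivity
```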
Basic corollaries
COROLLARY 1.6. Binary complementarity. If <S,E,P> is a finitely additive
probability space, then for all A,B ∈ E:
P(Aᶜ) = 1 – P(A).
Proof.
S = A∪Aᶜ by the definition of the complement, so P(A∪Aᶜ) = 1 by
Definition 1.5(a). A and Aᶜ are disjoint by the definition of the complement,
so by 1.5(b), P(A∪Aᶜ) = P(A) + P(Aᶜ). Thus P(A) + P(Aᶜ) = 1, and the result
follows by subtracting P(A) from both sides of the equation.
[Venn diagram: event A and its complement Aᶜ within sample space S]
Basic corollaries
COROLLARY 1.7. If <S,E,P> is a finitely additive probability space, then for
all A,B ∈ E:
P(∅) = 0.
Proof.
Sᶜ = S \ S = ∅. Thus P(∅) = P(Sᶜ) = 1 – P(S) by Corollary 1.6, which by
Definition 1.5(a) is 1 – 1 = 0.
Basic corollaries
COROLLARY 1.8. If <S,E,P> is a finitely additive probability space, then for
all A,B ∈ E:
P(A∪B) = P(A) + P(B) - P(A∩B).
Proof.
From set theory, we have A = (A∩B)∪(A∩Bᶜ), B = (B∩A)∪(B∩Aᶜ), and
A∪B = (A∩B)∪(A∩Bᶜ)∪(B∩Aᶜ) = (A∩B)∪[(A∩Bᶜ)∪(B∩Aᶜ)]. Each
decomposition is into disjoint pieces: (A∩B)∩(A∩Bᶜ) = ∅, (B∩A)∩(B∩Aᶜ) = ∅,
and (A∩B)∩[(A∩Bᶜ)∪(B∩Aᶜ)] = [(A∩B)∩(A∩Bᶜ)]∪[(A∩B)∩(B∩Aᶜ)] = ∅∪∅ = ∅.
Therefore, by additivity, P(A) = P[(A∩B)∪(A∩Bᶜ)] = P(A∩B) + P(A∩Bᶜ),
P(B) = P[(B∩A)∪(B∩Aᶜ)] = P(B∩A) + P(B∩Aᶜ), and P(A∪B) =
P[(A∩B)∪[(A∩Bᶜ)∪(B∩Aᶜ)]] = P(A∩B) + P(A∩Bᶜ) + P(B∩Aᶜ). Substituting,
P(A∪B) = P(A∩B) + [P(A) – P(A∩B)] + [P(B) – P(B∩A)]. Since B∩A = A∩B,
P(A∪B) = P(A) + P(B) – P(A∩B).
Basic corollaries
Venn diagram version of the previous proof:
COROLLARY 1.8. If <S,E,P> is a finitely additive probability space, then for
all A,B ∈ E:
P(A∪B) = P(A) + P(B) - P(A∩B).
[Venn diagram: overlapping events A and B within sample space S]
Basic corollaries
COROLLARY 1.9. Conjunction rule. If <S,E,P> is a finitely additive
probability space, then for all A,B ∈ E:
P(A∩B) ≤ P(A).
Proof. From set theory, we have A = (A∩B)∪(A∩Bᶜ), and (A∩B)∩(A∩Bᶜ) =
∅, so for a probability measure, by Definition 1.5(b),
P(A) = P[(A∩B)∪(A∩Bᶜ)] = P(A∩B) + P(A∩Bᶜ). Since P takes values in [0,1],
P(A∩Bᶜ) ≥ 0, so P(A) – P(A∩B) = P(A∩Bᶜ) ≥ 0. Therefore P(A) ≥ P(A∩B).
[Venn diagram: overlapping events A and B within sample space S]
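Corollaries 1.8 and 1.9 are easy to verify numerically. A sketch using two fair dice as the sample space (our own example; the events A and B are arbitrary choices):

```python
from fractions import Fraction
from itertools import product

# Sample space: all 36 equally likely ordered rolls of two dice.
S = set(product(range(1, 7), repeat=2))

def P(event):
    return Fraction(len(event), len(S))

A = {s for s in S if s[0] == 6}      # first die shows 6
B = {s for s in S if sum(s) >= 10}   # sum is at least 10

assert P(A | B) == P(A) + P(B) - P(A & B)  # Corollary 1.8
assert P(A & B) <= P(A)                    # Corollary 1.9 (conjunction rule)
```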
Basic corollaries
PROPOSITION 1.10. Disjunction rule. If <S,E,P> is a finitely additive
probability space, then for all A,B ∈ E:
P(A∪B) ≥ P(A).
EXERCISE 1.11. Prove Proposition 1.10.
2. Conditional Probability
Definition of conditional probability
DEFINITION 2.1. The conditional probability P(A|B) of an event A given an
event B with P(B) > 0 is defined as follows: P(A|B) = P(A∩B) / P(B).
COROLLARY 2.2. P(A∩B) = P(A|B) P(B).
[Venn diagram: overlapping events A and B within sample space S]
Bayes’s theorem
THEOREM 2.3. For events A and B with P(A) > 0 and P(B) > 0, P(A|B) = [P(B|A)P(A)] / P(B).
Proof.
• By Definition 2.1, P(A|B) = P(A∩B) / P(B).
• A∩B = B∩A, so P(A∩B) = P(B∩A).
• From Corollary 2.2, P(B∩A) = P(B|A)P(A).
• From Corollary 2.2 again, P(A∩B) = P(A|B)P(B).
• Therefore P(A|B)P(B) = P(B|A)P(A) [*]
• The theorem follows if we divide both sides of equation [*] by P(B).
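As a minimal sketch (ours, not from the lecture), Theorem 2.3 translates directly into code; the input probabilities below are hypothetical:

```python
from fractions import Fraction

def bayes(p_b_given_a, p_a, p_b):
    """Theorem 2.3: P(A|B) = P(B|A) * P(A) / P(B)."""
    return p_b_given_a * p_a / p_b

# Hypothetical numbers: P(B|A) = 9/10, P(A) = 1/5, P(B) = 3/10
print(bayes(Fraction(9, 10), Fraction(1, 5), Fraction(3, 10)))  # 3/5
```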
Applying Bayes to Medical Diagnosis
EXAMPLE 2.4 (from David Eddy, 1982). Calculation of the probability that a
breast lesion is cancerous based on a positive mammogram:
▪ Eddy estimates the probability that a cancerous lesion will be detected
by a mammogram (a true-positive result) as .792.
▪ Hence the test will turn up negative when cancer is actually present
20.8% of the time.
▪ When no cancer is present, the test produces a positive result 9.6% of
the time (and is therefore correctly negative 90.4% of the time).
▪ The prior probability that a patient who has a mammogram will have
cancer is taken to be 1%.
▪ Thus, the probability of cancer given a positive test is [(.792)(.01)] /
[(.792)(.01) + (.096)(.99)] = .077, applying Theorem 2.3.
▪ So a patient with a positive test has less than an 8% chance of having
breast cancer. Does this seem low to you?
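Eddy's arithmetic can be verified directly; a sketch with our own variable names (not Eddy's):

```python
# Eddy's estimates
p_cancer = 0.01              # prior P(cancer)
p_pos_given_cancer = 0.792   # P(positive | cancer)
p_pos_given_healthy = 0.096  # P(positive | no cancer)

# Law of total probability for the denominator P(positive)
p_pos = (p_pos_given_cancer * p_cancer
         + p_pos_given_healthy * (1 - p_cancer))

# Theorem 2.3 (Bayes)
p_cancer_given_pos = p_pos_given_cancer * p_cancer / p_pos
print(round(p_cancer_given_pos, 3))  # 0.077
```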
3. Independence
Independent events
DEFINITION 3.1. Two events A and B are independent iff P(A∩B) =
P(A)P(B).
COROLLARY 3.2. Two events A and B satisfying P(B)>0 are independent
iff P(A|B) = P(A).
Proof. (a) Only if direction: Since A and B are independent, P(A∩B) =
P(A)P(B) by Definition 3.1. Since P(B)>0, P(A|B) = P(A∩B)/P(B) =
P(A)P(B)/P(B) = P(A). (b) If direction: P(A|B) = P(A), so multiplying both
sides by P(B), P(A|B)P(B) = P(A)P(B) = P(A∩B).
Independent coin tosses
EXAMPLE 3.3. Consider two flips of a coin.
▪ Let A = {Heads on the first toss} and B = {Tails on the second toss}.
▪ The probability of tails on the second toss is not affected by what
happened on the first toss, and vice versa, so P(B|A) = P(B) and
P(A|B) = P(A).
▪ Assuming both sides of the coin have a nonzero probability of landing
on top, the tosses are independent, and therefore P(A∩B) = P(A)P(B).
▪ Assuming the coin is unbiased (P({Heads}) = P({Tails}) = 0.5), this
means that P(A∩B) = (.5)(.5) = .25.
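The claim P(A∩B) = .25 can also be checked by simulation; a sketch (ours, not from the slides), assuming a fair coin:

```python
import random

random.seed(0)
trials = 100_000
count_a_and_b = 0
for _ in range(trials):
    first = random.choice(["H", "T"])
    second = random.choice(["H", "T"])
    if first == "H" and second == "T":  # the event A∩B
        count_a_and_b += 1

# Should be close to P(A)P(B) = 0.25 for independent fair tosses
print(count_a_and_b / trials)
```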
Dice rolls
EXERCISE 3.4. Consider two six-sided dice (with faces varying from 1 to 6
dots) which are rolled simultaneously, and assume each roll is independent
of the other. What is the probability that the sum of the two dice is 7?
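One way to check your answer to Exercise 3.4 is brute-force enumeration; a sketch (ours), which leaves the value to be printed rather than stated:

```python
from fractions import Fraction
from itertools import product

rolls = list(product(range(1, 7), repeat=2))  # 36 equally likely outcomes
favorable = [r for r in rolls if sum(r) == 7]
print(Fraction(len(favorable), len(rolls)))   # compare with your answer
```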
Conditional independence
DEFINITION 3.5. Two events A and B are conditionally independent given
event C iff P(A∩B|C) = P(A|C)P(B|C).
PROPOSITION 3.6. Two events A and B are conditionally independent
given event C iff P(A|B∩C) = P(A|C) or P(B|C) = 0.
EXERCISE 3.7. Prove Proposition 3.6.
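Conditional independence can be checked mechanically. Below is a sketch using two fair dice (our own construction, not from the slides); the events A, B, C, and D are illustrative choices:

```python
from fractions import Fraction
from itertools import product

S = set(product(range(1, 7), repeat=2))  # two fair dice, 36 outcomes

def P(event):
    return Fraction(len(event), len(S))

def cond(event, given):
    """Conditional probability per Definition 2.1."""
    return P(event & given) / P(given)

A = {s for s in S if s[0] % 2 == 0}  # first die even
B = {s for s in S if s[1] % 2 == 0}  # second die even
C = {s for s in S if s[0] >= 4}      # first die at least 4

# Definition 3.5 holds: A and B are conditionally independent given C
assert cond(A & B, C) == cond(A, C) * cond(B, C)

# But conditioning on an even sum couples the two dice:
D = {s for s in S if (s[0] + s[1]) % 2 == 0}
assert cond(A & B, D) != cond(A, D) * cond(B, D)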
4. Philosophical Foundations of Probability
Classical view
A coin has 2 sides. By symmetry, the probability of each side is 1 divided by 2.
Frequentist view
Toss the coin many times and observe the proportion of times the coin
comes up Heads versus Tails. The probability of each side is the observed
frequency (proportion) of that outcome out of the total number of tosses.
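The frequentist view lends itself to simulation; a minimal sketch (ours), assuming a fair coin:

```python
import random

random.seed(1)
tosses = 10_000
heads = sum(random.random() < 0.5 for _ in range(tosses))

# The observed proportion of heads estimates P({Heads})
print(heads / tosses)
```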
Subjectivist view
Make a judgment of how likely you think it is for this coin to turn up
Heads versus Tails. Probability represents your degree of belief in one
outcome relative to another.
Also known as the Bayesian view.
Also known as the Bayesian view.
5. Subjective Probability Elicitation
Numerical method
Q: What do you believe is the
probability that it will rain tomorrow?
A: I would put the probability of rain
tomorrow at _____%.
Choice method
Q: Which of the following two events
do you think is more probable?
• Rain tomorrow
• No rain tomorrow
Probability wheel
Q: Choose between...
(I) Receiving $50 if it rains tomorrow
(II) Receiving $50 if the arrow lands within the displayed
probability range
Procedural Invariance
DEFINITION 5.1. An agent’s stated confidence (elicited subjective
probability) D is procedurally invariant with respect to two elicitation
procedures π and π′ iff the inferred inequality relations >_π and >_π′ are
such that for all events A and B, D(A) >_π D(B) iff D(A) >_π′ D(B).
[Diagram: Numerical Method ≅ Choice Method, i.e., the two elicitation
procedures should yield equivalent orderings]
Calibration
DEFINITION 5.2. The confidence D of a probability judge is perfectly
calibrated if for all values x ∈ [0,1] and all events E, if D(E) = x, then the
observed P(E) = x.
For values x ∈ [0.5,1], if D(E) > P(E), then the judge is said to be
overconfident. If D(E) < P(E), then the judge is said to be underconfident.
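Calibration can be estimated from a judge's track record. A sketch with made-up data (the judgments list is entirely hypothetical):

```python
from collections import defaultdict

# Hypothetical (stated confidence, outcome) pairs; outcome 1 = event occurred
judgments = [(0.9, 1), (0.9, 1), (0.9, 0), (0.9, 1),
             (0.6, 1), (0.6, 0), (0.6, 0), (0.6, 1)]

by_level = defaultdict(list)
for confidence, outcome in judgments:
    by_level[confidence].append(outcome)

# Perfect calibration (Definition 5.2): the hit rate at each confidence
# level equals that level
for confidence, outcomes in sorted(by_level.items()):
    hit_rate = sum(outcomes) / len(outcomes)
    print(f"D(E) = {confidence}: observed P(E) = {hit_rate:.2f}")
```

With these made-up numbers the judge states 0.9 but is right only 75% of the time, the overconfidence pattern of Definition 5.2.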
6. Heuristics and Biases of Human Probability Judgment
Using probability theory to predict probability judgments
EXPERIMENT 6.1. Test of binary complementarity. A number of
experiments reported in Tversky and Koehler (1994) and Wallsten,
Budescu, and Zwick (1992) show that subjects' confidence judgments
generally obey Corollary 1.6, so that D(Aᶜ) = 1 – D(A).
▪ The latter authors “presented subjects with 300 propositions concerning
world history and geography (e.g. 'The Monroe Doctrine was proclaimed
before the Republican Party was founded')...
▪ True and false (complementary) versions of each proposition were
presented on different days” (quoted in T&K ‘94).
▪ The probabilities assigned to each proposition and its complement
summed on average to 1.02, which is not significantly different from 1.
Probability judgment heuristics
A landmark paper by Amos Tversky and Daniel Kahneman (Science, 1974)
argued that human judges often employ heuristics in making judgments
about uncertain prospects, in particular:
• Representativeness: A is judged more likely than B in a context C if A is
more representative of C than B is.
• Availability: A is judged more likely than B in context C if instances of A
are easier to bring to mind in context C than instances of B.
Another experiment…
EXPERIMENT 6.2. Tennis prediction. Tversky and Kahneman (1983):
▪ Subjects evaluated the relative likelihoods that Bjorn Borg (then the most
dominant male tennis player in the world) would (a) win the final match
at Wimbledon, (b) lose the first set, (c) lose the first set but win the
match, and (d) win the first set but lose the match.
>>> How would you rank events (a) through (d) by likelihood?
Conjunction fallacy…
EXPERIMENT 6.2. Tennis prediction. Tversky and Kahneman (1983):
▪ Subjects evaluated the relative likelihoods that Bjorn Borg (then the most
dominant male tennis player in the world) would (a) win the final match
at Wimbledon, (b) lose the first set, (c) lose the first set but win the
match, and (d) win the first set but lose the match.
▪ The average rankings (1=most probable, 2 = second most probable, etc.)
were 1.7 for a, 2.7 for b, 2.2 for c, and 3.5 for d.
▪ Thus, subjects on average ranked the conjunction of Borg losing the first
set and winning the match as more likely than that he would lose the first
set (2.2 versus 2.7).
▪ The authors' explanation is that people rank likelihoods based on the
representativeness heuristic, which makes the conjunction of Borg's
losing the first set but winning the match more representative of Borg
than is the proposition that Borg loses the first set.
Another conjunction fallacy
EXPERIMENT 6.3. Natural disasters. Tversky and Kahneman asked
subjects to evaluate the probability of occurrence of several events in 1983.
▪ Half of subjects evaluated a basic outcome (e.g. “A massive flood
somewhere in North America in 1983, in which more than 1,000 people
drown”) and the other half evaluated a more detailed scenario leading to
the same outcome (e.g. “An earthquake in California sometime in 1983,
causing a flood in which more than 1,000 people drown.”)
▪ The estimates of the conjunction were significantly higher than those for
the flood.
▪ Thus, scenarios that include a cause-effect story appear more plausible
than those that lack a cause, even though the latter are extensionally
more likely.
▪ The causal story makes the conjunction easier to imagine, an aspect of
the availability heuristic.
Disjunction fallacy?
EXERCISE 6.4. Construct an example experiment in which you would expect
subjects to violate the disjunction rule of Proposition 1.10 (which you were
asked to prove in Exercise 1.11).
Is human judgment Bayesian?
This is a highly studied question.
The answer depends on the type of judgment or cognitive task being
performed, and to a lesser extent on the identity of the judge.
See separate lecture in this series, “Are people Bayesian reasoners?”
Gambler’s fallacy
EXPERIMENT 6.5. Tversky and Kahneman (1974):
▪ Subjects on average regard the sequence H-T-H-T-T-H of fair coin
tosses as more likely than the sequence H-H-H-T-T-T, even though
both sequences are equally likely, a result that follows from the
generalized definition of event independence.
▪ People in general regard, for example, a heads toss as more likely after
a long run of tails than after a similar run of heads or a mixed run.
This tendency has been called the “gambler's fallacy”.
▪ Tversky and Kahneman's explanation is that people expect sequences of
tosses to be representative of the process that generates them. H-T-H-T-
T-H is rated more likely than H-H-H-T-T-T because the former
sequence is more typical of fair coin toss sequences generally, in which
Hs and Ts are intermingled rather than grouped so that all the Hs
precede all the Ts.
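That the two specific sequences really are equally likely can be checked by simulation; a sketch (ours, not from the paper):

```python
import random
from collections import Counter

random.seed(2)
trials = 1_000_000
counts = Counter()
for _ in range(trials):
    seq = "".join(random.choice("HT") for _ in range(6))
    counts[seq] += 1

# Every specific 6-toss sequence has probability (1/2)^6 = 1/64 ≈ 0.0156
for seq in ["HTHTTH", "HHHTTT"]:
    print(seq, counts[seq] / trials)
```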
Other findings
Belief in the “hot hand”. Both lay and expert basketball observers perceive
players as being more likely to make a shot after a hit (or a run of previous
hits) than after a miss, even when the data indicate that shots are
independent events as defined in 3.1. (Gilovich, Vallone, and Tversky,
1985)
Overconfidence. Various studies have shown that people’s probability
judgments are not well calibrated, and that on average people exhibit
overconfidence by the criteria in Definition 5.2. (see Hoffrage, 2004)
Belief reversals. Fox and Levav (2000) showed evidence that probability
judgments are affected by the elicitation method, in violation of the
procedural invariance criterion of Definition 5.1, and that choice and
numerical methods sometimes yield opposite results.
What can we learn from illusions?
Further perspectives
Dual process model of judgment (Stanovich and West, 2000; Kahneman
and Frederick, 2002)
▪ System 1 – quick, intuitive
▪ System 2 – slower, deliberative
Evolutionary psychology (e.g., Gigerenzer and Todd, 1999)
Rational analysis of cognition (Oaksford and Chater, 2007)
Computational probabilistic models of cognition (e.g. Tenenbaum, Kemp,
Griffiths, and Goodman, 2011)