Bayesian statistics
Probability
A random variable is the basic element of probability. It refers to an event, and there is some degree of uncertainty as to the outcome of that event. For example, the random variable A could be the event of getting heads on a coin flip.
Classical vs. Bayesian
Classical: experiments are infinitely repeatable under the same conditions (hence 'frequentist'); the parameter of interest (θ) is fixed and unknown.
Bayesian: each experiment is unique (i.e., not repeatable); the parameter of interest is itself treated as a random quantity with a probability distribution expressing our uncertainty about it.
Classical Probability
A property of the environment: 'physical' probability. Imagine all data sets of size N that could be generated by sampling from the distribution determined by the parameters; each data set occurs with some probability and produces an estimate. "The probability of getting heads on this particular coin is 50%."
Bayesian Probability
'Personal' probability: a degree of belief, a property of the person who assigns it. The observations are fixed; imagine all possible values of the parameters from which they could have come. "I think the coin will land on heads 50% of the time."
The Bayes' Formula
P(A | B) = P(B | A) P(A) / P(B)
Conditional Probability
P(A = true | B = true): out of all the outcomes in which B is true, how many also have A equal to true? Read this as "probability of A conditioned on B" or "probability of A given B".
Example: H = "have a headache", F = "coming down with flu". P(H = true) = 1/10, P(F = true) = 1/40, P(H = true | F = true) = 1/2. "Headaches are rare and flu is rarer, but if you're coming down with flu there's a 50-50 chance you'll have a headache."
The Joint Probability Distribution
We write P(A = true, B = true) to mean "the probability of A = true and B = true". Notice that P(H = true, F = true) = P(H = true | F = true) P(F = true). In general, P(X | Y) = P(X, Y) / P(Y).
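A quick numerical check of these definitions, as a minimal Python sketch (all numbers come from the headache/flu example above):

```python
# Product rule P(X, Y) = P(X | Y) P(Y), then Bayes' formula,
# applied to the headache/flu numbers from the slides.
p_h = 1 / 10         # P(H = true): probability of a headache
p_f = 1 / 40         # P(F = true): probability of flu
p_h_given_f = 1 / 2  # P(H = true | F = true)

# Product rule: joint probability of headache and flu.
p_hf = p_h_given_f * p_f          # 0.0125

# Bayes' formula: probability of flu given a headache.
p_f_given_h = p_hf / p_h          # 0.125

print(f"P(H, F)  = {p_hf:.4f}")
print(f"P(F | H) = {p_f_given_h:.3f}")
```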
The Joint Probability Distribution
Joint probabilities can be over any number of variables, e.g. P(A = true, B = true, C = true). For each combination of values of the variables, we need to say how probable that combination is, and the probabilities of all these combinations must sum to 1.
The Joint Probability Distribution
Once you have the joint probability distribution, you can calculate any probability involving A, B, and C. (You may need marginalization and Bayes' rule, neither of which is discussed in detail in these slides.) Examples of things you can compute:
P(A = true) = sum of P(A, B, C) over all rows with A = true
P(A = true, B = true | C = true) = P(A = true, B = true, C = true) / P(C = true)
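As an illustration, the sketch below uses a hypothetical joint distribution over A, B, and C (the eight probabilities are made up, chosen only so that they sum to 1) to demonstrate both computations:

```python
# A made-up joint distribution over three Boolean variables, purely
# for illustration. The eight probabilities sum to 1.
joint = {
    # (A, B, C): probability
    (True,  True,  True):  0.05, (True,  True,  False): 0.10,
    (True,  False, True):  0.05, (True,  False, False): 0.10,
    (False, True,  True):  0.20, (False, True,  False): 0.10,
    (False, False, True):  0.10, (False, False, False): 0.30,
}

# Marginalization: P(A = true) is the sum over all rows with A = true.
p_a = sum(p for (a, b, c), p in joint.items() if a)

# Conditioning: P(A = true, B = true | C = true)
#             = P(A = true, B = true, C = true) / P(C = true).
p_abc = joint[(True, True, True)]
p_c = sum(p for (a, b, c), p in joint.items() if c)
p_ab_given_c = p_abc / p_c

print(f"P(A=true) = {p_a:.2f}")                  # 0.30
print(f"P(A,B | C=true) = {p_ab_given_c:.3f}")   # 0.05 / 0.40 = 0.125
```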
The Problem with the Joint Distribution
Lots of entries in the table to fill up! For k Boolean random variables, you need a table of size 2^k. How do we get away with fewer numbers? We need the concept of independence.
Independence
Variables A and B are independent if any of the following hold:
P(A, B) = P(A) P(B)
P(A | B) = P(A)
P(B | A) = P(B)
This says that knowing the outcome of A does not tell me anything new about the outcome of B.
Independence
How is independence useful? Suppose you have n coin flips and you want to calculate the joint distribution P(C1, …, Cn). If the coin flips are not independent, you need 2^n values in the table. If the coin flips are independent, then P(C1, …, Cn) = P(C1) ⋯ P(Cn): each P(Ci) table has 2 entries, and there are n of them, for a total of 2n values.
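A minimal sketch of this bookkeeping, assuming fair coins purely for illustration:

```python
# The savings from independence, for n hypothetical coin flips.
n = 20

# Without independence: one probability per joint outcome.
full_table_size = 2 ** n            # 1,048,576 entries

# With independence: one 2-entry table per flip; the joint is a product.
independent_size = 2 * n            # 40 entries

# Joint probability of any particular sequence of n independent fair flips:
p_heads = 0.5
p_sequence = p_heads ** n

print(full_table_size, independent_size, p_sequence)
```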
Conditional Independence
Variables A and B are conditionally independent given C if any of the following hold:
P(A, B | C) = P(A | C) P(B | C)
P(A | B, C) = P(A | C)
P(B | A, C) = P(B | C)
Knowing C tells me everything about B; I don't gain anything by also knowing A (either because A doesn't influence B, or because knowing C provides all the information that knowing A would give).
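As a sanity check, the sketch below builds a hypothetical joint in which A and B are conditionally independent given C (all numbers are invented) and verifies the first identity numerically:

```python
# Construct a joint where A and B are independent given C by design,
# then verify P(A, B | C) = P(A | C) P(B | C). All probabilities here
# are made-up placeholders.
p_c = 0.5
p_a_given_c = {True: 0.8, False: 0.2}   # P(A = true | C)
p_b_given_c = {True: 0.3, False: 0.6}   # P(B = true | C)

joint = {}
for c in (True, False):
    pc = p_c if c else 1 - p_c
    for a in (True, False):
        pa = p_a_given_c[c] if a else 1 - p_a_given_c[c]
        for b in (True, False):
            pb = p_b_given_c[c] if b else 1 - p_b_given_c[c]
            joint[(a, b, c)] = pa * pb * pc

# Check P(A=t, B=t | C=t) against P(A=t | C=t) * P(B=t | C=t).
pc_true = sum(p for (a, b, c), p in joint.items() if c)
lhs = joint[(True, True, True)] / pc_true
rhs = p_a_given_c[True] * p_b_given_c[True]
print(lhs, rhs)   # both 0.24
```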
Example: A Clinical Test
Let's move to a simple example: a clinical test. Consider a rare disease such that, at any given point in time, on average 1 in 10,000 people in the population has it. There is also a clinical test for the disease (a blood test, for example), but it is not perfect. The sensitivity of the test, i.e. the probability of a correct positive, is .99; the specificity, i.e. the probability of a correct negative, is .98. To visualise this, let's draw a tree.
Example: A Clinical Test
P(Disease) = p(θ=1) = .0001
P(Test Positive | Disease) = p(x=1 | θ=1) = .99
P(Test Negative | No Disease) = p(x=0 | θ=0) = .98
(Tree: with probability .0001 the person has the disease, and then the test is positive with probability .99 and negative with probability .01; with probability .9999 the person is disease-free, and then the test is positive with probability .02 and negative with probability .98.)
Example: A Clinical Test
Now consider a person who gets worried, takes the test, and gets a positive result. Remembering that the test is not perfect, what can we say about the actual chances of their being ill?
Example: A Clinical Test
Applying Bayes' formula to the tree above, the probability of our patient being ill, given what we know about the prevalence of the disease in the population and about the test's performance, is .0049: naturally higher than in the background population, but still very small. Suppose it is customary to repeat the test, and the result is positive again.
Example: A Clinical Test
We can still use the same formula, of course, and the same tree for visualisation, but now the prior probabilities of having the disease and being disease-free are no longer .0001 and .9999; instead they are .0049 and .9951, based on the result of the first test. Applying the formula a second time gives .20. Still not very high, which suggests the test is not very efficient: evidently you have to repeat it several times before you can be reasonably sure.
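The whole calculation fits in a few lines of Python; the numbers (prior .0001, sensitivity .99, specificity .98) are those from the slides:

```python
# Sequential Bayes updates for the clinical-test example.
sensitivity = 0.99   # P(test + | disease)
specificity = 0.98   # P(test - | no disease), so false-positive rate is 0.02

def update_on_positive(prior: float) -> float:
    """Posterior probability of disease after one positive test."""
    p_pos = sensitivity * prior + (1 - specificity) * (1 - prior)
    return sensitivity * prior / p_pos

p = 0.0001
p = update_on_positive(p)
print(f"after 1st positive: {p:.4f}")   # ~0.0049
p = update_on_positive(p)
print(f"after 2nd positive: {p:.2f}")   # ~0.20
```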
Bayesian Inference
Combine the prior with the model to produce the posterior probability distribution: p(θ | x) ∝ p(x | θ) p(θ). Modeling proceeds step-wise: specify a model for the data, specify a prior for the parameters, observe the data, and compute the posterior.
Example
The thumbtack problem: will it land on the point ('heads') or on the flat bit ('tails')? Flip it N times to obtain data D. What will it do on the (N+1)th flip? In other words, how do we compute p(x_{N+1} | D, ξ) from p(θ | ξ)? (Here ξ denotes background knowledge.)
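Under a conjugate Beta(α, β) prior on θ (the choice introduced a few slides below), the predictive probability has a closed form, sketched here:

```python
# Predictive probability for the thumbtack, assuming a conjugate
# Beta(alpha, beta) prior on theta. With h heads in N flips, the
# posterior is Beta(alpha + h, beta + N - h), and the predictive
# probability of heads on flip N + 1 is the posterior mean.
def predictive_heads(alpha: float, beta: float, h: int, N: int) -> float:
    return (alpha + h) / (alpha + beta + N)

# E.g., a uniform Beta(1, 1) prior with 2 heads in 10 flips:
print(predictive_heads(1, 1, h=2, N=10))   # 3/12 = 0.25
```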
Inference as Learning
Suppose you have a coin with an unknown bias, θ ≡ P(heads). You flip the coin multiple times and observe the outcomes. From the observations, you can infer the bias of the coin. This is learning; this is inference.
Independent events → sufficient statistics
If the flips are independent, the order of the outcomes carries no information: the counts of heads and tails are sufficient statistics for θ.
Given that I have flipped a coin 100 times and it has landed heads-up 100 times, what is the likelihood that the coin is fair?
Bayesian Solution
N = 10 throws, 2 'heads' recorded.
Model: x | θ ~ Bin(N = 10, θ)
Prior? Some candidates:
Beta(θ | α=1, β=1)
Beta(θ | α=5, β=5)
Beta(θ | α=5, β=20)
Posterior Distribution
With a Beta prior, θ | x, N, α, β ~ Beta(α + x, β + N − x). Here that gives Beta(θ | 1+2, 1+10−2), i.e. Beta(θ | 3, 9).
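A sketch of this update for the three candidate priors listed above; comparing the resulting posterior means doubles as a crude sensitivity analysis:

```python
# Conjugate Beta-Binomial update for x = 2 heads in N = 10 throws,
# applied to each of the three candidate priors from the slide.
x, N = 2, 10
priors = [(1, 1), (5, 5), (5, 20)]

for a, b in priors:
    post_a, post_b = a + x, b + N - x
    mean = post_a / (post_a + post_b)
    print(f"Beta({a},{b}) prior -> Beta({post_a},{post_b}) posterior, "
          f"mean {mean:.3f}")
# Beta(1,1)  -> Beta(3,9),  mean 0.250
# Beta(5,5)  -> Beta(7,13), mean 0.350
# Beta(5,20) -> Beta(7,28), mean 0.200
```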
Posterior Distribution
When the data are 'strong enough', the posterior will not depend much on the prior. Unless some extraneous information is available, a non-informative or vague prior can be used, in which case the inference will mostly be based on the data. If, however, some prior knowledge about the parameters of interest is available, it should be included in the reasoning. Furthermore, it is common practice to conduct a sensitivity analysis, i.e. to examine the effect the prior has on the posterior distribution. In any case, the prior assumptions should always be made explicit, and if the result is found to depend on them, that sensitivity should be investigated in detail.
Posterior Inference
Posterior distribution: Beta(θ | 3, 9)
Posterior mean: .25
Posterior variance: .014
Posterior 95% HDR: (.041, .484)
P(θ < .7) = .9994
Frequentist, for comparison:
Mean: .2
Variance: .016
Exact 95% CI: (.03, .56)
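These summaries can be reproduced with SciPy; note the interval below is the equal-tailed 95% credible interval rather than the HDR quoted above (an HDR needs a small numerical search, though for this posterior the two are close):

```python
# Posterior summaries for Beta(3, 9) via SciPy.
from scipy import stats

posterior = stats.beta(3, 9)

print(f"mean     = {posterior.mean():.3f}")     # 0.250
print(f"variance = {posterior.var():.4f}")      # 0.0144
lo, hi = posterior.ppf(0.025), posterior.ppf(0.975)
print(f"95% interval = ({lo:.3f}, {hi:.3f})")   # roughly (0.06, 0.52)
print(f"P(theta < .7) = {posterior.cdf(0.7):.4f}")
```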
Bayesian networks
A Bayesian Network
A Bayesian network is made up of:
1. a directed acyclic graph, and
2. a set of tables, one for each node in the graph.
(The running example uses four nodes A, B, C, D, with edges A → B, B → C, and B → D.)
A Directed Acyclic Graph
Each node in the graph is a random variable. A node X is a parent of another node Y if there is an arrow from node X to node Y; e.g. in the example graph, A is a parent of B. Informally, an arrow from node X to node Y means X has a direct influence on Y.
A Set of Tables for Each Node
Each node Xi has a conditional probability distribution P(Xi | Parents(Xi)) that quantifies the effect of its parents on the node. The parameters of the network are the probabilities in these conditional probability tables (CPTs).
A Set of Tables for Each Node
Consider the conditional probability distribution for C given B. For a given combination of values of the parents (B in this example), the entries P(C=true | B) and P(C=false | B) must add up to 1; e.g. P(C=true | B=false) + P(C=false | B=false) = 1. If you have a Boolean variable with k Boolean parents, its table has 2^(k+1) probabilities (but only 2^k of them need to be stored, since the rest follow from the sum-to-1 constraint).
Bayesian Networks
Two important properties: a Bayesian network (1) encodes the conditional independence relationships between the variables in its graph structure, and (2) is a compact representation of the joint probability distribution over the variables.
Conditional Independence
The Markov condition: given its parents, a node is conditionally independent of its non-descendants. For example, for a node X with parents P1 and P2, children C1 and C2, and non-descendants ND1 and ND2: given P1 and P2, X is conditionally independent of ND1 and ND2.
The Joint Probability Distribution
Due to the Markov condition, we can compute the joint probability distribution over all the variables X1, …, Xn in the Bayesian net using the formula:
P(X1 = x1, …, Xn = xn) = ∏ᵢ P(Xi = xi | Parents(Xi))
where Parents(Xi) means the values of the parents of node Xi in the graph.
Using a Bayesian Network: Example
Using the network in the example, suppose you want to calculate:
P(A = true, B = true, C = true, D = true)
= P(A = true) × P(B = true | A = true) × P(C = true | B = true) × P(D = true | B = true)
= (0.4)(0.3)(0.1)(0.95) = 0.0114
Using a Bayesian Network: Example
In the same calculation, the factorization P(A) P(B | A) P(C | B) P(D | B) comes from the graph structure, while the numbers 0.4, 0.3, 0.1, and 0.95 come from the conditional probability tables.
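A sketch of this network and calculation in plain Python; the four probabilities from the slide are used as given, while the CPT rows for A = false and B = false were not shown and are hypothetical placeholders:

```python
# The A -> B -> {C, D} network as plain dictionaries. The values 0.4,
# 0.3, 0.1, 0.95 come from the slide; the A=false / B=false rows are
# made-up placeholders.
p_a = 0.4                               # P(A = true)
p_b = {True: 0.3, False: 0.6}           # P(B = true | A); A=false row assumed
p_c = {True: 0.1, False: 0.5}           # P(C = true | B); B=false row assumed
p_d = {True: 0.95, False: 0.2}          # P(D = true | B); B=false row assumed

def prob(a: bool, b: bool, c: bool, d: bool) -> float:
    """Joint probability via the Markov factorization
    P(A, B, C, D) = P(A) P(B | A) P(C | B) P(D | B)."""
    pa = p_a if a else 1 - p_a
    pb = p_b[a] if b else 1 - p_b[a]
    pc = p_c[b] if c else 1 - p_c[b]
    pd = p_d[b] if d else 1 - p_d[b]
    return pa * pb * pc * pd

print(prob(True, True, True, True))   # 0.4 * 0.3 * 0.1 * 0.95 = 0.0114
```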
Inference
Using a Bayesian network to compute probabilities is called inference. In general, inference involves queries of the form P(X | E), where E is the evidence variable(s) and X is the query variable(s).

Inference
An example of a query in a network with nodes HasAnthrax, HasCough, HasFever, HasDifficultyBreathing, and HasWideMediastinum would be: P(HasAnthrax = true | HasFever = true, HasCough = true). Note that even though HasDifficultyBreathing and HasWideMediastinum are in the Bayesian network, they are not given values in the query (i.e. they appear neither as query variables nor as evidence variables); they are treated as unobserved variables.
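To make the query notation concrete, here is a minimal inference-by-enumeration sketch on the small A, B, C, D network from the earlier example (reusing the same hypothetical CPT rows): to answer P(X | E), sum the joint over the unobserved variables and normalize.

```python
# Inference by enumeration on the A -> B -> {C, D} network. The slide
# probabilities 0.4, 0.3, 0.1, 0.95 are used as given; the A=false /
# B=false CPT rows are the same made-up placeholders as before.
from itertools import product

p_a = 0.4
p_b = {True: 0.3, False: 0.6}
p_c = {True: 0.1, False: 0.5}
p_d = {True: 0.95, False: 0.2}

def joint(a, b, c, d):
    pa = p_a if a else 1 - p_a
    pb = p_b[a] if b else 1 - p_b[a]
    pc = p_c[b] if c else 1 - p_c[b]
    pd = p_d[b] if d else 1 - p_d[b]
    return pa * pb * pc * pd

# Query P(C = true | D = true): A and B are unobserved, so sum them out.
num = sum(joint(a, b, True, True) for a, b in product([True, False], repeat=2))
den = sum(joint(a, b, c, True)
          for a, b, c in product([True, False], repeat=3))
print(f"P(C=true | D=true) = {num / den:.3f}")   # ~0.174 with these numbers
```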