2. Graphical Models
• A graphical model is a probabilistic model for which a graph expresses
the conditional dependence structure between random variables.
• It provides a language that facilitates communication between a domain
expert and a statistician, supports flexible and modular definitions of
families of probability distributions, and is amenable to scalable
computational techniques
• Graphical models in machine learning are a powerful framework used to
represent and reason about the dependencies between variables.
• These models provide a structured way to visualize and compute joint
probabilities for a set of variables in complex systems, which is useful for
tasks like prediction, decision making, and inference.
3. Graphical Models
• The graphical model (GM) is a branch of ML that uses a graph to
represent a domain problem
• Probabilistic graphical modeling combines both probability and graph
theory
• Also called Bayesian networks, belief networks, or probabilistic networks
• Consists of a graph structure: nodes and arcs
• Two categories — Bayesian networks and Markov networks
4. Graphical Models
• Each node corresponds to a random variable, X, and has a value
corresponding to the probability of the random variable, P(X).
• If there is a directed arc from node X to node Y, this indicates that X has a
direct influence on Y.
• This influence is specified by the conditional probability P(Y|X).
• The network is a directed acyclic graph (DAG); namely, there are no cycles.
• The nodes and the arcs between the nodes define the structure of the
network, and the conditional probabilities are the parameters given the
structure.
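• In general, the graph structure lets the joint probability of all the variables
factorize into local conditional probabilities, one factor per node:
P(X1, X2, ..., Xd) = Π_i P(Xi | parents(Xi))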
5. Example
• This example models that rain causes the grass to get wet
• It rains on 40 percent of the days and when it rains, there is a 90 percent
chance that the grass gets wet; maybe 10 percent of the time it does not
rain long enough for us to really consider the grass wet enough.
• The random variables in this example are binary; they are either true or
false.
• There is a 20 percent probability that the grass gets wet without it actually
raining, for example, when a sprinkler is used
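• Using these numbers, the probability that the grass is wet on a random day
follows by summing over rain:
P(W) = P(W|R) P(R) + P(W|~R) P(~R) = 0.9 x 0.4 + 0.2 x 0.6 = 0.48
and, by Bayes' rule, the probability that it rained given that the grass is wet is
P(R|W) = 0.36 / 0.48 = 0.75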
7. Conditional Independence
• In a graphical model, not all nodes are connected; actually, in general, a node is
connected to only a small number of other nodes.
• Certain subgraphs imply conditional independence statements, and these allow
us to break down a complex graph into smaller subsets in which inferences can
be done locally and whose results are later propagated over the graph
9. Canonical Cases for Conditional
Independence
Case 1: Head-to-tail Connection
• Three events may be connected serially, as seen in the figure. We see
here that X and Z are independent given Y: knowing Y tells Z
everything; knowing the state of X does not add any extra knowledge
for Z; we write P(Z|Y,X) = P(Z|Y). We say that Y blocks the path from X
to Z, or in other words, it separates them in the sense that if Y is
removed, there is no path between X and Z. In this case, the joint is
written as P(X,Y,Z) = P(X) P(Y|X) P(Z|Y)
12. Case 2: Tail-to-tail
X may be the parent of two nodes Y and Z. The joint density is
written as P(X,Y,Z) = P(X) P(Y|X) P(Z|X)
Normally Y and Z are dependent through X; given X, they
become independent: P(Y,Z|X) = P(Y|X) P(Z|X)
19. BAYESIAN NETWORKS
Directed graphs that do not contain cycles, that is, that cannot have any loops
(DAGs: directed, acyclic graphs), are called Bayesian networks when they are
paired with conditional probability tables.
Bayesian Networks help us to effectively visualize the probabilistic model for
each domain and to study the relationship between random variables in the
form of a user-friendly graph.
20. Why Bayes Network?
Bayes optimal classifier is too costly to apply
Naïve Bayes makes overly restrictive assumptions: in practice, variables are
rarely completely independent.
Bayes network represents conditional independence relations among the
features.
Representation of causal relations makes the representation and
inference efficient.
21. Bayes Network
Two different ways to calculate the conditional probability.
If A and B are dependent events, the conditional probability is
calculated as P(A|B) = P(A and B) / P(B)
If A and B are independent events, the conditional probability reduces to
P(A|B) = P(A)
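For illustration with made-up numbers: if P(A and B) = 0.12 and P(B) = 0.3,
then P(A|B) = 0.12 / 0.3 = 0.4.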
22. Bayesian Network – example 1
o The probability of a random variable depends on its parents.
o Bayesian network models capture both conditionally dependent and conditionally
independent relationships between random variables.
Create a Bayesian network that models the marks of a student in an
examination
23. Bayesian Network- example
The marks (m) will depend on:
Exam level (e): (difficult, easy)
IQ of the student (i): (high, low)
Marks determine whether the student is admitted (a) to a university
The IQ also determines the aptitude score (s) of the student
Each node has a probability table
24. Bayesian Network- example
Exam level and IQ level are parent nodes, each represented by a prior probability
Marks depend on Exam level and IQ level, represented by a conditional probability
The conditional probability table for Marks contains an entry for each combination of
Exam level and IQ level
The conditional probability table for Admission contains an entry for each value of Marks
The conditional probability table for Aptitude score contains an entry for each value of IQ level
25. Bayesian Network- example
Calculate Joint probability
p(a,m,i,e,s)=p(a|m) p(m|i,e) p(e) p(i) p(s|i)
p(a|m): CP of the student's admission given marks
p(m|i,e): CP of the student's marks given IQ and exam level
p(i): probability of the IQ level
p(e): probability of the exam level
p(s|i): CP of the aptitude score given the IQ level
26. Bayesian Network- example
Calculate the probability that, in spite of the exam level being difficult,
a student with a low IQ level and a low aptitude score manages to pass
the exam and secure admission to the university.
Joint Probability Distribution can be written as
P[a=1, m=1, i=0, e=1, s=0]
From the above conditional probability tables, the values for the
given conditions are substituted into the formula and the result is calculated as below.
P[a=1, m=1, i=0, e=1, s=0] = P(a=1 | m=1) . P(m=1 | i=0, e=1) . P(i=0) .
P(e=1) . P(s=0 | i=0)
= 0.1 * 0.1 * 0.8 * 0.3 * 0.75
= 0.0018
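A minimal Python sketch of this calculation; the CPT values below are the ones
used on this slide, and the full tables come from the network's probability tables.

p_a1_given_m1 = 0.1      # P(a=1 | m=1): admission given good marks
p_m1_given_i0_e1 = 0.1   # P(m=1 | i=0, e=1): good marks given low IQ, difficult exam
p_i0 = 0.8               # P(i=0): low IQ
p_e1 = 0.3               # P(e=1): difficult exam
p_s0_given_i0 = 0.75     # P(s=0 | i=0): low aptitude score given low IQ

joint = p_a1_given_m1 * p_m1_given_i0_e1 * p_i0 * p_e1 * p_s0_given_i0
print(joint)             # ~0.0018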
27. Bayesian Networks – Example 2
You have a new burglar alarm installed at home
It is reliable at detecting a burglary, but also sometimes responds to minor
earthquakes.
You have two neighbors, John and Mary, who promised to call you at work
when they hear the alarm.
John always calls when he hears the alarm, but sometimes confuses the
telephone ringing with the alarm and calls then too.
Mary likes loud music and sometimes misses the alarm.
Given the evidence of who has or has not called, we would like to estimate the
probability of a burglary
28. Probability of no burglary = 1 - 0.01 = 0.99
Probability of no earthquake = 1 - 0.02 = 0.98
Probability of no alarm given burglary and earthquake = 1 - 0.95 = 0.05
Probability that Mary will not call given no alarm = 1 - 0.01 = 0.99
29. 1. What is the probability that the alarm has sounded but neither a burglary nor
an earthquake has occurred, and both John and Mary call?
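Using the chain rule over the network (the values of P(J=1|A=1), P(M=1|A=1), and
P(A=1|B=0,E=0) come from the conditional probability tables, which are not
reproduced here):
P(J=1, M=1, A=1, B=0, E=0) = P(J=1|A=1) . P(M=1|A=1) . P(A=1|B=0,E=0) . P(B=0) . P(E=0)
with P(B=0) = 0.99 and P(E=0) = 0.98 taken from the previous slide.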
31. Naive Bayes’ Classifier
If the inputs are independent, we have the graph
which is called the naive Bayes’ classifier, because
it ignores possible dependencies, namely,
correlations, among the inputs and reduces a
multivariate problem to a group of univariate
problems
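A minimal Python sketch (with made-up probabilities) of the naive Bayes'
factorization P(C | x1, ..., xd) ∝ P(C) Π_i P(xi | C):

from math import prod

priors = {"spam": 0.4, "ham": 0.6}               # P(C), illustrative numbers
likelihoods = {                                  # P(word present | C), illustrative numbers
    "spam": {"offer": 0.7, "meeting": 0.1},
    "ham":  {"offer": 0.2, "meeting": 0.6},
}

def posterior(words):
    # P(C | words) is proportional to P(C) times the product of P(word | C)
    scores = {c: priors[c] * prod(likelihoods[c][w] for w in words) for c in priors}
    z = sum(scores.values())
    return {c: s / z for c, s in scores.items()}

print(posterior(["offer"]))   # spam: 0.28 / 0.40 = 0.7, ham: 0.3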
34. The Hidden Markov
model (HMM)
• The Hidden Markov model (HMM) is a statistical model based on a
Markov process whose states are hidden (not directly observed).
• In this model, the observed outputs are used to infer the hidden
states and parameters, which are then used for further analysis.
• It is a probabilistic graphical model that is commonly used in
statistical pattern recognition and classification.
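A minimal Python sketch (with illustrative parameters) of the forward algorithm,
which computes how likely an observed sequence is when the underlying states are hidden:

import numpy as np

pi = np.array([0.6, 0.4])          # initial distribution over the two hidden states
A  = np.array([[0.7, 0.3],         # state-transition probabilities P(state_t | state_t-1)
               [0.4, 0.6]])
B  = np.array([[0.9, 0.1],         # emission probabilities P(observation | state)
               [0.2, 0.8]])
obs = [0, 1, 0]                    # an observed symbol sequence

alpha = pi * B[:, obs[0]]          # forward variable at t = 0
for o in obs[1:]:
    alpha = (alpha @ A) * B[:, o]  # predict the next state, then weight by the emission
print(alpha.sum())                 # likelihood of the observed sequence under the model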