Markov Models
Charles Yan
2008
2
Markov Chains
 A Markov process is a stochastic process (random process) in
which the probability distribution of the next state, given the
current state, is conditionally independent of the path of past
states, a characteristic called the Markov property.
 A Markov chain is a discrete-time stochastic process with the
Markov property.
 I will use a gene-finding example (to be exact, CpG island
identification) to illustrate Markov chains, since it is a simple and
well-studied case.
 The same approach can be applied to other problems.
3
Markov Chains
 A CG island is a short stretch of DNA in which the
frequency of the CG sequence is higher than in other
regions. It is also called a CpG island, where "p" simply
indicates that "C" and "G" are connected by a
phosphodiester bond.
 Whenever the dinucleotide CpG occurs, the C nucleotide is
typically chemically modified by methylation.
 The C of CpG is methylated into methyl-C.
 Methyl-C mutates into T relatively easily.
4
Markov Chains
 Thus, in general, CpG dinucleotides are rarer in the
genome than the frequencies of C and G alone would predict:
f(CpG) < f(C) * f(G).
 The methylation process is suppressed before the “starting
point” of many genes.
 These regions (CpG islands) have more CpG than
elsewhere.
 Usually, CpG islands are a few hundred to a few
thousand bases long.
 Identification of CpG islands is important for gene
finding.
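The inequality f(CpG) < f(C) * f(G) can be checked on any DNA string by comparing the observed CpG frequency with the frequency expected if C and G occurred independently. The following is a minimal sketch; the function name `cpg_observed_expected` and the choice to normalise by the number of dinucleotide positions are assumptions made for illustration, not part of the original slides.

```python
def cpg_observed_expected(seq: str) -> float:
    """Return f(CpG) / (f(C) * f(G)) for a DNA string.

    A value below 1 means CpG is rarer than expected from the
    single-base frequencies; a value near or above 1 is the kind of
    enrichment that would be expected inside a CpG island.
    """
    seq = seq.upper()
    n = len(seq)
    if n < 2:
        raise ValueError("need at least two bases")
    f_c = seq.count("C") / n
    f_g = seq.count("G") / n
    if f_c == 0 or f_g == 0:
        return 0.0  # no C or no G, so no CpG is possible
    # Count C followed by G over all n - 1 adjacent positions.
    cpg = sum(1 for i in range(n - 1) if seq[i:i + 2] == "CG")
    f_cpg = cpg / (n - 1)
    return f_cpg / (f_c * f_g)
```

Applied in a sliding window along a genome, a ratio like this gives a rough first impression of where CpG-rich regions lie, before any probabilistic model is fitted.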
5
Markov Chains
[Figure: APRT (Homo sapiens)]
6
Markov Chains
 We want to develop a probabilistic model for CpG
islands, such that every CpG island sequence is
generated by the model.
 Since dinucleotides are important, we want a model
that generates sequences in which the probability of
a symbol depends on the previous symbol.
 The simplest one is a Markov chain.
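To make "a model that generates sequences" concrete, here is a small sketch of sampling from a first-order Markov chain, in which each symbol is drawn from a distribution selected by the previous symbol. The function name `sample_chain`, the dictionary layout, and the seed parameter are hypothetical choices for illustration.

```python
import random


def sample_chain(initial, transitions, length, seed=0):
    """Generate a sequence of `length` symbols from a first-order Markov chain.

    initial:     dict symbol -> P(x1 = symbol)
    transitions: dict s -> dict t -> P(next = t | current = s)
    """
    if length <= 0:
        return ""
    rng = random.Random(seed)
    symbols = list(initial)
    # The first symbol comes from the initial distribution.
    seq = rng.choices(symbols, weights=[initial[s] for s in symbols])
    while len(seq) < length:
        row = transitions[seq[-1]]  # distribution chosen by the previous symbol
        nxt = list(row)
        seq.append(rng.choices(nxt, weights=[row[t] for t in nxt])[0])
    return "".join(seq)
```

With a transition table in which C is often followed by G, the generated strings contain many CpG dinucleotides, which is the behaviour wanted from a CpG island model.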
7
Markov Chains
8
Markov Chains
Training the model, i.e., estimating the transition probabilities:

$$a_{st} = \frac{c_{st}}{\sum_{t'} c_{st'}}$$

where $c_{st}$ is the number of times that letter t
followed letter s.
The maximum likelihood (ML) approach is used to
estimate the transition probabilities.
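The counting estimate can be sketched in a few lines: count how often each letter t follows each letter s in the training sequences, then divide each count by its row total. The function name `estimate_transitions` is hypothetical, and the sketch assumes the sequences contain only letters from the given alphabet.

```python
def estimate_transitions(sequences, alphabet="ACGT"):
    """Maximum likelihood estimate: a[s][t] = c[s][t] / sum over t' of c[s][t']."""
    counts = {s: {t: 0 for t in alphabet} for s in alphabet}
    for seq in sequences:
        # Each adjacent pair (s, t) is one observation of "t followed s".
        for s, t in zip(seq, seq[1:]):
            counts[s][t] += 1
    probs = {}
    for s in alphabet:
        total = sum(counts[s].values())
        # A letter that never occurs as a predecessor gets an all-zero row.
        probs[s] = {t: (counts[s][t] / total if total else 0.0) for t in alphabet}
    return probs
```

Every row with at least one observation sums to 1. Training the function once on known CpG islands and once on other sequences yields the two transition tables needed for classification.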
9
Prediction Using Data-Mining Approach
10
Markov Chains
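The probability that a sequence x is generated by a
Markov chain model is obtained by applying
$P(X, Y) = P(X)\,P(Y \mid X)$ many times:

$$P(x) = P(x_L, x_{L-1}, \ldots, x_1) = P(x_L \mid x_{L-1}, \ldots, x_1)\,P(x_{L-1} \mid x_{L-2}, \ldots, x_1) \cdots P(x_1)$$

where L is the length of x.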
11
Markov Chains
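One assumption of a Markov chain is that the probability
of $x_i$ depends only on the previous symbol $x_{i-1}$, i.e.,

$$P(x_i \mid x_{i-1}, \ldots, x_1) = P(x_i \mid x_{i-1}) = a_{x_{i-1} x_i}$$

Thus, for a sequence x of length L,

$$P(x) = P(x_L \mid x_{L-1})\,P(x_{L-1} \mid x_{L-2}) \cdots P(x_2 \mid x_1)\,P(x_1) = P(x_1) \prod_{i=2}^{L} a_{x_{i-1} x_i}$$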
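Under the Markov assumption the probability of a sequence is the probability of its first symbol times one transition probability per adjacent pair. A minimal sketch of that computation follows; it works in log space because a product of many small probabilities underflows for long sequences. The name `log_prob` and the dictionary layout are assumptions for illustration, and all probabilities used are assumed to be nonzero.

```python
import math


def log_prob(seq, initial, transitions):
    """log P(x) = log P(x1) + sum over i >= 2 of log a[x_{i-1}][x_i]."""
    if not seq:
        raise ValueError("sequence must be non-empty")
    lp = math.log(initial[seq[0]])
    for s, t in zip(seq, seq[1:]):
        lp += math.log(transitions[s][t])
    return lp
```

Exponentiating the result recovers P(x) for short sequences, but comparisons between models are usually made directly on the log values.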
12
Markov Chains
 Given a sequence x, does it belong to a CpG island?
If the log likelihood ratio (CpG island model versus
background model) is > 0, then x is classified as a CpG
island.
13
Markov Chains
In this model, we must specify the probability $P(x_1)$ as
well as the transition probabilities $a_{x_{i-1} x_i}$.
To make the formula homogeneous (i.e., consisting
only of terms of the form $a_{st}$), we can
introduce a begin state to the model.
14
Markov Chains
15
Markov Chains
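The probability that a sequence x is generated by a
Markov chain model (with a begin state):

$$P(x) = \prod_{i=1}^{L} a_{x_{i-1} x_i}$$

where $x_0$ denotes the begin state, so that $a_{x_0 x_1} = P(x_1)$.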
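With a begin state, the initial probability becomes just another row of the transition table, so the whole computation reduces to a single sum over adjacent pairs. The sketch below assumes a hypothetical begin-state label `"^"` stored as an extra key in the transition dictionary; neither the label nor the function name comes from the slides.

```python
import math

BEGIN = "^"  # hypothetical label for the begin state


def log_prob_with_begin(seq, a):
    """log P(x) as a sum of log a[s][t] only, where a[BEGIN][t] = P(x1 = t)."""
    path = BEGIN + seq  # prepend the begin state so x1 is handled like any symbol
    return sum(math.log(a[s][t]) for s, t in zip(path, path[1:]))
```

The value is the same as with a separate initial distribution; the benefit is that training and scoring code no longer need a special case for the first symbol.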
16
Markov Chains
Training the model, i.e., estimating the transition probabilities:

$$a_{st} = \frac{c_{st}}{\sum_{t'} c_{st'}}$$

where $c_{st}$ is the number of times that letter t
followed letter s.
The maximum likelihood (ML) approach is used to
estimate the transition probabilities.
17
Markov Chains
 A set of CpG islands
(CpG model)
 1st row: The probabilities
that A is followed by each
of the four bases.
 The sum of each row is 1
 A set of sequences that
are not CpG islands
(Background model)
18
Markov Chains
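 Given a sequence x, does it belong to a CpG island?

$$S(x) = \log \frac{P(x \mid \text{CpG model})}{P(x \mid \text{background model})} = \sum_{i=1}^{L} \log \frac{a^{+}_{x_{i-1} x_i}}{a^{-}_{x_{i-1} x_i}}$$

where $a^{+}$ and $a^{-}$ are the transition probabilities of the
CpG model and the background model, and $x_0$ is the begin state.
If the log likelihood ratio S(x) > 0, then x belongs to a CpG
island.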
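The decision rule can be sketched as a sum of per-transition log ratios between the two trained tables. In this sketch the name `log_odds` is hypothetical, the term for the first symbol is omitted (equivalent to assuming both models start with the same distribution), and every transition used is assumed to have nonzero probability in both models.

```python
import math


def log_odds(seq, plus, minus):
    """Sum of log(a_plus[s][t] / a_minus[s][t]) over adjacent pairs of seq.

    plus:  transition table of the CpG island model
    minus: transition table of the background model
    A positive score favours the CpG island model.
    """
    return sum(math.log(plus[s][t] / minus[s][t]) for s, t in zip(seq, seq[1:]))


def is_cpg_island(seq, plus, minus):
    """Apply the rule from the slide: classify as CpG island if the score is > 0."""
    return log_odds(seq, plus, minus) > 0
```

Since the score grows with sequence length, dividing it by the length gives a per-base value that is easier to compare between sequences of different sizes.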
22
Markov Chains to Hidden Markov Models
