Maximum Likelihood
Likelihood
The likelihood is the probability of the data given the
model.
If we flip a coin and get a head and we think the coin is
unbiased, then the probability of observing this head is 0.5.
If we think the coin is biased so that we expect to get a head
80% of the time, then the likelihood of observing this datum (a
head) is 0.8.
The likelihood of making some observation depends entirely
on the model that underlies our assumptions.
The datum has not changed, our model has. Therefore under
the new model the likelihood of observing the datum has
changed.
Maximum Likelihood (ML)
ML assumes an explicit model of sequence evolution. This is
justifiable, since molecular sequence data can be shown to
have arisen according to a stochastic process.
ML attempts to answer the question:
What is the probability that I would observe these data (a
multiple sequence alignment) given a particular model of
evolution (a tree and a process)?
Likelihood calculations
In molecular phylogenetics, the data are an alignment of sequences
We optimize parameters and branch lengths to get the maximum likelihood
Each site has a likelihood
The total likelihood is the product of the site likelihoods
The maximum likelihood tree is the tree topology that gives the highest
(optimized) likelihood under the given model.
We use reversible models, so the position of the root does not matter.
What is the probability of observing a G nucleotide?
If we have a DNA sequence of 1 nucleotide in length and the identity of this
nucleotide is G, what is the likelihood that we would observe this G?
In the same way as the coin-flipping observation, the likelihood of observing
this G is dependent on the model of sequence evolution that is thought to
underlie the data.
Model 1: frequency of G = 0.4 => likelihood(G) = 0.4
Model 2: frequency of G = 0.1 => likelihood(G) = 0.1
Model 3: frequency of G = 0.25 => likelihood(G) = 0.25
What about longer sequences?
If we consider a gene of length 2
gene 1 GA
The probability of observing this gene is the product of the
probabilities of observing each character.
Model: frequency of G = 0.4, frequency of A = 0.15
p(G) = 0.4, p(A) = 0.15
Likelihood (GA) = 0.4 x 0.15 = 0.06
…or even longer sequences?
gene 1 GACTAGCTAGACAGATACGAATTAC
Model: simple base-frequency model
p(A)=0.15; p(C)=0.2; p(G)=0.4; p(T)=0.25;
(the sum of all probabilities must equal 1)
Likelihood (gene 1) = 0.000000000000000018452813
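This per-site product is easy to check directly. A minimal Python sketch (the variable names are illustrative):

```python
# Likelihood of gene 1 under the simple base-frequency model:
# sites are independent, so multiply the per-site base frequencies.
gene1 = "GACTAGCTAGACAGATACGAATTAC"
freqs = {"A": 0.15, "C": 0.2, "G": 0.4, "T": 0.25}

likelihood = 1.0
for base in gene1:
    likelihood *= freqs[base]

print(likelihood)  # ≈ 1.845e-17, the value on the slide
```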
Note about models
You might notice that our model of base frequency is not the
optimal model for our observed data.
If we had used the following model
p(A)=0.4; p(C) =0.2; p(G)= 0.2; p(T) = 0.2;
The likelihood of observing the gene would be higher:
L (gene 1 | new model) = 0.0000000000000034359738368
L (gene 1 | old model) = 0.000000000000000018452813
The datum has not changed, our model has. Therefore under
the new model the likelihood of observing the datum has
changed.
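The better model above is no accident: for this simple model, the maximum-likelihood estimates of the base frequencies are just the observed proportions of each base in the sequence. A sketch, using the same gene 1:

```python
from collections import Counter

gene1 = "GACTAGCTAGACAGATACGAATTAC"

def likelihood(seq, freqs):
    # Product of per-site base frequencies, as before.
    L = 1.0
    for base in seq:
        L *= freqs[base]
    return L

old_model = {"A": 0.15, "C": 0.2, "G": 0.4, "T": 0.25}

# The ML estimate of each frequency is the observed proportion.
counts = Counter(gene1)
ml_model = {b: counts[b] / len(gene1) for b in "ACGT"}
print(ml_model)  # {'A': 0.4, 'C': 0.2, 'G': 0.2, 'T': 0.2}

print(likelihood(gene1, old_model))  # ≈ 1.85e-17
print(likelihood(gene1, ml_model))   # ≈ 3.44e-15, higher
```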
Increase in model sophistication
It is no longer possible to simply invoke a model that
encompasses base composition; we must also include the
mechanism of sequence change and stasis.
There are two parts to this model - the tree and the process
(the latter is confusingly referred to as the model, although
both parts really compose the model).
Different Branch Lengths
For very short branch lengths, the probability of a character staying the
same is high and the probability of it changing is low.
For longer branch lengths, the probability of character change becomes
higher and the probability of staying the same is lower.
The previous calculations are based on the assumption that the branch
length describes one Certain Evolutionary Distance or CED.
If we want to consider a branch length that is twice as long (2 CED), then
we can multiply the substitution matrix by itself (matrix²).
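A sketch of the matrix-squaring idea. The 0.9 / 0.1 entries here are made-up illustrative values, not taken from the slides:

```python
import numpy as np

# A toy substitution (transition-probability) matrix for one CED:
# 0.9 chance a base stays the same, 0.1 chance it changes
# (spread evenly over the three other bases).
P = np.full((4, 4), 0.1 / 3)   # off-diagonal: change
np.fill_diagonal(P, 0.9)       # diagonal: no change

P2 = P @ P  # probabilities over a branch twice as long (2 CED)

# Over the longer branch, staying the same is less probable.
print(P[0, 0], P2[0, 0])
```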
Two trees, each consisting of a single branch
[Figure: each tree joins taxon I (state A) to taxon II (state C) by a single
branch; one tree has branch length v = 0.1, the other v = 1.0. The branch
length is v = µt, where µ is the mutation rate and t is time.]
Jukes-Cantor model
[Figure: the same two single-branch trees (v = 0.1 and v = 1.0), now
evaluated under the Jukes-Cantor model for the alignment I: AACC, II: CACT.]
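Under the Jukes-Cantor model these probabilities have a closed form: the probability of ending in any particular different base after branch length v is (1 − e^(−4v/3))/4. A sketch for the two branch lengths above:

```python
import math

# Jukes-Cantor transition probability over branch length v
# (v = expected substitutions per site).
def jc_prob(v, same):
    e = math.exp(-4.0 * v / 3.0)
    return 0.25 + 0.75 * e if same else 0.25 - 0.25 * e

# Probability of seeing A at one tip and C at the other:
for v in (0.1, 1.0):
    print(v, jc_prob(v, same=False))
# The A/C mismatch is roughly six times more probable on the
# v = 1.0 branch than on the v = 0.1 branch.
```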
[Figure: an alignment of four sequences, sites 1 … j … N:
1 C G G A C A C G T T T A C
2 C A G A C A C C T C T A C
3 C G G A T A A G T T A A C
4 C G G A T A G C C T A G C
Site j has tip states 1 = C, 2 = C, 3 = A, 4 = G, placed on an unrooted
four-taxon tree with internal nodes 5 and 6.]
The states at the internal nodes 5 and 6 are unknown, so the likelihood at
site j sums over all 16 possible assignments of nucleotides to them:

L(j) = p(CCAG, AA) + p(CCAG, CA) + … + p(CCAG, TT)

where p(CCAG, xy) is the probability of the tip states C, C, A, G together
with internal nodes 5 and 6 in states x and y.
L = L(1) · L(2) · … · L(N) = Π_{j=1}^{N} L(j)
lnL = lnL(1) + lnL(2) + … + lnL(N) = Σ_{j=1}^{N} lnL(j)
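A brute-force sketch of these formulas for the four-sequence alignment above, under the Jukes-Cantor model. The topology ((1,2),(3,4)) and the branch lengths are assumptions for illustration, not taken from the slides:

```python
import math
from itertools import product

BASES = "ACGT"

def jc_p(i, j, v):
    """Jukes-Cantor probability of state i -> j over branch length v."""
    e = math.exp(-4.0 * v / 3.0)
    return 0.25 + 0.75 * e if i == j else 0.25 - 0.25 * e

def site_likelihood(tips, branches):
    """L(j) by brute force: sum over all 16 internal-node states x, y.

    Assumed tree: tips 1 and 2 attach to node 5, tips 3 and 4 to
    node 6, with a branch joining 5 and 6.
    branches: lengths v1..v4 for the tip branches, v5 for branch 5-6.
    """
    s1, s2, s3, s4 = tips
    v1, v2, v3, v4, v5 = branches
    L = 0.0
    for x, y in product(BASES, repeat=2):  # states at nodes 5 and 6
        L += (0.25                          # stationary frequency at node 5
              * jc_p(x, s1, v1) * jc_p(x, s2, v2)
              * jc_p(x, y, v5)
              * jc_p(y, s3, v3) * jc_p(y, s4, v4))
    return L

alignment = ["CGGACACGTTTAC", "CAGACACCTCTAC",
             "CGGATAAGTTAAC", "CGGATAGCCTAGC"]
branches = (0.1, 0.1, 0.1, 0.1, 0.1)

# Total log-likelihood: sum the per-site log-likelihoods.
lnL = sum(math.log(site_likelihood(col, branches))
          for col in zip(*alignment))
print(lnL)
```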
Likelihood of the alignment at various branch lengths
[Figure: the likelihood of the alignment plotted against branch length
(x axis 0 to 0.6, y axis 0 to 0.0002); the likelihood peaks at an
intermediate branch length.]
Strengths of ML
• Does not first estimate sequence change and then apply a
correction for superimposed substitutions; the models account for
superimposed substitutions, so there is nothing to ‘correct’.
• Accurate branch lengths.
• Each site has a likelihood.
• If the model is correct, we should retrieve the correct tree (If we have
long-enough sequences and a sophisticated-enough model).
• You can use a model that fits the data.
• ML uses all the data (no selection of sites based on informativeness,
all sites are informative).
• ML can not only tell you about the phylogeny of the sequences, but
also the process of evolution that led to the observations of today’s
sequences.
Weaknesses of ML
• Can be inconsistent if we use models that are not accurate.
• The model might not be sophisticated enough.
• Very computationally intensive; it might not be possible to
examine all models (substitution matrices, tree topologies).
Models
• You can use models that:
Deal with different transition/transversion ratios.
Deal with unequal base composition.
Deal with heterogeneity of rates across sites.
Deal with heterogeneity of the substitution process (different rates
across lineages, different rates at different parts of the tree).
• The more free parameters, the better your model fits your data (good).
• The more free parameters, the higher the variance of the estimate (bad).
Choosing a Model
Don’t assume a model, rather find a model that fits your data.
Models often have “free” parameters. These can be fixed to a
reasonable value, or estimated by ML.
The more free parameters, the better the fit (higher the likelihood) of
the model to the data. (Good!)
The more free parameters, the higher the variance, and the less
power to discriminate among competing hypotheses. (Bad!)
We do not want to over-fit the model to the data
What is the best way to fit a line (a model) through these points?
How to tell if adding (or removing) a certain parameter is a good idea?
• Use statistics
• The null hypothesis is that the presence or absence of the parameter makes no difference
• In order to assess significance you need a null distribution
We have some DNA data, and a tree. Evaluate the data with 3 different
models.
model   ln likelihood        ∆
JC          -2348.68
K2P         -2256.73     91.95
GTR         -2254.94      1.79
Evaluations with more complex models have higher likelihoods
The K2P model has 1 more parameter than the JC model
The GTR model has 4 more parameters than the K2P model
Are the extra parameters worth adding?
JC vs K2P K2P vs GTR
We have generated many true-null-hypothesis data sets and evaluated them under the JC
model and the K2P model. 95% of the differences are under 2. The statistic for our original
data set was 91.95, and so it is highly significant. In this case it is worthwhile to add the extra
parameter (tRatio).
We have generated many true-null-hypothesis data sets and evaluated them under the K2P
model and the GTR model. The statistic for our original data set was 1.79, and so it is not
significant. In this case it is not worthwhile to add the extra parameters.
You can use the χ² approximation to assess the significance of adding parameters
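The slides assess significance by simulating a null distribution; the χ² shortcut instead compares 2·ΔlnL to a χ² distribution with degrees of freedom equal to the number of extra parameters. A sketch using the table's numbers (the χ² survival function is hand-rolled here to stay dependency-free, using closed forms valid for df = 1 and even df):

```python
import math

def chi2_sf(x, df):
    """Chi-squared survival function, for df = 1 or even df."""
    if df == 1:
        return math.erfc(math.sqrt(x / 2.0))
    assert df % 2 == 0
    # For even df = 2k: sf = exp(-x/2) * sum_{i<k} (x/2)^i / i!
    k = df // 2
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= (x / 2.0) / i
        total += term
    return math.exp(-x / 2.0) * total

def lrt_pvalue(lnL_simple, lnL_complex, extra_params):
    # Likelihood-ratio test: 2 * (lnL_complex - lnL_simple) is
    # approximately chi-squared with df = number of extra parameters.
    stat = 2.0 * (lnL_complex - lnL_simple)
    return chi2_sf(stat, extra_params)

print(lrt_pvalue(-2348.68, -2256.73, 1))  # JC vs K2P: tiny p, significant
print(lrt_pvalue(-2256.73, -2254.94, 4))  # K2P vs GTR: ≈ 0.47, not significant
```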
Bayesian Inference
Maximum likelihood
Search for the tree that maximizes the probability of
seeing the data given the tree (P (Data | Tree))
Bayesian Inference
Search for the tree that maximizes the probability of
the tree given the data (P (Tree | Data))
Bayesian Phylogenetics
Maximize the posterior probability of a tree given the aligned DNA
sequences
Two steps
- Definition of the posterior probabilities of trees (Bayes’ Rule)
- Approximation of the posterior probabilities of trees
Markov chain Monte Carlo (MCMC) methods
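The first step is just Bayes' Rule applied to trees, with P(Tree) the prior and P(Data | Tree) the likelihood:

```latex
P(\mathrm{Tree} \mid \mathrm{Data})
  = \frac{P(\mathrm{Data} \mid \mathrm{Tree})\, P(\mathrm{Tree})}
         {P(\mathrm{Data})}
```

The denominator P(Data) sums over all possible trees, which is why the second step, approximation by MCMC, is needed.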
Markov Chain Monte Carlo Methods
Posterior probabilities of trees are complex joint probabilities
that cannot be calculated analytically.
Instead, the posterior probabilities of trees are approximated
with Markov Chain Monte Carlo (MCMC) methods that sample
trees from their posterior probability distribution.
MCMC
A way of sampling / touring a set of solutions, biased
by their likelihood:
1 Make a random solution N1 the current solution
2 Pick another solution N2
3 If Likelihood(N2) > Likelihood(N1), replace N1 with N2
4 Otherwise, replace N1 with N2 with probability
Likelihood(N2) / Likelihood(N1)
5 Sample (record) the current solution
6 Repeat from step 2
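The steps above are the Metropolis algorithm. A toy sketch over a made-up discrete set of solutions with invented likelihoods; in phylogenetics the solutions would be trees and step 2 a tree-rearrangement proposal:

```python
import random

# Made-up "solutions" (integers) with made-up likelihoods.
likelihoods = {0: 1.0, 1: 4.0, 2: 9.0, 3: 4.0, 4: 1.0}

def mcmc(steps, seed=1):
    rng = random.Random(seed)
    current = rng.choice(list(likelihoods))  # step 1
    samples = []
    for _ in range(steps):
        proposal = rng.choice(list(likelihoods))  # step 2
        ratio = likelihoods[proposal] / likelihoods[current]
        if ratio >= 1 or rng.random() < ratio:    # steps 3-4
            current = proposal
        samples.append(current)                   # step 5
    return samples

samples = mcmc(100_000)
# Solutions are visited in proportion to their likelihood, so
# solution 2 (likelihood 9) is sampled most often.
```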
Editor's Notes
• #11: Even though we tend to refer to the tree and the model separately, they are in fact both parts of the model.