Phylogenetic Signal with Induction and non-Contradiction: the PhySIC method for building supertrees
http://atgc.lirmm.fr/SuperTree/PhySIC
Vincent Berry (1), V. Ranwez (2), A. Criscuolo (1,2), P.-H. Fabre (2), S. Guillemot (1), C. Scornavacca (1,2), E.J.P. Douzery (2)
(1) LIRMM, (2) ISEM
Funded by ACI IMPBIO & BIOSTIC LR

Introduction: use of supertrees
Supertrees are useful for:
- Producing well-resolved large phylogenies that provide a framework for broad comparative studies (Gittleman et al. 2004).
- Quantitative studies of input-tree congruence, identifying outlier taxa by tree-supertree distance measures (Wilkinson et al. 2004).
- Exploring and identifying agreement and disagreement among sets of input trees. The aim is then to reveal conflicts rather than to resolve them; conflicts are ultimately resolved from additional data or analyses (Wilkinson et al. 2001).
- Identifying where limited overlap between the leaf sets of the input trees is an obstacle to their amalgamation, thereby guiding further research (Sanderson et al. 1996, Arné et al. 2007).

Introduction: dealing with conflicts
Dealing with topological contradictions ("conflicts") among source trees:
- Voting methods (MRP, MMC, CLANN, ...) resolve conflicts based on a voting procedure (optimization approach).
- Veto methods (Strict Consensus, BUILD, SMAST) do not favor any resolution in case of conflict (consensus approach).
[Figure: two example trees on taxa A, B, C, D]

Veto methods
- Proceed from an axiomatic approach: proposed supertrees satisfy specified theoretical properties.
- Goal: obtain a reliable, if incomplete, picture of how the source trees fit together.
- Motivation: full congruence with the source trees can be necessary for further applications such as phylogeography, divergence time estimation, etc.
- Avoid as much as possible inferring unsupported novel clades, unlike some existing voting methods.

Overview
- Some relevant properties for reliable inference
  - Decomposition of a tree into triplets
  - Identifying a tree
  - Property of Induction (PI)
  - Property of non-Contradiction (PC)
- Algorithms (sketch)
  - BUILD (Aho et al.)
  - PhySIC_PC
  - PhySIC_PI
- Biological case study: Primate supertree
- Conclusion & prospects

Axiomatic approach: important properties
Reliable facts are those that can be induced from testimonies and that are not incompatible with any other.

Police investigation    SuperTree
The witnesses           The source trees
The testimonies         Phylogenetic information contained within source trees
The inspector           The supertree method

Steps of the investigation:
- Deducing new facts by cross-checking
- Pointing out contradictions in the testimonies
- Deducing the true story

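Decomposition of trees into building blocks
Triplets (rooted triples): subtrees on 3 taxa. The notation ab|c means that a and b are closer to each other than either is to c.
- tr(T1) = {ab|c, ab|d, ac|d, bc|d}
- tr(T2) = {eb|c, eb|d, ed|c, bd|c}
[Figure: source trees T1 (taxa a, b, c, d) and T2 (taxa b, c, d, e) with their triplet subtrees]
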
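The decomposition on this slide can be sketched in a few lines of Python. This is an illustration, not the PhySIC code: the nested-tuple tree encoding, the helper names `leaves`, `triplets` and `show`, and the shapes of T1 and T2 (reconstructed here from the triplet lists on the slide) are all assumptions.

```python
from itertools import combinations

def leaves(tree):
    """Set of leaf labels of a tree given as nested tuples of strings."""
    if isinstance(tree, str):
        return {tree}
    return set().union(*(leaves(child) for child in tree))

def triplets(tree):
    """tr(tree) as a set of (frozenset({x, y}), z) pairs, each meaning xy|z."""
    if isinstance(tree, str):
        return set()
    child_leaves = [leaves(child) for child in tree]
    found = set()
    for i, child in enumerate(tree):
        found |= triplets(child)  # triplets resolved inside this child
        outside = set().union(*(s for j, s in enumerate(child_leaves) if j != i))
        # x and y sit under the same child, z under a different one: xy|z
        for x, y in combinations(sorted(child_leaves[i]), 2):
            for z in outside:
                found.add((frozenset((x, y)), z))
    return found

def show(triplet):
    """Format a triplet as 'xy|z' with the pair in alphabetical order."""
    pair, z = triplet
    return "".join(sorted(pair)) + "|" + z

# Assumed shapes, reconstructed from the triplet lists on the slide.
T1 = ((("a", "b"), "c"), "d")
T2 = ((("e", "b"), "d"), "c")

print(sorted(map(show, triplets(T1))))  # ['ab|c', 'ab|d', 'ac|d', 'bc|d']
print(sorted(map(show, triplets(T2))))  # ['bd|c', 'be|c', 'be|d', 'de|c']
```

The second output lists the same triplets as the slide's tr(T2); only the order of the two letters inside each pair differs (be|c is eb|c).
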
Properties of interest: identification
- A tree T displays a set R of triplets iff R ⊆ tr(T). In that case R is said to be compatible: all triplets of R can be combined into a tree.
- R identifies T iff T displays R AND every tree T' displaying R contains all the clades of T.
- Example, with T on taxa a, b, c, d: R = {bc|d, ab|c} identifies T, whereas R' = {ab|c, ab|d} does not identify T.
[Figure: the tree T, the triplet trees of R and R', and a tree displaying R' that lacks a clade of T (crossed out)]

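Properties of interest: identification (continued)
R identifies T even though R does not contain all the triplets of tr(T): the additional triplets are induced by those present in R.
Example: R = {bc|d, ab|c}; the triplets ab|d and ac|d are induced.
[Figure: the triplet trees bc|d and ab|c combined into the tree T on taxa a, b, c, d]
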
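One way to test induction by machine, sketched here under assumptions: for a compatible set R, a triplet xy|z is displayed by every tree displaying R exactly when neither xz|y nor yz|x can be added to R without making it incompatible. Compatibility is decided with the connected-component test of Aho et al.'s BUILD. The function names, the triplet encoding and the single-character taxon labels are choices made for this illustration, not part of PhySIC.

```python
def tri(text):
    """Parse 'xy|z' (single-character taxa only) into (frozenset({x, y}), z)."""
    return (frozenset(text[:2]), text[3])

def compatible(R, taxa=None):
    """BUILD-style test: True iff some tree displays every triplet of R."""
    R = list(R)
    if taxa is None:
        taxa = {x for pair, z in R for x in pair | {z}}
    if len(taxa) <= 2:
        return True
    inside = [t for t in R if t[0] | {t[1]} <= taxa]
    parent = {x: x for x in taxa}  # simple union-find over the taxa

    def find(x):
        while parent[x] != x:
            x = parent[x]
        return x

    for pair, _ in inside:  # link x and y for every triplet xy|z
        x, y = pair
        parent[find(x)] = find(y)
    blocks = {}
    for x in taxa:
        blocks.setdefault(find(x), set()).add(x)
    if len(blocks) == 1:  # one component on 3+ taxa: no tree exists
        return False
    return all(compatible(inside, block) for block in blocks.values())

def induces(R, triplet):
    """True iff every tree displaying R also displays the given triplet."""
    R = set(R)
    if not compatible(R):
        return True  # assumed convention: vacuously true, no tree displays R
    pair, z = triplet
    x, y = sorted(pair)
    alternatives = [(frozenset((x, z)), y), (frozenset((y, z)), x)]
    return all(not compatible(R | {alt}) for alt in alternatives)

R = {tri("bc|d"), tri("ab|c")}
print(induces(R, tri("ab|d")), induces(R, tri("ac|d")))  # True True
print(induces({tri("ab|c"), tri("ab|d")}, tri("ac|d")))  # False
```

The last call uses the set {ab|c, ab|d} that the previous slide gives as an example of a set that does not identify T: it says nothing about how c and d relate, so ac|d is not induced.
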
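Relevant properties: induction (PI)
We want to infer reliable supertrees, that is, without making arbitrary inferences.
PI: we only accept supertrees T such that every triplet of tr(T) is present in the data R or induced by hypotheses in R.
Example: R = {ab|c, ab|d}.
[Figure: three candidate supertrees for R; two of them display additional triplets that R does not support (marked ac|d?, cd|b? and ac|d?, bc|d?), the third displays only ab|c and ab|d]
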
Focusing on a coherent subset of hypotheses
Real data are very unlikely to identify a (super)tree exactly:
- Lack of overlap between the source trees: missing data.
- Errors due to gene-specific evolution, and systematic errors in source tree inference (long branch attraction, misestimated model of evolution).
However, there is a chance that part of the underlying "correct" tree appears uncorrupted in the data. Hence: find a subset R' of R identifying a tree (i.e., a subtree of the underlying tree).
Example: R = {ab|c, bc|d, ab|d, ac|d, ad|c, bd|c}.
[Figure: two source trees on taxa a, b, c, d yield R; the supertree method should return a tree T identified by a subset of R]

Relevant properties: non-contradiction (PC)
- We search for a subset R' ⊆ R identifying a tree T.
- But we want to be reliable: no clade may be contradicted by the data. We reject subsets R' obtained by keeping xy|z and removing xz|y.
- We focus on R(T), the triplets of R resolved by T: we do not accept hypotheses that are in direct contradiction with discarded hypotheses.
Example: R = {ab|c, ab|d, bc|d, ac|d, bd|c, ad|c}.
[Figure: a subset R' ⊆ R and the tree T on taxa a, b, c, d that it identifies]

Link between the properties
"R(T) identifies T" is equivalent to:
- T satisfies PC (property of non-contradiction): for any triplet ab|c displayed by T, R(T) induces neither bc|a nor ac|b; and
- T satisfies PI (property of induction): every triplet ab|c displayed by T is induced by R(T).
Given a supertree T and a collection of source trees, PI and PC can be checked in polynomial time. A given supertree can be modified in polynomial time so that it satisfies PI and PC.
Why not design a supertree method that proposes supertrees satisfying PI and PC from the start? This is the PhySIC method (PHYlogenetic Signal with Induction and non-Contradiction).

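As a rough illustration of the polynomial-time check, the sketch below tests PI and PC for a candidate tree by following the two definitions literally. It is not the PhySIC implementation. The assumptions are: trees are nested tuples, taxa are single characters, R(T) is read as "the triplets of R whose three taxa are resolved by T", induction is decided via a BUILD-style compatibility test, and an incompatible R(T) is treated as inducing every triplet.

```python
from itertools import combinations

def tri(text):
    """Parse 'xy|z' (single-character taxa) into (frozenset({x, y}), z)."""
    return (frozenset(text[:2]), text[3])

def leaves(tree):
    return {tree} if isinstance(tree, str) else set().union(*map(leaves, tree))

def triplets(tree):
    """tr(tree): all xy|z with x, y under one child and z under another."""
    if isinstance(tree, str):
        return set()
    parts = [leaves(child) for child in tree]
    found = set().union(*map(triplets, tree))
    for i, part in enumerate(parts):
        outside = set().union(*(p for j, p in enumerate(parts) if j != i))
        for x, y in combinations(sorted(part), 2):
            found |= {(frozenset((x, y)), z) for z in outside}
    return found

def compatible(R, taxa=None):
    """BUILD-style test: True iff some tree displays every triplet of R."""
    R = list(R)
    if taxa is None:
        taxa = {x for pair, z in R for x in pair | {z}}
    if len(taxa) <= 2:
        return True
    inside = [t for t in R if t[0] | {t[1]} <= taxa]
    block_of = {x: {x} for x in taxa}
    for pair, _ in inside:  # merge the components of x and y for each xy|z
        x, y = pair
        if block_of[x] is not block_of[y]:
            merged = block_of[x] | block_of[y]
            for v in merged:
                block_of[v] = merged
    blocks = {frozenset(b) for b in block_of.values()}
    return len(blocks) > 1 and all(compatible(inside, set(b)) for b in blocks)

def alternatives(triplet):
    """The two triplets on the same taxa that conflict with xy|z."""
    pair, z = triplet
    x, y = sorted(pair)
    return [(frozenset((x, z)), y), (frozenset((y, z)), x)]

def induces(R, triplet):
    if not compatible(R):
        return True  # assumed convention: vacuously true
    return all(not compatible(set(R) | {alt}) for alt in alternatives(triplet))

def check_pi_pc(tree, R):
    """Return (PI holds, PC holds) for the candidate tree and triplet set R."""
    tr_T = triplets(tree)
    resolved = {pair | {z} for pair, z in tr_T}
    R_T = {t for t in R if t[0] | {t[1]} in resolved}  # R(T)
    pi = all(induces(R_T, t) for t in tr_T)
    pc = not any(induces(R_T, alt) for t in tr_T for alt in alternatives(t))
    return pi, pc

print(check_pi_pc(((("a", "b"), "c"), "d"), {tri("bc|d"), tri("ab|c")}))
# (True, True)
print(check_pi_pc((("a", "b"), "c", ("e", "f")), {tri("ab|c"), tri("ef|c")}))
# (False, True)
```

In the second call the tree displays ef|a and ab|e, which R does not induce, so PI fails while PC holds.
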
Overview
- Relevant properties for a veto method (reliable facts)
  - Decomposition of a tree into triplets
  - Tree identification
  - Property of Induction (PI)
  - Property of non-Contradiction (PC)
- Algorithms (sketch)
  - BUILD (Aho et al.)
  - PhySIC_PC
  - PhySIC_PI
- Biological case study: Primate supertree
- Conclusion & prospects

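Algorithmic ideas: BUILD (Aho et al. 81)
Example: R = {bc|d, ab|c}. At each step, a graph on the current taxa links x and y whenever a triplet xy|z applies to those taxa; the connected components become the subtrees.
- Taxa {a, b, c, d}: links b-c and a-b; components {a, b, c} and {d}.
- Taxa {a, b, c}: link a-b; components {a, b} and {c}.
- Taxa {a, b}: no link; components {a} and {b}.
Resulting tree: (((a,b),c),d).
[Figure: the successive graphs and the tree built on taxa a, b, c, d]
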
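The recursion illustrated on this slide can be written compactly. The following is a minimal sketch of BUILD, not the code used by PhySIC; the triplet encoding, the nested-tuple output and the function names are assumptions made for the example.

```python
def tri(text):
    """Parse 'xy|z' (single-character taxa only) into (frozenset({x, y}), z)."""
    return (frozenset(text[:2]), text[3])

def build(R, taxa=None):
    """Return a nested-tuple tree displaying R, or None if R is incompatible."""
    R = list(R)
    if taxa is None:
        taxa = {x for pair, z in R for x in pair | {z}}
    if len(taxa) == 1:
        return next(iter(taxa))
    if len(taxa) == 2:
        return tuple(sorted(taxa))
    # Keep only the triplets whose three taxa are all in the current set.
    inside = [t for t in R if t[0] | {t[1]} <= taxa]
    # Graph on the taxa: x and y are linked for every triplet xy|z.
    block_of = {x: {x} for x in taxa}
    for pair, _ in inside:
        x, y = pair
        if block_of[x] is not block_of[y]:
            merged = block_of[x] | block_of[y]
            for v in merged:
                block_of[v] = merged
    blocks = {frozenset(b) for b in block_of.values()}
    if len(blocks) == 1:
        return None  # a single connected component: BUILD fails
    subtrees = []
    for block in sorted(blocks, key=min):  # fixed order for readable output
        subtree = build(inside, set(block))
        if subtree is None:
            return None
        subtrees.append(subtree)
    return tuple(subtrees)

R = [tri("bc|d"), tri("ab|c")]
print(build(R))  # ((('a', 'b'), 'c'), 'd')
```

Each connected component becomes one child of the current node, so a graph with more than two components yields a multifurcation.
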
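Algorithmic ideas: limits of BUILD
BUILD returns a tree only when R is compatible.
- R1 = {ab|c, ac|b, bc|d, ab|d, ac|d}: the first step separates {a, b, c} from {d}, but a, b and c then form a single connected component, so no tree is returned.
- R2 = {bc|d, bd|c, ac|d, ad|c, ab|c, ab|d}: a, b, c and d form a single connected component at the first step, so no tree is returned.
[Figure: the pairs of source trees on taxa a, b, c, d that yield R1 and R2, with the corresponding graphs]
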
Algorithmic ideas: PhySIC_PC
Idea: temporarily forget the direct contradictions.
Example: R = {bc|d, bd|c, ac|d, ad|c, ab|c, ab|d}; R' is R without the triplets in direct contradiction, here R' = {ab|c, ab|d}.
At each iteration, if there is a single connected component:
- check whether using R' leads to several connected components;
- if so, check that the tree will satisfy PC w.r.t. R;
- otherwise, propose a multifurcation on those taxa.
We thus obtain a more resolved tree satisfying PC: contradictions affecting basal clades do not always prevent deeper clades from being obtained.
[Figure: the two source trees on taxa a, b, c, d, the graphs built from R and R', and the resulting tree]

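The "temporarily forget the direct contradictions" step can be illustrated as a simple filter: group the triplets by their three taxa and drop every group that contains more than one resolution. This is only a sketch of that one step as we read the slide, not of the whole PhySIC_PC procedure; the function name and triplet encoding are hypothetical.

```python
from collections import defaultdict

def tri(text):
    """Parse 'xy|z' (single-character taxa only) into (frozenset({x, y}), z)."""
    return (frozenset(text[:2]), text[3])

def show(triplet):
    pair, z = triplet
    return "".join(sorted(pair)) + "|" + z

def drop_direct_contradictions(R):
    """Return R', the triplets of R whose taxa carry no conflicting triplet in R."""
    by_taxa = defaultdict(set)
    for pair, z in R:
        by_taxa[pair | {z}].add((pair, z))
    # A set of three taxa resolved in two or more ways is a direct contradiction.
    return {t for group in by_taxa.values() if len(group) == 1 for t in group}

R = {tri(s) for s in ["bc|d", "bd|c", "ac|d", "ad|c", "ab|c", "ab|d"]}
R_prime = drop_direct_contradictions(R)
print(sorted(map(show, R_prime)))  # ['ab|c', 'ab|d']
```

With R' the graph on {a, b, c, d} has the single link a-b, hence several connected components, whereas with R all four taxa are connected.
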
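Algorithmic ideas: limits of BUILD (2)
When the graph contains several connected components, it is necessary to check that the triplets we are about to create are really induced by R. Branches that create triplets not induced by R are collapsed (using graph algorithms).
Example: R = {ab|c, ef|c}. The graph on {a, b, c, e, f} has the components {a, b}, {c} and {e, f}, so the tree ((a,b),c,(e,f)) would display ef|a, which is not induced by R.
[Figure: the source triplets, the graph and the resulting tree on taxa a, b, c, e, f]
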
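To see concretely why ef|a is not induced, it is enough to exhibit one tree that displays R but not ef|a. The sketch below does this with a witness tree `T_alt` chosen by us for the illustration (it is not on the slide); the nested-tuple encoding and helper names are likewise assumptions.

```python
from itertools import combinations

def tri(text):
    """Parse 'xy|z' (single-character taxa only) into (frozenset({x, y}), z)."""
    return (frozenset(text[:2]), text[3])

def leaves(tree):
    return {tree} if isinstance(tree, str) else set().union(*map(leaves, tree))

def triplets(tree):
    """tr(tree): all xy|z with x, y under one child and z under another."""
    if isinstance(tree, str):
        return set()
    parts = [leaves(child) for child in tree]
    found = set().union(*map(triplets, tree))
    for i, part in enumerate(parts):
        outside = set().union(*(p for j, p in enumerate(parts) if j != i))
        for x, y in combinations(sorted(part), 2):
            found |= {(frozenset((x, y)), z) for z in outside}
    return found

R = {tri("ab|c"), tri("ef|c")}

# Tree obtained from the three components {a, b}, {c}, {e, f}.
T = (("a", "b"), "c", ("e", "f"))
# Hypothetical witness: another tree that also displays R.
T_alt = (((("a", "b"), "e"), "f"), "c")

print(R <= triplets(T), R <= triplets(T_alt))  # True True
print(tri("ef|a") in triplets(T))              # True
print(tri("ef|a") in triplets(T_alt))          # False
```

Both trees display every triplet of R, yet only the first displays ef|a, so R alone cannot justify that triplet and the branch creating it has to be collapsed.
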
Algorithmic ideas: a summary
- PhySIC_PC proposes a draft supertree that satisfies PC.
- If a clade is not "strong enough", PhySIC_PI collapses the corresponding branch so that the tree also satisfies PI.
PhySIC is a polynomial-time supertree method (for k source trees on n taxa):
- Decomposition of the input forest into triplets: O(kn^3)
- Creation of a tree satisfying PC: O(n^4)
- Collapsing edges displaying triplets not induced by the source trees: O(n^4)
Overall, the algorithm requires O(kn^3 + n^4) computing time.

Overview
- Relevant properties for a veto method
  - Decomposition of a tree into triplets
  - Tree identification
  - Property of Induction (PI)
  - Property of non-Contradiction (PC)
- Algorithms (intuitive presentation)
  - BUILD (Aho et al.)
  - PhySIC_PC
  - PhySIC_PI
- Biological case study: Primate supertree
- Conclusion & prospects

Primate case study: source trees
- ADRA2B and IRBP study (Poux et al. 04, 06)
- SINEs (Roos et al. 04)
- Branches with bootstrap support <50% are collapsed.
[Figure: the source trees, with the Anthropoids labeled]

Primate case study: PC & PI in action
- Platyrrhines are unresolved due to a conflict (PC).
- Arbitrary resolution among Anthropoids is removed (PI).
[Figure: source trees (ADRA2B, IRBP), the tree produced by PhySIC_PC and the final PhySIC supertree]

Labels indicating the source of problems
PhySIC can tell the reason for the multifurcations it proposes:
- (i) lack of overlap or information in the source trees;
- (c) local contradictions between the source trees.
This guides the correction/completion of source trees and primary data.

Pointing out "problems" in other supertrees
For example, MRP is known to have some undesirable features:
- inferring "novel clades" not supported by any input tree (Bininda-Emonds & Bryant 98, Goloboff & Pol 01, Goloboff 05);
- being affected by a size bias, i.e. when two trees conflict on the resolution of a clade, the tree with the smallest local sampling is ignored (Purvis 95, Bininda-Emonds & Bryant 98, Goloboff 05);
- favoring source trees that are more unbalanced (Wilkinson et al. 01).
A supertree already built from a collection of source trees by a usual supertree method can be reanalyzed in the light of PI & PC to identify problems at dubious nodes.

Primate case study: MRP tree analyzed
[Figure: source trees (ADRA2B, IRBP), the MRP supertree and the filtered MRP supertree; panel labels: 1, 1, 2, PC]

Online server: http://atgc.lirmm.fr/SuperTree/PhySIC
Contact: [email_address]

Conclusion & prospects
To appear in the November issue of Syst. Biol.
- PI and PC properties
- PhySIC method (http://atgc.lirmm.fr/SuperTree/PhySIC)
  - Supertrees satisfying PI and PC (exact) and as resolved as possible (heuristics)
  - Proposes very reliable supertrees: identified by the data (low type I error)
  - Polynomial-time method
  - Localization of conflicts and of areas with insufficient overlap
  - Makes it possible to check/correct supertrees built by other methods (MRP, ...)
- Further developments:
  - Producing more resolved trees satisfying PC and PI
  - Filtering triplets based on their frequencies
  - Coupling with a database (TreeBase, ...)

Thanks
Emmanuel Douzery, Vincent Ranwez, Alexis Criscuolo, Sylvain Guillemot, Pierre-Henri Fabre, Celine Scornavacca, Vincent Lefort
Equipe Méth. et Algor. pour la bioinf., LIRMM
Equipe Phylogénie Moléculaire, ISEM
