09/02/2011 - Rencontres Alphy, Lyon



 Practical use of combinatorial methods
for phylogenetic network reconstruction
               Philippe Gambette
Outline

 • Phylogenetic networks
 • Motivations for the combinatorial reconstruction approach
 • Combinatorial reconstruction methods
 • Practical use
 • Illustrations
 • Perspectives
Phylogenetic trees

 Phylogenetic tree of a species set

 [Figure: species tree S on A, B, C, drawn next to the underlying “tokogeny” (genealogy of individuals) of A, B, C.]
Genetic material transfer

    Genetic material transfers between coexisting species:
    • horizontal gene transfer
    • hybridization

 [Figure: species tree S and network N on A, B, C; the trees of genes G1 and G2 on A, B, C are incompatible.]
Phylogenetic networks

  Phylogenetic network: network representing evolution data
  • explicit phylogenetic networks, to model evolution
    (galled network, level-2 network, synthesis diagram; software: Dendroscope, HorizStory, Simplistic)
  • abstract phylogenetic networks, to classify and visualize data
    (split network, median network, minimum covering network; software: SplitsTree, Network, TCS)
Phylogenetic network software

 Who is Who in Phylogenetic Networks: Articles, Authors & Programs
 Based on BibAdmin by Sergiu Chelcea, with tag clouds, date histogram, journal lists,
 co-author graphs and keyword definitions.
 http://www.atgc-montpellier.fr/phylnet
Explicit phylogenetic networks

 Phylogenetic network: network representing evolution data
 • explicit phylogenetic networks, to model evolution
   (galled network, level-2 network, synthesis diagram, reticulogram;
   software: Dendroscope, Simplistic, HorizStory, T-Rex)
Explicit phylogenetic networks

 Rooted explicit phylogenetic network: tree-like parts + blobs.
 Reticulations: vertices with more than one parent.

 [Figure: rooted network on leaves a to k, with reticulations h1, h2, h3.]
Phylogenetic network reconstruction

 {gene sequences} → network N

                  G1         G2
 species 1 :   AATTGCAG   TAGCCCAAAAT
 species 2 :   ACCTGCAG   TAGACCAAT
 species 3 :   GCTTGCCG   TAGACAAGAAT
 species 4 :   ATTTGCAG   AAGACCAAAT
 species 5 :              TAGACAAGAAT
 species 6 :   ACTTGCAG   TAGCACAAAAT
 species 7 :   ACCTGGTG   TAAAAT

 • distance methods
   Bandelt & Dress 1992 - Legendre & Makarenkov 2000 - Bryant & Moulton 2002
 • parsimony methods
   Hein 1990 - Kececioglu & Gusfield 1994 - Jin, Nakhleh, Snir, Tuller 2009
 • likelihood methods
   Snir & Tuller 2009 - Jin, Nakhleh, Snir, Tuller 2009 - Velasco & Sober 2009

 Problem: these methods are usually slow, while lots of sequences are available.
Phylogenetic network reconstruction

 {gene sequences} (species 1 to 7, genes G1 and G2)
   → reconstruction of one tree for each gene present in several species
     (Guindon & Gascuel, SB, 2003)
 {trees} T1, T2, ... (HOGENOM database: Dufayard, Duret, Penel, Gouy,
     Rechenmann & Perrière, BioInf, 2005; > 500 species, > 70 000 trees)
   → tree consensus or reconciliation
 optimal network N (rooted explicit network)

 Problem: tree reconciliation is difficult even for 2 trees
 (finding a network with the minimum number of reticulations is NP-complete already for 2 trees;
 Bordewich & Semple, DAM, 2007)
Triplets and clusters

Problem:
Reconstructing the supernetwork of a set of trees is hard.
Idea:
reconstruct a network containing all the triplets (e.g. a|ce) and all the clusters (e.g. {c,d,e})
of the input trees?

[Figure: a tree on a, b, c, d, e, with the triplet a|ce and the cluster {c,d,e} highlighted.]
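To make the two kinds of data concrete, here is a minimal Python sketch that lists the clusters and triplets of a rooted tree. It assumes the tree is given as nested tuples of leaf names (a hypothetical input format, not one used by the tools cited here) and writes a triplet x|yz, as on the slide, when y and z are closer to each other than to x. The tree ((a,b),(c,(d,e))) is a made-up example that contains both a|ce and {c,d,e}.

```python
from itertools import combinations


def clusters(tree):
    """Clusters of a rooted tree: the leaf set below each node.

    The tree is given as nested tuples of leaf names (hypothetical format).
    """
    found = set()

    def walk(node):
        if isinstance(node, str):  # a leaf
            leaves = frozenset([node])
        else:  # an internal node: union of its children's leaf sets
            leaves = frozenset().union(*(walk(child) for child in node))
        found.add(leaves)
        return leaves

    walk(tree)
    return found


def triplets(tree):
    """Triplets x|yz of a rooted tree, as strings such as 'a|ce'.

    x|yz holds when some cluster contains y and z but not x.
    """
    cl = clusters(tree)
    taxa = max(cl, key=len)  # the root cluster contains every leaf
    result = set()
    for x, y, z in combinations(sorted(taxa), 3):
        for out, (p, q) in ((x, (y, z)), (y, (x, z)), (z, (x, y))):
            if any(p in c and q in c and out not in c for c in cl):
                result.add(f"{out}|{p}{q}")
    return result


tree = (("a", "b"), ("c", ("d", "e")))
print(sorted("".join(sorted(c)) for c in clusters(tree) if 1 < len(c) < 5))  # ['ab', 'cde', 'de']
print(sorted(triplets(tree)))
```

A binary tree on n leaves resolves every 3-leaf subset, so this example yields 10 triplets, among them a|ce.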
Softwired clusters

 “softwired” cluster: cluster of a tree contained in the network

 Tree-like model of gene transmission:
 each gene comes from a single parent

 [Figure: network on a, b, c, d with the softwired clusters {a,b,c}, {a,b}, {b,c} and {c,d}.]
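The definition can be turned into a brute-force procedure: choose one parent for every reticulation (the tree-like transmission model above), read off the clusters of the resulting tree, and take the union over all choices. Below is a minimal Python sketch, assuming a hypothetical input format (a dict mapping each vertex to its children) and a made-up network; it enumerates one combination per choice of parents (2^r choices for r reticulations with two parents each), so it is only meant for small examples.

```python
from itertools import product


def softwired_clusters(children):
    """Softwired clusters of a rooted network, by brute force over parent choices.

    children: dict mapping each vertex to the list of its children
    (hypothetical input format); leaves are vertices without children.
    """
    parents = {}
    for v, kids in children.items():
        for c in kids:
            parents.setdefault(c, []).append(v)
    reticulations = sorted(v for v, ps in parents.items() if len(ps) > 1)
    vertices = set(children) | set(parents)
    found = set()
    # Each choice keeps exactly one incoming arc per reticulation: a displayed tree.
    for choice in product(*(parents[r] for r in reticulations)):
        kept_parent = dict(zip(reticulations, choice))
        memo = {}

        def leaves_below(v):
            if v not in memo:
                kids = children.get(v, [])
                if not kids:
                    memo[v] = frozenset([v])
                else:
                    # Follow only the arcs kept in this choice of parents.
                    memo[v] = frozenset().union(
                        *(leaves_below(c) for c in kids if kept_parent.get(c, v) == v)
                    )
            return memo[v]

        for v in vertices:
            cluster = leaves_below(v)
            if cluster:  # skip vertices left without leaves in this choice
                found.add(cluster)
    return found


# Made-up galled tree: leaf b hangs below reticulation h, whose parents are u and v.
network = {"R": ["r", "d"], "r": ["u", "v"], "u": ["a", "h"], "v": ["h", "c"], "h": ["b"]}
nontrivial = {c for c in softwired_clusters(network) if 1 < len(c) < 4}
print(sorted("".join(sorted(c)) for c in nontrivial))  # ['ab', 'abc', 'bc']
```

Both {a,b} and {b,c} are softwired clusters of this network although they are incompatible, which no single tree can represent.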
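Combinatorial phylogenetic network reconstruction

Idea:
change the type of data to process

 {trees} → optimal supernetwork N
 {trees} → {clusters} → optimal supernetwork N'
 {trees} → {triplets} → optimal supernetwork N''

 N = N' = N''?
 { N } ⊆ { N' } ⊆ { N'' } (sets of networks containing the data): a network containing all
 the input trees contains all their clusters, and a network containing a cluster C contains
 every triplet x|yz with y, z in C and x not in C.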
Reconstruction from softwired clusters

{trees} → {clusters} → N' (galled network)

 Fast exact galled network reconstruction method from softwired clusters
   Huson, Rupp, Berry, Gambette & Paul, ISMB 2009

 Step 1 - Solve cluster conflicts by deleting taxa, and reconstruct a tree
 on the remaining taxa: MAXIMUM COMPATIBLE SUBSET
 Step 2 - Attach the taxa involved in conflicts to the tree
 with the minimum number of arcs: MINIMUM ATTACHMENT

 [Figure: tree on a, b, c, d after deleting the conflicting taxa x and y, then galled network on a, b, x, y, c, d.]
 http://www.dendroscope.org
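Step 1 relies on the notion of compatibility: two clusters are compatible when they are disjoint or nested, and a set of pairwise compatible clusters is the cluster set of a tree. The Python sketch below illustrates the problem solved in Step 1 by exhaustive search: it looks for a largest set of taxa on which the restricted clusters are pairwise compatible. This is not the algorithm of the cited paper; it is exponential, only meant for toy inputs, and the function names and cluster data are made up for the example.

```python
from itertools import combinations


def compatible(c1, c2):
    """Two clusters are compatible if they are disjoint or one contains the other."""
    return not (c1 & c2) or c1 <= c2 or c2 <= c1


def max_compatible_taxa(clusters):
    """Largest taxon subset on which all restricted clusters are pairwise compatible.

    Exhaustive search from the largest subsets down: exponential, toy inputs only.
    """
    taxa = sorted(set().union(*clusters))
    for size in range(len(taxa), 0, -1):
        for kept in combinations(taxa, size):
            keep = frozenset(kept)
            restricted = [c & keep for c in clusters]
            if all(compatible(a, b) for a, b in combinations(restricted, 2)):
                return keep
    return frozenset()


# Made-up clusters in which the taxa x and y create conflicts.
clusters = [frozenset(s) for s in ("abx", "xy", "ycd", "cd", "ab")]
print(sorted(max_compatible_taxa(clusters)))  # ['a', 'b', 'c', 'd', 'x']
```

Here deleting y alone (or x alone) removes every conflict; Step 2 would then re-attach the deleted taxon using as few extra arcs as possible.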
Reconstruction from softwired clusters

{trees} → {clusters} → N' (level-k network)

 Exact method for level-k network reconstruction from softwired clusters
   van Iersel, Kelk, Rupp & Huson, ISMB 2010
 Fewer reticulations than with galled networks, but slower for level > 2.

 level = maximum number of reticulations per blob.

 [Figure: a level-2 network and a level-1 network (“galled tree”) on leaves a to k.]
 http://www.dendroscope.org
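The definition of the level can be checked mechanically. Taking a blob to be a nontrivial biconnected component of the network's underlying undirected graph, the Python sketch below finds biconnected components with Tarjan's edge-stack depth-first search and counts, in each one, the reticulations whose incoming arcs it contains. The parent-list input format and the three small example networks are assumptions made for illustration, and parallel arcs are not handled.

```python
from collections import defaultdict


def biconnected_components(edges):
    """Edge sets of the biconnected components of an undirected graph (Tarjan)."""
    adj = defaultdict(list)
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    disc, low, stack, components = {}, {}, [], []

    def dfs(u, parent):
        disc[u] = low[u] = len(disc)  # discovery order
        for w in adj[u]:
            if w not in disc:  # tree edge
                stack.append((u, w))
                dfs(w, u)
                low[u] = min(low[u], low[w])
                if low[w] >= disc[u]:  # u separates w's subtree: pop one component
                    component = set()
                    while True:
                        edge = stack.pop()
                        component.add(frozenset(edge))
                        if edge == (u, w):
                            break
                    components.append(component)
            elif w != parent and disc[w] < disc[u]:  # back edge
                stack.append((u, w))
                low[u] = min(low[u], disc[w])

    for u in list(adj):
        if u not in disc:
            dfs(u, None)
    return components


def level(parents):
    """Level of a rooted network: maximum number of reticulations in one blob.

    parents: dict mapping each non-root vertex to its list of parents (hypothetical format).
    """
    edges = [(p, c) for c, ps in parents.items() for p in ps]
    reticulations = [c for c, ps in parents.items() if len(ps) > 1]
    best = 0
    for component in biconnected_components(edges):
        # All incoming arcs of a reticulation lie on a common cycle, hence in one blob.
        count = sum(
            1 for r in reticulations
            if any(frozenset((p, r)) in component for p in parents[r])
        )
        best = max(best, count)
    return best


galled_tree = {"u": ["r"], "v": ["r"], "a": ["u"], "h": ["u", "v"], "c": ["v"], "b": ["h"]}
level2_network = {"u": ["r"], "v": ["r"], "w": ["u"], "h1": ["u", "v"], "h2": ["v", "w"],
                  "a": ["w"], "b": ["h1"], "c": ["h2"]}
tree = {"x": ["r"], "a": ["x"], "b": ["x"], "c": ["r"]}
print(level(galled_tree), level(level2_network), level(tree))  # 1 2 0
```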
Reconstruction from triplets

 {trees} → {triplets} → N' (level-k network)

 Exact methods to reconstruct level-1 and level-2 networks (if any exist) from a dense triplet set
   Jansson, Nguyen & Sung, SODA'05: O(n^3) for level 1
   van Iersel, Kelk et al., RECOMB'08: O(n^8) for level 2
   To & Habib, CPM'09: O(n^(5k+4)) for level k

 T is a dense triplet set if, on any subset of 3 leaves, T contains at least one triplet.

 Program Simplistic

 [Figure: yeast phylogenetic network, from van Iersel et al.: Constructing level-2 phylogenetic
 networks from triplets. RECOMB 2008.]
 http://homepages.cwi.nl/~kelk/simplistic.html
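Density is easy to test directly. A minimal Python sketch, assuming triplets are stored as tuples (x, y, z) for x|yz (a hypothetical encoding), checks that every 3-subset of the leaves is covered and reports the uncovered ones; with n leaves it inspects all C(n, 3) subsets.

```python
from itertools import combinations


def uncovered_leaf_triples(triplets, taxa):
    """3-leaf subsets on which the triplet set contains no triplet."""
    covered = {frozenset(t) for t in triplets}
    return [s for s in combinations(sorted(taxa), 3) if frozenset(s) not in covered]


def is_dense(triplets, taxa):
    """A triplet set is dense if every 3-subset of the leaves carries a triplet."""
    return not uncovered_leaf_triples(triplets, taxa)


# Triplets of the tree ((a,b),(c,d)), written (x, y, z) for x|yz.
triplets = {("c", "a", "b"), ("d", "a", "b"), ("a", "c", "d"), ("b", "c", "d")}
print(is_dense(triplets, "abcd"))  # True
print(uncovered_leaf_triples(triplets - {("d", "a", "b")}, "abcd"))  # [('a', 'b', 'd')]
```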
Reconstruction from triplets

 {trees} → {triplets} → N' (level-1 network)

 Fast heuristic method to reconstruct a level-1 network containing most of the input triplets
   Huber, van Iersel, Kelk & Suchecki, TCBB, 2011

 Program Lev1athan

 [Figure: phylogenetic network built from triplets extracted from 2 trees of HIV-1 strains.
 Huber, van Iersel, Kelk & Suchecki: A practical algorithm for reconstructing level-1
 phylogenetic networks. TCBB, 2011.]
 http://homepages.cwi.nl/~kelk/lev1athan/
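The objective, containing as many input triplets as possible, can be made concrete by scoring a candidate network. The Python sketch below is not Lev1athan's algorithm: it lists the triplets contained in a network by trying every choice of one parent per reticulation, and returns the fraction of input triplets found. The child-list format, the tuple encoding (x, y, z) for x|yz with y and z in sorted order, and the example network and triplets are assumptions for illustration; the enumeration is exponential in the number of reticulations.

```python
from itertools import combinations, product


def network_triplets(children):
    """Triplets (x, y, z), meaning x|yz, contained in a rooted network.

    children: dict vertex -> list of children (hypothetical format).
    """
    parents = {}
    for v, kids in children.items():
        for c in kids:
            parents.setdefault(c, []).append(v)
    reticulations = sorted(v for v, ps in parents.items() if len(ps) > 1)
    vertices = set(children) | set(parents)
    found = set()
    for choice in product(*(parents[r] for r in reticulations)):
        kept_parent = dict(zip(reticulations, choice))
        memo = {}

        def leaves_below(v):
            if v not in memo:
                kids = children.get(v, [])
                memo[v] = frozenset([v]) if not kids else frozenset().union(
                    *(leaves_below(c) for c in kids if kept_parent.get(c, v) == v)
                )
            return memo[v]

        # Clusters of the tree displayed for this choice of parents.
        clusters = {leaves_below(v) for v in vertices}
        taxa = sorted(frozenset().union(*clusters))
        for x, y, z in combinations(taxa, 3):
            for out, p, q in ((x, y, z), (y, x, z), (z, x, y)):
                if any(p in c and q in c and out not in c for c in clusters):
                    found.add((out, p, q))
    return found


def triplet_score(children, input_triplets):
    """Fraction of the input triplets contained in the network."""
    return len(input_triplets & network_triplets(children)) / len(input_triplets)


network = {"R": ["r", "d"], "r": ["u", "v"], "u": ["a", "h"], "v": ["h", "c"], "h": ["b"]}
observed = {("c", "a", "b"), ("a", "b", "c"), ("b", "a", "c")}
print(triplet_score(network, observed))
```

In this made-up example, c|ab and a|bc are contained in the network but b|ac is not, so two thirds of the input triplets are covered.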
Solution ambiguity

      Ambiguity of the results even with complete and correct data

      Many distinct minimal networks have exactly
      the same set of contained trees, triplets, and clusters.

      [Figure: distinct level-1 networks on a, b, c with the same triplets, among them a|bc and c|ab.]

      Characterization for level-1 networks:
      the only ambiguous cases have such blobs (< 5 vertices)

                                                  Gambette & Huber, 2011
Solution ambiguity

      Ambiguity of the results even with complete and correct data

      Many distinct minimal networks have exactly
      the same set of contained trees, triplets, and clusters.

      [Figure: 2 level-2 networks on a, b, x1, x2 with exactly the same triplet set.]

      Even with complete and correct data,
      it is impossible to choose among the ambiguous configurations!

                                                  Gambette & Huber, 2011
Practical use

Conditions for use, available data, and possible processing:

 • Condition: rooted trees. Available data: unrooted trees.
   Processing: rooting with a reference species tree or topological constraints.
 • Condition: single-copy gene trees. Available data: MUL-trees (with duplicated genes).
   Processing: MUL-tree processing (Scornavacca, Berry & Ranwez, 2009).
 • Condition: correct clusters and triplets. Available data: noisy data.
   Processing: tree cleaning (PhySIC_IST, 2008); data filtering (clusters with a high
   bootstrap value, present in more than x% of the trees); data editing (solution
   containing most of the input data).
 • Condition: complete data (density for triplet sets, complete clusters).
   Available data: partial data, deleted genes.
   Processing: selection of a large number of trees with a large number of common species;
   selection of the maximal number of taxa with triplet density (NP-complete problems).
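The data-filtering step can be sketched in a few lines. The Python example below keeps the clusters present in at least a given fraction of the input trees (the Illustrations use a 20% threshold). It assumes each tree has already been reduced to its set of clusters, compared as exact leaf sets; bootstrap-based filtering would need support values that are not modelled here. Function and variable names are hypothetical.

```python
from collections import Counter


def frequent_clusters(cluster_sets, min_fraction):
    """Clusters that occur in at least min_fraction of the trees.

    cluster_sets: one collection of clusters (frozensets of leaf names) per tree.
    """
    counts = Counter(c for clusters in cluster_sets for c in set(clusters))
    n_trees = len(cluster_sets)
    return {c for c, k in counts.items() if k >= min_fraction * n_trees}


trees = [
    {frozenset("ab"), frozenset("abc")},
    {frozenset("ab"), frozenset("cd")},
    {frozenset("bc"), frozenset("abc")},
]
kept = frequent_clusters(trees, 0.5)
print(sorted("".join(sorted(c)) for c in kept))  # ['ab', 'abc']
```

A lower threshold keeps more conflicting clusters and therefore tends to require more reticulations in the reconstructed network.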
Illustrations

 16 trees on 47 taxa from the HOGENOM database (proteobacteria):
 24 Enterobacteriales, 2 Pasteurellales, 1 Aeromonadales, 9 Alteromonadales,
 1 Oceanospirillales, 6 Rhodobacterales, 4 Rhizobiales

 Networks containing the triplets and softwired clusters
 present in at least 20% of the trees:
 • Lev1athan (heuristic, triplets, level-1 network): 24 sec.
 • Simplistic (triplets, level-7 network): 63 sec.
 • Dendroscope (clusters, galled network): <1 sec.
Illustrations

 Same 16 trees on 47 proteobacteria taxa from the HOGENOM database; networks containing
 the softwired clusters present in at least 20% of the trees:
 • Dendroscope (clusters, cluster network, level 1): <1 sec.
 • Dendroscope (clusters, level-7 network): 2 sec.
 • Dendroscope (clusters, galled network): <1 sec.
Illustrations

 9 trees on 279 prokaryote species
 Clusters present in at least 2 trees
   Auch, Steigele, Huson & Henz, 2009
 • Dendroscope (clusters, galled network): 2 sec.
Illustrations

 23 trees, 45 species from the 3 domains of life
 Clusters with 99% bootstrap confidence
 • Dendroscope (galled network): 4 sec.

 Data from Brown JR, Douady CJ, Italia MJ, Marshall WE, Stanhope MJ:
 Universal trees based on large combined protein sequence data sets. Nat Genet 2001, 28:281-285
Illustrations

 Same 23 trees on 45 species from the 3 domains of life (Brown et al., Nat Genet 2001)
 Clusters with 80% bootstrap confidence, present in at least 2 trees
 • Dendroscope (galled network): <1 sec.
Illustrations

 Same 23 trees on 45 species from the 3 domains of life (Brown et al., Nat Genet 2001)
 Clusters with 80% bootstrap confidence, present in at least 2 trees
 • Dendroscope (level-3 network): <1 sec.
Perspectives

 Combinatorics:
 - Better knowledge of networks of small level
 - Update of a network with new data
 - Unrooted explicit phylogenetic network reconstruction

 Bioinformatics:
 - Function of transferred genes (“transfer highways”)
 - Integration of combinatorial methods in a statistical framework:

   Sequence data
   → construction of combinatorial data
   → combinatorial reconstruction of a set of candidates
   → choice among the candidates by statistical methods
   → phylogenetic network
Thank you!
 Coauthors:
 - Vincent Berry, Christophe Paul (LIRMM)
 - Daniel Huson, Regula Rupp (Tübingen)
 - Katharina Huber (East Anglia)

 Brown et al.'s data provided by Sophie Abby

 [Figure: reticulogram of the 25 most frequent words of my thesis, built with
 TreeCloud, SplitsTree and T-Rex; coloring: red at the beginning, blue at the end.]
 http://www.treecloud.org

Detection-First SIEM: Rule Types, Dashboards, and Threat-Informed Strategy
Build a system with the filesystem maintained by OSTree @ COSCUP 2025
Spectroscopy.pptx food analysis technology
7 ChatGPT Prompts to Help You Define Your Ideal Customer Profile.pdf
Encapsulation theory and applications.pdf

Practical use of combinatorial methods for phylogenetic network reconstruction

  • 1. 09/02/2011 - Rencontres Alphy, Lyon. Practical use of combinatorial methods for phylogenetic network reconstruction. Philippe Gambette
  • 2. Outline • Phylogenetic networks • Motivations for the combinatorial reconstruction approach • Combinatorial reconstruction methods • Practical use • Illustrations • Perspectives
  • 4. Phylogenetic trees. Phylogenetic tree of a species set. [Figure: species tree S on A, B, C, next to the “tokogeny” on A, B, C.]
  • 5. Genetic material transfer. Genetic material transfers between coexisting species: • horizontal gene transfer • hybridization. [Figure: species tree S on A, B, C.]
  • 6. Genetic material transfer. Genetic material transfers between coexisting species: • horizontal gene transfer • hybridization. [Figure: species tree S on A, B, C; tree of gene G1 on A, B, C; network N on A, B, C.]
  • 7. Genetic material transfer. Genetic material transfers between coexisting species: • horizontal gene transfer • hybridization. [Figure: species tree S and network N on A, B, C, with incompatible gene trees for genes G1 and G2 on A, B, C.]
  • 8. Phylogenetic networks. Phylogenetic network: a network representing evolution data. • Explicit phylogenetic networks, to model evolution: synthesis diagram, galled network, level-2 network (software: Simplistic, Dendroscope, HorizStory). • Abstract phylogenetic networks, to classify and visualize data: minimum spanning network, split network, median network (software: SplitsTree, Network, TCS).
  • 9. Phylogenetic network software. Who is Who in Phylogenetic Networks: articles, authors & programs. Based on BibAdmin by Sergiu Chelcea, with tag clouds, date histogram, journal lists, co-author graphs and keyword definitions. http://www.atgc-montpellier.fr/phylnet
  • 10. Explicit phylogenetic networks. Phylogenetic network: a network representing evolution data. • Explicit phylogenetic networks, as an evolution model: synthesis diagram, galled network, level-2 network, reticulogram (software: Simplistic, Dendroscope, HorizStory, T-Rex).
  • 11. Explicit phylogenetic networks. A rooted explicit phylogenetic network consists of tree-like parts plus blobs; vertices with more than one parent are called reticulations. [Figure: network on leaves a to k with reticulations h1, h2, h3.]
  • 12. Outline • Phylogenetic networks • Motivations for the combinatorial reconstruction approach • Combinatorial reconstruction methods • Practical use • Illustrations • Perspectives
  • 13. Phylogenetic network reconstruction. Input {gene sequences} for genes G1 and G2: species 1: AATTGCAG TAGCCCAAAAT; species 2: ACCTGCAG TAGACCAAT; species 3: GCTTGCCG TAGACAAGAAT; species 4: ATTTGCAG AAGACCAAAT; species 5: TAGACAAGAAT; species 6: ACTTGCAG TAGCACAAAAT; species 7: ACCTGGTG TAAAAT. Output: network N, obtained with distance methods (Bandelt & Dress 1992; Legendre & Makarenkov 2000; Bryant & Moulton 2002), parsimony methods (Hein 1990; Kececioglu & Gusfield 1994; Jin, Nakhleh, Snir & Tuller 2009) or likelihood methods (Snir & Tuller 2009; Jin, Nakhleh, Snir & Tuller 2009; Velasco & Sober 2009).
  • 14. Phylogenetic network reconstruction (same diagram as slide 13). Problem: these methods are usually slow, while lots of sequences are available.
  • 15. Phylogenetic network reconstruction. From {gene sequences} (genes G1, G2, as on slide 13): reconstruction of one tree for each gene present in several species (Guindon & Gascuel, SB, 2003; HOGENOM database: Dufayard, Duret, Penel, Gouy, Rechenmann & Perrière, BioInf, 2005), giving {trees} T1, T2; then tree consensus or reconciliation into an optimal rooted explicit network N.
  • 16. Phylogenetic network reconstruction (same pipeline as slide 15; HOGENOM: more than 500 species, more than 70 000 trees). Problem: tree reconciliation is difficult even for 2 trees: finding a network with the minimum reticulation number that contains 2 given trees is NP-complete (Bordewich & Semple, DAM, 2007).
  • 17. Triplets and clusters. Problem: reconstructing the supernetwork of a set of trees is hard. Idea: reconstruct a network containing all the triplets of the input trees? [Figure: a tree on a, b, c, d, e displaying the triplet a|ce.]
  • 18. Triplets and clusters. Problem: reconstructing the supernetwork of a set of trees is hard. Idea: reconstruct a network containing all the triplets (e.g. a|ce) or all the clusters (e.g. {c,d,e}) of the input trees? [Figure: a tree on a, b, c, d, e with the triplet a|ce and the cluster {c,d,e}.]
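Both data types can be read off a rooted tree mechanically. The Python sketch below is an illustration under stated assumptions: a tree is written as nested tuples of leaf labels (a hypothetical format, not the input format of the programs cited in these slides), a cluster is the set of leaves below an internal node, and a triplet x|yz is stored as (x, frozenset({y, z})). The function names `clusters` and `triplets` are hypothetical. It relies on the fact that a tree displays x|yz exactly when one of its clusters contains y and z but not x.

```python
from itertools import combinations


def clusters(tree):
    """Return (leaf set, clusters of all internal nodes) for a nested-tuple tree.

    A tree is either a leaf label (str) or a tuple of subtrees. The root
    cluster (all leaves) is included; leaves contribute no cluster of their own.
    """
    if isinstance(tree, str):
        return frozenset([tree]), set()
    leaves, found = frozenset(), set()
    for child in tree:
        child_leaves, child_clusters = clusters(child)
        leaves |= child_leaves
        found |= child_clusters
    found.add(leaves)
    return leaves, found


def triplets(tree):
    """Return the triplets x|yz displayed by the tree, as (x, frozenset({y, z})).

    x|yz is displayed exactly when some cluster contains y and z but not x.
    """
    leaves, cl = clusters(tree)
    result = set()
    for trio in combinations(sorted(leaves), 3):
        for x in trio:
            y, z = (t for t in trio if t != x)
            if any(y in c and z in c and x not in c for c in cl):
                result.add((x, frozenset((y, z))))
    return result


# Hypothetical example: the tree ((a,b),(c,(d,e))) as nested tuples.
example = (("a", "b"), ("c", ("d", "e")))
```

For this fully resolved example, the clusters are {a,b}, {d,e}, {c,d,e} and the root cluster, each of the 10 leaf triples receives exactly one triplet, and a|ce is among them.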
  • 19. Softwired clusters. A “softwired” cluster is a cluster of a tree contained in the network. Tree-like model of gene transmission: each gene comes from a single parent. [Figure: network on leaves a, b, c, d, with clusters labelled abc, ab, cd, bc.]
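The tree-like transmission model suggests a direct, if slow, way to list softwired clusters: enumerate every tree displayed by the network by keeping one incoming arc per reticulation, then collect the clusters of each displayed tree. A brute-force Python sketch, assuming a network given as a hypothetical dictionary from each node to its list of children. This is not how Dendroscope or the cited methods proceed, and it is exponential in the number of reticulations; the function name `softwired_clusters` is hypothetical.

```python
from itertools import product


def softwired_clusters(network):
    """Non-trivial clusters of the trees displayed by a rooted network.

    network maps each internal node to its children; leaves are the nodes
    without children. Brute force: one parent is chosen per reticulation.
    """
    parents = {}
    for u, kids in network.items():
        for v in kids:
            parents.setdefault(v, []).append(u)
    taxa = {v for v in parents if not network.get(v)}
    reticulations = [v for v, ps in parents.items() if len(ps) > 1]

    def below(node, children, memo):
        # labelled leaves reachable from node in the current displayed tree
        if node in taxa:
            return frozenset([node])
        if node not in memo:
            memo[node] = frozenset().union(
                *(below(k, children, memo) for k in children[node]))
        return memo[node]

    found = set()
    for choice in product(*(parents[h] for h in reticulations)):
        kept = dict(zip(reticulations, choice))  # reticulation -> chosen parent
        # drop every arc entering a reticulation except the chosen one
        children = {u: [v for v in kids if kept.get(v, u) == u]
                    for u, kids in network.items()}
        memo = {}
        for node in children:
            c = below(node, children, memo)
            if 1 < len(c) < len(taxa):  # skip singletons and the full taxon set
                found.add(c)
    return found
```

For the hypothetical network {"r": ["w", "d"], "w": ["x", "y"], "x": ["a", "h"], "y": ["h", "c"], "h": ["b"]}, where leaf b hangs below the reticulation h, the two displayed trees contribute the softwired clusters {a, b}, {b, c} and {a, b, c}.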
  • 21. Combinatorial phylogenetic network reconstruction. Idea: change the type of data to process: {trees} → optimal supernetwork N; {clusters} → optimal supernetwork N'; {triplets} → optimal supernetwork N''. Is N = N' = N''?
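  • 22. Combinatorial phylogenetic network reconstruction. Idea: change the type of data to process: {trees} → optimal supernetwork N; {clusters} → optimal supernetwork N'; {triplets} → optimal supernetwork N''. In general { N } ⊆ { N' } ⊆ { N'' }: any network containing the input trees contains their clusters, and any network containing a cluster C as a softwired cluster contains every triplet x|yz with y, z in C and x outside C.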
  • 23. Outline • Phylogenetic networks • Motivations for the combinatorial reconstruction approach • Combinatorial reconstruction methods • Practical use • Illustrations • Perspectives
  • 24. Reconstruction from softwired clusters ({trees} → {clusters} → galled network N'). Fast exact galled network reconstruction method from softwired clusters (Huson, Rupp, Berry, Gambette & Paul, ISMB 2009). Step 1: solve cluster conflicts by deleting taxa (MAXIMUM COMPATIBLE SUBSET), then reconstruct a tree on the remaining taxa. [Figure on taxa a, b, c, d, x, y.] Step 2: attach the taxa involved in conflicts to the tree with the minimum number of arcs (MINIMUM ATTACHMENT). [Figure: galled network N' on a, b, x, y, c, d.] http://www.dendroscope.org
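Step 1 relies on the notion of compatible clusters: two clusters are compatible when they are disjoint or nested, and a set of pairwise compatible clusters defines a tree. The Python sketch below illustrates the taxon-deletion idea by exhaustive search over taxon subsets of increasing size. It is a toy stand-in, not the algorithm of Huson et al. (ISMB 2009), and the function names are hypothetical.

```python
from itertools import combinations


def compatible(c1, c2):
    """Two clusters are compatible if they are nested or disjoint."""
    return c1 <= c2 or c2 <= c1 or not (c1 & c2)


def all_compatible(clusters):
    return all(compatible(a, b) for a, b in combinations(clusters, 2))


def min_taxa_to_delete(clusters, taxa):
    """Smallest set of taxa whose removal leaves pairwise compatible clusters.

    Exhaustive search by increasing size: exponential, for illustration only.
    Deleting every taxon always succeeds, so a set is always returned.
    """
    for k in range(len(taxa) + 1):
        for removed in combinations(sorted(taxa), k):
            gone = set(removed)
            if all_compatible([c - gone for c in clusters]):
                return gone
```

With the hypothetical clusters {a,b,x}, {x,c,d}, {a,b}, {c,d} on taxa a, b, c, d, x, y, only the pair {a,b,x}, {x,c,d} conflicts, and deleting x alone removes the conflict; x is then the taxon to reattach in Step 2.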
  • 25. Reconstruction from softwired clusters ({trees} → {clusters} → level-k network N'). Exact method for level-k network reconstruction from softwired clusters (van Iersel, Kelk, Rupp & Huson, ISMB 2010): fewer reticulations, but slower for level > 2. Level = maximum number of reticulations per blob. [Figure: level-2 network on leaves a to k; level-k networks generalize level-1 networks (“galled trees”).] http://www.dendroscope.org
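The level can be computed from the underlying undirected graph: blobs correspond to its biconnected components, and each reticulation is counted in the blob that contains its incoming arcs (two parents of a reticulation always lie on a common cycle through the root). A Python sketch under these assumptions, with the network given as a hypothetical dictionary from nodes to children and using Tarjan's biconnected-components algorithm, an illustrative choice that is not necessarily what Dendroscope does. Function names are hypothetical.

```python
from collections import defaultdict


def biconnected_components(edges):
    """Tarjan's algorithm on the underlying undirected graph.

    Returns one set of undirected edges (frozensets) per biconnected
    component. Assumes no parallel edges.
    """
    adj = defaultdict(list)
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    disc, low, stack, comps = {}, {}, [], []

    def dfs(u, parent):
        disc[u] = low[u] = len(disc)  # discovery order
        for v in adj[u]:
            if v not in disc:  # tree edge
                stack.append((u, v))
                dfs(v, u)
                low[u] = min(low[u], low[v])
                if low[v] >= disc[u]:  # u separates v's subtree: pop a component
                    comp = set()
                    while True:
                        edge = stack.pop()
                        comp.add(frozenset(edge))
                        if edge == (u, v):
                            break
                    comps.append(comp)
            elif v != parent and disc[v] < disc[u]:  # back edge to an ancestor
                stack.append((u, v))
                low[u] = min(low[u], disc[v])

    for node in list(adj):
        if node not in disc:
            dfs(node, None)
    return comps


def level(network):
    """Maximum number of reticulations in a single blob (0 for a tree)."""
    arcs = [(u, v) for u, kids in network.items() for v in kids]
    parents = defaultdict(list)
    for u, v in arcs:
        parents[v].append(u)
    comp_of = {}
    for i, comp in enumerate(biconnected_components(arcs)):
        for edge in comp:
            comp_of[edge] = i
    counts = defaultdict(int)
    for v, ps in parents.items():
        if len(ps) > 1:  # reticulation: all its incoming arcs lie in one blob
            counts[comp_of[frozenset((ps[0], v))]] += 1
    return max(counts.values(), default=0)
```

A tree has level 0. The hypothetical network {"r": ["w", "d"], "w": ["x", "y"], "x": ["a", "h"], "y": ["h", "c"], "h": ["b"]} has level 1, while {"r": ["x", "y"], "x": ["h1", "h2"], "y": ["h1", "z"], "z": ["h2", "c"], "h1": ["a"], "h2": ["b"]}, whose two reticulations share one blob, has level 2.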
  • 26. Reconstruction from triplets ({trees} → {triplets} → level-k network N'). Exact methods to reconstruct level-1 and level-2 networks (if any exist) from a dense triplet set: Jansson, Nguyen & Sung, SODA'05: O(n^3) for level 1; van Iersel, Kelk et al., RECOMB'08: O(n^8) for level 2; To & Habib, CPM'09: O(n^(5k+4)) for level k. A triplet set T is dense if, on any subset of 3 leaves, T contains at least one triplet. Program Simplistic. [Figure: yeast phylogenetic network, from van Iersel et al.: Constructing level-2 phylogenetic networks from triplets. RECOMB 2008.] http://homepages.cwi.nl/~kelk/simplistic.html
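Density, the precondition of these exact methods, is easy to check directly: enumerate every 3-subset of leaves and look for a triplet on it, one pass over the O(n^3) leaf triples. A minimal Python sketch, assuming (hypothetically) that a triplet x|yz is stored as (x, frozenset({y, z})); the function name `is_dense` is also hypothetical.

```python
from itertools import combinations


def is_dense(triplets, taxa):
    """True if every 3-subset of taxa carries at least one triplet.

    A triplet x|yz is encoded as (x, frozenset({y, z})).
    """
    covered = {frozenset(pair) | {x} for x, pair in triplets}
    return all(frozenset(trio) in covered for trio in combinations(taxa, 3))
```

The four triplets c|ab, d|ab, a|cd and b|cd of the tree ((a,b),(c,d)) form a dense set on {a, b, c, d}; dropping a|cd leaves the leaf triple {a, c, d} uncovered.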
  • 27. Reconstruction from triplets ({trees} → {triplets} → level-1 network N'). Fast heuristic method to reconstruct a level-1 network containing most of the input triplets (Huber, van Iersel, Kelk & Suchecki, TCBB, 2011). Program Lev1athan. [Figure: phylogenetic network built from triplets extracted from 2 trees of HIV-1 strains, from Huber, van Iersel, Kelk & Suchecki: A practical algorithm for reconstructing level-1 phylogenetic networks. TCBB, 2011.] http://homepages.cwi.nl/~kelk/lev1athan/
  • 28. Outline • Phylogenetic networks • Motivations for the combinatorial reconstruction approach • Combinatorial reconstruction methods • Practical use • Illustrations • Perspectives
  • 29. Solution ambiguity. Ambiguity of the results even with complete and correct data: many distinct minimal networks have exactly the same set of contained trees, triplets and clusters. [Figure: small networks on leaves a, b, c.] Characterization for level-1 networks: the only ambiguous cases have such blobs (fewer than 5 vertices) (Gambette & Huber, 2011).
  • 30. Solution ambiguity (as slide 29, now highlighting the triplet a|bc).
  • 31. Solution ambiguity (as slide 29, now highlighting the triplet c|ab).
  • 32. Solution ambiguity. Ambiguity of the results even with complete and correct data: many distinct minimal networks have exactly the same set of contained trees, triplets and clusters. [Figure: 2 level-2 networks on leaves a, b, x1, x2 with exactly the same triplet set.] (Gambette & Huber, 2011)
  • 33. Solution ambiguity (as slide 32). Even with complete and correct data, it is impossible to choose among the ambiguous configurations! (Gambette & Huber, 2011)
  • 34. Practical use (legend: existing methods / still work to do!).
    Conditions for use | Available data | Possible processing
    Rooted trees | Unrooted trees | Rooting with a reference species tree or topological constraints
    Single-copy gene trees | MUL-trees (with duplicated genes) | MUL-tree processing (Scornavacca, Berry & Ranwez, 2009)
    Correct clusters and triplets | Noisy data | Tree cleaning (PhySIC_IST, 2008); data filtering (clusters with high bootstrap value, present in > x% of the trees); data editing (solution containing most of the input data)
    Complete data (density for triplet sets, complete clusters) | Partial data, deleted genes | Selection of a large number of trees with a large number of common species; selection of the maximal number of taxa with triplet density (NP-complete problems)
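The data-filtering row can be made concrete by counting, for each cluster, how many input trees contain it, and keeping those above a threshold (the illustrations that follow use thresholds such as 20% of the trees or at least 2 trees). A Python sketch, assuming each tree has already been reduced to its set of clusters; bootstrap-based filtering would need support values and is not shown. The function name `frequent_clusters` is hypothetical.

```python
from collections import Counter


def frequent_clusters(trees_clusters, min_fraction):
    """Keep clusters present in at least min_fraction of the input trees.

    trees_clusters holds one collection of clusters (frozensets of taxa)
    per tree; a cluster is counted at most once per tree.
    """
    counts = Counter(c for clusters in trees_clusters for c in set(clusters))
    needed = min_fraction * len(trees_clusters)
    return {c for c, n in counts.items() if n >= needed}
```

With min_fraction=0.3 on five trees, a cluster must occur in at least 2 of them; with 0.5 it must occur in at least 3.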
  • 35. Outline • Phylogenetic networks • Motivations for the combinatorial reconstruction approach • Combinatorial reconstruction methods • Practical use • Illustrations • Perspectives
  • 36. Illustrations. 16 trees on 47 taxa from the HOGENOM database (proteobacteria: 24 Enterobacteriales, 2 Pasteurellales, 1 Aeromonadales, 9 Alteromonadales, 1 Oceanospirillales, 6 Rhodobacterales, 4 Rhizobiales). Networks containing the triplets or softwired clusters present in at least 20% of the trees: Lev1athan (triplets, heuristic, level-1): 24 sec.; Simplistic (triplets, level-7 network): 63 sec.; Dendroscope (clusters, galled network): < 1 sec.
  • 37. Illustrations. 16 trees on 47 taxa from the HOGENOM database (proteobacteria, same composition as slide 36). Networks containing the triplets or softwired clusters present in at least 20% of the trees: Dendroscope (clusters, cluster network, level 1): < 1 sec.; Dendroscope (clusters, galled network): < 1 sec.; Dendroscope (clusters, level-7 network): 2 sec.
  • 38. Illustrations. 9 trees on 279 prokaryote species (Auch, Steigele, Huson & Henz, 2009), clusters present in at least 2 trees. Dendroscope (clusters, galled network): 2 sec.
  • 39. Illustrations. 23 trees, 45 species from the 3 domains of life; clusters with 99% bootstrap confidence. Dendroscope (galled network): 4 sec. Data from Brown JR, Douady CJ, Italia MJ, Marshall WE, Stanhope MJ: Universal trees based on large combined protein sequence data sets. Nat Genet 2001, 28:281-285.
  • 40. Illustrations. 23 trees, 45 species from the 3 domains of life; clusters with 80% bootstrap confidence present in at least 2 trees. Dendroscope (galled network): < 1 sec. Data from Brown JR, Douady CJ, Italia MJ, Marshall WE, Stanhope MJ: Universal trees based on large combined protein sequence data sets. Nat Genet 2001, 28:281-285.
  • 41. Illustrations. 23 trees, 45 species from the 3 domains of life; clusters with 80% bootstrap confidence present in at least 2 trees. Dendroscope (level-3 network): < 1 sec. Data from Brown JR, Douady CJ, Italia MJ, Marshall WE, Stanhope MJ: Universal trees based on large combined protein sequence data sets. Nat Genet 2001, 28:281-285.
  • 42. Outline • Phylogenetic networks • Motivations for the combinatorial reconstruction approach • Combinatorial reconstruction methods • Practical use • Illustrations • Perspectives
  • 43. Perspectives. Combinatorics: better knowledge of networks of small level; update of a network with new data; unrooted explicit phylogenetic network reconstruction. Bioinformatics: function of transferred genes (“transfer highways”); integration of combinatorial methods in a statistical framework: sequence data → construction of combinatorial data → combinatorial reconstruction of a set of candidates → choice among the candidates by statistical methods → phylogenetic network.
  • 44. Thank you! Coauthors: Vincent Berry, Christophe Paul (LIRMM); Daniel Huson, Regula Rupp (Tübingen); Katharina Huber (East Anglia). Brown et al.'s data provided by Sophie Abby. [Figure: reticulogram of the 25 most frequent words of my thesis, built with TreeCloud, SplitsTree and T-Rex; coloring: red at the beginning, blue at the end.] http://www.treecloud.org